Running Claude as your AI assistant is great. Until you see the bill.
At $3 per million input tokens and $15 per million output tokens, every complex task costs real money.
OpenClaw solves this with model delegation. The premium model handles coordination and tool orchestration. Meanwhile, a free local model does the heavy lifting right on your hardware.
Why API-only?
Anthropic restricts automated access to Claude through consumer subscriptions. The terms are clear: bots, scripts, and agents must use the API.
Section 3 of Anthropic's Consumer Terms lists prohibited uses, including:
"Except when you are accessing our Services via an Anthropic API Key or where we otherwise explicitly permit it, to access the Services through automated or non-human means, whether through a bot, script, or otherwise."
This is a legal constraint, not a technical one. If you want Claude powering your agents, you need API access.
The Problem
Claude excels at complex reasoning, multi-step planning, and tool orchestration. But most assistant tasks are simpler: writing configs, generating code, drafting documentation, summarizing logs.
These don't need Claude's full power. They just need good text generation.

The Architecture
I built a two-role system:
OPUS (Claude) — Coordinator
- Receives your requests
- Decides what needs to happen
- Spawns subagents for text generation tasks
- Executes tools: file operations, shell commands, API calls
- Validates results and reports back
GRANITE (IBM Granite 3.3 8B) — Generator
- Runs locally on Ollama
- Handles all text generation: Ansible playbooks, Terraform configs, Docker compose files, documentation
- No tool access. Pure text output.
- Zero cost per token
[IBM Granite 3.3 8B](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) is a compact, capable model available on [Ollama](https://ollama.com/library/granite3.3). It's designed for exactly this kind of work: instruction following and code generation without the overhead of a frontier model.
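Getting GRANITE running locally is a two-command job, assuming Ollama is already installed (the model tag comes from the Ollama library page linked above):

```shell
# Pull IBM Granite 3.3 8B from the Ollama library
ollama pull granite3.3

# Smoke test: pure text generation, no tools
ollama run granite3.3 "Write a minimal docker-compose.yml for nginx"
```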
How It Works
Say you want to configure AWX with Authentik SSO.
- You ask: "Configure AWX with Authentik SSO."
- OPUS recognizes this as a config generation task.
- OPUS spawns a GRANITE subagent.
- GRANITE generates the Ansible playbook and Authentik provider config.
- OPUS reviews the output.
- OPUS writes files, runs ansible-playbook, verifies the setup.
- OPUS confirms success.
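The recognition step above can be sketched as a simple classifier. This is a toy stand-in for OPUS's actual judgment, which happens in the model's reasoning rather than in code; the keyword list and function name are invented for illustration:

```python
# Hypothetical routing sketch: keyword match stands in for the
# coordinator's real decision about what counts as a generation task.
GENERATION_KEYWORDS = (
    "playbook", "config", "compose", "documentation",
    "summarize", "generate", "draft",
)

def route(request: str) -> str:
    """Return which role should handle a request: pure text
    generation goes to the local model, anything needing tools
    or multi-step planning stays with the coordinator."""
    text = request.lower()
    if any(keyword in text for keyword in GENERATION_KEYWORDS):
        return "granite"  # spawn a local-generation subagent
    return "opus"         # coordinator handles it directly

print(route("Configure AWX with Authentik SSO"))  # prints "granite"
```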
In OpenClaw, spawning a subagent looks like this:
```
sessions_spawn({
  task: "Generate an Ansible playbook to configure AWX with Authentik SSO...",
  label: "granite-awx-sso",
  agentId: "granite"
})
```

GRANITE runs in isolation. No tools. No network access. Just text generation. When it finishes, OPUS picks up the output and executes it.
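Under the hood, the subagent's generation step could look like this sketch against Ollama's local REST API. The endpoint and request shape come from Ollama's documentation; `granite_generate` and `build_payload` are hypothetical helpers, not OpenClaw's actual implementation:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "granite3.3") -> dict:
    """Shape of a non-streaming generation request to Ollama."""
    return {"model": model, "prompt": prompt, "stream": False}

def granite_generate(prompt: str, host: str = "http://localhost:11434") -> str:
    """Send the prompt to the local Ollama server and return the
    generated text. The subagent never touches tools or the wider
    network; it only returns a string for OPUS to validate and act on."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```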


The Benefits
Cost efficiency. Text generation tasks cost $0 instead of roughly $0.30 per task with Claude alone. Over hundreds of tasks, this adds up fast.
Data security. Credentials, hostnames, internal configs stay on your network. Nothing sensitive goes to external APIs for generation tasks.
Right model for the job. Claude thinks. GRANITE writes. Both do what they're best at.
The Numbers
Here's what the cost breakdown looks like in practice:
Without delegation, every task uses Claude tokens. A typical day of infrastructure automation could easily hit $5-10 in API costs.
With delegation, GRANITE handles the bulk of text generation locally. Claude only burns tokens on coordination. The same workload drops to under $1.
That's a 5-10x reduction in API costs.
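To make the arithmetic concrete, here is a back-of-the-envelope sketch using the pricing from the intro. The per-task token counts are assumptions, chosen to line up with the rough $0.30-per-task figure above:

```python
# Claude API pricing cited in the intro: $3 / $15 per million tokens.
INPUT_PRICE = 3.00 / 1_000_000    # USD per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # USD per output token

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single task at Claude API rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Illustrative task sizes (assumed): 10k input tokens either way;
# 18k output tokens if Claude generates everything, but only 2k
# when it merely coordinates and GRANITE generates locally at $0.
full_claude = task_cost(10_000, 18_000)
delegated = task_cost(10_000, 2_000)
print(f"Claude-only: ${full_claude:.2f}, delegated: ${delegated:.2f}")
# prints "Claude-only: $0.30, delegated: $0.06"
```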
In Practice
This setup runs my entire infrastructure automation workflow. GRANITE produces Ansible playbooks, Docker configurations, documentation, and log summaries. OPUS orchestrates everything: spawning tasks, writing files, running commands, keeping me informed.
Claude's token usage stays minimal. GRANITE runs on free local compute. Every task costs a fraction of what it would otherwise.
Model delegation isn't just about saving money. It's about using each model for what it does best. And keeping your data where it belongs.
