Running Claude as your AI assistant is great. Until you see the bill.
At $3 per million input tokens and $15 per million output tokens, every complex task costs real money.
OpenClaw solves this with model delegation. The premium model handles coordination and tool orchestration. Meanwhile, a free local model does the heavy lifting right on your hardware.
Why API-only?
Anthropic restricts automated access to Claude through consumer subscriptions. The terms are clear: bots, scripts, and agents must use the API.
Section 3 of Anthropic's Consumer Terms lists prohibited uses, including:
"Except when you are accessing our Services via an Anthropic API Key or where we otherwise explicitly permit it, to access the Services through automated or non-human means, whether through a bot, script, or otherwise."
This is a legal constraint, not a technical one. If you want Claude powering your agents, you need API access.
The Problem
Claude excels at complex reasoning, multi-step planning, and tool orchestration. But most assistant tasks are simpler: writing configs, generating code, drafting documentation, summarizing logs.
These don't need Claude's full power. They just need good text generation.

The Architecture
I built a two-role system:
OPUS (Claude) — Coordinator
- Receives your requests
- Decides what needs to happen
- Spawns subagents for text generation tasks
- Executes tools: file operations, shell commands, API calls
- Validates results and reports back
GRANITE (IBM Granite 3.3 8B) — Generator
- Runs locally on Ollama
- Handles all text generation: Ansible playbooks, Terraform configs, Docker compose files, documentation
- No tool access. Pure text output.
- Zero cost per token
[IBM Granite 3.3 8B](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) is a compact, capable model available on [Ollama](https://ollama.com/library/granite3.3). It's designed for exactly this kind of work: instruction following and code generation without the overhead of a frontier model.
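Getting GRANITE running locally is a two-command job, assuming Ollama is already installed (the model tag comes from the Ollama library page linked above):

```shell
# Pull IBM Granite 3.3 8B from the Ollama library
ollama pull granite3.3

# Smoke test: pure text generation, no tools
ollama run granite3.3 "Write a minimal docker-compose.yml for nginx"
```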
How It Works
Say you want to configure AWX with Authentik SSO.
- You ask: "Configure AWX with Authentik SSO."
- OPUS recognizes this as a config generation task.
- OPUS spawns a GRANITE subagent.
- GRANITE generates the Ansible playbook and Authentik provider config.
- OPUS reviews the output.
- OPUS writes files, runs ansible-playbook, verifies the setup.
- OPUS confirms success.
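The recognition step above can be sketched as a simple classifier. This is a toy stand-in for OPUS's actual judgment, which happens in the model's reasoning rather than in code; the keyword list and function name are invented for illustration:

```python
# Hypothetical routing sketch: keyword match stands in for the
# coordinator's real decision about what counts as a generation task.
GENERATION_KEYWORDS = (
    "playbook", "config", "compose", "documentation",
    "summarize", "generate", "draft",
)

def route(request: str) -> str:
    """Return which role should handle a request: pure text
    generation goes to the local model, anything needing tools
    or multi-step planning stays with the coordinator."""
    text = request.lower()
    if any(keyword in text for keyword in GENERATION_KEYWORDS):
        return "granite"  # spawn a local-generation subagent
    return "opus"         # coordinator handles it directly

print(route("Configure AWX with Authentik SSO"))  # prints "granite"
```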
In OpenClaw, spawning a subagent looks like this:
```
sessions_spawn({
  task: "Generate an Ansible playbook to configure AWX with Authentik SSO...",
  label: "granite-awx-sso",
  agentId: "granite"
})
```

GRANITE runs in isolation. No tools. No network access. Just text generation. When it finishes, OPUS picks up the output and executes it.
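Under the hood, the subagent's generation step could look like this sketch against Ollama's local REST API. The endpoint and request shape come from Ollama's documentation; `granite_generate` and `build_payload` are hypothetical helpers, not OpenClaw's actual implementation:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "granite3.3") -> dict:
    """Shape of a non-streaming generation request to Ollama."""
    return {"model": model, "prompt": prompt, "stream": False}

def granite_generate(prompt: str, host: str = "http://localhost:11434") -> str:
    """Send the prompt to the local Ollama server and return the
    generated text. The subagent never touches tools or the wider
    network; it only returns a string for OPUS to validate and act on."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```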


The Benefits
Cost efficiency. Text generation tasks cost $0 instead of roughly $0.30 per task with Claude alone. Over hundreds of tasks, this adds up fast.
Data security. Credentials, hostnames, internal configs stay on your network. Nothing sensitive goes to external APIs for generation tasks.
Right model for the job. Claude thinks. GRANITE writes. Both do what they're best at.
The Numbers
Here's what the cost breakdown looks like in practice:
Without delegation, every task uses Claude tokens. A typical day of infrastructure automation could easily hit $5-10 in API costs.
With delegation, GRANITE handles the bulk of text generation locally. Claude only burns tokens on coordination. The same workload drops to under $1.
That's a 5-10x reduction in API costs.
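To make the arithmetic concrete, here is a back-of-the-envelope sketch using the pricing from the intro. The per-task token counts are assumptions, chosen to line up with the rough $0.30-per-task figure above:

```python
# Claude API pricing cited in the intro: $3 / $15 per million tokens.
INPUT_PRICE = 3.00 / 1_000_000    # USD per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # USD per output token

def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single task at Claude API rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Illustrative task sizes (assumed): 10k input tokens either way;
# 18k output tokens if Claude generates everything, but only 2k
# when it merely coordinates and GRANITE generates locally at $0.
full_claude = task_cost(10_000, 18_000)
delegated = task_cost(10_000, 2_000)
print(f"Claude-only: ${full_claude:.2f}, delegated: ${delegated:.2f}")
# prints "Claude-only: $0.30, delegated: $0.06"
```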
In Practice
This setup runs my entire infrastructure automation workflow. GRANITE produces Ansible playbooks, Docker configurations, documentation, and log summaries. OPUS orchestrates everything: spawning tasks, writing files, running commands, keeping me informed.
Claude's token usage stays minimal. GRANITE runs on free local compute. Every task costs a fraction of what it would otherwise.
Model delegation isn't just about saving money. It's about using each model for what it does best. And keeping your data where it belongs.
