Stop API Bill Anxiety: How to Save 80% on OpenClaw Costs with Model Routing
The moment I saw that Anthropic bill last month, I stared at the number on my screen for several seconds in disbelief.
Three times higher than I expected. My finger hovered over the trackpad, wondering if I’d misread the decimal point. My OpenClaw assistant was only handling mundane tasks—replying to a few emails, organizing notes, occasionally writing some code snippets… How did it burn through that much?
I couldn’t sleep that night. Tossing and turning in bed, wondering: Where did I go wrong? Later, digging through the logs, I discovered the culprit—by default, every single request goes through the most expensive Claude Opus 4.6. Heartbeat checks, simple queries, file operations, all treated equally. And when sub-agents run in parallel, each one is “burning money.”
Honestly, I was pretty devastated.
Then I spent a weekend researching OpenClaw's model routing features and discovered that, through intelligent tiering, I could let cheap models handle the simple work and reserve the expensive ones for tasks that truly require deep thinking. A month later, my bill dropped to $68.
Understanding OpenClaw’s Cost Black Hole
Why Is the Default Configuration So Expensive?
Let’s look at some eye-opening numbers:
| Model | Input Price ($/MTok) | Output Price ($/MTok) | Use Case |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | Complex reasoning, long document analysis |
| Claude Sonnet 4.5 | $0.80 | $4.00 | General tasks, code generation |
| Claude Haiku 3.5 | $0.25 | $1.25 | Simple queries, quick responses |
| Llama 3 (Local) | $0 | $0 | Heartbeats, file operations, basic Q&A |
MTok = 1 million tokens
Let’s do some quick math. Say you send 100 messages per day, averaging 500 tokens each:
If everything goes through Opus: 100 × 500 × $5 / 1,000,000 = $0.25/day, or $7.50/month.
Sounds reasonable, right?
The problem—this calculation is way too naive. OpenClaw’s system prompts alone take up 2k-4k tokens, plus tool calls, retry mechanisms… Actual consumption is 3-5x the bare calculation.
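To see how fast the overhead compounds, here is a rough estimate in Python. The prices and the 3-5x multiplier are the figures quoted above; treat this as a sketch, not a billing formula:

```python
# Illustrative cost estimate: naive per-message math vs. real-world overhead
# (system prompt, tool calls, retries). Figures are from the tables above.

OPUS_INPUT_PRICE = 5.00  # $/MTok (dollars per million input tokens)

def monthly_cost(msgs_per_day, tokens_per_msg, overhead_multiplier=1.0, days=30):
    """Estimate monthly input cost in dollars."""
    tokens = msgs_per_day * tokens_per_msg * overhead_multiplier * days
    return tokens * OPUS_INPUT_PRICE / 1_000_000

naive = monthly_cost(100, 500)                              # bare calculation
realistic = monthly_cost(100, 500, overhead_multiplier=4)   # assumed 4x overhead

print(f"naive: ${naive:.2f}/mo, realistic: ${realistic:.2f}/mo")
```

Even with everything on Opus, the naive math says $7.50; a 4x overhead factor already quadruples that before a single heartbeat is counted.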
Hidden Cost Traps
Trap 1: Heartbeat Requests
A heartbeat check every 30 seconds means 2,880 times per day. Even when there’s no actual content, each heartbeat carries the full system prompt.
This is pure “token tax.”
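The heartbeat math is worth spelling out. A quick sketch, assuming a 2,000-token system prompt per beat (the low end of the 2k-4k range mentioned above):

```python
# Rough heartbeat "token tax": each beat re-sends the full system prompt.
# The 2,000-token prompt size is an assumption within the 2k-4k range above.

BEATS_PER_DAY = 24 * 60 * 60 // 30   # one heartbeat every 30 seconds -> 2,880
PROMPT_TOKENS = 2_000                # assumed system-prompt size per beat
OPUS_INPUT_PRICE = 5.00              # $/MTok

monthly_tokens = BEATS_PER_DAY * PROMPT_TOKENS * 30
monthly_cost = monthly_tokens * OPUS_INPUT_PRICE / 1_000_000

print(f"{BEATS_PER_DAY} beats/day, {monthly_tokens/1e6:.0f}M tokens/mo, ${monthly_cost:.2f}/mo")
```

That is over $800 a month for requests that carry no actual content.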
Trap 2: Sub-agents
When running parallel tasks, each sub-agent uses the main model. Something as simple as “check my calendar” going through Opus? Ouch.
Trap 3: Retry Mechanisms
Automatic retries during network fluctuations—the tokens for failed requests are already consumed, but you get no results. Money spent, nothing gained.
Three-Layer Model Routing Strategy
Core Concept: Task Tiering
Not every request deserves the most expensive model.
We need a three-tier system:
┌─────────────────────────────────────────────┐
│ Layer 1: Local Models (Llama 3 / Qwen) │
│ → Heartbeats, file operations, simple Q&A, │
│ status checks │
│ → Cost: $0 │
├─────────────────────────────────────────────┤
│ Layer 2: Lightweight Cloud │
│ (Claude Haiku / GPT-4o-mini) │
│ → Daily conversations, email drafting, │
│ simple coding │
│ → Cost: $0.25/MTok │
├─────────────────────────────────────────────┤
│ Layer 3: Heavy Artillery │
│ (Claude Opus / GPT-4o) │
│ → Complex architecture design, deep │
│ analysis, creative writing │
│ → Cost: $5/MTok (but rarely used) │
└─────────────────────────────────────────────┘
Bottom line: let the right tool do the right job.
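To make the tier economics concrete, here is the same assumed monthly workload priced at each layer. The input rates come from the pricing table; the 2M-token volume is an arbitrary assumption for illustration:

```python
# Back-of-the-envelope: one assumed monthly workload priced at each tier.
INPUT_PRICE = {  # $/MTok, from the pricing table above
    "Layer 1: local (Llama 3)": 0.00,
    "Layer 2: light cloud (Haiku)": 0.25,
    "Layer 3: heavy (Opus)": 5.00,
}
WORKLOAD_MTOK = 2.0  # assumed 2M input tokens per month

costs = {tier: price * WORKLOAD_MTOK for tier, price in INPUT_PRICE.items()}
for tier, cost in costs.items():
    print(f"{tier}: ${cost:.2f}/mo")
```

The 20x gap between Layer 2 and Layer 3 is why routing even a modest share of traffic downward moves the bill so much.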
Configuration in Practice: OpenClaw + Ollama Local Models
Step 1: Install and Start Ollama
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download installer, then
ollama serve
# Pull suitable models
ollama pull llama3.2:latest # Lightweight, good for simple tasks
ollama pull qwen2.5:14b # Stronger, supports tool calling
Step 2: Configure OpenClaw to Use Local Models
Edit ~/.openclaw/openclaw.json:
{
  "models": {
    "defaults": {
      "model": "ollama/qwen2.5:14b",
      "fallbacks": [
        "anthropic/claude-sonnet-4-5",
        "anthropic/claude-opus-4-6"
      ]
    },
    "providers": {
      "ollama": {
        "type": "openai-compatible",
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama"
      }
    }
  }
}
Key points:
- baseUrl: Ollama runs on port 11434 by default
- Context window: OpenClaw needs at least 64k of context; choose models accordingly
- Tool calling: not all local models support it; qwen2.5 or mistral-nemo are recommended
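Before pointing OpenClaw at Ollama, it helps to confirm the endpoint is actually reachable. A minimal check, assuming the default port from the config above:

```python
# Sanity check that Ollama's API is up before routing OpenClaw to it.
# Uses the /api/tags endpoint, which lists locally pulled models.
import json
import urllib.request

def ollama_models(base="http://127.0.0.1:11434"):
    """Return the names of locally pulled models, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=3) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:
        return []  # connection refused, timeout, etc.

models = ollama_models()
print("Ollama models:", models or "none (is `ollama serve` running?)")
```

If the list comes back empty while the server is running, you likely still need to `ollama pull` a model.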
Advanced Routing: Task-Based Intelligent Allocation
Using OpenRouter Auto Model:
{
  "models": {
    "defaults": {
      "model": "openrouter/openrouter/auto",
      "fallbacks": [
        "anthropic/claude-sonnet-4-5"
      ]
    }
  }
}
OpenRouter’s Auto mode automatically selects the cheapest suitable model based on prompt complexity. Simple.
Custom Routing Rules (iblai-openclaw-router):
For finer control, use the open-source iblai-openclaw-router:
{
  "routing": {
    "enabled": true,
    "tiers": {
      "free": {
        "models": ["ollama/llama3.2"],
        "keywords": ["heartbeat", "status", "ping", "check"]
      },
      "cheap": {
        "models": ["anthropic/claude-haiku-3-5"],
        "maxCostPerRequest": 0.001
      },
      "standard": {
        "models": ["anthropic/claude-sonnet-4-5"]
      },
      "premium": {
        "models": ["anthropic/claude-opus-4-6"],
        "keywords": ["architect", "design", "analyze deeply", "complex"]
      }
    }
  }
}
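To illustrate how a config like this could resolve a request to a model, here is a sketch. The substring-matching logic is my assumption for illustration, not necessarily how iblai-openclaw-router actually matches keywords:

```python
# Sketch: resolve a prompt to a model using keyword tiers like the config above.
# The matching rules here are assumed, not the router's real implementation.
import json

CONFIG = json.loads("""
{
  "tiers": {
    "free":     {"models": ["ollama/llama3.2"],
                 "keywords": ["heartbeat", "status", "ping", "check"]},
    "premium":  {"models": ["anthropic/claude-opus-4-6"],
                 "keywords": ["architect", "design", "analyze deeply", "complex"]},
    "standard": {"models": ["anthropic/claude-sonnet-4-5"]}
  }
}
""")

def pick_model(prompt: str, config=CONFIG) -> str:
    text = prompt.lower()
    for name in ("free", "premium"):  # keyword tiers take priority
        tier = config["tiers"][name]
        if any(kw in text for kw in tier.get("keywords", [])):
            return tier["models"][0]
    return config["tiers"]["standard"]["models"][0]  # default tier

print(pick_model("heartbeat"))                # -> ollama/llama3.2
print(pick_model("analyze deeply this log"))  # -> anthropic/claude-opus-4-6
print(pick_model("draft a reply"))            # -> anthropic/claude-sonnet-4-5
```

Checking the free tier before the premium tier means an ambiguous prompt errs toward the cheaper model, which matches the cost-first spirit of the setup.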
Real-World Case: One Month Cost Comparison
Pre-Optimization Bill Breakdown
Typical monthly usage from a developer in the community:
| Usage | Requests | Est. Tokens | Model | Cost |
|---|---|---|---|---|
| Daily conversations | 800 | 400k | Opus 4.6 | $10.00 |
| Code assistance | 200 | 600k | Opus 4.6 | $18.00 |
| Heartbeat checks | 86,400 | 172M | Opus 4.6 | $860.00 |
| File operations | 150 | 75k | Opus 4.6 | $1.88 |
| Sub-agent tasks | 300 | 450k | Opus 4.6 | $13.50 |
| Total | | | | $903.38 |
See that heartbeat check cost? $860. That’s the biggest culprit.
Post-Optimization Bill
After implementing three-tier routing:
| Usage | Requests | Est. Tokens | Model | Cost |
|---|---|---|---|---|
| Daily conversations | 800 | 400k | Sonnet 4.5 | $1.60 |
| Code assistance | 200 | 600k | Opus 4.6 | $18.00 |
| Heartbeat checks | 86,400 | 172M | Llama 3 (Local) | $0 |
| File operations | 150 | 75k | Llama 3 (Local) | $0 |
| Sub-agent tasks | 300 | 450k | Sonnet 4.5 | $1.80 |
| Total | | | | $21.40 |
Of course, this is an extreme example—that developer’s heartbeat ratio was unusually high. Typical savings are usually 70-80%, depending on your specific usage patterns.
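A quick sanity check on the headline numbers:

```python
# Savings rate from the two example bills above.
def savings_rate(before: float, after: float) -> float:
    """Fraction of the original bill eliminated."""
    return (before - after) / before

print(f"{savings_rate(903.38, 21.40):.0%}")  # the extreme heartbeat-heavy case -> 98%
print(f"{savings_rate(80.0, 20.0):.0%}")     # a more typical light user -> 75%
```

The extreme case clears 97%, but only because heartbeats dominated that bill; the 70-80% range is the realistic expectation.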
Expected Savings by Scenario
| Usage Scenario | Original Monthly Cost | After Optimization | Savings Rate |
|---|---|---|---|
| Light user (<100 msgs/day) | $50-80 | $15-25 | 70% |
| Moderate user (100-500 msgs/day) | $200-400 | $50-100 | 75% |
| Heavy user (>500 msgs/day + sub-agents) | $500-1000 | $100-250 | 80% |
Troubleshooting Guide: Common Issues and Solutions
Local Model Not Responding or Erroring
Symptoms:
Error: Connection refused
Or model returns empty content
Troubleshooting steps:
- Confirm Ollama is running: ollama list
- Check the port: curl http://127.0.0.1:11434/api/tags
- Confirm the model is downloaded: ollama pull qwen2.5:14b
- Increase the context window: some models default to 4k; OpenClaw needs 64k+
Recommended high-value model combinations:
ollama pull qwen2.5:14b-instruct # Supports tool calling, Chinese-friendly
ollama pull mistral-nemo:latest # Balanced performance
ollama pull glm-4.7-flash # Lightweight, fast
Tool Calling Failures
Cause: Not all local models support function calling.
Solution:
- Use models explicitly supporting tool use (qwen2.5, mistral-nemo)
- Disable tool calling for specific models in config:
{
  "models": {
    "ollama/llama3.2": {
      "supportsTools": false
    }
  }
}
Fallback Chain Configuration Errors
Common mistake:
// Wrong: when Anthropic is rate-limited, Sonnet and Opus may both be unavailable
"fallbacks": [
  "anthropic/claude-sonnet-4-5",
  "anthropic/claude-opus-4-6"
]

// Correct: cross-provider fallback
"fallbacks": [
  "anthropic/claude-sonnet-4-5",
  "openai/gpt-4o",
  "google/gemini-pro"
]
What If Quality Drops?
If local models can’t handle certain tasks:
- Gradual escalation: Local → Haiku → Sonnet → Opus
- Keyword triggers: Explicitly mark task complexity in prompts
- Human review: Require confirmation before executing important tasks
Summary and Action Checklist
So, to sum up the core points:
- Costs mainly come from heartbeats and simple queries, not the “big tasks” you might expect
- Local models can absolutely handle daily chores, don’t waste Opus quota on them
- Configure fallback chains across providers, avoiding single points of failure
- Start small: Route heartbeats through local models first, and you’ll see immediate savings
Three Things You Can Do This Week
- Install Ollama and pull a lightweight model (llama3.2 or qwen2.5:7b)
- Edit ~/.openclaw/openclaw.json to point the default model to local
- Monitor your bill for a week, then fine-tune your routing strategy
Advanced Exploration
- Try iblai-openclaw-router for intelligent task tiering
- Combine with Prompt Caching to further reduce costs for repeated calls
- Monitor success rates and response times for each model, continuously optimizing your configuration
Have you optimized your OpenClaw bill? What strategy did you use? Share your experience in the comments, or ask questions about configuration issues—I’ll do my best to respond.
6 min read · Published on: Feb 26, 2026 · Modified on: Mar 3, 2026