Caveman Mode and 6 Other Ways to Cut Your Claude Token Cost by 65-87%
7 token-wasting mistakes most developers make with Claude - and the exact fixes. From Caveman Mode to CLAUDE.md, context management to prompt structure.

AI & Web Consultant · April 11, 2026

Most developers using Claude are burning through tokens without realizing it.
Every "Certainly!", every "I'd be happy to help!", every step-by-step explanation you didn't ask for - that's not intelligence. That's your money evaporating.
I spend $1,500+/month on AI tools and API costs. Optimizing token usage isn't optional at that scale - it's survival. These are the 7 biggest token-wasting mistakes I see developers make, and the exact fix for each one.
Mistake 1: Letting Claude Talk Too Much
This is the biggest one. Claude's default mode is helpful, thorough, and verbose. Great for learning. Terrible for your wallet.
Here's a real example. I asked Claude to fix a React re-render bug. Normal response: 1,180 tokens. The same fix, same code quality, compressed: 159 tokens.
That's not a small difference. That's 87% of the response being filler you didn't need.
The Fix: Caveman Mode
Caveman Mode is an open-source skill created by Julius Brussee that strips all filler from Claude's responses. The philosophy is simple: "Why use many token when few token do trick."
Articles, pleasantries, hedging, unnecessary explanations - gone. What's left is the actual answer.
Install it in one line:
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@cavemanHow Caveman Mode Actually Works
It rewrites Claude's communication style through a system prompt that says: drop articles, filler, pleasantries, and hedging. Keep technical substance exact. Code, URLs, and commands stay untouched - only the prose compresses.
It has four intensity levels:
- Lite - Removes filler, keeps professional grammar. Good starting point if caveman-speak feels too weird at first
- Full - The default. Fragments, no articles, maximum compression. This is what I run daily
- Ultra - Telegraphic style. Abbreviations. Extreme terseness. Use this for rapid-fire debugging sessions
- Wenyan - Classical Chinese literary compression. Experimental but fascinating for extreme efficiency
You activate it by typing /caveman or just saying "talk like caveman" or "less tokens please." It stays on until you say "stop caveman" or "normal mode."
Why You Should Write Like This Too
Here's the part most people miss: Caveman Mode only compresses Claude's output. Your input tokens still cost the same.
If you're writing prompts like "Hey Claude, could you please take a look at this function and let me know if there's anything wrong with the way I'm handling the authentication flow?" - that's 30+ input tokens when "why does auth fail on line 23?" does the same job in 8.
Train yourself to write shorter prompts:
- Before: "Can you help me understand why this component re-renders every time I change the input field?"
- After: "Why does this re-render on every keystroke?"
- Before: "I'd like you to review my database schema and suggest improvements for performance"
- After: "Review schema. Suggest perf improvements."
The tokens you save on input add up fast - especially when you're sending dozens of messages per session.
Mistake 2: Not Using CLAUDE.md
Every time you start a new session without a CLAUDE.md file, you're re-explaining your project from scratch. Claude reads your entire prompt, generates a response based on zero project context, and you spend 3-4 messages just getting it up to speed.
Those messages cost tokens. Every single session.
The Fix
Create a CLAUDE.md file in your project root. Claude reads it automatically at the start of every session.
Include:
- Tech stack - frameworks, languages, versions
- Project structure - where things live
- Coding conventions - naming, patterns, style rules
- What NOT to do - common mistakes Claude makes in your codebase
- Build/test commands - so Claude can verify its own work
# CLAUDE.md
- Stack: Next.js 16, TypeScript, Tailwind v4, Prisma
- All components in src/components/, pages in src/app/
- Use named exports, not default exports
- Never add comments to code unless logic is non-obvious
- Run "npm run typecheck" before considering any task done
- Tests live next to the file they test: foo.ts -> foo.test.ts
- API routes use zod validation on all inputsPro tip: Nest your CLAUDE.md files
You can have a root-level CLAUDE.md for the whole project AND sub-directory CLAUDE.md files for specific areas. Claude picks up whichever is closest to the file it's working on:
project/
CLAUDE.md # global rules
src/api/CLAUDE.md # API-specific conventions
src/components/CLAUDE.md # component patternsMistake 3: Pasting Entire Files
"Here's my 500-line file, the bug is somewhere in here."
Claude will read all 500 lines. Process all 500 lines. Reason about all 500 lines. You just paid for 500 lines of input tokens when the bug was on line 47.
The Fix
Be surgical. Paste only what's relevant:
- The function that's breaking
- The error message
- The 10-20 lines of surrounding context
If you're not sure where the bug is, tell Claude to read the file itself using its tools rather than pasting it inline. Tool-based file reading is more token-efficient than dumping everything into the prompt.
Mistake 4: Asking Claude to Explain What You Already Know
"Explain this code to me" when you actually just want to know why line 23 throws an error.
Claude will give you a full walkthrough of every function, every import, every variable - because you asked it to "explain." That's potentially thousands of tokens of information you already had.
The Fix
Ask specific questions. The more precise your question, the shorter the answer, the fewer tokens burned.
- Bad: "Explain this authentication middleware"
- Good: "Why does line 23 throw a 403 when the token is valid?"
- Bad: "How does this database query work?"
- Good: "Will this query hit the index on user_id or trigger a full table scan?"
- Bad: "Review this component"
- Good: "Is there a memory leak in the useEffect cleanup on line 15?"
Specificity is the single best prompt habit for cost reduction. A vague question gets a broad answer. A precise question gets a precise answer. The token difference can be 10x.
Mistake 5: Ignoring /compact and /clear
Claude has a limited context window - think of it as short-term memory. Everything it reads, analyzes, or generates takes up space. When the window fills up:
- 1Output quality drops - Claude loses track of earlier instructions and starts contradicting itself
- 2You start fresh and re-explain everything from scratch, paying for all that context again
Most people just let the context window overflow and wonder why Claude suddenly "got dumb." It didn't get dumb. It ran out of room.
The Fix: /compact
The /compact command compresses your conversation history. It summarizes everything that happened into a fraction of the tokens while keeping the important context. Use it proactively:
- After completing a major task within a session
- When you notice Claude forgetting things you told it earlier
- Every 20-30 messages as a habit
The Fix: /clear
The /clear command wipes the context entirely and starts fresh. Use it when:
- You're switching to a completely different task
- The conversation has gone so far off track that compacting won't save it
- You want a clean slate with just your CLAUDE.md context
/compact to save context, then start the next task. If the next task is totally unrelated, use /clear instead - your CLAUDE.md will re-load automatically, so you lose nothing important.Subagents: Parallel Context Windows
For big tasks, use subagents. Each subagent gets its own context window:
- One researches the codebase
- Another writes tests
- Another updates the schema
They work in parallel without cluttering your main session. 12 seconds instead of 45 - and your primary context window stays clean.
Mistake 6: Re-prompting Instructions Every Session
"Always use TypeScript. Always use named exports. Never add comments. Use Tailwind not CSS modules..."
If you're typing this at the start of every session, you're paying for it every session.
The Fix
CLAUDE.md handles the basics (Mistake 2), but there's a second layer: Skills and slash commands.
Save your recurring workflows as reusable commands:
# .claude/commands/review.md
Review the code I've changed. Check for:
- Type safety issues
- Missing error handling at API boundaries
- Performance regressions
- Accessibility problems
Report only real issues, not style nitpicks.Now instead of re-typing your review criteria every time, you type /review. One word replaces 50+ tokens of instruction.
Other commands I use daily:
/draft-post- my full content creation workflow/publish- archives and logs a published post/brainstorm- generates ideas from persona + pillars
Each one encodes complex, multi-step instructions into a single slash command. Zero repeated tokens.
Mistake 7: Using the Wrong Model for the Job
Not every task needs Opus. Not every task needs Sonnet.
A quick file rename? That's a Haiku job. A complex architectural refactor? That's Opus. Using Opus for everything is like driving a Ferrari to the grocery store - it works, but you're paying for horsepower you don't need.
The Fix
Match the model to the task:
- Haiku - Simple edits, formatting, file operations, boilerplate generation. Cheapest and fastest
- Sonnet - Day-to-day coding, bug fixes, feature implementation. The workhorse
- Opus - Complex architecture decisions, multi-file refactors, nuanced problem-solving. The heavy hitter
The Full Stack: All 7 Fixes Together
Here's what a fully optimized Claude workflow looks like in practice:
- 1CLAUDE.md in every project - context set once, used forever
- 2Caveman Mode installed - 65%+ savings on every response
- 3Terse prompts - write like you text, not like you email. Short inputs = fewer input tokens
- 4Surgical questions - specific beats vague. 10x token difference
- 5/compact and /clear as habits - keep the context window clean, not overflowing
- 6Skills/commands for recurring workflows - never re-type instructions
- 7Right model for the job - Haiku for simple, Sonnet for standard, Opus for complex
Individually, each fix saves 10-30%. Stacked together, you're looking at 65-87% fewer tokens with the same (or better) output quality. That's not optimization. That's a fundamentally different cost structure.
One Last Thing
Token optimization isn't just about saving money. It's about getting better output.
A Claude session with a clean context window, clear instructions, and no filler produces higher quality code than a bloated session where Claude is drowning in context and guessing what you want.
Fewer tokens in = more focused reasoning = better results out.
The cheapest token is the one you never needed to send.
The 1-Page Claude Token Cheat Sheet
All 7 fixes on a single printable page. Pin it next to your monitor and stop wasting tokens from day one. Enter your email and I'll send it straight to your inbox.
Want help implementing this?
I help B2B companies implement AI solutions that actually move metrics — not science projects. If this guide resonated, let's talk about what it looks like for your business.
Get in touch