FreeBuilder's Perspective12 min read

Caveman Mode and 6 Other Ways to Cut Your Claude Token Cost by 65-87%

7 token-wasting mistakes most developers make with Claude - and the exact fixes. From Caveman Mode to CLAUDE.md, context management to prompt structure.

Marcus Volsted

AI & Web Consultant · April 11, 2026

Caveman Mode and 6 Other Ways to Cut Your Claude Token Cost by 65-87%

Most developers using Claude are burning through tokens without realizing it.

Every "Certainly!", every "I'd be happy to help!", every step-by-step explanation you didn't ask for - that's not intelligence. That's your money evaporating.

I spend $1,500+/month on AI tools and API costs. Optimizing token usage isn't optional at that scale - it's survival. These are the 7 biggest token-wasting mistakes I see developers make, and the exact fix for each one.

Mistake 1: Letting Claude Talk Too Much

This is the biggest one. Claude's default mode is helpful, thorough, and verbose. Great for learning. Terrible for your wallet.

Here's a real example. I asked Claude to fix a React re-render bug. Normal response: 1,180 tokens. The same fix, same code quality, compressed: 159 tokens.

That's not a small difference. That's 87% of the response being filler you didn't need.

87%

React bug fix savings

83%

Auth middleware savings

84%

Database setup savings

65%

Average across all tasks

The Fix: Caveman Mode

Caveman Mode is an open-source skill created by Julius Brussee that strips all filler from Claude's responses. The philosophy is simple: "Why use many token when few token do trick."

Articles, pleasantries, hedging, unnecessary explanations - gone. What's left is the actual answer.

Install it in one line:

bash

claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman

How Caveman Mode Actually Works

It rewrites Claude's communication style through a system prompt that says: drop articles, filler, pleasantries, and hedging. Keep technical substance exact. Code, URLs, and commands stay untouched - only the prose compresses.

It has four intensity levels:

Lite - Removes filler, keeps professional grammar. Good starting point if caveman-speak feels too weird at first
Full - The default. Fragments, no articles, maximum compression. This is what I run daily
Ultra - Telegraphic style. Abbreviations. Extreme terseness. Use this for rapid-fire debugging sessions
Wenyan - Classical Chinese literary compression. Experimental but fascinating for extreme efficiency

You activate it by typing /caveman or just saying "talk like caveman" or "less tokens please." It stays on until you say "stop caveman" or "normal mode."

💡 Tip

Start with Full mode. You'll be surprised how readable it actually is when you're scanning code explanations. Your code stays pristine - only the surrounding text compresses. After a day or two, normal Claude responses start feeling bloated.

Why You Should Write Like This Too

Here's the part most people miss: Caveman Mode only compresses Claude's output. Your input tokens still cost the same.

If you're writing prompts like "Hey Claude, could you please take a look at this function and let me know if there's anything wrong with the way I'm handling the authentication flow?" - that's 30+ input tokens when "why does auth fail on line 23?" does the same job in 8.

Train yourself to write shorter prompts:

Before: "Can you help me understand why this component re-renders every time I change the input field?"
After: "Why does this re-render on every keystroke?"

Before: "I'd like you to review my database schema and suggest improvements for performance"
After: "Review schema. Suggest perf improvements."

The tokens you save on input add up fast - especially when you're sending dozens of messages per session.

Mistake 2: Not Using CLAUDE.md

Every time you start a new session without a CLAUDE.md file, you're re-explaining your project from scratch. Claude reads your entire prompt, generates a response based on zero project context, and you spend 3-4 messages just getting it up to speed.

Those messages cost tokens. Every single session.

The Fix

Create a CLAUDE.md file in your project root. Claude reads it automatically at the start of every session.

Include:

Tech stack - frameworks, languages, versions
Project structure - where things live
Coding conventions - naming, patterns, style rules
What NOT to do - common mistakes Claude makes in your codebase
Build/test commands - so Claude can verify its own work

markdown

# CLAUDE.md
- Stack: Next.js 16, TypeScript, Tailwind v4, Prisma
- All components in src/components/, pages in src/app/
- Use named exports, not default exports
- Never add comments to code unless logic is non-obvious
- Run "npm run typecheck" before considering any task done
- Tests live next to the file they test: foo.ts -> foo.test.ts
- API routes use zod validation on all inputs

🎯 Result

Zero re-explaining. Fewer correction rounds. Claude gets it right on the first try more often, which means fewer back-and-forth messages, which means fewer tokens.

Pro tip: Nest your CLAUDE.md files

You can have a root-level CLAUDE.md for the whole project AND sub-directory CLAUDE.md files for specific areas. Claude picks up whichever is closest to the file it's working on:

project/
  CLAUDE.md              # global rules
  src/api/CLAUDE.md      # API-specific conventions
  src/components/CLAUDE.md  # component patterns

Mistake 3: Pasting Entire Files

"Here's my 500-line file, the bug is somewhere in here."

Claude will read all 500 lines. Process all 500 lines. Reason about all 500 lines. You just paid for 500 lines of input tokens when the bug was on line 47.

The Fix

Be surgical. Paste only what's relevant:

The function that's breaking
The error message
The 10-20 lines of surrounding context

If you're not sure where the bug is, tell Claude to read the file itself using its tools rather than pasting it inline. Tool-based file reading is more token-efficient than dumping everything into the prompt.

💡 Tip

Get in the habit of saying "read src/auth/middleware.ts and find the bug" instead of copy-pasting the file contents. Claude's Read tool is smarter about only pulling what it needs.

Mistake 4: Asking Claude to Explain What You Already Know

"Explain this code to me" when you actually just want to know why line 23 throws an error.

Claude will give you a full walkthrough of every function, every import, every variable - because you asked it to "explain." That's potentially thousands of tokens of information you already had.

The Fix

Ask specific questions. The more precise your question, the shorter the answer, the fewer tokens burned.

Bad: "Explain this authentication middleware"
Good: "Why does line 23 throw a 403 when the token is valid?"

Bad: "How does this database query work?"
Good: "Will this query hit the index on user_id or trigger a full table scan?"

Bad: "Review this component"
Good: "Is there a memory leak in the useEffect cleanup on line 15?"

Specificity is the single best prompt habit for cost reduction. A vague question gets a broad answer. A precise question gets a precise answer. The token difference can be 10x.

Mistake 5: Ignoring /compact and /clear

Claude has a limited context window - think of it as short-term memory. Everything it reads, analyzes, or generates takes up space. When the window fills up:

1Output quality drops - Claude loses track of earlier instructions and starts contradicting itself
2You start fresh and re-explain everything from scratch, paying for all that context again

Most people just let the context window overflow and wonder why Claude suddenly "got dumb." It didn't get dumb. It ran out of room.

The Fix: /compact

The /compact command compresses your conversation history. It summarizes everything that happened into a fraction of the tokens while keeping the important context. Use it proactively:

After completing a major task within a session
When you notice Claude forgetting things you told it earlier
Every 20-30 messages as a habit

The Fix: /clear

The /clear command wipes the context entirely and starts fresh. Use it when:

You're switching to a completely different task
The conversation has gone so far off track that compacting won't save it
You want a clean slate with just your CLAUDE.md context

💡 Tip

The combo move: finish a task, run /compact to save context, then start the next task. If the next task is totally unrelated, use /clear instead - your CLAUDE.md will re-load automatically, so you lose nothing important.

Subagents: Parallel Context Windows

For big tasks, use subagents. Each subagent gets its own context window:

One researches the codebase
Another writes tests
Another updates the schema

They work in parallel without cluttering your main session. 12 seconds instead of 45 - and your primary context window stays clean.

Mistake 6: Re-prompting Instructions Every Session

"Always use TypeScript. Always use named exports. Never add comments. Use Tailwind not CSS modules..."

If you're typing this at the start of every session, you're paying for it every session.

The Fix

CLAUDE.md handles the basics (Mistake 2), but there's a second layer: Skills and slash commands.

Save your recurring workflows as reusable commands:

markdown

# .claude/commands/review.md
Review the code I've changed. Check for:
- Type safety issues
- Missing error handling at API boundaries
- Performance regressions
- Accessibility problems
Report only real issues, not style nitpicks.

Now instead of re-typing your review criteria every time, you type /review. One word replaces 50+ tokens of instruction.

Other commands I use daily:

/draft-post - my full content creation workflow
/publish - archives and logs a published post
/brainstorm - generates ideas from persona + pillars

Each one encodes complex, multi-step instructions into a single slash command. Zero repeated tokens.

Mistake 7: Using the Wrong Model for the Job

Not every task needs Opus. Not every task needs Sonnet.

A quick file rename? That's a Haiku job. A complex architectural refactor? That's Opus. Using Opus for everything is like driving a Ferrari to the grocery store - it works, but you're paying for horsepower you don't need.

The Fix

Match the model to the task:

Haiku - Simple edits, formatting, file operations, boilerplate generation. Cheapest and fastest
Sonnet - Day-to-day coding, bug fixes, feature implementation. The workhorse
Opus - Complex architecture decisions, multi-file refactors, nuanced problem-solving. The heavy hitter

ℹ️ Info

In Claude Code, you can switch models mid-conversation. Default to Sonnet for most work, drop to Haiku for simple tasks, escalate to Opus when you actually need deep reasoning. The cost difference between Haiku and Opus is roughly 60x per token.

The Full Stack: All 7 Fixes Together

Here's what a fully optimized Claude workflow looks like in practice:

1CLAUDE.md in every project - context set once, used forever
2Caveman Mode installed - 65%+ savings on every response
3Terse prompts - write like you text, not like you email. Short inputs = fewer input tokens
4Surgical questions - specific beats vague. 10x token difference
5/compact and /clear as habits - keep the context window clean, not overflowing
6Skills/commands for recurring workflows - never re-type instructions
7Right model for the job - Haiku for simple, Sonnet for standard, Opus for complex

Individually, each fix saves 10-30%. Stacked together, you're looking at 65-87% fewer tokens with the same (or better) output quality. That's not optimization. That's a fundamentally different cost structure.

One Last Thing

Token optimization isn't just about saving money. It's about getting better output.

A Claude session with a clean context window, clear instructions, and no filler produces higher quality code than a bloated session where Claude is drowning in context and guessing what you want.

Fewer tokens in = more focused reasoning = better results out.

The cheapest token is the one you never needed to send.

The 1-Page Claude Token Cheat Sheet

All 7 fixes on a single printable page. Pin it next to your monitor and stop wasting tokens from day one. Enter your email and I'll send it straight to your inbox.

Want help implementing this?

I help B2B companies implement AI solutions that actually move metrics, not science projects. If this guide resonated, let's talk about what it looks like for your business.

Get in touch