26 Token Optimization Techniques: Quick Reference
High-Impact Strategies (40%+ savings)
- Replace PDFs with Markdown – Convert PDFs to Google Docs, export as .md. Saves 85–90% vs raw PDF.
- Use Projects for Shared Files – Upload once, reference across multiple chats. Saves 80%.
- Batch Tasks Into One Message – Ask 3 things at once instead of 3 messages. Saves 56%.
- Trim Personal Context to <2K Words – Bloated context files waste 10% of every conversation. Saves 70%.
- Compress Intermediate Outputs – After analysis, summarize into bullet points for reference in follow-ups. Saves 67%.
- Batch Similar Queries with Caching – Ask all related questions about one system in one chat. Saves 87.5%.
Medium-Impact Strategies (25–40% savings)
- Right-Size Models – Use Haiku for simple tasks, Sonnet for standard work, Opus only for deep reasoning. Saves 50% when applied systematically.
- Write Short Prompts (<30 words) – Brief, clear prompts reduce re-read overhead. Saves 33%.
- Specify Output Format Upfront – "JSON table with columns X, Y, Z" prevents reformatting requests. Saves 60%.
- Show Your Thinking First – Ask Claude to self-critique in initial response, reducing revision cycles. Saves 40–50%.
- Specify Constraints Upfront – "Under 500 words," "3 bullet points," "1-paragraph summary" prevents scope creep. Saves 73%.
- Use Checkpoints Every 5–7 Messages – Ask "Are we on track?" in complex conversations to catch wrong paths early. Saves 83%.
- Edit Instead of Correcting – Click Edit on your message, fix it, regenerate. Don't stack "Actually, I meant…" messages.
- Use New Chats for Different Topics – One topic per chat. Separate chats avoid re-reading irrelevant context. Saves 40% in multi-topic conversations.
- Disable Tools by Default – Tools add 200–400 token overhead per exchange even when unused.
- Restart Conversations Every 15–20 Messages – Long conversations accumulate re-read overhead. Saves 55%.
- Search Before Asking – Use conversation search to find past solutions. Saves 67–75% if found.
- Crop Screenshots Tightly – Crop to only the relevant portion. Full screenshot = 1,300 tokens; tight crop = 50 tokens.
- Chain Tasks in One Message – "Analyze this data, then write a summary from your analysis" instead of separate messages. Saves 30%.
- Use "Assume You Know" References – After establishing context, reference it: "Assume you know the CONVERGE-01 trial from earlier." Saves 75%.
- Use Negative Constraints – "Explain without covering basics I already know" is clearer than restating what you know.
- Outline Mode Before Full Detail – Ask for pseudocode/outline first, expand only necessary sections. Saves 20–40%.
- Pre-Process Data Externally – Clean data before uploading (Excel, Python). Saves 60–75%.
- Build Conditional Templates – Create reusable templates with [IF: condition] sections for different use cases. Saves 40–50%.
- Project Status Summaries – At session end, ask Claude to write a status summary. Paste it next session instead of re-explaining. Saves 72%.
Implementation Roadmap
Week 1: Quick Wins (Save ~30%)
- Technique 2: Right-size models
- Technique 6: Write shorter prompts
- Technique 10: Crop screenshots
Week 2: Process Changes (Additional 20%)
- Technique 3: Batch tasks
- Technique 5: Separate chats for topics
- Technique 4: Edit instead of correcting
Week 3: Structural Setup (Additional 25%)
- Technique 1: PDF → Markdown
- Technique 7: Projects for shared files
- Technique 12: Trim personal context
Week 4: Advanced Optimization (Additional 15%)
- Technique 8: Tool management
- Technique 24: Prompt templates with conditions
- Technique 9: Restart long conversations
Expected total improvement: 70–80% token efficiency gain
The Core Principle
Token efficiency is a systems problem, not a single-query problem. Efficient workflows:
- Build around one Project per major work area (IPCSG research, technical analysis, civic policy)
- Use persistent templates and shared files across chats
- Create continuity with status summaries and checkpoints
- Batch related work together to leverage prompt caching
Individual tips help. But combining them into a system-level workflow is what really multiplies savings across months of work.
Quick Wins Summary
| Technique | Savings | Effort |
|---|---|---|
| Replace PDFs with .md | 85–90% | Low |
| Use Projects | 80% | Low |
| Batch tasks | 56% | Low |
| Right-size models | 50% | Medium |
| Trim context | 70% | Medium |
| Short prompts | 33% | Low |
| Compress outputs | 67% | Low |
| Checkpoints | 83% | Low |
| Batch with caching | 87.5% | Medium |
| Pre-process data | 60–75% | Low |
Start with the "Low Effort" column. You'll hit 50%+ savings in Week 1.
Techniques to Stop Hitting Claude's Limits: Details
Claude's token limits aren't arbitrary walls—they're guardrails that force discipline. Every token you waste on redundant uploads, verbose prompts, or context bloat is a token stolen from actual work. This article translates raw optimization techniques into a workflow that scales.
The Problem: How Users Burn Tokens
Most Claude users operate at 30–50% efficiency. A 200K token limit sounds generous until you realize:
- A 10-page PDF = 15,000–30,000 tokens gone before you type anything
- A 400-word prompt gets re-read 20+ times across a conversation
- Three sequential messages force Claude to re-tokenize the entire history three times
- A single bloated personal context file (20K words) loads into every session
- Tools left enabled burn tokens on every exchange, even when unused
For teams, this compounds catastrophically. One poorly optimized workflow × 50 users × 20 chats/month = token hemorrhage that looks like a feature problem when it's actually a process problem.
Technique 1: Replace PDFs with Markdown via Google Docs
The Problem: PDFs are opaque to token counting. A single page burns 1,500–3,000 tokens depending on layout complexity, images, and formatting. A 20-page technical document = 30,000–60,000 tokens before analysis begins.
The Solution
1. Paste the PDF text into a Google Doc
2. Clean up the formatting (remove headers, footers, duplicate spacing)
3. Download as .md
4. Upload the markdown file
Token Cost Comparison
- PDF (20 pages): 30,000–60,000 tokens
- Markdown equivalent: 3,000–5,000 tokens
- Savings: 85–90%
Why It Works: Markdown is plain text. Claude tokenizes English prose at roughly 1.3 tokens per word (about one token per four characters). PDFs additionally carry invisible rendering information, font metadata, and positioning data that all get tokenized. Google Docs' export strips that noise.
When to Use This
- Technical reports, whitepapers, research papers
- Legal documents, contracts, policy briefs
- Any document longer than 3 pages
- Documents with complex formatting or images
When Not To
- Documents requiring exact visual layout (posters, forms with specific spacing)
- Scanned PDFs (use OCR first, then convert)
- Single-page quick references (just copy-paste the text directly)
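If you'd rather script the conversion than route through Google Docs, here is a minimal sketch, assuming the pypdf library (any text extractor works; the file names are placeholders):

```python
# pip install pypdf
from pypdf import PdfReader

reader = PdfReader("quarterly_report.pdf")  # placeholder input file
text = "\n\n".join(page.extract_text() for page in reader.pages)

# Rule of thumb: plain English text runs about one token per 4 characters.
print(f"~{len(text) // 4:,} tokens after conversion")

with open("quarterly_report.md", "w", encoding="utf-8") as f:
    f.write(text)
```

Skim the output before uploading: extractors carry repeated headers, footers, and page numbers through, and deleting those is where much of the savings comes from.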
Technique 2: Right-Size the Model for the Task
The Problem: Opus costs 5x more per token than Haiku and 3x more than Sonnet. Using Opus for summarization or simple coding is like hiring a surgeon to check your blood pressure.
Model Economics
| Task | Right Choice | Why |
|------|--------------|-----|
| Summarize a document | Haiku | 90% accuracy, 1/5 cost |
| Write a simple script | Sonnet | Handles most coding, 1/3 Opus cost |
| Debug complex reasoning | Opus | Deep chains need depth |
| Brainstorm ideas | Haiku | Ideation doesn't need reasoning depth |
| Multi-step analysis | Opus | Benefits from extended reasoning |
| Customer service reply | Haiku | Template matching, not reasoning |
Decision Tree
1. Does this task require multi-step reasoning across 5+ inference steps? → Opus
2. Does it need deep technical knowledge but straightforward logic? → Sonnet
3. Is it straightforward task execution? → Haiku
Token Budget Impact: A team running 50 daily chats:
- All Opus: 50 × 3,000 tokens/chat = 150,000 tokens (expensive baseline)
- Right-sized mix (60% Haiku, 30% Sonnet, 10% Opus): 50 × 1,500 tokens avg = 75,000 tokens
- Daily savings: 75,000 tokens (50% of budget)
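For API users, the decision tree above can be written down as a trivial router. A sketch, with placeholder model IDs (substitute the current identifiers from Anthropic's documentation):

```python
# Placeholder model IDs; look up the current ones before using.
HAIKU = "claude-haiku-<version>"
SONNET = "claude-sonnet-<version>"
OPUS = "claude-opus-<version>"

def pick_model(multi_step_reasoning: bool, deep_technical: bool) -> str:
    """Mirrors the decision tree: Opus only for long reasoning chains,
    Sonnet for technical-but-straightforward work, Haiku for the rest."""
    if multi_step_reasoning:
        return OPUS
    if deep_technical:
        return SONNET
    return HAIKU

print(pick_model(False, False))  # Haiku: summaries, templated replies
print(pick_model(False, True))   # Sonnet: standard coding and analysis
print(pick_model(True, False))   # Opus: multi-step reasoning
```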
Technique 3: Batch Tasks Into Single Messages
The Problem: Every new message forces Claude to re-read the entire conversation history before responding. Three sequential messages = three full re-reads of context.
Example: The Inefficient Way
Message 1: "Can you summarize this report?"
[Claude responds, tokens consumed]
Message 2: "Now extract the key metrics"
[Claude re-reads entire conversation + new message]
[Claude responds, tokens consumed]
Message 3: "Format those metrics as a table"
[Claude re-reads entire conversation again]
[Claude responds, tokens consumed]
Token Cost: Each message re-reads full history. With a 20-message conversation, message 21 retokenizes all 20 previous exchanges.
The Efficient Way
Message 1: "Do three things:
1. Summarize this report in 3 sentences
2. Extract the top 5 metrics
3. Format those metrics as a table with columns: Metric, Value, Trend"
[Claude responds with all three outputs]
Token Savings
- Inefficient (3 messages): 8,000 tokens (context re-read overhead included)
- Efficient (1 message): 3,500 tokens
- Savings: 56%
How to Batch Effectively
1. List all tasks upfront with numbers
2. Specify output format for each (table, list, paragraph, JSON)
3. Set constraints (word counts, detail level) per task
4. Use one message to capture all context needed
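That structure is mechanical enough to script. A sketch that assembles one batched message from a task list (reusing the example tasks above):

```python
# Build a single batched message so the context is read once, not per task.
tasks = [
    "Summarize this report in 3 sentences",
    "Extract the top 5 metrics",
    "Format those metrics as a table with columns: Metric, Value, Trend",
]
message = "Do the following, numbered to match:\n" + "\n".join(
    f"{i}. {task}" for i, task in enumerate(tasks, start=1)
)
print(message)
```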
When Not to Batch
- Tasks require Claude's output from task #1 to inform task #2
- The second task is fundamentally different in scope
- You need to iterate on one task before moving to the next
Technique 4: Edit Instead of Stacking Corrections
The Problem: Users send a message, realize they misspoke, and follow up with "Actually, I meant…" This creates bad history that Claude must re-read forever.
Example: Poor Practice
Message 1: "Analyze this dataset with regression analysis"
[Claude responds]
Message 2: "Wait, I said regression but I meant clustering"
[Claude re-reads both messages, applies fix]
Message 3: "Also, use k-means specifically"
[Conversation now has 3 messages for what should be 1]
The Right Way
1. Click the Edit button on your original message
2. Fix the prompt
3. Click Regenerate
Result: Original bad message disappears. Conversation history stays clean. No token waste on corrections.
Token Impact
- Stack of 3 messages with corrections: 5,000 tokens (includes overhead of re-reading bad context)
- Single edited message: 2,000 tokens
- Savings: 60%
Technique 5: Use New Chats for New Topics
The Problem: One chat drifts across 4 different topics (analyzing a dataset, then drafting an email, then brainstorming ideas, then debugging code). Claude must re-read everything above before every response.
Example: Conversation Bloat
Messages 1-5: Analyze Q3 sales data
Messages 6-10: Draft investor email (unrelated)
Messages 11-15: Brainstorm product features (unrelated)
Messages 16-20: Debug Python script (unrelated)
Message 21: New question about the Python script
[Claude must re-read all 20 previous messages, 80% of which are irrelevant]
Token Cost: Message 21 tokenizes 20 previous exchanges even though only 5 are relevant.
The Right Way
- New topic = new chat
- One chat = one focused problem
Token Impact
- Bloated single chat (4 topics, 20 messages): each new message re-reads ~8,000 tokens of irrelevant context
- Four separate chats (5 messages each): each new message re-reads ~2,000 tokens of relevant context
- Savings across 20 total messages: 40,000 tokens (60% of original)
Bonus: Organization becomes much easier. Your chat history is searchable and scannable.
Technique 6: Write Short, Clear Prompts (Under 30 Words)
The Problem: A 400-word prompt gets re-read dozens of times across a conversation. Each follow-up question forces re-tokenization of the entire prompt.
Example: Inefficient Prompt
"I'm working on a customer support dashboard for our SaaS platform. We need to
display metrics like average response time, customer satisfaction scores, and
ticket volume trends. The interface should be mobile-responsive and include
filters for date range, department, and customer segment. We're using React
and want it to match our existing design system which uses Tailwind CSS. Can
you help me build this?"
[~70 words shown here; real versions of this prompt often run 300–400 words]
Every follow-up question forces a re-read of the full prompt.
Efficient Version
"Build a customer support dashboard in React: metrics (response time,
satisfaction, volume), filters (date, dept, segment), mobile-responsive, Tailwind CSS."
[~20 words]
Claude asks clarifying questions if needed. You provide details only for what's unclear.
Token Cost Comparison
- Long prompt + 10 follow-ups: the prompt gets re-read 10+ times = 3,000+ tokens of re-reads alone
- Short prompt + 10 clarifications: clarifications cost ~200 tokens each, ~2,000 total
- Savings: 33%
Structure for Short Prompts
1. Action verb (Build, Analyze, Compare, Draft)
2. Deliverable (React component, SQL query, essay outline)
3. Key constraints (3 bullets max)
4. Format (JSON, markdown table, code block)
When to Break This Rule
- First message to a new chat (more context helps)
- Highly specialized domains where brevity creates ambiguity
- Chats where you've established context already
Technique 7: Use Projects to Share Files Across Chats
The Problem: You upload the same document to 5 different chats. That document gets tokenized in full for each chat. A 10K-token document = 50K tokens burned unnecessarily.
Example: Inefficient Workflow
Chat 1: Upload quarterly_report.pdf → analyze revenue
[10,000 tokens to tokenize document]
Chat 2: Upload quarterly_report.pdf → analyze expenses
[10,000 tokens to tokenize same document again]
Chat 3: Upload quarterly_report.pdf → extract metrics
[10,000 tokens to tokenize same document again]
Total: 30,000 tokens for same document
Projects Solution
1. Create a Project called "Q3 Analysis"
2. Upload quarterly_report.pdf once to the project
3. Every chat in that project references the document automatically
4. The document is tokenized once, referenced in every chat
Token Cost
- Without Projects (5 chats with same document): 50,000 tokens
- With Projects: 10,000 tokens (document tokenized once)
- Savings: 80%
Bonus Features
- Team collaboration (everyone in the project sees the same files)
- Shared context (no redundant uploads)
- Better organization (related chats grouped)
- Prompt caching (reused prompts inside projects don't re-tokenize)
Project Structure Example
Project: "San Diego Transit Analysis"
├─ Files: SANDAG_2024_data.md, MTS_budget.csv, transit_report.pdf
└─ Chats:
├─ "Q1 ridership analysis"
├─ "Budget efficiency comparison"
├─ "Future capacity planning"
└─ "Funding mechanisms"
All chats reference the same files. No redundant uploads.
Technique 8: Disable Tools and Connectors When Not In Use
The Problem: Tools consume tokens on every exchange, even when inactive. Web search, calculator, file operations—if enabled, Claude considers them on every response.
Token Cost of Enabled Tools: Enabling 3 tools (web search, code execution, file creation) adds ~200–400 tokens of overhead per exchange, even when unused.
- 20-message chat with tools enabled: 20 × 300 = 6,000 tokens overhead
- Same chat with tools disabled: 0 tokens overhead
- Savings: 6,000 tokens per 20-message chat
Best Practice
1. Disable all tools by default
2. Enable only the specific tool(s) needed for the current task
3. Disable when the task completes
Tools to Keep Disabled Most of the Time
- Web search (enable only when asking about current events)
- Code execution (enable only during debugging/testing)
- File creation (enable for artifact generation, disable for Q&A)
- Connectors (enable only when accessing Gmail/Calendar/Drive)
Tools Worth Keeping On
- Within Projects that specifically need them
- During focused work sessions where they're used consistently
Technique 9: Restart Conversations Every 15–20 Messages
The Problem: At message 25, Claude re-reads all 24 previous messages before responding. By message 50, the context window overhead becomes significant.
Context Re-Read Cost
- Message 10: re-read ~4,000 tokens of context
- Message 20: re-read ~8,000 tokens of context
- Message 30: re-read ~12,000 tokens of context
- Message 50: re-read ~20,000 tokens of context
Solution: Refresh Every 15–20 Messages
1. At ~message 15, summarize key points in a new message: "Summary: We've analyzed X, decided on Y, next step is Z"
2. Start a new chat with that summary as context
3. The new chat begins fresh without re-reading all history
Token Impact
- Single 50-message conversation: ~100,000 tokens (with re-read overhead)
- Two 25-message conversations: ~60,000 tokens (less re-read overhead)
- Three 17-message conversations: ~45,000 tokens (minimal re-read overhead)
- Savings: 55%
When to Restart
- Task fundamentally shifts direction
- Conversation length approaches 30+ messages
- New day/session (fresh start feels cleaner)
When Not To
- You need full context from all prior messages for the current task
- You're iterating on something that needs complete history
Technique 10: Crop Screenshots to Only Relevant Portions
The Problem: Users upload full 1000×1000 pixel screenshots when a 200×300 pixel crop would work. Full screenshots tokenize at ~1,300 tokens; crops can drop below 100.
Token Cost of Screenshots
- Full screenshot (1000×1000): ~1,300 tokens
- Medium crop (400×400): ~200 tokens
- Tight crop (200×200): ~50 tokens
- Potential savings: 96%
Example: The Inefficient Way
A user pastes a full desktop screenshot showing:
- The entire taskbar
- The application menu
- The status bar
- The actual error dialog in the bottom right
Claude tokenizes all of it.
The Efficient Way
Crop to just the error dialog:
[Cropped to 250×150 pixels]
Claude gets the information without the noise.
Cropping Checklist
- Remove any UI chrome (taskbars, menus) unless relevant
- Remove whitespace margins
- Crop to the minimal bounding box that includes the issue
- Keep just enough context for understanding (one surrounding line/button)
Tools
- Windows: Snip & Sketch (Win+Shift+S)
- Mac: Cmd+Shift+4 (drag to select an area)
- Linux: Flameshot or the built-in tool
- Online: browser-based snipping tools
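Cropping can also be scripted if you process screenshots in batches. A sketch with the Pillow library (the file name and pixel box are hypothetical):

```python
# pip install Pillow
from PIL import Image

img = Image.open("full_screenshot.png")  # hypothetical 1000x1000 capture
# Box is (left, upper, right, lower) in pixels: keep just the error dialog.
dialog = img.crop((700, 800, 950, 950))
dialog.save("error_dialog.png")
print(img.size, "->", dialog.size)  # (1000, 1000) -> (250, 150)
```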
Technique 11: Build and Reuse Prompt Templates
The Problem: Users rewrite similar prompts from scratch repeatedly. Each rewrite is slightly different, which prevents caching and burns mental energy.
Example: Inefficient Rewriting
Chat 1: "Write a technical analysis of the MQ-9B SeaGuardian focusing on
operational range, sensor capabilities, and integration with naval systems."
Chat 2 (weeks later): "Can you analyze the GA-ASI Gambit system? I want to
understand its operational capabilities, sensor suite, and how it fits into
the broader defense architecture."
Chat 3 (another week): "Technical overview of the V-22 Osprey: what it does,
what sensors it has, and how it works with other military systems."
Same structure, different words each time. Prevents caching.
Prompt Template Approach: Create a template in a document:
```
# Technical System Analysis Template
Analyze [SYSTEM_NAME] and cover:
1. Operational range and endurance
2. Sensor suite and detection capabilities
3. Integration with broader force architecture
4. Notable operational history or incidents
5. Key limitations or known issues
Format as: summary section + detailed technical breakdown +
comparison table with similar systems.
```
Reuse for every similar analysis:
Chat 1: Analyze MQ-9B SeaGuardian [use template]
Chat 2: Analyze GA-ASI Gambit [use template]
Chat 3: Analyze V-22 Osprey [use template]
Token Benefit: Prompt caching (available in Projects) means repeated prompts aren't fully re-tokenized.
- Manual rewriting each time: each prompt re-tokenized in full
- Template reuse in Projects: first use tokenizes the template, subsequent uses get a cached hit (90% cost reduction)
Creating Your Prompt Library
1. Identify 5–10 recurring tasks (analysis, drafting, coding, summarization)
2. Write a template for each with [VARIABLE] placeholders
3. Store templates in a Project-level document
4. Reuse the same template structure across similar tasks
Template Examples
- Technical analysis (systems, weapons, platforms)
- Policy briefs (problem, current approach, alternatives, recommendation)
- Code review (architecture, security, performance, maintainability)
- Content drafting (outline, research questions, audience, tone)
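Filling templates mechanically keeps the wording byte-identical across uses, which is exactly what prompt caching rewards. A minimal sketch (the template is abbreviated from the one above):

```python
# Keep the template frame identical across uses; only the variable changes.
TEMPLATE = """Analyze [SYSTEM_NAME] and cover:
1. Operational range and endurance
2. Sensor suite and detection capabilities
3. Integration with broader force architecture
Format as: summary + detailed technical breakdown + comparison table."""

def fill(template: str, system_name: str) -> str:
    return template.replace("[SYSTEM_NAME]", system_name)

for system in ("MQ-9B SeaGuardian", "GA-ASI Gambit", "V-22 Osprey"):
    print(fill(TEMPLATE, system).splitlines()[0])
```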
Technique 12: Keep Personal Context Under 2,000 Words
The Problem: A 20,000-word personal context file loads into every single conversation. That's 20K tokens of overhead before you type your first question.
Real Impact: A 20K-word context file in a 200K limit:
- 10% of your token budget consumed by context alone
- Every conversation starts 20K tokens in the hole
- A 1-hour work session might be 50% context + 50% actual work
Example: Bloated Context
[USER PROFILE: 20,000 words covering]
- Entire work history (every job description)
- Complete family tree and relationships
- Full list of 50+ projects and their outcomes
- Every skill and certification
- All medical history and preferences
- Complete reading list and book summaries
- Full financial situation
- Detailed hobby list
[END: 20,000 tokens burned before starting]
The Trimmed Version
[USER PROFILE: 1,500 words covering]
- Current role: Retired Senior Engineer, radar systems
- Key expertise: Signal processing, C4ISR, AMASS
- Current projects: IPCSG advocacy, technical writing
- Key context for Claude: Prostate cancer patient-advocate,
uses pseudonym "Pseudo Publius" for civic policy work
- Preferences: HTML for newsletters, Markdown for analysis
[END: 1,500 tokens used for genuine context]
What to Keep in Context (1,500 words max)
- Current professional role
- 3–5 core skills Claude needs to know about
- 1–2 active projects
- Key preferences (format, tone, communication style)
- Any ongoing work Claude should reference
What to Cut
- Complete work history (mention only the current role)
- Family relationships unless directly relevant
- Completed projects (list only active ones)
- Medical details beyond "patient advocate in X field"
- Reading lists, hobby catalogs, exhaustive skill inventories
- Historical context that isn't actively shaping current work
How to Structure Trimmed Context
```
# Claude Context (Keep Under 2,000 Words)
## Professional
- Role: Retired radar systems engineer, 20+ years
- Current focus: Technical writing, IPCSG patient advocacy
- Key expertise: Signal processing, SAR/GMTI, C4ISR systems
## Active Projects
1. IPCSG newsletter (prostate cancer research translation)
2. Naval Institute-style technical analysis (defense systems)
3. San Diego civic policy research (transit, water, governance)
## Preferences for Claude
- HTML output for IPCSG content (avoid .docx)
- Markdown for technical analysis
- Cite sources for health/policy content
- Flag when content needs fact-checking
## Key Context
- Patient with 11+ years prostate cancer history
- Uses "Pseudo Publius" pseudonym for civic policy writing
- Lives in San Diego, familiar with local transit/healthcare systems
- Enrolled in CONVERGE-01 actinium-225 PSMA trial at UCSD
```
Token Savings
- 20K context file: 20,000 tokens per conversation
- 2K context file: 2,000 tokens per conversation
- 10 conversations/week: 180,000 tokens saved per week
- Monthly savings: 720,000 tokens (enough for 3–4 complex analysis projects)
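A five-line guard script keeps the file honest (the path is hypothetical):

```python
# Warn when the personal context file creeps past the 2,000-word budget.
from pathlib import Path

LIMIT_WORDS = 2000
words = len(Path("claude_context.md").read_text(encoding="utf-8").split())
status = "OK" if words <= LIMIT_WORDS else "TRIM ME"
print(f"{words:,} words ({status}), ~{int(words * 1.3):,} tokens at ~1.3 tokens/word")
```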
Technique 13: Leverage Conversation Search Before Asking
The Problem: You ask Claude a question you've already solved in a previous chat. Claude re-answers from scratch, consuming tokens for work already done.
Solution: Use the conversation search tool to find relevant past chats before messaging Claude. If you find the answer, you're done (zero tokens). If not, you have context for a more targeted question.
Example: Instead of asking "How do I configure AMASS for multi-sensor fusion?", search first. If the answer is in a past chat, copy it. If not, ask: "I've searched my past work—it's not there. Here's what I tried last time [specific detail]. What's the next step?"
Token Cost
- Re-asking and re-answering: 2,000–3,000 tokens
- Searching + targeted follow-up: 500 tokens
- Savings: 67–75%
Technique 14: Use Structured Output Formats to Reduce Back-and-Forth
The Problem: You ask a question, Claude gives prose, you ask for it in table format, Claude reformats. Two exchanges for one deliverable.
Solution: Specify the exact output format upfront: "JSON with keys: name, value, unit" or "Markdown table: columns are X, Y, Z" or "CSV format" or "Numbered list with 1-sentence descriptions."
Example
- Inefficient: "List the key features of the MQ-9B"
  - Claude responds with a prose paragraph
  - You: "Can you make that a table?"
  - Claude reformats (2 exchanges, 4,000+ tokens)
- Efficient: "List MQ-9B key features as a markdown table with columns: Feature, Specification, Operational Impact"
  - Claude responds with the table in one go (1 exchange, 1,500 tokens)
Token Savings: 60%
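A side benefit: machine-readable output feeds scripts directly, with no reformatting exchange. A sketch where the reply string stands in for Claude's actual answer to a "respond as JSON" prompt:

```python
import json

# Stand-in for a reply to: "List MQ-9B key features as JSON,
# an array of objects with keys: feature, specification"
reply = '[{"feature": "Endurance", "specification": "30+ hours"}]'

for item in json.loads(reply):
    print(f"{item['feature']}: {item['specification']}")
```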
Technique 15: Use Claude's "Drafts" or Internal Reasoning to Reduce Revisions
The Problem: You ask Claude to write something; it's close but needs tweaks; you ask for a revision; Claude rewrites. One task becomes three messages.
Solution: In the initial prompt, ask Claude to "show your thinking first" or "provide a draft + notes on what could improve it." Claude self-critiques, reducing revision cycles.
Example
- Inefficient: "Write a technical brief on the Constellation-class frigate cancellation"
  - Claude writes
  - You: "It's good but needs more detail on cost overruns"
  - Claude revises (2 exchanges minimum)
- Efficient: "Write a technical brief on the Constellation-class frigate cancellation. Include: executive summary, cost breakdown, timeline of delays, political context. Flag any sections that feel weak or incomplete."
  - Claude writes with self-critique built in
  - You get a more complete product on the first try (1 exchange)
Token Savings: 40–50% (fewer revision cycles)
Technique 16: Reuse Outputs as Inputs (Chaining Without Re-Prompting)
The Problem: You ask Claude to analyze data, then ask it to write a summary of that analysis. Claude re-reads both the original data and its analysis.
Solution: When Claude produces output you'll use as input for another task, say so upfront: "Analyze this data, then use your analysis to draft a one-paragraph summary."
This chains tasks in a single message, avoiding re-reads.
Example
- Inefficient:
  - Message 1: "Analyze Q3 sales by region" (Claude analyzes)
  - Message 2: "Summarize that analysis for a board memo" (Claude re-reads data + analysis)
  - (2,500 + 2,500 = 5,000 tokens)
- Efficient:
  - Message 1: "Analyze Q3 sales by region, then summarize findings in one paragraph for a board memo"
  - (3,500 tokens, single pass)
Token Savings: 30%
Technique 17: Specify Constraint Limits Upfront to Avoid Scope Creep
The Problem: You ask for "an analysis," and Claude writes 2,000 words because no constraint was given. You then ask "can you make it shorter?" and Claude rewrites. Wasted tokens.
Solution: Always specify word count, depth level, or section count upfront.
Examples
- "Summarize in 3 sentences"
- "Brief analysis (under 500 words)"
- "Outline only—no prose, just 5 bullet points per section"
- "Executive summary format: 1 page max"
Token Impact
- Unconstrained ask: 3,500 tokens → You request trim → 2,000 tokens for re-work (5,500 total)
- Constrained ask: 1,500 tokens (right size on first try)
- Savings: 73%
Technique 18: Cache Repeated Context by Using "Assume You Know" Statements
The Problem: Every time you chat, you re-explain your domain, your current project, or your constraints.
Solution: Once you've established context in a chat, use "assume you know" statements to avoid re-explaining:
- "Assume you know the San Diego MTS budget structure from our earlier discussion"
- "Assume you're familiar with the CONVERGE-01 trial protocol"
- "Assume you know the PIRAN radiation belt software context"
This signals Claude to reference prior exchanges without restating everything.
Token Cost
- Restating context every time: 800+ tokens per message
- Using "assume" reference: 200 tokens per message
- Savings: 75%
Technique 19: Use Negative Constraints (What NOT to Include)
The Problem: Specifying what you want is often harder than specifying what you don't. "Don't explain basic concepts I already know" trims more tokens than "explain advanced concepts."
Solution: Frame prompts with what to exclude:
- "Analyze this without explaining what a PSMA scan is"
- "Write the technical section without introductory material"
- "List only the novel findings—skip anything in standard literature"
Example
- Inefficient: "Explain the latest prostate cancer biomarkers" (Claude might explain what biomarkers are, burning tokens on known info)
- Efficient: "Explain novel prostate cancer biomarkers, assuming I know what biomarkers are and how standard testing works"
Token Savings: 20–30%
Technique 20: Compress Intermediate Outputs via Summarization Prompts
The Problem: You ask Claude to do a deep analysis (3,000 tokens), then ask questions about it. Claude must re-read the full analysis for each question.
Solution: After the analysis, immediately ask Claude to produce a "compressed summary for reference." You then use the summary for follow-ups, not the full analysis.
Example
- Message 1: "Analyze the 50-page NTSB docket on the LaGuardia collision" (3,000 tokens)
- Message 2: "Compress that into a 10-point summary I can reference for follow-ups" (500 tokens)
- Messages 3+: Ask questions referencing the summary, not the original analysis (saves 2,000+ tokens per follow-up)
Token Impact
- With compression: Original (3,000) + summary (500) + 5 follow-ups using summary (2,500) = 6,000 total
- Without compression: Original (3,000) + 5 follow-ups re-reading full analysis (15,000) = 18,000 total
- Savings: 67%
Technique 21: Use Pseudocode or Outline Mode for Complex Tasks
The Problem: You ask Claude to solve a complex problem in full detail. It writes long explanations. You ask for just the outline. It rewrites.
Solution: Ask for "pseudocode" or "outline mode" first, then expand only the sections you need.
Example
- Inefficient: "Help me design a system for analyzing satellite megaconstellation fragmentation" (Claude writes 2,000-word design doc)
- Efficient:
  - Message 1: "Outline only: system architecture for analyzing satellite megaconstellation fragmentation" (Claude: 300-word outline)
  - Message 2: "Expand section 3 (data pipeline) to full technical detail"
  - (1,000 + 1,500 = 2,500 tokens vs 2,000 + potential revisions)
Token Savings: 20–40% (you pay only for sections you need)
Technique 22: Pre-Process Data Externally Before Uploading
The Problem: You upload raw, messy data (10K tokens), Claude cleans it, and then you ask questions. Claude must re-read both the messy and the cleaned data.
Solution: Clean/process data before uploading. Use a spreadsheet tool, Python script, or other lightweight processing first.
Example
- Inefficient: Upload 500-row CSV with duplicates, formatting issues, irrelevant columns (8,000 tokens) → Claude cleans and analyzes
- Efficient: Clean in Excel/Python locally (2 min, no tokens) → Upload cleaned 200-row CSV (2,000 tokens) → Claude analyzes
Token Savings: 60–75%
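A sketch of the local cleanup step with pandas (file and column names are hypothetical):

```python
# pip install pandas
import pandas as pd

df = pd.read_csv("raw_export.csv")         # e.g. 500 messy rows
df = df.drop_duplicates()                  # drop exact duplicate rows
df = df[["region", "quarter", "revenue"]]  # keep only the relevant columns
df = df.dropna(subset=["revenue"])         # drop rows missing the key field
df.to_csv("cleaned.csv", index=False)      # upload this smaller file instead
print(f"{len(df)} rows retained")
```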
Technique 23: Use Checkpoints: "Are We On Track?" Mid-Conversation
The Problem: You work through a complex analysis with Claude, go down the wrong path for 10 messages, then realize the approach is wrong. All 10 messages must be re-read going forward.
Solution: Every 5–7 messages in complex tasks, insert a checkpoint: "Summarize progress so far and confirm we're on the right track before continuing."
If wrong track, you catch it early. If right track, you've created a compressed summary for future reference.
Token Cost
- Wrong path after 10 messages: Wasted 6,000 tokens, plus future re-reads (12,000 total over conversation)
- Checkpoint at message 5: Catch early, save 10 messages of wasted work (10,000 tokens)
- Savings: 83%
Technique 24: Leverage Templates with Conditional Sections
The Problem (similar to #11, but more sophisticated): You have a template, but different uses require different sections, so you still rewrite parts.
Solution: Build templates with conditional markers. Example:
```
# Technical Analysis Template
## Executive Summary (always)
[1 paragraph]
## [IF: System is military] Operational History
[relevant section]
## [IF: System has sensors] Sensor Capabilities
[relevant section]
## Key Metrics (always)
[data table]
## [IF: System is controversial] Safety/Incident History
[relevant section]
```
When reusing, you fill only the sections relevant to the specific system.
Token Benefit
- Manual rewriting each time: 100% re-tokenization
- Template with conditionals: Reusable frame (cached) + conditional sections only
- Savings: 40–50% on repeated similar analyses
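Assembling the conditional frame can be mechanized too. A sketch (the section names come from the template above; the flags are hypothetical):

```python
# Build only the sections relevant to this system; the always-on frame
# stays identical across uses, which helps caching.
SECTIONS = [
    ("Executive Summary", None),            # always included
    ("Operational History", "is_military"),
    ("Sensor Capabilities", "has_sensors"),
    ("Key Metrics", None),                  # always included
    ("Safety/Incident History", "is_controversial"),
]

def build_template(**flags: bool) -> str:
    return "\n".join(
        f"## {title}" for title, condition in SECTIONS
        if condition is None or flags.get(condition, False)
    )

print(build_template(is_military=True, has_sensors=True))
```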
25. Use "Status Check" Outputs for Ongoing Projects
The Problem You're working on a multi-week project. Each new chat, you brief Claude on what's been done. That briefing is always ~1,000 tokens.
Solution At the end of each session, ask Claude to generate a "project status summary" (500 words). Start next chat by pasting that summary instead of re-explaining.
Example After session 1 on IPCSG newsletter research:
- You: "Create a 300-word status summary for my next session: what we've covered, what's pending, open questions"
- Claude: [Status summary] (800 tokens)
Next session:
- You: "Here's the status from last session: [paste]. Continue with the next section on ADT cardiovascular risk"
- Claude: (Uses summary, no re-explanation needed)
Token Savings
- Re-explaining each session: 1,000 tokens/session × 10 sessions = 10,000 tokens
- Status summary approach: 800 + 200×10 = 2,800 tokens
- Savings: 72%
Technique 26: Batch Similar Queries to Use Prompt Caching
The Problem: You ask 10 different questions about the same system (MQ-9B). Each question re-reads the full context.
Solution: In Projects, ask all related questions about the same system in one session before moving to a new system. Prompt caching means the context gets tokenized once and reused for all questions.
Example
- Chat 1: "Answer all questions about MQ-9B SeaGuardian" + [list 10 questions]
- Questions about same context leverage caching
- Chat 2 (different day, same project): Ask 10 questions about Gambit CCA
- New context, but again leveraging caching within session
Token Benefit
- Separate chats for each question: 10 questions × 2,000 tokens = 20,000 tokens
- Batched in one chat with caching: 2,000 (context) + 500 (questions) = 2,500 tokens
- Savings: 87.5%
Summary: All 26 Techniques by Impact
Highest Impact (40%+ savings each)
- #1: Replace PDFs with markdown (85–90%)
- #7: Use Projects to avoid redundant uploads (80%)
- #3: Batch tasks (56%)
- #12: Trim personal context (70% when compounded)
- #20: Compress intermediate outputs (67%)
- #26: Batch similar queries with caching (87.5%)
High Impact (25–40% savings)
- #2: Right-size models (50% when applied systematically)
- #6: Short prompts (33%)
- #10: Crop screenshots (96% but narrow use case)
- #15: Show thinking first (40–50%)
- #17: Specify constraints upfront (73%)
- #23: Checkpoints (83%)
Medium Impact (15–25% savings)
- #4: Edit instead of stacking (60% but single-message impact)
- #5: New chats for new topics (40% for multi-topic conversations)
- #8: Disable tools (varies by tool usage)
- #9: Restart conversations (55% but only if you hit 50+ messages)
- #13: Search before asking (67% but only if found)
- #14: Specify output format (60% but narrow use case)
- #16: Chain tasks (30%)
- #18: "Assume you know" statements (75% but only after context established)
- #19: Negative constraints (20–30%)
- #21: Pseudocode mode (20–40%)
- #22: Pre-process data (60–75% but narrow use case)
- #24: Conditional templates (40–50%)
- #25: Status summaries (72%)
The Strategic Layer: What Most People Miss
Beyond these 26 techniques, there's one meta-insight:
Token efficiency is a systems problem, not a tips-and-tricks problem.
Most users treat Claude like a search engine: ask question, get answer, move on. That model inherently wastes tokens because there's no continuity.
Efficient users treat Claude like a long-term collaborator:
- One Project per major body of work (IPCSG research, Naval analysis, San Diego civic work)
- Persistent templates, reusable context, shared files
- Conversations that build on each other (status summaries, checkpoints)
- Clear handoffs between sessions (summary → next session → summary)
When you operate at the "system" level instead of the "single query" level, all 26 techniques compound. You're not just saving tokens on individual exchanges—you're building workflows that stay efficient across months.
That's the real win.
Integrated Workflow: Putting It All Together
Here's how these techniques work together in practice:
Scenario: Research and Write a Technical Analysis
The Bloated Approach
1. Upload raw 15-page PDF (30,000 tokens)
2. Write 300-word prompt with every detail (prompt re-read 15+ times)
3. Send message → realize you need more info → "Actually, I meant…"
(correction stacking)
4. Ask 3 follow-ups in separate messages (context re-read x3)
5. Use Opus for summarization (wrong model)
6. Keep web search enabled (unused, 300 token overhead)
7. Two weeks later, use same PDF in new chat (30,000 tokens again)
8. Maintain 35-message conversation (10,000 tokens re-read overhead)
Total waste: ~95,000 tokens
The Optimized Approach
1. Project: "Technical Analysis" (upload PDF once as .md)
[3,000 tokens vs 30,000]
2. Short prompt (25 words, edited before sending)
"Analyze MQ-9B SeaGuardian: range, sensors, naval integration,
format as summary + technical breakdown + comparison table"
[Prompt re-read cost: minimal]
3. Batch all questions into one message, use Sonnet (not Opus)
[1/3 the cost, 90% as good]
4. Disable web search and tools (enable only if needed)
[No overhead]
5. Reuse prompt template for future system analyses
[Prompt caching reduces repeat cost 90%]
6. Keep conversation to 18 messages, then restart
[Minimal re-read overhead]
Total usage: ~10,000 tokens
Savings: ~85,000 tokens (90% reduction)
Scenario: Customer Support Email + Dataset Analysis in One Session
Wrong: One chat with both tasks, tools enabled, Opus for both
Right:
Chat 1: "Draft customer support email"
- Task: Simple templating
- Model: Haiku
- Tools: Off
- Expected: 1-2 messages
Chat 2: In same Project, "Analyze Q3 customer data"
- Task: Statistical analysis
- Model: Sonnet
- Tools: Off (already have data)
- Expected: 3-4 messages
[Both reference same Project files, no redundant uploads]
[Different tasks, different chats, minimal re-reading]
The Economics: Real Savings
For Individual Users
- Using all 26 techniques: ~60–70% token efficiency improvement
- A 200K limit effectively becomes ~300K in actual work capacity
- Cost savings: if paying per token, a 30–40% reduction in bills
For Teams (10 users)
- Typical: 500 chats/month, 50M tokens burned
- Optimized: 500 chats/month, 15M tokens used
- Monthly savings: 35M tokens
- Billable value: equivalent to ~$5,000–$10,000/month in unused capacity recovered
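The arithmetic behind those team numbers, as a rough calculator (the per-chat averages are assumptions backed out of the figures above):

```python
chats = 500                  # chats per month for a 10-user team
typical_per_chat = 100_000   # unoptimized average (50M / 500)
optimized_per_chat = 30_000  # optimized average (15M / 500)

saved = chats * (typical_per_chat - optimized_per_chat)
print(f"{saved / 1e6:.0f}M tokens saved per month "
      f"({saved / (chats * typical_per_chat):.0%} reduction)")
# -> 35M tokens saved per month (70% reduction)
```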
For Enterprises
Bloated workflows lead to:
- Team members buying extra credits (hidden costs)
- Unnecessary token quota expansions
- Perceived "slowness" (actually just inefficiency)
Optimized workflow means:
- Planned budgets actually cover the work
- Clear ROI on the Claude investment
- Scalability without proportional cost increase
Implementation: Start Here
You don't need to adopt all 26 techniques simultaneously. Phase them in:
Week 1: Quick Wins (Saves ~30%)
- Technique 2: Right-size models (Haiku for simple tasks)
- Technique 6: Write shorter prompts
- Technique 10: Crop screenshots
Week 2: Process Changes (Saves additional 20%)
- Technique 3: Batch tasks into single messages
- Technique 5: Use separate chats for different topics
- Technique 4: Edit instead of stacking corrections
Week 3: Structural Optimization (Saves additional 25%)
- Technique 1: Replace PDFs with markdown
- Technique 7: Move files to Projects
- Technique 12: Trim personal context
Week 4: Advanced Optimization (Saves additional 15%)
- Technique 8: Disable tools by default
- Technique 11: Build prompt templates
- Technique 9: Restart conversations every 15–20 messages
Expected Result After 4 Weeks: 70–80% improvement in token efficiency
The Mindset Shift
Token optimization isn't about deprivation—it's about clarity. When you're forced to communicate concisely, write better prompts, and focus on one task at a time, you get better results and use fewer tokens.
Every inefficient workflow pattern masks itself as "flexibility" or "exploratory thinking." In reality, it's just waste.
The 26 techniques above are the proven guardrails. Use them, and you'll never feel constrained by Claude's limits again. The constraint becomes a feature: it forces you to think like an engineer, not just an experimenter.
Your next Claude session will be 3x more productive and cost 70% less.