
#71 | Why LONG AI sessions get worse

TL;DR: Claude’s outputs get weaker as sessions grow longer. The invoice gets heavier. The problem isn’t the prompts.

👋 Happy Labor Day,

There is a pattern in how operators talk about Claude & Co after six months of regular use.

And it is not coming from the beginners still amazed by autocomplete.

It comes from the ones actually building with it. Funnily enough, they lower their voices a little when they say it, as if the admission embarrasses them.

The outputs were better early on.

Ok, they are not wrong. Hell, no.

When outputs start to drift, the natural assumption is that the prompts need work.

More precision, definitely better context, and probably sharper instructions too.

Hours are spent revising what goes in. But the real weight was already in the room before the message arrived.

This is context rot. Once you understand how it works, a lot of things start to make sense.

Degraded outputs. Unexpected API costs. The gap between what you know Claude can do and what it delivers by the end of a long session.

In this edition: You’ll understand why longer Claude (or any AI) sessions produce weaker outputs and higher costs through the same mechanism, and what to do about it.

Key takeaway: The session that feels most productive is often the one quietly degrading your results and inflating your bill.

What Claude actually carries into every response

Claude re-reads the full conversation on every response.

Not just your latest message. Every exchange from the beginning of the session is re-entered into the context window.

Tool outputs, file reads, and earlier drafts accumulate there.
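If you drive Claude through the API, you can watch this happen directly. Here is a minimal sketch with the Anthropic Python SDK: the client keeps the whole conversation in a list and resends it on every call. The model ID is a placeholder; swap in your own.

import anthropic

# Minimal sketch: the conversation lives in a list,
# and every call sends that entire list back to the model.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model ID
        max_tokens=1024,
        messages=history,  # the FULL history, re-sent on every single call
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    # usage.input_tokens grows every turn: that growth is the weight in the room
    print(f"turn {len(history) // 2}: {response.usage.input_tokens:,} input tokens billed")
    return reply

Nothing in that loop is wasteful by itself. The cost comes from the list never shrinking.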

Think about what a working session actually looks like.

You open Claude with a task. Back and forth for an hour, maybe two. By message twenty, Claude is reasoning inside a room that contains every previous exchange.

Its own responses, of course, included. That’s the weight in the room before you ask anything new.

A one-hour session might carry 15,000 to 30,000 tokens into context before you type a single new word. That is roughly the length of a short novel chapter.

This overhead, unfortunately, doesn’t announce itself. The session feels normal, and the responses look fine.

The degradation is gradual, and it’s easy to attribute to a bad prompt or a topic Claude isn’t really good at (yet).

For operators running structured workflows, content pipelines, or multi-step automations, this effect compounds.

The automation runs sessions longer than a human-directed one would. By the time the final output is generated, the context has been carrying dead weight for most of the run.

Many of us have simply run out of credits by then, too. Wait until 6 pm!

Ok, to stay with Claude. CLAUDE.md is the project memory file that loads automatically in Claude Code and Cowork.

This file can add around 5,000 tokens per turn before any task begins. In a long session, that overhead is incurred on every message.
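Quick back-of-the-envelope math makes the scale obvious. The session length below is an assumption; the 5,000-token figure is from above:

# CLAUDE.md overhead alone, assuming ~5,000 tokens loaded on every turn
claude_md_tokens = 5_000
turns = 40  # assumed session length
overhead = claude_md_tokens * turns
print(f"{overhead:,} tokens")  # 200,000 tokens of memory-file overhead, before any task work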

On top of that, you will use (a lot of) tokens for any files you upload, with PDFs being the worst.

The session that costs the most performs the worst

Here is the part that tends to land differently once you actually understand it. You stay in a session because it’s going well.

Branching across tools, cycling through drafts, building something over several hours.

That session is generating both your highest invoice and, past a certain point, your weakest output.

Token count determines your API invoice. Every token in context is billed on every call you make. A session that carries 30,000 tokens charges for all 30,000 tokens on every single message.

Two hours of work doesn’t just produce more messages. Each one costs more than the last.
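A rough model shows the shape of it. Every number here is an illustrative assumption, not real pricing; the curve is the point:

# Rough cost model: context grows each turn, and the whole context
# is billed again on every call. All numbers are illustrative assumptions.
tokens_per_exchange = 1_500  # assumed tokens added per user+assistant turn
price_per_mtok = 3.00        # assumed $ per million input tokens

context = 0
total_cost = 0.0
for turn in range(1, 41):    # a 40-message session
    context += tokens_per_exchange
    total_cost += context / 1e6 * price_per_mtok  # full history, billed again

print(f"final context: {context:,} tokens")    # 60,000
print(f"total input cost: ${total_cost:.2f}")  # ~$3.69, and it grows quadratically

Double the session length and the input cost roughly quadruples. That is the mechanics behind the invoice.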

Jenny from Build to Launch documented what this looks like in practice. One long automation session, branching across multiple tools and several drafts. Invoice: $1,600.

What came from the session’s later stages needed significant rework. The second cost, rework, never shows up on the invoice.

The sessions that feel most productive are often the ones generating the weakest outputs and the largest invoices.

The practical cost of this goes beyond the invoice. Every session that degrades produces outputs that need more rework.

What you save on prompting gets spent on editing. That efficiency gain you expected turns into a rework loop you didn’t plan for.

What the evidence actually shows

Researchers at Microsoft published a peer-reviewed study on context compression in 2024.

Their LLMLingua tool achieved 20x compression with under 2% performance degradation. The original context, condensed to 5% of its size, produced nearly identical results.

Read that again slowly. Twenty times smaller. Two percent worse.

The assumption that a longer, fuller context makes Claude reason better turns out to be costly and incorrect.

In March 2026, an Anthropic caching bug made the cost mechanics visible in a specific way. A configuration error inflated context window usage by 2,206 times for affected accounts.

Several GitHub issue reports documented the scope before Anthropic resolved it. Two thousand two hundred and six times: absurd enough to seem made up, specific enough to be real.

Anthropic’s own documentation on long-context performance noted that extended context degrades precision on specific details. Not suddenly. Gradually.

The longer a session runs, the thinner the model’s attention spreads.

AugmentCode reported 22.7% token savings after introducing scheduled context compression. Treat it as directional rather than definitive. The direction is consistent with everything else here.

The patterns all point the same way. Context isn’t a helpful record of your work. It’s dead weight Claude carries, whether it’s useful or not.

What you can do right now

The fix is the same for both quality and cost. When a task is complete, close the session and start fresh. Give the next task its own clean context.

At the start of each new session, give Claude a structured brief rather than a conversation.

Cover what you’re building, what you need from this session, and any constraints that apply.

That replaces thirty messages of accumulated history with something that actually serves the task.
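Something like this works as a starting point. The headings are placeholders; adapt them to your own workflow:

Session brief template

Project: [one line on what you are building]
This session’s task: [the single output you need]
Constraints: [format, length, tone, tools, anything fixed]
Relevant context: [only what this task needs, pasted in]
Out of scope: [what Claude should not touch this session]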

If you upload files to workflows, Markdown files are preferable to PDFs. You can convert PDFs into Markdown with this prompt.

Convert PDF into Markdown in Claude AI

You are about to receive a PDF document. Read it in full, then do the following:

1. Extract all human-readable content

2. Preserve all structure using Markdown — headings as ##, subheadings as ###, bullet points as -, numbered lists as 1. 2. 3., bold text as **bold**, and section breaks as blank lines

3. Strip all filler, repetition, padding, and non-essential language — keep only facts, data, frameworks, decisions, and actionable content

4. If any section contains a diagram or visual that cannot be extracted as text, replace it with a one-line plain text description in brackets, e.g., [Diagram: content workflow showing 3 stages]

5. Output the result as a single downloadable file called `_ai-business-blueprint.md`.

6. Do not summarise or paraphrase key facts — compress length by removing filler, not by losing information

Output nothing except the downloadable file.

Also, context is working memory, not a running log. Start a new session when the current task is done.

If outputs degrade over a long session, context weight is usually the cause, not the prompt. Starting over is faster than revising the prompt a dozen times.

For longer technical workflows, document processing, or multi-step research, LLMLingua is worth knowing about.

Twenty times compression. Under two percent performance loss. No changes to the model or the workflow are required.
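If you want to try it, the open-source package exposes a single compressor class. A minimal sketch follows; the transcript file name and token target are made up for illustration, and the argument names follow the project’s README at the time of writing:

from llmlingua import PromptCompressor  # pip install llmlingua

# Compress a long context before sending it to Claude.
compressor = PromptCompressor()  # first use downloads a compression model

long_context = open("session_transcript.txt").read()  # hypothetical saved history

result = compressor.compress_prompt(
    long_context,
    instruction="Summarize the decisions made in this session.",
    target_token=500,  # ask for a much smaller context
)

print(f'{result["origin_tokens"]:,} -> {result["compressed_tokens"]:,} tokens')
print(result["compressed_prompt"][:300])  # feed this, not the raw history, to Claude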

The practical version: one task, one session. When the session has done its job, stop. The next session will be sharper for it.

Context management is one of those things that feels obvious the moment someone names it for you.

Cheers,

Mark
The AI Learning Guy
👋⚡😎

