NHacker Next
Changes in the system prompt between Claude Opus 4.6 and 4.7 (simonwillison.net)
embedding-shape 4 hours ago [-]
> The new <acting_vs_clarifying> section includes: When a request leaves minor details unspecified, the person typically wants Claude to make a reasonable attempt now, not to be interviewed first.

Uff, I've tried stuff like this in my prompts, and the results are never good. I much prefer the agent to prompt me upfront to resolve that before it "attempts" whatever it wants. Kind of surprised to see that they added that.

alsetmusic 4 minutes ago [-]
I've recently started adding something along the lines of "if you can't find or don't know something, don't assume. Ask me." It's helped cut down on me having to tell it to undo or redo things a fair amount. I also have used something like, "Other agents have made mistakes with this. You have to explain what you think we're doing so I can approve." It's kind of stupid to have to do this, but it really increases the quality of the output when you make it explain, correct mistakes, and iterate until it tells you the right outcome before it operates.

Edit: forgot "don't assume"

naasking 2 hours ago [-]
Seriously, when you're conversing with a person would you prefer they start rambling on their own interpretation or would you prefer they ask you to clarify? The latter seems pretty natural and obvious.

Edit: That said, it's entirely possible that large and sophisticated LLMs can invent some pretty bizarre but technically possible interpretations, so maybe this is to curb that tendency.

embedding-shape 2 hours ago [-]
> The latter seems pretty natural and obvious.

To me too. If something is ambiguous or unclear when I'm getting something to do from someone, I need to ask them to clarify; anything else would be borderline insane in my world.

But I know so many people whose approach is basically "Well, you didn't clearly state/say X, so clearly that was up to me to interpret however I wanted, usually the easiest/shortest way for me", which is exactly how LLMs seem to take prompts with ambiguity too, unless you strongly prompt them not to make a "reasonable attempt now" without asking questions.

gausswho 2 hours ago [-]
walthamstow 3 hours ago [-]
The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?
embedding-shape 3 hours ago [-]
Even better, adding it to the system prompt is a temporary fix, then they'll work it into post-training, so next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored, once it's in the model it'll be a lot harder to understand why "How many calories does 100g of Pasta have?" only returns "Sorry, I cannot divulge that information".
gchamonlive 2 hours ago [-]
Just assume each model iteration incorporates all the earlier censorship prompts, and compile the likely list from the system prompt history. To validate it, design an adversarial test against the items in the compiled list.
rzmmm 1 hours ago [-]
The alignment favors supporting healthy behaviors so it can be a thin line. I see the system prompt as "plan B" when they can't achieve good results in the training itself.

It's a particularly sensitive issue, so they are probably just being cautious.

WarmWash 2 hours ago [-]
When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

felixgallo 3 hours ago [-]
I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?
idiotsecant 2 hours ago [-]
Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.

SoKamil 3 hours ago [-]
New knowledge cutoff date means this is a new foundation model?
lkbm 2 hours ago [-]
Yes, but doesn't the token change mean that?
jimmypk 3 hours ago [-]
[dead]
mwexler 2 hours ago [-]
Interesting that it's not a direct "you should" but an omniscient 3rd person perspective "Claude should".

Also full of "can" and "should" phrases: feels both passive and subjunctive as wishes, vs strict commands (I guess these are better termed “modals”, but not an expert)

sigmoid10 3 hours ago [-]
I knew these system prompts were getting big, but holy fuck. More than 60,000 words. With the 3/4 words per token rule of thumb, that's ~80k tokens. Even with a 1M context window, that is approaching 10% and you haven't even had any user input yet. And it gets churned by every single request they receive. No wonder their infra costs keep ballooning. And most of it seems to be stable between Claude version iterations too. Why wouldn't they try to bake this into the weights during training? Sure, keeping it in the prompt is cheaper from a dev standpoint, but it is neither more secure nor more efficient from a deployment perspective.
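A quick check of that arithmetic, as a rough sketch. The 0.75 words-per-token ratio is only the rule of thumb quoted above, not a real tokenizer measurement, and `estimate_tokens` is a made-up helper name.

```python
def estimate_tokens(words: int, words_per_token: float = 0.75) -> int:
    """Rough token count from a word count (rule-of-thumb ratio, not a tokenizer)."""
    return round(words / words_per_token)


prompt_tokens = estimate_tokens(60_000)   # word count reported by `wc` in the thread
context_window = 1_000_000                # assumed 1M-token context window
share = prompt_tokens / context_window    # fraction of the window used by the prompt

print(prompt_tokens, f"{share:.0%}")
```

Under those assumptions the prompt lands at roughly 8% of the window, which matches the "approaching 10%" estimate.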
an0malous 3 hours ago [-]
I’m just surprised this works at all. When I was building AI automations for a startup in January, even 1,000 word system prompts would cause the model to start losing track of some of the rules. You could even have something simple like “never do X” and it would still sometimes do X.
embedding-shape 3 hours ago [-]
Two things: first, the model and runtime matter a lot; smaller/quantized models are basically useless at strict instruction following compared to SOTA models. Second, "never do X" doesn't work that well. If you want it to "never do X", you need to adjust the harness and/or steer it with "positive prompting" instead. As a silly example, don't do "Never use uppercase" but instead do "Always use lowercase only", and you'll get much better results. If you've trained dogs ("positive reinforcement training") before, this will come easier to you.
dataviz1000 2 hours ago [-]
I created a test evaluation (they friggen' stole the word harness) that runs a changed prompt and compares pass/fail success, the number of tokens, and the time taken for any change. It is an easy thing to do. The best part is I set up an orchestration pattern where one agent iterates on the target agent's prompts. Not only can it evaluate the outcome after the changes, it can also update and rerun, self-healing and fixing itself.
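For anyone wanting to try the same idea, here is a minimal sketch of that kind of prompt comparison. Everything in it is hypothetical: `evaluate_prompt`, `EvalResult`, and especially `fake_agent`, which stands in for a real model call so the example runs offline.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalResult:
    passed: int
    total: int
    tokens: int
    seconds: float


def evaluate_prompt(
    system_prompt: str,
    cases: List[Tuple[str, Callable[[str], bool]]],
    run_agent: Callable[[str, str], Tuple[str, int]],
) -> EvalResult:
    """Run every (input, check) case against one prompt and tally the results."""
    passed = tokens = 0
    start = time.perf_counter()
    for user_input, check in cases:
        output, used = run_agent(system_prompt, user_input)
        tokens += used
        passed += bool(check(output))
    return EvalResult(passed, len(cases), tokens, time.perf_counter() - start)


def fake_agent(system_prompt: str, user_input: str) -> Tuple[str, int]:
    """Stand-in for a model call: obeys a 'lowercase' instruction, counts words as tokens."""
    text = user_input.lower() if "lowercase" in system_prompt else user_input
    return text, len(system_prompt.split()) + len(user_input.split())


cases = [("Hello World", str.islower), ("ABC", str.islower)]
baseline = evaluate_prompt("Be helpful.", cases, fake_agent)
candidate = evaluate_prompt("Always use lowercase only.", cases, fake_agent)
print(baseline.passed, candidate.passed, candidate.tokens)
```

An outer loop that proposes a new prompt, calls `evaluate_prompt`, and keeps the change only if the pass count goes up would be one way to get the self-correcting behavior described above.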
mysterydip 3 hours ago [-]
I assume the reason it’s not baked in is so they can “hotfix” it after release. but surely that many things don’t need updates afterwards. there’s novels that are shorter.
sigmoid10 3 hours ago [-]
Yeah that was the original idea of system prompts. Change global behaviour without retraining and with higher authority than users. But this has slowly turned into a complete mess, at least for Anthropic. I'd love to see OpenAI's and Google's system prompts for comparison though. Would be interesting to know if they are just more compute rich or more efficient.
jatora 3 hours ago [-]
There are different sections in the markdown for different models. It is only 3-4000 words
winwang 3 hours ago [-]
That's usually not how these things work. Only parts of the prompt are actually loaded at any given moment. For example, "system prompt" warnings about intellectual property are effectively alerts that the model gets. ...Though I have to ask in case I'm assuming something dumb: what are you referring to when you said "more than 60,000 words"?
sigmoid10 3 hours ago [-]
What you're describing is not how these things usually work. And all I did was a wc on the .md file.
bavell 3 hours ago [-]
The system prompt is always loaded in its entirety IIUC. It's technically possible to modify it during a conversation but that would invalidate the prefill cache for the big model providers.
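A toy illustration of why editing the system prompt mid-conversation hurts, assuming the cache is keyed on an exact token prefix. Real providers cache in blocks or at explicit breakpoints, so treat this only as a sketch; the token IDs and `reusable_prefix_len` are invented for the example.

```python
from typing import Sequence


def reusable_prefix_len(cached: Sequence[int], incoming: Sequence[int]) -> int:
    """Number of leading tokens shared by the cached request and the new one."""
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return n


system = [1, 2, 3, 4]                        # pretend system-prompt tokens
turn1 = system + [10, 11]                    # first request
turn2_same = system + [10, 11, 12]           # follow-up, prompt untouched
turn2_edit = [1, 9, 3, 4] + [10, 11, 12]     # follow-up after editing the prompt

print(reusable_prefix_len(turn1, turn2_same))  # whole previous request is reusable
print(reusable_prefix_len(turn1, turn2_edit))  # reuse stops at the first changed token
```

Because the system prompt sits at the very front, any change to it pushes the first mismatch to the start of the sequence, and everything after that point has to be recomputed.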
formerly_proven 3 hours ago [-]
Surely the system prompt is cached across accounts?
sigmoid10 3 hours ago [-]
You can cache K and V matrices, but for such huge matrices you'll still pay a ton of compute to calculate attention in the end even if the user just adds a five word question.
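A back-of-envelope sketch of that point. It counts only the attention score and value matmuls for the new tokens (about 4 FLOPs per query/key pair per model dimension per layer), ignores projections and MLPs, and uses model dimensions that are pure guesses, not Anthropic's.

```python
def attention_flops(new_tokens: int, cached_tokens: int, d_model: int, layers: int) -> int:
    """Approximate FLOPs for new tokens attending over cached + new keys.

    Counts QK^T and attention-times-V at 2 FLOPs per multiply-add, and ignores
    causal masking among the new tokens, so it is a slight overestimate.
    """
    keys = cached_tokens + new_tokens
    return 4 * new_tokens * keys * d_model * layers


D_MODEL, LAYERS = 8192, 80   # made-up model size
QUESTION = 7                 # a five-word question, at ~0.75 words per token

with_prompt = attention_flops(QUESTION, 80_000, D_MODEL, LAYERS)
without_prompt = attention_flops(QUESTION, 0, D_MODEL, LAYERS)
print(with_prompt // without_prompt)
```

Whether that counts as "a ton" depends on the model; the sketch only shows that this term grows linearly with the length of the cached prefix, even though the prefix's K and V are never recomputed.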
cfcf14 3 hours ago [-]
I would assume so too, so the costs would not be so substantial to Anthropic.
cma 2 hours ago [-]
> And it gets churned by every single request they receive

It gets pretty efficiently cached, but does eat the context window and RAM.

cfcf14 4 hours ago [-]
I'm curious as to why 4.7 seems obsessed with avoiding any actions that could help the user create or enhance malware. The system prompts seem similar on the matter, so I wonder if this is an early attempt by Anthropic to use steering vector injection?

The malware paranoia is so strong that my company has had to temporarily block use of 4.7 on our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending large amounts of token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity - the jokes write themselves).

In one case I actually encountered a situation where I felt that the model was deliberately failing to execute a particular task, and when queried, the tool output that it was trying to abide by directives about malware. I know that model introspection reporting is of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Claude Golden Gate Bridge territory, hence my earlier contemplation on steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on reddit, so I don't think it's just me!

daemonologist 3 hours ago [-]
Note that these are the "chat" system prompts - although it's not mentioned I would assume that Claude Code gets something significantly different, which might have more language about malware refusal (other coding tools would use the API and provide their own prompts).

Of course it's also been noted that this seems to be a new base model, so the change could certainly be in the model itself.

chatmasta 1 hours ago [-]
Claude Code system prompt diffs are available here: https://cchistory.mariozechner.at/?from=2.1.98&to=2.1.112

(URL is to diff since 2.1.98 which seems to be the version that preceded the first reference to Opus 4.7)

dhedlund 33 minutes ago [-]
The "Picking delaySeconds" section is quite enlightening.

I feel like this explains about a quarter to half of my token burn. It was never really clear to me whether tool calls in an agent session would keep the context hot or whether I would have to pay the entire context loading penalty after each call; from my perspective it's one request. I have Claude routinely do large numbers of sequential tool calls, or have long running processes with fairly large context windows. Ouch.

> The Anthropic prompt cache has a 5-minute TTL. Sleeping past 300 seconds means the next wake-up reads your full conversation context uncached — slower and more expensive. So the natural breakpoints:

> - *Under 5 minutes (60s–270s)*: cache stays warm. Right for active work — checking a build, polling for state that's about to change, watching a process you just started.

> - *5 minutes to 1 hour (300s–3600s)*: pay the cache miss. Right when there's no point checking sooner — waiting on something that takes minutes to change, or genuinely idle.

> *Don't pick 300s.* It's the worst-of-both: you pay the cache miss without amortizing it. If you're tempted to "wait 5 minutes," either drop to 270s (stay in cache) or commit to 1200s+ (one cache miss buys a much longer wait). Don't think in round-number minutes — think in cache windows.

> For idle ticks with no specific signal to watch, default to *1200s–1800s* (20–30 min). The loop checks back, you don't burn cache 12× per hour for nothing, and the user can always interrupt if they need you sooner.

> Think about what you're actually waiting for, not just "how long should I sleep." If you kicked off an 8-minute build, sleeping 60s burns the cache 8 times before it finishes — sleep ~270s twice instead.

> The runtime clamps to [60, 3600], so you don't need to clamp yourself.

If you're only used to the subscription plan, it's definitely not clear that every single interaction triggers a full context load. It's all one session to most people. So long as they keep replying quickly, or queue up a long arc of work, there's probably an expectation that you wouldn't incur that much context loading cost. But this suggests that's not at all true.
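The quoted rules boil down to a small decision function. This is only a sketch of how one might encode them: `pick_delay_seconds` is a hypothetical helper, and the choice to poll at 270s for anything between 270s and 1200s is my reading of the "8-minute build" example, not something the prompt spells out.

```python
CACHE_TTL_S = 300                          # prompt cache TTL from the quoted text
WARM_MAX_S = 270                           # longest sleep that stays inside the cache window
LONG_MIN_S = 1200                          # shortest sleep worth paying a cache miss for
RUNTIME_MIN_S, RUNTIME_MAX_S = 60, 3600    # clamp range from the quoted text


def pick_delay_seconds(expected_wait_s: float) -> int:
    """Choose a sleep length given how long the awaited event should take."""
    if expected_wait_s <= WARM_MAX_S:
        # Short waits: sleep roughly that long, cache stays warm.
        return max(RUNTIME_MIN_S, int(expected_wait_s))
    if expected_wait_s < LONG_MIN_S:
        # Awkward middle range (includes 300s): poll in warm 270s steps instead.
        return WARM_MAX_S
    # Long waits: accept one cache miss and sleep the whole stretch.
    return min(RUNTIME_MAX_S, int(expected_wait_s))


for wait in (30, 300, 480, 1500, 10_000):
    print(wait, pick_delay_seconds(wait))
```

With these assumptions an 8-minute build gets two 270s sleeps rather than eight 60s ones, and a "wait 5 minutes" instinct is rounded down to 270s.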

dandaka 4 hours ago [-]
I have started to notice this malware paranoia in 4.6, Boris was surprised to hear that in comments, probably a bug
ikidd 2 hours ago [-]
I had seen reports that it was clamping down on security research and things like web-scraping projects were getting caught up in that and not able to use the model very easily anymore. But I don't see any changes mentioned in the prompt that seem likely to have affected that, which is where I would think such changes would have been implemented.
embedding-shape 2 hours ago [-]
I think it depends on how badly they want to avoid it. Stuff that is "We prefer if the model didn't do these things when the model is used here" goes into the system prompt, meanwhile stuff that is "We really need to avoid this ever being in any outputs, regardless of when/where the model is used" goes into post-training.

So I'm guessing they want none of the model users (webui + API) to be able to do those things, rather than blocking it just in the webui. The changes mentioned in the submission are just for claude.ai AFAIK, not API users, so the "disordered eating" stuff will only be prevented when API users prompt against it in their own system prompts; it isn't required.

kaoD 2 hours ago [-]
I wonder if the child safety section "leaks" behavior into other risky topics, like malware analysis. I see overlap in how the reports mention that once the safety has been tripped it becomes even more reluctant to work, which seems to match the instructions here for child safety.
bakugo 2 hours ago [-]
It's built into the model, not part of the system prompt. You'll get the same refusals via the API.
richardwong1 2 hours ago [-]
The new `tool_search` mechanism is interesting — it's Anthropic telling Claude "don't claim a tool doesn't exist until you've actually asked the harness about it". The old failure mode was Claude saying "I don't have web access" when the harness DID have web search but it just wasn't in the current tool list. tool_search fixes that confabulation by giving Claude an explicit "check for deferred tools" step.

The wider implication: this is Anthropic admitting the tool-list-in-the-system-prompt model doesn't scale. Once you have dozens of specialised tools (remote MCPs, custom agents, per-workspace plugins), you can't fit them all into the context window's tool slots at initialization. You need a searchable tool registry and a mechanism for the model to pull tools on demand.

MCP's tools/list pagination (added in the 2025-06-18 spec) is the protocol-level version of the same idea. Clients that actually use paginated tool loading + dynamic tool fetching haven't taken off yet — most still flatten all tools into the initial handshake. The tool_search system-prompt entry is Anthropic's nudge for the model itself to handle deferred tools smarter.
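For reference, paging through `tools/list` on the client side looks roughly like this. The request and response shapes (`cursor` in params, `tools` and `nextCursor` in the result) follow the MCP pagination convention; `send_request` is a hypothetical transport callable and `fake_server` exists only so the sketch runs without a real server.

```python
from typing import Any, Callable, Dict, List, Optional


def list_all_tools(
    send_request: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Follow nextCursor until the server stops returning one."""
    tools: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    request_id = 0
    while True:
        request_id += 1
        request: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
        if cursor is not None:
            request["params"] = {"cursor": cursor}
        result = send_request(request)["result"]
        tools.extend(result.get("tools", []))
        cursor = result.get("nextCursor")
        if not cursor:
            return tools


def fake_server(request: Dict[str, Any]) -> Dict[str, Any]:
    """Two-page in-memory stand-in for an MCP server."""
    pages = {None: (["search"], "p2"), "p2": (["fetch"], None)}
    names, next_cursor = pages[request.get("params", {}).get("cursor")]
    result: Dict[str, Any] = {"tools": [{"name": n} for n in names]}
    if next_cursor:
        result["nextCursor"] = next_cursor
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


tool_names = [tool["name"] for tool in list_all_tools(fake_server)]
print(tool_names)
```

A client that flattens everything at handshake time would call this once up front; a deferred-loading client would fetch pages only when the model asks for them.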

dd8601fn 24 minutes ago [-]
Is this really a common problem? This stuff is way above me, but my toy agent seems to have bypassed this as a problem.

I did this in mine by only really having a few relevant tool functions in the prompt, ever. Search for a Tool Function, Execute A Tool Function, Request Authoring of a Tool Function, Request an Update to a Tool Function, Check Status of an Authoring Request.

It doesn't have to "remember" much. Any other functions are ones it already searched for and found in the tool service.

When it needs a tool it reliably searches (just natural language) against the vector db catalog of functions for a good match. If it doesn't have one, it requests one. The authoring pipeline does its thing, and eventually it has a new function to use.
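A stripped-down sketch of that search-or-request flow. To keep it dependency-free it swaps the vector DB for a bag-of-words cosine similarity, which is a much cruder technique than embeddings; the catalog entries, the threshold, and `find_tool` are all invented for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional


def _vec(text: str) -> Counter:
    """Bag-of-words vector: lowercase alphabetic tokens and their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_tool(query: str, catalog: Dict[str, str], threshold: float = 0.3) -> Optional[str]:
    """Return the best-matching tool name, or None if nothing clears the threshold."""
    q = _vec(query)
    best_name, best_score = None, 0.0
    for name, description in catalog.items():
        score = _cosine(q, _vec(description))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


catalog = {
    "resize_image": "resize an image file to a given width and height",
    "send_email": "send an email message to a recipient",
}

hit = find_tool("resize this image to 200 width", catalog)
miss = find_tool("translate text into french", catalog)
print(hit, miss)
```

A `None` result is the point where the agent would file an authoring request instead of claiming the tool doesn't exist.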

dmk 4 hours ago [-]
The acting_vs_clarifying change is the one I notice most as a heavy user. Older Claude would ask 3 clarifying questions before doing anything. Now it just picks the most reasonable interpretation and goes. Way less friction in practice.
bavell 3 hours ago [-]
Haven't had a chance to test 4.7 much, but one of my pet peeves with 4.6 is how eager it is to jump into implementation. Though maybe 4.7 is smarter about this now.
sersi 55 minutes ago [-]
I really hate that change; it's now regularly picking a bad interpretation instead of asking.
varispeed 4 hours ago [-]
Before Opus 4.7, 4.6 became pretty much unusable, as it kept flagging normal data analysis scripts it wrote itself as a cyber security risk. I got several sessions blocked, was unable to finish research with it, and had to switch to GPT-5.4, which has its own problems but at least is not eager to interfere in legitimate work.

edit: to be fair Anthropic should be giving money back for sessions terminated this way.

ceejayoz 3 hours ago [-]
> edit: to be fair Anthropic should be giving money back for sessions terminated this way.

I asked it for one and it told me to file a Github issue.

Which I interpreted as "fuck off".

mannanj 2 hours ago [-]
Personally, as someone who has been lucky enough to completely cure "incurable" diseases with diet, self experimentation and learning from experts who disagreed with the common societal beliefs at the time - I'm concerned that an AI model and an AI company is planting beliefs and limiting what people can and can't learn through their own will and agency.

My concern is these models revert all medical, scientific and personal inquiry to the norm and averages of what's socially acceptable. That's very anti-scientific in my opinion and feels dystopian.

kantaro 2 hours ago [-]
[dead]
foreman_ 5 hours ago [-]
[dead]