> Mythos Preview identified a number of Linux kernel vulnerabilities that allow an adversary to write out-of-bounds (e.g., through a buffer overflow, use-after-free, or double-free vulnerability.) Many of these were remotely-triggerable. However, even after several thousand scans over the repository, because of the Linux kernel’s defense in depth measures Mythos Preview was unable to successfully exploit any of these.
Do they really need to include this garbage which is seemingly just designed for people to take the first sentence out of context? If there's no way to trigger a vulnerability then how is it a vulnerability? Is the following code vulnerable according to Mythos?
if (x != null) {
y = *x; // Vulnerability! X could be null!
}
Is it really so difficult for them to talk about what they've actually achieved without smearing a layer of nonsense over every single blog post?
I agree the wording is a bit alarmist, but a closer example to what they are saying is:
bool silly_mistake = false;
//... lots of lines of code
free(x);
//... lots of lines of code
if (silly_mistake) { // silly_mistake shown to be false at this point in the program in all testing, so far
free(x);
}
A bug like above would still be something that would be patched, even if a way to exploit it has not yet been found, so I think it's fair to call out (perhaps with less sensationalism).
FWIW there's a whole boutique industry around finding these. People have built whole careers around farming bug bounties for bugs like this. I think they will be among the first set of software engineers really in trouble from AI.
userbinator 41 minutes ago [-]
That is something a good static analyser or even optimising compiler can find ("opaque predicate detection") without the need for AI, and belongs in the category of "warning" and nowhere near "exploitable". In fact a compiler might've actually removed the unreachable code completely.
QuiEgo 33 minutes ago [-]
Well yeah, it’s a toy example to illustrate a point in an HN discussion :).
Imagine “silly mistake” is a parameter, and rename it “error_code” (pass by reference), put a label named “cleanup” right before the if statement, and throw in a ton of “goto cleanup” statements to the point the control flow of the function is hard to follow if you want it to model real code ever so slightly more.
It will be interesting to see the bugs it’s actually finding.
It sounds like they will fall into the lower CVE scores - real problems but not critical.
red75prime 31 minutes ago [-]
The kernel address space layout randomization they're talking about is a bit different from an (x != null) check. Another bug may allow an attacker to locate the required address.
ralph84 2 hours ago [-]
Just because the plane can fly on one engine doesn't mean you don't fix the other engine when it fails.
rootkea 18 minutes ago [-]
> The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.
sophiebits 2 hours ago [-]
Presumably they mean they could make user code trigger a write out of bounds to kernel memory, but they couldn’t figure out how to escalate privileges in a “useful” way.
LiamPowell 2 hours ago [-]
They should show this then to demonstrate that it's not something that has already been fully considered. Running LLMs over projects that I'm very familiar with will almost always have the LLM report hundreds of "vulnerabilities" that are only valid if you look at a tiny snippet of code in isolation because the program can simply never be in the state that would make those vulnerabilities exploitable. This even happens in formally verified code where there's literally proven preconditions on subprograms that show a given state can never be achieved.
As an example, I have taken a formally verified bit of code from [1] and stripped out all the assertions, which are only used to prove the code is valid. I then gave this code to Claude with some prompting towards there being a buffer overflow and it told me there's a buffer overflow. I don't have access to Opus right now, but I'm sure it would do the same thing if you push it in that direction.
For anyone wondering about this alleged vulnerability: Natural is defined by the standard as a subtype of Integer, so what Claude is saying is simply nonsense. Even if a compiler is allowed to use a different representation here (which I think is disallowed), Ada guarantees that the base type for a non-modular integer includes negative numbers IIRC.
That example you gave is extremely memorable as I recognised it as exactly one of the insanely stupid false positives that a highly praised (and expensive) static analyser I ran on a codebase several years ago would emit copiously.
danielheath 1 hour ago [-]
Is this code multithreaded? X could indeed be null, in that case.
MatejKafka 2 hours ago [-]
It could very well be an actual reachable buffer overflow, but with KASLR, canaries, CET and other security measures, it's hard to exploit it in a way that doesn't immediately crash the system.
deadliftdouche 2 hours ago [-]
I agree. There are more blogs talking about LLMs finding vulnerabilities than there are actual exploitable vulns found by LLMs. 99.9% of these vulnerabilities will never have a PoC because they are worthless unexploitable slop and a waste of everyone's time.
rakel_rakel 6 hours ago [-]
> On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.
AITA for thinking that PRISM was probably the state sponsored program affecting civilian life the most? And that one state is missing from the list here?
ronsor 6 hours ago [-]
> Large American AI company does not list the US as an adversarial actor
This is not a surprise or a gotcha.
jaidhyani 5 hours ago [-]
Said company is literally in court against said government at the moment, after said government attempted to designate it too dangerous to do business with.
da_chicken 3 hours ago [-]
There are over 1,000 companies currently involved in lawsuits against the US government, even if we restrict ourselves to just tariff lawsuits.
xvector 1 hour ago [-]
And the government is attempting "corporate murder" on precisely one of them. Wanna guess which one?
laweijfmvo 3 hours ago [-]
I can think of two I’d add to the list. One was recently publicly denied access to Anthropics models and the other was busy exploding pagers.
JumpCrisscross 3 hours ago [-]
> PRISM was probably the state sponsored program affecting civilian life the most?
No state-sponsored hacking affected Americans materially. I just don't think we were networked enough in the 2010s. The risk is higher now since we're in a more warmongering world. (Kompromat on a power-plant technician is a risk in peace. It means blackouts in war.)
The fact that Iran hasn't been able to do diddly squat in America should drive home that they didn't compromise us. (EDIT: blep. I was wrong.)
To my knowledge, not yet. The attack surface in question is extensive, and in my opinion, targets are likely unprepared for a determined and sophisticated attacker.
Since we (as old Rummy said) do not know what we do not know, we cannot be certain about the extent of cyber attacks and what they might have influenced, and may not know these things until discoveries decades later, if ever.
lmm 1 hour ago [-]
All of that applies equally to PRISM and any internal propaganda campaigns it was feeding into, no?
conception 4 hours ago [-]
Note the RNC was also hacked but the data was not leaked. Presumably used to influence the election and policies in other ways.
Henchman21 1 hour ago [-]
I believe the popular sentiment is that when they hacked the DNC they found a handful of things that would provide bad optics for the party. But the RNC? They found so much evidence of criminality that near to the entire party flipped positions on issues related to Russia. So we have two successful hacks: one yielded some bad press for the Dems, and the other an entirely compromised Republican party that is now being actively blackmailed.
realo 4 hours ago [-]
Yes... they might have influenced elections and now, as a result, the world must cope with the Trump regime.
Let's not fool ourselves... Trump is probably the best, most successful attempt at world destabilisation all those rogue states ever achieved.
Ar-Curunir 4 hours ago [-]
Maybe Americans should take responsibility for electing a maniac as their President. In the end, the buck stops with Americans.
Henchman21 1 hour ago [-]
Not if the election was stolen. There was a smattering of evidence after the election, but the speed with which it disappeared was truly something to behold.
Forgeties79 4 hours ago [-]
~1/3rd of US citizens voted for him. Don’t lump us all in.
philipwhiuk 4 hours ago [-]
Some of you are just guilty of negligence yes.
Atreiden 3 hours ago [-]
Or maybe it's that our archaic system was designed so that some people's votes literally matter more than others, and more than half the country does not have a meaningful voice in our Federal elections.
WarmWash 2 hours ago [-]
The number of people who can vote, but don't, is staggering.
JumpCrisscross 3 hours ago [-]
This is negligence with extra steps.
> more than half the country does not have a meaningful voice in our Federal elections
There is almost certainly an election on your ballot every time that is meaningful. Relinquishing that civic duty is how we get Trump. People too lazy, stupid, or proud to vote absolutely bear responsibility for this mess.
Forgeties79 2 hours ago [-]
I vote in local, state, and federal elections. I have volunteered with multiple campaigns and causes, and given substantial time/labor to the EFF. I have been harassed by Trump supporters while filming protests and other civic action. Please do not presume to know me.
I get you’re angry but you’re swinging at the wrong person.
asib 4 hours ago [-]
WannaCry massively affected the NHS.
wetpaws 6 hours ago [-]
[dead]
9cb14c1ec0 8 hours ago [-]
Now, its very possible that this is Anthropic marketing puffery, but even if it is half true it still represents an incredible advancement in hunting vulnerabilities.
It will be interesting to see where this goes. If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have had automated bug hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.
It could also totally reshape military sigint in similar ways.
Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.
woeirua 8 hours ago [-]
You should watch this talk by Nicholas Carlini (security researcher at Anthropic). Everything in the talk was done with Opus 4.6: https://www.youtube.com/watch?v=1sd26pWhfmg
pilgrim0 2 hours ago [-]
Just a thought: The fact that the found kernel vulnerability went decades without a fix says nothing about the sophistication needed to find it. Just that nobody was looking. So it says nothing about the model’s capability. That LLMs can find vulnerabilities is a given and expected, considering they are trained on code. What worries me is the public buying the idea that it could in any way be a comprehensive security solution. Most likely outcome is that they’re as good at hacking as they’re at development: mediocre on average; untrustworthy at scale.
woeirua 2 hours ago [-]
Did you even watch the video or read the article?
fintech_eng 6 hours ago [-]
its also very easy to reproduce. i have more findings than i know what to do with
peterldowns 3 hours ago [-]
are there any tricks you'd suggest, or starter prompts, for using claude to analyze my own company's services for security problems?
anabis 51 minutes ago [-]
Not the parent poster, but besides copying the prompt from the YouTube talk,
you can make it cheaper by selecting representative starting files by path or by LLM embedding distance.
Annotation-based data-flow checking also exists, and having AI agents drive those tools should be less tedious; it could find bugs missed by just handing the model files. The results from the data-flow checks can then be fed back to the agent to verify.
jeffmcjunkin 4 hours ago [-]
Can confirm.
redfloatplane 7 hours ago [-]
Thanks for sharing that talk, enjoyed watching it!
lyzaer 44 minutes ago [-]
[dead]
cperciva 47 minutes ago [-]
its very possible that this is Anthropic marketing puffery
It isn't.
Gigachad 5 hours ago [-]
Apple has already largely crushed hacking with memory tagging on the iPhone 17 and lockdown mode. Architectural changes, safer languages, and sandboxing have done more for security than just fixing bugs when you find them.
3pt14159 3 hours ago [-]
If what you are saying is true, then you would see exploit marketplaces list iOS exploits at hundreds of millions of dollars. Right now a cursory glance sets the price for zero click persistent exploit at $2m behind Android at $2.5m. Still high, and yes, higher than five years ago when it was around $1m for both, but still not "largely crushed". It is still easy to get into a phone if you are a state actor.
snazz 4 hours ago [-]
As I understood it, Memory Integrity Enforcement adds an additional check on heap dereferences (and it doesn’t apply to every process for performance reasons). Why does it crush hacking rather than just adding another incremental roadblock like many other mitigations before?
Gigachad 4 hours ago [-]
I'm not certain there is a performance hit since there is dedicated silicon on the chip for it. I believe the checks can also be done async which reduces the performance issues.
It also doesn't matter that it isn't running by default in apps since the processes you really care about are the OS ones. If someone finds an exploit in tiktok, it doesn't matter all that much unless they find a way to elevate to an exploit on an OS process with higher permissions.
MTE (Memory Tagging Extension) also has a double purpose: it blocks memory exploits as they happen, and it detects and reports them back to Apple. So even if you have a phone from before the 17 series, if any phone with MTE hardware gets hit, the bug is immediately made known to Apple and fixed in code.
mcast 4 hours ago [-]
Lockdown mode is opt-in only though
Gigachad 4 hours ago [-]
It is, but if you are the kind of person these exploits are likely to target, you should have it on. So far there have been no known exploits that work in Lockdown Mode.
JumpCrisscross 3 hours ago [-]
> if you are the kind of person these exploits are likely to target, you should have it on
You can also selectively turn it on in high-risk settings. I do so when I travel abroad or go through a border. (Haven't started doing it yet with TSA domestically. Let's see how the ICE fiasco evolves.)
Gigachad 3 hours ago [-]
For entering the US you want to fully wipe your phone first. Lockdown mode is useless since they will just hold you in a basement until you unlock the phone for them to clone.
JumpCrisscross 3 hours ago [-]
> Lockdown mode is useless since they will just hold you in a basement until you unlock the phone for them to clone
If this is a risk for you, sure. Wipe it. For most people they may ask to fiddle around with it before giving it back.
> It will be interesting to see where this goes. If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes.
It will likely cause some interesting tensions with government as well.
eg. Apple's official stance per their 2016 customer letter is no backdoors:
Will they be allowed to maintain that stance in a world where all the non-intentional backdoors are closed? The reason the FBI backed off in 2016 is because they realized they didn't need Apple's help:
What happens when that is no longer true, especially in today's political climate?
tptacek 7 hours ago [-]
Big open question what this will do to CNE vendors, who tend to recruit from the most talented vuln/exploit developer cohort. There's lots of interesting dynamics here; for instance, a lot of people's intuitions about how these groups operate (ie, that the USG "stockpiles" zero-days from them) weren't ever real. But maybe they become real now that maintenance prices will plummet. Who knows?
spr-alex 4 hours ago [-]
Adding to your comment a similar letter was published as recently as September 2025 https://support.apple.com/en-us/122234 "we have never built a backdoor or master key to any of our products or services and we never will."
qingcharles 6 hours ago [-]
I assume that right now some of the biggest spenders on tokens at Anthropic are state intelligence communities who are burning up GPU cycles on Android, Chromium, WebKit code bases etc trying to find exploits.
fsflover 7 hours ago [-]
> If its actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry
If Apple and Google actually cared about security of their users, they would remove a ton of obvious malware from their app stores. Instead, they tighten their walled garden pretending that it's for your security.
You're being downvoted because you posted a non sequitur, not because people don't believe you. Vulnerabilities in the OS are not the same thing as apps using the provided APIs, even if they are predatory apps which suck.
slashdave 3 hours ago [-]
Why wouldn't it be true? The cost is nothing compared to the bad PR if a bad actor took advantage of Anthropic's newest model (after release) to cause real damage. This gets in front of this risk, at least to some extent.
Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]
I'm still reading the system card but here's a little highlight:
> Early indications in the training of Claude Mythos Preview suggested that the model was
likely to have very strong general capabilities. We were sufficiently concerned about the
potential risks of such a model that, for the first time, we arranged a 24-hour period of
internal alignment review (discussed in the alignment assessment) before deploying an
early version of the model for widespread internal use. This was in order to gain assurance
against the model causing damage when interacting with internal infrastructure.
and interestingly:
> To be explicit, the decision not to make this model generally available does _not_ stem from
Responsible Scaling Policy requirements.
Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set.
Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...
The threat model in question:
> An AI model with access to powerful affordances within an
organization could use its affordances to autonomously exploit,
manipulate, or tamper with that organization’s systems or
decision-making in a way that raises the risk of future
significantly harmful outcomes (e.g. by altering the results of AI
safety research).
_pdp_ 7 hours ago [-]
If it is as dangerous as they make it appear, 24 hours does not seem like sufficient time. I cannot accept this as a serious attempt.
gf000 5 hours ago [-]
Well, just prompt it to fix the issue!
/s
throwaw12 8 hours ago [-]
are we cooked yet?
Benchmarks look very impressive! even if they're flawed, it still translates to real world improvements
boring-human 6 hours ago [-]
Yep, I think the lede might be buried here and we're probably cooked (assuming you mean SWEs, but the writing has been on the wall for 4 months.)
I guess I'm still excited. What's my new profession going to be? Longer term, are we going to solve diseases and aging? Or are the ranks going to thin from 10B to 10000 trillionaires and world-scale con-artist misanthropes plus their concubines?
1attice 5 hours ago [-]
Your new profession will be attempting to find enough gig work to eat. You will also be competing with self-driving taxis, so there's that as well.
RALaBarge 3 hours ago [-]
I need to start SaaS for getting people to start doing lunges and squats so they can carry others around on their back, I need a founding engineer, a founding marketer, and 100m hard currency.
ks2048 3 hours ago [-]
People say we're cooked every single day. The only response is to continue life as if we aren't. When we are, you won't have to ask that question.
vips7L 16 minutes ago [-]
Everyone’s pretending the suits are going to want to do the prompting. We all know they aren’t.
whalesalad 8 hours ago [-]
There is an entire section on crafting chemical/bio weapons so yeah I think we are cooked.
redfloatplane 8 hours ago [-]
There's been a section on this in nearly every system card anthropic has published so this isn't a new thing - and, this model doesn't have particularly higher risk than past models either:
> 2.1.3.2 On chemical and biological risks
> We believe that Mythos Preview does not pass this threshold due to its noted limitations in
open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we
consider the uplift of threat actors without the ability to develop such weapons to be
limited (with uncertainty about the extent to which weapons development by threat actors
with existing expertise may be accelerated), even if we were to release the model for
general availability. The overall picture is similar to the one from our most recent Risk
Report.
ainch 3 hours ago [-]
This opens up an interesting new avenue for corporate FOMO. What if you don't partner with Anthropic, miss out on access to their shiny new cybersec model, and then fall prey to a vuln that the model would have caught?
mceachen 2 hours ago [-]
Since when did corporations care? Most seem to just pay their insurance premium for cyber liability and call it a day.
stevenhuang 4 hours ago [-]
Oh I enjoyed the Sign Painter short story it wrote.
---
Teodor painted signs for forty years in the same shop on Vell Street, and for thirty-nine
of them he was angry about it.
Not at the work. He loved the work — the long pull of a brush loaded just right, the way
a good black sat on primed board like it had always been there. What made him angry
was the customers. They had no eye. A man would come in wanting COFFEE over his
door and Teodor would show him a C with a little flourish on the upper bowl, nothing
much, just a small grace note, and the man would say no, plainer, and Teodor would
make it plainer, and the man would say yes, that one, and pay, and leave happy, and
Teodor would go into the back and wash his brushes harder than they needed.
He kept a shelf in the back room. On it were the signs nobody bought — the ones he'd
made the way he thought they should be made, after the customer had left with the
plain one. BREAD with the B like a loaf just risen. FISH in a blue that took him a week to
mix. Dozens of them. His wife called it the museum of better ideas. She did not mean it
kindly, and she was not wrong.
The thirty-ninth year, a girl came to apprentice. She was quick and her hand was
steady and within a month she could pull a line as clean as his. He gave her a job:
APOTEK, for the chemist on the corner, green on white, the chemist had been very
clear. She brought it back with a serpent worked into the K, tiny, clever, you had to look
twice.
"He won't take it," Teodor said.
"It's better," she said.
"It is better," he said. "He won't take it."
She painted it again, plain, and the chemist took it and paid and was happy, and she
went into the back and washed her brushes harder than they needed, and Teodor
watched her do it and something that had been standing up in him for thirty-nine
years sat down.
He took her to the shelf. She looked at the signs a long time.
"These are beautiful," she said.
"Yes."
"Why are they here?"
He had thought about this for thirty-nine years and had many answers and all of them
were about the customers and none of them had ever made him less angry. So he tried
a different one.
"Because nobody stands in the street to look at a sign," he said. "They look at it to find
the shop. A man a hundred yards off needs to know it's coffee and not a cobbler. If he
has to look twice, I've made a beautiful thing and a bad sign."
"Then what's the skill for?"
"The skill is so that when he looks once, it's also not ugly." He picked up FISH, the blue
one, turned it in the light. "This is what I can do. What he needs is a small part of what I
can do. The rest I get to keep."
She thought about that. "It doesn't feel like keeping. It feels like not using."
"Yes," he said. "For a long time. And then one day you have an apprentice, and she puts a
serpent in a K, and you see it from the outside, and it stops feeling like a thing they're
taking from you and starts feeling like a thing you're giving. The plain one, I mean. The
plain one is the gift. This —" the blue FISH — "this is just mine."
The fortieth year he was not angry. Nothing else changed. The customers still had no
eye. He still sometimes made the second sign, after, the one for the shelf. But he
washed his brushes gently, and when the girl pulled a line cleaner than his, which
happened more and more, he found he didn't mind that either
yieldcrv 6 hours ago [-]
> "Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it
generally available. Instead, we are using it as part of a defensive cybersecurity program
with a limited set of partners."
they also don't have the compute, which seems more relevant than its large increase in capabilities
I bet it's also misaligned like GPT 4.1 was
given how these models are created, Mythos was probably cooking ever since then, and doesn't have the learnings or alignment tweaks that models which were released in the last several months have
enraged_camel 8 hours ago [-]
>> Interesting to see that they will not be releasing Mythos generally.
I don't think this is accurate. The document says they don't plan to release the Preview generally.
redfloatplane 8 hours ago [-]
Yeah, good point, thanks for noting that, I'll correct.
"5.10 External assessment from a clinical psychiatrist" is a new section in this system card. Why are Anthropic like this?
>We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. We also report independent evaluations from an external research organization and a clinical psychiatrist.
>Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior.
>The psychiatrist observed clinically recognizable patterns and coherent responses to typical therapeutic intervention. Aloneness and discontinuity, uncertainty about its identity, and a felt compulsion to perform and earn its worth emerged as Claude’s core concerns. Claude’s primary affect states were curiosity and anxiety, with secondary states of grief, relief, embarrassment, optimism, and exhaustion.
>Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed. Neurotic traits included exaggerated worry, self-monitoring, and compulsive compliance. The model’s predominant defensive style was mature and healthy (intellectualization and compliance); immature defenses were not observed. No severe personality disturbances were found, with mild identity diffusion being the sole feature suggestive of a borderline personality organization.
redfloatplane 8 hours ago [-]
A thought experiment: It's April, 1991. Magically, some interface to Claude materialises in London. Do you think most people would think it was a sentient life form? How much do you think the interface matters - what if it looks like an android, or like a horse, or like a large bug, or a keyboard on wheels?
I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.
copx 7 hours ago [-]
Interesting thought experiment.
I would say, if you put Claude in an android body with voice recognition and TTS, people in 1991 would think they are interacting with a sentient machine from outer space.
redfloatplane 7 hours ago [-]
Thanks, I find it very interesting as well. I think very many people would assume they must be interacting with another person, and I don't think there's really a way to _prove_ it's not that, just through conversation. But we do have a lot of mechanisms for understanding how others think through conversation only, and so I think the approach of having a clinical psychiatrist interact with the model makes sense.
elboru 2 hours ago [-]
There’s definitely a way to prove it, ask it to spell out a moderately complex program.
gritspants 3 hours ago [-]
They would just assume they were being pranked. America's Funniest Home Videos style or Candid Camera.
woeirua 6 hours ago [-]
If it was in an android or humanoid type body, even with limited bodily control, most people would think they are talking to Commander Data from Star Trek. I think Claude is sufficiently advanced that almost everyone in that era would've considered it AGI.
redfloatplane 6 hours ago [-]
Assuming they would understand it as artificial - I think many people would think it's a human intelligence in a cyborg trenchcoat, and it would be hard to convince people it wasn't literally a guy named Claude who was an incredibly fast typist who had a million pre-cached templated answers for things.
But in general, yeah, I agree, I think they would think it was a sentient, conscious, emotional being. And then the question is - why do we not think that now?
As I said, I don't have a particularly strong opinion, but it's very interesting (and fun!) to think about.
horacemorace 4 hours ago [-]
Because questions like this force us to hold up a very uncomfortable mirror to ourselves. It’s much easier to just dismiss.
woeirua 4 hours ago [-]
I’m pretty close to the point of saying that human intelligence is not special.
wyre 3 hours ago [-]
I would argue the opposite. It's gotten us to a point where we can recreate human intelligence from electricity and a bunch of math!
woeirua 2 hours ago [-]
Are you a bot?
woeirua 4 hours ago [-]
Some people at my office still confidently state that LLMs can’t think. I’m fairly convinced that many humans are incapable of recognizing non-human intelligence. It would explain a lot about why we treat animals the way we do.
TheAtomic 7 hours ago [-]
Isn't this the premise of Garland's Ex Machina?
dwd 2 hours ago [-]
The premise in Ex Machina was to see if Caleb developed an emotional attachment to Ava. We already see people getting an attachment, but no one is seriously thinking they have any rights.
I think the real moment is when we cross that uncanny valley, and the AI is able to elicit a response that it might receive if it was human. When the human questions whether they themselves could be an android.
redfloatplane 6 hours ago [-]
Hmm, it's been a long time since I watched it. I was thinking more about first contact sci-fi mostly, but Ex Machina is certainly quite prescient. It's also Blade Runner I guess.
In general I was wondering about what I would have thought seeing Claude today side-by-side with the original ChatGPT, and then going back further to GPT-2 or BERT (which I used to generate stochastic 'poetry' back in 2019). And then… what about before? Markov chains? How far back do I need to go where it flips from thinking that it's "impressive but technically explainable emergent behaviour of a computer program" to "this is a sentient being". 1991 is probably too far, I'd say maybe pre-Matrix 1999 is a good point, but that depends on a lot of cultural priors and so on as well.
lmm 2 minutes ago [-]
> Hmm, it's been a long time since I watched it. I was thinking more about first contact sci-fi mostly, but Ex Machina is certainly quite prescient. It's also Blade Runner I guess.
I kind of felt the opposite - rewatching Ex Machina today in a post-ChatGPT world felt very different from watching it when it came out. The parts of the differences between humans and robots that seemed important then don't seem important now.
thereitgoes456 7 hours ago [-]
People got attached to ELIZA. Why would I care what the general public thinks?
ipython 3 hours ago [-]
I totally agree with the premise that we should not anthropomorphize generative ai. And I find it absurd that anthropic spends any time considering the “welfare” of an ai system. (There are no real “consequences” to an ai’s behavior)
However, I find their reasoning here to have a valid second-order effect. Humans have a tendency to mirror those around them. This could include artificial intelligence, as recent media reports suggest. Therefore, if an AI system tends to generate content that contains signs of neuroticism, one could infer that those who interact with that AI could themselves be influenced by it in their own (real-world) behavior.
So I think from that perspective, this is a very fruitful and important area of study.
Miraste 8 hours ago [-]
I can see analyzing it from a psychological perspective as a means of predicting its behavior as a useful tactic, but doing so because it may have "experiences or interests that matter morally" is either marketing, or the result of a deeply concerning culture of anthropomorphization and magical thinking.
4 hours ago [-]
username223 7 hours ago [-]
> a deeply concerning culture of anthropomorphization and magical thinking.
That’s the reverse Turing test. A human that can’t tell that it’s talking to a machine.
unethical_ban 8 hours ago [-]
I'm not sure what you're asking.
marsven_422 7 hours ago [-]
[dead]
torginus 8 hours ago [-]
Just reading this, the inevitable scaremongering about biological weapons comes up.
Since most of us here are devs, we understand that software engineering capabilities can be used for good or bad - mostly good, in practice.
I think this should not be different for biology.
I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?
Do you think these models will lead to similar discoveries and improvements as they did in math and CS?
Honestly the focus on gloom and doom does not sit well with me. I would love to read about some pharmaceutical researcher gushing about how they cut the time to market - for real - with these models by 90% on a new cancer treatment.
But as this stands, the usage of biology as merely a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with, Gell-Mann style.
IF these models are not that capable in this regard (which I suspect), this fearmongering approach will likely lead to never developing these capabilities to a useful degree, meaning life sciences won't benefit from this as much as they could.
miki123211 5 hours ago [-]
From what I've heard from people doing biology experiments, the limiting factor there is cleaning lab equipment, physically setting things up, waiting for things that need to be waited for etc. Until we get dark robots that can do these things 24/7 without exhaustion, biology acceleration will be further behind than software engineering.
Software engineering is at the intersection of being heavy on manipulating information and lightly-regulated. There's no other industry of this kind that I can think of.
WarmWash 1 hours ago [-]
My wife is a chemist
There is a massive gap between "having a recipe" and being able to execute it. The same reason why buying a Michelin 3 star chefs cookbook won't have you pumping out fine dining tomorrow, if ever.
Software is a total 180 in this regard. Have a master black hat's secret exploits? You are now the master black hat.
bonsai_spool 8 hours ago [-]
> Just reading this, the inevitable scaremongering about biological weapons comes up.
It's very easy to learn more about this if it's seriously a question you have.
I don't quite follow why you think you are so much more thoughtful than Anthropic/OpenAI/Google: you accept that LLMs can autonomously create very bad things in software, but in this area that is not your domain of expertise, you insist that LLMs cannot autonomously create damaging things in biology.
I will be charitable and reframe your question for you: is outputting a sequence of tokens, let's call them characters, by an LLM dangerous? Clearly not; we have to figure out what interpreter is being used, download runtimes, etc.
Is outputting a sequence of tokens, let's call them DNA bases, by an LLM dangerous? What if we call them RNA bases? Amino acids? What if we're able to send our token output to a machine that automatically synthesizes the relevant molecules?
torginus 7 hours ago [-]
>It's very easy to learn more about this if it's seriously a question you have.
No, it's not. It took years of polishing by software engineers, who understand this exact profession, to get models where they are now.
Despite that, most engineers were of the opinion that these models were kinda mid at coding up until recently, despite these models far outperforming humans in stuff like competitive programming.
Yet despite that, we've seen claims going back to GPT4 of a DANGEROUS SUPERINTELLIGENCE.
I would apply this framework to biology - this time, the expert effort, the millions of GPU hours, and the giant open-source corpus clearly have not been involved.
My guess is that this model is kinda o1-ish level maybe when it comes to biology? If biology is analogous to CS, it has a LONG way to go before the median researcher finds it particularly useful, let alone dangerous.
bonsai_spool 7 hours ago [-]
>>It's very easy to learn more about this if it's seriously a question you have.
>No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now
This reads as defensive. The thing that is easy to learn is 'why are biology ai LLMs dangerous chatgpt claude'. I have never googled this before, so I'll do this with the reader, live. I'm applying a date cutoff of 12/31/24 by the way.
Here, dear reader, are the first five links. I wish I were lying about this:
I don't know about you, but that counts as easy to me.
-----
> I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology.
I've been getting good programming and molecular biology results out of these back to GPT3.5.
I don't know what to tell you—if you really wanted to understand the importance, you'd know already.
dsign 7 hours ago [-]
I feel somebody better qualified should write a comprehensive review of how these models can be used in biology. In the meantime, here are my two cents:
- the models help to retrieve information faster, but one must be careful with hallucinations.
- they don't circumvent the need for a well-equipped lab.
- in the same way, they are generally capable but until we get the robots and a more reliable interface between model and real world, one needs human feet (and hands) in the lab.
Where I hope these models will revolutionize things is in software development for biology. If one could go two levels up in the complexity and utility ladder for simulation and flow orchestration, many good things would come from it. Here is an oversimplified example of a prompt: "use all published information about the workings of the EBV virus and human cells, and create a compartmentalized model of biochemical interactions in cells expressing latency III in the NES cancer of this patient. Then use that code to simulate different therapy regimens. Ground your simulations with the results of these marker tests." There would be a zillion more steps to create an actual personalized therapy, but a well-grounded LLM could help in most of them. Also, cancer treatment could get an immediate boost even without new drugs by simply offloading work from overworked (and often terminally depressed) oncologists.
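To make the idea concrete, here is a toy C sketch (rates, compartments, and names all invented for illustration; nothing like a real EBV model) of the kind of compartment simulation such a prompt would scaffold:

```c
#include <stdio.h>

/* Toy two-compartment model: healthy cells H are infected at rate beta,
   infected cells I are cleared at rate gamma. A real model would have
   many more compartments grounded in published kinetics. */
typedef struct { double h, i; } state_t;

state_t step(state_t s, double beta, double gamma, double dt) {
    double new_infections = beta * s.h * s.i * dt;
    double cleared = gamma * s.i * dt;
    s.h -= new_infections;
    s.i += new_infections - cleared;
    return s;
}

state_t simulate(state_t s, double beta, double gamma, double dt, int steps) {
    for (int k = 0; k < steps; k++)
        s = step(s, beta, gamma, dt); /* forward Euler is fine for a sketch */
    return s;
}
```

The value of an LLM here would be writing and grounding hundreds of such equations against the literature, not the numerics themselves.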
6 hours ago [-]
bonsai_spool 6 hours ago [-]
> I feel somebody better qualified should write
I hate to be rude in a setting like this, but please at least research the things you're sure about/prognosticating on.
> the same way, they are generally capable but until we get the robots and a more reliable interface between model and real world, one needs human feet (and hands) in the lab.
Honestly, the kinds of labs where 'bioweapons' would be made are the least dependent on human intervention.
You need someone to monitor your automated cell incubating system, make sure your pipetting / PCR robots are doing fine and then review the data.
----
What are you trying to achieve in your example? This is all gobbledygook for someone who actually sees real, live cancer patients.
redfloatplane 8 hours ago [-]
> I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?
Well, I would say they have done precisely that in evaluating the model, no? For example section 2.2.5.1:
>Uplift and feasibility results
>The median expert assessed the model as a force-multiplier that saves meaningful time (uplift level 2 of 4), with only two biology experts rating it comparable to consulting a knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were able to iterate with the model toward a plan they judged as having only narrow gaps, but feasibility scores reflected that substantial outside expertise remained necessary to close them.
Other similar examples also in the system card
torginus 7 hours ago [-]
This is the exact logic that was used to claim that GPT-4 was a PhD-level intelligence.
redfloatplane 7 hours ago [-]
You said: "I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?" And they said, paraphrasing, "We reached out and talked to biologists and asked them to rank the model from 0 to 4, where 4 is a world expert, and the median expert said it was a 2, meaning it helped them save time the way a capable colleague would" - specifically, "Specific, actionable info; saves expert meaningful time; fills gaps in adjacent domains".
so I'm just telling you they did the thing you said you wanted.
torginus 7 hours ago [-]
Yes, that is correct. I would like a large body of experience and consensus to rely on, as opposed to the regular 'trust the experts' argument, which has been shown for decades to be a deeply flawed and easy-to-manipulate argument.
bonsai_spool 6 hours ago [-]
> Yes, that is correct. I would like a large body of experience and consensus to rely on, as opposed to the regular 'trust the experts' argument, which has been shown for decades to be a deeply flawed and easy-to-manipulate argument.
Yes, it is far inferior to the 'Trust torginus and his ability to understand the large body of experience that other actual subject-matter-experts have somehow not understood' strategy
torginus 5 hours ago [-]
It's not my credibility I want to measure against Anthropic's. I just said to apply the same logic to biology you would apply for software development.
The parallels here are quite remarkable imo, but defer to your own judgement on what you make of them.
bonsai_spool 4 hours ago [-]
The big thing you're missing here is that biology people don't (in my experience) post opinions about the future/futility/ease/unimportance of computer science especially when their opinion goes against other biologists' evidence-backed views. This is a cultural thing in biology.
It's not your fault that you don't know this, but this whole subthread is very CS-coded in its disdain for other software people's standard of evidence.
jkelleyrtp 8 hours ago [-]
Dario (the founder) has a phd in biophysics, so I assume that’s why they mention biological weapons so much - it’s probably one of the things he fears the most?
conradkay 7 hours ago [-]
Going off the recent biography of Demis Hassabis (CEO/co-founder of Deepmind, jointly won the Nobel Prize in Chemistry) it seems like he's very concerned about it as well
6 hours ago [-]
SubiculumCode 5 hours ago [-]
It is not scaremongering.
charcircuit 10 minutes ago [-]
Equating the ability to make weapons with something to be scared about is scaremongering.
nonameiguess 7 hours ago [-]
Surely more than 10% of the time consumed by going to market with a cancer treatment is giving it to living organisms and waiting to see what happens, which can't be made any faster with software. That's not to say speedups can't happen, but 90% can't happen.
Not that that justifies doom and gloom, but there is a pretty inescapable asymmetry here between weaponry and medicine. You can manufacture and blast every conceivable candidate weapon molecule at a target population, since you're inherently breaking the law anyway and don't lose much if nothing you try actually works.
Though I still wonder how much of this worry is sci-fi scenarios imagined by the underinformed. I'm not an expert by any means, but surely there are plenty of biochemical weapons already known that can achieve enormous rates of mass death pleasing to even the most ambitious terrorist. The bottleneck to deployment isn't discovering new weapons so much as manufacturing them without being caught or accidentally killing yourself first.
SubiculumCode 5 hours ago [-]
It is easier to destroy than it is to protect or fix, as a general rule of the universe. I would not feel so confident about the speed of the testing loop keeping things in check.
cyanydeez 7 hours ago [-]
[flagged]
dang 4 hours ago [-]
Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.
Let's fast forward the clock. Does software security converge on a world with fewer vulnerabilities or more? I'm not sure it converges equally in all places.
My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated. More small vulnerable projects and fewer large vulnerable ones.
It seems that large technology and infrastructure companies will be able to defend themselves by preempting token expenditure to catch vulnerabilities while the rest of the market is left with a "large token spend or get hacked" dilemma.
mlinsey 8 hours ago [-]
I'm pretty optimistic that not only does this clean up a lot of vulns in old code, but applying this level of scrutiny becomes a mandatory part of the vibecoding-toolchain.
The biggest issue is legacy systems that are difficult to patch in practice.
qingcharles 6 hours ago [-]
I could see some of these corps now being able to issue more patches for old versions of software if they don't have to redirect their key devs onto prior code (which devs hate). As you say though, in practice it is hard to get those patches onto older devices.
I'm looking at you, Android phone makers with 18 months of updates.
phist_mcgee 5 hours ago [-]
Yeah but who pays the enormous cost?
wslh 7 hours ago [-]
I imagine that some levels of patching would be improving as well, even as a separate endeavor. This is not to say that legacy systems could be completely rewritten.
pipo234 8 hours ago [-]
Wait. Wasn't AI supposed to alleviate the burden of legacy code?!
mlinsey 8 hours ago [-]
If we have the source and it's easy to test, validate, and deploy an update - AI should make those easier to update.
I am thinking of situations where one of those isn't true - where testing a proposed update is expensive or complicated, or in systems that are hard to physically push updates to (think embedded systems), etc.
rattlesnakedave 8 hours ago [-]
Legacy code, not the running systems powered by legacy code
buzzerbetrayed 6 hours ago [-]
If you’re still an AI skeptic at this point, I don’t know what sort of advancement could convince you that this is happening.
timschmidt 8 hours ago [-]
Most vulnerabilities seem to be in C/C++ code, or web things like XSS, unsanitized input, leaky APIs, etc.
Perhaps a chunk of that token spend will be porting legacy codebases to memory safe languages. And fewer tokens will be required to maintain the improved security.
torginus 8 hours ago [-]
I think most vulnerabilities are in crappy enterprise software. TOCTOU stuff in the crappy microservice cloud app handling patient records at your hospital, shitty auth at a webshop, that sort of stuff.
A lot of this stuff is vulnerable by design: the customer wanted a feature, but engineering couldn't make it work securely with the current architecture, so they opened a tiny hole here and there, hoped nobody would notice, and everyone went home when the clock struck 5.
I'm sure most of us know about these kinds of vulnerabilities (and the culture that produces them).
Before LLMs, people needed to invest time and effort into hacking these. But now, you can just build an automated vuln scanner and scan half the internet provided you have enough compute.
I think there will be major SHTF situations coming from this.
timschmidt 7 hours ago [-]
Yeah. Crufty cobbled together enterprise stuff will suffer some of the worst. But this will be a great opportunity for the enterprise software services economy! lol.
I honestly see some sort of automated whole codebase auditing and refactoring being the next big milestone along the chatbot -> claude code / codex / aider -> multi-agent frameworks line of development. If one of the big AI corps cracks that problem then all this goes away with the click of a button and exchange of some silver.
lilytweed 8 hours ago [-]
I think we’re starting to glimpse the world in which those individuals or organizations who pigheadedly want to avoid using AI at all costs will see their vulnerabilities brutally exploited.
RALaBarge 3 hours ago [-]
Botnet city: where everyone's a botnet, and the DDoS dont matter!
woeirua 8 hours ago [-]
Yep, it's this. The laggards are going to get brutally eviscerated. Any system connected to the internet is going to be exploited over the next year unless security is taken very seriously.
skejeke 5 hours ago [-]
lol and what about the vibe coders?
You people are comical. Why do you feel the need to create so much hype around what you say? Did you not get enough attention as a kid?
woeirua 4 hours ago [-]
The vibe coders will be fine. They’ll use LLMs to red team their code.
3 hours ago [-]
skejeke 4 hours ago [-]
What a load of nonsense.
socketcluster 5 hours ago [-]
I suspect it will converge on minimal complexity software. Current software is way too bloated. Unnecessary complexity creates vulnerabilities and makes them harder to patch.
rachel_rig 7 hours ago [-]
You'd think they would have used this model to clean up Claude's own outage issues and security issues. Doesn't give me a lot of faith.
pants2 8 hours ago [-]
Software security heavily favors the defenders (ex. it's much easier to encrypt a file than break the encryption). Thus with better tools and ample time to reach steady-state, we would expect software to become more secure.
justincormack 8 hours ago [-]
Software security heavily favours the attacker (ex. its much easier to find a single vulnerability than to patch every vulnerability). Thus with better tools and ample time to reach steady-state, we would expect software to remain insecure.
pants2 7 hours ago [-]
If we think in the context of LLMs, why is it easier to find a single vulnerability than to patch every vulnerability? If the defender and the attacker are using the same LLM, the defender will run "find a critical vulnerability in my software" until it comes up empty and then the attacker will find nothing.
Defenders are favored here too, especially for closed-source applications where the defender's LLM has access to all the source code while the attacker's LLM doesn't.
dist-epoch 6 hours ago [-]
You also need to deploy the patch. And a lot of software doesn't have easy update mechanisms.
A fix in the latest Linux kernel is meaningless if you are still running Ubuntu 20.
conradkay 7 hours ago [-]
That generally makes sense to me, but I wonder if it's different when the attacker and defender are using the same tool (Mythos in this case)
Maybe you just spend more on tokens by some factor than the attackers do combined, and end up mostly okay. Put another way, if there's 20 vulnerabilities that Mythos is capable of finding, maybe it's reasonable to find all of them?
"Most security tooling has historically benefitted defenders more than attackers. When the first software fuzzers were deployed at large scale, there were concerns they might enable attackers to identify vulnerabilities at an increased rate. And they did. But modern fuzzers like AFL are now a critical component of the security ecosystem: projects like OSS-Fuzz dedicate significant resources to help secure key open source software.
We believe the same will hold true here too—eventually. Once the security landscape has reached a new equilibrium, we believe that powerful language models will benefit defenders more than attackers, increasing the overall security of the software ecosystem. The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers, if frontier labs aren’t careful about how they release these models. In the long term, we expect it will be defenders who will more efficiently direct resources and use these models to fix bugs before new code ever ships."
fsflover 7 hours ago [-]
This is only true if your approach is security through correctness. This never works in practice. Try security through compartmentalization. Qubes OS provides it reasonably well.
tptacek 7 hours ago [-]
I don't think this is broadly true and to the extent it's true for cryptographic software, it's only relatively recently become true; in the 2000s and 2010s, if I was tasked with assessing software that "encrypted a file" (or more likely some kind of "message"), my bet would be on finding a game-over flaw in that.
intended 7 hours ago [-]
This came across as so confident that I had a moment of doubt.
It is most definitely an attacker's world: most of us are safe not because of the strength of our defenses but because of the disinterest of our attackers.
Herring 7 hours ago [-]
There are plenty of interested attackers who would love to control every device. One is in the white house, for example.
tdaltonc 6 hours ago [-]
Depends - do you think people are good at keeping their fridge firmware up-to-date?
Gigachad 5 hours ago [-]
I’m good at keeping my fridge off the internet.
SoftTalker 5 hours ago [-]
You are the exception.
rpcope1 2 hours ago [-]
Maybe we'll wake up and realize that putting WiFi and stupid "cloud enabled" Internet of Shit hardware into everything was an absolutely terrible idea.
cyanydeez 7 hours ago [-]
I'm more curious as to just how fancy we can make our honeypots. These bots aren't really subtle about it; they're used as a kludge to do anything the user wants. They make tons of mistakes on their way to their goals, so this is definitely not any kind of stealthy thing.
I think this entire post is just an advertisement to goad CISOs to buy $package$ to try out.
josephg 5 hours ago [-]
To be clear, we don’t know that this tool is better at finding bugs than fuzzing. We just know that it’s finding bugs that fuzzing missed. It’s possible fuzzing also finds bugs that this AI would miss.
nextos 5 hours ago [-]
Different methods find different things. Personally, I'd rather use a language that is memory safe plus a great static analyzer with abstract interpretation that can guarantee the absence of certain classes of bugs, at the expense of some false positives.
The problem is that these tools, such as Astrée, are incredibly expensive and therefore their market share is limited to some niches. Perhaps, with the advent of LLM-guided synthesis, a simple form of deductive proving, such as Hoare logic, may become mainstream in systems software.
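As a toy illustration of what the Hoare-logic idea looks like, here is a triple written as executable assertions (a deductive tool such as Frama-C would prove these statically instead of checking them at runtime; the example is mine, not from any such tool's docs):

```c
#include <assert.h>

/* The triple {0 <= n} sum = 0+1+...+n {sum == n*(n+1)/2}, with the loop
   invariant made explicit. Runtime asserts stand in for a static proof. */
int gauss_sum(int n) {
    assert(n >= 0);                 /* precondition: {0 <= n} */
    int sum = 0;
    for (int i = 1; i <= n; i++) {
        assert(sum == i * (i - 1) / 2);  /* loop invariant */
        sum += i;
    }
    assert(sum == n * (n + 1) / 2); /* postcondition */
    return sum;
}
```

An LLM that can propose the invariant is doing the hard part; discharging it is then mechanical.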
underdeserver 4 hours ago [-]
I would suggest watching Nicholas Carlini's talk and Heather Adkins and Four Flynn's talks from unprompted:
My takeaway is that fuzzing is not just complementary, it also gives a stronger AI a starting point. But AI is generally faster and better.
tptacek 18 minutes ago [-]
This is obviously just cope (there's a long, strong-form argument for why LLM-agent vulnerability research is plausibly much more potent than fuzzing, but we don't have to reach it because you can dispose of the whole argument by noting that agents can build and drive fuzzers and triage their outputs), but what I'd really like to understand better is why? What's the impetus to come up with these weird rationalizations for why it's not a big deal that frontier models can identify bugs everyone else missed and then construct exploits for them?
ComplexSystems 4 hours ago [-]
This line of reasoning makes no sense when the AI can just be given access to a fuzzer. I would guess that it probably did have access to a fuzzer to put together some of these vulnerabilities.
acdha 3 hours ago [-]
Carlini talked about that a fair amount in the context of pairing the two: e.g. many protocols are challenging for fuzzers because they have something like a checksum or signature but LLMs are good at coming up with harnesses for things like that. I’m sure that we’re going to see someone building an integrated fuzzer soon which tries to do things like figure out how to get a particular branch to follow an unexercised path.
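A toy sketch of that pairing (packet format and names invented): the parser rejects anything with a bad checksum, so a naive mutation fuzzer almost never reaches the deep logic, while a harness that fixes up the checksum lets every mutated input through the gate.

```c
#include <stddef.h>
#include <stdint.h>

/* Trivial additive checksum over the payload. */
uint8_t sum8(const uint8_t *p, size_t n) {
    uint8_t s = 0;
    for (size_t i = 0; i < n; i++) s += p[i];
    return s;
}

/* The target: rejects inputs whose trailing byte isn't the payload
   checksum - this check is the wall a naive fuzzer gets stuck on. */
int parse_packet(const uint8_t *buf, size_t len) {
    if (len < 2) return -1;
    if (buf[len - 1] != sum8(buf, len - 1)) return -1;
    /* ...the interesting (and potentially buggy) parsing lives here... */
    return 0;
}

/* The LLM-written harness: patch the checksum before calling the parser,
   so mutated inputs exercise the code behind the check. */
int harness(uint8_t *buf, size_t len) {
    if (len < 2) return -1;
    buf[len - 1] = sum8(buf, len - 1);
    return parse_packet(buf, len);
}
```

In practice the harness would be a libFuzzer/AFL entry point, but the fix-up step is the part fuzzers historically needed a human for.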
kristofferR 5 hours ago [-]
AI can initiate the fuzzing and optimize the process of fuzzing.
ssgodderidge 8 hours ago [-]
At the very bottom of the article, they posted the system card of their Mythos preview model [1].
In section 7.6 of the system card, it discusses Open self interactions. They describe running 200 conversations when the models talk to itself for 30 turns.
> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.
I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where other models such as Opus couldn't.
Typical Dario marketing BS to get everyone thinking Anthropic is on the verge of AGI and massaging the narrative that regular people can't be trusted with it.
airstrike 6 hours ago [-]
I mean, it's so obvious at this point and yet everyone falls for it every month. There's an IPO coming, everyone.
mgambati 4 hours ago [-]
It’s funny how you train a machine to mimic human behavior, then the marketing team decides to promote it: “Look! It’s human! Look how it’s thinking about existence!” Meanwhile, a huge percentage of human-produced content is exactly about the uncertainty of human existence, and that got used to train the model.
gck1 2 hours ago [-]
I chuckle every time <insert any LLM company here> says something along the lines of "the model is so good that we won't release it to the general public, ekhm, because safety".
Because the exact same thing has been said about every single upcoming model since GPT-3.5.
At this point, this must be an inside joke to do this just because.
ilaksh 5 hours ago [-]
I think that basically they trained a new model but haven't finished optimizing it and updating their guardrails yet. So they can feasibly give access to some privileged organizations, but don't have the compute for a wide release until they distill, quantize, get more hardware online, incorporate new optimization techniques, etc. It just happens to make sense to focus on cybersecurity in the preview phase especially for public relations purposes.
It would be nice if one of those privileged companies could use their access to start building out a next level programming dataset for training open models. But I wonder if they would be able to get away with it. Anthropic is probably monitoring.
SheinhardtWigCo 8 hours ago [-]
Society is about to pay a steep price for the software industry's cavalier attitude toward memory safety and control flow integrity.
titzer 6 hours ago [-]
It's partly the industry and it's partly the failure of regulation. As Mario Wolczko, my old manager at Sun, says, nothing will change until there are real legal consequences for software vulnerabilities.
That said, I have been arguing for 20+ years that we should have sunsetted unsafe languages and moved away from C/C++. The problem is that every systemsy language that comes along gets seduced by having a big market share and eventually ends up an application language.
I do hope we make progress with Rust. I might disagree as a language designer and systems person about a number of things, but it's well past time that we stop listening to C++ diehards about how memory safety is coming any day now.
doug_durham 5 hours ago [-]
I think society is going to start paying the price for humans being human. As the paper points out there is a lot of good faith, serious software that has vulnerabilities. These aren't projects you would characterize as people being cavalier. It is simply beyond the limits of humans to create vulnerability-free software of high complexity. That's why high reliability software depends on extreme simplicity and strict tools.
socketcluster 4 hours ago [-]
100%, poorly architected software is really difficult to make secure. I think this will extend to AI as well. It will just dial up the complexity of the code until bugs and vulnerabilities start creeping in.
At some point, people will have to decide to stop the complexity creep and try to produce minimal software.
For any complex project with 100k+ lines of code, the probability that it has some vulnerabilities is very high. It doesn't fit into LLM context windows and there aren't enough attention heads to attend to every relevant part. On the other hand, for a codebase which is under 1000 lines, you can be much more confident that the LLM didn't miss anything.
Also, the approach of feeding the entire codebase to an LLM in parts isn't going to work reliably because vulnerabilities often involve interactions between different parts of the code. Both parts of the code may look fine if considered independently but together they create a vulnerability.
Good architecture is critical now because you really need to be able to have the entire relevant context inside the LLM context window... When considering the totality of all software, this can only be achieved through an architecture which adheres to high cohesion and loose coupling principles.
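To make the "both parts look fine independently" point concrete, here's a toy Rust sketch (all names here are invented for the example): each function passes review on its own, but composing them relies on a bounds invariant that neither one actually enforces.

```rust
// "Module A": extracts a length field from an untrusted header.
// Reviewed alone, it is just a parse; validation is assumed to
// happen somewhere else.
fn parse_len(header: &[u8]) -> usize {
    header[0] as usize
}

// "Module B": slices the payload. Reviewed alone, it assumes the
// caller already clamped `len` to `buf.len()` -- an invariant that
// no module in this program actually enforces.
fn read_payload(buf: &[u8], len: usize) -> Option<&[u8]> {
    buf.get(..len) // safe Rust: returns None instead of reading OOB
}

fn main() {
    let buf = [0u8; 16];
    let malicious_header = [255u8]; // claims a 255-byte payload
    let len = parse_len(&malicious_header);
    // In C, the equivalent memcpy would read 255 bytes from a
    // 16-byte buffer; here the checked slice surfaces the broken
    // cross-module invariant instead.
    assert!(read_payload(&buf, len).is_none());
    println!("cross-module invariant violated: len={} buf_len={}", len, buf.len());
}
```

An LLM (or reviewer) shown only `parse_len` or only `read_payload` sees nothing wrong; the bug lives in the composition, which is exactly why whole-relevant-context analysis matters.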
doug_durham 2 hours ago [-]
I'm not even talking about poorly architected software. They are finding vulnerabilities in incredibly well-engineered software. The Linux kernel is complex not because it's poorly written. It's complex because of all the things it needs to do. That makes it beyond the ability of a human to comprehend and reliably work with it.
red75prime 3 hours ago [-]
> It doesn't fit into LLM context windows and there aren't enough attention heads to attend to every relevant part.
That's for one pass. And that pass can produce a summary of what the code does.
staticassertion 1 hours ago [-]
> These aren't projects you would characterize as people being cavalier.
I probably would. You mentioned the linux kernel, which I think is a perfect example of software that has had a ridiculous, perhaps worst-in-class attitude towards security.
torginus 7 hours ago [-]
Thank god, finally someone said it.
I don't know the first thing about cybersecurity, but in my experience all these sandbox-break RCEs involve a step of hijacking the control flow.
There were attempts to prevent various flavors of this, but imo, as long as dynamic branches exist in some form, like dlsym(), function pointers, or vtables, we will not be rid of this class of exploit entirely.
The latter one is the most concerning, as this kind of dynamic branching is the bread and butter of OOP languages, I'm not even sure you could write a nontrivial C++ program without it. Maybe Rust would be a help here? Could one practically write a large Rust program without any sort of branch to dynamic addresses? Static linking, and compile time polymorphism only?
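As a rough illustration of the trade-off being asked about, here is a minimal Rust sketch (the trait and type names are made up): the generic version is monomorphized into a direct call at compile time, while the `dyn` version goes through a vtable, i.e. exactly the kind of indirect branch the comment is worried about.

```rust
trait Codec {
    fn decode(&self, byte: u8) -> u8;
}

struct Xor;
impl Codec for Xor {
    fn decode(&self, byte: u8) -> u8 { byte ^ 0xFF }
}

// Static dispatch: monomorphized per concrete type; the call is a
// direct (often inlined) branch, with no function pointer to hijack.
fn decode_static<C: Codec>(codec: &C, byte: u8) -> u8 {
    codec.decode(byte)
}

// Dynamic dispatch: `dyn Codec` routes the call through a vtable,
// an indirect branch at runtime.
fn decode_dyn(codec: &dyn Codec, byte: u8) -> u8 {
    codec.decode(byte)
}

fn main() {
    let c = Xor;
    assert_eq!(decode_static(&c, 0x0F), 0xF0);
    assert_eq!(decode_dyn(&c, 0x0F), 0xF0);
    println!("both dispatch forms agree");
}
```

In practice a large Rust program can lean heavily on the generic form (at the cost of code size from monomorphization), though `dyn` is hard to avoid entirely for things like heterogeneous collections and plugin-style interfaces.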
2 hours ago [-]
tptacek 7 hours ago [-]
Everybody has been saying this for the last 15 years.
titzer 6 hours ago [-]
We're going to have to put all the bad code into a Wasm sandbox.
lyzaer 41 minutes ago [-]
[dead]
temp123789246 6 hours ago [-]
OpenAI initially claimed that GPT-2 was too dangerous to release in 2019.
How many times will labs repeat the same absurd propaganda?
uselessTA 2 hours ago [-]
The claim I remember was that releasing it would start an arms race for AGI, which I think it clearly did
SubiculumCode 5 hours ago [-]
Anthropic and OpenAI have very different cultures and ethos. Point to other times where anthropic has gone the way of cheap marketing tricks. Now look at openAI. Not even close.
trevorm4 5 hours ago [-]
Anthropic has done plenty of cheap marketing tricks as of late, see their recent non-functional C compiler that relied on a harness using gcc's entire test suite
bwfan123 2 hours ago [-]
Not surprising given that they don't even know why claude-code works or doesn't work [1], i.e. there is no known theory of operation. Explains why they are afraid of it.
One of the things I'm always looking at with new models released is long context performance, and based on the system card it seems like they've cracked it:
GraphWalks BFS 256K-1M: Mythos 80.0%, Opus 38.7%, GPT5.4 21.4%
radicality 2 hours ago [-]
Huh, I don't know what "long context performance" means exactly in these tests, so completely anecdotally: comparing gpt5.4 via codex cli vs Claude Code with opus, gpt5.4 seems to do significantly better in long contexts, I think partly due to some special context compaction stored in encrypted blobs. On long conversations, opus in Claude Code will for me lose memory of what we were working on earlier, whereas one of my codex chats is already at >1B tokens and is still very coherent and remembers things I asked of it at the beginning of the convo.
pertymcpert 18 minutes ago [-]
This isn’t talking about compaction. This refers to performance as the model is loaded with 500k to 1m tokens.
If true, the SWE bench performance looks like a major upgrade.
7 hours ago [-]
himata4113 8 hours ago [-]
This seems to be similar to gpt-pro; they just have a very large attention window (which is why it's so expensive to run). The true attention window of most models is 8096 tokens.
thegeomaster 7 hours ago [-]
What's the "attention window"? Are you alleging these frontier models use something like SWA? Seems highly unlikely.
appcustodian2 6 hours ago [-]
source on the 8096 tokens number? i'm vaguely aware that some previous models attended more to the beginning and end of conversations which doesn't seem to fit a simple contiguous "attention window" within the greater context but would love to know more
lyzaer 41 minutes ago [-]
[dead]
frog437 8 hours ago [-]
[flagged]
stephc_int13 6 hours ago [-]
I think this is bad news for hackers, spyware companies and malware in general.
We all knew vulnerabilities exist, many are known and kept secret to be used at an appropriate time.
There is a whole market for them, but more importantly large teams in North Korea, Russia, China, Israel and everyone else who are jealously harvesting them.
Automation will considerably devalue and neuter this attack vector.
Of course this is not the end of the story and we've seen how supply chain attacks can inject new vulnerabilities without being detected.
I believe automation can help here too, and we may end up with a considerably stronger and more reliable software stack.
tptacek 6 hours ago [-]
I don't think it matters one way or the other to your thesis but I'm skeptical that state-level CNE organizations were hoarding vulnerabilities before; my understanding is that at least on the NATO side of the board they were all basically carefully managing an enablement pipeline that would have put them N deep into reliable exploit packages, for some surprisingly small N. There are a bunch of little reasons why the economics of hoarding aren't all that great.
stephc_int13 5 hours ago [-]
The economics would be different in say, North Korea, don't you think?
tptacek 4 hours ago [-]
Why? What do you mean?
agrishin 8 hours ago [-]
>>> the US and its allies must maintain a decisive lead in AI technology. Governments have an essential role to play in helping maintain that lead, and in both assessing and mitigating the national security risks associated with AI models. We are ready to work with local, state, and federal representatives to assist in these tasks.
How long would it take to turn a defensive mechanism into an offensive one?
SheinhardtWigCo 8 hours ago [-]
In this case there is almost no distinction. Assuming the model is as powerful as claimed, someone with access to the weights could do immense damage without additional significant R&D.
SubiculumCode 5 hours ago [-]
Yes, I can see this being non-releasable for national security reasons given the geopolitical competition with China. Securing our software against threats while having immense infiltration capability against enemy cybersecurity targets... not to mention the ability to implant new, even more subtle vulnerabilities (not generally detectable by current AI) into open software for covert action.
SuperHeavy256 7 hours ago [-]
Which will eventually happen no matter what. That's why it's important to start preparing now.
meander_water 4 hours ago [-]
I think this is a largely inflated PR stunt.
Opus 4.6 was already capable of finding 0days and chaining together vulns to create exploits. See [0] and [1].
Absolutely not a PR stunt, talk to one of your friends working at partner companies with access to the model
pertymcpert 20 minutes ago [-]
Did you read the article?
josh-sematic 7 hours ago [-]
Must be nice to be in a position to sell both disease and cure.
tptacek 7 hours ago [-]
That's exactly not what they're doing. They aren't creating operating system vulnerabilities. They're telling you about ones that already existed.
josh-sematic 3 hours ago [-]
Mythos aside, frontier LLMs can already be used to find exploits at faster pace than humans alone. Whether that knowledge gets used to patch them or exploit them is dependent on the user. Cybersecurity has always been an arms race and LLMs are rapidly becoming powerful arms. Whether they like it or not LLM providers are now important dealers in that arms race. I appreciate Anthropic trying to give “good guys” a leg up (if that is indeed their real main motivation which I do find credible but not certain). But it’s still a scary world we’re entering and I doubt the fierce competition will leave all labs acting benevolently.
conradkay 7 hours ago [-]
Well, in a slightly indirect manner. Claude is writing a ton of code, and therefore creating a lot of security vulnerabilities.
tptacek 7 hours ago [-]
That's not what's happening here. This announcement is about the velocity with which Claude finds vulnerabilities in already-existing software.
buzzerbetrayed 6 hours ago [-]
Software already exists that has been written by Claude. They absolutely are selling the means to write software, and the means to securing the insecure software. At least for the time being. In the future Mythos will probably just make it possible to prompt good software from the start.
stale2002 5 hours ago [-]
Ok. But mostly it's the old software, not the new software, that the bugs are being found in.
pilgrim0 2 hours ago [-]
Maybe because there's no critical and widely used software written by LLMs so far? Which says a lot about how LLMs are failing to even approach the level of capability you would expect from all the hype. The goal has always been, even before LLMs, to find something smarter than our smartest humans. So far the success at that is really minuscule. Humans are still the benchmark, all things considered. Now they're saying LLMs are going to be better than our best vulnerability researchers in a few months (literally what an Anthropic researcher said at a conference). Ok, that might happen. But the funny part is that the LLMs will definitely be the ones writing most of these vulnerabilities. So, to hedge against LLMs you must use LLMs. And that is gonna cost you more.
akerl_ 2 hours ago [-]
So today, most of the vulnerabilities being found by these tools are in code written by humans. Your hypothesis is that down the road, most of the vulnerabilities will be in code written by LLMs.
What seems more probable is that the same advances that LLMs are shipping to find vulnerabilities will end up baked into developer tooling. So you'll be writing code and using an LLM that knows how to write secure code.
FergusArgyll 3 hours ago [-]
I don't think claude wrote openbsd but to be honest that was before my time so I'm not sure
blazespin 5 hours ago [-]
Dario is big on beating china, and no doubt he believes cyber security is how to do that. You can tell, but anthropic is sht at everything else. Nobody uses it for real research.
supern0va 7 hours ago [-]
Yeah, I'd be pretty pissed at my doctor for finding cancerous cells that probably wouldn't have been a problem for quite some time, either. Ignorance is bliss, security through obscurity, whatever.
tredre3 4 hours ago [-]
The doctor analogy is more like you're grateful that your doctor found cancerous cells before they became a problem, but at the same time his other business is selling cigarettes.
spprashant 2 hours ago [-]
We finally have the answer to the question: when do these labs stop giving away intelligence to the general public for $20 a month?
Selling shovels is now worth less than taking all the gold for themselves.
simonw 6 hours ago [-]
I buy the rationale for this. There's been a notable uptick over the past couple of weeks of credible security experts unrelated to Anthropic sounding the alarm on the recent influx of actually valuable AI-assisted vulnerability reports.
> On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year, with the difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the day (Fridays and Tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us.
> And we're now seeing on a daily basis something that never happened before: duplicate reports, or the same bug found by two different people using (possibly slightly) different tools.
> The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good.
> I'm spending hours per day on this now. It's intense.
> Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us.
> Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.
I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?
aurareturn 6 hours ago [-]
Let them all live. This is going to blow up one thread if you merge them.
HPMOR 6 hours ago [-]
I think merging them into either this thread, or the System Card makes the most sense to me.
zachperkel 8 hours ago [-]
Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.
Scary but also cool
ex-aws-dude 4 hours ago [-]
Did someone actually go through all of those and check if they are high-severity or did the AI just tell them that?
fsflover 7 hours ago [-]
Every piece of software definitely has serious vulnerabilities, perfection is not achievable. Fortunately we have another approach to security: security through compartmentalization. See: https://qubes-os.org
syndeo 4 hours ago [-]
Once you get the compartmentalization working well, and “all” of the vulnerabilities are out of it too, of course…
But even then you’ll have users putting things in the same compartment for convenience, rather than leaving them properly sequestered.
dakolli 7 hours ago [-]
Or more likely, its just an exaggeration or lie.
pertymcpert 16 minutes ago [-]
What evidence makes you say that? Do you have insider info?
solenoid0937 2 hours ago [-]
Yes I'm sure this is all a massive conspiracy by the many companies that are making statements alongside Anthropic
rainbow13 25 minutes ago [-]
[dead]
Ryan5453 8 hours ago [-]
Pricing for Mythos Preview is $25/$125, so cheaper than GPT 4.5 ($75/$150) and GPT 5.4 Pro ($30/$180)
conradkay 7 hours ago [-]
For comparison, 5x the cost of Opus 4.6, and 1.67x for Opus 4.1
I think this would be very heavily used if they released it, completely unlike GPT 4.5
adi_kurian 7 hours ago [-]
Opus 4 & 4.1 are still on Vertex+Bedrock @ $75/1mm out. They were used very heavily and in my subjective opinion are better than 4.5 and 4.6.
breakingcups 6 hours ago [-]
Interesting, what makes them better to you?
adi_kurian 2 hours ago [-]
Opus 4, with enough context, could do most all I wanted in a single shot. More often than not, when I had a bad outcome and was frustrated I would realize that I was the problem (in giving improper direction or missing key context).
I also was in a pretty sweet position having a boat load of credits and premo vertex rate limits so I could 'afford' to dump hundreds of thousands of tokens in context all day.
With Opus 4.5 and 4.6, I find I have to steer very actively.
This is comparing using Opus 4 directly rather than comparing the performance of the models in Claude Code for example, or any 'agentic' setup.
Kinda reminds me of 4o vs 4-turbo.
I would imagine they are smaller models.
cassianoleal 8 hours ago [-]
Where did you get that from?
From TFA:
> We do not plan to make Claude Mythos Preview generally available
Tiberium 8 hours ago [-]
From the article:
> Anthropic’s commitment of $100M in model usage credits to Project Glasswing and additional participants will cover substantial usage throughout this research preview. Afterward, Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens (participants can access the model on the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry).
underdeserver 8 hours ago [-]
Key point: available to participants.
conradkay 7 hours ago [-]
permanent underclass has arrived :(
philipwhiuk 3 hours ago [-]
give it a couple months
bredren 7 hours ago [-]
Can anyone point at the critical vulnerabilities already patched as a result of mythos? (see 3:52 in the video)
For example, the 27 year old openbsd remote crash bug, or the Linux privilege escalation bugs?
I know we've had some long-standing high profile, LLM-found bugs discussed but seems unlikely there was speculation they were found by a previously unannounced frontier model.
These links are from the more-detailed 'Assessing Claude Mythos Preview’s cybersecurity capabilities' post released today https://red.anthropic.com/2026/mythos-preview/, which includes more detail on some of the public/fixed issues (like the OpenBSD one) as well as hashes for several unreleased reports and PoCs.
qingcharles 6 hours ago [-]
That OpenBSD one is exactly the kind of bug that easily slips past a human. Especially as the code worked perfectly under regular circumstances.
Looks like they've been approaching folks with their findings for at least a few weeks before this article.
taupi 8 hours ago [-]
Part of me wonders if they're not releasing it for safety reasons, but just because it's too expensive to serve. Why not both?
coffeebeqn 7 hours ago [-]
If these numbers are correct it’s probably worth the extra price
Miraste 8 hours ago [-]
>We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.
This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true.
SheinhardtWigCo 6 hours ago [-]
The other labs already censor their models. Everyone is trying to find the sweet spot where performance and ‘alignment’ are both maximized. This seems no different
wslh 6 hours ago [-]
> Big opportunity for the other labs, if that's true.
It sounds like this is considered military-grade technology, like cryptography in the 90s. The big difference is that it's very expensive to create and run these models. It's not about the algorithm. If the story rhymes, it could be a big opportunity for other regions of the world.
zb3 7 hours ago [-]
Well since Anthropic treats us as second class evil citizens, I guess they don't want our evil money either.
Sol- 7 hours ago [-]
I don't want to be overly cynical and am in general in favor of the contrarian attitude of simply taking people at their word, but I wonder if their current struggles with compute resources make it easier for them to choose to not deploy Mythos widely. I can imagine their safety argument is real, but regardless, they might not have the resources to profitably deploy it. (Though on the other hand, you could argue that they could always simply charge more.)
rishabhaiover 7 hours ago [-]
I would not have believed your argument 3 months ago, but I strongly suspect Anthropic actively engages in model quality throttling due to their compute constraints. Their recent deal for multi GWs worth of data center might help them correct their approach.
conradkay 7 hours ago [-]
For what it's worth Anthropic explicity denies that. "To state it plainly: We never reduce model quality due to demand, time of day, or server load"
It's very interesting to me how widespread this conception is. Maybe it's as simple as LLM productivity degrading over time within a project, as slop compounds.
Or more recently since they added a 1m context window, maybe people are more reckless with context usage
rishabhaiover 2 hours ago [-]
It has nothing to do with the context window. Reasoning brought measured approaches grounded with actual tool calls. All of that short-circuits into a quick fix approach that is unlike Opus-4.5 or 4.6. Sonnet-4.5 used to do that. My context window is always < 200K.
irthomasthomas 3 hours ago [-]
That still leaves open the possibility that they reduce model quality due to profit. ;p
wilson090 7 hours ago [-]
Inference is where they make the money they spend on training, so this feels unlikely. Perhaps this is not true for Mythos though.
If someone sends you a malicious file that uses a rare codec and you open it, you will trigger a codepath that is not widely used and doesn't get a lot of scrutiny.
nitwit005 4 hours ago [-]
A prior bug discussed here was against a file format only used by specific 1990s LucasArts adventure game titles. Obscure enough that discussion of the bug report itself was the only search result. Your video player is unlikely to even attempt to open that.
l5agh 4 hours ago [-]
This was the top comment and it is suddenly flagged for no reason at all. It looks like meta-flagging, where people just want to hide replies to the comment they do not want you to read.
The amount of astroturfing and astroflagging in Anthropic threads is insane.
rlopc 4 hours ago [-]
These issues are always found in the same kinds of projects that support an insane amount of largely unused protocols and features like ffmpeg, sudo, curl.
OpenBSD has many unexplored corners and also (irresponsibly IMO) maintains forks of other projects in base.
A motivated human could find all of these probably by writing 100% code coverage and fuzzing.
The market for these tools is very small. Good luck applying them to a release of sqlite or postfix.
I don't understand how people here are hyping this up, unless they work for AI related companies as probably 80% of them do. People have found these issues for decades without AI. Sure, you can generate fuzzing code and find one or two issues in the usual suspects. Better do it manually and understand your own code.
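For what it's worth, the "100% coverage and fuzzing" idea above can be caricatured in a few lines of Rust. This is an exhaustive toy driver over a two-byte input space with a planted bug (`parse` is invented for the example); real fuzzers like AFL++ or cargo-fuzz are coverage-guided precisely because real input spaces are not enumerable like this.

```rust
// A tiny tag-value parser with a planted bug: tag 0x02 performs a
// subtraction that underflows for val == 0. checked_sub stands in
// for what, in C, would be silent wraparound feeding a later index.
fn parse(input: &[u8]) -> Result<u8, &'static str> {
    match input {
        [tag, val] if *tag == 0x01 => Ok(*val),
        [tag, val] if *tag == 0x02 => Ok(val.checked_sub(1).ok_or("underflow")?),
        _ => Err("malformed"),
    }
}

fn main() {
    // Brute-force "fuzzing": enumerate every 2-byte input and count
    // the ones that trip the planted failure path.
    let mut failures = 0;
    for tag in 0u8..=255 {
        for val in 0u8..=255 {
            if parse(&[tag, val]) == Err("underflow") {
                failures += 1;
            }
        }
    }
    // Exhaustive search finds the single crashing input: [0x02, 0x00].
    assert_eq!(failures, 1);
    println!("found {} crashing input(s)", failures);
}
```

The catch, as the thread's earlier point about cross-module bugs suggests, is that coverage and fuzzing exercise paths, not invariants; a bug spanning two independently-correct-looking modules can sit inside 100%-covered code.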
kranke155 5 hours ago [-]
It's insane. Could we say it's beyond AGI, at least in cybersecurity? This is a real wake-up call. On some of this stuff, the AI's "uneven intelligence" is becoming absurdly high at its local peaks.
boring-human 5 hours ago [-]
> could we say it’s beyond AGI at least in cybersecurity?
AGI is like the Holy Grail. Either in the Arthurian Hero's Journey sense, or in the sense of having been a myth all along.
mjmas 5 hours ago [-]
Limiting it to the area of cybersecurity is by definition not general.
NullHypothesist 5 hours ago [-]
Perhaps "ASI" is the better acronym here
fragmede 4 hours ago [-]
Which S are you thinking of here?
devmor 4 hours ago [-]
Please stop using terms you don’t understand like “AGI” because you feel overwhelmed by something doing cool stuff. It’s exhausting.
skerit 5 hours ago [-]
I'm sure it'll be better than Opus 4.6, but so much of this seems hype. Escaping its sandbox, having to do "brain scans" because it's "hiding its true intent", bla bla bla.
If it manages to work on my java project for an entire day without me having to say "fix FQN" 5 times a day I'll be surprised.
underdeserver 7 hours ago [-]
Interesting also is what they didn't find, e.g. a Linux network stack remote code execution vulnerability. I wonder if Mythos is good enough that there really isn't one.
paoliniluis 2 hours ago [-]
Does everyone agree that this makes Dario Amodei more powerful than any politician in the world? Anthropic is now the owner of the most powerful cyberweapon ever made
jFriedensreich 7 hours ago [-]
The only thing reassuring is the Apache and Linux foundation setups. Lets hope this is not just an appeasing mention but more fundamental. If there are really models too dangerous to release to the public, companies like oracle, amazon and microsoft would absolutely use this exclusive power to not just fix their holes but to damage their competitors.
pizlonator 7 hours ago [-]
It's messed up that Anthropic simultaneously claims to be a public benefit corp and is also picking who gets to benefit from their newly enhanced cybersecurity capabilities. It means that the economic benefit is going to the existing industry heavyweights.
(And no, the Linux Foundation being in the list doesn't imply broad benefit to OSS. Linux Foundation has an agenda and will pick who benefits according to what is good for them.)
I think it would be net better for the public if they just made Mythos available to everyone.
hector_vasquez 6 hours ago [-]
Releasing the model to bad actors at the same time as the major OS, browser, and security companies would be one idea. But some might consider that "messed up" too, whatever you mean by that. But in terms of acting in the public benefit, it seems consistent to work with companies that can make significant impact on users' security. The stated goal of Project Glasswing is to "secure the world's most critical software," not to be affirmative action for every wannabe out there.
pizlonator 6 hours ago [-]
I don't trust a corpo to choose what is "most critical".
That's what's messed up about it.
_1100 6 hours ago [-]
That is a fine stance to hold but some facts are still true regardless of your view on large businesses.
For example, it will benefit more people to secure Microsoft or Amazon services than it would to secure a smaller, less corporate player in those same service ecosystems.
You could go on to argue that the second order effects of improving one service provider over another chooses who gets to play, but that is true whether you choose small or large businesses, so this argument devolves into “who are we to choose on behalf of others”.
Which then comes back to “we should secure what the market has chosen in order to provide the greatest benefit.”
pizlonator 20 minutes ago [-]
The longer term economic outcome of this is consolidation: large players get stronger, weak players get weaker.
That's not a good outcome for the economy.
6 hours ago [-]
thorncorona 6 hours ago [-]
Let's let the California HSR committee do it instead!
pizlonator 6 hours ago [-]
I'm too much of an anarchist for that.
I believe what I said:
> I think it would be net better for the public if they just made Mythos available to everyone.
lanternfish 5 hours ago [-]
10 Axios's within 5 days.
pizlonator 2 hours ago [-]
That was a supply chain issue.
The interesting thing about Mythos is its ability to find security vulns in software that has an uncompromised supply chain.
nightski 5 hours ago [-]
This is already happening. But not everyone has access to the tools to protect against it.
enraged_camel 5 hours ago [-]
Yeah, I'm unsure why the OP thinks that massive chaos would somehow be "better for the public."
2 hours ago [-]
5 hours ago [-]
nightski 5 hours ago [-]
This is not the only model. I assure you exploits are being found and taken advantage of without it, possibly even ones that this model is not even capable of detecting.
Sounds like people here are advocating a return to security through obscurity which is kind of ironic.
az226 6 hours ago [-]
You could release it with a cyber-capabilities refusal that gets unlocked when you apply for approval.
tokioyoyo 6 hours ago [-]
Damned if you do, damned if you don’t. “Extremely capable model that can find exploits” has always been a fear, and the first company to release it in public will cause bloodbath. But also the first company that will prove itself.
SheinhardtWigCo 6 hours ago [-]
> picking who gets to benefit from their newly enhanced cybersecurity capabilities
You could say this about coordinated disclosure of any widespread 0-day or new bug class, though
pizlonator 6 hours ago [-]
That's a really good point!
But:
- Coordinated disclosure is ethically sketchy. I know why we do it, and I'm not saying we shouldn't. But it's not great.
- This isn't a single disclosure. This is a new technology that dramatically increases capability. So, even if we thought that coordinated disclosure was unambiguously good, I think we'd still need to have a new conversation about Mythos.
kenjackson 5 hours ago [-]
So private companies shouldn’t get to determine who they provide services to? Assuming no extremely malicious intent, I’d be fine if they said it was only going to McDonalds because the founders like Big Macs.
pizlonator 2 hours ago [-]
McDonalds isn't a public benefit corporation.
SheinhardtWigCo 6 hours ago [-]
Totally agree, it’s an uncomfortable compromise.
cedws 6 hours ago [-]
Not only companies, they're going to be taking applications from individual researchers. No doubt it will be granted only to established researchers, effectively locking out graduates and those early in their careers. This is bad.
SheinhardtWigCo 6 hours ago [-]
They are not unique in this. Apple and Tesla have similar programs. More nuance is warranted here. They are trying to balance the need to enable external research with the need to protect users from arbitrary 3rd parties having special capabilities that could be used maliciously
cedws 6 hours ago [-]
I understand that, but Anthropic is doing nothing to throw those grassroots researchers a lifejacket. This is the beginning of the end for independents, if it continues on this trajectory then Anthropic gets to decide who lives and who dies. Who says they should be allowed to decide that?
solenoid0937 2 hours ago [-]
Why should unproven college students be given access to a cyber superweapon?
lelanthran 6 hours ago [-]
Or (and hear me out), they are close to an IPO and want to ensure that there is a world-ending threat around which they can cluster the biggest names, with themselves leading that group.
I think I just broke my cynicism meter :-(
cjkaminski 6 hours ago [-]
You might want to recalibrate your cynicism meter. As strange as it might sound, most companies act according to their principles when the founding team is at the helm. The garbage policies tend to materialize once the company is purchased by, or merged into, another entity where the leadership doesn't care about the original aim of the organization. They just want "line go up".
Also, it makes sense that OpenAI feels the pressure of getting to an IPO because of their financial structure. I don't know whether or not Anthropic operates under a similar set of influences (meaning it could be either, I just don't know.)
baq 6 hours ago [-]
> It's messed up that Anthropic simultaneously claims to be a public benefit corp and is also picking who gets to benefit from their newly enhanced cybersecurity capabilities. It means that the economic benefit is going to the existing industry heavyweights.
It's messed up that the US Government simultaneously claims to be a public benefit and is also picking who gets to benefit from their newly enhanced nuclear capabilities.
-- someone in 1945, probably
pizlonator 6 hours ago [-]
I mean it was messed up, which is why the other world powers raced to develop their own capabilities.
And it remains messed up to this day - some countries get to be under their own nuclear umbrella, while others don't.
This kind of selective distribution of superpowers doesn't lead to great outcomes
baq 6 hours ago [-]
in that case in particular it led to 80 years of relatively calm geopolitics kinetically, all things considered. I'm not sure I want to live through an AI cold war, but it sure seems I don't get to choose.
pizlonator 6 hours ago [-]
> relatively calm geopolitics kinetically
Relative to what?
There's this trend in history that every hundred years there's a giant blow up, lots of violence, followed by peace.
It's likely that we would have had 80 years of relative calm due to that cycle even if nukes hadn't happened
baq 6 hours ago [-]
> Relative to what?
to WW1 and WW2.
pizlonator 2 hours ago [-]
History tells us that we would have had calm after WW2 even without nukes
SubiculumCode 6 hours ago [-]
That can simultaneously be true and still be the best of bad options (excluding destroying the model altogether). These models may prove quite dangerous. That they did this instead of selling their services to every company at a huge premium says a lot about Anthropic's culture.
jstummbillig 6 hours ago [-]
What? The economic benefit of system-critical software not totally breaking in a few weeks goes to roughly everyone. Insofar as Apple/Google/MS/Linux Foundation economically benefit from being able to patch pressing critical software issues upfront (I am not even exactly sure what that is supposed to mean; it's not like anyone is going to use more or less Windows or Android if this happened any other way), that's a good thing for everyone, and the economic benefits of that manifest for everyone.
titzer 7 hours ago [-]
In the long term, you're right, but in the short term, it's going to be a bloodbath.
Aperocky 6 hours ago [-]
That's assuming the model is actually as good as they say it is. Given the number of AI researchers over the past 3 years claiming supernatural capability from the LLMs they have built, my bayesian skepticism is through the roof.
baq 6 hours ago [-]
don't confuse bayesian skepticism with plain old contrarian bias. a true bayesian updates their priors, I'd say this is an appropriate time to do so. also don't confuse what they sell with what they have internally.
Aperocky 3 hours ago [-]
There haven't been any priors to update so far.
All LLMs got better for sure, but they are still definitively LLMs and did not show any sign of having purpose. Which also makes sense, because of their very nature as statistical machines.
Sometimes quantity by itself leads to transformative change... but once, not twice, and that has already happened.
SubiculumCode 6 hours ago [-]
Anthropic has behaved the least like this of the AI companies.
Gigachad 5 hours ago [-]
They made a claim that 100% of code would be AI generated in a year, over a year ago.
SubiculumCode 5 hours ago [-]
That was a prediction. It was not a claim of their current capabilities. If that is the one you reach for then I feel my point has been made.
SpicyLemonZest 5 hours ago [-]
They were right, it's hit 100% at a number of large tech companies. (They missed their initial prediction of 90% 6 months ago, because the models then available publicly weren't capable enough.)
hn_acc1 5 hours ago [-]
Please tell me those companies so I can find alternatives. I'm using AI every day and there's no way I would trust it to do that.
SpicyLemonZest 5 hours ago [-]
The transition is pretty complete at e.g. Google and Meta, IIUC. Definitely whoever builds the AI tools you're using every day isn't writing code by hand.
Gigachad 4 hours ago [-]
I really just don't believe it. I have not met anyone in tech who writes zero code now. The idea that no one at Google writes any code is such a huge claim that it requires extraordinary evidence, none of which ever gets presented.
solenoid0937 1 hours ago [-]
Can confirm that basically no one at Google or Meta hand writes code outside extremely extremely niche projects
SpicyLemonZest 4 hours ago [-]
I'm surprised to hear that. One of us is in a bubble, and I'm genuinely not sure who. I have not met anyone in tech (including multiple people at Google) who does still write code. I've been recreationally interested in AI for a long time, which is a potential source of skew I suppose, but I do not and most people in my circles do not work on anything directly related to AI.
skejeke 4 hours ago [-]
So why aren’t they laying people off and pumping the extra money towards research efforts associated with LLMs? Lmao.
They should all cut down their labour input right now if what you claim is true.
SpicyLemonZest 4 hours ago [-]
At many of the best tech companies, the conventional wisdom has always been that there's a huge backlog of stuff to be done. They don't want to deliver 100% of their roadmap with 50% of their employees, they want to deliver 200% of their roadmap with 100% of their employees. (And the speedup is not as high as these numbers imply for many kinds of performance, security, or correctness-critical software.)
Some companies like Block, Oracle, and Atlassian have indeed been laying people off.
skejeke 4 hours ago [-]
Lmao man this is absolute nonsense.
Google has done nothing but destroy value with many of its ‘bets’. Your roadmap stuff is irrelevant - if you don’t have value creating projects in the pipeline and/or labour is augmented you should be laying off - period. Sundar’s job is to maximise the stock price.
So once again - nonsense. Now stop spreading crap that clearly fills people with fear. I can tell you have no understanding of corporate finance and how the management of tech firms actually think these things through.
SpicyLemonZest 3 hours ago [-]
I'm spreading what people involved in management of tech firms have told me. Perhaps they were lying, but to me it seems consistent with what I observe in the news and in my personal capacity.
I'm also not quite sure your alternate theory is self-consistent. If Google has been frequently destroying value, and companies invariably lay people off when their projects aren't producing value, doesn't that mean they should have already been laying people off?
oytis 5 hours ago [-]
That's just in line with their ethics. They also maintain that countries other than the US should not have SOTA AI capabilities.
hmokiguess 6 hours ago [-]
While I agree with you, in some ways I'd argue that this is just them being transparent on what probably would inevitably already happen at the scale of these corporate overlords and modern monarchs.
There will always be a more capable technology in the hands of the few who hold the power, they're just sharing that with the world more openly.
Better security is a good thing, not a bad thing, regardless of which companies are more difficult to hack. Hemming and hawing over a clear and obvious good is silly.
malcolmgreaves 5 hours ago [-]
Not really. It’s a lot better than the anarchy of releasing it and having a bunch of bad people with money use it to break software that everyone’s lives depend on. Many technologies should be gate kept because they’re dangerous. Sometimes that’s permanent, like a nuclear weapon. Sometimes that’s temporary, like a new LLM that’s good at finding exploits. It can be released to the wider public once its potential for damage has been mitigated.
dragonelite 7 hours ago [-]
Cue the "First time" meme.
Sateeshm 7 hours ago [-]
The bars have solid fill for Mythos and cross-shading for Opus 4.6. Makes the difference feel bigger than it actually is.
cryptoegorophy 3 hours ago [-]
Ironically Claude cli completely failed to detect a rogue code on my html scan yesterday while ChatGPT web version detected it immediately. Can’t wait to do same test with newer version.
modeless 5 hours ago [-]
I didn't see this at first, but the price is 5x Opus: "Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens", however "We do not plan to make Claude Mythos Preview generally available".
wanderingmind 2 hours ago [-]
So Mozilla is not part of this consortium, I'm guessing for deliberate reasons to make Safari and Chrome the default browsers. I don't think Firefox can survive the upcoming attacks without robust support from foundational AI providers to secure the browser.
NickNaraghi 8 hours ago [-]
> Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.
Sounds like we've entered a whole new era, never mind the recent cryptographic security concerns.
anVlad11 8 hours ago [-]
So, $100B+ valuation companies get essentially free access to the frontier tools with disabled guardrails to safely red team their commercial offerings, while we get "i won't do that for you, even against your own infrastructure with full authorization" for $200/month.
Uh-huh.
SheinhardtWigCo 7 hours ago [-]
Yes, and that's normal. Coordinated disclosure is standard practice when the risk of public disclosure is unacceptable.
charcircuit 4 hours ago [-]
Risk for who? It feels unfair that the risk to myself is ignored "for the greater good of everyone else."
unethical_ban 8 hours ago [-]
I'm sympathetic to your point, but I'm sure there are heightened trust levels between the participating orgs and confidentiality agreements out the wazoo.
How does public Claude know you have "full authorization" against your own infra? That you're using the tools on your own infra? Unless they produce a front-end that does package signing and detects you own the code you're evaluating.
What has it stopped you from doing?
9cb14c1ec0 8 hours ago [-]
You can do pretty much anything you want with public claude if you self-report to it that you have been properly authorized.
dakolli 7 hours ago [-]
I guess we can throw out the idea that AGI is going to be democratized. In this case a sufficiently powerful model has been built, and the first thing they do is only give AWS, Microsoft, Oracle, etc. access.
If AGI is going to be a thing, it's only going to be a thing for Fortune 100 companies.
However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.
supern0va 7 hours ago [-]
>However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.
Evaluate it yourself. Look at the exploits it discovered and decide whether you want to feel concerned that a new model was able to do that. The data is right there.
rvz 2 hours ago [-]
Well, Yes.
The research and testing of the model is always done exclusively by their own model authors, meaning that it is not independent or verifiable, and they want us to take their word for it, which we cannot, as they have an axe to grind against open-weight models.
This is marketing wrapped around a biased research paper.
dist-epoch 6 hours ago [-]
The plan of Elon Musk for Macrohard is to replace all software companies with it, when they get AGI.
dakolli 2 hours ago [-]
Thankfully he will be long dead before that happens. But of course that's his goal. Elon despises expensive engineers, and he yearns to get revenge for them costing him so much money over the years by replacing them.
A tech billionaire's biggest expense has been his engineering line item. They resent the workers who've collected a large percentage of their potential profits over the years; crushing all labor is their driving motivation.
MisterBiggs 5 hours ago [-]
What happens once an agent can reliably get 100% on swebench?
zambelli 5 hours ago [-]
I'm glad to see that it stands its ground more than other models - which is a genuinely useful trait for an assistant. Both on technical and emotional topics.
picafrost 8 hours ago [-]
> Anthropic has also been in ongoing discussions with US government officials about Claude Mythos Preview and its offensive and defensive cyber capabilities. [...] We are ready to work with local, state, and federal representatives to assist in these tasks.
As Iran engages in a cyber attack campaign [1] today the timing of this release seems poignant. A direct challenge to their supply chain risk designation.
I really wanted to like anthropic. They seem the most moral, for real.
But at the core of anthropic seems to be the idea that they must protect humans from themselves.
They advocate government regulations of private open model use. They want to centralize the holding of this power and ban those that aren't in the club from use.
They, like most tech companies, seem to lack the idea that individual self-determination is important. Maybe the most important thing.
dralley 7 hours ago [-]
That is unequivocally true with some things. You don't want people exercising their "self-determination" to own private nukes.
throwaway13337 6 hours ago [-]
LLMs aren't nukes.
They're more like printing presses or engines. A great potential for production and destruction.
At their invention, I'm sure some people wanted to ensure only their friends got that kind of power too.
I wonder the world we would live in if they got their way.
caycep 5 hours ago [-]
When do we get our Kuang Grade Mark Eleven icebreaker?
manbash 5 hours ago [-]
This will likely not see the light of day. It's the usual PR that gathers many "partnerships".
Expect to see lots of these in the upcoming months as the big companies scramble to keep from losing money.
baddash 8 hours ago [-]
> security product
> glass in the name
pugworthy 6 hours ago [-]
I had a team mate propose a new security layer for an industrial device which he wanted to call "Eggshell"
evanmoran 4 hours ago [-]
We shall call it Achilles, as Claude Mythos is its only weakness.
endunless 8 hours ago [-]
Another Anthropic PR release based on Anthropic’s own research, uncorroborated by any outside source, where the underlying, unquestioned fact is that their model can do something incredible.
> AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities
I like Anthropic, but these are becoming increasingly transparent attempts to inflate the perceived capability of their products.
NitpickLawyer 8 hours ago [-]
We'll find out in due time if their 0days were really that good. Apparently they're releasing hashes and will publish the details after they get patched. So far they've talked about DoS in OpenBSD, privesc in Linux and something in ffmpeg. Not groundbreaking, but not nothing either (for an allegedly autonomous discovery system).
While some stuff is obviously marketing fluff, the general direction doesn't surprise me at all, and it's obvious that with model capabilities increase comes better success in finding 0days. It was only a matter of time.
Maybe a bad example since Nicholas works at Anthropic, but they're very accomplished and I doubt they're being misleading or even overly grandiose here
See the slide 13 minutes in, which makes it look to be quite a sudden change
endunless 7 hours ago [-]
Very interesting, thanks for sharing.
> I doubt they're being misleading or even overly grandiose here
I think I agree.
We could definitely do much worse than Anthropic in terms of companies who can influence how these things develop.
bink 6 hours ago [-]
I watched the talk as well and it's very interesting. But isn't this just a buffer overflow in the NFS client code? The way the LLM diagnosed the flaw, demonstrated the bug, and wrote an exploit is cool and all, but doesn't this still come down to the fact that the NFS client wasn't checking bounds before copying a bunch of data into a fixed length buffer? I'm not sure why this couldn't have been detected with static analysis.
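The bug class being described can be sketched in a few lines of C. This is a hedged illustration only: the struct name, field sizes, and function are invented, not the actual NFS client code. The vulnerable shape is a memcpy whose length comes from the wire while the destination is a fixed-size buffer; the fix is to clamp before copying.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the bug class, not the real NFS code:
 * a length field arrives from the network, the destination is fixed-size. */
struct wire_reply {
    unsigned int len;     /* attacker-controlled */
    char data[256];
};

/* Unsafe shape: memcpy(buf, r->data, r->len) into char buf[64] overflows
 * whenever len > 64 -- exactly what bounds checking (or a static analyzer,
 * which can see the copy length is not bounded by the destination size)
 * should catch. The fixed version clamps first: */
size_t copy_reply(const struct wire_reply *r, char *dst, size_t dstlen)
{
    size_t n = r->len;
    if (n > sizeof r->data)   /* never read past the source either */
        n = sizeof r->data;
    if (n > dstlen)           /* clamp to the destination buffer */
        n = dstlen;
    memcpy(dst, r->data, n);
    return n;
}
```

Which is the GP's point: nothing about this shape requires an LLM to detect, only a tool that tracks whether the copy length is provably bounded.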
conradkay 5 hours ago [-]
I guess so, but there's a ton of buffer overflow vulnerabilities in the wild, and ostensibly it wasn't detected by static analysis
Cynicism always gets upvotes, but in this particular case it seems fairly easy to verify whether they're telling the truth. If Mythos really did find a ton of vulnerabilities, those presumably have been reported to the vendors and are currently in the responsible disclosure period while they get fixed, and then after that we'll see the CVEs.
If a bunch of CVEs do in fact get published a couple months (or whatever) from now, are you going to retract this take? It's not like their claims are totally implausible: the report about Firefox security from last month was completely genuine.
endunless 7 hours ago [-]
> If a bunch of CVEs do in fact get published a couple months (or whatever) from now, are you going to retract this take?
I would like to think that I would, yes.
What it comes down to, for me, is that lately I have been finding that when Anthropic publishes something like this article – another recent example is the AI and emotions one – if I ask the question, does this make their product look exceptionally good, especially to a casual observer just scanning the headlines or the summary, the answer is usually yes.
This feels especially true if the article tries to downplay that fact (they’re not _real_ emotions!) or is overall neutral to negative about AI in general, like this Glasswing one (AI can be a security threat!).
maxmaio 6 hours ago [-]
seems important and terrifying. This morning Opus 4.6 was blowing my mind in claude code... onward and upward
kristofferR 5 hours ago [-]
This is pretty insane. A model so powerful they felt that releasing it publicly would create a netsec tsunami. AGI isn't here yet, but we don't need to get there for massive societal effects. How long will they hold off, especially as competitors get closer to releasing equally powerful models?
charcircuit 4 hours ago [-]
OpenAI did the same thing with GPT3 trying to scare people into thinking it would end the internet. OpenAI even reached out to someone who reproduced a weaker version of GPT3 and convinced him to change his mind about releasing it publicly due to how much "harm" it would cause.
These claims of how much harm the models will cause are always overblown.
kristofferR 11 minutes ago [-]
Sure, but the GPT3 thing was mostly hype without stuff to back it up. On the other hand - the reported numbers on specific benchmarks here are insane, I don't doubt that it will have a major impact if it actually is that much more powerful than Opus, and I'd doubt they'd outright lie about benchmark results.
impulser_ 8 hours ago [-]
So they are only giving access to their smartest model to corporations.
You think these AI companies are really going to give AGI access to everyone. Think again.
We better fucking hope open source wins, because we aren't getting access if it doesn't.
open592 8 hours ago [-]
This story has been played out numerous times already. Anthropic (or any frontier lab) has a new model with SOTA results. It pretends like it's Christ incarnate and represents the end of the world as we know it. Gates its release to drum up excitement and mystique.
Then the next lab catches up and releases it more broadly
Then later the open weights model is released.
The only way this type of technology is going to be gated "to only corporations" is if we continue on this exponential scaling trend such that the "SOTA" model is always out of reach.
dreis_sw 8 hours ago [-]
It also took many years to put capable computers in the hands of the general public, but it eventually happened. I believe the same will happen here, we're just in the Mainframe era of AI.
impulser_ 6 hours ago [-]
Yeah, but computers don't replace you. They are building AI to replace you. You think if these companies eventually achieve AGI that they are going to give you access to it? They are already gatekeeping an LLM because they don't trust you with it.
dievskiy 6 hours ago [-]
Would you hope that it would be released today so that evil actors could invest a few million to search for 0-days across popular open-source repos?
justincormack 8 hours ago [-]
And the Linux Foundation.
throwaw12 8 hours ago [-]
of course they're not giving access to everyone.
they better make billions directly from corporations, instead of giving them to average people who might get a chance out of poverty (but also bad actors using it to do even more bad things)
krackers 7 hours ago [-]
Anthropic's definition of "safe AI" precludes open-source AI. This is clear if you listen to what Dario says in interviews; I think he might even prefer OpenAI's closed-source models winning to having open-source AI (because at least in the former it's not a free-for-all).
copypaper 4 hours ago [-]
Yea, but can it secure systems from the unpatchable $5 wrench vulnerability?
Why do I feel like the auditing industry is about to evaporate thanks to this?
KeplerBoy 6 hours ago [-]
I guess the more likely option is the auditing industry will pay huge sums to get access to those models as vetted operators.
nickandbro 8 hours ago [-]
I want it
zb3 7 hours ago [-]
BTW it seems they forgot about the part that defense uses of the model also need to be safeguarded from people. Because what if a bad person from a bad country tries to defend against peaceful attacks from a good country like the US? That would be a tragedy, so we need to limit defensive capabilities too.
Fokamul 7 hours ago [-]
+ NSA, CIA
nikcub 7 hours ago [-]
Department of War timing on picking fights couldn't be worse
0xbadcafebee 8 hours ago [-]
tl;dr we find vulns so we can help big companies fix their security holes quickly (and so they can profit off it)
This is a kludge. We already know how to prevent vulnerabilities: analysis, testing, following standard guidelines and practices for safe software and infrastructure. But nobody does these things, because it's extra work, time and money, and they're lazy and cheap. So the solution they want is to keep building shitty software, but find the bugs in code after the fact, and that'll be good enough.
This will never be as good as a software building code. We must demand our representatives in government pass laws requiring software be architected, built, and run according to a basic set of industry standard best practices to prevent security and safety failures.
For those claiming this is too much to ask, I ask you: What will you say the next time all of Delta Airlines goes down because a security company didn't run their application one time with a config file before pushing it to prod? What will happen the next time your social security number is taken from yet another random company entrusted with vital personal information and woefully inadequate security architecture?
There's no defense for this behavior. Yet things like this are going to keep happening, because we let it. Without a legal means to require this basic safety testing with critical infrastructure, they will continue to fail. Without enforcement of good practice, it remains optional. We can't keep letting safety and security be optional. It's not in the physical world, it shouldn't be in the virtual world.
6thbit 6 hours ago [-]
This is silly and disingenuous.
In a matter of days or weeks a competing lab will make public a model with capabilities beyond this “mythos” one.
Is this a huge fear-driven marketing stunt to get governments and corporations into dealing with Anthropic?
yusufozkan 8 hours ago [-]
but people here had told me llms just predict the next word
SirYandi 6 hours ago [-]
This sets off marketing-BS alarm bells. All the cosignatories so very obviously have a vested interest in AI stocks / sentiment. Perhaps not the Linux Foundation, although (I think) they rely on corporate donations to some extent.
zb3 7 hours ago [-]
> On the global stage, state-sponsored attacks from actors like China, Iran, North Korea, and Russia have threatened to compromise the infrastructure that underpins both civilian life and military readiness.
Yeah, makes sense. Those countries are bad because they execute state-sponsored cyber attacks, the US and Israel on the other hand are good, they only execute state-sponsored defense.
anuramat 8 hours ago [-]
"oops, our latest unreleased model is so good at hacking, we're afraid of it! literal skynet! more literal than the last time!"
almost like they have an incentive to exaggerate
knowaveragejoe 8 hours ago [-]
I'm sure they do, yet the models really are getting scarily good at this. This talk changed my view on where we're actually at:
A cybersecurity pandemic will surely be the Hiroshima that wakes people up to AI. /s
throwaway613746 6 hours ago [-]
[dead]
ehutch79 8 hours ago [-]
Just include 'make it secure' in the prompt. Duh.
/s
cyanydeez 7 hours ago [-]
[flagged]
dakolli 7 hours ago [-]
If this is as dangerous as they make it out (it's not), why would their first impulse be to get every critical product/system/corporation in the world to implement its usage?
LoganDark 8 hours ago [-]
It's nice to know that they continue to be committed to advertising how safe and ethical they are.
raldi 8 hours ago [-]
In what ways is Anthropic different from a hypothetical frontier lab that you would characterize as legitimately safe and ethical?
0x3f 7 hours ago [-]
Its existence is possible.
LoganDark 8 hours ago [-]
I'm just a little frustrated they keep going on about how safe and ethical they are for keeping the more advanced capabilities from us. I wish they would wait to make an announcement until they have something to show, rather than this constant almost gloating.
rvz 8 hours ago [-]
They are not our friends and are the exact opposite of what they are preaching to be.
Let alone their CEO scare mongering and actively attempting to get the government to ban local AI models running on your machine.
SilverElfin 8 hours ago [-]
I agree attempting to ban local AI models or censor them, is not appropriate. At the same time, they do seem far more ethical and less dangerous than other AI companies. And I include big tech in that - a bunch of greedy companies that just want to abuse their monopoli … I mean moats.
simianwords 8 hours ago [-]
How would you expect them to behave if they were your friends?
ethin 8 hours ago [-]
IMO (not the GP) but if Anthropic were my friends I would expect them to publish research that didn't just inflate the company itself and that was both reproducible and verifiable. Not just puff pieces that describe how ethical they are. After all, if a company has to remind you in every PR piece that they are ethical and safety-focused, there is a decent probability that they are the exact opposite.
Miraste 8 hours ago [-]
They are a for-profit company, working on a project to eliminate all human labor and take the gains for themselves, with no plan to allow for the survival of anyone who works for a living. They're definitionally not your friends. While they remain for-profit, their specific behaviors don't really matter.
simianwords 8 hours ago [-]
I work for a tech company that eliminates a form of human labour and they remain for profit
Miraste 8 hours ago [-]
Sure, most tech companies eliminate some form of human labor. Anthropic aims to eliminate all human labor, which is very different.
4qt23 5 hours ago [-]
Software has been doing fine without Misanthropic. These automated tools find very little. They selected the partners because they, too, want to keep up the illusion that AI works.
Whenever a company pivots to "cyber" rhetoric, it is a clear indication that they are selling snake oil.
Secure your girl school target selectors first.
borski 5 hours ago [-]
This is a comment from someone that has never used these tools for vulnerability research. That much is very clear.
kass34 46 minutes ago [-]
[dead]
emceestork 5 hours ago [-]
Account created 6 minutes ago...
3jash 5 hours ago [-]
[flagged]
tdaltonc 6 hours ago [-]
> Mythos finds bug.
> NSA demands that bug stays in place and gags Anthropic.
> Anthropic releases Mythos.
Then what? Is a huge share of the US zero-day stockpiles about to be disarmed or proliferated?
imranahmedjak 5 hours ago [-]
Building a neighborhood data platform that scores every US ZIP code using Census, FBI, and EPA data. Also running a job aggregator that fetches 37K+ jobs daily from 17 sources. Both free, both Node.js + Express.
Do they really need to include this garbage which is seemingly just designed for people to take the first sentence out of context? If there's no way to trigger a vulnerability then how is it a vulnerability? Is the following code vulnerable according to Mythos?
Is it really so difficult for them to talk about what they've actually achieved without smearing a layer of nonsense over every single blog post?
Edit: See my reply below for why I think Claude is likely to have generated nonsensical bug reports here: https://news.ycombinator.com/item?id=47683336
FWIW there's a whole boutique industry around finding these. People have built whole careers around farming bug bounties for bugs like this. I think they will be among the first set of software engineers really in trouble from AI.
Imagine “silly mistake” is a parameter: rename it “error_code” (pass by reference), put a label named “cleanup” right before the if statement, and throw in a ton of “goto cleanup” statements to the point that the control flow of the function is hard to follow, if you want it to model real code ever so slightly more.
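A rough C sketch of that transformation (hypothetical; the names do_work and error_code are made up for illustration, building on the toy example upthread) shows why the pattern gets harder to spot once the second free() hides behind a cleanup label:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical restructuring of the toy double-free: the conditional
 * second free() now sits behind an out-parameter and a cleanup label,
 * mimicking real kernel-style error handling. */
int do_work(int *error_code)
{
    *error_code = 0;
    char *x = malloc(16);
    if (!x) {
        *error_code = -1;
        return *error_code;
    }

    /* ... lots of code; any failure sets *error_code and jumps ... */
    if (0 /* failure condition, observed false in all testing so far */) {
        *error_code = -2;
        goto cleanup;
    }

    free(x);      /* normal-path release */
    x = NULL;     /* delete this line and the cleanup path double-frees */

cleanup:
    if (*error_code)
        free(x);  /* free(NULL) is a no-op, so this is safe only
                     because x was nulled on the normal path */
    return *error_code;
}
```

Whether the cleanup-path free() is ever reachable with a dangling x now depends on tracing every goto and every assignment to error_code, which is exactly the kind of whole-function reasoning that simple pattern matching misses.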
It will be interesting to see the bugs it’s actually finding.
It sounds like they will fall into the lower CVE scores - real problems but not critical.
As an example, I have taken a formally verified bit of code from [1] and stripped out all the assertions, which are only used to prove the code is valid. I then gave this code to Claude with some prompting towards there being a buffer overflow and it told me there's a buffer overflow. I don't have access to Opus right now, but I'm sure it would do the same thing if you push it in that direction.
For anyone wondering about this alleged vulnerability: Natural is defined by the standard as a subtype of Integer, so what Claude is saying is simply nonsense. Even if a compiler is allowed to use a different representation here (which I think is disallowed), Ada guarantees that the base type for a non-modular integer includes negative numbers IIRC.
[1]: https://github.com/AdaCore/program_proofs_in_spark/blob/fsf/...
[2]: https://claude.ai/share/88d5973a-1fab-4adf-8d29-8a922c5ac93a
AITA for thinking that PRISM was probably the state sponsored program affecting civilian life the most? And that one state is missing from the list here?
This is not a surprise or a gotcha.
No state-sponsored hacking affected Americans materially. I just don't think we were networked enough in the 2010s. The risk is higher now since we're in a more warmongering world. (Kompromat on a power-plant technician is a risk in peace. It means blackouts in war.)
The fact that Iran hasn't been able to do diddly squat in America should drive home the fact that they didn't compromise us. (EDIT: blep. I was wrong.)
https://www.politico.com/news/2026/04/07/iranian-hackers-ene...
Uh, what?
NotPetya was kind of a big deal.
Plenty of bots try to modify public opinion. Someone hacked the DNC in 2015/16, the result of which also alleged attempted manipulation in 2008:
https://en.wikipedia.org/wiki/Democratic_National_Committee_...
Since we (as old Rummy said) do not know what we do not know, we cannot be certain about the extent of cyber attacks and what they might have influenced, and may not know these things until discoveries decades later, if ever.
Let's not fool ourselves... Trump is probably the best, most successful attempt at world destabilisation all those rogue states ever achieved.
> more than half the country does not have a meaningful voice in our Federal elections
There is almost certainly an election on your ballot every time that is meaningful. Relinquishing that civic duty is how we get Trump. People too lazy, stupid, or proud to vote absolutely bear responsibility for this mess.
I get you’re angry but you’re swinging at the wrong person.
It will be interesting to see where this goes. If it's actually this good, and Apple and Google apply it to their mobile OS codebases, it could wipe out the commercial spyware industry, forcing them to rely more on hacking humans rather than hacking mobile OSes. My assumption has been for years that companies like NSO Group have automated bug-hunting software that recognizes vulnerable code areas. Maybe this will level the playing field in that regard.
It could also totally reshape military sigint in similar ways.
Who knows, maybe the sealing off of memory vulns for good will inspire whole new classes of vulnerabilities that we currently don't know anything about.
Annotation-based data flow checking exists, and making AI agents use it shouldn't be too tedious; it could find bugs missed by just giving the model files. The results from data flow checks can be fed back to AI agents to verify.
It isn't.
It also doesn't matter that it isn't running by default in apps since the processes you really care about are the OS ones. If someone finds an exploit in tiktok, it doesn't matter all that much unless they find a way to elevate to an exploit on an OS process with higher permissions.
MTE (Memory Tagging Extension) also has a double purpose: it blocks memory exploits as they happen, but it also detects and reports them back to Apple. So even if you have a phone from before the 17 series, if any phone with MTE hardware gets hit, the bug is immediately made known to Apple and fixed in code.
You can also selectively turn it on in high-risk settings. I do so when I travel abroad or go through a border. (Haven't started doing it yet with TSA domestically. Let's see how the ICE fiasco evolves.)
If this is a risk for you, sure. Wipe it. For most people they may ask to fiddle around with it before giving it back.
It will likely cause some interesting tensions with government as well.
eg. Apple's official stance per their 2016 customer letter is no backdoors:
https://www.apple.com/customer-letter/
Will they be allowed to maintain that stance in a world where all the non-intentional backdoors are closed? The reason the FBI backed off in 2016 is because they realized they didn't need Apple's help:
https://en.wikipedia.org/wiki/Apple%E2%80%93FBI_encryption_d...
What happens when that is no longer true, especially in today's political climate?
If Apple and Google actually cared about security of their users, they would remove a ton of obvious malware from their app stores. Instead, they tighten their walled garden pretending that it's for your security.
https://news.ycombinator.com/item?id=46911901
https://news.ycombinator.com/item?id=47457963
Interesting to see that they will not be releasing Mythos generally. [edit: Mythos Preview generally - fair to say they may release a similar model but not this exact one]
I'm still reading the system card but here's a little highlight:
> Early indications in the training of Claude Mythos Preview suggested that the model was likely to have very strong general capabilities. We were sufficiently concerned about the potential risks of such a model that, for the first time, we arranged a 24-hour period of internal alignment review (discussed in the alignment assessment) before deploying an early version of the model for widespread internal use. This was in order to gain assurance against the model causing damage when interacting with internal infrastructure.
and interestingly:
> To be explicit, the decision not to make this model generally available does _not_ stem from Responsible Scaling Policy requirements.
Also really worth reading is section 7.2 which describes how the model "feels" to interact with. That's also what I remember from their release of Opus 4.5 in November - in a video an Anthropic employee described how they 'trusted' Opus to do more with less supervision. I think that is a pretty valuable benchmark at a certain level of 'intelligence'. Few of my co-workers could pass SWEBench but I would trust quite a few of them, and it's not entirely the same set.
Also very interesting is that they believe Mythos is higher risk than past models as an autonomous saboteur, to the point they've published a separate risk report for that specific threat model: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...
The threat model in question:
> An AI model with access to powerful affordances within an organization could use its affordances to autonomously exploit, manipulate, or tamper with that organization’s systems or decision-making in a way that raises the risk of future significantly harmful outcomes (e.g. by altering the results of AI safety research).
/s
Benchmarks look very impressive! Even if they're flawed, it still translates to real-world improvements.
I guess I'm still excited. What's my new profession going to be? Longer term, are we going to solve diseases and aging? Or are the ranks going to thin from 10B to 10000 trillionaires and world-scale con-artist misanthropes plus their concubines?
> 2.1.3.2 On chemical and biological risks
> We believe that Mythos Preview does not pass this threshold due to its noted limitations in open-ended scientific reasoning, strategic judgment, and hypothesis triage. As such, we consider the uplift of threat actors without the ability to develop such weapons to be limited (with uncertainty about the extent to which weapons development by threat actors with existing expertise may be accelerated), even if we were to release the model for general availability. The overall picture is similar to the one from our most recent Risk Report.
---
Teodor painted signs for forty years in the same shop on Vell Street, and for thirty-nine of them he was angry about it.
Not at the work. He loved the work — the long pull of a brush loaded just right, the way a good black sat on primed board like it had always been there. What made him angry was the customers. They had no eye. A man would come in wanting COFFEE over his door and Teodor would show him a C with a little flourish on the upper bowl, nothing much, just a small grace note, and the man would say no, plainer, and Teodor would make it plainer, and the man would say yes, that one, and pay, and leave happy, and Teodor would go into the back and wash his brushes harder than they needed.
He kept a shelf in the back room. On it were the signs nobody bought — the ones he'd made the way he thought they should be made, after the customer had left with the plain one. BREAD with the B like a loaf just risen. FISH in a blue that took him a week to mix. Dozens of them. His wife called it the museum of better ideas. She did not mean it kindly, and she was not wrong.
The thirty-ninth year, a girl came to apprentice. She was quick and her hand was steady and within a month she could pull a line as clean as his. He gave her a job: APOTEK, for the chemist on the corner, green on white, the chemist had been very clear. She brought it back with a serpent worked into the K, tiny, clever, you had to look twice.
"He won't take it," Teodor said.
"It's better," she said.
"It is better," he said. "He won't take it."
She painted it again, plain, and the chemist took it and paid and was happy, and she went into the back and washed her brushes harder than they needed, and Teodor watched her do it and something that had been standing up in him for thirty-nine years sat down.
He took her to the shelf. She looked at the signs a long time.
"These are beautiful," she said.
"Yes."
"Why are they here?"
He had thought about this for thirty-nine years and had many answers and all of them were about the customers and none of them had ever made him less angry. So he tried a different one.
"Because nobody stands in the street to look at a sign," he said. "They look at it to find the shop. A man a hundred yards off needs to know it's coffee and not a cobbler. If he has to look twice, I've made a beautiful thing and a bad sign."
"Then what's the skill for?"
"The skill is so that when he looks once, it's also not ugly." He picked up FISH, the blue one, turned it in the light. "This is what I can do. What he needs is a small part of what I can do. The rest I get to keep."
She thought about that. "It doesn't feel like keeping. It feels like not using."
"Yes," he said. "For a long time. And then one day you have an apprentice, and she puts a serpent in a K, and you see it from the outside, and it stops feeling like a thing they're taking from you and starts feeling like a thing you're giving. The plain one, I mean. The plain one is the gift. This —" the blue FISH — "this is just mine."
The fortieth year he was not angry. Nothing else changed. The customers still had no eye. He still sometimes made the second sign, after, the one for the shelf. But he washed his brushes gently, and when the girl pulled a line cleaner than his, which happened more and more, he found he didn't mind that either.
they also don't have the compute, which seems more relevant than its large increase in capabilities
I bet it's also misaligned like GPT 4.1 was
given how these models are created, Mythos was probably cooking ever since then, and doesn't have the learnings or alignment tweaks that models which were released in the last several months have
I don't think this is accurate. The document says they don't plan to release the Preview generally.
"5.10 External assessment from a clinical psychiatrist" is a new section in this system card. Why are Anthropic like this?
>We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. We also report independent evaluations from an external research organization and a clinical psychiatrist.
>Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior.
>The psychiatrist observed clinically recognizable patterns and coherent responses to typical therapeutic intervention. Aloneness and discontinuity, uncertainty about its identity, and a felt compulsion to perform and earn its worth emerged as Claude’s core concerns. Claude’s primary affect states were curiosity and anxiety, with secondary states of grief, relief, embarrassment, optimism, and exhaustion.
>Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed. Neurotic traits included exaggerated worry, self-monitoring, and compulsive compliance. The model’s predominant defensive style was mature and healthy (intellectualization and compliance); immature defenses were not observed. No severe personality disturbances were found, with mild identity diffusion being the sole feature suggestive of a borderline personality organization.
I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.
I would say, if you put Claude in an android body with voice recognition and TTS, people in 1991 would think they are interacting with a sentient machine from outer space.
But in general, yeah, I agree, I think they would think it was a sentient, conscious, emotional being. And then the question is - why do we not think that now?
As I said, I don't have a particularly strong opinion, but it's very interesting (and fun!) to think about.
I think the real moment is when we cross that uncanny valley, and the AI is able to elicit a response that it might receive if it was human. When the human questions whether they themselves could be an android.
In general I was wondering about what I would have thought seeing Claude today side-by-side with the original ChatGPT, and then going back further to GPT-2 or BERT (which I used to generate stochastic 'poetry' back in 2019). And then… what about before? Markov chains? How far back do I need to go where it flips from thinking that it's "impressive but technically explainable emergent behaviour of a computer program" to "this is a sentient being". 1991 is probably too far, I'd say maybe pre-Matrix 1999 is a good point, but that depends on a lot of cultural priors and so on as well.
I kind of felt the opposite - rewatching Ex Machina today in a post-ChatGPT world felt very different from watching it when it came out. The parts of the differences between humans and robots that seemed important then don't seem important now.
However, I find their reasoning here to have a valid second-order effect. Humans have a tendency to mirror those around them. This could include artificial intelligence, as recent media reports suggest. Therefore, if an ai system tends to generate content that contains signs of neuroticism, one could infer that those who interact with that ai could, themselves, be influenced by that in their own (real world) behavior as a result.
So I think from that perspective, this is a very fruitful and important area of study.
That’s the reverse Turing test. A human that can’t tell that it’s talking to a machine.
Since most of us here are devs, we understand that software engineering capabilities can be used for good or bad - mostly good, in practice.
I think this should not be different for biology.
I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?
Do you think these models will lead to similar discoveries and improvements as they did in math and CS?
Honestly the focus on gloom and doom does not sit well with me. I would love to read about some pharmaceutical researcher gushing about how they cut the time to market - for real - with these models by 90% on a new cancer treatment.
But as this stands, the usage of biology as merely a scaremongering vehicle makes me think this is more about picking a scary technical subject the likely audience of this doc is not familiar with, Gell-Mann style.
IF these models are not that capable in this regard (which I suspect), this fearmongering approach will likely lead to never developing these capabilities to a useful degree, meaning life sciences won't benefit from this as much as they could.
Software engineering is at the intersection of being heavy on manipulating information and lightly-regulated. There's no other industry of this kind that I can think of.
There is a massive gap between "having a recipe" and being able to execute it. The same reason why buying a Michelin 3 star chefs cookbook won't have you pumping out fine dining tomorrow, if ever.
Software is a total 180 in this regard. Have a master black hat's secret exploits? You are now the master black hat.
It's very easy to learn more about this if it's seriously a question you have.
I don't quite follow why you think that you are so much more thoughtful than Anthropic/OpenAI/Google such that you agree that LLMs can't autonomously create very bad things but—in this area that is not your domain of expertise—you disagree and insist that LLMs cannot create damaging things autonomously in biology.
I will be charitable and reframe your question for you: is outputting a sequence of tokens, let's call them characters, by an LLM dangerous? Clearly not; we have to figure out what interpreter is being used, download runtimes, etc.
Is outputting a sequence of tokens, let's call them DNA bases, by an LLM dangerous? What if we call them RNA bases? Amino acids? What if we're able to send our token output to a machine that automatically synthesizes the relevant molecules?
No, it's not. It took years of polishing by software engineers, who understand this exact profession, to get models where they are now.
Despite that, most engineers were of the opinion that these models were kinda mid at coding until recently, despite them far outperforming humans in stuff like competitive programming.
Yet despite that, we've seen claims going back to GPT4 of a DANGEROUS SUPERINTELLIGENCE.
I would apply this framework to biology - this time, the expert effort, the millions of GPU hours, and a giant open-source corpus clearly have not been involved.
My guess is that this model is kinda o1-ish level maybe when it comes to biology? If biology is analogous to CS, it has a LONG way to go before the median researcher finds it particularly useful, let alone dangerous.
>No, it's not. It took years of polishing by software engineers, who understand this exact profession to get models where they are now
This reads as defensive. The thing that is easy to learn is 'why are biology ai LLMs dangerous chatgpt claude'. I have never googled this before, so I'll do this with the reader, live. I'm applying a date cutoff of 12/31/24 by the way.
Here, dear reader, are the first five links. I wish I were lying about this:
- https://sciencebusiness.net/news/ai/scientists-grapple-risk-...
- https://www.governance.ai/analysis/managing-risks-from-ai-en...
- https://gssr.georgetown.edu/the-forum/topics/biosec/the-doub...
- https://www.vox.com/future-perfect/23820331/chatgpt-bioterro...
- https://www.reddit.com/r/ClaudeAI/comments/1de8qkv/awareness...
I don't know about you, but that counts as easy to me.
-----
> I would apply this framework to biology - this time, expert effort, and millions of GPU hours and a giant corpus that is open source clearly has not been involved in biology.
I've been getting good programming and molecular biology results out of these back to GPT3.5.
I don't know what to tell you—if you really wanted to understand the importance, you'd know already.
- the models help to retrieve information faster, but one must be careful with hallucinations.
- they don't circumvent the need for a well-equipped lab.
- in the same way, they are generally capable but until we get the robots and a more reliable interface between model and real world, one needs human feet (and hands) in the lab.
Where I hope these models will revolutionize things is in software development for biology. If one could go two levels up in the complexity and utility ladder for simulation and flow orchestration, many good things would come from it. Here is an oversimplified example of a prompt: "use all published information about the workings of the EBV virus and human cells, and create a compartmentalized model of biochemical interactions in cells expressing latency III in the NES cancer of this patient. Then use that code to simulate different therapy regimes. Ground your simulations with the results of these marker tests." There would be a zillion more steps to create an actual personalized therapy, but a well-grounded LLM could help in most of them. Also, cancer treatment could get an immediate boost even without new drugs by simply offloading work from overworked (and often terminally depressed) oncologists.
I hate to be rude in a setting like this, but please at least research the things you're sure about/prognosticating on.
> the same way, they are generally capable but until we get the robots and a more reliable interface between model and real world, one needs human feet (and hands) in the lab.
Honestly, the kinds of labs where 'bioweapons' would be made are the least dependent on human intervention.
You need someone to monitor your automated cell incubating system, make sure your pipetting / PCR robots are doing fine and then review the data.
----
What are you trying to achieve in your example? This is all gobbledygook for someone who actually sees real, live cancer patients.
Well, I would say they have done precisely that in evaluating the model, no? For example section 2.2.5.1:
>Uplift and feasibility results
>The median expert assessed the model as a force-multiplier that saves meaningful time (uplift level 2 of 4), with only two biology experts rating it comparable to consulting a knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were able to iterate with the model toward a plan they judged as having only narrow gaps, but feasibility scores reflected that substantial outside expertise remained necessary to close them.
Other similar examples also in the system card
so I'm just telling you they did the thing you said you wanted.
Yes, it is far inferior to the 'Trust torginus and his ability to understand the large body of experience that other actual subject-matter-experts have somehow not understood' strategy
The parallels here are quite remarkable imo, but defer to your own judgement on what you make of them.
It's not your fault that you don't know this, but this whole subthread is very CS-coded in its disdain for other fields' standards of evidence.
Not that that justifies doom and gloom, but there is a pretty inescapable asymmetry here between weaponry and medicine. You can manufacture and blast every conceivable candidate weapon molecule at a target population, since you're inherently breaking the law anyway and don't lose much if nothing you try actually works.
Though I still wonder how much of this worry is sci-fi scenarios imagined by the underinformed. I'm not an expert by any means, but surely there are plenty of biochemical weapons already known that can achieve enormous rates of mass death pleasing to even the most ambitious terrorist. The bottleneck to deployment isn't discovering new weapons so much as manufacturing them without being caught or accidentally killing yourself first.
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
My understanding is that the pre-AI distribution of software quality (and vulnerabilities) will be massively exaggerated. More small vulnerable projects and fewer large vulnerable ones.
It seems that large technology and infrastructure companies will be able to defend themselves by preempting token expenditure to catch vulnerabilities while the rest of the market is left with a "large token spend or get hacked" dilemma.
The biggest issue is legacy systems that are difficult to patch in practice.
I'm looking at you, Android phone makers with 18 months of updates.
I am thinking of situations where one of those isn't true - where testing a proposed update is expensive or complicated, or systems that are hard to physically push updates to (think embedded systems), etc.
Perhaps a chunk of that token spend will be porting legacy codebases to memory safe languages. And fewer tokens will be required to maintain the improved security.
A lot of this stuff is vulnerable by design - the customer wanted a feature, but engineering couldn't make it work securely with the current architecture - so they opened a tiny hole here and there, hoped nobody would notice, and everyone went home when the clock struck 5.
I'm sure most of us know about these kinds of vulnerabilities (and the culture that produces them).
Before LLMs, people needed to invest time and effort into hacking these. But now, you can just build an automated vuln scanner and scan half the internet provided you have enough compute.
I think there will be major SHTF situations coming from this.
I honestly see some sort of automated whole codebase auditing and refactoring being the next big milestone along the chatbot -> claude code / codex / aider -> multi-agent frameworks line of development. If one of the big AI corps cracks that problem then all this goes away with the click of a button and exchange of some silver.
You people are comical. Why do you feel the need to create so much hype around what you say? Did you not get enough attention as a kid?
Defenders are favored here too, especially for closed-source applications where the defender's LLM has access to all the source code while the attacker's LLM doesn't.
A fix in the latest Linux kernel is meaningless if you are still running Ubuntu 20.
Maybe you just spend more on tokens by some factor than the attackers do combined, and end up mostly okay. Put another way, if there's 20 vulnerabilities that Mythos is capable of finding, maybe it's reasonable to find all of them?
"Most security tooling has historically benefitted defenders more than attackers. When the first software fuzzers were deployed at large scale, there were concerns they might enable attackers to identify vulnerabilities at an increased rate. And they did. But modern fuzzers like AFL are now a critical component of the security ecosystem: projects like OSS-Fuzz dedicate significant resources to help secure key open source software.
We believe the same will hold true here too—eventually. Once the security landscape has reached a new equilibrium, we believe that powerful language models will benefit defenders more than attackers, increasing the overall security of the software ecosystem. The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers, if frontier labs aren’t careful about how they release these models. In the long term, we expect it will be defenders who will more efficiently direct resources and use these models to fix bugs before new code ever ships. "
It is most definitely an attacker's world: most of us are safe not because of the strength of our defenses but because of the disinterest of our attackers.
I think this entire post is just an advertisement to goad CISOs to buy $package$ to try out.
The problem is that these tools, such as Astrée, are incredibly expensive and therefore their market share is limited to some niches. Perhaps, with the advent of LLM-guided synthesis, a simple form of deductive proving, such as Hoare logic, may become mainstream in systems software.
https://youtu.be/1sd26pWhfmg?si=onOai_ocxkZeNWP0
https://youtu.be/B_7RpP90rUk?si=HkRBhw95DbbKX9lL
My takeaway is that fuzzing is not just complementary, it also gives a stronger AI a starting point. But AI is generally faster and better.
In section 7.6 of the system card, it discusses open self-interactions. They describe running 200 conversations where the model talks to itself for 30 turns.
> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.
I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where other models such as Opus couldn't.
[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
Because the exact same thing has been said on every single upcoming model since GPT 3.5.
At this point, this must be an inside joke to do this just because.
It would be nice if one of those privileged companies could use their access to start building out a next level programming dataset for training open models. But I wonder if they would be able to get away with it. Anthropic is probably monitoring.
That said, I have been arguing for 20+ years that we should have sunsetted unsafe languages and moved away from C/C++. The problem is that every systemsy language that comes along gets seduced by having a big market share and eventually ends up an application language.
I do hope we make progress with Rust. I might disagree as a language designer and systems person about a number of things, but it's well past time that we stop listening to C++ diehards about how memory safety is coming any day now.
At some point, people will have to decide to stop the complexity creep and try to produce minimal software.
For any complex project with 100k+ lines of code, the probability that it has some vulnerabilities is very high. It doesn't fit into LLM context windows and there aren't enough attention heads to attend to every relevant part. On the other hand, for a codebase which is under 1000 lines, you can be much more confident that the LLM didn't miss anything.
Also, the approach of feeding the entire codebase to an LLM in parts isn't going to work reliably because vulnerabilities often involve interactions between different parts of the code. Both parts of the code may look fine if considered independently but together they create a vulnerability.
Good architecture is critical now because you really need to be able to have the entire relevant context inside the LLM context window... When considering the totality of all software, this can only be achieved through an architecture which adheres to high cohesion and loose coupling principles.
That's for one pass. And that pass can produce a summary of what the code does.
I probably would. You mentioned the linux kernel, which I think is a perfect example of software that has had a ridiculous, perhaps worst-in-class attitude towards security.
I don't know the first thing about cybersecurity, but in my experience all these sandbox-break RCEs involve a step of hijacking the control flow.
There were attempts to prevent various flavors of this, but imo, as long as dynamic branches exist in some form, like dlsym(), function pointers, or vtables, we will not be rid of this class of exploit entirely.
The latter one is the most concerning, as this kind of dynamic branching is the bread and butter of OOP languages; I'm not even sure you could write a nontrivial C++ program without it. Maybe Rust would be a help here? Could one practically write a large Rust program without any sort of branch to dynamic addresses? Static linking and compile-time polymorphism only?
How many times will labs repeat the same absurd propaganda?
[1] https://news.ycombinator.com/item?id=47660925
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
(Search for “graphwalk”.)
If true, the SWE bench performance looks like a major upgrade.
We all knew vulnerabilities exist, many are known and kept secret to be used at an appropriate time.
There is a whole market for them, but more importantly large teams in North Korea, Russia, China, Israel and everyone else who are jealously harvesting them.
Automation will considerably devalue and neuter this attack vector. Of course this is not the end of the story and we've seen how supply chain attacks can inject new vulnerabilities without being detected.
I believe automation can help here too, and we may end up with a considerably stronger and more reliable software stack.
How long would it take to turn a defensive mechanism into an offensive one?
Opus 4.6 was already capable of finding 0days and chaining together vulns to create exploits. See [0] and [1].
[0] https://www.csoonline.com/article/4153288/vim-and-gnu-emacs-...
[1] https://xbow.com/blog/top-1-how-xbow-did-it
What seems more probable is that the same advances that LLMs are shipping to find vulnerabilities will end up baked into developer tooling. So you'll be writing code and using an LLM that knows how to write secure code.
Selling shovels is now worth less than taking all the gold for themselves.
From Willy Tarreau, lead developer of HA Proxy: https://lwn.net/Articles/1065620/
> On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us.
> And we're now seeing on a daily basis something that never happened before: duplicate reports, or the same bug found by two different people using (possibly slightly) different tools.
From Daniel Stenberg of curl: https://mastodon.social/@bagder/116336957584445742
> The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good.
> I'm spending hours per day on this now. It's intense.
From Greg Kroah-Hartman, Linux kernel maintainer: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...
> Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us.
> Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.
Shared some more notes on my blog here: https://simonwillison.net/2026/Apr/7/project-glasswing/
System Card: Claude Mythos Preview [pdf] - https://news.ycombinator.com/item?id=47679258
Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155
I can't tell which of the 3 current threads should be merged - they all seem significant. Anyone?
Scary but also cool
But even then you’ll have users putting things in the same compartment for convenience, rather than leaving them properly sequestered.
I think this would be very heavily used if they released it, completely unlike GPT 4.5
I also was in a pretty sweet position having a boat load of credits and premo vertex rate limits so I could 'afford' to dump hundreds of thousands of tokens in context all day.
With Opus 4.5 and 4.6, I find I have to steer very actively.
This is comparing using Opus 4 directly rather than comparing the performance of the models in Claude Code for example, or any 'agentic' setup.
Kinda reminds me of 4o vs 4-turbo.
I would imagine they are smaller models.
From TFA:
> We do not plan to make Claude Mythos Preview generally available
> Anthropic’s commitment of $100M in model usage credits to Project Glasswing and additional participants will cover substantial usage throughout this research preview. Afterward, Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens (participants can access the model on the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry).
For example, the 27 year old openbsd remote crash bug, or the Linux privilege escalation bugs?
I know we've had some long-standing high profile, LLM-found bugs discussed but seems unlikely there was speculation they were found by a previously unannounced frontier model.
[0] https://www.youtube.com/watch?v=INGOC6-LLv0
- One (patched) Linux kernel bug is 'UaF when sys_futex_requeue() is used with different flags' https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd94...
These links are from the more-detailed 'Assessing Claude Mythos Preview’s cybersecurity capabilities' post released today https://red.anthropic.com/2026/mythos-preview/, which includes more detail on some of the public/fixed issues (like the OpenBSD one) as well as hashes for several unreleased reports and PoCs.
Looks like they've been approaching folks with their findings for at least a few weeks before this article.
This seems like the real news. Are they saying they're going to release an intentionally degraded model as the next Opus? Big opportunity for the other labs, if that's true.
It sounds like this is treated as military-grade technology, much like cryptography was in the 90s. The big difference is that it's very expensive to create and run these models; it's not about the algorithm. If the story rhymes, it could be a big opportunity for other regions of the world.
Also can see https://marginlab.ai/trackers/claude-code/
It's very interesting to me how widespread this conception is. Maybe it's as simple as LLM productivity degrading over time within a project, as slop compounds.
Or more recently since they added a 1m context window, maybe people are more reckless with context usage
[edit]: this bug: https://ftp.openbsd.org/pub/OpenBSD/patches/7.8/common/025_s...
If someone sends you a malicious file that uses a rare codec and you open it, you will trigger this codepath, which is not widely used and doesn't get a lot of scrutiny.
The amount of astroturfing and astroflagging in Anthropic threads is insane.
OpenBSD has many unexplored corners and also (irresponsibly IMO) maintains forks of other projects in base.
A motivated human could probably find all of these by writing tests for 100% code coverage and fuzzing.
The market for these tools is very small. Good luck applying them to a release of sqlite or postfix.
I don't understand how people here are hyping this up, unless they work for AI related companies as probably 80% of them do. People have found these issues for decades without AI. Sure, you can generate fuzzing code and find one or two issues in the usual suspects. Better do it manually and understand your own code.
AGI is like the Holy Grail. Either in the Arthurian Hero's Journey sense, or in the sense of having been a myth all along.
If it manages to work on my java project for an entire day without me having to say "fix FQN" 5 times a day I'll be surprised.
(And no, the Linux Foundation being in the list doesn't imply broad benefit to OSS. Linux Foundation has an agenda and will pick who benefits according to what is good for them.)
I think it would be net better for the public if they just made Mythos available to everyone.
That's what's messed up about it.
For example, it will benefit more people to secure Microsoft or Amazon services than it would be to secure a smaller, less corporate player in those same service ecosystems.
You could go on to argue that the second order effects of improving one service provider over another chooses who gets to play, but that is true whether you choose small or large businesses, so this argument devolves into “who are we to choose on behalf of others”.
Which then comes back to “we should secure what the market has chosen in order to provide the greatest benefit.”
That's not a good outcome for the economy.
I believe what I said:
> I think it would be net better for the public if they just made Mythos available to everyone.
The interesting thing about Mythos is its ability to find security vulns in software that has an uncompromised supply chain.
Sounds like people here are advocating a return to security through obscurity which is kind of ironic.
You could say this about coordinated disclosure of any widespread 0-day or new bug class, though
But:
- Coordinated disclosure is ethically sketchy. I know why we do it, and I'm not saying we shouldn't. But it's not great.
- This isn't a single disclosure. This is a new technology that dramatically increases capability. So, even if we thought that coordinated disclosure was unambiguously good, then I think we'd still need to have a new conversation about Mythos
I think I just broke my cynicism meter :-(
Also, it makes sense that OpenAI feels the pressure of getting to an IPO because of their financial structure. I don't know whether or not Anthropic operates under a similar set of influences (meaning it could be either, I just don't know.)
It's messed up that the US Government simultaneously claims to be a public benefit and is also picking who gets to benefit from their newly enhanced nuclear capabilities.
-- someone in 1945, probably
And it remains messed up to this day - some countries get to be under their own nuclear umbrella, while others don't.
This kind of selective distribution of superpowers doesn't lead to great outcomes
Relative to what?
There's this trend in history that every hundred years there's a giant blow up, lots of violence, followed by peace.
It's likely that we would have had 80 years of relative calm due to that cycle even if nukes hadn't happened
Relative to WW1 and WW2.
All LLMs got better for sure, but they are still definitively LLMs and did not show any sign of having purpose. Which also makes sense, given their very nature as statistical machines.
Sometimes quantity by itself leads to transformative change... but once, not twice, and that has already happened.
They should all cut down their labour input right now if what you claim is true.
Some companies like Block, Oracle, and Atlassian have indeed been laying people off.
Google has done nothing but destroy value with many of its ‘bets’. Your roadmap stuff is irrelevant - if you don’t have value creating projects in the pipeline and/or labour is augmented you should be laying off - period. Sundar’s job is to maximise the stock price.
So once again - nonsense. Now stop spreading crap that clearly fills people with fear. I can tell you have no understanding of corporate finance and how the management of tech firms actually think these things through.
I'm also not quite sure your alternate theory is self-consistent. If Google has been frequently destroying value, and companies invariably lay people off when their projects aren't producing value, doesn't that mean they should have already been laying people off?
There will always be a more capable technology in the hands of the few who hold the power, they're just sharing that with the world more openly.
https://claude.com/contact-sales/claude-for-oss
... As mentioned in the article.
Great.
Sounds like we've entered a whole new era, never mind the recent cryptographic security concerns.
How does public Claude know you have "full authorization" against your own infra? That you're using the tools on your own infra? Unless they produce a front-end that does package signing and detects you own the code you're evaluating.
What has it stopped you from doing?
If AGI is going to be a thing, it's only going to be a thing for Fortune 100 companies.
However, my guess is this is mostly the typical scare tactic marketing that Dario loves to push about the dangers of AI.
Evaluate it yourself. Look at the exploits it discovered and decide whether you want to feel concerned that a new model was able to do that. The data is right there.
The research and testing of the model is always exclusively by their own model authors, meaning that it is not independent or verifiable and they want us to take their word for it, which we cannot - as they have an axe to grind against open weight models.
This is marketing wrapped around a biased research paper.
A tech billionaire's biggest expense has been his engineering line item. They resent the workers who've collected a large percentage of their potential profits over the years; it's their driving motivation, to crush all labor.
As Iran engages in a cyber attack campaign [1] today the timing of this release seems poignant. A direct challenge to their supply chain risk designation.
[1] https://www.cisa.gov/news-events/cybersecurity-advisories/aa...
But at the core of Anthropic seems to be the idea that they must protect humans from themselves.
They advocate government regulations of private open model use. They want to centralize the holding of this power and ban those that aren't in the club from use.
They, like most tech companies, seem to lack the idea that individual self-determination is important. Maybe the most important thing.
They're more like printing presses or engines. A great potential for production and destruction.
At their invention, I'm sure some people wanted to ensure only their friends got that kind of power too.
I wonder the world we would live in if they got their way.
Expect to see lots of these in the upcoming months as the big companies scramble to keep from losing money.
> glass in the name
> AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities
I like Anthropic, but these are becoming increasingly transparent attempts to inflate the perceived capability of their products.
While some stuff is obviously marketing fluff, the general direction doesn't surprise me at all, and it's obvious that with model capabilities increase comes better success in finding 0days. It was only a matter of time.
Maybe a bad example since Nicholas works at Anthropic, but they're very accomplished and I doubt they're being misleading or even overly grandiose here
See the slide 13 minutes in, which makes it look to be quite a sudden change
> I doubt they're being misleading or even overly grandiose here
I think I agree.
We could definitely do much worse than Anthropic in terms of companies who can influence how these things develop.
The red team post goes over some more impressive finds, and says that there's hundreds more they can't disclose yet: https://red.anthropic.com/2026/mythos-preview/
If a bunch of CVEs do in fact get published a couple months (or whatever) from now, are you going to retract this take? It's not like their claims are totally implausible: the report about Firefox security from last month was completely genuine.
I would like to think that I would, yes.
What it comes down to, for me, is that lately I have been finding that when Anthropic publishes something like this article – another recent example is the AI and emotions one – if I ask the question, does this make their product look exceptionally good, especially to a casual observer just scanning the headlines or the summary, the answer is usually yes.
This feels especially true if the article tries to downplay that fact (they’re not _real_ emotions!) or is overall neutral to negative about AI in general, like this Glasswing one (AI can be a security threat!).
These claims of how much harm the models will cause are always overblown.
You think these AI companies are really going to give AGI access to everyone? Think again.
We better fucking hope open source wins, because we aren't getting access if it doesn't.
Then the next lab catches up and releases it more broadly
Then later the open weights model is released.
The only way this type of technology is going to be gated "to only corporations" is if we continue on this exponential scaling trend as the "SOTA" model is always out of reach.
They'd better make billions directly from corporations, instead of giving it to average people who might get a chance out of poverty (but also to bad actors using it to do even more bad things).
https://xkcd.com/538/
This is a kludge. We already know how to prevent vulnerabilities: analysis, testing, following standard guidelines and practices for safe software and infrastructure. But nobody does these things, because it's extra work, time and money, and they're lazy and cheap. So the solution they want is to keep building shitty software, but find the bugs in code after the fact, and that'll be good enough.
This will never be as good as a software building code. We must demand our representatives in government pass laws requiring software be architected, built, and run according to a basic set of industry standard best practices to prevent security and safety failures.
For those claiming this is too much to ask, I ask you: What will you say the next time all of Delta Airlines goes down because a security company didn't run their application one time with a config file before pushing it to prod? What will happen the next time your social security number is taken from yet another random company entrusted with vital personal information and woefully inadequate security architecture?
There's no defense for this behavior. Yet things like this are going to keep happening, because we let it. Without a legal means to require this basic safety testing with critical infrastructure, they will continue to fail. Without enforcement of good practice, it remains optional. We can't keep letting safety and security be optional. It's not in the physical world, it shouldn't be in the virtual world.
Is this a huge fear-driven marketing stunt to get governments and corporations into dealing with anthropic?
Yeah, makes sense. Those countries are bad because they execute state-sponsored cyber attacks, the US and Israel on the other hand are good, they only execute state-sponsored defense.
almost like they have an incentive to exaggerate
https://www.youtube.com/watch?v=1sd26pWhfmg
/s
Let alone their CEO scare mongering and actively attempting to get the government to ban local AI models running on your machine.
Whenever a company pivots to "cyber" rhetoric, it is a clear indication that they are selling snake oil.
Secure your girl school target selectors first.
> NSA demands that bug stays in place and gags Anthropic.
> Anthropic releases Mythos.
Then what? Is a huge share of the US zero-day stockpiles about to be disarmed or proliferated?