The cool thing about LLMs is that once a capability is "good enough", you can always "chain" calls together for better overall results. On the client side this means "write an API that does x y z" -> "analyse this API for security concerns" -> "PoC for each finding from this report" -> "fix this code according to these verified claims".
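A minimal sketch of that client-side chain in Python. It assumes a hypothetical `llm(prompt) -> str` callable standing in for whatever model client you use; no real provider API is implied. Each step's output is fed into the next prompt:

```python
from typing import Callable

# Hypothetical stand-in for a real model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def security_chain(llm: LLM, task: str) -> dict[str, str]:
    """Write -> audit -> PoC -> fix, feeding each step's output into the next prompt."""
    code = llm(f"Write an API that does {task}")
    report = llm(f"Analyse this API for security concerns:\n{code}")
    pocs = llm(f"Write a PoC for each finding from this report:\n{report}")
    fixed = llm(
        "Fix this code according to these verified claims:\n"
        f"{code}\n--- findings ---\n{report}\n--- PoCs ---\n{pocs}"
    )
    return {"code": code, "report": report, "pocs": pocs, "fixed": fixed}


if __name__ == "__main__":
    # Offline fake model: echoes the first line of each prompt so the sketch runs.
    def fake_llm(prompt: str) -> str:
        return f"<output for: {prompt.splitlines()[0]}>"

    for step, output in security_chain(fake_llm, "x y z").items():
        print(f"{step}: {output}")
```

In practice you'd only pass findings whose PoC actually reproduced into the last step; that filtering is left out to keep the sketch short.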
On the "server side" (i.e. training) you can use current-gen models to improve the training data by running many parallel environments with a loop similar to the one above. Then incorporate the new data and repeat. It's reminiscent of the old GAN approach, where the generator and discriminator are trained together in an adversarial regime. The end result should be safer code on "vanilla" prompts: "Write an API that does x y z" should now reflect the learnings from this loop, and the models should produce better code.
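The GAN comparison is an analogy; the simplest concrete version of this loop is generate, verify, keep only what passes, retrain, repeat (close to what is often called rejection-sampling fine-tuning or expert iteration). A toy Python sketch, where `generate`, `verify` and `train` are hypothetical placeholders for the current model, an automated checker (tests, scanners, a PoC harness) and a training run:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    prompt: str
    completion: str


# Hypothetical interfaces, not any real training API.
Generate = Callable[[str], str]             # current model
Verify = Callable[[str, str], bool]         # tests / scanner / PoC harness
Train = Callable[[list[Sample]], Generate]  # training run -> next model


def improvement_loop(
    generate: Generate,
    verify: Verify,
    train: Train,
    prompts: list[str],
    rounds: int,
) -> tuple[Generate, list[Sample]]:
    dataset: list[Sample] = []
    for _ in range(rounds):
        # In a real setup these would be many parallel environments.
        for prompt in prompts:
            completion = generate(prompt)
            if verify(prompt, completion):  # keep only outputs that pass the checker
                dataset.append(Sample(prompt, completion))
        generate = train(dataset)  # next-gen model trained on the grown dataset
    return generate, dataset
```

Here `verify` is held fixed for simplicity; in the GAN-like framing above, the critic model would improve across rounds too.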
This works really well for any verifiable scenario. And as the models get better, they can also more reliably create environments that closely match real-world scenarios. If you also have some data from human devs (say, from running a subsidised coding model for a few months), even better.
An example of turning a "normal" repo into a verifiable environment, which I read about recently on the Cursor blog: take a repo, ask an LLM to remove a feature, verify that the app still works w/o the feature, and verify that the tests for that feature fail. Then ask a generator to "add feature x" and verify with the original tests. If they pass -> give carrot :)
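The recipe above can be sketched roughly as follows. This is a toy reconstruction, not Cursor's code: a repo is modelled as a dict of files, tests are plain predicates, and `remove_feature` / `generator` are hypothetical stand-ins for the LLM calls. One design choice to flag: it requires every feature test to fail after removal, since a test that still passes isn't really testing the feature.

```python
from typing import Callable, Optional

Repo = dict[str, str]          # toy repo: path -> file contents
Test = Callable[[Repo], bool]  # a test inspects the repo and passes or fails


def all_pass(repo: Repo, tests: list[Test]) -> bool:
    return all(test(repo) for test in tests)


def build_env(
    repo: Repo,
    remove_feature: Callable[[Repo], Repo],  # hypothetical LLM "remove feature x" step
    feature_tests: list[Test],
    other_tests: list[Test],
) -> Optional[Repo]:
    """Return the stripped repo if it makes a valid environment, else None."""
    stripped = remove_feature(repo)
    if not all_pass(stripped, other_tests):  # app must still work without the feature
        return None
    if any(test(stripped) for test in feature_tests):  # every feature test must now fail
        return None
    return stripped


def reward(
    stripped: Repo,
    generator: Callable[[Repo], Repo],  # hypothetical LLM "add feature x" step
    feature_tests: list[Test],
    other_tests: list[Test],
) -> int:
    """1 (the carrot) if the re-added feature passes the original tests, else 0."""
    attempt = generator(stripped)
    return int(all_pass(attempt, feature_tests + other_tests))
```

Running the full original suite in `reward` (not just the feature tests) also penalises a generator that re-adds the feature by breaking something else.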
The key is composition. Once you unlock a new capability, that gets implemented and incorporated into the next training run. Pretty neat, I would say, and the main driver for the recent increase in the breadth of capabilities for new models.
xeyownt 32 minutes ago
Nice writeup: a practical example covering the project, what was found, how it was found and the quality of the findings, all reproducible.
vachanmn123 2 hours ago
> Trying to work around Anthropic blocking security-related prompts does get pretty tiring though.
Didn't know this was a thing... Interesting that a company marketing their Mythos so hard doesn't allow security prompts.
I'm also curious how the cheaper Chinese models do. I have an Opencode Go plan, so I'll let 'em rip over the weekend; hopefully I get to see a few bugs!
keybored 8 minutes ago
I don’t really care about a post trumpeting 20 bugs in bold when it comes to a hobby project. (In before “Linux was just a hobby project”.) No need to write LLM posts about what this tells us about the trajectory of society, oh my.
We can save that dialogue for finding bugs in widely used projects.
Edit: Something I tried to post in reply to a now-dead top-level comment here: whoever claims that new accounts alone are a signal of submission-boosting comments etc. needs to update their heuristics.
shandilyaharsh 2 hours ago
Sometimes I feel Mythos is just that: a myth.
dv_dt 26 minutes ago
The "too many security issues" meme feels like a form of product-placement marketing. How many of the bugs found would also have been found if you told a security team: OK, you now have a project with dedicated time to go spelunking for bugs, and this is your highest priority for the next month? Now imagine a bunch of security teams across multiple organizations in the industry doing that at the same time. What is the actual difference with and without Mythos? The brilliant part is that those discoveries now carry an Anthropic Mythos tag.
Even if it is marketing, at least there are some positive side effects in the form of identified and closed security flaws.
nullbio 1 hour ago
Agreed. Considering Anthropic had a sandbox bypass vulnerability in CC for a year, silently patched it, and still hasn't made a disclosure statement, no one on Earth should trust them or believe a word they say. https://www.securityweek.com/anthropic-silently-patches-clau...