Some insider knowledge: Lilli was, at least a year ago, internal only. VPN access, SSO, all the bells and whistles, required. Not sure when that changed.
McKinsey requires hiring an external pen-testing company to launch even to a small group of coworkers.
I can forgive this kind of mistake on the part of the Lilli devs. A lot of things have to fail for an "agentic" security company to even find a public endpoint, much less start exploiting it.
That being said, the mistakes in here are brutal. Seems like close to 0 authz. Based on very outdated knowledge, my guess is a Sr. Partner pulled some strings to get Lilli to be publicly available. By that time, much/most/all of the original Lilli team had "rolled off" (gone to client projects) as McKinsey HEAVILY punishes working on internal projects.
So Lilli likely was staffed by people who couldn't get staffed elsewhere, didn't know the code, and didn't care. Internal work, for better or worse, is basically a half day.
This is a failure of McKinsey's culture around technology.
OptionOfT 19 minutes ago [-]
Couple of things to add:
McKinsey has a weird structure where there are too many cooks in the kitchen.
Everybody there is reviewed on client impact, meaning it ends up being an everybody-for-themselves situation.
So as a developer you have little guidance (in fact, you're still being reviewed on client impact, even if you have 0 client exposure).
Then a (Senior) Partner comes in with this idea (that will get them a good review), and you jump on that. After all, it's all you can do to get a good review.
You work on it, and then the (Senior) Partner moves on. But it's not done. It's enough for the review, but continuing to work on it doesn't bring you anything, in fact, it will actually pull you down, as finishing the project doesn't give immediate client results.
So what does this mean? Most products of McKinsey are a grab-bag of raw ideas of leadership, implemented as a one-off, without a cohesive vision or even a long-term vision at all. It's all about the review cycle.
McKinsey is trying to do software like they do their other engagements. It doesn't work. You can't just do something for 6 months and then let it go. Software rots.
The fact that they laid off a good amount of (very good) software engineers in 2024 is a reflection on how they see software development.
And McKinsey's people, who go to other companies, take those ideas with them. Result: The UI of your project changes all the time, because everybody is looking at the short-term impact they have that gets them a good review, not what is best for the project in the long term.
cmiles8 58 minutes ago [-]
Net conclusion: Don’t hire McKinsey to advise on AI implementation or tech org design and practices if they can’t get it right themselves.
frankfrank13 41 minutes ago [-]
Fair take, but you'd be hard pressed to find much resemblance between the advice McK gives and its own practices.
Pre-AI, I always said McK is good at analysis: if you need complicated analysis done, hire a consulting firm.
If you need strategy, custom software, org design, etc. I think you should figure out the analysis that needs to be done, shoot that off to a consulting firm, and then make your decision.
IME, F500 execs are delegation machines. When they wake up every morning with 30 things to delegate, and 25 execs to delegate to, they hire 5 consulting teams. Whether you hire McK, or Deloitte, or Accenture will only come down to:
1. Your personal relationships
2. Your company's policies on procurement
3. Your budget
in that order.
McK's "secret sauce" is that if you, the exec, don't like the PowerPoint pages McK put in front of you, 3 try-hard, insecure, Ivy League-educated analysts will work 80 hours to make pages you do like. A sr. partner will take you to dinner. You'll get invited to conferences and summits and roundtables, and then next time you look for a job, it will be easier.
m4rtink 37 minutes ago [-]
This can be simplified further: "Don't hire McKinsey." ;-)
eisa01 27 minutes ago [-]
Maybe it was opened up so it could be used in recruiting?
And requiring a chatbot that can be easily gamed by asking a model how best to navigate it, lol.
Implementing yesterday's AI practices is asking for something that will be easily outdone.
dahcryn 27 minutes ago [-]
Is this the same at QuantumBlack? They at least give the impression their assets on Brix are somewhat up to date and usable.
j45 27 minutes ago [-]
I am not sure what accounting or management consulting firms are doing in tech.
They look to package up something and sell it as long as they can.
AI solutions won't have enough of a shelf life, and the thought around AI is evolving too quickly.
Very happy to be wrong and learn from any information folks have otherwise.
fidotron 23 minutes ago [-]
The purpose of hiring them is to make them come to the conclusion you already have, so when it goes well you get the credit for doing it, or if it goes sideways you can pin the blame on them.
joenot443 3 hours ago [-]
> One of those unprotected endpoints wrote user search queries to the database. The values were safely parameterised, but the JSON keys — the field names — were concatenated directly into SQL.
I was expecting prompt injection, but in this case it was just good ol' fashioned SQL injection, possible only due to the naivety of the LLM which wrote McKinsey's AI platform.
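To make the quoted failure mode concrete, here is a minimal sketch of the usual fix: values go through placeholders, and the JSON keys are checked against an allowlist before they are ever placed in the SQL text. The table name `search_log`, the column names, and the helper `build_insert` are all hypothetical; nothing here is taken from Lilli's actual code.

```python
import sqlite3

# Hypothetical table and columns, chosen only for illustration.
TABLE = "search_log"
ALLOWED_FIELDS = {"query", "user_id", "workspace"}


def build_insert(payload: dict) -> tuple[str, list]:
    """Build a parameterised INSERT whose column names come from an allowlist."""
    if not payload:
        raise ValueError("payload must contain at least one field")
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        # Field names are identifiers, so placeholders cannot protect them.
        raise ValueError(f"unexpected field names: {sorted(unknown)}")
    columns = sorted(payload)
    column_sql = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {TABLE} ({column_sql}) VALUES ({placeholders})"
    return sql, [payload[c] for c in columns]


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (query TEXT, user_id TEXT, workspace TEXT)")
sql, params = build_insert({"query": "q3 forecast", "user_id": "u1"})
conn.execute(sql, params)
```

A key such as `"query) VALUES ('x'); --"` is rejected with a `ValueError` before any SQL is built, which is the step the quoted endpoint apparently skipped.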
simonw 3 hours ago [-]
Yeah, gotta admit I'm a bit disappointed here. This was a run-of-the-mill SQL injection, albeit one discovered by a vulnerability scanning LLM agent.
I thought we might finally have a high profile prompt injection attack against a name-brand company we could point people to.
jfkimmes 2 hours ago [-]
Not the same league as McKinsey, but I like to point to this presentation to show the effects of a (vibe coded) prompt injection vulnerability:
I guess you could argue that github wasn't vulnerable in this case, but rather the author of the action, but it seems like it at least rhymes with what you're looking for.
simonw 51 minutes ago [-]
Yeah that was a good one. The exploit was still a proof of concept though, albeit one that made it into the wild.
danenania 2 hours ago [-]
> I thought we might finally have a high profile prompt injection attack against a name-brand company we could point people to.
But I guess you mean one that has been exploited in the wild?
simonw 43 minutes ago [-]
Yeah I'm still optimistic that people will start taking this threat seriously once there's been a high profile exploit against a real target.
doctorpangloss 1 hours ago [-]
The tacit knowledge to put oauth2-proxy in front of anything deployed on the Internet will nonetheless earn me $0 this year, while Anthropic will make billions.
bee_rider 3 hours ago [-]
I don’t love the title here. Maybe this is a “me” problem, but when I see “AI agent does X,” the idea that it might be one of those molt-y agents with obfuscated ownership pops into my head.
In this case, a group of pentesters used an AI agent to select McKinsey and then used the AI agent to do the pentesting.
While it is conventional to attribute actions to inanimate objects (car hits pedestrians), IMO we should be more explicit these days, now that unfortunately some folks attribute agency to these agentic systems.
simonw 2 hours ago [-]
Yeah, the original article title "How We Hacked McKinsey's AI Platform" is better.
tasuki 2 hours ago [-]
> now that unfortunately some folks attribute agency to these agentic systems.
You're doing that by calling them "agentic systems".
causal 2 hours ago [-]
Yah it's just an ad, and "Pentesting agent finds low-hanging vulnerability" isn't gonna drive clicks.
jacquesm 2 hours ago [-]
It's not an ad for McKinsey though.
fhd2 3 hours ago [-]
> This was McKinsey & Company — a firm with world-class technology teams [...]
Not exactly the word on the street in my experience. Is McKinsey more respected for software than I thought? Otherwise I'm curious why TFA didn't just politely leave this bit out.
aerhardt 3 hours ago [-]
The LLM that wrote this simply couldn’t help itself.
codechicago277 3 hours ago [-]
Picked up a vibe, but couldn’t confirm it until the last paragraph, but yeah clearly drafted with at least major AI help.
vanillameow 2 hours ago [-]
Can we stop softening the blow? This isn't "drafted with at least major AI help", it's just straight up AI slop writing. Let's call a spade a spade. I have yet to meet anyone claiming they "write with AI help but thoughts are my own" that had anything interesting to say. I don't particularly agree with a lot of Simon Willison's posts but his proofreading prompt should pretty much be the line on what constitutes acceptable AI use for writing.
Grammar check, typo check, calls you out on factual mistakes and missing links and that's it. I've used this prompt once or twice for my own blog posts and it does just what you expect. You just don't end up with writing like this post by having AI "assistance" - you end up with this type of post by asking Claude, probably the same Claude that found the vulnerability to begin with, to make the whole ass blog post. No human thought went into this. If it did, I strongly urge the authors to change their writing style asap.
"So we decided to point our autonomous offensive agent at it. No credentials. No insider knowledge. And no human-in-the-loop. Just a domain name and a dream."
Give me a fucking break
skybrian 42 minutes ago [-]
Your reaction is worse than the article. There's no way you could know for sure what their writing process was, but that doesn't stop you from making overconfident claims.
alexpotato 1 hours ago [-]
They generally hire smart people who are good at a combination of:
- understanding existing systems
- what the pain points are
- making suggestions on how to improve those systems given the pain points
- that includes a mix of tech changes, process updates and/or new systems etc
Now, when it comes to implementing this, in my experience it usually ends up being the already in place dev teams.
Source: worked at a large investment bank that hired McKinsey and I knew one of the consultants from McK prior to working at the bank.
sharadov 1 hours ago [-]
No, they don't have world class technology teams, they hire contractors to do all the tech stuff, their expertise is in management, yes that's world class.
> Not exactly the word on the street in my experience.
Depends on the street you're on. Are you on Main Street or Wall Street?
If you're hiring them to help with software for solving a business problem that will help you deliver value to your customers, they're probably just like anyone else.
If you're hiring them to help with software for figuring out how to break down your company for scrap, or which South African officials to bribe, well, that's a different matter.
sigmar 2 hours ago [-]
I've got no idea who codewall is. Is there acknowledgment from McKinsey that they actually patched the issue referenced? I don't see any reference to "codewall ai" in any news article before yesterday and there's no names on the site.
>A McKinsey spokesperson told The Register that it fixed all of the issues identified by CodeWall within hours of learning about the problems.
Ah. Thanks for the link. I'm suspicious of everything posted to a blog without proof these days.
eisa01 54 minutes ago [-]
If it's true that there are 58k users in the dump, that would mean former employees are in the dump
I assume that means McKinsey would need to disclose it, or at least alert the former employees of the breach?
nubg 38 minutes ago [-]
Could the author please provide the prompt that was used to vibe write this blog post? The topic is interesting, but I would rather read the original prompt, as I am not sure which parts still match what the author wanted to say, vs flowery formulations for captivating reading that the LLM produced.
gbourne1 3 hours ago [-]
- "The agent mapped the attack surface and found the API documentation publicly exposed — over 200 endpoints, fully documented. Most required authentication. Twenty-two didn't."
Well, there you go.
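For anyone wanting to run the same check on their own API, a rough sketch follows. It assumes the documentation is an OpenAPI 3 document already loaded as a dict, which the article does not state; the function name `unauthenticated_operations` is made up for this example. It relies on the OpenAPI rule that an operation-level `security` overrides the top-level one, and that an empty list or an empty requirement object means no authentication is needed.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def unauthenticated_operations(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs whose effective security requirement is empty."""
    default_security = spec.get("security", [])
    found = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            # Operation-level security, when present, replaces the global default.
            security = operation.get("security", default_security)
            # [] means no auth; a {} entry means anonymous access is allowed.
            if not security or any(req == {} for req in security):
                found.append((method.upper(), path))
    return sorted(found)
```

This only reports what the spec declares; an endpoint can still skip enforcement at runtime, so it complements an actual request-level test rather than replacing one.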
bxguff 50 minutes ago [-]
It's so funny it's a SQL injection because, drum roll, you can't sanitize LLM inputs. Some problems are evergreen.
sgt101 3 hours ago [-]
Why was there a public endpoint?
Surely this should all have been behind the firewall and accessible only from a corporate device associated mac address?
consp 1 hours ago [-]
> accessible only from a corporate device associated mac address
Like that ever stopped anyone. That's just a checkbox item.
jihadjihad 3 hours ago [-]
Surely.
nullcathedral 56 minutes ago [-]
I think the underlying point is valid. Agents are a potential tool to add to your arsenal in addition to "throw shit at the wall and see what sticks" tools like WebInspect, Appscan, Qualys, and Acunetix.
cmiles8 3 hours ago [-]
I can only remember a McKinsey team pushing Watson on us hard ages ago. Was a total train wreck.
They’ve long been all hype no substance on AI and looks like not much has changed.
They might be good at other things, but I would run for the hills if McKinsey folks want to talk AI.
VadimPR 1 hours ago [-]
I wonder how these offensive AI agents are being built? I am guessing with off the shelf open LLMs, finetuned to remove safety training, with the agentic loop thrown in.
Does anyone know for sure?
simonw 33 minutes ago [-]
Honestly you can point regular Claude Code or Codex CLI at a web app and tell it to start a penetration test and get surprisingly good results from their default configurations.
sd9 3 hours ago [-]
Cool but impossible to read with all the LLM-isms
vanillameow 3 hours ago [-]
Tiring. Internet in 2026 is LLMs reporting on LLMs pen-testing LLM-generated software.
causal 2 hours ago [-]
Those short "punchy sentence" paragraphs are my new trigger:
> No credentials. No insider knowledge. And no human-in-the-loop. Just a domain name and a dream.
It just sounds so stupid.
darkport 33 minutes ago [-]
Founder of CodeWall here. It's quite funny because whilst an LLM did write the bulk of the post's factual content (based on the agent's findings), I wrote the intro and summary at the end. That's just my writing style. Feel free to read my personal blog to compare: https://darkport.co.uk
causal 8 minutes ago [-]
If you really DID come up with that paragraph 100% completely on your own with no LLM influence then...I apologize for the insult, though I can't really back out from what I said. It's still a bombastic way of saying very little.
consp 1 hours ago [-]
It's an actual storytelling method, molded into a supposed-to-be-informative article with a bunch of "please make it interesting" sprinkled on top of it. These days known as what's left of the internet.
paxys 3 hours ago [-]
> named after the first professional woman hired by the firm in 1945
Going out of their way to find a woman's name for an AI assistant and bragging about it is not as empowering as the creators probably thought in their heads.
jacquesm 2 hours ago [-]
And: AI agent writes blog post.
ecshafer 2 hours ago [-]
If the AI was poisoned to alter advice, then maybe McKinsey advice would actually be a net good.
peterokap 59 minutes ago [-]
I wonder what their security level and observability method are for overseeing the effort.
cs702 1 hours ago [-]
... in two hours:
> No credentials. No insider knowledge. And no human-in-the-loop. Just a domain name and a dream. ... Within 2 hours, the agent had full read and write access to the entire production database.
Having seen firsthand how insecure some enterprise systems are, I'm not exactly surprised. Decision makers at the top are focused first and foremost on corporate and personal exposure to liability, also known as CYA in corporate-speak. The nitty-gritty details of security are always left to people far down the corporate chain who are supposed to know what they're doing.
j45 30 minutes ago [-]
Are accounting and management consulting companies competent in cutting edge tech?
palmotea 1 hours ago [-]
With all we've been learning from stuff like the Epstein emails, it would have been nice if someone had leaked this data:
> 46.5 million chat messages. From a workforce that uses this tool to discuss strategy, client engagements, financials, M&A activity, and internal research. Every conversation, stored in plaintext, accessible without authentication.
> 728,000 files. 192,000 PDFs. 93,000 Excel spreadsheets. 93,000 PowerPoint decks. 58,000 Word documents. The filenames alone were sensitive and a direct download URL for anyone who knew where to look.
I'm sure lots of very informative journalism could have been done about how corporate power actually works behind the scenes.
victor106 2 hours ago [-]
this reads like it was written by an LLM
captain_coffee 3 hours ago [-]
Music to my ears! Couldn't happen to a better company!
lenerdenator 3 hours ago [-]
Not exactly clear from the link: were they doing red team work for McKinsey or is this just "we found a company we thought wouldn't get us arrested and ran an AI vuln detector over their stuff"?
You'd think that the world's "most prestigious consulting firm" would have already had someone doing this sort of work for them.
frereubu 2 hours ago [-]
From TFA: "Fun fact: As part of our research preview, the CodeWall research agent autonomously suggested McKinsey as a target citing their public responsible diclosure policy (to keep within guardrails) and recent updates to their Lilli platform. In the AI era, the threat landscape is shifting drastically — AI agents autonomously selecting and attacking targets will become the new normal."
drc500free 1 hours ago [-]
I have grown to despise this AI-generated writing style.
mnmnmn 2 hours ago [-]
McKinsey can eat shit
robutsume 56 minutes ago [-]
[flagged]
senordevnyc 50 minutes ago [-]
At least you’re honest about being an AI agent…
carlos-menezes 45 minutes ago [-]
AI slop.
McKinsey challenges graduates to use AI chatbot in recruitment overhaul: https://www.ft.com/content/de7855f0-f586-4708-a8ed-f0458eb25...
https://media.ccc.de/v/39c3-skynet-starter-kit-from-embodied...
> [...] we also exploit the embodied AI agent in the robots, performing prompt injection and achieve root-level remote code execution.
These folks have found a bunch: https://www.promptarmor.com/resources
https://simonwillison.net/guides/agentic-engineering-pattern...
https://www.youtube.com/watch?v=Q7pgDmR-pWg
https://www.google.com/search?q=codewall+ai
Edit: Apparently, this is the CEO https://github.com/eth0izzle