If you represent the state as a 128-long vector of GF(2) elements, you can model the state transition function as a matrix multiplication.
This allows you to describe any output bit (at any offset in the output stream) as a function of the 128 initial state elements.
Treating the initial state vector as 128 unknown variables, you can solve for them (via Gaussian elimination) with any 128 bits from the output stream, so long as you know their offsets in the output sequence.
Depending on what those offsets are, though, there's a ~50% chance there's no unique solution, in which case you may need a few more bits. For the case of two consecutive 64-bit outputs, you get a unique solution.
For 128 samples, the Gaussian elimination can be done in 2^21 steps, or only 2^14 steps if you can fit a whole 128-bit row in one register (which you can!)
If the offsets are known in advance, you can pre-compute the gaussian elimination work and end up with a constant matrix that, if multiplied by the leaked bits, obtains the initial state vector.
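A minimal sketch of this approach in Python, using Marsaglia's 32-bit xorshift128 as a stand-in because it is fully GF(2)-linear (xorshift128+'s state update is linear too, but the 64-bit addition in its output makes some output bits carry-dependent, which real attacks have to work around). All names and the seed below are mine:

```python
MASK32 = 0xFFFFFFFF

def step(s):
    """One concrete xorshift128 step; the generator's output is the new w."""
    x, y, z, w = s
    t = (x ^ (x << 11)) & MASK32
    t ^= t >> 8
    return (y, z, w, (w ^ (w >> 19)) ^ t)

# Symbolic mirror: each bit is a 128-bit mask saying which seed bits XOR into it.
def sym_seed():
    return [[1 << (32 * i + b) for b in range(32)] for i in range(4)]

def sxor(a, b): return [p ^ q for p, q in zip(a, b)]
def sshl(a, k): return [0] * k + a[:32 - k]   # words are LSB-first mask lists
def sshr(a, k): return a[k:] + [0] * k

def sym_step(s):
    x, y, z, w = s
    t = sxor(x, sshl(x, 11))
    t = sxor(t, sshr(t, 8))
    return (y, z, w, sxor(sxor(w, sshr(w, 19)), t))

def solve_gf2(rows, n=128):
    """Gaussian elimination over GF(2); rows are (coefficient_mask, observed_bit)."""
    pivots = {}
    for mask, bit in rows:
        while mask:
            col = (mask & -mask).bit_length() - 1   # lowest set bit
            if col not in pivots:
                pivots[col] = (mask, bit)
                break
            pm, pb = pivots[col]
            mask ^= pm
            bit ^= pb
    assert len(pivots) == n, "not full rank; need more (or different) output bits"
    sol = 0
    for col in range(n - 1, -1, -1):                # back-substitution
        pm, pb = pivots[col]
        sol |= (pb ^ (bin(pm & sol).count("1") & 1)) << col
    return sol

# Recover a hidden seed from 128 consecutive output bits.
seed = (0x12345678, 0x9ABCDEF0, 0x0F1E2D3C, 0x4B5A6978)
concrete, symbolic, rows = seed, sym_seed(), []
for _ in range(4):                                  # 4 outputs x 32 bits each
    concrete, symbolic = step(concrete), sym_step(symbolic)
    out, masks = concrete[3], symbolic[3]
    rows += [(masks[b], (out >> b) & 1) for b in range(32)]

recovered_bits = solve_gf2(rows)
recovered = tuple((recovered_bits >> (32 * i)) & MASK32 for i in range(4))
assert recovered == seed
```

For this toy the system is always full rank: after 4 steps the state is exactly the last 4 outputs, and the transition is invertible, so 128 consecutive output bits pin down the seed uniquely.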
tptacek 4 hours ago [-]
If you follow the ChatGPT log, that's the angle he takes as well.
Retr0id 4 hours ago [-]
Oh that's fascinating, I didn't get that impression at all from the article itself.
viraptor 5 hours ago [-]
> I couldn’t believe that z3 was the best one can do
It probably never is, but if you don't care about a nice solution, or about anything past "is it possible?", then Z3 and SMT-LIB are amazing. You often don't even need to understand the whole problem: "take constraints, give solution" works great in those situations.
Sesse__ 4 hours ago [-]
I've tried to use Z3 to find hash collisions in a simple hash, and it was amazingly bad at it. It took forever, whereas just trying random values until some were (multi)colliding was sub-second :-)
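That matches the math: against a weak or truncated hash, birthday-bound random search is hard to beat. A toy version (24-bit truncated hash; names mine), where ~2^12 samples already make a collision likely:

```python
import os
import zlib

def weak_hash(b: bytes) -> int:
    return zlib.crc32(b) & 0xFFFFFF  # toy 24-bit hash

seen = {}
for _ in range(1 << 15):             # far past the ~2^12 birthday bound
    cand = os.urandom(8)
    h = weak_hash(cand)
    if h in seen and seen[h] != cand:
        break                         # cand and seen[h] collide
    seen[h] = cand
else:
    raise AssertionError("no collision found")

assert cand != seen[h] and weak_hash(cand) == weak_hash(seen[h])
```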
NooneAtAll3 5 hours ago [-]
Looking at the CVE report itself, Math.random() not being crypto-level seems to be known? - and the vulnerability comes from Node.js using it for some crypto purpose.
So OP simply did a good exercise for himself, recreating the exact weakness of it.
tptacek 5 hours ago [-]
No, the post takes the attack from 5 observations down to 3.
delduca 4 hours ago [-]
I have recently replaced Lua's random with this implementation: https://nullonerror.org/2025/08/02/replacing-lua-s-math-rand...
A word of caution. A few years ago we had a production impact event where customers were getting identical cookies (and so started seeing each others sessions). When I took a look at the code, what I found was that they were doing something very like your code - using a time() based seed and an PRNG.
Whenever we deployed new nginx configs, those servers would roll out and restart, getting _similar_ time() results in the seed. But the individual nginx workers? Their seeds were nearly identical. Not every call to the PRNG was meant for UUIDs, but enough were that disaster was inevitable.
The solution is to use a library that leverages libuuid (via ffi or otherwise). A "native lua" implementation is always going to miss the entropy sources available in your server and generate clashes if it's seeded with time(). (eg https://github.com/Kong/lua-uuid, https://github.com/bungle/lua-resty-uuid)
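The failure mode is easy to reproduce in a couple of lines (Python rather than Lua here; `fake_uuid` is a made-up stand-in for a PRNG-based UUID generator):

```python
import random
import time
import uuid

# Two "workers" that restart within the same second and seed from time():
t = int(time.time())
rng_a, rng_b = random.Random(t), random.Random(t)

def fake_uuid(rng):
    return "%032x" % rng.getrandbits(128)

# Identical seeds -> identical "UUIDs" across workers:
assert fake_uuid(rng_a) == fake_uuid(rng_b)

# uuid4() instead draws from the OS CSPRNG (os.urandom under the hood):
assert uuid.uuid4() != uuid.uuid4()
```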
tptacek 3 hours ago [-]
Why would you ever use an insecure RNG to generate a cookie?
bazzargh 2 hours ago [-]
In the code I saw, at least twice in its history people had introduced a "pure lua" solution for speed, and were clearly unaware of the shotgun they'd just pointed at their feet. (as in, somebody saw the issue and fixed it, and then someone else _fixed it back_ before I came along).
But in case _I'm_ messing up here, I'll bow to your expertise: libuuid uses /dev/random, which uses a CSPRNG (ChaCha20) with entropy ingested via Blake2 from whatever sources the system can get, right?
We did actually do a bunch of before/after testing showing the collision rates (zero after), and I believe the cookie in question has been replaced with a third party identity system in the intervening years - but if we did it wrong, I'd like to know.
akerl_ 1 hours ago [-]
I think the question isn't "why switch it to libuuid", it's "why was anybody ever generating it with a time-based non-CS PRNG".
delduca 2 hours ago [-]
Thank you for your advice, I will update my blog with it.
Just FYI, I only use this in a hide-and-seek game engine, so it is fine.
Aardwolf 4 hours ago [-]
Xorshift128+ is not a cryptographic RNG though, so at least this isn't a cryptographic attack...
Should programming languages use cryptographic RNGs (e.g. ChaCha20-based ones) in their standard libraries to stop accidental use of non-cryptographic RNGs for cryptographic purposes? But that comes at the cost of speed.
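Python's stdlib already frames this split, for what it's worth: `random` is documented as unsuitable for security, and `secrets` is the CSPRNG-backed door people are supposed to walk through:

```python
import random
import secrets

# Mersenne Twister: fast and reproducible, but its state is recoverable
# from 624 consecutive 32-bit outputs -- fine for simulations, never for secrets.
simulation_draw = random.getrandbits(128)

# secrets is backed by the OS CSPRNG (os.urandom):
token = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
assert len(token) == 32
```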
kbolino 3 hours ago [-]
Go has taken exactly this stance, at least since Go 1.22 from 18 months ago.
They settled on 8 rounds of ChaCha with 300 bytes of internal state to amortize the latency. The end result is only 1.5-2x slower than (their particular flavor of) PCG (benchmarks: https://go.dev/blog/chacha8rand#performance). It was deemed good enough to become the default.
I think some naming conventions could go a long way. If you want to import `fast_unsafe_random`, you might think twice.
thomasmg 3 hours ago [-]
I agree, why would you slow down things for everybody if it's only a problem for cryptographic purposes. Xorshift128+ etc are around 10 to 30 times faster than ChaCha20.
The challenge is things that don't _obviously_ need cryptographically secure generators. For example, do you need a secure generator for the seed of a hash table, or of a sorting algorithm (for those that do use a seed)? Some will argue that yes, this is important. Until a few years ago, hash tables used static hash algorithms without any randomization, but "hash flooding" changed that. I think that nowadays many hash table implementations still don't use secure generators.
Then there's secure vs. insecure hash functions. Secure hash functions like SHA-256 are (compared to non-secure functions) especially slow for short keys. There are "somewhat" secure hash functions like SipHash that can be used for this purpose.
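If you want a keyed short hash from the Python stdlib (the role SipHash plays inside interpreters), BLAKE2 accepts a key directly; the function and bucket count below are my own illustration:

```python
import hashlib
import os

TABLE_KEY = os.urandom(16)  # per-process secret, like an interpreter's hash seed

def bucket(key: str, nbuckets: int = 1024) -> int:
    # Keyed 8-byte BLAKE2b digest: without TABLE_KEY an attacker can't
    # precompute a flood of keys that all land in one bucket.
    d = hashlib.blake2b(key.encode(), key=TABLE_KEY, digest_size=8).digest()
    return int.from_bytes(d, "little") % nbuckets

assert 0 <= bucket("hello") < 1024
assert bucket("hello") == bucket("hello")  # stable within one process
```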
tptacek 3 hours ago [-]
I think people overthink this, and we should just have a library standard full-strength CSPRNG, and then people with fussy fast-randomness needs (Monte Carlo or whatever) can just pull in libraries. "Don't need secure random and can't use it" is a very niche problem space.
It'll never happen in Go though! They're over a decade committed to this library shape.
kstrauser 2 hours ago [-]
I strongly agree here. The default should be strong, “slow” randomness. If you know you need something different in your specific use case, and just can’t abide the safe version, import something else.
kstrauser 2 hours ago [-]
Good point about the hashing. Python does the right thing by making you select the one you want when writing your own code. If it had a default option, make that SHA-256 so that all users get the strong one by default. But yes, if you’re not actually doing crypto stuff, say if you only want to see if two locally generated files have the same content, there are lots of much faster alternatives.
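For the local file-identity case, the stdlib already has both ends of the trade-off (input bytes below are arbitrary):

```python
import hashlib
import zlib

blob = b"the same bytes, read from two files" * 1024

# Adversarial setting (someone may craft collisions): collision-resistant hash.
strong = hashlib.sha256(blob).hexdigest()

# "Did these two local files differ?" with no adversary: a fast checksum is fine.
quick = zlib.crc32(blob)

assert len(strong) == 64        # 32 bytes of SHA-256, hex-encoded
assert 0 <= quick < 2**32       # CRC-32 fits in 32 bits
```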
Dylan16807 2 hours ago [-]
> why would you slow down things for everybody
Secure generators are still fast and the number of programs where you'd see a difference is very small.
tptacek 3 hours ago [-]
It is a cryptographic attack, just against an algorithm that isn't cryptographically strong. There's a lot of value in those.
chaboud 3 hours ago [-]
Perhaps put a warning in the name since the folks who don’t read the docs are the ones you’re trying to protect?
For example:
Math.RandomNotCrypto()
When someone uses that in production for cryptographic purposes (and, yes someone is going to do that), they have to wear a dunce cap to the office for a month.