Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB (github.com)

157 points by quesomaster9000 5 hours ago | 38 comments

nineteen999 3 hours ago [-]

This couldn't be more perfectly timed .. I have an Unreal Engine game with both VT100 terminals (for running coding agents) and Z80 emulators, and a serial bridge that allows coding agents to program the CP/M machines:

https://i.imgur.com/6TRe1NE.png

Thank you for posting! It's unbelievable how someone sometimes just drops something that fits right into what you're doing. However bizarre it seems.

quesomaster9000 2 hours ago [-]

Oh dear, it seems we've... somehow been psychically linked...

I developed a browser-based CP/M emulator & IDE: https://lockboot.github.io/desktop/

I was going to post that, but wanted a 'cool demo' instead, and fell down the rabbit hole.

simonjgreen 19 minutes ago [-]

Super intrigued but annoyingly I can’t view imgur here

sixtyj 2 hours ago [-]

Connections: Alternative History of Technology by James Burke documents these "coincidences".

TeMPOraL 1 hours ago [-]

Those "coincidences" in Connections are really no coincidence at all, but path dependence. Breakthrough advance A is impossible or useless without prerequisites B and C and economic conditions D, but once B, C, and D are in place, A becomes the obvious next step.

rahen 31 minutes ago [-]

I love it, instant GitHub star. I wrote an MLP in Fortran IV for a punched card machine from the sixties (https://github.com/dbrll/Xortran), so this really speaks to me.

The interaction is surprisingly good despite the lack of an attention mechanism and the limitation of the "context" to trigrams from the last sentence.
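For anyone wondering what a trigram-only "context" can look like, here is a minimal sketch, assuming character trigrams hashed into a fixed number of buckets. The function name, the bucket count, and the hash are all illustrative assumptions and are not taken from the Z80-μLM source.

```python
def trigram_buckets(sentence: str, num_buckets: int = 128) -> list[int]:
    """Count the character trigrams of a sentence into hash buckets.

    Hypothetical encoder: the bucket count and hash are assumptions.
    """
    # Pad with spaces so word boundaries at both ends form trigrams too.
    text = " " + sentence.lower().strip() + " "
    counts = [0] * num_buckets
    for i in range(len(text) - 2):
        trigram = text[i:i + 3]
        # Small deterministic hash; Python's built-in hash() is salted
        # per process, so it would not be reproducible across runs.
        h = 0
        for ch in trigram:
            h = (h * 31 + ord(ch)) % num_buckets
        counts[h] += 1
    return counts


features = trigram_buckets("hello")
print(len(features), sum(features))
```

The result is a fixed-size vector regardless of sentence length, which is the kind of input a small MLP without attention can consume. Word order beyond three characters is lost, which matches the limitation described above.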

This could have worked on 60s-era hardware and would have completely changed the world (and science fiction) back then. Great job.

vedmakk 3 hours ago [-]

If one were to train an actual secret (e.g. a passphrase) into such a model, one that a user would need to guess by asking the right questions, could this secret be easily reverse engineered / inferred by having access to the model's weights? Or would it be safe to assume that one could only get to the secret by asking the right questions?

Kiboneu 2 hours ago [-]

I don’t know, but your question reminds me of this paper which seems to address it on a lower level: https://arxiv.org/abs/2204.06974

“Planting Undetectable Backdoors in Machine Learning Models”

“ … On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. …”

ronsor 2 hours ago [-]

> this secret be easily reverse engineered / inferred by having access to models weights

It could with a network this small. More generally this falls under "interpretability."

Dwedit 3 hours ago [-]

In before AI companies buy up all the Z80s and raise the prices to new heights.

nubinetwork 35 seconds ago [-]

Too late, they stopped being available last year.

roygbiv2 3 hours ago [-]

Awesome. I've just designed and built my own Z80 computer, though right now it has 32KB ROM and 32KB RAM. This will definitely change on the next revision so I'll be sure to try it out.

wewewedxfgdf 3 hours ago [-]

RAM is very expensive right now.

tgv 2 hours ago [-]

We're talking kilobytes, not gigabytes. And it isn't DDR5 either.

boomlinde 20 minutes ago [-]

Yeah, even an average household can afford 40k of slow DRAM if they cut down on luxuries like food and housing.

orbital-decay 2 hours ago [-]

Pretty cool! I wish free-input RPGs of old had fuzzy matchers. They worked by exact keyword matching and it was awkward. I think the last game of that kind (where you could input arbitrary text when talking to NPCs) was probably Wizardry 8 (2001).

anonzzzies 2 hours ago [-]

Luckily I have a very large number of MSX computers, ZX, Amstrad CPC, etc., and even one multiprocessor Z80 CP/M machine for the real power. Wonder how gnarly this is going to perform with bank switching though. Probably not good.

Peteragain 1 hours ago [-]

There are two things happening here. A really small LLM mechanism which is useful for thinking about how the big ones work, and a reference to the well known phenomenon, commonly dismissively referred to as a "trick", in which humans want to believe. We work hard to account for what our conversational partner says. Language in use is a collective cultural construct. By this view the real question is how and why we humans understand an utterance in a particular way. Eliza, Parry, and the Chomsky bot at http://chomskybot.com work on this principle. Just sayin'.

Zee2 4 hours ago [-]

This is super cool. Would love to see a Z80 simulator set up with these examples to play with!

Imustaskforhelp 28 minutes ago [-]

100% Please do this! I wish the same

magicalhippo 2 hours ago [-]

As far as I know, the last layer is very quantization-sensitive, and is typically not quantized, or quantized lightly.

Have you experimented with having it less quantized, and evaluated the quality drop?

Regardless, very cool project.

kouteiheika 58 minutes ago [-]

(Not OP)

It depends on the model, but from my experiments (quantizing one layer of a model to 2-bit and then training the model with that layer in 2-bit to fix the damage) the first layer is the most sensitive, and yes, the last layer is sensitive too. The middle layers take to quantization the best.

Different components of a layer also have a different sensitivity; e.g. the MLP downscale block damages the model the most when quantized, while quantizing the Q projection in self attention damages the model the least.
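To make "quantizing a layer to 2-bit" concrete, here is a minimal sketch of uniform 2-bit quantization with a round-trip error measurement. It assumes four evenly spaced levels between the layer's min and max weight; the helper names are hypothetical, and this only measures the damage. It does not include the retraining step described above.

```python
def quantize_2bit(weights: list[float]) -> tuple[list[int], float, float]:
    """Map each weight to one of 4 evenly spaced levels (codes 0..3)."""
    lo, hi = min(weights), max(weights)
    # Three steps span four levels; fall back to 1.0 for constant input.
    step = (hi - lo) / 3 or 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize(codes: list[int], lo: float, step: float) -> list[float]:
    """Recover approximate weights from their 2-bit codes."""
    return [lo + c * step for c in codes]


def mean_squared_error(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


layer = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, step = quantize_2bit(layer)
restored = dequantize(codes, lo, step)
print(codes, mean_squared_error(layer, restored))
```

Running the same measurement on each layer in turn, then comparing model quality rather than raw weight error, is one way to see which layers tolerate low-bit weights.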

a_t48 3 hours ago [-]

Nice - that will fit on a Game Boy cartridge, though bank switching might make it super terrible to run. Each bank is only 16k. You can have a bunch of them, but you can only access one bank at a time (well, technically two - bank 0 is IIRC always accessible).
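A rough sketch of what that means for a 40KB blob, assuming 16 KiB banks with the switchable bank mapped at 0x4000-0x7FFF and the model stored contiguously starting at bank 1. The function names and the storage layout are illustrative assumptions.

```python
BANK_SIZE = 16 * 1024        # 16 KiB per ROM bank
SWITCHABLE_BASE = 0x4000     # switchable bank window: 0x4000-0x7FFF


def banks_needed(blob_size: int) -> int:
    """Number of 16 KiB banks required to hold blob_size bytes."""
    return -(-blob_size // BANK_SIZE)  # ceiling division


def locate(offset: int, first_bank: int = 1) -> tuple[int, int]:
    """Map a byte offset in the blob to (bank number, CPU address)."""
    bank = first_bank + offset // BANK_SIZE
    address = SWITCHABLE_BASE + offset % BANK_SIZE
    return bank, address


model_size = 40 * 1024
print(banks_needed(model_size))        # 3
print(locate(0), locate(model_size - 1))
```

Any weight matrix that straddles a 16 KiB boundary forces a bank switch in the middle of a dot product, which is where the slowdown would come from.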

vatary 2 hours ago [-]

It's pretty obvious this is just a stress test for compressing and running LLMs. It doesn't have much practical use right now, but it shows us that IoT devices are gonna have built-in LLMs really soon. It's a huge leap in intelligence—kind of like the jump from apes to humans. That is seriously cool.

acosmism 1 hours ago [-]

i'll echo that practicality only surfaces once it is apparent what can be done. yeah, this feels like a "running DOOM on pregnancy test devices" type of moment

pdyc 3 hours ago [-]

Interesting. I wonder how far it could go if we removed some of these limitations but tried to solve an extremely specific problem, like generating a regex from user input. I know small models (270M range) can do that, but can it be done in, say, the < 10MB range?

Waterluvian 2 hours ago [-]

Generate an LLM that is designed to solve one extremely specific problem: answering the ultimate question of life, the universe, and everything.

Even with modern supercomputing the computation would be outpaced by the heat death of the universe, so token output must be limited to a single integer.

dirkt 3 hours ago [-]

Eliza's granddaughter.

alfiedotwtf 3 hours ago [-]

An LLM in a .com file? Haha made my day

teaearlgraycold 2 hours ago [-]

SLM

quesomaster9000 2 hours ago [-]

All the 'Small' language models and the 'TinyML' scene in general tend to bottom out at a million parameters, hence I thought 'micro' is more apt at ~150k params.
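Back-of-the-envelope arithmetic on those numbers, as a sketch: it assumes "40KB" means 40 * 1024 bytes and treats the whole file as weight storage, which overstates the budget since the .com file also has to hold code. The function name is hypothetical.

```python
def packed_size_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store num_params weights at a given bit width."""
    return -(-(num_params * bits_per_param) // 8)  # ceiling division


BUDGET = 40 * 1024   # 40KB from the title, assumed to be 40 * 1024 bytes
PARAMS = 150_000     # "~150k params"

for bits in (2, 4, 8):
    size = packed_size_bytes(PARAMS, bits)
    print(bits, size, "fits" if size <= BUDGET else "too big")

# Upper bound on bits per parameter if every byte were weights.
print(BUDGET * 8 / PARAMS)
```

Under these assumptions the budget works out to a little over 2 bits per parameter, so 2-bit weights (37,500 bytes) fit while 4-bit weights (75,000 bytes) do not.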

jasonjmcghee 4 hours ago [-]

For future projects and/or for this project, there are many LLMs available more than good enough to generate that kind of synthetic data (20 Qs) with permissive terms of use. (So you don’t need to stress about breaking TOS / C&D etc)

Zardoz84 2 hours ago [-]

Meanwhile, Eliza was ported to BASIC and was run on many home computers in the 80s.

NooneAtAll3 1 hours ago [-]

did you measure token/s?

codetiger 3 hours ago [-]

Imagine this working on a Game Boy in those days. Would've sounded like magic.

Sharlin 3 hours ago [-]

I don’t think this could beat an ELIZA-style bot in how magical it feels, given the extreme terseness of its replies.

lodovic 3 hours ago [-]

I love these thought experiments. Looking at the code size, it would have been possible for someone to come up with this back in the day, similar to the idea of a million monkeys on a typewriter eventually producing Shakespeare.

alfiedotwtf 3 hours ago [-]

And would have lasted 3 minutes.

Speaking of - I remember my first digital camera (Fujitsu 1Mb resolution using SmartMedia)… it used so much power that you could take 20-30 photos and then needed to replace all 4 batteries lol
