Microgpt explained interactively (growingswe.com)
politelemon 2 hours ago [-]
> By the end of training, the model produces names like "kamon", "karai", "anna", and "anton". None of them are copies from the dataset.

Hey, I can see kamon, karai, anna, and anton in the dataset; it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...
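A quick way to verify, sketched below (assuming the names.txt at the root of the makemore repo; fetching from master since the commit-pinned URL above is truncated):

    # Membership check against the makemore dataset. Assumes names.txt at the
    # repo root on master, since the commit-pinned URL above is truncated.
    import urllib.request

    URL = "https://raw.githubusercontent.com/karpathy/makemore/master/names.txt"
    names = set(urllib.request.urlopen(URL).read().decode().splitlines())

    for name in ["kamon", "karai", "anna", "anton"]:
        print(name, name in names)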

ayhanfuat 2 hours ago [-]
You are absolutely right. The whole post reads like it was AI generated.
jsheard 1 hour ago [-]
The rate at which they are posting new articles on random subjects is also pretty indicative of a content mill.

In 3 days they've covered machine learning, geometry, cryptography, file formats and directory services.

re 1 hour ago [-]
I didn't get that sense from the prose; it didn't have the usual LLM hallmarks to me, though I'm not enough of an expert in the space to pick up on inaccuracies/hallucinations.

The "TRAINING" visualization does seem synthetic though, the graph is a bit too "perfect" and it's odd that the generated names don't update for every step.

butterisgood 1 hour ago [-]
ISWYDT
growingswe 1 hour ago [-]
Thanks, will fix
grey-area 14 minutes ago [-]
The original article from Karpathy: https://karpathy.github.io/2026/02/12/microgpt/
jmkd 13 minutes ago [-]
It says it's tailored for beginners, but I don't know what kind of beginner can parse multiple paragraphs like this:

"How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is − log ⁡ ( � ) −log(p) where � p is the probability the model assigned to the correct token. This is called cross-entropy loss."

malnourish 45 minutes ago [-]
I read through this entire article. There was some value in it, but I found it to be very "draw the rest of the owl". It read like introductions to conceptual elements or even proper segues had been edited out. That said, I appreciated the interactive components.
davidw 10 minutes ago [-]
It started off nicely but before long you get

"The MLP (multilayer perceptron) is a two-layer feed-forward network: project up to 64 dimensions, apply ReLU (zero out negatives), project back to 16"

Which starts to feel pretty owly indeed.

I think the whole thing could be expanded to cover some more of it in greater depth.
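For anyone who bounced off that line, it unpacks to roughly this sketch (shapes taken from the quote; the random weights here are placeholders for what training would learn):

    # The quoted two-layer MLP: 16 -> 64 (project up), ReLU, 64 -> 16 (project back).
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(16, 64))   # project up to 64 dimensions
    W2 = rng.normal(size=(64, 16))   # project back down to 16

    def mlp(x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ W1, 0.0)  # ReLU: zero out negatives
        return h @ W2

    x = rng.normal(size=(16,))       # a single 16-dimensional embedding
    print(mlp(x).shape)              # (16,)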

windowshopping 1 hour ago [-]
The part that eludes me is how you get from this to the capability to debug arbitrary coding problems. How does statistical inference become reasoning?

For a long time, it seemed the answer was that it doesn't. But now, using Claude Code daily, it seems it does.

fc417fc802 3 minutes ago [-]
Because it's not statistical inference on words or characters, but rather stacked layers of statistical inference over ~arbitrarily complex semantic concepts, applied recursively.