Krea 2: SOTA open-weights 12B image model (krea.ai)

154 points by mattnewton 1 day ago | 18 comments

dvrp 27 minutes ago

Hello HN,

I am Diego Rodriguez, Co-founder & CTO at Krea.

We are releasing the weights and a _juicy_ technical report---at least given current industry standards. In it we describe data curation/captioning, model architecture, post-training, RL pipelines, prompt expansion, style references, and our infrastructure in great detail.

When it comes to the weights themselves, there are actually two releases:

- Krea 2 Turbo. This model is both guidance- and timestep-distilled for faster inference.

- Krea 2 RAW. This model is meant to be hackable/fine-tunable.

One of the things we think the (open) LLM community does well is release models in different sizes and at different stages of the training pipeline; we are releasing two checkpoints, from the mid-training and post-training stages. This is rare in the image & multimedia community, so we can't help but feel proud of this release.
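
To make "guidance- and timestep-distilled" concrete: with classifier-free guidance (CFG), each sampling step usually runs the model twice (conditioned and unconditioned) and combines the outputs as uncond + scale * (cond - uncond). Guidance distillation trains a model to produce the guided output in a single pass, and timestep distillation cuts the number of sampling steps. The minimal sketch below only counts forward passes on a toy denoiser; the denoiser, the update rule, and the step counts (50 vs. 8) are hypothetical illustrations, not Krea 2's sampler or settings.

```python
from typing import List

Vector = List[float]


def cfg_combine(uncond: Vector, cond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


class CountingDenoiser:
    """Hypothetical toy denoiser that counts its forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: Vector, t: float, conditioned: bool) -> Vector:
        self.calls += 1
        shift = 0.1 if conditioned else 0.0  # stand-in for the prompt's effect
        return [xi * t + shift for xi in x]


def toy_step(x: Vector, pred: Vector, steps: int) -> Vector:
    """Toy Euler-style update; not a real diffusion or flow-matching sampler."""
    return [xi - pi / steps for xi, pi in zip(x, pred)]


def sample_with_cfg(model: CountingDenoiser, x: Vector, steps: int, scale: float) -> Vector:
    # Undistilled: two forward passes per step (unconditioned + conditioned).
    for i in range(steps):
        t = 1.0 - i / steps
        pred = cfg_combine(model(x, t, False), model(x, t, True), scale)
        x = toy_step(x, pred, steps)
    return x


def sample_distilled(model: CountingDenoiser, x: Vector, steps: int) -> Vector:
    # Guidance-distilled: a single pass per step is assumed to already carry
    # the guidance effect. Timestep-distilled: far fewer steps are used.
    for i in range(steps):
        t = 1.0 - i / steps
        x = toy_step(x, model(x, t, True), steps)
    return x


if __name__ == "__main__":
    base, fast = CountingDenoiser(), CountingDenoiser()
    sample_with_cfg(base, [1.0, 1.0], steps=50, scale=4.0)
    sample_distilled(fast, [1.0, 1.0], steps=8)
    print(f"CFG sampler: {base.calls} passes; distilled sampler: {fast.calls} passes")
```

With these illustrative numbers the CFG sampler makes 100 forward passes and the distilled sampler makes 8, which is where the "faster inference" comes from; real speedups depend on the actual step counts used.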

We are on par with Nano Banana in terms of image quality according to the Artificial Analysis text-to-image benchmarks (https://artificialanalysis.ai/image/leaderboard/text-to-imag...).

We also attached a permissive license for individuals and small businesses.

Useful links:

- Marketing page for the OSS release: https://www.krea.ai/krea-2-open-source

- Hugging Face model: https://www.krea.ai/krea-2/huggingface

- GitHub repository: https://www.krea.ai/krea-2/github

- Reddit AMA: https://www.reddit.com/r/StableDiffusion/comments/1udnm0a/we...

- Technical report: https://www.krea.ai/blog/krea-2-technical-report

Thank you, and I hope you enjoy this release!

Some of our team members will be answering questions while we are on the front page (thank you, HN!).

Happy hacking!

mattnewton 1 day ago

Hi HN, we're releasing weights for our latest text-to-image model and publishing this writeup on how we trained it in quite a bit of depth.

I hope there is something in the report for everyone. We included a fair bit on the actual training and data infrastructure, which usually isn't written about much and which I think will interest people here. There's more that didn't fit; happy to answer questions!

ttul 4 hours ago

This is a massive technical report for an open-weights image gen model. As someone who has followed this space closely, it's really cool to read about the behind-the-scenes experimentation and effort that went into the final product. I hope you will release some of the fine-tuning tools so the community can experiment with them as well and really push what the model is capable of.

dvrp 30 seconds ago

Thanks! You should definitely check out the r/StableDiffusion subreddit; people are going crazy over it!

We also had day-0 support from the open-source community, including Ostris and ComfyUI.

mattnewton 1 hour ago

You can find links and details on fine-tuning / LoRA support in the GitHub README. Ostris, musubi-tuner, fal, and Hugging Face Diffusers all have day-0 support :) https://github.com/krea-ai/krea-2

We recommend training on the undistilled Raw checkpoint and then applying the LoRA to the Turbo model for inference.
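
Applying a LoRA trained on one checkpoint to another is possible because a LoRA adapter stores only a low-rank update B @ A per layer, and that update has the same shape as the layer it modifies, so it can be added to any checkpoint whose layer shapes match. The minimal sketch below shows the common merge rule W' = W + (alpha / r) * (B @ A) on toy matrices. The matrices and the "Raw"/"Turbo" labels are hypothetical, and real trainers (Ostris, musubi-tuner, Diffusers) use their own file formats and loading code, which this does not reproduce. How well an adapter trained on Raw transfers to Turbo is an empirical question; the sketch only shows that the arithmetic itself does not depend on which checkpoint it is applied to.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of a (n x k) and b (k x m)."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(base: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return a new matrix base + (alpha / r) * lora_b @ lora_a.

    lora_a is (r x d_in), lora_b is (d_out x r), base is (d_out x d_in).
    The input base matrix is not modified.
    """
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)
    if len(delta) != len(base) or len(delta[0]) != len(base[0]):
        raise ValueError("LoRA delta shape does not match the base weight")
    scale = alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


if __name__ == "__main__":
    # Hypothetical rank-1 adapter, imagined as trained against a "Raw" layer...
    lora_a = [[1.0, 2.0]]    # r=1, d_in=2
    lora_b = [[0.5], [0.0]]  # d_out=2, r=1
    # ...then added to the same-shaped layer of a hypothetical "Turbo" checkpoint.
    turbo_layer = [[1.0, 0.0], [0.0, 1.0]]
    print(merge_lora(turbo_layer, lora_a, lora_b, alpha=1.0))
```

Here the merged layer is [[1.5, 1.0], [0.0, 1.0]]. Many tools can also apply the adapter at runtime instead of merging; that computes the same layer output without rewriting the base weights.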

pwython 1 hour ago

Looking forward to playing with Krea 2. I use Z-Image Turbo daily; it has replaced my stock photo subscriptions for both realism and illustrations.

May I ask how much the training cost you?

sangwulee 30 minutes ago

A lot of coffee, for sure. Regarding the training cost, it's hard to give a good estimate because we used a shared Kubernetes cluster running both inference and research workloads.

justinclift 4 hours ago

Interesting item on the careers page, btw. For anyone who knows what old-school Mellanox was about, it might be your kind of thing: https://jobs.ashbyhq.com/krea/ebe94024-eef6-4306-a019-10072a... :D

kodablah 3 hours ago

Turbo appears GGUF'd already: https://huggingface.co/Abiray/Krea-2-Turbo-GGUF

BoredPositron 2 hours ago

It's a good model; sadly, the use of the Qwen VAE is a bit of a downer.

mattnewton 1 hour ago

Krea 2 Large (on the website and API) was trained with the FLUX 2 VAE, if you want to test it out and push realism. After working with both, I think the FLUX VAE has a slight edge in learning realistic textures, but the gap is smaller than you might think; overall, the Qwen VAE was very good in ablations and good at learning to produce a diverse set of styles.

BoredPositron 1 hour ago

[flagged]

dang 7 minutes ago

> You can't be serious.

Please edit out swipes from your HN comments, as the guidelines request: https://news.ycombinator.com/newsguidelines.html.

Edit: your account has unfortunately been breaking the site guidelines like this in other places as well (e.g. https://news.ycombinator.com/item?id=48567675). Can you please fix this? I don't want to ban you, but we've already had to ask you this before.

BoredPositron 4 minutes ago

Kill the account dang.

mattnewton 29 minutes ago

Definitely encourage you to test the models. We tried to optimize for realistic focus and to avoid over-sharpening, which leads to a "hyper" AI look. It's hard to benchmark because, all else being equal, people generally prefer sharp, saturated, orangish pictures, but I believe these are bad shortcuts for the model to learn realism.

BoredPositron 6 minutes ago

Is my taste the problem, or am I simply holding it wrong? The Qwen VAE's shortcomings are well documented, and Krea 2 produces the same blurry, airbrushed output as Qwen-Image. Between the chaotic release and every interaction I've had with your team, I've grown to genuinely dislike your platform/company. Good luck.

mobiuscog 2 hours ago

Some people have mentioned that using the Wan 2.1 VAE instead solves this. I haven't personally had time to try it yet.

dvrp 20 minutes ago

There is a lot of discussion about it on Reddit; check the AMA link in my comment above to learn more. The short version is that it wasn't released when we started. We use it for internal models and hope to do further open-source releases.
