Steering interpretable language models with concept algebra (guidelabs.ai)

33 points by luulinh90s 23 hours ago | 3 comments

giang_at_glai 16 hours ago [-]

Author here.

This post demonstrates “concept algebra” on a language model: injecting, suppressing, and composing human-understandable concepts at inference time (no retraining, no prompt engineering).

There’s an interactive demo in the post.

Would love feedback on: (1) what steering tasks you’d benchmark, (2) failure cases you’d want to see, (3) whether this kind of compositional control is useful in real products.

anon291 2 hours ago [-]

I would personally like some quantification of how good this is compared to just replacing the system prompt of an off-the-shelf 8B-parameter language model.

The suppression bit is very powerful. I would like to see a quantification of how often a steered 'normal' language model will mention things you asked it to suppress vs. how often this one does.

giang_at_glai 1 hour ago [-]

We will share a technical write-up soon that addresses both of your questions: (1) steering vs. prompt engineering, and (2) how effectively our steering suppresses undesired generations.

If you have joined our waitlist, we will notify you as soon as it is available.
