A 20-Year-Old Algorithm Can Help Us Understand Transformer Embeddings (ai.stanford.edu)
Lerc 41 minutes ago [-]
There's a second half of a two-hour video on YouTube which talks about creating embeddings using some pre-transforms followed by SVD with some distance shenanigans:

https://www.youtube.com/watch?v=Z6s7PrfJlQ0&t=3084s

It's 4 years old and seems to be a bit of a hidden gem. Someone even pipes up at 1:26 to say "This is really cool. Is this written up somewhere?"

[snapshot of the code shown]

    %%time
    # Imports for the cell (the notebook defines these earlier);
    # `vectorizers` is the Tutte Institute library that provides
    # TokenCooccurrenceVectorizer. `tokenized_news` is an iterable of
    # token lists built in an earlier cell.
    import numpy as np
    import scipy.sparse
    import scipy.sparse.linalg
    import sklearn.preprocessing
    import vectorizers

    # Count co-occurrences in a 20-token window after each token, with a
    # harmonic kernel weighting nearby tokens more heavily.
    cooc = vectorizers.TokenCooccurrenceVectorizer(
        window_orientation="after",
        kernel_function="harmonic",
        min_document_occurrences=5,
        window_radius=20,
    ).fit(tokenized_news)

    context_after_matrix = cooc.transform(tokenized_news)
    # The "before" context is just the transpose of the "after" context.
    context_before_matrix = context_after_matrix.transpose()

    # Stack both contexts, normalize columns (max) then rows (l1), and
    # damp large counts with a fourth-root transform.
    cooc_matrix = scipy.sparse.hstack([context_before_matrix, context_after_matrix])
    cooc_matrix = sklearn.preprocessing.normalize(cooc_matrix, norm="max", axis=0)
    cooc_matrix = sklearn.preprocessing.normalize(cooc_matrix, norm="l1", axis=1)
    cooc_matrix.data = np.power(cooc_matrix.data, 0.25)

    # Truncated SVD of the transformed co-occurrence matrix; scale the left
    # singular vectors by sqrt(singular values) to get 160-d word vectors.
    u, s, v = scipy.sparse.linalg.svds(cooc_matrix, k=160)
    word_vectors = u @ scipy.sparse.diags(np.sqrt(s))

CPU times: user 3min 5s, sys: 20.2 s, total: 3min 25s

Wall time: 1min 26s
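
For the curious, here's a minimal sketch of how the resulting word_vectors might be queried for nearest neighbours. It continues from the cell above; `vocab` is a hypothetical stand-in for the vectorizer's vocabulary (the real TokenCooccurrenceVectorizer exposes its token-to-index mapping under its own attribute names), so treat this as illustrative rather than copy-paste runnable against the library.

    # Continues from the cell above. `vocab` is a hypothetical list mapping
    # row index -> token, in the same order as the vectorizer's vocabulary
    # (the real attribute name on TokenCooccurrenceVectorizer may differ).
    import numpy as np
    from sklearn.preprocessing import normalize

    unit = normalize(np.asarray(word_vectors), norm="l2", axis=1)

    def nearest(token, k=10):
        """Return the k tokens with highest cosine similarity to `token`."""
        sims = unit @ unit[vocab.index(token)]
        top = np.argsort(-sims)[1 : k + 1]  # position 0 is the token itself
        return [(vocab[j], float(sims[j])) for j in top]

    nearest("market")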

nighthawk454 29 minutes ago [-]
That’s Leland McInnes, author of UMAP, the widely used dimensionality-reduction tool.
Lerc 16 minutes ago [-]
I know; I mentioned his name in a post last week and figured doing so again might seem a bit fanboy-ish. I am kind of a fan, but mostly a fan of good explanations. He's just self-selecting for the group.
chaps 8 hours ago [-]
To the authors: Please expand your acronyms at least once! I had to stop reading to figure out what "KSVD" stands for.

Learning what it stands for* wasn't particularly helpful in this case, but defining the term would've kept me on your page.

*K-Singular Value Decomposition

jmount 6 hours ago [-]
Strongly agree. I even searched to make sure I wasn't missing it. I mean, yeah, "SVD" is likely singular value decomposition, but in this context you have other acronyms bouncing around your head (like support vector machine; you just need to get rid of the M).
JSteph22 5 hours ago [-]
I'm surprised the authors just completely abandoned the standard convention of expanding acronyms on first use.
sdenton4 36 minutes ago [-]
This is great, and very relevant to some problems I've been kicking around on whiteboards lately. Exceptionally well timed.
snovv_crash 5 hours ago [-]
Basically, find the primary eigenvectors.
sdenton4 37 minutes ago [-]
It's not, though...

In sparse coding, you're generally using an over-complete set of vectors that decomposes the data into sparse activations.

So if you have a dataset of hundred-dimensional vectors, you want to find a dictionary such that each data point is well described as a combination of ~4 of the "basis" vectors.
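
To make that concrete, here's a minimal sparse-coding sketch using scikit-learn's DictionaryLearning (not the KSVD implementation the article discusses, but the same idea): learn an over-complete dictionary of 256 atoms for 100-dimensional data and encode each point with roughly 4 active atoms via orthogonal matching pursuit. The data here is random, purely for illustration.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 100))  # toy data: 500 points in 100 dimensions

    # Over-complete dictionary: 256 atoms for 100-dim data; each point is
    # encoded with at most 4 nonzero coefficients (OMP), i.e. a sparse
    # activation pattern rather than a dense projection onto eigenvectors.
    dl = DictionaryLearning(
        n_components=256,
        transform_algorithm="omp",
        transform_n_nonzero_coefs=4,
        max_iter=20,
        random_state=0,
    )
    codes = dl.fit_transform(X)  # shape (500, 256)

    print(codes.shape, (codes != 0).sum(axis=1).mean())  # ~4 active atoms/row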
