mirror of https://github.com/osmarks/website synced 2024-12-23 16:40:31 +00:00

Blog post tweaks & meme SAE

This commit is contained in:
osmarks 2024-10-06 09:30:21 +01:00
parent c81fe211c9
commit feadbd153e
8 changed files with 63 additions and 4 deletions

Two binary image files added (605 KiB and 1.7 MiB); previews not shown.


@@ -3,6 +3,8 @@ title: "MemeThresher: efficient semiautomated meme acquisition with AI"
description: Absurd technical solutions for problems which did not particularly need solving are one of life's greatest joys.
slug: memethresher
created: 22/04/2024
series: meme_search
series_index: 1
---
::: epigraph attribution=AI_WAIFU
I think what you need to do is spend a day in the data labeling mines.

blog/memesae.md Normal file

@@ -0,0 +1,50 @@
---
title: Sparse Autoencoders for Meme Retrieval
description: As ever, AI safety becomes AI capabilities.
created: 06/10/2024
series: meme_search
series_index: 2
---
::: epigraph attribution=@revhowardarson link=https://twitter.com/revhowardarson/status/1830464099396010402
The fact that you can build a chatbot out of a statistical abstract of all human language is among the least interesting things about a statistical abstract of all human language.
:::
Meme search has become an increasingly contested field as more and more people have begun noticing it as a trivial application of CLIP, and Meme Search Engine has several more competitors now, including [one](https://github.com/deepfates/memery) retroactively created in 2021 and [most recently](https://x.com/seatedro/status/1839661758107070858) (that I know of) one made by some random Twitter user. Our memeticists have been hard at work developing new technologies to remain unquestionably ahead, and the biggest change in the pipeline is switching from semimanual curation (scrapers harvest and filter new memes and I make the final quality judgment) to fully automated large-scale meme extraction. Scaling up the dataset by about four orders of magnitude will of course make finding desirable search results harder, so I've been prototyping new retrieval mechanisms.
The ideal solution would be a user-aware ranking model - like the existing MemeThresher system, but tailored to each user - but Meme Search Engine has no concept of users and I don't have the data to make this work easily (making each user manually label pairs, as I did, is not a great experience). Instead, my current work has focused on making it fast and easy to refine queries iteratively without requiring users to precisely phrase what they want in text. Ideally this could be done rapidly with an eye tracker or some similar interface, but for now you press buttons on a keyboard, like with everything else.
Queries are represented as CLIP embedding vectors (internally, the server embeds image/text inputs and then sums them), so the most obvious way to do this is to provide an interface to edit those. Unfortunately, manually tweaking 1152-dimensional vectors tightly optimized for information content and not legibility is hard. Back in the halcyon days of GANs, [Artbreeder](https://en.wikipedia.org/wiki/Artbreeder) attempted to solve a problem like this in image generation; the most cut-down, rapidly usable version of this I could think of was to pick a random direction at each step and show the user the results of moving their current query forward or backward along it.
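A rough sketch of what this looks like (illustrative only - the helper names and the renormalisation step are assumptions, not the actual server code):

```python
import numpy as np

def make_query(text_embs: list[np.ndarray], image_embs: list[np.ndarray]) -> np.ndarray:
    # A query is just the sum of the CLIP embeddings of its text and image inputs.
    return np.sum(text_embs + image_embs, axis=0)

def random_direction(dim: int = 1152) -> np.ndarray:
    # An arbitrary unit vector in embedding space to step along.
    v = np.random.randn(dim)
    return v / np.linalg.norm(v)

def step_query(query: np.ndarray, direction: np.ndarray, step: float, forward: bool) -> np.ndarray:
    # Nudge the query forward or backward along the chosen direction, then
    # renormalise so the subsequent cosine-similarity search stays well-behaved.
    q = query + (step if forward else -step) * direction
    return q / np.linalg.norm(q)
```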
This approach, however, also doesn't work. Randomly moving in embedding space changes many things at once, which makes navigation tricky, especially since you can only see where you are from a short list of the best-matching search results. I did [PCA](https://datasets.osmarks.net/components.html) at one point, but the highest-variance components are often still strange and hard to understand. The naive solution is to alter one component of the query vector at a time, but early work in mechanistic interpretability[^1] has demonstrated that the internal representations of models don't map individual "concepts" onto individual basis directions - this makes intuitive sense, since there simply aren't that many dimensions. There is a solution, however: the sparse autoencoder.
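For reference, the PCA experiment amounts to something like the following (a sketch; the real script runs over the full embedding dump and renders an HTML report of exemplars per component):

```python
import numpy as np

def principal_directions(embeddings: np.ndarray, k: int = 64) -> np.ndarray:
    # embeddings: (N, 1152) CLIP image embeddings.
    # Centre the data and take the top-k right singular vectors, i.e. the
    # highest-variance directions in embedding space, as candidate edit directions.
    centred = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:k]
```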
## Sparse Autoencoders
Normal autoencoders turn a high-dimensional input into a low-dimensional compressed output by exploiting regularities in their datasets. This is undesirable for us: the smaller vectors are *more* dense, inscrutable and polysemantic. Sparse autoencoders are the opposite: they represent inputs using many more features, but with only a few active at a time, which often produces human-comprehensible features in an unsupervised fashion. They draw from lines of research significantly predating modern deep learning, and I don't know why exactly they were invented in the first place, but they were used more recently[^2] for interpreting the activations of toy neural networks, and then found to scale[^3] neatly to real language models.
A few months ago[^4], it was demonstrated that they also worked on image embeddings, though this work was using somewhat different models and datasets from mine. It also tested using SAEs to intervene in image generation models which take CLIP embeddings as input, which I have not tried. I have about a million embeddings of images scraped from Reddit as part of my ongoing scaling project, and chose these for my SAE dataset.
I wrote a [custom SAE implementation](https://github.com/osmarks/meme-search-engine/tree/master/sae) based on OpenAI's paper[^5] - notably, using their top-k activation rather than an explicit sparsity penalty, and with their transpose-based initialization to reduce dead features. Surprisingly, it worked nicely with essentially no tuning (aside from one strange issue with training for multiple epochs[^6]) - the models happily and stably train in a few minutes with settings I picked arbitrarily. Despite previous work experiencing many dead features (features which are never nonzero/activated), this proved a non-issue for me, affecting at worst a few percent of the total, particularly after multiple epochs. I don't know why.
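A minimal sketch of a top-k SAE in this style (the dimensions match the post; `k` and the training details are placeholder assumptions - see the linked repository for the actual implementation):

```python
import torch
from torch import nn

class TopKSAE(nn.Module):
    def __init__(self, d_input: int = 1152, d_features: int = 65536, k: int = 64):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_input, d_features)
        self.decoder = nn.Linear(d_features, d_input, bias=False)
        # Transpose-based initialization: tie encoder and decoder weights at init
        # (one is the transpose of the other), which helps avoid dead features.
        with torch.no_grad():
            self.decoder.weight.copy_(self.encoder.weight.T)

    def forward(self, x: torch.Tensor):
        acts = self.encoder(x)
        # Keep only the k largest activations per input and zero the rest;
        # this enforces sparsity directly instead of via an explicit L1 penalty.
        topk = torch.topk(acts, self.k, dim=-1)
        sparse = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
        recon = self.decoder(sparse)
        return recon, sparse

# Training is plain MSE reconstruction on batches of embeddings, roughly:
#   recon, _ = sae(batch); loss = ((recon - batch) ** 2).mean()
```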
The downstream result I'm interested in is, roughly, sensible monosemantic features, so I have a script (adapted from the one for PCA directions) which, for each feature, pulls exemplars from the Meme Search Engine dataset (different from the SAE training dataset, since I didn't retain the actual images from that for storage space reasons). The 5-epoch model's features can be seen [here](https://datasets.osmarks.net/meme_sae/) and the 1-epoch checkpoint's [here](https://datasets.osmarks.net/meme_sae_early_stopping/) - there are 65536 total features, so I split them into chunks of 512 to avoid having to ship hundred-megabyte HTML files.
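Conceptually, the exemplar script does something like this (a sketch with invented names; in practice the activations would be computed in batches rather than held densely):

```python
import torch

def feature_exemplars(sae, embeddings: torch.Tensor, image_ids: list[str], top_n: int = 16):
    # embeddings: (N, 1152) CLIP image embeddings from the search index.
    with torch.no_grad():
        _, acts = sae(embeddings)  # (N, 65536) sparse feature activations
    results = {}
    for f in range(acts.shape[1]):
        col = acts[:, f]
        rate = (col > 0).float().mean().item()   # activation rate, used to sort features
        top = torch.topk(col, top_n).indices
        results[f] = (rate, [image_ids[i] for i in top])
    return results
```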
## Results
It worked strangely well: the very first feature I saw (the script sorts features within a chunk by activation rate on the validation set) was clearly interpretable (it was blue things) and even cross-modal - one top result contains little of the color blue, but does contain the text "blue". Many are less clear than this (though there are things which are fairly clearly "snow", "boxes", "white things on white backgrounds" and "TV debates", for example), but even the ones with an unclear qualitative interpretation are thematically consistent, unlike with random vectors.
::: captioned src=/assets/images/blue_feature.png
This is close to but slightly different from the results for searching with the text "blue".
:::
This is, at present, just a curiosity, but I expect it to be a valuable part of my future plans for meme search.
[^1]: [Toy Models of Superposition](https://transformer-circuits.pub/2022/toy_model/index.html), Elhage et al. I think this was known to some of the field beforehand as "polysemanticity", but I know about it through mechinterp.
[^2]: [Taking features out of superposition with sparse autoencoders](https://www.alignmentforum.org/posts/z6QQJbtpkEAX3Aojj/interim-research-report-taking-features-out-of-superposition#We_need_more_learned_features_than_ground_truth_features), Sharkey et al.
[^3]: [Sparse Autoencoders Find Highly Interpretable Features in Language Models](https://arxiv.org/abs/2309.08600), Cunningham et al.
[^4]: [Interpreting and Steering Features in Images](https://www.lesswrong.com/posts/Quqekpvx8BGMMcaem/interpreting-and-steering-features-in-images), Daujotas, G.
[^5]: [Scaling and evaluating sparse autoencoders](https://arxiv.org/abs/2406.04093), Gao et al.
[^6]: It would be cleaner to simply have more data, but that isn't practical right now.


@@ -116,7 +116,7 @@ One forward pass of an LLM with FP16 weights conveniently also requires loading
### Scaling up
It's possible to have more GPUs without going straight to an expensive "real" GPU server or large workstation and the concomitant costs, but this is very much off the beaten path. Standard consumer platforms do not have enough PCIe lanes for more than two (reasonably) or four (unreasonably), so <span class="hoverdefn" title="High-End DeskTop">HEDT</span> or server hardware is necessary. HEDT is mostly dead and new server hardware increasingly expensive and divergent from desktop platforms, so it's most feasible to buy older server hardware, for which automated compatibility checkers and convenient part choice lists aren't available. The first well-documented build I saw was [this one](https://nonint.com/2022/05/30/my-deep-learning-rig/), which uses 7 GPUs and an AMD EPYC Rome platform (~2019) in an open-frame case designed for miners, although I think [Tinyboxes](https://tinygrad.org/) are intended to be similar. Recently, [this](https://www.mov-axbx.com/wopr/wopr_concept.html) was published, which is roughly the same except for using 4090s and a newer server platform. They propose using server power supplies (but didn't do it themselves), which is a smart idea - I had not considered the fact that you could get adapter boards for their edge connectors. Also see [this](https://battle-blackberry-78e.notion.site/How-to-run-ML-Experiments-for-cheap-b1270491395747458ac6726515b323cc), which recommends using significantly older server hardware - I don't really agree with this due to physical fit/power supply compatibility challenges.
They describe somewhat horrifying electrical engineering problems due to using several power supplies together, and custom cooling modifications. While doable, all this requires much more expertise than just assembling a standard desktop from a normal part list. Your other option is to take an entire old server and install GPUs in it, but most are not designed for consumer GPUs and will not easily fit or power them. I've also been told that some of them have inflexible firmware and might have issues running unexpected PCIe cards or different fan configurations.
</details>


@ -3,6 +3,8 @@ title: "Site tech stack 2: the unfathomed depths"
description: RSAPI and the rest of my infrastructure. description: RSAPI and the rest of my infrastructure.
created: 27/03/2024 created: 27/03/2024
slug: srsapi slug: srsapi
series: stack
series_index: 2
--- ---
::: epigraph attribution=@creatine_cycle link=https://twitter.com/creatine_cycle/status/1661455402033369088 ::: epigraph attribution=@creatine_cycle link=https://twitter.com/creatine_cycle/status/1661455402033369088
Transhumanism is attractive until you have seen how software is built. Transhumanism is attractive until you have seen how software is built.


@@ -3,6 +3,8 @@ title: Site tech stack
description: Learn about how osmarks.net works internally! Spoiler warning if you wanted to reverse-engineer it yourself.
created: 24/02/2022
updated: 11/05/2023
series: stack
series_index: 1
---
::: epigraph attribution="Rupert Goodwins"
If you can't stand the heat, get out of the server room.


@@ -37,7 +37,10 @@ $navbar-width: 20rem
body
  margin: 0
  font-family: 'Titillium Web', 'Fira Sans', sans-serif
  line-height: 1.5
  .footnote-ref
    line-height: 1
pre, code, .deemph
  font-family: 'Miracode', monospace