copyedit
@ -12,7 +12,7 @@ AR glasses are obviously very cool in concept, but as consumer products they hav
Note that I'm going to be talking specifically about optical AR, i.e. glasses which combine light from the outside world with overlays optically, rather than using a camera and compositing images digitally like the [Apple Vision Pro](https://www.apple.com/apple-vision-pro/). This is less flexible, but I think social barriers (to wearing a large opaque helmet constantly outside) and focusing issues favor optical AR in the near term[^1].
Optical systems have a critical flaw: for [optics reasons](https://kguttag.com/2024/05/27/cogni-trax-why-hard-edge-occlusion-is-still-impossible-behind-the-magic-trick/), it's not possible to occlude objects, only to draw on top of them. This precludes replacement of objects and limits practical applications to annotations (on top of objects) or <span class="hoverdefn" title="heads-up display">HUDs</span> (overlays not anchored to anything in particular). Current hardware is mostly limited to HUDs due to <span class="hoverdefn" title="field of view">FOV</span> limitations, and low-cost options comprise "display glasses" such as those from [Xreal](https://www.xreal.com/uk/air2/) and [Epson](https://www.epson.co.uk/en_GB/products/smart-glasses/see-through-mobile-viewer/moverio-bt-40/p/31095), which aren't very usable outdoors due to brightness and transmissivity issues but which have high-resolution multicolor displays, and various normal-looking glasses with integrated [low-resolution green displays](https://kguttag.com/2024/08/18/even-realities-g1-minimalist-ar-glasses-with-integrated-prescription-lenses/). This will improve, of course, though it seems like known solutions have nasty tradeoffs.
Given this, I think AR is primarily competing with:
@ -24,9 +24,9 @@ The concept for this project was developed in May, when I was pondering how to g
The [unsanctioned datasets distributed via BitTorrent](https://academictorrents.com/details/9c263fc85366c1ef8f5bb9da0203f4c8c8db75f4), widely used in research and diligently maintained by [PushShift](https://github.com/Watchful1/PushshiftDumps) and [Arctic Shift](https://github.com/ArthurHeitmann/arctic_shift), were pleasantly easy to download and use, and after the slow process of decompressing and streaming all 500GB of historical submissions through some basic analysis tools on my staging VPS (it has a mechanical hard drive and two Broadwell cores...) I ran some rough estimates and realized that it would be possible for me to process *all* the images (up to November 2024)[^6] rather than just a subset.
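For concreteness, the dumps are zstandard-compressed newline-delimited JSON, so streaming them through analysis tools amounts to something like this minimal Python sketch (the filename and image-URL heuristic are invented for illustration; the real tools were more involved):

```python
# Stream a zstd-compressed Pushshift-style dump without decompressing to disk.
# The large max_window_size is needed because the dumps are compressed with a
# long window; the filename and URL filter here are illustrative.
import io
import json
import zstandard

def stream_submissions(path):
    with open(path, "rb") as f:
        reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(f)
        for line in io.TextIOWrapper(reader, encoding="utf-8", errors="replace"):
            yield json.loads(line)

images = 0
for post in stream_submissions("RS_2023-01.zst"):
    if post.get("url", "").endswith((".jpg", ".jpeg", ".png", ".gif")):
        images += 1
print(images, "probable images")
```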
This may be unintuitive, since "all the images" was, based on my early estimates, about 250 million. Assuming a (slightly pessimistic) 1MB per image, I certainly don't have 250TB of storage. Usable thumbnails would occupy perhaps 50kB each with the best available compression, which would have been very costly to apply, but 12TB is still more than I have free. The trick is that it wasn't necessary to store any of that[^4]: to do search, only the embedding vectors[^16], occupying about 2kB each, are needed (as well as some metadata for practicality). Prior work like [img2dataset](https://github.com/rom1504/img2dataset) retained resized images for later embedding: I avoided this by implementing the entire system as a monolithic minimal-buffering pipeline going straight from URLs to image buffers to embeddings to a very large compressed blob on disk, with backpressure to clamp download speed to the rate necessary to feed the GPU.
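The real pipeline is Rust, but the shape is simple enough to sketch in a few lines of Python. Bounded queues stand in for the real channels and a sleep stands in for GPU work; "backpressure" just means that when the embedder lags, `put` blocks and the downloaders stall rather than buffering images:

```python
# Toy model of the minimal-buffering pipeline: URLs -> image buffers ->
# embeddings, with bounded queues so the slowest stage sets the pace.
# The fetch and embed steps are stand-ins, not the real implementations.
import queue
import threading
import time

url_q = queue.Queue(maxsize=1024)  # pending URLs
img_q = queue.Queue(maxsize=256)   # decoded image buffers awaiting the GPU
WORKERS = 8

def downloader():
    while True:
        url = url_q.get()
        if url is None:
            return
        data = f"bytes of {url}".encode()  # stand-in for HTTP fetch + resize
        img_q.put(data)                    # blocks while img_q is full: backpressure

def embedder():
    while img_q.get() is not None:
        time.sleep(0.001)                  # stand-in for batched GPU embedding

downloaders = [threading.Thread(target=downloader) for _ in range(WORKERS)]
gpu = threading.Thread(target=embedder)
for t in downloaders + [gpu]:
    t.start()
for i in range(2_000):
    url_q.put(f"https://example.com/{i}.jpg")
for _ in range(WORKERS):
    url_q.put(None)
for t in downloaders:
    t.join()
img_q.put(None)
gpu.join()
print("pipeline drained")
```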
I spent a day or two [implementing](https://github.com/osmarks/meme-search-engine/blob/master/src/reddit_dump.rs) this, with a mode to randomly sample a small fraction of the images for initial testing. This revealed some bottlenecks - notably, the inference server (for embedding the images) was slower than it theoretically could be and substantially CPU-hungry - which I was able to partly fix by [hackily rewriting](https://github.com/osmarks/meme-search-engine/blob/master/aitemplate/model.py) the model using [AITemplate](https://github.com/facebookincubator/AITemplate). I had anticipated running close to network bandwidth limits, but with my GPU fully loaded and the inference server improved I only hit 200Mbps down at first; a surprising and more binding limit was the CPU-based image preprocessing code, which I "fixed" by compromising image quality very slightly. I also had to increase several resource limits (file descriptors and local DNS caching) to handle the unreasonable quantity of parallel downloads. This more or less worked, but more detailed calculations showed that I'd need a month of runtime and significant additional storage for a full run, and the electricity/SSD costs were nontrivial so the project was shelved.
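For reference, the file descriptor side of that is nearly a one-liner on Linux - this is the Python equivalent of the limit bump (the real pipeline is Rust, so this is illustrative):

```python
# Raise the soft file descriptor limit to the hard cap so thousands of
# concurrent downloads don't die with "too many open files" (Unix-only).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"RLIMIT_NOFILE raised from {soft} to {hard}")
```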
Recently, some reprioritization - and needing a lot of additional storage anyway - resulted in me resurrecting the project from the archives. I had to make a few final tweaks to integrate it with the metrics system, reduce network traffic by making it ignore probably-non-image URLs earlier, log some data I was missing and (slightly) handle links to things like Imgur galleries. After an early issue with miswritten concurrency code reading records in the wrong order, which stopped it from correctly recovering from a restart, it ran very smoothly for a few days. There were, however, several unexplained discontinuities in the metrics, as well as some gradual changes over time which resulted in me using far too much CPU time. I had to actually think about optimization.
@ -38,7 +38,7 @@ The metrics dashboard just after starting it up. The white stripe is due to plac
While it wasn't constantly maxing out the CPU, load was bad enough to worsen GPU utilization.
:::
There were, conveniently, easy solutions. I reanalyzed some code and realized that I was using an inefficient `msgpack` (de)serialization library in the Python inference server for no particular reason, which was easy to swap out; that having the inference server client code send images as PNGs to reduce network traffic was not necessary for this and was probably using nontrivial CPU time for encode/decode (PNG uses outdated and slow compression); and that a fairly easy [replacement](https://lib.rs/crates/fast_image_resize) for the Rust image resizing code was available with significant speedups. This reduced load enough to keep things functioning stably at slightly less than 100% CPU for a while, but it crept up again later. [Further profiling](/assets/images/meme-search-perf.png) revealed no obvious low-hanging fruit other than [console-subscriber](https://github.com/tokio-rs/console/), a monitoring tool for `tokio` async code, using ~15% of runtime for no good reason - fixing this and switching again to slightly lower-quality image resizing fixed everything for the remainder of runtime. There was a later spike in network bandwidth which appeared to be due to there being many more large PNGs to download, which I worried would sink the project (or at least make it 20% slower), but this resolved itself after a few hours.
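As an illustration of the (de)serialization swap - `ormsgpack` here is an example of a fast Rust-backed msgpack library, not necessarily the one actually adopted - serializing batches of raw RGB buffers (rather than PNGs) looks like:

```python
# Compare two msgpack implementations on a batch of raw RGB image buffers.
# Sending raw pixels avoids the PNG encode/decode cost mentioned above;
# batch shape and repetition count are arbitrary.
import time
import msgpack    # pip install msgpack
import ormsgpack  # pip install ormsgpack

batch = [{"id": i, "rgb": bytes(224 * 224 * 3)} for i in range(64)]

for name, packb in [("msgpack", msgpack.packb), ("ormsgpack", ormsgpack.packb)]:
    start = time.perf_counter()
    for _ in range(100):
        blob = packb(batch)
    print(f"{name}: {time.perf_counter() - start:.3f}s for {len(blob)} byte payloads")
```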
## Indexing
@ -86,7 +86,7 @@ This all required more time in the data labelling mines and slightly different a
* I still think directly predicting winrates with a single model might be a good idea, but it would have been annoying to do, since I'd still have to train the score model, and I think most of the loss of information occurs elsewhere (rounding off preferences).
* Picking pairs with predicted winrate 0.5 would also pick mostly boring pairs the model is confident in. The variance of predictions across the ensemble is more meta-uncertainty, which I think is more relevant. I did add some code to have me rate the high-rated samples, though, since I worried that the unfiltered internet data was broadly too bad to make the model learn the high end.
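A sketch of that selection rule, with random matrices standing in for the real embeddings and score-model ensemble: rank candidate pairs by ensemble disagreement about the winrate instead of by closeness to 0.5.

```python
# Active-learning pair selection by ensemble variance (all data random
# stand-ins; the sigmoid-of-score-difference winrate is illustrative).
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))        # candidate meme embeddings
ensemble = rng.normal(size=(4, 64))      # four linear score models

scores = emb @ ensemble.T                # (1000, 4): per-model scores
pairs = rng.integers(0, len(emb), size=(5000, 2))
diff = scores[pairs[:, 0]] - scores[pairs[:, 1]]
winrate = 1 / (1 + np.exp(-diff))        # (5000, 4): per-model P(a beats b)
disagreement = winrate.var(axis=1)       # meta-uncertainty across the ensemble
to_label = pairs[np.argsort(-disagreement)[:64]]
print(to_label[:5])
```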
It turns out that the SigLIP model is tasteful enough on its own that I don't need to do that much given a fairly specific query, and the classifier is not that useful - the bitter lesson in action.
I previously worked on [SAEs](/memesae/) to improve querying, but this seems to be unnecessary with everything else in place. Training of a bigger one has been completed for general interest - it can be downloaded [here](https://datasets.osmarks.net/big_sae/), along with the pages of the samples most/least in each feature direction. Subjectively, the negative samples seem somewhat more consistent and the features are more specific (I used 262144 features, up from 65536, and about ten times as many embeddings to train, and only did one epoch).
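For reference, "most/least in each feature direction" reduces to projecting embeddings onto the SAE encoder rows and taking the extremes - roughly this, with random stand-ins for the real weights and shrunken dimensions so it runs anywhere:

```python
# Find the samples most/least aligned with one SAE feature direction.
# Sizes are illustrative; the real SAE has 262144 features over SigLIP embeddings.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(10_000, 1152)).astype(np.float32)   # image embeddings
w_enc = rng.normal(size=(4_096, 1152)).astype(np.float32)  # SAE encoder rows

feature = 1234
acts = emb @ w_enc[feature]          # activation of this feature on every sample
most = np.argsort(acts)[::-1][:9]    # samples most in the feature direction
least = np.argsort(acts)[:9]         # samples least in it
print(most, least)
```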
@ -142,3 +142,5 @@ The meme search master plan.
[^14]: Also, most work doesn't seem interested in testing performance in the ~64B code size regime even though the winning method varies significantly by size.
[^15]: Could it be (usefully) run on GPU instead? I don't *think* so, at least without major tweaks. You'd have less VRAM, so you'd need smaller graph shards, and the algorithm is quite branchy and does many random reads.
[^16]: Embeddings are (roughly) a lossy summary of some input, represented as a large, mostly-opaque vector.
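The reason they suffice for search: with normalized vectors, similarity is just a dot product, as in this toy example:

```python
# Toy nearest-neighbour search over random unit vectors: the query's best
# matches are the largest dot products.
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 512))
db /= np.linalg.norm(db, axis=1, keepdims=True)
query = rng.normal(size=512)
query /= np.linalg.norm(query)
print(np.argsort(db @ query)[::-1][:5])  # indices of the five nearest items
```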
@ -12,7 +12,7 @@ If you can't stand the heat, get out of the server room.
As you may know, osmarks.net is a website, served from computers which are believed to exist. But have you ever wondered exactly how it's all set up? If not, you may turn elsewhere and live in ignorance. Otherwise, continue reading.
Many similar personal sites are hosted on free static site services or various cloud platforms, but mine runs on a physical server. This was originally done out of general distrust of SaaS/cloud platforms, a desire to learn about Linux administration, and the need to run some non-web things, but a physical server is now necessary to run the full range of weird components which have become important to the website. ~~The hardware has remained the same since early 2019, apart from the addition of more disk capacity and a spare GPU for occasional machine learning workloads - I am using an old HP ML110 G7 tower server. Despite limited RAM and CPU power compared to contemporary rackmount models, it was cheap, has continued to work amazingly reliably, and is much more power-efficient than those would have been. It mostly only runs at about 5% CPU load and 2GB of RAM in use anyway, so it's not been an issue.~~ Due to the increasing compute demands of internal workloads, among other things, it has now been replaced with a custom build using a consumer Ryzen CPU. This has massively increased performance thanks to the CPU's much better IPC, clocks and core count, the 16x increase in RAM, and having an SSD[^1]. The main server currently idles at ~5% across cores and 30GB of RAM in use due to extensive caching.
The main site itself, which you're currently reading, is primarily a simple static site, though it consumes several backend services and integrates several pieces of JavaScript (vanilla JS and Svelte) for controls and comments, as well as for individual experiments. Over the years the exact implementation has varied significantly, from the original not-very-static version using Caddy, some PHP scripts for Markdown and a few folders of HTML files to the later strange combination of Haskell (using Hakyll) and makefiles to the current somewhat cursed [Node.js program](https://github.com/osmarks/website/blob/master/src/index.js). The modern implementation of the compiler does templating, dependency resolution, Markdown, search indexing and some optimization tasks in several hundred lines of very dependency-heavy and undocumented JavaScript.
@ -43,7 +43,7 @@ But focus on concrete tasks I can think of myself is rather missing the point. D
Due to limited working memory and the necessity of distributing subtasks in an organization, humans design and model systems based on abstraction - rounding off low-level detail to produce a homogeneous overview with fewer free parameters. [Seeing Like a State](https://en.wikipedia.org/wiki/Seeing_Like_a_State)[^1] describes how this has gone wrong historically - states, wanting the world to be easier to manage, bulldoze fine-tuned local knowledge and install simple rules and neat rectangles which produce worse outcomes. I think this case is somewhat overstated, because abstraction does often work better than the alternatives. People can't simultaneously attend to the high-level requirements of their problem and every low-level point, so myopic focus on the low-level detracts from the overall quality of the result[^2] - given the limitations of humans.
Abstraction amortises intellect, taking good solutions to simpler and more general problems and applying them on any close-enough substrate. This has brought us many successes like industrial farming, digital computers and assembly lines. But an end-to-end design not as concerned with modularity and legibility will usually outperform one based on generalities, if you can afford the intellectual labour, through better addressing cross-cutting concerns, precise tailoring to small quirks and making simplifications across layers of the stack. Due to organizational issues, the cost of human intelligence, and working memory limitations, this frequently doesn't happen. [This book](https://www.construction-physics.com/p/book-review-building-an-affordable) describes some object-level examples in house construction, and [this blog post](https://yosefk.com/blog/my-history-with-forth-stack-machines.html) suggests that Forth embodies the same approach in computing.
We see the abstractions still even when they have gaps, and this is usually a security threat. A hacker doesn't care that you think your code "parses XML" or "checks authentication" - they care about [what you actually wrote down](https://gwern.net/unseeing), and what the computer will do with it[^3], which is quite possibly [not what you intended](https://blog.siguza.net/psychicpaper/). Your nice "secure" cryptographic code is [running on hardware](http://wiki.newae.com/Correlation_Power_Analysis) which reveals correlates of what it's doing. Your "air-gapped" computer is able to emit [sounds](https://arxiv.org/abs/2409.04930v1) and [radio signals](https://arxiv.org/abs/2207.07413) and [is connected to power cables](https://pushstack.wordpress.com/2017/07/24/data-exfiltration-from-air-gapped-systems-using-power-line-communication/). A "blank wall" [leaks information](https://www.cs.princeton.edu/~fheide/steadystatenlos) through diffuse reflections. Commodity "communication" hardware can [sense people](https://www.usenix.org/system/files/nsdi24-yi.pdf), because the signals travel through the same physical medium as everything else. Strange side channels are everywhere and systematically underestimated. These are the examples we *have* found, but new security vulnerabilities are detected continually and I am confident that essentially all complex software is hopelessly broken in at least one way.
@ -2683,12 +2683,15 @@
"auto": true
},
"https://en.wikipedia.org/wiki/The_Case_Against_Education": {
"excerpt": "Despite being immensely popular—and immensely lucrative—education is grossly overrated. In this explosive book, Bryan Caplan argues that the primary function of education is not to enhance students’ skill but to certify their intelligence, work ethic, and conformity—in other words, to signal the qualities of a good employee. Learn why students hunt for easy As and casually forget most of what they learn after the final exam, why decades of growing access to education have not resulted in better jobs for the average worker but instead in runaway credential inflation, how employers reward workers for costly schooling they rarely if ever use, and why cutting education spending is the best remedy.",
"title": "The Case Against Education",
"author": "Bryan Caplan",
"date": "2019-08-15T03:02:06Z",
"website": "Wikimedia Foundation, Inc.",
"auto": true,
"referenceIn": {
"progedu": ""
}
},
"https://projecteuler.net/": {
"excerpt": "A website dedicated to the fascinating world of mathematics and programming",
@ -3863,5 +3866,13 @@
"date": null,
"website": "Planet Minecraft",
"auto": true
},
"https://yosefk.com/blog/my-history-with-forth-stack-machines.html": {
"excerpt": "My VLSI tools take a chip from conception through testing. Perhaps 500 lines\nof source code. Cadence, Mentor Graphics do the same, more or less. With how much source/object\ncode?",
"title": "My history with Forth & stack machines",
"author": null,
"date": null,
"website": null,
"auto": true
}
}