new article

osmarks 2024-02-25 21:30:23 +00:00
parent b0c118bb65
commit 8cc8425795
11 changed files with 221 additions and 22 deletions

BIN assets/images/ad102.jpg (new file, 179 KiB; binary file not shown)
BIN (unnamed new image, 1.4 MiB; binary file not shown)
BIN (unnamed new image, 9.6 KiB; binary file not shown)
BIN assets/images/tesla-k80.jpg (new file, 218 KiB; binary file not shown)


@@ -5,6 +5,8 @@ slug: computercraft
created: 18/11/2023
draft: yes
---
I have been thinking about [ComputerCraft](https://tweaked.cc/) slightly recently, because of moving [several years of archived code](https://github.com/osmarks/random-stuff/tree/master/computercraft) from Pastebin and some private internal repositories to public view (and writing some minor patches to [PotatOS](https://potatos.madefor.cc/)), and it increasingly seems like a model of what computers *should* be like which highlights the shortcomings of everything else.
I have been thinking about [ComputerCraft](https://tweaked.cc/) slightly recently[^1], because of moving [several years of archived code](https://github.com/osmarks/random-stuff/tree/master/computercraft) from Pastebin and some private internal repositories to public view (and writing some minor patches to [PotatOS](https://potatos.madefor.cc/)), and it increasingly seems like a model of what computers *should* be like which highlights the shortcomings of everything else.
Computers undoubtedly grow more powerful every year, as fabs wrangle quantum electrodynamics into providing ever better and smaller transistors at great cost and the handful of companies still at the cutting edge refine their architectures slightly, but, [as has been noted](https://danluu.com/input-lag/), this doesn't actually translate into better user experience.
Computers grow more powerful every year, as fabs wrangle ever more advanced machinery into printing ever better and smaller transistors and the handful of companies still at the cutting edge refine their architectures slightly, but, [as has been noted](https://danluu.com/input-lag/), this doesn't actually translate into better user experience.
[^1]: This introductory sentence was written several months ago.

blog/ml-workstation.md (new file, 146 lines)

@@ -0,0 +1,146 @@
---
title: So You Want A Cheap ML Workstation
description: How to run local AI slightly more cheaply than with a prebuilt system. Somewhat opinionated.
created: 25/02/2024
slug: mlrig
---
## Summary
- Most of your workstation should be like a normal gaming desktop, but with less emphasis on single-threaded performance and more RAM. These are not hard to build yourself.
- Buy recent consumer Nvidia GPUs with lots of VRAM (*not* datacentre or workstation ones).
- Older or used parts are a good way to cut costs (but avoid very old GPUs).
- Buy a sufficiently capable PSU.
## Long version
Thanks to the osmarks.net crawlers scouring the web for bloggable information[^1], I've found out that many people are interested in having local hardware to run machine learning workloads (by which I refer to GPU-accelerated inference or training of large neural nets: anything else is [not real](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)), but are doing it wrong, or not at all. There are superficially good part choices which are, in actuality, extremely bad for almost anything, and shiny [prebuilt options](https://lambdalabs.com/gpu-workstations/vector-one) which are far more expensive than necessary. In this article, I will outline what to do to get a useful system at somewhat less expense[^2].
## Do not fear hardware (much)
If you mostly touch software, you might be worried about interacting with the physical world, such as by buying and assembling computer hardware. Don't be. Desktop computer hardware is heavily standardized, and assembly of a computer from parts can easily be done in a few hours by anyone with functional fine motor control and a screwdriver (there are many free high-quality guides available). As long as you're not doing anything exotic, part selections can be automatically checked for compatibility [by PCPartPicker](https://pcpartpicker.com/), and many online communities offer free human review. Part selection is also not extremely complicated in the average case, though some knowledge of your workload and basic computer architecture is necessary. I am not, however, going to provide part lists, because these vary with your requirements and with local pricing. You may want to ask [r/buildapc](https://www.reddit.com/r/buildapc/) or similar communities to review your part list.
## GPU choice
The most important decision you will make in your build is your choice of GPU(s) - the GPU will be doing most of your compute, and generally defines how capable the rest of your components need to be. You can, practically, run at most two on consumer hardware (see [Scaling up](#scaling-up) for more).
### Submit to Jensen
Unless you want to spend lots of your time messing around with drivers, Nvidia is your only practical choice for compute workloads. Optimized kernels[^12] such as [Flash Attention](https://github.com/Dao-AILab/flash-attention) are generally only written for CUDA, hampering effective compute performance on alternatives. AMD make capable GPUs for gaming which go underappreciated by many buyers, and Intel... make GPUs... but AMD do not appear to be taking their compute stack seriously on consumer hardware[^3] and Intel's is merely okay[^4].
AMD's CUDA competitor, ROCm, appears to only be officially supported on the [highest-end cards](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility.html), and (at least according to [geohot as of a few months ago](https://geohot.github.io/blog/jekyll/update/2023/06/07/a-dive-into-amds-drivers.html)) does not work very reliably even on those. AMD also lacks capable matrix multiplication acceleration, so its GPUs' effective AI compute performance falls well short of Nvidia's - even the latest RDNA 3 hardware only has [WMMA](https://gpuopen.com/learn/wmma_on_rdna3/), which reuses existing hardware slightly more efficiently, leaving the top-end RX 7900 XTX slower than Nvidia's last-generation RTX 3090 in theoretical matrix performance.
Intel GPUs have good matrix multiplication accelerators, but their most powerful (consumer) GPU product is not very performant and the software is problematic - [Triton](https://github.com/intel/intel-xpu-backend-for-triton) and [PyTorch](https://github.com/intel/intel-extension-for-pytorch) are supported, but not all tools will support Intel's integration code, and there is presently an issue with addressing more than 4GB of memory in one allocation, inherited from their iGPU heritage, which apparently causes many problems.
### Do not buy datacentre cards
Many unwary buyers have fallen for the siren song of increasingly cheap used Nvidia Tesla GPUs, since they offer very large VRAM pools at very low cost. However, these are a bad choice unless you *only* need that VRAM. The popular Tesla K80 is 9 years old, with poor driver support, no FP16, very weak general performance, high power consumption and no modern optimization efforts - and it's not actually one GPU but two on a single card, so you have to deal with parallelizing anything big across GPUs. The next-generation Tesla M40 has similar problems, although it is a single GPU rather than two, and the P40 is not much different, though instead of *no* FP16 it has *unusably slow* FP16[^14]. Even a Tesla P100 is lacking in compute performance compared to newer generations. Datacentre cards newer than that are not available cheaply. There's also some complexity with cooling, since they're designed for server airflow with separate fans, unlike a consumer GPU.[^13]
<div class="caption">
<img src="/assets/images/tesla-k80.jpg">
<div>It may look innocent, but it is a menace to unaware hobbyists.</div>
</div>
### Do not buy workstation cards
Nvidia has a range of workstation graphics cards. However, they are generally worse than their consumer counterparts in every way except VRAM capacity, sometimes compactness, and artificial feature gating (PCIe P2P and ECC): prices are drastically higher (the confusingly named RTX 6000 Ada Generation ("6000A") sells for about four times the price of the similar RTX 4090), memory bandwidth is lower (consumer cards use GDDR6X, which generally offers higher bandwidth; workstation hardware uses plain GDDR6 for power reasons) and real-world performance is sometimes worse even when it should be better on paper. The 6000A has an underpowered cooler and aggressively throttles back under high-power loads, resulting in drastically lower performance.[^11]
### Workload characteristics
As you can probably now infer, I recommend using recent consumer hardware, which offers better performance/$. Exactly which consumer hardware to buy depends on intended workload. There are typically only three relevant metrics (which should be easy to find in spec sheets):
* Memory bandwidth.
* Compute performance (FP16 tensor TFLOP/s).
* VRAM capacity.
VRAM capacity doesn't affect performance until it runs out, at which point you will incur heavy penalties from swapping and/or moving part of your workload to the CPU. Memory bandwidth is generally limiting with large models and small batch sizes (e.g. online LLM inference for chatbots[^5]), and compute the bottleneck for training and some inference (e.g. Stable Diffusion and some other vision models)[^6]. Within a GPU generation, these generally scale together, but between generations bandwidth usually grows slower than compute. Between Ampere (RTX 3XXX) and Ada Lovelace (RTX 4XXX) it has in some cases gone *down*[^7].
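To make this concrete, here's a rough roofline-style sketch (a toy estimate in Python; the matrix sizes are made up, and the RTX 4090 figures come from the table in the addenda - everything else is an assumption): a matrix multiplication is bandwidth-bound when its arithmetic intensity (FLOPs per byte of memory traffic) is below the GPU's compute-to-bandwidth ratio.
```python
# Toy roofline estimate: compare a matmul's arithmetic intensity
# (FLOPs per byte of memory traffic) against the GPU's FLOPs-per-byte
# machine balance. Sizes and hardware figures are illustrative assumptions.

def matmul_intensity(m: int, k: int, n: int, bytes_per_element: int = 2) -> float:
    flops = 2 * m * k * n  # one multiply and one add per (i, j, l) triple
    # Read A (m*k) and B (k*n), write C (m*n); ignores cache reuse.
    traffic = bytes_per_element * (m * k + k * n + m * n)
    return flops / traffic

machine_balance = 165e12 / 1008e9  # RTX 4090: ~164 FLOPs per byte

# Batch-1 LLM decoding (m=1): intensity ~1, hopelessly bandwidth-bound.
print(matmul_intensity(1, 4096, 4096), "vs", machine_balance)
# Big training-style matmul (m=4096): intensity ~1365, compute-bound.
print(matmul_intensity(4096, 4096, 4096), "vs", machine_balance)
```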
As VRAM effectively upper-bounds practical workloads, it's best to get the cards to which Nvidia generously deigns to give outsized amounts of VRAM for their compute performance, unless you're sure of what you want to run. This usually means an RTX 3060 (12GB), RTX 3090 or RTX 4090. RTX 3090s are readily available used for far below the official retail prices, and are a good choice if you're mostly concerned with inference, since their memory bandwidth is almost the same as a 4090's, but 4090s have over twice as much compute on paper and (in non-memory-bound scenarios) also bear this out in practice.
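For a rough idea of what fits, here's a weights-only VRAM sketch (the 1.25x overhead factor is my handwaved allowance for KV cache, activations and framework overhead - all the numbers are assumptions):
```python
# Crude weights-only VRAM estimate for LLM inference. The overhead
# factor is a handwavy assumption covering KV cache, activations and
# framework bookkeeping; real usage varies.
def inference_vram_gb(params_billions: float, bits_per_weight: float,
                      overhead: float = 1.25) -> float:
    return params_billions * bits_per_weight / 8 * overhead

print(inference_vram_gb(7, 16))  # ~17.5 GB: too big for a 12GB RTX 3060, fine on a 24GB card
print(inference_vram_gb(70, 4))  # ~43.8 GB: needs two 24GB cards even at 4 bits/weight
```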
### Multi-GPU
You can run two graphics cards in a consumer system without any particularly special requirements - just make sure your power supply [can handle it](#power-consumption) and that you get a mainboard with PCIe slots with enough spacing between them. Each GPU will run with 8 PCIe lanes, via PCIe bifurcation. Any parallelizable workload which fits onto a single card should work at almost double speed with data parallelism, and larger models can be loaded across both via pipeline or tensor parallelism. Note that the latter requires fast interconnect between the GPUs. To spite users[^9], only the RTX 3090 has NVLink, which provides about 50GB/s (each direction) between GPUs[^8], and only workstation GPUs have PCIe P2P enabled, which reduces latency and increases bandwidth when using standard PCIe between two GPUs. However, you can get away without either of these if you don't need more than about 12GB/s (each direction) between GPUs, which I am told you usually don't.
Technically, you *can* plug in more GPUs than this (up to 4), but they'll have less bandwidth and messing around with riser cables is usually necessary.
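To illustrate why modest interconnect bandwidth is usually tolerable, here's a toy pipeline-parallel forward pass in PyTorch (a sketch assuming two visible CUDA devices; the layer sizes are made up): the weights stay resident on their own cards, and only the activations cross the link.
```python
# Toy pipeline parallelism across two GPUs: each stage's weights stay
# on its own card; only activations travel over PCIe/NVLink.
import torch
import torch.nn as nn

stage0 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
stage1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

def forward(x: torch.Tensor) -> torch.Tensor:
    h = stage0(x.to("cuda:0"))
    # The only inter-GPU transfer: batch * 4096 floats per step,
    # tiny compared to the gigabytes of weights sitting on each card.
    return stage1(h.to("cuda:1"))

print(forward(torch.randn(8, 4096)).shape)  # torch.Size([8, 4096])
```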
### Power consumption
GPUs are pretty power-hungry. PCPartPicker will make a good estimate of maximum power draw in most cases, but Ampere GPUs can briefly spike far above their rated TDP[^10]. A good PSU may handle these transients without tripping overcurrent/overpower protection, but it's safer to just assume that an RTX 3090 has a maximum power draw of 600W and choose a power supply accordingly.
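As a back-of-envelope sizing example (all the component figures below are assumptions for a single-3090 build, not measurements):
```python
# Back-of-envelope PSU sizing with a transient allowance for Ampere.
gpu_transient_w = 600  # assume an RTX 3090 can spike this high, per above
cpu_w = 150            # high-end consumer CPU under load (assumption)
rest_w = 75            # mainboard, RAM, drives, fans (assumption)
peak_w = gpu_transient_w + cpu_w + rest_w  # 825 W
print(f"estimated peak draw: {peak_w}W; a good ~1000W unit leaves headroom")
```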
If you're concerned about reducing your power bill, Ada Lovelace GPUs are generally much more efficient than Ampere due to their newer manufacturing process. You can also power-limit your GPU using `nvidia-smi -pl [power limit in watts]` (note that this must be run again after each boot in some way): this does reduce performance, but much less than proportionally.
<div class="caption">
<img src="/assets/images/rtx-4090-power-scaling.webp">
<div>Thanks to "snowy χατγιρλ/acc" on #off-topic for the benchmark. Other GPUs will have different behaviour. This is something of a worst case though - you'll lose less to power limits in real workloads.</div>
</div>
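Since the `nvidia-smi` limit resets on reboot, you'll want something to reapply it; a minimal sketch (the 300W figure is an arbitrary example - run it with root privileges from a systemd unit, cron `@reboot` job or similar):
```python
# Reapply a GPU power limit at boot. 300 W is an arbitrary example;
# requires root, and persists only until the next reboot.
import subprocess

subprocess.run(["nvidia-smi", "-pl", "300"], check=True)
```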
## Other components
Obviously computers contain parts other than the GPU. For the purposes of a pure ML workstation, these don't really matter, as they won't usually be bottlenecks (if you intend to debase your nice GPU by also running *games* and other graphical tasks on it, then you will of course need more powerful ones). Any recent consumer CPU should be more than capable of driving a GPU for running models. For more intensive work involving heavy data preprocessing or compilation, you should prioritize core count over single-threaded performance (e.g. by buying a slightly older-generation higher-core-count CPU). Any good-quality NVMe SSD is fast enough for almost anything you might want to do with it. Your build will not be very different from a standard gaming computer apart from these minor details, so it's easiest to take a good build for one of those and make the necessary tweaks.
One thing to be concerned about, however, is RAM. If you do anything novel, most of the code you will run will be <span class="hoverdefn" title="bad">"research-grade"</span> and consume far more RAM than it should. To work around this, you should make sure to buy plenty of RAM (at the very least, more CPU RAM than VRAM) or to use a very big swap file, as this is much more practical than fixing all the code. If possible, buy the biggest single DIMMs (memory modules) you can, as running more or fewer than two sticks will cut your CPU's memory bandwidth - while not performance-critical like *GPU* memory bandwidth, there's no reason to incur this hit unnecessarily.
Also note that modern GPUs are very big. You should be sure that your case supports the length and width of your GPU, as well as the height of your GPU plus its power cables.
<details>
<summary><h2>Addenda</h2></summary>
### CPU inference
While I don't like this myself, you might be interested in slowly running very large language models interactively and nothing else. This is when datacentre GPUs (still not K80s), or even CPU inference, might actually be sane choices. To a first approximation, generating one token requires two FLOPs (one fused multiply-add) per parameter regardless of quantization, plus loading every weight from RAM into cache once. Here is (roughly) the compute and memory bandwidth available with various hardware:
<div class="wider">
Hardware | TFLOP/s | Bandwidth (GB/s) | Ratio (FLOPs/byte) | Capacity (GB) | Notes
---|---|---|---|---|---
Nvidia GeForce RTX 4090 | 165 | 1008 | 163 | 24 | FP16 dense tensor TFLOP/s from spec sheet (FP32 accumulate).
Nvidia GeForce RTX 3090 | 71 | 936 | 75 | 24 | As above.
Nvidia GeForce RTX 3060 (12GB) | 25 | 360 | 70 | 12 | As above.
Nvidia Tesla K80 (one GPU) | 4 | 240 | 16 | 12 | Each Tesla K80 card contains two individual GPU chips. They do not have FP16, so I'm using FP32 numbers.
Nvidia Tesla M40 | 7 | 288 | 24 | 24 | Still no FP16, but only one GPU per card. It has less aggregate bandwidth than a whole K80 card as a result.
Nvidia Tesla P40 | 12 | 347 | 34 | 24 | It has FP16 hardware, but it's crippled, so I use FP32 figures.
AMD Ryzen 9 7950X | 2.5 | 83 | 30 | <=192 | TFLOP/s estimated from [AVX-512 figures here](https://www.mersenneforum.org/showthread.php?p=614191). Bandwidth is theoretical, assuming DDR5-5200 dual-channel (I think in practice Infinity Fabric links bottleneck this). Using four DIMMs will reduce rated RAM speed a lot.
AMD Ryzen 7 7700X | 1.3 | 83 | 16 | <=192 | Basically half a 7950X in terms of compute.
Intel Core i9-14900K | 2.5 | 90 | 27 | <=192 | No AVX-512, but the [same amount](https://chipsandcheese.com/2021/12/02/popping-the-hood-on-golden-cove/) of floating point execution capacity as AMD on P-cores, I think. Each E-core ([Gracemont](https://chipsandcheese.com/2021/12/21/gracemont-revenge-of-the-atom-cores/)) provides half as much per cycle. I am assuming maximum turbo frequencies on all cores at once. Rated memory bandwidth is slightly higher than AMD's (on DDR5).
Intel Core i5-14600K | 1.5 | 90 | 16 | <=192 | As above.
Intel Xeon Platinum 8280 | 4.8 | 141 | 34 | <=1024 | Just for fun (these, and boards for them, are hard to get, though easier/cheaper than modern server CPUs). Compute is overestimated as these downclock badly in heavy AVX-512 loads.
Apple M1 Ultra | 21 | 819 | 27 | 128 | Apple Silicon has a bizarrely good memory subsystem. I'm counting its GPU TFLOP/s here.
</div>
One forward pass of an LLM with FP16 weights conveniently also requires loading two bytes per weight, so the FLOPs-per-byte ratio above is (approximately; I'm rounding off many, many details here) how many tokens can be processed in parallel without slowdown. Since sampling (generating outputs) is inherently serial, you don't benefit from this possible parallelism (except when processing the prompt), so quantization (which reduces memory bandwidth requirements and slightly increases compute costs) has lots of room to work. In principle the FLOPs-per-byte ratio should be high enough on all of this hardware that generation speed is directly proportional to memory bandwidth. This does not appear to be true for older GPUs according to [user reports](https://www.reddit.com/r/LocalLLaMA/search?q=p40&restrict_sr=on&sort=relevance&t=all), probably due to overheads I ignored - notably, nobody reports more than about 15 tokens/second. Thus, despite somewhat better software support, CPU inference is usually going to be slower than old-datacentre-GPU inference, but is at least the best way to get lots of memory capacity.
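Putting that bound into code (a minimal sketch: it ignores the KV cache, kernel overheads and everything else responsible for the gap to those user reports):
```python
# Bandwidth-limited ceiling on serial token generation: every weight
# byte has to be read from memory once per token.
def max_tokens_per_second(bandwidth_gb_s: float, params_billions: float,
                          bytes_per_weight: float) -> float:
    return bandwidth_gb_s / (params_billions * bytes_per_weight)

print(max_tokens_per_second(936, 70, 0.5))  # RTX 3090, 70B at 4 bits: ~27 tok/s ceiling
print(max_tokens_per_second(83, 70, 0.5))   # Ryzen 9 7950X: ~2.4 tok/s ceiling
```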
### Scaling up
It's possible to have more GPUs without going straight to an expensive "real" GPU server or large workstation and the concomitant costs, but this is very much off the beaten path. Standard consumer platforms do not have enough PCIe lanes for more than two (reasonably) or four (unreasonably), so <span class="hoverdefn" title="High-End DeskTop">HEDT</span> or server hardware is necessary. HEDT is mostly dead and new server hardware increasingly expensive and divergent from desktop platforms, so it's most feasible to buy older server hardware, for which automated compatibility checkers and convenient part choice lists aren't available. The only well-documented build I've seen is [this one](https://nonint.com/2022/05/30/my-deep-learning-rig/), which uses 7 GPUs and an AMD EPYC Rome platform (~2019) in an open-frame case designed for miners, although I think [Tinyboxes](https://tinygrad.org/) are intended to be similar.
The author describes somewhat horrifying electrical engineering problems due to using several power supplies together, and custom cooling modifications. While doable, all this requires much more expertise than just assembling a standard desktop from a normal part list. Your other option is to take an entire old server and install GPUs in it, but most are not designed for consumer GPUs and will not easily fit or power them. I've also been told that some of them have inflexible firmware and might have issues running unexpected PCIe cards or different fan configurations.
</details>
[^1]: Not really.
[^2]: High-performance compute hardware is still not cheap in an absolute sense, and for infrequent loads you are likely better off with [cloud services](https://vast.ai/).
[^3]: I'm told it works fine on their latest datacentre cards. You are not getting those. You aren't even renting those, for some reason.
[^4]: Intel's is arguably better on consumer hardware than datacentre, as their datacentre hardware doesn't work.
[^5]: Especially since most LLM quantization dequantizes to FP16 before doing the matrix multiplications, sparing no compute but lots of bandwidth and VRAM.
[^6]: Tim Dettmers has a good [technical explanation](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/) of this, though many of the specific recommendations it makes are outdated: Nvidia is now known to artificially limit FP16 tensor performance (with FP32 accumulation) on both Ada Lovelace and Ampere, and the structured sparsity feature has not seen any real adoption.
[^7]: Compare the RTX 3060 and RTX 4060, for instance. The 4060 is still faster for gaming because its caches compensate for the lower bandwidth and its higher clocks provide more compute.
[^8]: I don't know the theoretical link rate, but it's [benchmarked here](https://www.boston.co.uk/blog/2021/03/09/boston-labs-tests-nvidia-nvlink.aspx).
[^9]: The AD102 chip in the RTX 4090 even appears to have had NVLink removed late in development (see the blank areas around the perimeter): ![AD102 die shot by Fritzchens Fritz](/assets/images/ad102.jpg) (image source: <https://www.flickr.com/photos/130561288@N04/53156939446/>).
[^10]: See the "PSUs" section [here](https://nonint.com/2022/05/30/my-deep-learning-rig/).
[^11]: I don't seem to actually have a source for this (probably old Discord conversations), but I'm obviously right.
[^12]: Meaning optimized code for a specific computing task, not *OS* kernels.
[^13]: This is not hard to fix with aftermarket fans and a 3D printer and/or zip ties.
[^14]: You should be able to hold weights in FP16 and do the maths in FP32, giving you FP32 speeds instead of the horrible slowdown, though.


@@ -2,20 +2,20 @@
title: Stop having political opinions
description: This is, of course, all part of my evil plan to drive site activity through systematically generating (meta)political outrage.
created: 24/09/2023
updated: 15/01/2024
slug: opinion
draft: yes
---
This may sound strange coming from someone whose website contains things which are clearly [political opinions](/osbill/); I am being [hypocritical](https://www.overcomingbias.com/p/homo-hipocritushtml)/didn't notice/have updated my views since that/am writing hyperbolically or ironically to make a point/do not require myself to have self-consistent beliefs (select your favourite option). Regardless, I think that holding, forming and in various ways acting on political opinions is somewhere between unnecessary and significantly net harmful. I apologize in advance for not using concrete examples for anything in this post, but those would be political opinions.
## Importance, Tractability, Neglectedness
Political interaction is often framed as altruistic or even morally necessary - most notably, voting is a "civic duty" and in some countries compulsory, and it's common for political movements and their participants to believe that they are helping to bring about a better world through their actions, or that they're preventing some other group from doing harm (and thus in some sense doing good) with their ill-posed opinions, misaligned values or sheer evilness. Thus, let's evaluate it as an altruistic act using the [ITN](https://forum.effectivealtruism.org/topics/itn-framework) framework favoured by Effective Altruism. In brief, Importance is the value of fully solving whatever problem you're targeting, Tractability is the marginal value of your input to it (how much an additional unit of work can affect the problem), and Neglectedness is how little the problem is already being worked on.
Political interaction is often framed as altruistic[^1] or even morally necessary - most notably, voting is a "civic duty"[^2], and it's common for political movements and their participants to believe that they are helping to bring about a better world through their actions, or that they're preventing some other group from doing harm (and thus in some sense doing good) with their ill-posed opinions, misaligned values or sheer evilness. Thus, let's evaluate it as an altruistic act using the [ITN](https://forum.effectivealtruism.org/topics/itn-framework) framework favoured by Effective Altruism. In brief, Importance is the value of fully solving whatever problem you're targeting, Tractability is the marginal value of your input to it (how much an additional unit of work can affect the problem), and Neglectedness is how little the problem is already being worked on.
Politics clearly fails at neglectedness. The majority of people are interested at least to the extent of thinking and talking about it regularly and voting. Very large chunks of media time are allotted to politics, and people readily seek out political content to read and debate. There is no shortage of advocacy groups, think tanks and public intellectuals engaging in politics. You might contend that your favourite political position is neglected and less popular than widely discussed ones, but given that you are aware of it and supporting it, it probably still has a fairly large number of supporters - the world population is quite large, after all - and since you're still in the same field as all the other positions, you are competing with them for resources and attention.
It does not do well on tractability. For mostly the same reasons as neglectedness, your marginal contribution is not big. [Voting](https://putanumonit.com/2015/12/30/010-voting/) is, even under fairly optimistic assumptions, very unlikely to change the outcome of an election. Discussing politics with people you know is notorious for never changing anyone's beliefs, and arguments on social media are even less effective - very little discussion surfaces novel ideas and it mostly serves as an ineffective attempt to apply social pressure. The situation with protests and similar activity is perhaps better because there are fewer people doing that, but I do not think their effectiveness is going to be affected much by the addition or removal of a person on the margin, and I am not convinced that they do much in general. Politics is also especially intractable because on many issues, people are actively working against you.
Importance is somewhat more ambiguous. I have been playing fast and loose with the exact definition of "politics" here - while it's clearly true that the sum of everything people want solved via politics is very important, the plausible consequences of something like electing a party you like or having a policy you want implemented are significantly smaller, both from the perspectives of [conflict theory](https://slatestarcodex.com/2018/01/24/conflict-vs-mistake/) (the frame of political disagreements as battles between groups over values or resource allocation) and mistake theory (political disagreements as good-faith discussions of what the best thing to do is given a shared understanding of goals). Conflict-theoretically, any victory can be eroded by changing power dynamics later or nulified by enemies in the system surrounding it; mistake-theoretically, the impact of policies is very hard to test, let alone know in advance, and many of the issues policies are intended to solve are very complicated and any single solution is unlikely to work very well.
Importance is somewhat more ambiguous. I have been playing fast and loose with the exact definition of "politics" here - while it's clearly true that the sum of everything people want solved via politics is very important, the plausible consequences of something like electing a party you like or having a policy you want implemented are significantly smaller, both from the perspectives of [conflict theory](https://slatestarcodex.com/2018/01/24/conflict-vs-mistake/) (the frame of political disagreements as battles between groups over values or resource allocation) and mistake theory (political disagreements as good-faith discussions of what the best thing to do is given a shared understanding of goals). I found out while researching for this that policy changes are [actually surprisingly robust](https://forum.effectivealtruism.org/posts/jCwuozHHjeoLPLemB/how-long-do-policy-changes-matter-new-paper), but there are still problems - mistake-theoretically, the world is very complex and a policy may not actually do what you want, and, conflict-theoretically, an uncooperative government will not implement a policy in the way you want.
## The Magic Fix-Everything Button
@@ -23,20 +23,28 @@ A large amount of modern politics-as-practiced seems to take a specific kind of
While there are absolutely some cases where a bad policy exists for conflict-theoretic reasons (e.g. one group wants to enrich itself at the expense of others and opposition is too diffuse to stop it), the biggest problems we face now have no clean complete solution, only a wide range of possible policy positions with a complex set of tradeoffs. Insistence on a particular consequence without thought to how it might actually be achieved, erasure of tradeoffs, or [ignorance of the reasons](https://en.wiktionary.org/wiki/Chesterton%27s_fence) someone else might be against an obviously-good-to-you policy results in prolonged conflict and ineffective outcomes. Where possible, it's better to try and [move the Pareto frontier](https://www.overcomingbias.com/p/policy_tugowarhtml) with novel solutions rather than attempting to force through a result against others.
This can also lead to, in effect, passivity: not considering solutions to problems other than wrangling large-scale governmental mechanisms. This is also harmful, since the government is [not omnicompetent](https://www.theonion.com/smart-qualified-people-behind-the-scenes-keeping-ameri-1819571706) and anything complicated is mired in horrifying bureaucratic quagmires of impenetrable dysfunction, as are most large-scale organizations.
This can also lead to passivity and learned helplessness: not considering solutions to problems other than wrangling large-scale governmental mechanisms. This is also harmful, since the government is [not omnicompetent](https://www.theonion.com/smart-qualified-people-behind-the-scenes-keeping-ameri-1819571706) and anything complicated it does is mired in horrifying bureaucratic quagmires of impenetrable dysfunction, as are most large-scale organizations.
## Selfish Reasons To Not Participate
Rather than merely not being a public good, I think involvement in politics is even individually harmful. The most obvious reason is opportunity cost - all the time spent reading political news, voting, forming opinions, or having conversations about it could be spent more effectively - but there is the further reason that because people often tie politics to their identities, political discussions are frequently damaging to relationships.
Rather than merely not being a public good, I think involvement in politics is even individually harmful. The most obvious reason is opportunity cost - all the time spent reading political news, voting, forming opinions, or having conversations about it could be spent more effectively - but there is the further reason that because people often tie politics to their identities, political discussions are frequently damaging to relationships or prevent people who would otherwise get on fine from doing so.
So if it's bad to participate, why is it so popular? The short answer is, to reuse the favourite adage of "ersatz" on the EleutherAI Discord server, "people are insane". We are [adaptation-executors, not fitness-maximizers](https://www.lesswrong.com/posts/XPErvb8m9FapXCjhA/adaptation-executers-not-fitness-maximizers), built on evolved cognitive heuristics optimized for ancient savannah environments in smaller tribes. It's plausible that in those, tractability and neglectedness were much lower and social missteps or groups moving against you significantly costlier, the resulting strategies misgeneralize to today's world of 8 billion people, and few people bother to explicitly reason about the cost/benefit and override this. The system is also hyperstitious: now that political interaction is considered altruistic and expected, people are incentivized to participate more for signalling reasons.
So if it's bad to participate, why is it so popular? The short answer is, to reuse the favourite adage of "ersatz" on the EleutherAI Discord server, "people are insane". We are [adaptation-executors, not fitness-maximizers](https://www.lesswrong.com/posts/XPErvb8m9FapXCjhA/adaptation-executers-not-fitness-maximizers), built on evolved cognitive heuristics optimized for ancient savannah environments in smaller tribes. It's plausible that in those, tractability and neglectedness were much better and social missteps or groups moving against you significantly costlier, the resulting strategies misgeneralize to today's world of 8 billion people, and few people bother to explicitly reason about the cost/benefit and override this. The system is also self-reinforcing: now that political interaction is considered altruistic and expected, people are incentivized to participate more for signalling reasons.
This can also be blamed on cultural evolution/memetics. As with religions, the most contagious ideologies are selected for and propagate, growing more able to effectively capture human attention regardless of actual value to their hosts. The incentives of media also help: receiving payment for clicks on your videos and articles incentivizes recapitulation of the same process through deliberate design, resulting in content optimized to spread through exploiting outrage and tribalism.
This can also be blamed on cultural evolution/memetics. As with religions, the most contagious ideologies are selected for and propagate, growing more able to effectively capture human attention regardless of actual value to their hosts. The incentives of media also help: receiving payment for clicks on your videos and articles results in intentional optimization for the same thing, leading to content optimized to spread through exploiting outrage and tribalism.
## Universalizability
The most common objection I've heard is along the lines of "but if everyone did this, no political improvement would occur and the world would be much worse off". This is true but irrelevant: I'm not a Kantian and don't only advocate for behaviors which need to apply to everyone at once. In the current state of the world, I think the marginal benefit (to everyone, and to you) of engagement is below the marginal cost and so it should be avoided - if a sufficiently large amount of people agreed with me on this and did so, some of my arguments would apply less and it would become more worthwhile, and I might then argue in favour of political engagement.
The most common objection I've heard is along the lines of "but if everyone did this, no political improvement would occur and the world would be much worse off". This is true but irrelevant: I'm not a Kantian and don't only advocate for behaviors which need to apply to everyone at once. In the current state of the world, I think the marginal benefit (to everyone, and to you) of engagement is below the marginal cost and so it should be avoided - if a sufficiently large number of people agreed with me on this and did so, my arguments would apply less and it would become more worthwhile, and I might then argue in favour of political engagement.
Another is the claim that I am a privileged person who is only able to ignore politics because I'm not heavily threatened or discriminated against by existing instutions. This also misses the point somewhat - this affects importance, but not neglectedness or tractability, which are still, I think, so much lower than people's behaviour implies that this argument holds up.
Another is the claim that I am a privileged person who is only able to ignore politics because I'm not heavily threatened or discriminated against by existing institutions. This is entirely missing the point: being more affected by something does not make you more able to affect it.
If you have any arguments against my argument I haven't addressed here, please tell me so I can think about them.
The best I've had[^3] is that even if standard political engagement doesn't do anything, there are some activities considered "politics" which do work and which are reasonably accessible to individuals, such as local organization, engaging directly with figures in government or writing detailed policy proposals. This is plausibly true, but it's almost entirely orthogonal to most interaction, and having strong opinions on politics tends to bias your judgment of how effective and reasonable your actions actually are.
If you have any arguments against my argument I haven't addressed here, please tell me so I can think about them.
[^1]: In an amazingly titled essay (["Against Kind Informed Voters"](https://www.overcomingbias.com/p/against-kind-informed-voters), released on Christmas), Robin Hanson argues that becoming more politically informed and loudly demonstrating that is *in itself* a selfish action which incentivizes politicians to pay more attention to you, via a retrospective voting model. This is cool but also seems impractically galaxy-brained.
[^2]: In some countries (e.g. Australia) it's even compulsory.
[^3]: And the reason why this post was accidentally left as an unfinished draft for several months.


@@ -30,6 +30,9 @@ description: <a href="https://github.com/osmarks/guihacker">My fork</a> of GUIHa
.output-console {
position: fixed;
overflow: hidden;
left: 0;
padding-left: 1em;
padding-right: 1em;
}
p {
margin:0


@@ -377,7 +377,7 @@ window.points = (async () => {
const footnotes = document.querySelector(".footnotes")
const sidenotes = document.querySelector(".sidenotes")
if (sidenotes && footnotes) {
const codeblocks = document.querySelectorAll("pre.hljs")
const codeblocks = document.querySelectorAll(".wider")
const article = document.querySelector(".content")
while (footnotes.firstChild) {
sidenotes.appendChild(footnotes.firstChild)


@@ -42,6 +42,10 @@ pre, code, .deemph
a
text-decoration: none
.blog-post a, .sidenotes a
text-decoration: underline
nav
display: flex
align-items: center
@@ -49,7 +53,7 @@ nav
padding: 1em
margin-bottom: 0.5em
background: black
overflow-x: scroll
overflow-x: auto
font-size: 1.1em
.logo
@@ -76,7 +80,11 @@ h1, h2, h3, h4, h5, h6
margin: 0
font-weight: 600
a
color: inherit
color: inherit
text-decoration: none !important
summary h1, summary h2
display: inline
// for easier viewing on big screen devices, narrow the width of text
// also make links a bit more distinct
@@ -168,11 +176,11 @@ button, select, input, textarea, .textarea
font-weight: 600
.caption
width: calc(100% - 2em)
width: calc(100% - 3em)
background: lightgray
border: 1px solid black
padding: 1em
margin: -1px
margin: 0.5em
img, picture
width: 100%
@@ -181,6 +189,15 @@ blockquote
border-left: 0.4rem solid black
margin-left: 0.2rem
.wider
width: calc(100vw - 2 * $content-margin)
max-width: 80em
min-width: 40em
> *
min-width: 40em
position: relative
z-index: 1
.microblog p
margin: 0
@@ -192,6 +209,9 @@ blockquote
min-width: $sidenotes-width
padding-left: 1.5rem
position: relative
p
margin: 0
.footnotes-sep
display: none
.footnotes-list
@@ -249,7 +269,7 @@
a
color: lightblue
&:visited
color: mediumorchid
color: #e17701
.caption
background: #333
@@ -259,4 +279,24 @@ blockquote
--autocol-saturation: 50%
nav .logocont
color: white
color: white
.sidenotes img
width: 100%
max-width: 15em
display: block
.hoverdefn
text-decoration-style: dotted
text-decoration-line: underline
.section-header
margin-top: 0.5em
table
border-collapse: collapse
td, th
border: 1px solid gray
padding: 0.4em
th
white-space: nowrap


@@ -2,7 +2,7 @@ extends layout.pug
block content
div.content
h2 Blog
h2.section-header Blog
p.
Read my opinions via the internet.
div.blog
@@ -16,14 +16,14 @@ block content
div.deemph= `${renderDate(post.created)} / ${metricPrefix(post.wordCount, "")} words`
div.description!= post.description
h2 Microblog
h2.section-header Microblog
p.
Short-form observations.
div.microblog
each entry in microblog
!= entry
h2 Experiments
h2.section-header Experiments
p.
Various web projects I have put together over many years. Made with at least four different JS frameworks. Some of them are bad.
div.experiments