deploy webs

This commit is contained in:
osmarks 2023-08-31 13:00:53 +01:00
parent e772af915f
commit f5165ea7dd
22 changed files with 328 additions and 50 deletions

BIN
assets/images/maghammer.jpg Normal file


BIN
assets/images/progedu.jpg Normal file


BIN
assets/kreon.woff2 Normal file


BIN
assets/titillium-web.woff2 Normal file


84
blog/maghammer.md Normal file

@@ -0,0 +1,84 @@
---
title: "Maghammer: My personal data warehouse"
created: 28/08/2023
updated: 29/08/2023
description: Powerful search tools as externalized cognition, and how mine work.
slug: maghammer
---
I have had this setup in various bits and pieces for a while, but some people expressed interest in its capabilities and apparently haven't built similar things and/or weren't aware of technologies in this space, so I thought I would run through what I mean by "personal data warehouse" and "externalized cognition" and why they're important, how my implementation works, and other similar work.
## What?
Firstly, "personal data warehouse". There are a lot of names and a lot of implementations, but the general idea is a system that I can use to centrally store and query personally relevant data from various sources. Mine is mostly focused on text search but is configured so that it can (and does, though not as much) work with other things. Proprietary OSes and cloud platforms are now trying to offer this sort of thing, but not very hard. My implementation runs locally on my [server](/stack/), importing data from various sources and building full-text indices.
Here are some other notable ones:
* [Stephen Wolfram's personal analytics](https://writings.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/) - he doesn't describe much of the implementation but does have an impressively wide range of data.
* [Dogsheep](https://dogsheep.github.io/) - inspired by Wolfram (the name is a pun), and the basis for a lot of Maghammer.
* [Recoll](https://www.lesbonscomptes.com/recoll/pages/index-recoll.html) - a powerful file indexer I also used in the past.
* [Rewind](https://www.rewind.ai/) - a shinier, more modern commercial tool (specifically for macOS...) based on the somewhat weird approach of constantly recording audio and screen content.
* [Monocle](https://thesephist.com/posts/monocle/) - built, apparently, to learn a new programming language, but it seems like it works well enough.
You'll note that not all of these projects make any attempt to work on non-text data, which is a reasonable choice, since these concerns are somewhat separable. I personally care about handling my quantitative data too, especially since some of it comes from the same sources, and designed accordingly.
## Why?
Why do I want this? Because human memory is very, very bad. My (declarative) memory is much better than average, but falls very far short of recording everything I read and hear, or even just the source of it (I suspect this is because of poor precision (in the information retrieval sense) making better recall problematic, rather than actual hard limits somewhere - there are documented people with photographic memory, who report remembering somewhat unhelpful information all the time - but without a way to change that it doesn't matter much). According to [Landauer, 1986](https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog1004_4)'s estimates, the amount of retrievable information accumulated by a person over a lifetime is less than a gigabyte, or <0.05% of my server's disk space. There's also distortion in remembered material which is hard to correct for. Information is simplified in ways that lose detail, reframed or just changed as your other beliefs change, merged with other memories, or edited for social reasons.
Throughout human history, even before writing, the solution to this has been externalization of cognitive processing: other tiers in the memory hierarchy with more capacity and worse performance. While it would obviously be [advantageous](/rote/) to be able to remember everything directly, just as it would be great to have arbitrarily large amounts of fast SRAM to feed our CPUs, tradeoffs are forced by reality. Oral tradition and culture were the first implementations, shifting information from one unreliable human mind to several so that there was at least some redundancy. Writing made for greater robustness, but the slowness of writing and copying (and for a long time expense of hardware) was limiting. Printing allowed mass dissemination of media but didn't make recording much easier for the individual. Now, the ridiculous and mostly underexploited power of contemporary computers makes it possible to literally record (and search) everything you ever read at trivial cost, as well as making lookups fast enough to integrate them more tightly into workflows. Roam Research popularized the idea of notes as a "second brain", but it's usually the case that the things you want to know are not ones you thought to explicitly write down and organize.
More concretely, I frequently read interesting papers or blog posts or articles which I later remember in some other context - perhaps they came up in a conversation and I wanted to send someone a link, or a new project needs a technology I recall there being good content on. Without good archiving, I would have to remember exactly where I saw it (implausible) or use a standard, public search engine and hope it will actually pull the document I need. Maghammer (mostly) stores these and allows me to find them in a few seconds (fast enough for interactive online conversations, and not that much slower than Firefox's omnibox history search) as long as I can remember enough keywords. It's also nice to be able to conveniently find old shell commands for strange things I had to do in the past, or look up sections in books (though my current implementation isn't ideal for this).
## How?
I've gone through a lot of implementations, but they are all based on the general principle of avoiding excessive effort by reusing existing tools where practical and focusing on the most important functionality over minor details. Initially, I just archived browser history with [a custom script](https://github.com/osmarks/random-stuff/blob/master/histretention.py) and stored [SingleFile](https://addons.mozilla.org/en-US/firefox/addon/single-file/) HTML pages and documents, with the expectation that I would set up search other than `grep` later. I did in fact eventually (November 2021) set up Recoll (indexing) and [Recoll WE](https://www.lesbonscomptes.com/recoll/faqsandhowtos/IndexWebHistory) (to store all pages, or at least all of the ones not rendered purely by client-side logic, rather than just selected ones), and they continued to work decently for some time. As usually happens with software, I got dissatisfied with it for various somewhat arbitrary reasons and prototyped rewrites.
These were not really complete enough to go anywhere (some of them reimplemented an entire search engine for no particular reason, one worked okay but would have been irritating to design a UI for, one works for the limited scope of indexing Calibre but doesn't do anything else) so I continued to use Recoll until March 2023, when I found [Datasette](https://datasette.io/) and [the author's work on personal search engines](https://datasette.substack.com/p/dogsheep-personal-analytics-with) and realized that this was probably the most viable path to a more personalized system. My setup is of course different from theirs, so I wrote some different importer scripts to organize data nicely in SQLite and build full text search indices, and an increasingly complicated custom plugin to do a few minor UI tweaks (rendering timestamp columns, fixing foreign keys on single-row view pages, doing links) and reimplement something like [datasette-search-all](https://github.com/simonw/datasette-search-all/) (which provides a global search bar and nicer search UI).
Currently, I have custom scripts to import this data, which are run nightly as a batch job:
* Anki cards from [anki-sync-server](https://github.com/ankicommunity/anki-sync-server/)'s database - just the text content, because the schema is weird enough that I didn't want to try and work out how anything else was stored.
* Unorganized text/HTML/PDF files in my archives folder.
* Books (EPUB) stored in Calibre - overall metadata and chapter full text.
* Media files in my archive folder (all videos I've watched recently) - format, various metadata fields, and full extracted subtitles with full text search.
* [Miniflux](/rssgood/) RSS feed entries.
* [Minoteaur](/minoteaur/) notes, files and structured data. I don't have links indexed, since SQLite isn't much of a graph database (no, I will not write a recursive common table expression for it), my importer reads directly off the Minoteaur database, and writing a Markdown parser to extract them would have been annoying.
* RCLWE web history (including the `circache` holding indexed pages in my former Recoll install).
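The importers above all reduce to the same shape: normalize records from a source into a SQLite table, then maintain a full-text index alongside it. A minimal sketch of that shape (hypothetical schema and table names; the real scripts are source-specific):

```python
import sqlite3

def init_db(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS docs(
            id INTEGER PRIMARY KEY,
            source TEXT, timestamp INTEGER, title TEXT, text TEXT
        );
        -- External-content FTS5 table: the index refers back to `docs`
        -- rather than storing a second copy of the text.
        CREATE VIRTUAL TABLE IF NOT EXISTS docs_fts
            USING fts5(title, text, content=docs, content_rowid=id);
    """)

def insert(conn, source, timestamp, title, text):
    cur = conn.execute(
        "INSERT INTO docs(source, timestamp, title, text) VALUES (?, ?, ?, ?)",
        (source, timestamp, title, text))
    # With external content, the FTS index must be updated explicitly.
    conn.execute(
        "INSERT INTO docs_fts(rowid, title, text) VALUES (?, ?, ?)",
        (cur.lastrowid, title, text))

def search(conn, query):
    # bm25() is FTS5's built-in term-based ranking; lower scores are better.
    return conn.execute("""
        SELECT docs.title, bm25(docs_fts) AS score
        FROM docs_fts JOIN docs ON docs.id = docs_fts.rowid
        WHERE docs_fts MATCH ? ORDER BY score LIMIT 10
    """, (query,)).fetchall()

conn = sqlite3.connect(":memory:")
init_db(conn)
insert(conn, "rss", 1693180800, "Some feed entry", "full text search with SQLite")
results = search(conn, "sqlite")
```

Datasette then gets pointed at the resulting database files; it provides the table browsing and SQL querying UI for free.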
There are also some other datasets handled differently, because the tools I use for those happened to already use SQLite somewhere and had reasonably usable formats. Specifically, [Gadgetbridge](https://www.gadgetbridge.org/) data from my smartwatch is copied off my phone and accessible in Datasette, [Atuin](https://github.com/ellie/atuin)'s local shell history database is symlinked in, Firefox history comes from [my script](https://github.com/osmarks/random-stuff/blob/master/histretention.py) on my laptop rather than the nightly serverside batch job, and I also connected my Calibre library database, though I don't actually use that. 13GB of storage is used in total.
This is some of what the UI looks like - it is much like a standard Datasette install with a few extra UI elements and some style tweaks I made:
<div class="caption">
<img src="/assets/images/maghammer_1.png">
<div>Viewing browser history through the table view. This is not great on narrower screens. I'm intending to reengineer this a little at some point.</div>
</div>
<div class="caption">
<img src="/assets/images/maghammer_2.png">
<div>The redone search-all interface. My plugin makes clickable links pointing to my media server.</div>
</div>
<div class="caption">
<img src="/assets/images/maghammer_3.png">
<div>The front page, listing databases and tables and with the search bar.</div>
</div>
Being built out of a tool intended for quantitative data processing means that I can, as I mentioned, do some quantitative data processing. While I could in principle do things like count shell/browser history entries by date, this isn't very interesting; the cooler datasets are the logs from my watch (heart rate and step count), though I haven't gotten around to producing nice aggregates from these, and the manually written structured data entries from my journal. For the reasons described earlier I write up a lot of information in journal entries each day, including machine-readable standardized content. I haven't backfilled this for all entries, as it requires a lot of work to read through them and write up the tags, but even with only fairly recent entries usable it's still provided significant insight.
<div class="caption">
<img src="/assets/images/maghammer_4.png">
<div>A simple aggregate query of my notes' structured data. Redacted for privacy.</div>
</div>
<div class="caption">
<img src="/assets/images/maghammer_5.png">
<div>Not actually a very helpful format.</div>
</div>
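As an illustration of the kind of aggregate this enables, here is a sketch of rolling watch logs up by day in SQLite (invented schema and sample values; the real Gadgetbridge tables differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watch_log(timestamp INTEGER, heart_rate INTEGER, steps INTEGER)")
conn.executemany(
    "INSERT INTO watch_log VALUES (?, ?, ?)",
    [
        (1693180800, 61, 120),  # 2023-08-28
        (1693184400, 75, 450),  # 2023-08-28, an hour later
        (1693267200, 58, 300),  # 2023-08-29
    ],
)
# One row per day: mean heart rate and total step count.
daily = conn.execute("""
    SELECT date(timestamp, 'unixepoch') AS day,
           round(avg(heart_rate), 1) AS mean_hr,
           sum(steps) AS total_steps
    FROM watch_log
    GROUP BY day
    ORDER BY day
""").fetchall()
# daily == [('2023-08-28', 68.0, 570), ('2023-08-29', 58.0, 300)]
```

In Datasette this is just a saved SQL query, with the results optionally rendered by a charting plugin.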
While it's not part of the same system, [Meme Search Engine](https://mse.osmarks.net/) is undoubtedly useful to me for rapidly finding images (memetic images) I need or want - so much so that I have a separate internal instance run on my miscellaneous-images-and-screenshots folder. Nobody else seems to even be trying - while there are a lot of demos of CLIP image search engines on GitHub, and I think one in the OpenAI repository, I'm not aware of *production* implementations with the exception of [clip-retrieval](https://github.com/rom1504/clip-retrieval) and the LAION index deployment, and one iPhone app shipping a distilled CLIP. There's not anything like a user-friendly desktop app, which confuses me somewhat, since there's clearly demand amongst the people I've talked to. Regardless of the reason, this means that Meme Search Engine is quite possibly the world's most advanced meme search tool (since I bothered to design a nice-to-use query UI and online reindexing), although I feel compelled to mention someone's [somewhat horrifying iPhone OCR cluster](https://findthatmeme.com/blog/2023/01/08/image-stacks-and-iphone-racks-building-an-internet-scale-meme-search-engine-Qzrz7V6T.html). Meme Search Engine is not very well-integrated but I usually know which dataset I want to retrieve from anyway.
## Future directions
The system is obviously not perfect. As well as some minor gaps (browser history isn't actually put in a full-text table, for instance, due to technical limitations), many data sources (often ones with a lot of important content!) aren't covered, such as my emails and conversation history on e.g. Discord. I also want to make better use of ML - for instance, integrating things like Meme Search Engine better, local Whisper autotranscription of videos rather than having no subtitles or relying on awful YouTube ones, semantic search to augment the default [SQLite FTS](https://www.sqlite.org/fts5.html) (which uses term-based ranking - specifically, BM25), and OCR of screenshots. I still haven't found local/open-source OCR which is good, generalizable and usable (Apple's software works excellently, but it's proprietary). Some of the trendier, newer projects in this space use LLMs to do retrieval-augmented generation, but I don't think this is a promising direction right now - available models are either too dumb or too slow/intensive, even on GPU compute, and in any case prone to hallucination.
Another interesting possibility for a redesign I have is a timeline mode. Since my integration plugin (mostly) knows what columns are timestamps, I could plausibly have a page display all relevant logs from a day and present them neatly.
If you have related good ideas or correct opinions, you may tell me them below. The code for this is somewhat messy and environment-specific, but I may clean it up somewhat and release it if there's interest in its specifics.


@@ -2,20 +2,8 @@
title: Minoteaur
description: The history of the feared note-taking application.
created: 06/06/2023
updated: 28/08/2023
---
<style>
.caption {
width: calc(100% - 2em);
background: lightgray;
border: 1px solid black;
padding: 1em;
margin: -1px;
}
.caption img {
width: 100%;
}
</style>
If you've talked to me, you've probably heard of Minoteaur.
It was conceptualized in 2019, when I determined that it was a good idea to take notes on things in a structured way, looked at all existing software, and was (as usual, since all software is terrible; I will probably write about this at some point) vaguely unsatisfied by it.
I don't actually remember the exact details, since I don't have notes on this which I can find, but this was around the time when Roam Research leaked into the tech noösphere, and I was interested in and generally agreed with the ideas of graph-structured note-taking applications, with easy and fast flat organization.
@@ -63,7 +51,7 @@ It "mostly worked" at the level of Minoteaur 1, but also proved annoying to work
When I got sufficiently annoyed by that again, I rewrote it in Nim for [Minoteaur 6](https://git.osmarks.net/osmarks/minoteaur).
Nim is sort of how I would design a programming language, both in the sense that it makes a lot of nice decisions I agree with (extensive metaprogramming, style insensitivity) and in that it's somewhat quirky and I don't understand why some things happen (particularly with memory management, for which it has seemingly several different incompatible systems which can be switched between at compile time).
It has enough working libraries for things like SQLite and webservers that I thought it worth trying anyway, and it was indeed the most functional Minoteaur at the time, incorporating good SQLite-based search, backlinks, a mostly functional UI, partly style-insensitive links, a reasonably robust parser, a decent UI, and even DokuWiki-like drafts in the editor (a feature I end up using quite often due to things like accidentally closing or refreshing pages).
However, I got annoyed again by the server-rendered design, the terrible, terrible code I had to write to directly bind to a C-based GFM library (I think I at least managed to make it not segfault, even though I don't know why), and probably some things I forgot, leading to the *next* version.
<div class="caption">
@@ -82,7 +70,7 @@ However, I got annoyed again by the server-rendered design, the terrible, terrib
Python is my go-to language for rapid prototyping, i.e. writing poor-quality code very quickly, so it made some sense for me to rewrite in that next in 2021.
Minoteaur 7 was a short-lived variant using server rendering, which was rapidly replaced by Minoteaur 7.1, which used a frontend web framework called Svelte for its UI.
It contained many significant departures from all previous Minoteaurs, mostly for the better: notably, it finally incorporated indirection for pages.
While all previous implementations had just stored pages under their (somewhat normalized) title, I decided that not structuring it that way would be advantageous in order to allow pages to be renamed and referred to by multiple names, so instead pages have a unique, fixed ID and several switchable names.
This introduced the minor quirk that all Markdown parsing and rendering was done on the backend, which was not really how I'd usually do things but did actually make a good deal of the code simpler (since it is necessary to parse things there to generate plaintext for search).
As a search mechanism, I also (since Python made this actually practical) used deep-learning-based semantic search (using [Sentence Transformers](https://www.sbert.net/)) rather than the term-based mechanisms in SQLite.
This was actually quite easy to do thanks to the hard work of library developers, although I did write my own in-memory vector index for no clear reason, and frequently worked quite well (surfacing relevant content even if it didn't contain the right keywords) but with some unreliability (keyword matches were not always found).
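The in-memory vector index part really can be this simple: a brute-force cosine-similarity scan over normalized embedding vectors. A toy sketch (the real embeddings came from Sentence Transformers; the vectors here are made up):

```python
import math

class VectorIndex:
    """Toy brute-force nearest-neighbour index over unit-normalized vectors."""
    def __init__(self):
        self.items = []  # list of (key, normalized vector)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def add(self, key, vec):
        self.items.append((key, self._normalize(vec)))

    def search(self, vec, k=3):
        # After normalization, cosine similarity is just a dot product.
        q = self._normalize(vec)
        scored = sorted(
            ((sum(a * b for a, b in zip(q, v)), key) for key, v in self.items),
            reverse=True,
        )
        return [(key, score) for score, key in scored[:k]]

index = VectorIndex()
index.add("notes on SQLite", [0.9, 0.1, 0.0])
index.add("cake recipe", [0.0, 0.2, 0.9])
nearest = index.search([1.0, 0.0, 0.0], k=1)
```

Brute force is fine at personal-notes scale; approximate-nearest-neighbour libraries only start to matter at millions of vectors.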
@@ -135,9 +123,11 @@ It can, however:
Should you actually use it?
Probably not: while it works reliably enough for me, this is because I am accustomed to its strangeness and deliberately designed it to my requirements rather than anyone else's, sometimes in ways which are very hard to change now (for example, adding things like pen drawings would be really hard structurally, and while there was a Minoteaur 8 prototype with a different architecture which would have made that easier, it was worse to write most code for so I didn't go ahead with that), and can rewrite and debug it easily enough if I have to.
Other people cannot.
I am not writing this in order to convince people to switch over (that would create support requests) but to provide context and show off my technical achievement, such as it is.
## Future directions
While it works as-is, mostly, active real-world use has given me ideas about how it could be better.
~~At this time, I'm mostly interested in improving the search mechanism to include phrase queries, negative queries and exact match queries, better integration with external tools (for example, with some engineering effort I could move Anki card specifications into notes and not have to maintain that separately), and a structured data mechanism for attaching machine-readable content to pages.~~
I actually did add some of these. The search mechanism does now allow "exact" and "negative" queries, although it still has some brokenness I intend to fix at some point, and there's a fully featured structured data mechanism. Pages can have a list of key/value pairs attached (numeric or textual) and can then be queried by those using a few operators in the search.
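Those structured-data query operators could plausibly be implemented along these lines (a hypothetical sketch, not Minoteaur's actual code):

```python
import operator
import re

# Supported comparison operators; multi-character ones must be tried first.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq}
CLAUSE = re.compile(r"(\w+)\s*(>=|<=|>|<|=)\s*(.+)")

def matches(attrs, clause):
    """Check a page's key/value attributes against one query clause."""
    m = CLAUSE.fullmatch(clause)
    if not m or m.group(1) not in attrs:
        return False
    left, op, right = attrs[m.group(1)], OPS[m.group(2)], m.group(3)
    try:
        # Numeric comparison when both sides parse as numbers...
        return op(float(left), float(right))
    except (TypeError, ValueError):
        # ...otherwise fall back to plain string comparison.
        return op(str(left), str(right))

page = {"rating": 5, "mood": "good"}
print(matches(page, "rating >= 4"), matches(page, "mood = bad"))  # → True False
```

A real implementation would compile such clauses to SQL over the key/value table rather than filtering in Python, but the parsing shape is the same.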


@@ -2,7 +2,7 @@
title: Other things you may like
description: A nonexhaustive list of... content/media... which I like and which you may also be interested in as a visitor of my site.
created: 11/06/2020
updated: 30/08/2023
slug: otherstuff
---
I'm excluding music from this because music preferences seem to be even more varied between the people I interact with than other stuff.
@@ -13,7 +13,7 @@ Obviously this is just stuff *I* like; you might not like it, which isn't really
* [The Hitchhiker's Guide to The Galaxy](https://www.goodreads.com/series/40957-hitchhiker-s-guide-to-the-galaxy) by Douglas Adams (a series). It is pretty popular but quite a few people aren't aware of it, which is a shame. Regarded as some of the best science-fiction comedy ever. Very surrealist.
* [Sufficiently Advanced Magic](https://www.goodreads.com/book/show/34403860-sufficiently-advanced-magic) by Andrew Rowe. Progression fantasy with an interesting magic system. It's part of a series containing <del>two</del> four books so far (unfinished).
* [Mistborn](https://www.goodreads.com/series/40910-mistborn) by Brandon Sanderson. Initially seems like a pretty standard "chosen one must defeat the evil empire"-type story, is actually much more complex.
* [Discworld](https://en.wikipedia.org/wiki/Discworld) by Terry Pratchett, a *very* long-running (41 books, but it's sort of made of various miniserieses so you don't really need to read all of them or read in order) fantasy series set on the "Discworld", a flat world on the back of four elephants on a turtle. As you might expect from that description, it's somewhat comedic, but also has long-running plot arcs, great character development, and a world not stuck in medieval stasis (as new technology is introduced and drives some of the plots).
* He has good collaboratively-written books like [The Long Earth](https://www.goodreads.com/book/show/13147230-the-long-earth) and Good Omens (mentioned below).
* [Minecraft](https://www.minecraft.net/en-us/). You've probably heard of it, as it's apparently the most popular computer game ever, but it seems worth listing. It's a block-based sandbox game in which you can do a lot of stuff.
* Java Edition, which you should probably be playing anyway instead of the mobile version/Windows 10 Edition/Bedrock Edition/the console one/whatever else because it lacks the horrible, horrible microtransactions Microsoft implemented, has mod support, allowing you to use a *huge* range of extra content for free. This includes stuff like [programmable computers](https://www.curseforge.com/minecraft/mc-mods/cc-tweaked), [machines and stuff](https://www.curseforge.com/minecraft/mc-mods/thermal-expansion), [new "dimensions"](https://www.curseforge.com/minecraft/mc-mods/the-twilight-forest) (I do NOT like this use of this word but it's seeped into popular terminology), a complex [magic system](https://www.curseforge.com/minecraft/mc-mods/thaumcraft) (note that this is no longer updated, you should consider Astral Sorcery and Botania and other modern ones which are), and this [one modpack](https://www.technicpack.net/modpack/mcnewhorizons.677387) (well, there are probably others) with incredibly complex progression which could take [months](https://www.youtube.com/playlist?list=PLliiJ70rl2NvJjby2LoVuP1EuOvRAyf97) to finish.
@@ -25,7 +25,6 @@ Obviously this is just stuff *I* like; you might not like it, which isn't really
* [Sixteen Ways to Defend a Walled City](https://www.goodreads.com/book/show/37946419-sixteen-ways-to-defend-a-walled-city) by K. J. Parker. A funny book about an irreverent engineer running a city.
* [Doing God's Work](https://www.royalroad.com/fiction/25442/doing-gods-work), a web serial about a rapidly escalating plot to dethrone God (which is "based").
* [styropyro](https://youtube.com/user/styropyro/), the top search result for "crazy laser guy". Builds interesting lasery things (also Tesla coils and whatnot). Also has a [Discord server](https://discord.gg/ckGrMDR), which hosts many interesting discussions about primarily lasers and electronics, but many other things too.
* [Towers of Heaven](https://www.goodreads.com/series/264587-towers-of-heaven) by Cameron Milan, a [LitRPG](https://en.wikipedia.org/wiki/LitRPG) series about someone travelling back in time to save humanity from extinction because of the arrival of the towers, invulnerable extremely tall... towers... containing challenges (and which also release monsters periodically on the world around them, hence the "extinction" thing).
* [Ender's Game](https://en.wikipedia.org/wiki/Ender%27s_Game) by Orson Scott Card, a scifi book about children being trained to be the next leaders and soldiers in humanity's war with some aliens. I am not really a fan of the sequels.
* [Chilli and the Chocolate Factory](https://www.fanfiction.net/s/13451176/1/Chili-and-the-Chocolate-Factory-Fudge-Revelation) by gaizemaize, a now-completed web serial. It is, unsurprisingly, Charlie and the Chocolate Factory fanfiction which is actually pretty good. It manages to capture the bizarre surreal spirit of the original one, and is very funny. I vaguely suspect that the whole thing might just be convoluted setup for a pun.
* [UNSONG](http://unsongbook.com/) by Scott Alexander, which is *also* a now-completed web serial. A bizarre world in which, after Apollo 8 crashes into the crystal sphere surrounding the world, the planet switches over to running on kabbalistic Judaism. It sounds very strange, and it *is*, but Scott makes it work while demonstrating the power of ridiculous pareidolia.
@@ -35,21 +34,30 @@ Obviously this is just stuff *I* like; you might not like it, which isn't really
* [The Expanse](https://www.goodreads.com/series/56399-the-expanse) by James S. A. Corey, a near-future-ish scifi series in space which actually bothers with some level of realism. Also a TV series now if you prefer those.
* [Three Parts Dead](https://www.goodreads.com/book/show/13539191-three-parts-dead) by Max Gladstone, a very neat fantasy book (part of the "Craft Sequence"; I have also read "Two Serpents Rise" now) with a fairly modern-but-different world built on "Craft" (essentially, human emulations of the gods' powers: this caused some conflict in the backstory) and applied theology.
* [We Are Legion (We Are Bob)](https://www.goodreads.com/book/show/32109569-we-are-legion-we-are-bob) by Dennis E. Taylor, a story of von Neumann probes managed by uploaded human intelligences.
* [IO.SYS](https://www.datapacrat.com/IO.SYS.html) is a short story with a somewhat similar concept but significantly darker.
* [The Combat Codes](https://www.goodreads.com/book/show/27790093-the-combat-codes) by Alexander Darwin. Vaguely like Ender's Game but hand-to-hand combat and an exotic-feeling sort of science fantasy world. I think they're doing a rerelease with edited versions ~~soon~~ now, so it might be hard to find.
* [Schlock Mercenary](https://www.schlockmercenary.com/), a *very* long-running space opera webcomic. It's been running for something like 20 years, and the art and such improve over time. Now finished (for now).
* [Freefall](http://freefall.purrsia.com/), a hard-science-fiction webcomic.
* [Mage Errant](https://www.goodreads.com/series/252085-mage-errant) - a moderately-long-by-now fantasy series with a very vibrant world, and which actually considers the geopolitical implications of there being beings around ("Great Powers") able to act as one-man armies. Now complete ~~(I haven't read the last two books though)~~ though I do not like the last book as much in some ways.
* [Void Star](https://www.goodreads.com/book/show/29939057-void-star) - somewhat weird and good. The prose is very... poetic is probably the best word (it contains phrases like "isoclines of commitment and dread", "concentric and innumerable" and "high empyrean")... which I enjoyed, but it is polarizing. The setting seems like a generally reasonable extrapolation of a bunch of ongoing trends into the future, although it's unclear exactly *when* it is (some of the book implies 2150 or so, but this seems implausible). Its most interesting characteristic is that it absolutely does not tell you what's going on ever: an interview I read said it was written out of order, and that makes sense (another fun quirk of it is that the chapters are generally very short). I think I know most of what happens now, but it has taken a while.
* [Firefall](https://www.goodreads.com/book/show/22838183-firefall) (Blindsight/Echopraxia) - one of those rare books which is actually decent at portraying very alien intelligences. I preferred Blindsight to Echopraxia but both are worth reading. Some people seem to have thought that it is "cosmic horror" (particularly Blindsight) and/or had their psyche shattered by the implications of what they read within, but this didn't happen to me for whatever reason. Also, Peter Watts somehow makes vampires work well scientifically.
* [Endeavour](https://www.goodreads.com/book/show/22701594-endeavour) and the sequel, Erebus, are apparently science fiction I liked. I don't actually remember much about them, but empirically my fiction preferences are pretty consistent across time.
* [12 Miles Below](https://www.royalroad.com/fiction/42367/12-miles-below/) - ongoing webserial (I am not fully caught up or close to it yet) with intelligent and well-written characters. It has more grammar/spelling errors than I would like (I would like none) but most people care about this less than me.
* [Branches on the Tree of Time](https://www.fanfiction.net/s/9658524/1/Branches-on-the-Tree-of-Time) - Terminator fanfiction which manages to make Terminator make sense (somewhat).
* [The Daily Grind](https://www.royalroad.com/fiction/15925/the-daily-grind) - ongoing (I think? I got distracted from following it at some point and it's now really very long) webserial
* [The Daily Grind](https://www.royalroad.com/fiction/15925/the-daily-grind) - ongoing (I think? I got distracted from following it at some point and it's now really very long) webserial about relentlessly and realistically exploiting a dungeon in the modern world.
* [CORDYCEPS: Too clever for their own good](https://archiveofourown.org/works/6178036/chapters/14154868) - good short horror/mystery; I will not spoil it further.
* [Schild's Ladder](https://www.goodreads.com/book/show/156780.Schild_s_Ladder) - essentially just Greg Egan showing off cool physics ideas, but I quite like that. Egan also manages to pull off an actually-futuristic future society and world.
* Egan has short story anthologies which I have also read and recommend.
* [Stories of Your Life and Others](https://www.goodreads.com/book/show/223380.Stories_of_Your_Life_and_Others) - just very good short stories. Chiang has written a sequel, [Exhalation](https://www.goodreads.com/book/show/41160292-exhalation), which I also entirely recommend.
* He also wrote [Arrival](https://www.goodreads.com/book/show/32200035-arrival). I like the book but not the movie, since the movie's scriptwriters clearly did not understand what was going on.
* [A Hero's War](https://m.fictionpress.com/s/3238329/1/A-Hero-s-War) - bootstrapping industrialization in a setting with magic. Unfortunately, unfinished and seems likely to remain that way.
* [Snow Crash](https://www.goodreads.com/book/show/40651883-snow-crash) - a fun action story even though I don't take the tangents into Sumerian mythology (?) very seriously.
    * Since this list was written, I think it has become notorious for introducing the "metaverse" now being pushed by Facebook. This is very silly. Everyone who is paying attention knows that the real metaverse is Roblox.
* [Limitless](https://en.wikipedia.org/wiki/Limitless_(TV_series)) (the movie is also decent) - actually among the least bad depictions of superhuman intelligence I've seen in media, and generally funny.
* [Pantheon](https://en.wikipedia.org/wiki/Pantheon_(TV_series)) - unfortunately cancelled and pulled from streaming (for tax purposes somehow?) and thus hard to watch, but one of about three TV series I've seen on the subject of brain uploads, and I think the smartest. Some day I want my own ominous giant cube of servers in Norway.
* [Mark of the Fool](https://www.goodreads.com/series/346305-mark-of-the-fool) - somewhat standardly D&D-like world, but the characters are well-written and take reasonable decisions.
* [Nice Dragons Finish Last](https://www.goodreads.com/series/128485-heartstrikers) - enjoyable urban fantasy.
* [Street Cultivation](https://www.goodreads.com/series/287542-street-cultivation) - again, sane characters who do not make obviously stupid decisions for plot reasons.
Special mentions (i.e. "I haven't gotten around to reading these but they are well-reviewed and sound interesting") go to:
* [The Divine Cities](https://www.goodreads.com/series/159695-the-divine-cities) by Robert Jackson Bennet.
@ -62,7 +70,7 @@ Special mentions (i.e. "I haven't gotten around to reading these but they are we
* [The Books of Babel](https://www.goodreads.com/series/127130-the-books-of-babel) by Josiah Bancroft.
* [House of Suns](https://www.goodreads.com/book/show/1126719.House_of_Suns) by Alastair Reynolds.
* "house of suns is really very good, you should read" - baidicoot/Aidan, creator of the world-renowned [Emu War](/emu-war) game
* [Singularity Sky](https://www.goodreads.com/book/show/81992.Singularity_Sky)
* [Singularity Sky](https://www.goodreads.com/book/show/81992.Singularity_Sky) by Charlie Stross.
If you want EPUB versions of the free web serial stuff here for your e-reader, there are tools to generate those, or you can contact me for a copy.


@ -0,0 +1,30 @@
---
title: Programming education, tacit knowledge and LLMs
created: 02/07/2023
description: Why programming education isn't very good, and my thoughts on AI code generation.
slug: progedu
---
It seems to be fairly well-known (or at least widely believed amongst the people I regularly talk to about this) that most people are not very good at writing code, even those who really "should" be because of having (theoretically) been taught to (see e.g. <https://web.archive.org/web/20150624150215/http://blog.codinghorror.com/why-cant-programmers-program/>). Why is this? In this article, I will describe my wild guesses.
General criticisms of formal education have [already been done](https://en.wikipedia.org/wiki/The_Case_Against_Education), probably better than I can manage to do. I was originally going to write about how the incentives of the system are not particularly concerned with testing people in accurate ways, but rather easy and standardizable ways, and the easiest and most standardizable ways are to ask about irrelevant surface details rather than testing skill. But this isn't actually true: automated testing of code to solve problems is scalable enough that things like [Project Euler](https://projecteuler.net/) and [Leetcode](https://leetcode.com/) can test vast amounts of people without human intervention, and it should generally be *less* effort to do this than to manually process written exams. It does seem to be the case that programming education tends to preferentially test bad proxies for actual skill, but the causality probably doesn't flow from testing methods.
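To make that scalability concrete, here is a toy autograder in the style such sites use (an illustrative sketch; `testCases` and `grade` are invented names, not any real platform's API):

```javascript
// Hypothetical autograder: run a submitted function against fixed
// test vectors. No human ever needs to read the submission.
const testCases = [
    { input: [2, 3], expected: 5 },
    { input: [0, 0], expected: 0 },
    { input: [-1, 1], expected: 0 },
];

const grade = submission =>
    testCases.every(({ input, expected }) => submission(...input) === expected);

console.log(grade((a, b) => a + b)); // true
console.log(grade((a, b) => a * b)); // false
```

Real judges add sandboxing, time limits and hidden test vectors, but the marginal cost per extra student is essentially zero, which is the point.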
I think it's more plausible that teaching focuses on this surface knowledge because it's much easier and more legible, and looks and feels very much like "programming education" to someone who does not have actual domain knowledge (because other subjects are usually done in the same way), or who [isn't thinking very much about it](https://srconstantin.wordpress.com/2019/02/25/humans-who-are-not-concentrating-are-not-general-intelligences/), and then similar problems and a notion that testing should be "fair" and "cover what students have learned" lead to insufficiently outcome-oriented exams, which then sets up incentives biasing students in similar directions. The underlying issue is a matter of "tacit knowledge": being good at programming requires sets of interlocking and hard-to-describe mental heuristics rather than a long list of memorized rules, and since applying them feels natural and easy - and most people who are now competent don't accurately remember lacking them - it is not immediately obvious that this is the case, and someone asked how they can do something is likely to focus on the things which are, to them, easier to explain and notice.
So why is programming education particularly bad? Shouldn't *every* field be harmed by tacit knowledge transmission problems? My speculative answer is that they generally are, but it's much less noticeable and plausibly also a smaller problem. The heuristics used in programming are strange and unnatural - I'll describe a few of the important ones later - but the overarching theme is that programming is highly reductionist: you have to model a system very different to your own mind, and every abstraction breaks down in some corner case you will eventually have to know about. The human mind very much likes pretending that other systems are more or less identical to it - [animism](https://en.wikipedia.org/wiki/Animism) is no longer a particularly popular explicitly-held belief system, but it's still common to ascribe intention to machinery, "fate" and "karma", animals without very sophisticated cognition, and a wide range of other phenomena. Computers are not at all human, in that they do exactly what someone has set them up to do, which is often [not what they thought they were doing](https://gwern.net/unseeing), while many beginners expect them to "understand what they meant" and act accordingly. 
Every simple-looking capability is burdened with detail: the computer "knows what time it is" (thanks to some [nontrivial engineering](https://en.wikipedia.org/wiki/Network_Time_Protocol) with some possible failure points); the out-of-order CPU "runs just like an abstract in-order machine, but very fast" (until security researchers [find a difference](https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability))); DNS "resolves domain names to IPs" (but is frequently intercepted by networks, and can also serve as a covert backchannel); video codecs "make videos smaller" (but are also [complex domain-specific programming languages](https://wrv.github.io/h26forge.pdf)); text rendering "is just copying bitmaps into the right places" ([unless you care about Unicode or antialiasing or kerning](https://faultlore.com/blah/text-hates-you/)).
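The text-rendering example can be demonstrated without drawing a single glyph: in JavaScript, even "how long is this string?" already has several defensible answers (a quick sketch, runnable in Node):

```javascript
// Two encodings of "é": precomposed vs. base letter plus combining accent.
const precomposed = "\u00e9"; // U+00E9
const combining = "e\u0301";  // U+0065 followed by U+0301
console.log(precomposed === combining); // false: different code points
console.log(precomposed.normalize("NFC") === combining.normalize("NFC")); // true

// .length counts UTF-16 code units, not characters:
console.log(combining.length);  // 2, though it renders as one glyph
console.log("💻".length);       // 2: an astral code point is a surrogate pair
console.log([..."💻"].length);  // 1: iteration yields code points
```

Whether any of these counts matches what a user would call "one character" (grapheme clusters) is yet another question, which is exactly the burdened-with-detail pattern.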
The other fields which I think suffer most are maths and physics. Maths education mostly [fails to convey what mathematicians actually care about](https://www.maa.org/external_archive/devlin/LockhartsLament.pdf) and, despite some attempts to vaguely gesture at it, does not teach "problem-solving" skills as much as sometimes set nontrivial multistep problems and see if some people manage to solve them. Years of physics instruction [fail to stop many students falling back to Aristotelian mechanics](https://www.researchgate.net/profile/Richard-Gunstone/publication/238983736_Student_understanding_in_mechanics_A_large_population_survey/links/02e7e52f8a2f984024000000/Student-understanding-in-mechanics-A-large-population-survey.pdf) on qualitative questions. This is apparently mostly ignored, perhaps because knowledge without deep understanding is sufficient for many uses and enough people generalize to the interesting parts to supply research, but programming makes the problems more obvious, since essentially any useful work will rapidly run into things like debugging.
So what can be done? I don't know. Formal education is likely a lost cause: incentives aren't aligned enough that a better way to teach would be adopted any time soon, even if I had one, and enough has been invested in existing methods that any change would be extremely challenging. I do, at least, have a rough idea of what good programmers have which isn't being taught well, but I don't know how you *would* teach these things effectively:
* Intuitive understanding of what a computer is doing, at least to the level of tracing control flow (obviously computers are able to run a lot faster than humans and do complicated maths we usually cannot do mentally). I deride "[helping] you become a computer" in ['Problem Solving' Tasks and Computer Science](/csproblem), but without the ability to do this you cannot really frame problems algorithmically or debug.
* Edge case generation - devising weird edge cases for a particular problem obviously helps with testing, can help with debugging (especially of only partially observable systems), and may elucidate algorithms.
* Effective tool use - things like using Git and basic knowledge of Linux commands are frequently taught, but this is not really what I mean. Good programmers can select a novel tool for a particular task and quickly learn its most important capabilities, and have a smaller set of tools they know very well and can operate fast. A particularly good quick-to-operate tool can become an extension of your mental processes, at least when at a computer. I think fast typing is somewhat underappreciated, though I may be biased.
* Rough full-stack knowledge - while knowing everything about modern computers is probably literally impossible in a human lifetime, broad knowledge is necessary to guess at where bugs might arise, as many of them come from interactions between your code and other systems.
* Security mindset: as well as being directly useful for ensuring security, always thinking about where your assumptions might be flawed or how something might go wrong is vital for reliability.
* Good code structuring, e.g. knowing when to disaggregate or aggregate modules. I think that lots of people, particularly when using OOP, are too quick to try and "break apart" interdependent code in a way which makes development much slower without actually providing much flexibility, but thousand-line files with global variables everywhere are hard to work on.
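As a small worked example of the edge-case heuristic (a hypothetical `mean` function, invented for illustration):

```javascript
// A naive mean, and the inputs a suspicious programmer pokes it with.
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

console.log(mean([1, 2, 3]));      // 2: the happy path
console.log(mean([]));             // NaN: 0 / 0
console.log(mean([1e308, 1e308])); // Infinity: the sum overflows first
console.log(mean([0.1, 0.2]));     // 0.15000000000000002: floating point
```

Three of the four "obvious" inputs expose a decision the happy path never forced, which is why generating them is a distinct and valuable skill.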
If you have been paying any attention to anything within the past [two years](https://openai.com/blog/openai-codex) or so, you're probably also aware that AI (specifically large language models) will obsolete, augment, change, or do nothing whatsoever to software engineering jobs. My previous list provides some perspective for this: ChatGPT (GPT-3.5 versions; I haven't used the GPT-4 one) can model computers well enough that it can [pretend to be a Linux shell](https://www.engraved.blog/building-a-virtual-machine-inside/) quite accurately, tracking decent amounts of state while it does so; big language models have vague knowledge of basically everything on the internet, even if they don't always connect it well; ChatGPT can [also](https://twitter.com/gf_256/status/1598104835848798208) find some vulnerabilities in code; [tool use](https://til.simonwillison.net/llms/python-react-pattern) [is continually](https://openai.com/blog/function-calling-and-other-api-updates?ref=upstract.com) [being](https://gorilla.cs.berkeley.edu/) [improved](https://twitter.com/emollick/status/1657050639644360706) (probably their quick-script-writing capability already exceeds most humans'). Not every capability is there yet, of course, and I think LLMs are significantly hampered by issues humans don't have, like context window limitations, lack of online learning, and bad planning ability, but these are probably not that fundamental.
Essentially, your job is probably not safe, as long as development continues (and big organizations actually notice).
You may contend that LLMs lack "general intelligence", and thus can't solve novel problems, devise clever new algorithms, etc. I don't think this is exactly right (it's probably a matter of degree rather than binary), but my more interesting objection is that most code doesn't involve anything like that. Most algorithmic problems have already been solved somewhere if you can frame them right (which is, in fairness, also a problem of intelligence, but less so than deriving the solution from scratch), and LLMs probably remember more algorithms than you. More than that, however, most code doesn't even involve sophisticated algorithms: it just has to move some data around or convert between formats or call out to libraries or APIs in the right order or process some forms. I don't really like writing that and try to minimize it, but this only goes so far. You may also have a stronger objection along the line of "LLMs are just stochastic parrots repeating patterns in their training data": this is wrong, and you may direct complaints regarding this to the comments or [microblog](https://b.osmarks.net/), where I will probably ignore them.

package-lock.json generated

@ -16,6 +16,7 @@
"handlebars": "^4.7.6",
"html-minifier": "^4.0.0",
"markdown-it": "^13.0.1",
"markdown-it-anchor": "^8.6.7",
"markdown-it-footnote": "^3.0.3",
"mustache": "^4.0.1",
"nanoid": "^2.1.11",
@ -57,6 +58,28 @@
"node": ">=6.9.0"
}
},
"node_modules/@types/linkify-it": {
"version": "3.0.2",
"resolved": "https://registry.npmjs.org/@types/linkify-it/-/linkify-it-3.0.2.tgz",
"integrity": "sha512-HZQYqbiFVWufzCwexrvh694SOim8z2d+xJl5UNamcvQFejLY/2YUtzXHYi3cHdI7PMlS8ejH2slRAOJQ32aNbA==",
"peer": true
},
"node_modules/@types/markdown-it": {
"version": "13.0.1",
"resolved": "https://registry.npmjs.org/@types/markdown-it/-/markdown-it-13.0.1.tgz",
"integrity": "sha512-SUEb8Frsxs3D5Gg9xek6i6EG6XQ5s+O+ZdQzIPESZVZw3Pv3CPQfjCJBI+RgqZd1IBeu18S0Rn600qpPnEK37w==",
"peer": true,
"dependencies": {
"@types/linkify-it": "*",
"@types/mdurl": "*"
}
},
"node_modules/@types/mdurl": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/@types/mdurl/-/mdurl-1.0.2.tgz",
"integrity": "sha512-eC4U9MlIcu2q0KQmXszyn5Akca/0jrQmwDRgpAMJai7qBWq4amIQhZyNau4VYGtCeALvW1/NtjzJJ567aZxfKA==",
"peer": true
},
"node_modules/acorn": {
"version": "7.4.1",
"resolved": "https://registry.npmjs.org/acorn/-/acorn-7.4.1.tgz",
@ -615,6 +638,15 @@
"markdown-it": "bin/markdown-it.js"
}
},
"node_modules/markdown-it-anchor": {
"version": "8.6.7",
"resolved": "https://registry.npmjs.org/markdown-it-anchor/-/markdown-it-anchor-8.6.7.tgz",
"integrity": "sha512-FlCHFwNnutLgVTflOYHPW2pPcl2AACqVzExlkGQNsi4CJgqOHN7YTgDd4LuhgN1BFO3TS0vLAruV1Td6dwWPJA==",
"peerDependencies": {
"@types/markdown-it": "*",
"markdown-it": "*"
}
},
"node_modules/markdown-it-footnote": {
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/markdown-it-footnote/-/markdown-it-footnote-3.0.3.tgz",
@ -1052,6 +1084,28 @@
"to-fast-properties": "^2.0.0"
}
},
"@types/linkify-it": {
"version": "3.0.2",
"resolved": "https://registry.npmjs.org/@types/linkify-it/-/linkify-it-3.0.2.tgz",
"integrity": "sha512-HZQYqbiFVWufzCwexrvh694SOim8z2d+xJl5UNamcvQFejLY/2YUtzXHYi3cHdI7PMlS8ejH2slRAOJQ32aNbA==",
"peer": true
},
"@types/markdown-it": {
"version": "13.0.1",
"resolved": "https://registry.npmjs.org/@types/markdown-it/-/markdown-it-13.0.1.tgz",
"integrity": "sha512-SUEb8Frsxs3D5Gg9xek6i6EG6XQ5s+O+ZdQzIPESZVZw3Pv3CPQfjCJBI+RgqZd1IBeu18S0Rn600qpPnEK37w==",
"peer": true,
"requires": {
"@types/linkify-it": "*",
"@types/mdurl": "*"
}
},
"@types/mdurl": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/@types/mdurl/-/mdurl-1.0.2.tgz",
"integrity": "sha512-eC4U9MlIcu2q0KQmXszyn5Akca/0jrQmwDRgpAMJai7qBWq4amIQhZyNau4VYGtCeALvW1/NtjzJJ567aZxfKA==",
"peer": true
},
"acorn": {
"version": "7.4.1",
"resolved": "https://registry.npmjs.org/acorn/-/acorn-7.4.1.tgz",
@ -1473,6 +1527,12 @@
}
}
},
"markdown-it-anchor": {
"version": "8.6.7",
"resolved": "https://registry.npmjs.org/markdown-it-anchor/-/markdown-it-anchor-8.6.7.tgz",
"integrity": "sha512-FlCHFwNnutLgVTflOYHPW2pPcl2AACqVzExlkGQNsi4CJgqOHN7YTgDd4LuhgN1BFO3TS0vLAruV1Td6dwWPJA==",
"requires": {}
},
"markdown-it-footnote": {
"version": "3.0.3",
"resolved": "https://registry.npmjs.org/markdown-it-footnote/-/markdown-it-footnote-3.0.3.tgz",


@ -11,6 +11,7 @@
"handlebars": "^4.7.6",
"html-minifier": "^4.0.0",
"markdown-it": "^13.0.1",
"markdown-it-anchor": "^8.6.7",
"markdown-it-footnote": "^3.0.3",
"mustache": "^4.0.1",
"nanoid": "^2.1.11",


@ -1,7 +1,22 @@
{
"name": "osmarks' website",
"domain": "osmarks.net",
"siteDescription": "osmarks is all. osmarks is everywhere. osmarks is truth. osmarks is light. osmarks is on the internet, on this website!",
"taglines": [
"I can be trusted with computational power and hyperstitious memetic warfare.",
"Wheels are turning. Wheels within wheels within wheels.",
"The Internet.",
"If you're reading this, we own your soul.",
"The future is already here - it's just not evenly distributed.",
"I don't always believe in things, but when I do, I believe in them alphabetically.",
"In which I'm very annoyed at a wide range of abstract concepts.",
"Now with handmade artisanal 1 bits!",
"What part of ∀f ∃g (f (x,y) = (g x) y) did you not understand?",
"Semi-trained quasi-professionals.",
"Proxying NVMe cloud-scale hyperlink...",
"There's nothing in the rulebook that says a golden retriever can't construct a self-intersecting non-convex regular polygon.",
"Part of the solution, not the precipitate.",
"If you can't stand the heat, get out of the server room."
],
"feeds": [
"https://blogs.sciencemag.org/pipeline/feed",
"https://www.rtl-sdr.com/feed/",
@ -11,5 +26,6 @@
"https://qntm.org/rss.php",
"https://aphyr.com/posts.atom",
"https://os.phil-opp.com/rss.xml"
]
],
"dateFormat": "YYYY-MM-DD"
}


@ -32,6 +32,9 @@ const outDir = path.join(root, "out")
const buildID = nanoid()
globalData.buildID = buildID
const randomPick = xs => xs[Math.floor(Math.random() * xs.length)]
globalData.siteDescription = randomPick(globalData.taglines)
const hexPad = x => Math.round(x).toString(16).padStart(2, "0")
function hslToRgb(h, s, l) {
var r, g, b;
@ -67,7 +70,14 @@ const hashColor = (x, s, l) => {
const removeExtension = x => x.replace(/\.[^/.]+$/, "")
const readFile = path => fsp.readFile(path, { encoding: "utf8" })
const md = new MarkdownIt({ html: true }).use(require("markdown-it-footnote"))
const anchor = require("markdown-it-anchor")
const md = new MarkdownIt({ html: true })
.use(require("markdown-it-footnote"))
.use(anchor, {
permalink: anchor.permalink["headerLink"]({
symbol: "§"
})
})
const minifyHTML = x => htmlMinifier(x, {
collapseWhitespace: true,
sortAttributes: true,
@ -127,14 +137,17 @@ const loadDir = async (dir, func) => {
const applyTemplate = async (template, input, getOutput, options = {}) => {
const page = parseFrontMatter(await readFile(input))
if (options.processMeta) { options.processMeta(page.data) }
if (options.processContent) { page.content = options.processContent(page.content) }
if (options.processMeta) { options.processMeta(page.data, page) }
if (options.processContent) { page.originalContent = page.content; page.content = options.processContent(page.content) }
const rendered = template({ ...globalData, ...page.data, content: page.content })
await fsp.writeFile(await getOutput(page), minifyHTML(rendered))
page.data.full = page
return page.data
}
const addColors = R.map(x => ({ ...x, bgcol: hashColor(x.title, 1, 0.9) }))
const addColor = x => {
x.bgcol = hashColor(x.title, 1, 0.9)
}
const addGuids = R.map(x => ({ ...x, guid: uuid.v5(`${x.lastUpdate}:${x.slug}`, "9111a1fc-59c3-46f0-9ab4-47c607a958f2") }))
const processExperiments = async () => {
@ -154,10 +167,12 @@ const processExperiments = async () => {
}))
return path.join(out, "index.html")
},
{ processMeta: meta => { meta.slug = meta.slug || basename }})
{ processMeta: meta => {
meta.slug = meta.slug || basename
addColor(meta) }})
})
console.log(chalk.yellow(`${Object.keys(experiments).length} experiments`))
globalData.experiments = addColors(R.sortBy(x => x.title, R.values(experiments)))
globalData.experiments = R.sortBy(x => x.title, R.values(experiments))
}
const processBlog = async () => {
@ -167,10 +182,14 @@ const processBlog = async () => {
const out = path.join(outDir, page.data.slug)
await fse.ensureDir(out)
return path.join(out, "index.html")
}, { processMeta: meta => { meta.slug = meta.slug || removeExtension(basename) }, processContent: renderMarkdown })
}, { processMeta: (meta, page) => {
meta.slug = meta.slug || removeExtension(basename)
meta.wordCount = page.content.split(/\s+/).map(x => x.trim()).filter(x => x).length
addColor(meta)
}, processContent: renderMarkdown })
})
console.log(chalk.yellow(`${Object.keys(blog).length} blog entries`))
globalData.blog = addGuids(addColors(R.sortBy(x => x.updated ? -x.updated.valueOf() : 0, R.values(blog))))
globalData.blog = addGuids(R.sortBy(x => x.updated ? -x.updated.valueOf() : 0, R.values(blog)))
}
const processErrorPages = () => {
@ -184,7 +203,15 @@ const processErrorPages = () => {
const outAssets = path.join(outDir, "assets")
globalData.renderDate = date => date.format("DD/MM/YYYY")
globalData.renderDate = date => date.format(globalData.dateFormat)
const metricPrefixes = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]
const applyMetricPrefix = (x, unit) => {
let log = Math.log10(x)
let exp = x !== 0 ? Math.floor(log / 3) : 0
let val = x / Math.pow(10, exp * 3)
return (exp !== 0 ? val.toFixed(3 - (log - exp * 3)) : val) + metricPrefixes[exp] + unit
}
globalData.metricPrefix = applyMetricPrefix
const writeBuildID = () => fsp.writeFile(path.join(outDir, "buildID.txt"), buildID)
const index = async () => {
@ -248,6 +275,9 @@ const copyAsset = subpath => fse.copy(path.join(assetsDir, subpath), path.join(o
const doImages = async () => {
copyAsset("images")
copyAsset("titillium-web.woff2")
copyAsset("titillium-web-semibold.woff2")
copyAsset("share-tech-mono.woff2")
globalData.images = {}
for (const image of await fse.readdir(path.join(assetsDir, "images"), { encoding: "utf-8" })) {
globalData.images[image.split(".").slice(0, -1).join(".")] = "/assets/images/" + image


@ -1,6 +1,34 @@
@font-face
font-family: 'Titillium Web'
font-style: normal
font-weight: 400
font-display: swap
src: url(/assets/titillium-web.woff2) format('woff2')
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD
@font-face
font-family: 'Titillium Web'
font-style: normal
font-weight: 600
font-display: swap
src: url(/assets/titillium-web-semibold.woff2) format('woff2')
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD
@font-face
font-family: 'Share Tech Mono'
font-style: normal
font-weight: 400
font-display: swap
src: url(/assets/share-tech-mono.woff2) format('woff2')
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD
body
margin: 0
font-family: 'Fira Sans', 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif
font-family: 'Titillium Web', 'Fira Sans', sans-serif
line-height: 1.3
pre, code, .deemph
font-family: 'Share Tech Mono', monospace
a
text-decoration: none
@ -12,6 +40,8 @@ nav
padding: 1em
margin-bottom: 0.5em
background: black
overflow-x: scroll
font-size: 1.1em
.logo
width: 1.5rem
@ -29,13 +59,16 @@ nav
a, img
margin-right: 0.5em
@for $i from 1 through 4
@for $i from 1 through 6
a:nth-child(#{$i + 1})
color: hsl(180 + ($i * 30), 100%, 80%)
color: hsl(120 + ($i * 30), 100%, 80%)
h1, h2, h3, h4, h5, h6
margin: 0
font-weight: 400
font-weight: 600
a
text-decoration: none !important
color: inherit
main, .header
margin-left: 1em
@ -57,8 +90,8 @@ main.blog-post
> div
min-width: 20em
background: #eee
margin: 0.25em
padding: 0.5em
margin: 0.5em
padding: 1em
flex: 1 1 20%
main
@ -118,3 +151,16 @@ button, select, input, textarea, .textarea
padding-right: 1em
height: 8em
width: 8em
.title
font-size: 1.1em
font-weight: 600
.caption
width: calc(100% - 2em)
background: lightgray
border: 1px solid black
padding: 1em
margin: -1px
img
width: 100%


@ -4,7 +4,7 @@ block content
main
h2 Blog
p.
Stuff I say, conveniently accessible on the internet.
Read my opinions via the internet.
div.blog
each post in posts
.imbox(style=`background: ${post.bgcol}`)
@ -13,12 +13,12 @@ block content
div
div
a.title(href=`/${post.slug}/`)= post.title
small= renderDate(post.updated)
span.deemph= `${renderDate(post.created)} / ${metricPrefix(post.wordCount, "")} words`
div.description!= post.description
h2 Experiments
p.
Various random somewhat useless web projects I have put together over many years. Made with at least four different JS frameworks.
Various web projects I have put together over many years. Made with at least four different JS frameworks. Some of them are bad.
div.experiments
each experiment in experiments
.imbox(style=`background: ${experiment.bgcol}`)
@ -30,8 +30,11 @@ block content
span.description!= experiment.description
p Get updates to the blog (not experiments) in your favourite RSS reader using the <a href="/rss.xml">RSS feed</a>.
p View some of my projects (and whatever else) at
p View some of my projects at
a(href=`https://git.${domain}/`) my git hosting.
.ring!= openring
iframe(src="https://george.gh0.pw/embed.cgi?gollark", style="border:none;width:100%;height:50px", title="Acquiesce to GEORGE.")
iframe(src="https://george.gh0.pw/embed.cgi?gollark", style="border:none;width:100%;height:50px", title="Acquiesce to GEORGE.")
block under-title
h2= name


@ -25,13 +25,23 @@ html(lang="en")
+nav-item(`https://i.${domain}/`, "Images")
+nav-item("https://github.com/osmarks/website", "Contribute")
+nav-item(`https://b.${domain}`, "Microblog")
+nav-item(`https://status.${domain}`, "Status")
+nav-item(`https://r.${domain}/login`, "Login")
block nav-items
.header
h1.page-title= title
if updated
h3= `Updated ${renderDate(updated)}`
if created
h3= `Created ${renderDate(created)}`
block under-title
h3.deemph
if updated
span= `Updated ${renderDate(updated)}`
if created || wordCount
span= " / "
if created
span= `Created ${renderDate(created)}`
if wordCount
span= " / "
if wordCount
span= `${metricPrefix(wordCount, "")} words`
if description
em.description!= description
block content