changes of some sort, new blog post

2025-05-09 10:54:08 +00:00 · 2025-01-09 21:06:14 +00:00 · d045a02a08
commit d045a02a08
parent 1c1648e0ff
11 changed files with 55 additions and 14 deletions
--- a/assets/images/bitalg.png.original
+++ b/assets/images/bitalg.png.original
--- a/assets/images/button_zeroptr.gif
+++ b/assets/images/button_zeroptr.gif
--- a/blog/about.md
+++ b/blog/about.md
@@ -12,6 +12,7 @@ Reality does not bow to expectations. Expectations merely mask what is. This is
 I'm osmarks, also known as gollark in some places.
 I enjoy bending the world to my will and creating confusing and/or funny things through applied computer science and mathematics, among other things.
 You can contact me through [email](mailto:me@osmarks.net), <span class="hoverdefn" title="My username is 'gollark'">Discord</span>, <a href="https://apionet.gh0.pw/">IRC</a> sometimes, <span class="hoverdefn" title="please note that the monitoring sampling interval is 15 seconds">Morse code transmitted via HTTP request volumes</span>, carrier pigeon, or [ActivityPub (Mastodon)](https://b.osmarks.net/).
+Friend me on [Project Euler](https://projecteuler.net/friends)! My code is `1997726_vJHwaMCvTGs6hTAsxuhvUtwuF7TMgkBK`.

 ## My work

--- a/blog/ai-accelerator.md
+++ b/blog/ai-accelerator.md
@@ -35,7 +35,7 @@ Excluding a few specialized and esoteric products like [Lightmatter](https://en.

 Better manufacturing processes make transistors smaller, faster and lower-power, with the downside that a full wafer costs more. Importantly, though, not everything scales down the same - recently, SRAM has almost entirely stopped getting smaller[^7], and analog has not scaled well for some time. Only logic is still shrinking fast.

-It's only possible to fit about 1GB of SRAM onto a die, even if you are using all the die area and the maximum single-die size. Obviously, modern models are larger than this, and it wouldn't be economical to do this anyway. The solution used by most accelerators is to use external DRAM (dynamic random access memory). This is much cheaper and more capacious, at the cost of worse bandwidth and greater power consumption. Generally this will be HBM (high-bandwidth memory, which is more expensive and integrated more closely with the logic via advanced packaging), or some GDDR/LPDDR variant.
+It's only possible to fit about 2GB of SRAM onto a die, even if you are using all the die area and the maximum single-die size. Obviously, modern models are larger than this, and it wouldn't be economical to do this anyway. The solution used by most accelerators is to use external DRAM (dynamic random access memory). This is much cheaper and more capacious, at the cost of worse bandwidth and greater power consumption. Generally this will be HBM (high-bandwidth memory, which is more expensive and integrated more closely with the logic via advanced packaging), or some GDDR/LPDDR variant.

 Another major constraint is power use, which directly contributes to running costs and cooling system complexity. Transistors being present and powered consumes power (static/leakage power) and transistors switching on and off consumes power (dynamic/switching power). The latter scales superlinearly with clock frequency, which is inconvenient, since performance scales slightly sublinearly with clock frequency. A handy Google paper[^8] (extending work from 2014[^9]), worth reading in its own right, provides rough energy estimates per operation, though without much detail about e.g. clock frequency:

--- a/blog/bitter-algebra.md
+++ b/blog/bitter-algebra.md
@@ -0,0 +1,35 @@
+---
+title: Bitter-lesson computer algebra
+description: Computer algebra systems leave lots to the user and require task-specific manual design. Can we do better?
+slug: bitalg
+created: 09/01/2025
+---
+::: epigraph attribution=Viliam link="https://www.astralcodexten.com/p/are-woo-non-responders-defective/comment/16746415"
+Who does math these days anyway? If you keep saying it, GPT-5 will learn it as a fact, and if someone asks it to design an ancestor simulation, it will include it among its rules.
+:::
+
+Computer algebra systems have been one of the great successes of symbolic AI. While the real world contains infinite detail and illegible nuance which is hard to capture explicitly, a mathematical object is no more and no less than its definition. But they're still limited: mathematicians usually have to design new theory and/or go to significant manual effort for every new class of object they want a CAS to operate on[^1], and significant manual direction is still required to complete complex computations. In machine learning, it's often been possible to [substitute search and learning for domain expertise](http://www.incompleteideas.net/IncIdeas/BitterLesson.html), with very good results. Is this applicable here?
+
+The core of computer algebra systems is a rewrite rule engine: this takes an expression and applies substitutions to it. For example, applying the rules `x * 1 = x` and `a * (b+c) = a * b + a * c` (parsed using standard operator precedence) to `a * (x + 1)` yields `a * x + a`. [Mathematica](https://reference.wolfram.com/language/guide/RulesAndPatterns.html) is the best-known example of this, [ExpReduce](https://github.com/corywalker/expreduce) is an open-source implementation, and [osmarkscalculator](/osmarkscalculator/) is a janky prototype by me. The problem with this is that equalities are two-way. Consider the rule `a^b#Num#Gte[b, 2] = a*a^(b-1)`[^3], which is how osmarkscalculator expands powers (`(x+1)^3` becomes `(x+1)*(x+1)^2` becomes `(x+1)*(x+1)*(x+1)` which expands to `1+2*x+3*x^2+x^3+x`[^2] by distributivity): this is always *true* (as long as powers are well-behaved), but it's not always *useful*. This has the effect of trying to maximally expand expressions, which is a possible canonical form but often worse to read and worse for e.g. division and solution of polynomials. Putting this rule and its inverse into the same ruleset leads to an infinite loop: this is traditionally solved by having separate rulesets for each direction with separate operations to apply them (e.g. separate `Factor` and `Expand` operations), but this offloads problems to the user and means that every other operation has to explicitly decide on its own canonical forms and convert inputs into those. In principle, we can fix this by building a "make this look like this" operation which tries to apply transformations in either direction to make something fit a pattern, converting this to "merely" a search problem.
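As a rough illustration of the rewrite-rule engine described above, here is a minimal sketch in Python. It is not how Mathematica, ExpReduce or osmarkscalculator are implemented: the tuple representation, the `?`-prefixed pattern variables and the names `match`, `substitute`, `rewrite_once` and `normalize` are all assumptions made for this example. Rules are applied in one direction only, with a step limit as a crude guard against non-terminating rulesets.

```python
# Expressions are nested tuples such as ("*", "a", ("+", "x", 1)).
# Strings starting with "?" are pattern variables (a convention assumed here).

def match(pattern, expr, bindings):
    """Return bindings extended so that pattern matches expr, or None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if (isinstance(pattern, tuple) and isinstance(expr, tuple)
            and len(pattern) == len(expr)):
        for p, e in zip(pattern, expr):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None


def substitute(template, bindings):
    """Fill pattern variables in template with their bound values."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return template


RULES = [
    # x * 1 = x
    (("*", "?x", 1), "?x"),
    # a * (b + c) = a * b + a * c
    (("*", "?a", ("+", "?b", "?c")),
     ("+", ("*", "?a", "?b"), ("*", "?a", "?c"))),
]


def rewrite_once(expr, rules):
    """Apply the first matching rule at the root, else in the first child
    where one applies. Returns (new_expr, changed)."""
    for lhs, rhs in rules:
        bindings = match(lhs, expr, {})
        if bindings is not None:
            return substitute(rhs, bindings), True
    if isinstance(expr, tuple):
        for i, child in enumerate(expr):
            new_child, changed = rewrite_once(child, rules)
            if changed:
                return expr[:i] + (new_child,) + expr[i + 1:], True
    return expr, False


def normalize(expr, rules, max_steps=1000):
    """Rewrite until no rule applies; give up after max_steps."""
    for _ in range(max_steps):
        expr, changed = rewrite_once(expr, rules)
        if not changed:
            return expr
    raise RuntimeError("step limit reached; ruleset may not terminate")


# a * (x + 1)  ->  a * x + a
print(normalize(("*", "a", ("+", "x", 1)), RULES))
```

Adding the inverse of the distributivity rule to `RULES` would make `normalize` hit its step limit, which is the infinite-loop problem the paragraph describes.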
+
+The search problem is quite hard, however. Efficient search relies on a distance heuristic: [A* search](https://en.wikipedia.org/wiki/A*_search_algorithm) is effective for problems like pathfinding over grids because Euclidean distance lower-bounds the number of tiles stepped through, and fast [approximate nearest neighbour](https://en.wikipedia.org/wiki/Nearest_neighbor_search) algorithms rely on (roughly) the transitivity of "closeness". Available theory doesn't provide a natural, formal heuristic for expression distance: various [string edit distances](https://en.wikipedia.org/wiki/Levenshtein_distance) are available, but they aren't really valid as they're unaware of what rewrite rules exist, and directly computing an edit distance based on general rewrite rules is essentially the same as the search problem. Without this, we're limited to brute force.
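To make "limited to brute force" concrete, here is a small sketch of heuristic-free search over two-way rewrites. As a simplifying assumption it works on plain strings rather than expression trees, and the function names, example equations and size limits are all hypothetical. Each equation can be applied in either direction at any position, and breadth-first search looks for the shortest chain of rewrites from one string to another.

```python
from collections import deque


def neighbours(s, equations):
    """All strings reachable from s by applying one equation, in either
    direction, at any position."""
    out = set()
    for lhs, rhs in equations:
        for a, b in ((lhs, rhs), (rhs, lhs)):
            start = s.find(a)
            while start != -1:
                out.add(s[:start] + b + s[start + len(a):])
                start = s.find(a, start + 1)
    return out


def find_rewrite_path(source, target, equations,
                      max_len=12, max_states=100_000):
    """Breadth-first search with no heuristic. Returns the shortest list of
    strings leading from source to target, or None if none is found within
    the length and state limits."""
    queue = deque([source])
    parent = {source: None}
    while queue and len(parent) <= max_states:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in neighbours(current, equations):
            if nxt not in parent and len(nxt) <= max_len:
                parent[nxt] = current
                queue.append(nxt)
    return None


# Hypothetical equations: letters commute, and repeated letters collapse.
EQUATIONS = [("ab", "ba"), ("aa", "a"), ("bb", "b")]
print(find_rewrite_path("abab", "ab", EQUATIONS))
```

The number of visited states grows quickly with `max_len`, which is why a learned or otherwise informed distance estimate would matter for anything larger than toy inputs.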
+
+There is a well-developed [theory](https://en.wikipedia.org/wiki/Confluence_(abstract_rewriting)) of rewrite systems, which has produced the [Knuth-Bendix](https://en.wikipedia.org/wiki/Knuth%E2%80%93Bendix_completion_algorithm) algorithm, which is able to turn a set of equations (bidirectional rewrite rules) into a set of one-directional rewrite rules which produce the same result applied in any order. This is, however, a *semi-decision* algorithm: this is not always possible, and where this is the case the algorithm will never halt. This line of research [has also shown](https://en.wikipedia.org/wiki/Word_problem_(mathematics)) that for some possible rewrite systems, it is not even possible to tell whether two expressions are equivalent (reachable via rewrites). This doesn't imply that every useful rewrite system is problematic, but we [do know](https://en.wikipedia.org/wiki/Richardson%27s_theorem) that fairly simple expressions over real numbers are broken, and halting-problem-like behaviour has generally ruined our fun in all reasonably powerful systems.
+
+If a natural heuristic isn't available, we must simply create one, by using deep learning to approximate the functions we wish we could have. The approach used to significant success in [AlphaProof](https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/) is, according to the scant public descriptions, based on [AlphaZero](https://arxiv.org/abs/1712.01815). It works over formal proofs (in [Lean](https://lean-lang.org/)) rather than expression rewrites, and bootstraps a policy neural net (which generates a distribution over next steps) and value neural net (which rates probability of success) by using <span class="hoverdefn" title="Monte Carlo Tree Search">MCTS</span> to "amplify" the policy. This was, apparently, quite computationally expensive, and is perhaps not very practical for cheap home use. However, our potential design space is broader than this.
+
+Lean is a complex and heavy programming language, so executing a Lean program is expensive compared to neural net forward passes, and AlphaProof has to use a relatively large neural net to match up inference time and proof validation time[^4]. This may not be true of a well-written symbolic expression manipulator. There's a clever and little-known program called [RIES](https://www.mrob.com/pub/ries/index.html), which finds equations given their solutions (the opposite of a normal numerical evaluator). Using caching (roughly) and a heavily optimized expression generator/evaluator which essentially brute-forces short [RPN](https://en.wikipedia.org/wiki/Reverse_Polish_notation) programs sorted by complexity, it can generate and evaluate several billion expressions a second (and differentiate top candidates). While a flexible pattern matcher would have to be slower, it is likely possible to outperform a standard CAS with explicit design for performance and externalization of ease-of-use features. Larger-scale search with a worse neural-net heuristic trades expensive GPU time for cheaper CPU time and might still be competitive, as Stockfish is in chess versus more heavily neural-net-driven engines (though this is down to properties of chess's search trees and is not, to my knowledge, true in e.g. Go).
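The idea of brute-forcing short RPN programs can be sketched in a few lines of Python. This is only a naive enumeration in the spirit of the description above, not RIES's actual algorithm: there is no caching, no complexity ordering beyond program length, and the token set and function names are assumptions for the example. It is also many orders of magnitude slower than the figures quoted for RIES.

```python
import itertools
import math

# Hypothetical token set for the sketch.
CONSTANTS = {"1": 1.0, "2": 2.0, "3": 3.0, "pi": math.pi, "e": math.e}
UNARY = {"sqrt": math.sqrt}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def eval_rpn(tokens):
    """Evaluate an RPN token sequence; return None if it is malformed
    or hits a domain error."""
    stack = []
    try:
        for t in tokens:
            if t in CONSTANTS:
                stack.append(CONSTANTS[t])
            elif t in UNARY:
                stack.append(UNARY[t](stack.pop()))
            else:
                b = stack.pop()
                a = stack.pop()
                stack.append(BINARY[t](a, b))
    except (IndexError, ZeroDivisionError, ValueError, OverflowError):
        return None
    return stack[0] if len(stack) == 1 else None


def closest_expression(target, max_tokens=5):
    """Try every token sequence up to max_tokens long, shortest first, and
    return (error, program, value) for the closest valid one."""
    vocab = list(CONSTANTS) + list(UNARY) + list(BINARY)
    best = None
    for n in range(1, max_tokens + 1):
        for program in itertools.product(vocab, repeat=n):
            value = eval_rpn(program)
            if value is None:
                continue
            error = abs(value - target)
            if best is None or error < best[0]:
                best = (error, program, value)
    return best


error, program, value = closest_expression(math.pi + 1)
print(" ".join(program), value, error)
```

Because shorter programs are tried first and ties do not replace the current best, the result is biased towards simpler expressions, a crude stand-in for sorting by complexity.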
+
+There has also been some work in the other direction - more neural net use and less search. [Deep Learning for Symbolic Mathematics](https://arxiv.org/abs/1912.01412) trains standard sequence-to-sequence models to do integration and solve ODEs with apparently good results (compared with traditional computer algebra systems). However, it relies on external methods to generate good training datasets (e.g. for integrals, differentiating random expressions and integration by parts). [PatternBoost](https://arxiv.org/abs/2411.00566) constructs extremal objects by repeatedly retraining an autoregressive model on the best of its outputs augmented with greedy search, and was good enough to solve an open problem despite little compute, somewhat cursed tokenization, and treating the problem as unconditionally sampling from "reasonably good" examples. Some [recent work in reinforcement learning](https://arxiv.org/abs/2408.05804) has also demonstrated directed exploration emerging from a simple contrastive algorithm which could be useful - PatternBoost-style construction could be reformulated as an RL problem amenable to this[^5], and a contrastive goal expression/current-expression-and-rule encoder is almost exactly the necessary distance heuristic I mentioned earlier.
+
+Despite these possible applications, and many successes in research, nothing like this has made it into generally usable tools, which are either based on purely symbolic heuristics or general-purpose LLMs trained slightly for this. It should be possible to build better general-purpose symbolic manipulation primitives, closer to the ideal of a system which only needs to be given definitions to work, and to do better in specific tasks like integration. I intend to work on this at some point, but am currently busy with other projects. Please tell me if you're interested in any of this!
+
+[^1]: For instance, the best system I'm aware of for integration is [Rubi](https://rulebasedintegration.org/), which contains 6700 handcrafted rules. Mathematica contains [hundreds of pages](https://reference.wolfram.com/language/tutorial/SomeNotesOnInternalImplementation.html) of code for many features. Adding support for something like generating function manipulation or computing limits or unnesting nth roots to CASes is often enough to warrant a PhD thesis.
+
+[^2]: Yes, this isn't actually the fully expanded expression. osmarkscalculator has some problems.
+
+[^3]: `#Num#Gte[b, 2]` is a bit of a hack to stop it infinitely recursing into negative powers.
+
+[^4]: I am not certain of this because Google doesn't document it, but a similar paper by [Facebook](https://arxiv.org/abs/2205.11491) uses 48 CPU cores per A100 GPU with bigger models than are usually used for MCTS, and I assume Google has *some* constraints.
+
+[^5]: There are some plausible technical issues like usefully defining the goal state encoder when your only knowledge of the goal is that it should have a bigger value by some fitness metric, but I think these are surmountable.
--- a/blog/minecraft-power-creep.md
+++ b/blog/minecraft-power-creep.md
@@ -18,7 +18,7 @@ A very small BuildCraft pipe system.
 Organizing items: previously a hard enough problem that [blog posts](https://gamegenus.blogspot.com/2012/01/minecraft-sorting-and-smelting-system.html) were written on optimizing it.
 :::

-Somewhat later, [Feed The Beast](https://www.feed-the-beast.com/modpacks/76-ftb-ultimate?tab=mods) (1.4.7) brought, among *many* other things, Applied Energistics. [Applied Energistics](https://web.archive.org/web/20130301012622/http://ae-mod.info/ME-Storage-and-Automation/) replaced the complexity of pipe-based sorting and storage systems with a trivially extensible cabled network which also handled storage and on-demand autocrafting. This means that rather than requiring buildout of dedicated automation systems for every item and making everything else by hand, a much smaller set of machines could be shared between many recipes and only be invoked when needed[^1].
+Somewhat later, [Feed The Beast Ultimate](https://www.feed-the-beast.com/modpacks/76-ftb-ultimate?tab=mods) (1.4.7) brought, among *many* other things, Applied Energistics. [Applied Energistics](https://web.archive.org/web/20130301012622/http://ae-mod.info/ME-Storage-and-Automation/) replaced the complexity of pipe-based sorting and storage systems with a trivially extensible cabled network which also handled storage and on-demand autocrafting. This means that rather than requiring buildout of dedicated automation systems for every item and making everything else by hand, a much smaller set of machines could be shared between many recipes and only be invoked when needed[^1].

 Applied Energistics' developer seemingly considered this a mistake, because around the time of [1.7.10](https://www.feed-the-beast.com/modpacks/23-ftb-infinity-evolved-17?tab=mods), the mod was overhauled with "channels", which encourage more thoroughly planned and complex network designs, as well as moving away from standard "have ores in the ground" world generation to include rare meteors containing "alien processor presses" necessary to start. The channel system also encouraged creative subnetwork design for certain problems and the mod incentivized organizing storage in certain ways which nobody ever did[^2] to optimize space use. Slightly earlier, Thermal Expansion introduced its "itemducts", essentially a simplified version of the RedPower pneumatic tubes which didn't need external support equipment, and [Ender IO](https://www.curseforge.com/minecraft/mc-mods/ender-io) has roughly contemporaneous conduits which simplify routing even further by fitting multiple pipes into a single block.

@@ -64,7 +64,7 @@ One of these machines obsoletes a vast quantity of smaller generators and suppor
 As a reaction to perceived power creep, [Immersive Engineering](https://www.curseforge.com/minecraft/mc-mods/immersive-engineering) was released for 1.7.10 too; it has a slightly more complex RF-based power system and, to counter the "magic blocks" phenomenon, uses multiblocks instead. I don't consider this a good solution, as it really only alters the aesthetics and increases resource costs.

 ::: captioned src="/assets/images/immersive_engineering_biodiesel.webp"
-This biodiesel plant is bigger than other mods might make it, but it has about five distinct machines in it.
+This biodiesel plant is bigger than other mods might make it, but it has about five distinct machines in it. The use of multiblocks adds "fake complexity": the player doesn't gain any new decisions to make, since the machines can only be assembled in one way and aren't made of multipurpose individual parts.
 :::

 While there have been numerous incremental changes since then - primarily in the direction of bigger numbers, utility mods adding more special-purpose machines to automate very specific tasks and fragmentation into smaller mods - the only big change I am aware of has been [Create](https://www.curseforge.com/minecraft/mc-mods/create), a very much ground-up tech mod in wide use since 1.14 which aims to make automation involve more world interaction and (literal) moving parts.
--- a/blog/ml-workstation.md
+++ b/blog/ml-workstation.md
@@ -125,7 +125,7 @@ They describe somewhat horrifying electrical engineering problems due to using s

 [^2]: High-performance compute hardware is still not cheap in an absolute sense, and for infrequent loads you are likely better off with [cloud services](https://vast.ai/).

-[^3]: I'm told it works fine on their latest datacentre cards. You are not getting those. You aren't even renting those, for some reason.
+[^3]: I'm told it works ~~fine~~ slightly better on their latest datacentre cards. You are not getting those. You aren't even renting those, for some reason.

 [^4]: Intel's is arguably better on consumer hardware than datacentre, as their datacentre hardware doesn't work.

--- a/package-lock.json
+++ b/package-lock.json
@@ -1714,9 +1714,9 @@
      "integrity": "sha512-gKLcREMhtuZRwRAfqP3RFW+TK4JqApVBtOIftVgjuABpAtpxhPGaDcfvbhNvD0B8iD1oUr/txX35NjcaY6Ns/A=="
    },
    "node_modules/msgpackr": {
-      "version": "1.11.0",
-      "resolved": "https://registry.npmjs.org/msgpackr/-/msgpackr-1.11.0.tgz",
-      "integrity": "sha512-I8qXuuALqJe5laEBYoFykChhSXLikZmUhccjGsPuSJ/7uPip2TJ7lwdIQwWSAi0jGZDXv4WOP8Qg65QZRuXxXw==",
+      "version": "1.11.2",
+      "resolved": "https://registry.npmjs.org/msgpackr/-/msgpackr-1.11.2.tgz",
+      "integrity": "sha512-F9UngXRlPyWCDEASDpTf6c9uNhGPTqnTeLVt7bN+bU1eajoR/8V9ys2BRaV5C/e5ihE6sJ9uPIKaYt6bFuO32g==",
      "license": "MIT",
      "optionalDependencies": {
        "msgpackr-extract": "^3.0.2"
@@ -3569,9 +3569,9 @@
      "integrity": "sha512-gKLcREMhtuZRwRAfqP3RFW+TK4JqApVBtOIftVgjuABpAtpxhPGaDcfvbhNvD0B8iD1oUr/txX35NjcaY6Ns/A=="
    },
    "msgpackr": {
-      "version": "1.11.0",
-      "resolved": "https://registry.npmjs.org/msgpackr/-/msgpackr-1.11.0.tgz",
-      "integrity": "sha512-I8qXuuALqJe5laEBYoFykChhSXLikZmUhccjGsPuSJ/7uPip2TJ7lwdIQwWSAi0jGZDXv4WOP8Qg65QZRuXxXw==",
+      "version": "1.11.2",
+      "resolved": "https://registry.npmjs.org/msgpackr/-/msgpackr-1.11.2.tgz",
+      "integrity": "sha512-F9UngXRlPyWCDEASDpTf6c9uNhGPTqnTeLVt7bN+bU1eajoR/8V9ys2BRaV5C/e5ihE6sJ9uPIKaYt6bFuO32g==",
      "requires": {
        "msgpackr-extract": "^3.0.2"
      }
--- a/src/global.json
+++ b/src/global.json
@@ -50,8 +50,8 @@
        "Cameron Harwick": "https://cameronharwick.com/feed/",
        "Money Stuff": "https://www.bloomberg.com/opinion/authors/ARbTQlRLRjE/matthew-s-levine.rss",
        "The Worlds of John Bierce": "https://johnbierce.com/blog/feed/",
-        "Dominic Cummings": "https://dominiccummings.substack.com/feed"
-
+        "Dominic Cummings": "https://dominiccummings.substack.com/feed",
+        "citrons": "https://citrons.xyz/a/journal/rss.xml"
    },
    "dateFormat": "YYYY-MM-DD",
    "microblogSource": "https://b.osmarks.net/outbox",
@@ -82,7 +82,8 @@
        ["rss.png", "/rss.xml"],
        ["bee.png", "https://citrons.xyz/a/memetic-apioform-page.html"],
        ["perceptron.png", "https://en.wikipedia.org/wiki/Perceptron"],
-        ["rhombic_dodecahedron.gif", "https://en.wikipedia.org/wiki/Rhombic_dodecahedron"]
+        ["rhombic_dodecahedron.gif", "https://en.wikipedia.org/wiki/Rhombic_dodecahedron"],
+        ["zeroptr.gif", "https://zptr.cc/88x31/"]
    ],
    "mycorrhiza": "https://docs.osmarks.net"
 }
--- a/src/style.sass
+++ b/src/style.sass
@@ -85,6 +85,9 @@ nav
    a, img, picture
        margin-right: 0.5em

+    a
+        margin-left: 0.5em
+
 h1, h2, h3, h4, h5, h6
    margin: 0
    font-weight: 600
--- a/templates/layout.pug
+++ b/templates/layout.pug
@@ -31,6 +31,7 @@ html(lang="en")
        meta(content=`https://${domain}/assets/images/logo256.png`, property="og:image")
        if katex
            link(rel="stylesheet", href="/assets/katex.min.css")
+        link(rel="alternate", type="application/rss+xml", title="RSS", href="/rss.xml")
        style!= css
        if comments !== "off"
            script(src=`https://${domain}/rsapi/static/comments.js`, async=true)
@@ -45,7 +46,7 @@ html(lang="en")
                +nav-item(`https://mse.${domain}/`, "Meme Search", "#5AF25A")
                +nav-item(`https://docs.${domain}/random`, "Documentation", "#F2A65A")
                +nav-item(`https://status.${domain}`, "Status", "#EEDC5B")
-                +nav-item(`https://r.${domain}/login`, "Login", "#12E193")
+                +nav-item(`#`, "Search", "#12E193")
                block nav-items
        .sidenotes-container
            main(class=!haveSidenotes ? "fullwidth" : "")