Mirror of https://github.com/osmarks/website, synced 2025-06-30 17:12:52 +00:00

Commit 1b8255118d (parent 0413b869e2): tweaks of some sort
@@ -37,7 +37,7 @@ This has the additional advantage of producing fewer sentence embedding vectors,

This has now been replaced again with [ModernBERT-Embed-Large](https://huggingface.co/lightonai/modernbert-embed-large) for the greater context length, maybe better runtime and better retrieval performance. The long context leads to some VRAM issues with large batches, which I have not yet been able to resolve cleanly.

-Both models use a prefix to indicate whether an input is a query or a passage to match against, but the newer one seems to be more sensitive to them (or it could simply be the longer inputs), so I've also split columns into "short" and "long" to determine whether this prefixing mechanism is used for queries or not - without this, short passages are privileged, especially ones containing, for some ridiculous reason[^5], the literal text `passage`. This has its own problems, so I might need an alternative solution.
+The model uses a prefix to indicate whether an input is a query or a passage to match against, so I've also split columns into "short" and "long" to determine whether this prefixing mechanism is used for queries or not ~~- without this, short passages are privileged, especially ones containing, for some ridiculous reason[^5], the literal text `passage`. This has its own problems, so I might need an alternative solution.~~ ModernBERT-Embed-Large handles this better, so this mechanism is now legacy unless I also add a non-semantic index.

## The quantitative data is not all that helpful

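As a rough illustration of the prefixing mechanism discussed above (not part of this commit): retrieval-tuned embedding models of this family expect each input to be marked as a query or a passage before encoding. The `search_query: `/`search_document: ` strings below follow the convention documented for the ModernBERT-Embed models, and `embed` is a hypothetical batch-embedding backend; treat both as illustrative assumptions rather than the site's actual code.

```javascript
// Sketch only: prefix strings follow the ModernBERT-Embed/Nomic convention and
// should be checked against the model card; `embed` is a hypothetical backend
// mapping an array of strings to an array of embedding vectors.
const PREFIX_QUERY = "search_query: "
const PREFIX_PASSAGE = "search_document: "

const embedTexts = (texts, { asQuery }, embed) =>
    embed(texts.map(t => (asQuery ? PREFIX_QUERY : PREFIX_PASSAGE) + t))

// Usage: queries get the query prefix, indexed passages get the document prefix.
// const queryVecs = await embedTexts(["how do quadcopters work"], { asQuery: true }, embed)
// const passageVecs = await embedTexts(longPassages, { asQuery: false }, embed)
```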
@@ -288,6 +288,7 @@ The author, Zachary Mason, also wrote [The Lost Books of the Odyssey](https://ww
* [The Finale of the Ultimate Meta Mega Crossover](https://m.fanfiction.net/s/5389450/1/).
* [Shannon's Law](https://web.archive.org/web/20140628065855/http://www.tor.com/stories/2011/05/shannons-law).
* [Dave Scum](https://docs.google.com/document/d/1SddGHeVfcVa5SCDHHTOA4RlKwnef-Q6IMw_Jqw9I0Mw/mobilebasic).
+* [Nate the Snake](https://natethesnake.com/) is a complex setup for a pun.
* More to be added when I feel like it.

## Games

@@ -3,7 +3,6 @@ title: Against some assumed limits on superintelligence
description: The TAM for God is very large.
created: 02/03/2025
slug: asi
-draft: yes
---
::: epigraph attribution="Void Star" link=/otherstuff/#void-star
It’s not a trick. You’ll die if you go on, but it’s not a trick. I know something of your nature. Do you really want to go back to the decay of your biology and days like an endless reshuffling of a fixed set of forms? What the world has, you’ve seen. This is the only other way.
@@ -36,15 +35,15 @@ As I address in the next section, capabilities can frequently go far beyond huma

In open-ended adversarial and/or somewhat winner-takes-all domains like trading, there is no upper limit, only a succession of ever-escalating counterstrategies. You could reasonably argue that it wouldn't mean much for society as a whole if all humans were moved out of these, though. More compelling are situations where relatively minor-sounding improvements make something work as a practical product rather than a research curiosity, like [image recognition](https://paperswithcode.com/sota/image-classification-on-imagenet). The relatively minor-sounding 10-percentage-point improvement in ImageNet accuracy from 2015 to 2022 masks massive production deployment as image recognizers became smarter, cheaper, more robust and more practical. Protein modelling tools have crept up in capability over the last few years and now we can [design proteins](https://www.nature.com/articles/s41586-024-08393-x) tailored to particular tasks.

-This is most relevant when several areas advance at once, as you would expect from superintelligence. Quadcopter drones used to be impractical and/or expensive, but in the 2000s, mildly better batteries, cheaper MEMS IMUs (possibly because of smartphones) and smaller and better control/driver electronics (such as [specialized low-latency brushless motor drivers](/assets/misc/fascination_quadcopter.pdf#page=25)) brought them to hobbyists and then the general commercial market.
+This is most relevant when several areas advance at once, as you would expect from superintelligence. Quadcopter drones used to be impractical and/or expensive, but in the 2000s, mildly better batteries, cheaper <span class="hoverdefn" title="microelectromechanical systems">MEMS</span> sensors (possibly because of smartphones) and smaller and better control/driver electronics (such as [specialized low-latency brushless motor drivers](/assets/misc/fascination_quadcopter.pdf#page=25)) brought them to hobbyists and then the general commercial market.

-But focus on concrete tasks I can think of myself is rather missing the point. Doing things humans are currently doing but somewhat better is a waste of superintelligence. GPT-3 [was exciting](https://arxiv.org/abs/2005.14165) not because it pushed benchmark scores slightly further, and perplexity slightly lower, than GPT-2, but because it had the previously unconsidered ability to learn new tasks in context. The most important technological developments have been unexpected step changes opening up entirely new fields rather than "faster horses" incremental changes, even if they can be described in retrospect that way. The smartest humans are able to usefully reframe problems as easier ones rather than brute-force-execute a more obvious solution, and I expect this to continue.
+But focus on concrete tasks I can think of myself is rather missing the point. Doing things humans are currently doing but somewhat better is a waste of superintelligence. GPT-3 [was exciting](https://arxiv.org/abs/2005.14165) not because it pushed benchmark scores slightly further, and perplexity slightly lower, than GPT-2, but because it had the previously unconsidered ability to learn new tasks in context. The most important technological developments have been unexpected step changes opening up entirely new fields rather than "faster horses" incremental changes, even if they can be described in retrospect that way. The smartest humans are able to usefully reframe problems as easier ones rather than brute-force-execute a more obvious solution, and I expect this to continue. Intelligence grows more powerful in more open-ended domains as more options become available for exploration.

## Humans are not nearly optimal

Due to limited working memory and the necessity of distributing subtasks in an organization, humans design and model systems based on abstraction - rounding off low-level detail to produce a homogeneous overview with fewer free parameters. [Seeing Like a State](https://en.wikipedia.org/wiki/Seeing_Like_a_State)[^1] describes how this has gone wrong historically - states, wanting the world to be easier to manage, bulldoze fine-tuned local knowledge and install simple rules and neat rectangles which produce worse outcomes. I think this case is somewhat overstated, because abstraction does often work better than the alternatives. People can't simultaneously attend to the high-level requirements of their problem and every low-level point, so myopic focus on the low-level detracts from the overall quality of the result[^2] - given the limitations of humans.

-Abstraction amortises intellect, taking good solutions to simpler and more general problems and applying them on any close-enough substrate. This has brought us many successes like industrial farming, digital computers and assembly lines. But an end-to-end design not as concerned with modularity and legibility will usually outperform one based on generalities, if you can afford the intellectual labour, through better addressing cross-cutting concerns, precise tailoring to small quirks and making simplifications across layers of the stack. [This book](https://www.construction-physics.com/p/book-review-building-an-affordable) describes some object-level examples in house construction.
+Abstraction amortises intellect, taking good solutions to simpler and more general problems and applying them on any close-enough substrate. This has brought us many successes like industrial farming, digital computers and assembly lines. But an end-to-end design not as concerned with modularity and legibility will usually outperform one based on generalities, if you can afford the intellectual labour, through better addressing cross-cutting concerns, precise tailoring to small quirks and making simplifications across layers of the stack. Due to organizational issues, the cost of human intelligence, and working memory limitations, this frequently doesn't happen. [This book](https://www.construction-physics.com/p/book-review-building-an-affordable) describes some object-level examples in house construction.

We see the abstractions still even when they have gaps, and this is usually a security threat. A hacker doesn't care that you think your code "parses XML" or "checks authentication" - they care about [what you actually wrote down](https://gwern.net/unseeing), and what the computer will do with it[^3], which is quite possibly [not what you intended](https://blog.siguza.net/psychicpaper/). Your nice "secure" cryptographic code is [running on hardware](http://wiki.newae.com/Correlation_Power_Analysis) which reveals correlates of what it's doing. Your "air-gapped" computer is able to emit [sounds](https://arxiv.org/abs/2409.04930v1) and [radio signals](https://arxiv.org/abs/2207.07413) and [is connected to power cables](https://pushstack.wordpress.com/2017/07/24/data-exfiltration-from-air-gapped-systems-using-power-line-communication/). A "blank wall" [leaks information](https://www.cs.princeton.edu/~fheide/steadystatenlos) through diffuse reflections. Commodity "communication" hardware can [sense people](https://www.usenix.org/system/files/nsdi24-yi.pdf), because the signals travel through the same physical medium as everything else. Strange side channels are everywhere and systematically underestimated. These are the examples we *have* found, but new security vulnerabilities are detected continually and I am confident that essentially all complex software is hopelessly broken in at least one way.

@@ -70,6 +69,8 @@ Training against simulation environments is also sometimes effective. As far as

We also have very large amounts of experimental and observational data which could be reanalyzed with better methods. For instance, the [AlphaFold](https://pmc.ncbi.nlm.nih.gov/articles/PMC8371605/) models were trained on long-available protein datasets (including known amino acid sequences without known structures), but significantly outperformed every other solution at novel structure prediction through application of smarter deep learning. This did *not* require extremely expensive first-principles simulation.

+If real-world experimentation really is necessary to gather data somewhere despite this, it can be done at higher speed than the processes of academia and industry usually manage, with an eye towards gathering specifically the information needed to improve simulation rather than to write a paper ([apparently](http://wavefunction.fieldofscience.com/2011/12/why-drug-design-is-like-airplane-design.html) not very well-rewarded in biology), and using large amounts of parallelism backed by enough mental power to analyze the results usefully (for example, using high-throughput screening technologies in chemistry, or thousands of centrally controlled robots to learn motor control).
+
Finally, human development of theory and explanations sometimes precedes or postdates data significantly. The most obvious examples of this are in mathematics - the concepts of [group theory](https://en.wikipedia.org/wiki/Group_theory) were accessible through all of human history and did not require any special world knowledge, but weren't invented until the 1800s. Classical mechanics (as opposed to blatantly wrong Aristotelian physics) could probably have been invented millennia earlier as the simplest explanation for projectile trajectories and falling masses. General relativity was invented based on thought experiments and notions of elegance before it could be robustly tested, based on maths available significantly beforehand. Maxwell's equations predated their empirical tests by 30 years, and semiconductor switching devices were theorized 15 years before they were built. This does not happen as much in biology and chemistry, which are less theory-driven, but the periodic table made advance predictions of chemical element properties and the [triplet code](https://en.wikipedia.org/wiki/Genetic_code#History) structure of DNA was derived before it could be directly checked.

## Structural advantages

package-lock.json (generated, 41 lines changed)
@@ -37,6 +37,7 @@
"msgpackr": "^1.11.0",
"mustache": "^4.0.1",
"nanoid": "^2.1.11",
+"p-limit": "^6.2.0",
"pako": "^2.1.0",
"porter2": "^1.0.1",
"pug": "^3.0.2",
@@ -2453,6 +2454,21 @@
"url": "https://github.com/sponsors/sindresorhus"
}
},
+"node_modules/p-limit": {
+"version": "6.2.0",
+"resolved": "https://registry.npmjs.org/p-limit/-/p-limit-6.2.0.tgz",
+"integrity": "sha512-kuUqqHNUqoIWp/c467RI4X6mmyuojY5jGutNU0wVTmEOOfcuwLqyMVoAi9MKi2Ak+5i9+nhmrK4ufZE8069kHA==",
+"license": "MIT",
+"dependencies": {
+"yocto-queue": "^1.1.1"
+},
+"engines": {
+"node": ">=18"
+},
+"funding": {
+"url": "https://github.com/sponsors/sindresorhus"
+}
+},
"node_modules/pako": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/pako/-/pako-2.1.0.tgz",
@@ -3364,6 +3380,18 @@
"heap": "^0.2.7"
}
},
+"node_modules/yocto-queue": {
+"version": "1.1.1",
+"resolved": "https://registry.npmjs.org/yocto-queue/-/yocto-queue-1.1.1.tgz",
+"integrity": "sha512-b4JR1PFR10y1mKjhHY9LaGo6tmrgjit7hxVIeAmyMw3jegXR4dhYqLaQF5zMXZxY7tLpMyJeLjr1C4rLmkVe8g==",
+"license": "MIT",
+"engines": {
+"node": ">=12.20"
+},
+"funding": {
+"url": "https://github.com/sponsors/sindresorhus"
+}
+},
"node_modules/yoga-wasm-web": {
"version": "0.3.3",
"resolved": "https://registry.npmjs.org/yoga-wasm-web/-/yoga-wasm-web-0.3.3.tgz",
@@ -4862,6 +4890,14 @@
"is-wsl": "^3.1.0"
}
},
+"p-limit": {
+"version": "6.2.0",
+"resolved": "https://registry.npmjs.org/p-limit/-/p-limit-6.2.0.tgz",
+"integrity": "sha512-kuUqqHNUqoIWp/c467RI4X6mmyuojY5jGutNU0wVTmEOOfcuwLqyMVoAi9MKi2Ak+5i9+nhmrK4ufZE8069kHA==",
+"requires": {
+"yocto-queue": "^1.1.1"
+}
+},
"pako": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/pako/-/pako-2.1.0.tgz",
@@ -5507,6 +5543,11 @@
"heap": "^0.2.7"
}
},
+"yocto-queue": {
+"version": "1.1.1",
+"resolved": "https://registry.npmjs.org/yocto-queue/-/yocto-queue-1.1.1.tgz",
+"integrity": "sha512-b4JR1PFR10y1mKjhHY9LaGo6tmrgjit7hxVIeAmyMw3jegXR4dhYqLaQF5zMXZxY7tLpMyJeLjr1C4rLmkVe8g=="
+},
"yoga-wasm-web": {
"version": "0.3.3",
"resolved": "https://registry.npmjs.org/yoga-wasm-web/-/yoga-wasm-web-0.3.3.tgz",
@@ -32,6 +32,7 @@
"msgpackr": "^1.11.0",
"mustache": "^4.0.1",
"nanoid": "^2.1.11",
+"p-limit": "^6.2.0",
"pako": "^2.1.0",
"porter2": "^1.0.1",
"pug": "^3.0.2",
src/index.js (29 lines changed)
@@ -27,6 +27,8 @@ const cssSelect = require("css-select")
const domSerializer = require("dom-serializer")
const domutils = require("domutils")
const feedExtractor = require("@extractus/feed-extractor")
+const https = require("https")
+const pLimit = require("p-limit")

const fts = require("./fts.mjs")

@@ -42,6 +44,17 @@ const outDir = path.join(root, "out")
const srcDir = path.join(root, "src")
const nodeModules = path.join(root, "node_modules")

+const axiosInst = axios.create({
+    timeout: 10000,
+    headers: { "User-Agent": "osmarks.net static site compiler" },
+    httpsAgent: new https.Agent({
+        keepAlive: true,
+        timeout: 10000,
+        scheduling: "fifo",
+        maxTotalSockets: 20
+    })
+})
+
const buildID = nanoid()
globalData.buildID = buildID

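A brief note on the shared client added above (not annotated in the commit itself): routing requests through one axios instance means they reuse a single keep-alive HTTPS agent capped at 20 total sockets and inherit the 10-second timeout and User-Agent header, so the later hunks can swap `axios(...)` for `axiosInst(...)` without repeating that configuration. A rough usage sketch, with `someUrl` as a placeholder:

```javascript
// Requests through axiosInst share the keep-alive agent and the default
// timeout/headers; someUrl is a placeholder, not a value from the build script.
const fetchPage = async someUrl => (await axiosInst.get(someUrl)).data
```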
@@ -400,7 +413,7 @@ const fetchMicroblog = async () => {
    // We have a server patch which removes the 20-post hardcoded limit.
    // For some exciting reason microblog.pub does not expose pagination in the *API* components.
    // This is a workaround.
-    const posts = (await axios({ url: globalData.microblogSource, headers: { "Accept": 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"' } })).data.orderedItems
+    const posts = (await axiosInst({ url: globalData.microblogSource, headers: { "Accept": 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"' } })).data.orderedItems
    writeCache("microblog", posts)
    globalData.microblog = posts
}
@@ -433,16 +446,17 @@ const fetchFeeds = async () => {
        console.log(chalk.yellow("Using cached feeds"))
    } else {
        globalData.openring = {}
+        const limiter = pLimit.default(4)
        const getOneFeed = async url => {
            try {
-                const data = await axios.get(url, { headers: { "User-Agent": "osmarks.net static site compiler" } })
+                const data = await axiosInst.get(url)
                return feedExtractor.extractFromXml(data.data)
            } catch (e) {
                console.warn(`${chalk.red("Failed to fetch")} ${url}: ${e.message} ${e.errors && e.errors.map(x => x.message).join("\n")}`)
            }
        }
        await Promise.all(Object.entries(globalData.feeds).map(async ([name, url]) => {
-            const feed = await getOneFeed(url)
+            const feed = await limiter(getOneFeed, url)
            if (feed) {
                globalData.openring[name] = feed
            }
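For context on the pattern this hunk introduces (the note and sketch below are not part of the commit): `p-limit` wraps an async function so that at most a fixed number of invocations run at once, which is what caps the concurrent feed fetches at four here. A minimal standalone sketch, assuming `p-limit` ≥6 loaded through `require()` as in the diff, with `fetchOne` as a hypothetical per-URL fetcher:

```javascript
// Sketch of the p-limit pattern: limit(fn, ...args) queues the call and runs at
// most 4 of them concurrently, while Promise.all still collects every result.
const pLimit = require("p-limit")

const fetchAllLimited = (urls, fetchOne) => {
    const limit = pLimit.default(4)
    return Promise.all(urls.map(url => limit(fetchOne, url)))
}
```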
@@ -482,9 +496,8 @@ const fetchFeeds = async () => {
}

const compileCSS = async () => {
-    let css = sass.renderSync({
-        data: await readFile(path.join(srcDir, "style.sass")),
-        outputStyle: "compressed",
+    let css = sass.compile(path.join(srcDir, "style.sass"), {
+        style: "compressed",
        indentedSyntax: true
    }).css
    css += "\n"
@@ -601,12 +614,12 @@ const doImages = async () => {
}

const fetchMycorrhiza = async () => {
-    const allPages = await axios({ url: globalData.mycorrhiza + "/list" })
+    const allPages = await axiosInst({ url: globalData.mycorrhiza + "/list" })
    const dom = htmlparser2.parseDocument(allPages.data)
    const urls = cssSelect.selectAll("main > ol a", dom).map(x => x.attribs.href)
    for (const url of urls) {
        // TODO: this can run in parallel
-        const page = await axios({ url: globalData.mycorrhiza + url })
+        const page = await axiosInst({ url: globalData.mycorrhiza + url })
        const dom = htmlparser2.parseDocument(page.data)
        const title = domutils.innerText(cssSelect.selectAll(".navi-title a, .navi-title span", dom).slice(2))
        const article = cssSelect.selectOne("main #hypha article", dom)