From 9d9a78a950bbed6fe7b6246c572ba12f58d8ee20 Mon Sep 17 00:00:00 2001
From: osmarks
Date: Fri, 24 Jan 2025 15:17:41 +0000
Subject: [PATCH] emphasis blocks

---
 blog/ml-workstation.md      | 10 +++++++---
 blog/scaling-meme-search.md |  6 ++++--
 src/index.js                |  4 +++-
 src/style.sass              |  7 +++++++
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/blog/ml-workstation.md b/blog/ml-workstation.md
index 6ea08b8..67957a9 100644
--- a/blog/ml-workstation.md
+++ b/blog/ml-workstation.md
@@ -5,9 +5,7 @@ created: 25/02/2024
 updated: 14/04/2024
 slug: mlrig
 ---
-::: epigraph attribution=@jckarter link=https://twitter.com/jckarter/status/1441441401439358988
-Programmers love talking about the “bare metal”, when in fact the logic board is composed primarily of plastics and silicon oxides.
-:::
+::: emphasis
 
 ## Summary
 
@@ -16,6 +14,12 @@ Programmers love talking about the “bare metal”, when in fact the logic boar
 - Older or used parts are good to cut costs (not overly old GPUs).
 - Buy a sufficiently capable PSU.
 
+:::
+
+::: epigraph attribution=@jckarter link=https://twitter.com/jckarter/status/1441441401439358988
+Programmers love talking about the “bare metal”, when in fact the logic board is composed primarily of plastics and silicon oxides.
+:::
+
 ## Long version
 
 Thanks to the osmarks.net crawlers scouring the web for bloggable information[^1], I've found out that many people are interested in having local hardware to run machine learning workloads (by which I refer to GPU-accelerated inference or training of large neural nets: anything else is [not real](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)), but are doing it wrong, or not at all. There are superficially good part choices which are, in actuality, extremely bad for almost anything, and shiny [prebuilt options](https://lambdalabs.com/gpu-workstations/vector-one) which are far more expensive than necessary. In this article, I will outline what to do to get a useful system at somewhat less expense[^2].
diff --git a/blog/scaling-meme-search.md b/blog/scaling-meme-search.md
index cb89d67..697faf6 100644
--- a/blog/scaling-meme-search.md
+++ b/blog/scaling-meme-search.md
@@ -6,14 +6,16 @@ created: 24/01/2025
 series_index: 3
 slug: memescale
 ---
+::: emphasis
+Try the new search system [here](https://nooscope.osmarks.net/). I don't intend to replace the existing [Meme Search Engine](https://mse.osmarks.net/), as its more curated dataset is more useful to me for most applications.
+:::
+
 ::: epigraph attribution="Brian Eno"
 Be the first person to not do something that no one else has ever thought of not doing before.
 :::
 
 Computers are very fast. It is easy to forget this when they routinely behave so slowly, and now that many engineers are working on heavily abstracted cloud systems, but even my slightly outdated laptop is in principle capable of executing 15 billion instructions per core in each second it wastes stuttering and doing nothing in particular. People will sometimes talk about how their system has to serve "millions of requests a day", but a day is about 10^5 seconds, and the problem of serving tens of queries a second on much worse hardware than we have now was solved decades ago. The situation is even sillier for GPUs - every consumer GPU is roughly as fast as entire 1990s supercomputers[^1] and they mostly get used to shade triangles for games.
 
 In the spirit of [Production Twitter on One Machine](https://thume.ca/2023/01/02/one-machine-twitter/), [Command-line Tools can be 235x Faster than your Hadoop Cluster](https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html) and projects like [Marginalia](https://search.marginalia.nu/), I have assembled what I believe to be a competitively sized image dataset and search system on my one ["server"](/stack/)[^2] by carefully avoiding work.
 
-Try the new search system [here](https://nooscope.osmarks.net/). I don't intend to replace the existing [Meme Search Engine](https://mse.osmarks.net/), as its more curated dataset is more useful to me for most applications.
-
 ## Scraping
 
 The concept for this project was developed in May, when I was pondering how to get more memes and a more general collection without the existing semimanual curation systems, particularly in order to provide open-domain image search. [MemeThresher](/memethresher/)'s crawler pulls from a small set of subreddits, and it seemed plausible that I could just switch it to `r/all`[^3] to get a decent sample of recent data. However, after their IPO and/or some manager realizing unreasonably late that people might be willing to pay for unstructured text data now, Reddit [does not want you](https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy) to scrape much, and this consistently cut off after a few thousand items. Conveniently, however, in the [words](https://www.reddit.com/r/reddit4researchers/comments/1co0mqa/our_plans_for_researchers_on_reddit/) of Reddit's CTO:
diff --git a/src/index.js b/src/index.js
index 5dc86ab..5a40662 100644
--- a/src/index.js
+++ b/src/index.js
@@ -127,6 +127,8 @@ const renderContainer = (tokens, idx) => {
                 out += `${md.utils.escapeHtml(button[0])}`
             }
             return out
+        } else if (blockType === "emphasis") {
+            return `<div class="emphasis">`
         }
     } else {
         if (blockType === "captioned") {
@@ -141,7 +143,7 @@ const renderContainer = (tokens, idx) => {
                 ret = `${md.utils.escapeHtml("— ") + inner}` + ret
             }
             return ret
-        } else if (blockType === "buttons") {
+        } else if (blockType === "buttons" || blockType === "emphasis") {
             return `</div>`
         }
     }
diff --git a/src/style.sass b/src/style.sass
index 6bbacbc..b44e329 100644
--- a/src/style.sass
+++ b/src/style.sass
@@ -411,3 +411,10 @@ table
 
 #citebox
     width: 100%
+
+.emphasis
+    margin-top: 16px
+    margin-bottom: 16px
+    padding: 16px
+    p
+        margin: 0
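
For readers skimming the patch, here is a minimal, self-contained sketch of what the new `::: emphasis` container does once rendered. It assumes the site wires `renderContainer` up through markdown-it-container (that registration is not part of this diff; the real code dispatches on a generic `blockType` instead of registering each container by name), and the `md` and `input` names below are illustrative only, not code from the repository.

```js
// Standalone sketch under the assumptions above, mirroring the behaviour this
// commit adds: the opening `::: emphasis` line becomes <div class="emphasis">,
// the closing `:::` becomes </div>, and the block contents render in between.
const markdownIt = require("markdown-it")
const container = require("markdown-it-container")

const md = markdownIt().use(container, "emphasis", {
    render(tokens, idx) {
        // nesting === 1 for the opening token, -1 for the closing token
        return tokens[idx].nesting === 1 ? `<div class="emphasis">` : `</div>`
    }
})

const input = `::: emphasis
Try the new search system here.
:::
`

console.log(md.render(input))
// Prints roughly: <div class="emphasis"><p>Try the new search system here.</p></div>
```

Because markdown-it wraps the block's contents in `<p>` tags, the nested `p` rule under `.emphasis` in style.sass zeroes the default paragraph margins so only the div's 16px padding spaces the callout.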