changes of some sort, new blog post

2025-09-05 11:57:57 +00:00 · 2025-01-09 21:06:14 +00:00
parent 1c1648e0ff
commit d045a02a08
11 changed files with 55 additions and 14 deletions
--- a/blog/ai-accelerator.md
+++ b/blog/ai-accelerator.md
@@ -35,7 +35,7 @@ Excluding a few specialized and esoteric products like [Lightmatter](https://en.

 Better manufacturing processes make transistors smaller, faster and lower-power, with the downside that a full wafer costs more. Importantly, though, not everything scales down the same - recently, SRAM has almost entirely stopped getting smaller[^7], and analog has not scaled well for some time. Only logic is still shrinking fast.

-It's only possible to fit about 1GB of SRAM onto a die, even if you are using all the die area and the maximum single-die size. Obviously, modern models are larger than this, and it wouldn't be economical to do this anyway. The solution used by most accelerators is to use external DRAM (dynamic random access memory). This is much cheaper and more capacious, at the cost of worse bandwidth and greater power consumption. Generally this will be HBM (high-bandwidth memory, which is more expensive and integrated more closely with the logic via advanced packaging), or some GDDR/LPDDR variant.
+It's only possible to fit about 2GB of SRAM onto a die, even if you are using all the die area and the maximum single-die size. Obviously, modern models are larger than this, and it wouldn't be economical to do this anyway. The solution used by most accelerators is to use external DRAM (dynamic random access memory). This is much cheaper and more capacious, at the cost of worse bandwidth and greater power consumption. Generally this will be HBM (high-bandwidth memory, which is more expensive and integrated more closely with the logic via advanced packaging), or some GDDR/LPDDR variant.

 Another major constraint is power use, which directly contributes to running costs and cooling system complexity. Transistors being present and powered consumes power (static/leakage power) and transistors switching on and off consumes power (dynamic/switching power). The latter scales superlinearly with clock frequency, which is inconvenient, since performance scales slightly sublinearly with clock frequency. A handy Google paper[^8] (extending work from 2014[^9]), worth reading in its own right, provides rough energy estimates per operation, though without much detail about e.g. clock frequency: