mirror of https://github.com/osmarks/website synced 2025-09-06 20:37:55 +00:00

blog external link tracking

osmarks
2025-04-12 13:17:31 +01:00
parent 1a271f69f3
commit 71e8e39b3b
11 changed files with 8765 additions and 324 deletions


@@ -31,7 +31,7 @@ While "generative AI" now comprises the majority of interest in AI, a large frac
But what, exactly, are the constraints on hardware driving these limits? I'm not an electrical engineer or chip designer, but much of this is public and relatively easy to understand, and some vendors provide helpful information in whitepapers.
-Excluding a few specialized and esoteric products like [Lightmatter](https://en.wikipedia.org/wiki/Lightmatter) and [Mythic](https://mythic.ai/)'s, AI accelerators are built on modern digital logic semiconductor processes. They contain at least one logic die - it can be more than one thanks to modern advanced packaging like Intel Foveros and TSMC CoWoS - with some mix of analog circuitry for IO, SRAM (static random access memory) for fast on-chip memory, and logic gates for the control and computation. The main limit on the complexity of the GPU is die area: each transistor in a logic circuit or IO interface or memory array consumes some area, and the cost increases somewhat superlinearly with die area. This is because die are made on a fixed-size wafer which is then cut up ("singulated"), bigger die have a higher total number of random defects ("worse yields") and so need to be discarded more often, and there's a maximum size (the "reticle limit"), above which it's necessary to combine several with expensive advanced packaging.
+Excluding a few specialized and esoteric products like [Lightmatter](https://web.archive.org/web/20240909133702/https://en.wikipedia.org/wiki/Lightmatter) and [Mythic](https://mythic.ai/)'s, AI accelerators are built on modern digital logic semiconductor processes. They contain at least one logic die - it can be more than one thanks to modern advanced packaging like Intel Foveros and TSMC CoWoS - with some mix of analog circuitry for IO, SRAM (static random access memory) for fast on-chip memory, and logic gates for the control and computation. The main limit on the complexity of the GPU is die area: each transistor in a logic circuit or IO interface or memory array consumes some area, and the cost increases somewhat superlinearly with die area. This is because die are made on a fixed-size wafer which is then cut up ("singulated"), bigger die have a higher total number of random defects ("worse yields") and so need to be discarded more often, and there's a maximum size (the "reticle limit"), above which it's necessary to combine several with expensive advanced packaging.
Better manufacturing processes make transistors smaller, faster and lower-power, with the downside that a full wafer costs more. Importantly, though, not everything scales down the same - recently, SRAM has almost entirely stopped getting smaller[^7], and analog has not scaled well for some time. Only logic is still shrinking fast.
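The superlinear cost-versus-area claim above can be sketched with a simple Poisson defect model: yield falls exponentially with die area, so cost per *good* die grows faster than area. This is an illustrative sketch, not the article's own model - the defect density, wafer cost, and wafer size below are made-up numbers, and edge loss and scribe lines are ignored.

```python
import math

def poisson_yield(die_area_mm2: float, defect_density_per_mm2: float) -> float:
    """Fraction of die with zero random defects under a Poisson defect model."""
    return math.exp(-defect_density_per_mm2 * die_area_mm2)

def cost_per_good_die(die_area_mm2: float,
                      wafer_cost: float,
                      wafer_area_mm2: float = math.pi * 150**2,  # 300 mm wafer
                      defect_density_per_mm2: float = 0.001) -> float:
    """Wafer cost spread over the die that yield (illustrative parameters)."""
    die_per_wafer = wafer_area_mm2 / die_area_mm2
    good_die = die_per_wafer * poisson_yield(die_area_mm2, defect_density_per_mm2)
    return wafer_cost / good_die

# Doubling die area more than doubles cost per working die:
small = cost_per_good_die(100, wafer_cost=10_000)
large = cost_per_good_die(200, wafer_cost=10_000)
```

Here `large / small` is `2 * exp(defect_density * 100)`, slightly above 2 - the superlinearity the article describes, before even accounting for the hard reticle limit.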