
fix error

osmarks 2025-05-28 09:50:12 +01:00
parent 9680903b48
commit 4b4fdb4c4d


@@ -47,7 +47,7 @@ The closest things made available to consumers are in networking, as the most co
## Power density
-People complain about the RTX 5090 having 600W of rated power draw and the "inefficiency" of modern client CPUs, but power density in servers has similarly been trending upwards. At the top end, Nvidia is pushing increasingly deranged [600kW racks](https://www.tomshardware.com/pc-components/gpus/nvidia-shows-off-rubin-ultra-with-600-000-watt-kyber-racks-and-infrastructure-coming-in-2027), equivalent to roughly half the power draw of a small legacy datacentre, but we see a rough [exponential trend](https://www.servethehome.com/why-servers-are-using-so-much-power-tdp-growth-over-time-supermicro-vertiv-intel-amd-nvidia/) in mainstream dual-socket CPUs, which now have maximum TDPs you would struggle to run your desktop at[^9]. Desktop chassis are roomy and permit large, quiet cooling systems: most servers are one or two rack units (1.25 inches) tall, so they've historically used terrifying 10k-RPM fans which can use as much as [10% of a server's power budget](https://www.servethehome.com/deep-dive-into-lowering-server-power-consumption-intel-inspur-hpe-dell-emc/). To mitigate this, high-performance systems are moving to liquid cooling. Unlike enthusiast liquid cooling systems, which exist to dump heat from power-dense CPUs into the probably-cool-enough air quickly, datacentres use liquid cooling to manage temperatures at the scale of racks and above, and might have facility-level water cooling.
+People complain about the RTX 5090 having 600W of rated power draw and the "inefficiency" of modern client CPUs, but power density in servers has similarly been trending upwards. At the top end, Nvidia is pushing increasingly deranged [600kW racks](https://www.tomshardware.com/pc-components/gpus/nvidia-shows-off-rubin-ultra-with-600-000-watt-kyber-racks-and-infrastructure-coming-in-2027), equivalent to roughly half the power draw of a small legacy datacentre, but we see a rough [exponential trend](https://www.servethehome.com/why-servers-are-using-so-much-power-tdp-growth-over-time-supermicro-vertiv-intel-amd-nvidia/) in mainstream dual-socket CPUs, which now have maximum TDPs you would struggle to run your desktop at[^9]. Desktop chassis are roomy and permit large, quiet cooling systems: most servers are one or two rack units (1.75 inches) tall, so they've historically used terrifying 10k-RPM fans which can use as much as [10% of a server's power budget](https://www.servethehome.com/deep-dive-into-lowering-server-power-consumption-intel-inspur-hpe-dell-emc/). To mitigate this, high-performance systems are moving to liquid cooling. Unlike enthusiast liquid cooling systems, which exist to dump heat from power-dense CPUs into the probably-cool-enough air quickly, datacentres use liquid cooling to manage temperatures at the scale of racks and above, and might have facility-level water cooling.
::: captioned src=/assets/images/supermicro_water_cooling.jpg
A SuperMicro GPU server with direct-to-chip liquid cooling, via [ServeTheHome](https://www.servethehome.com/liquid-cooling-next-gen-servers-getting-hands-on-3-options-supermicro/4/). Unlike consumer liquid cooling, this is designed for serviceability, with special quick-disconnect fittings.
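The power-density figures in the paragraph above are easy to sanity-check. As a rough aside (not part of the original post), the sketch below runs the arithmetic under stated assumptions: a standard 42U rack, a "small legacy datacentre" taken as roughly 1.2 MW so that a 600 kW rack is about half of it, and the roughly 10% fan share from the linked ServeTheHome article; the 1 kW server is purely illustrative.

```python
# Back-of-the-envelope check of the power-density claims; every constant
# here is an illustrative assumption, not a figure from the post.
RACK_POWER_W = 600_000         # Nvidia's planned 600 kW "Kyber" rack
RACK_UNITS = 42                # typical full-height rack (assumption)
LEGACY_DC_POWER_W = 1_200_000  # "small legacy datacentre", if the rack is ~half of it
SERVER_POWER_W = 1_000         # hypothetical 1U dual-socket server
FAN_SHARE = 0.10               # fans at up to ~10% of a server's power budget

print(f"Power per rack unit: {RACK_POWER_W / RACK_UNITS / 1000:.1f} kW/U")
print(f"Kyber racks per small legacy datacentre: {LEGACY_DC_POWER_W / RACK_POWER_W:.0f}")
print(f"Fan power on a 1 kW server: {SERVER_POWER_W * FAN_SHARE:.0f} W")
```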
@@ -55,7 +55,7 @@ A SuperMicro GPU server with direct-to-chip liquid cooling, via [ServeTheHome](h
## Disaggregation
-Even as individual servers grow more powerful, there is demand for pulling hardware out of them and sharing it between them to optimize utilization. This is an old idea for bulk storage ([storage area networks](https://en.wikipedia.org/wiki/Storage_area_network)), although there are some new ideas like [directly Ethernet-connected SSDs](https://www.servethehome.com/ethernet-ssds-hands-on-with-the-kioxia-em6-nvmeof-ssd/). With the increased bandwidth of PCIe and RAM costs making up an increasing fraction of server costs ([about half](https://www.nextplatform.com/2020/04/03/cxl-and-gen-z-iron-out-a-coherent-interconnect-strategy/) for Azure), modern servers now have the [CXL](https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/) protocol for adding extra memory over PCIe (physical-layer) links. This is most important for [cloud providers](https://semianalysis.com/2022/07/07/cxl-enables-microsoft-azure-to-cut/)[^10], who deal with many VMs at once which may not fill the server they are on perfectly, and which need to have all the memory they're paying for "available" but which may not actively use much of it at a time. This creates inconsistent memory latency, but servers already had to deal with this - even single-socket servers now have multiple [NUMA](https://en.wikipedia.org/wiki/Non-uniform_memory_access) nodes because of use of chiplets.
+Even as individual servers grow more powerful, there is demand for pulling hardware out of them and sharing it between them to optimize utilization. This is an old idea for bulk storage ([storage area networks](https://en.wikipedia.org/wiki/Storage_area_network)), although there are some new ideas like [directly Ethernet-connected SSDs](https://www.servethehome.com/ethernet-ssds-hands-on-with-the-kioxia-em6-nvmeof-ssd/). With the increased bandwidth of PCIe and RAM costs making up an increasing fraction of server costs ([about half](https://www.nextplatform.com/2020/04/03/cxl-and-gen-z-iron-out-a-coherent-interconnect-strategy/) for Azure), modern servers now have the [CXL](https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/) protocol for adding extra memory over PCIe (physical-layer) links. This is most important for [cloud providers](https://semianalysis.com/2022/07/07/cxl-enables-microsoft-azure-to-cut/)[^10], who deal with many VMs at once which may not fill the server they are on perfectly, and which need to have all the memory customers are paying for "available" but which may not actively use much of it at a time. This creates inconsistent memory latency, but servers already had to deal with this - even single-socket servers now have multiple [NUMA](https://en.wikipedia.org/wiki/Non-uniform_memory_access) nodes because of use of chiplets.
::: captioned src=/assets/images/cxl_memory_expander.jpg
A CXL memory expander which can use older DDR4 DIMMs, via [ServeTheHome](https://www.servethehome.com/cxl-is-finally-coming-in-2025-amd-intel-marvell-xconn-inventec-lenovo-asus-kioxia-montage-arm/).
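As a side note on the disaggregation paragraph above: the NUMA topology it describes, including CXL-attached memory (which typically appears to the operating system as a memory-only NUMA node), is visible to software through Linux's standard sysfs layout. The sketch below is a minimal, Linux-only illustration of reading `/sys/devices/system/node`; it is an aside under those assumptions, not anything the post or the linked articles provide.

```python
# Minimal sketch: enumerate NUMA nodes from Linux sysfs. A CXL memory
# expander typically shows up as a node with memory but an empty CPU list.
from pathlib import Path

def numa_nodes(root="/sys/devices/system/node"):
    nodes = sorted(Path(root).glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    for node in nodes:
        cpulist = (node / "cpulist").read_text().strip()
        # First line of meminfo looks like "Node 0 MemTotal:  131072000 kB"
        mem_kb = int((node / "meminfo").read_text().splitlines()[0].split()[3])
        yield node.name, cpulist, mem_kb

if __name__ == "__main__":
    for name, cpulist, mem_kb in numa_nodes():
        kind = "memory-only (e.g. CXL expander)" if not cpulist else "has local CPUs"
        print(f"{name}: cpus=[{cpulist or '-'}] mem={mem_kb // 1024} MiB, {kind}")
```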