Hopper
Transformer Engine with FP8. First GPU generation tuned for LLM training and inference at scale. H200 adds HBM3e (141 GB, up from 80 GB of HBM3 on H100) for larger KV caches.
Still a strong on-prem choice for 70B-class production inference. Cheaper per node than Blackwell and shipping in volume.
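
To make the KV-cache point concrete, here is a rough sizing sketch. The model shape (80 layers, 8 KV heads, head dimension 128) is an assumed 70B-class configuration with grouped-query attention, and the 70 GB weight footprint assumes FP8 weights on a single GPU with no allowance for activations or runtime overhead. The function names are hypothetical. Real deployments usually shard across several GPUs, so treat the numbers as an illustration of the trend, not a capacity plan.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Bytes of KV cache one token occupies: one key and one value vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(hbm_gb, weight_gb, bytes_per_token):
    """Tokens that fit in the HBM left over after the weights (decimal GB)."""
    free_bytes = (hbm_gb - weight_gb) * 10**9
    return free_bytes // bytes_per_token


# Assumed 70B-class shape (hypothetical): 80 layers, 8 KV heads, head_dim 128.
per_token_fp16 = kv_cache_bytes_per_token(80, 8, 128, 2)  # FP16 cache entries
per_token_fp8 = kv_cache_bytes_per_token(80, 8, 128, 1)   # FP8 cache entries

# Assumed weight footprint: about 70 GB for FP8 weights.
for name, hbm_gb in [("H100 80 GB", 80), ("H200 141 GB", 141)]:
    tokens = max_cached_tokens(hbm_gb, 70, per_token_fp16)
    print(f"{name}: about {tokens} tokens of FP16 KV cache")
```

Under these assumptions each token costs 320 KiB of FP16 cache, so the extra HBM on the H200 goes almost entirely to longer contexts or larger batches. Storing the cache in FP8 halves the per-token cost.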
