LLM GPU Sizing Calculator
Calculate GPU memory requirements and server sizing for Large Language Models

by ADAPTiZY

Training / Fine-Tuning

Inference

Note: These are theoretical estimates. Practical optimizations like LoRA, QLoRA, or model sharding can reduce requirements by 75-90%.
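As a rough sketch of where the training-side estimate comes from (the calculator's exact formula is not shown here; the 16 bytes/parameter figure is a common rule of thumb for full fine-tuning with mixed-precision Adam, and activations are excluded):

```python
def training_memory_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
    """Rough full fine-tuning memory footprint in GiB.

    ~16 bytes/param covers fp16 weights + fp16 gradients + fp32 master
    weights + two fp32 Adam moment buffers. Activation memory, which
    depends on batch size and sequence length, is not included.
    """
    return params_billion * 1e9 * bytes_per_param / 1024**3

def lora_adjusted_gb(full_gb: float, reduction: float = 0.85) -> float:
    """Apply the 75-90% reduction the note mentions (0.85 = 85%, illustrative)."""
    return full_gb * (1.0 - reduction)
```

For example, a 7B-parameter model would need roughly `training_memory_gb(7)` ≈ 104 GiB for full fine-tuning, which LoRA-style methods can bring down to the tens of gigabytes.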

Inference sizing assumes a multi-head attention (MHA) model; architectures using grouped-query or multi-query attention have smaller KV-cache requirements and will need less memory than estimated here.
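A minimal sketch of the standard inference estimate under the MHA assumption (the calculator's internals are not shown; this is the widely used weights-plus-KV-cache approximation, with parameter names chosen for illustration):

```python
def inference_memory_gb(
    params_billion: float,
    num_layers: int,
    hidden_size: int,
    seq_len: int,
    batch_size: int = 1,
    weight_bytes: int = 2,  # fp16/bf16 weights
    kv_bytes: int = 2,      # fp16/bf16 KV cache
) -> float:
    """Approximate inference memory in GiB: model weights + KV cache.

    With full multi-head attention the cache stores one key and one
    value vector of size hidden_size per layer per token, hence the
    factor of 2 (K and V) below.
    """
    weights = params_billion * 1e9 * weight_bytes
    kv_cache = 2 * num_layers * seq_len * hidden_size * kv_bytes * batch_size
    return (weights + kv_cache) / 1024**3
```

For a hypothetical 7B model with 32 layers and hidden size 4096 serving a 4096-token context, this gives about 13 GiB of weights plus 2 GiB of KV cache.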

INT4 data types may not be supported on every GPU generation. Please refer to NVIDIA's specifications to verify compatibility with your hardware.