LLM GPU Sizing Calculator
Calculate GPU memory requirements and server sizing for Large Language Models

by ADAPTiZY

Training / Fine-Tuning

Inference

Note: These are theoretical estimates. Practical optimizations like LoRA, QLoRA, or model sharding can reduce requirements by 75-90%.
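As a rough sketch of where the training-side estimate comes from (the calculator's exact formula is not shown here; the 16 bytes/parameter figure is a common rule of thumb for full fine-tuning with mixed-precision Adam, and activations are excluded):

```python
def training_memory_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
    """Rough full fine-tuning memory footprint in GiB.

    ~16 bytes/param covers fp16 weights + fp16 gradients + fp32 master
    weights + two fp32 Adam moment buffers. Activation memory, which
    depends on batch size and sequence length, is not included.
    """
    return params_billion * 1e9 * bytes_per_param / 1024**3

def lora_adjusted_gb(full_gb: float, reduction: float = 0.85) -> float:
    """Apply the 75-90% reduction the note mentions (0.85 = 85%, illustrative)."""
    return full_gb * (1.0 - reduction)
```

For example, a 7B-parameter model would need roughly `training_memory_gb(7)` ≈ 104 GiB for full fine-tuning, which LoRA-style methods can bring down to the tens of gigabytes.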

Inference sizing assumes a multi-head attention (MHA) model; architectures using grouped-query or multi-query attention have smaller KV-cache requirements and will need less memory than estimated here.
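A minimal sketch of the standard inference estimate under the MHA assumption (the calculator's internals are not shown; this is the widely used weights-plus-KV-cache approximation, with parameter names chosen for illustration):

```python
def inference_memory_gb(
    params_billion: float,
    num_layers: int,
    hidden_size: int,
    seq_len: int,
    batch_size: int = 1,
    weight_bytes: int = 2,  # fp16/bf16 weights
    kv_bytes: int = 2,      # fp16/bf16 KV cache
) -> float:
    """Approximate inference memory in GiB: model weights + KV cache.

    With full multi-head attention the cache stores one key and one
    value vector of size hidden_size per layer per token, hence the
    factor of 2 (K and V) below.
    """
    weights = params_billion * 1e9 * weight_bytes
    kv_cache = 2 * num_layers * seq_len * hidden_size * kv_bytes * batch_size
    return (weights + kv_cache) / 1024**3
```

For a hypothetical 7B model with 32 layers and hidden size 4096 serving a 4096-token context, this gives about 13 GiB of weights plus 2 GiB of KV cache.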

INT4 data types may not be supported on every GPU generation. Please refer to NVIDIA's specifications to verify compatibility with your hardware.