Interactive tools
Ten calculators for the numbers interviewers expect you to know cold, covering inference, training, cost, retrieval, and reliability.
KV cache | Attention O(s^2) | Latency budget | Training memory | LoRA params | Cost per 1M tokens | Quantization savings | Little's Law | Availability | Error budget
KV cache memory calculator
Formula: 2 x layers x kv_heads x head_dim x bytes_per_param x sequence_length x batch_size
KV cache size: 0 GB
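The formula above can be sketched in a few lines of Python. The leading 2 accounts for storing both keys and values. The model configuration in the example is hypothetical, chosen only to make the arithmetic easy to check, and the result is reported in GiB (2^30 bytes).

```python
def kv_cache_bytes(layers, kv_heads, head_dim, bytes_per_param,
                   sequence_length, batch_size):
    """KV cache size in bytes, following the formula above."""
    # The factor of 2 covers one key tensor and one value tensor per layer.
    return (2 * layers * kv_heads * head_dim * bytes_per_param
            * sequence_length * batch_size)


# Hypothetical config: 32 layers, 8 KV heads, head_dim 128,
# bf16 cache (2 bytes per value), 8192 tokens, batch size 1.
size = kv_cache_bytes(32, 8, 128, 2, 8192, 1)
print(f"{size / 2**30:.2f} GiB")  # 1.00 GiB
```

Because every factor is multiplicative, doubling the sequence length or the batch size doubles the cache.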
Attention memory scaling
See how the O(s^2) score matrix grows with sequence length.
Score matrix: 16M entries (s x s)
Memory (bf16): 32 MB per head
vs FlashAttention: O(s) memory; the full s x s matrix is never materialized
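A minimal sketch of the quadratic growth, assuming one s x s score matrix per head stored at 2 bytes per entry (bf16). A sequence length of 4096 reproduces the 16M entries and 32 MB figures shown above when read in binary units; that the widget used this value is an assumption.

```python
def score_matrix(seq_len, bytes_per_entry=2):
    """Return (entries, bytes) for one head's s x s attention score matrix."""
    entries = seq_len * seq_len
    return entries, entries * bytes_per_entry


for s in (4096, 8192, 16384):
    entries, nbytes = score_matrix(s)
    # Doubling s quadruples both the entry count and the memory.
    print(f"s={s}: {entries} entries, {nbytes / 2**20:.0f} MiB per head")
```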
Inference latency budget
Estimate TTFT and total generation time.
Queue
Prefill (TTFT)
Decode
Total end-to-end
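One rough way to fill in the four rows above is to treat prefill and decode as throughput-bound stages. The sketch below assumes prefill time is prompt tokens divided by a prefill rate and decode time is output tokens divided by a decode rate; the function name, parameters, and example numbers are all hypothetical. Note that a user-observed TTFT would also include the queue wait.

```python
def latency_budget(queue_s, prompt_tokens, prefill_tok_per_s,
                   output_tokens, decode_tok_per_s):
    """Split end-to-end latency into queue, prefill, and decode (seconds)."""
    prefill_s = prompt_tokens / prefill_tok_per_s   # time to process the prompt
    decode_s = output_tokens / decode_tok_per_s     # time to generate the output
    return {
        "queue_s": queue_s,
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": queue_s + prefill_s + decode_s,
    }


# Hypothetical request: 50 ms in queue, 2000-token prompt at 10,000 tok/s,
# 500 output tokens at 50 tok/s.
budget = latency_budget(0.05, 2000, 10_000, 500, 50)
print(budget)
```

With these inputs decode dominates the total, which is the usual pattern for long generations.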
Training memory estimator
Weights + gradients + optimizer state + activations. Ch 4, 12.
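As a rough sketch of the four terms above, the per-parameter byte counts below assume mixed-precision training with Adam: 2 bytes for bf16 weights, 2 for bf16 gradients, and 12 for fp32 master weights plus two fp32 moment estimates. Those defaults are assumptions, not values taken from the calculator, and activation memory is passed in as a separate estimate because it depends on batch size, sequence length, and checkpointing.

```python
def training_memory_gb(params_billion, weight_bytes=2, grad_bytes=2,
                       optimizer_bytes=12, activations_gb=0.0):
    """Approximate training memory in GB (1 GB = 1e9 bytes)."""
    # Assumed defaults: bf16 weights and gradients, fp32 Adam state.
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    # 1e9 parameters x N bytes per parameter = N GB.
    return params_billion * bytes_per_param + activations_gb


# Hypothetical 7B-parameter model, activations ignored.
print(training_memory_gb(7))  # 112.0
```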
LoRA parameter calculator
Trainable params = (d_in + d_out) x r x num_adapted_layers. Ch 15.
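The formula above follows from the shapes of the two low-rank factors: one of size r x d_in and one of size d_out x r. A minimal sketch, where the example dimensions are hypothetical and num_adapted_layers is read as the number of adapted weight matrices, all with the same shape:

```python
def lora_trainable_params(d_in, d_out, r, num_adapted_layers):
    """Trainable LoRA parameters, following the formula above."""
    # Each adapted matrix adds A (r x d_in) and B (d_out x r).
    return (d_in + d_out) * r * num_adapted_layers


# Hypothetical setup: 4096 x 4096 projections, rank 8, 64 adapted matrices.
print(lora_trainable_params(4096, 4096, 8, 64))  # 4194304
```

The count grows linearly in r, so doubling the rank doubles the trainable parameters.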
Cost per 1M tokens
Self-hosted vs API breakeven. Ch 30.
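A simple way to frame the breakeven is to convert an hourly GPU price and a sustained throughput into a cost per 1M tokens, then solve for the throughput at which that cost equals the API price. The sketch below assumes GPU rental is the only self-hosted cost; the function names, the utilization parameter, and the example prices are hypothetical.

```python
def self_hosted_cost_per_1m(gpu_cost_per_hour, tokens_per_second, utilization=1.0):
    """Self-hosted cost in dollars per 1M tokens."""
    # Utilization scales down the tokens actually served per paid hour.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


def breakeven_tokens_per_second(gpu_cost_per_hour, api_price_per_1m):
    """Sustained throughput at which self-hosting matches the API price."""
    return gpu_cost_per_hour * 1_000_000 / (api_price_per_1m * 3600)


# Hypothetical prices: $2.00 per GPU-hour, API at $1.00 per 1M tokens.
print(self_hosted_cost_per_1m(2.0, 1000))
print(breakeven_tokens_per_second(2.0, 1.0))
```

Below the breakeven throughput the API is cheaper; above it, self-hosting wins under these assumptions.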
Quantization memory savings
See how INT4/INT8/FP8 reduces model weight memory. Ch 26.
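Weight memory scales with bits per parameter, so the savings can be sketched directly. This ignores the small overhead of quantization scales and zero points, and the 7B model size is a hypothetical example.

```python
def weight_memory_gb(params_billion, bits):
    """Model weight memory in GB (1 GB = 1e9 bytes) at a given bit width."""
    # bits / 8 = bytes per parameter; 1e9 params x bytes = GB.
    return params_billion * bits / 8


# Hypothetical 7B-parameter model at common precisions.
for name, bits in [("BF16", 16), ("INT8", 8), ("FP8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(7, bits)} GB")
```

Relative to 16-bit weights, 8-bit formats halve the footprint and INT4 quarters it.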
Little's Law calculator
L = lambda x W. Concurrent requests = arrival rate x avg latency. Ch 75.
Concurrent (L): 800
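Little's Law is a single multiplication, sketched below. The inputs of 400 requests per second and 2 seconds average latency are hypothetical; they are one pair that yields the 800 concurrent requests shown above, not necessarily the values the calculator used.

```python
def concurrent_requests(arrival_rate_per_s, avg_latency_s):
    """Little's Law: L = lambda x W."""
    return arrival_rate_per_s * avg_latency_s


# Hypothetical load: 400 requests/s with 2 s average latency.
print(concurrent_requests(400, 2))  # 800
```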
Compound availability
Serial dependencies multiply. A1 x A2 x ... x An. Ch 95.
Compound availability: 99.6%
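The product rule above can be sketched as follows. Four serial dependencies at 99.9% each are a hypothetical input that happens to round to 99.6%; the calculator's actual inputs are not shown.

```python
import math


def compound_availability(availabilities):
    """Availability of a chain of serial dependencies: A1 x A2 x ... x An."""
    return math.prod(availabilities)


# Hypothetical chain: four services, each 99.9% available.
total = compound_availability([0.999, 0.999, 0.999, 0.999])
print(f"{total:.1%}")
```

Each added dependency can only lower the result, which is why long serial call chains erode availability quickly.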
Error budget burn rate
How fast you're consuming your monthly error budget. Ch 95.
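A common way to express this is the ratio of the observed error rate to the error rate the SLO allows. The sketch below assumes that definition and a 30-day budget window; the function names and example numbers are hypothetical.

```python
def burn_rate(observed_error_rate, slo):
    """Burn rate: observed error rate divided by the allowed error rate."""
    error_budget = 1 - slo  # e.g. a 99.9% SLO allows 0.1% errors
    return observed_error_rate / error_budget


def days_to_exhaustion(rate, window_days=30):
    """Days until the budget is gone if the current burn rate holds."""
    return window_days / rate


# Hypothetical service: 99.9% SLO, currently failing 0.5% of requests.
rate = burn_rate(0.005, 0.999)
print(f"burn rate {rate:.1f}x, budget gone in {days_to_exhaustion(rate):.1f} days")
```

A burn rate of 1 spends the budget exactly over the window; anything above 1 exhausts it early.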