Interactive tools

Ten calculators for the numbers interviewers expect you to know cold. Covers inference, training, cost, retrieval, and reliability.

KV cache memory calculator

Formula: 2 x layers x kv_heads x head_dim x bytes_per_element x sequence_length x batch_size. The leading 2 counts one tensor for keys and one for values.
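As a rough sketch, the formula translates directly into a small function. The example configuration (32 layers, 8 KV heads, head dimension 128, 2 bytes per element, 4096 tokens, batch size 1) is hypothetical and chosen only for illustration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, bytes_per_elem, seq_len, batch_size):
    """KV cache size in bytes. The factor 2 covers the key and value tensors."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch_size


# Hypothetical model configuration, not taken from the page.
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                      bytes_per_elem=2, seq_len=4096, batch_size=1)
print(f"{size / 2**30:.2f} GiB")
```

Because every factor multiplies, doubling the batch size or the sequence length doubles the cache.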


Attention memory scaling

See how the O(s^2) score matrix grows with sequence length.

Score matrix: 16M entries (s x s)
Memory (bf16): 32 MB per head
vs FlashAttention: O(s), never materializes the full score matrix
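A minimal sketch of the arithmetic behind these figures. The sequence length s = 4096 is an assumption, picked because it reproduces the 16M entries and 32 MB shown above; 2 bytes per entry corresponds to bf16.

```python
def score_matrix(seq_len, bytes_per_elem=2):
    """Entries and bytes of one head's s x s attention score matrix."""
    entries = seq_len * seq_len
    return entries, entries * bytes_per_elem


# s = 4096 is an assumed input, not stated on the page.
entries, nbytes = score_matrix(4096)
print(f"{entries / 2**20:.0f}M entries, {nbytes / 2**20:.0f} MiB per head")
```

Doubling s quadruples both numbers, which is the O(s^2) growth the calculator illustrates.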

Inference latency budget

Estimate TTFT and total generation time.

Breakdown: Queue + Prefill (TTFT) + Decode = Total end-to-end
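One simple way to sketch this budget, assuming the three stages run back to back and that prefill and decode each proceed at a constant token rate. The throughput parameters and example numbers are hypothetical, not values from the page.

```python
def latency_budget(queue_s, prompt_tokens, prefill_tok_per_s,
                   output_tokens, decode_tok_per_s):
    """Split end-to-end latency into queue, prefill, and decode time (seconds)."""
    prefill_s = prompt_tokens / prefill_tok_per_s  # time to process the prompt
    decode_s = output_tokens / decode_tok_per_s    # time to generate the output
    return {
        "queue_s": queue_s,
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": queue_s + prefill_s + decode_s,
    }


# Hypothetical workload: 2000-token prompt, 500 output tokens.
budget = latency_budget(queue_s=0.05, prompt_tokens=2000, prefill_tok_per_s=10_000,
                        output_tokens=500, decode_tok_per_s=50)
print(budget)
```

In this model the time a user waits for the first token is the queue time plus the prefill time; decode usually dominates the total for long outputs.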

Training memory estimator

Weights + gradients + optimizer state + activations. Ch 4, 12.
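A minimal sketch of the sum, assuming a common mixed-precision Adam accounting: 2 bytes per parameter for bf16 weights, 2 for gradients, and 12 for optimizer state (fp32 master weights plus two fp32 moments). These per-parameter byte counts are assumptions and vary by setup. Activation memory depends on architecture, batch size, and checkpointing, so it is passed in rather than derived.

```python
def training_memory_bytes(params, weight_bytes=2, grad_bytes=2,
                          optimizer_bytes=12, activation_bytes=0):
    """Weights + gradients + optimizer state + activations, in bytes.

    Default byte counts assume mixed-precision Adam (an assumption).
    """
    return params * (weight_bytes + grad_bytes + optimizer_bytes) + activation_bytes


# Hypothetical 7B-parameter model, activations excluded.
total = training_memory_bytes(7_000_000_000)
print(f"{total / 1e9:.0f} GB before activations")
```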

LoRA parameter calculator

Trainable params = (d_in + d_out) x r x num_adapted_layers. Ch 15.
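The formula can be sketched as follows. It assumes every adapted weight matrix has the same d_in and d_out; the example dimensions, rank, and count are hypothetical.

```python
def lora_trainable_params(d_in, d_out, r, num_adapted_layers):
    """Each adapted matrix adds an (r x d_in) and a (d_out x r) low-rank factor."""
    return (d_in + d_out) * r * num_adapted_layers


# Hypothetical setup: 4096 x 4096 projections, rank 8, 64 adapted matrices.
print(lora_trainable_params(d_in=4096, d_out=4096, r=8, num_adapted_layers=64))
```

The count grows linearly in r, so doubling the rank doubles the trainable parameters.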

Cost per 1M tokens

Self-hosted vs API breakeven. Ch 30.
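A rough sketch of one way to frame the breakeven, assuming the only self-hosting cost is a flat hourly GPU price and that throughput is steady. The function names, the utilization parameter, and all prices are hypothetical; real comparisons also include engineering and operational costs.

```python
def self_hosted_cost_per_1m(gpu_cost_per_hour, tokens_per_second, utilization=1.0):
    """Dollars per 1M tokens for a GPU that is busy `utilization` of the time."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


def breakeven_tokens_per_second(gpu_cost_per_hour, api_price_per_1m):
    """Sustained throughput at which self-hosting matches the API price."""
    return gpu_cost_per_hour / api_price_per_1m * 1_000_000 / 3600


# Hypothetical prices: $2/hour GPU, $1 per 1M tokens via API.
print(self_hosted_cost_per_1m(2.0, tokens_per_second=1000, utilization=0.5))
print(breakeven_tokens_per_second(2.0, api_price_per_1m=1.0))
```

Below the breakeven throughput the API is cheaper under these assumptions; above it, self-hosting is.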

Quantization memory savings

See how INT4, INT8, and FP8 quantization reduce model weight memory. Ch 26.
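A minimal sketch of the weight-memory arithmetic, assuming memory is simply parameters times bits per weight. It ignores the per-group scales and zero points that real quantization schemes store, so actual footprints are somewhat larger. The 7B parameter count is hypothetical.

```python
BITS_PER_WEIGHT = {"bf16": 16, "fp8": 8, "int8": 8, "int4": 4}


def weight_memory_bytes(params, fmt):
    """Weight memory in bytes, ignoring quantization metadata (an assumption)."""
    return params * BITS_PER_WEIGHT[fmt] // 8


params = 7_000_000_000  # hypothetical model size
for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: {weight_memory_bytes(params, fmt) / 1e9:.1f} GB")
```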

Little's Law calculator

L = lambda x W. Concurrent requests = arrival rate x avg latency. Ch 75.

Concurrent (L): 800
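Little's Law is a one-line computation. The inputs below (400 requests per second, 2 seconds average latency) are assumptions chosen because they yield the 800 concurrent requests shown above; other input pairs with the same product give the same result.

```python
def concurrent_requests(arrival_rate_per_s, avg_latency_s):
    """Little's Law: L = lambda x W."""
    return arrival_rate_per_s * avg_latency_s


# Assumed inputs, not stated on the page.
print(concurrent_requests(arrival_rate_per_s=400, avg_latency_s=2.0))
```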

Compound availability

Serial dependencies multiply. A1 x A2 x ... x An. Ch 95.

Compound availability: 99.6%
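The product can be sketched with `math.prod`. Four serial dependencies at 99.9% each is an assumed example; it happens to round to the 99.6% shown above, but the page does not state which inputs produced that figure.

```python
import math


def compound_availability(availabilities):
    """Availability of a serial chain: A1 x A2 x ... x An."""
    return math.prod(availabilities)


# Assumed example: four dependencies, each 99.9% available.
total = compound_availability([0.999] * 4)
print(f"{total * 100:.1f}%")
```

Each added serial dependency can only lower the result, since every factor is at most 1.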

Error budget burn rate

How fast you're consuming your monthly error budget. Ch 95.
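A minimal sketch, assuming the usual definition of burn rate as the observed error rate divided by the error rate the SLO allows. A burn rate of 1 uses up the budget exactly at the end of the window. The function names, the 30-day window, and the example numbers are assumptions.

```python
def burn_rate(observed_error_rate, slo_target):
    """Observed error rate relative to the rate the SLO permits."""
    allowed_error_rate = 1.0 - slo_target
    return observed_error_rate / allowed_error_rate


def days_until_budget_exhausted(rate, window_days=30):
    """Days to consume the whole budget if the current burn rate holds."""
    return window_days / rate


# Hypothetical: 99.9% SLO with 0.5% of requests currently failing.
rate = burn_rate(observed_error_rate=0.005, slo_target=0.999)
print(f"burn rate {rate:.1f}x, budget gone in {days_until_budget_exhausted(rate):.1f} days")
```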