The day of the interview
"Preparation is what you do before the interview. Calibration is what you do on the day. They are not the same skill."
Everything in this book has been about substance. This chapter is about execution. There is a gap between knowing the material cold and performing well under interview conditions, and it is not filled by more studying. It is filled by a different kind of preparation: knowing exactly what to do in the first five minutes, how to allocate your forty minutes, how to handle the moments when things go wrong, and how to read the signals the interview is sending you in real time.
Candidates who fail technical interviews after solid preparation almost always fail in one of three ways: they spend too long clarifying and run out of time for design, they don’t know a specific number and let that uncertainty crater their confidence for the rest of the session, or they misread an interviewer signal and spend fifteen minutes going in the wrong direction without course-correcting. All three are preventable. This chapter is the prevention.
Outline:
- The night before.
- The first five minutes of every interview.
- Budgeting a 45-minute system design interview.
- When to whiteboard vs talk.
- Back-of-envelope under pressure.
- When you don’t know: the five recovery moves.
- Reading the interviewer’s silence.
- The Q&A at the end.
- The “why this company” question.
- After the interview.
- Red flags you should respond to.
129.1 The night before
The worst thing you can do the night before a senior interview is read one more chapter or run one more mock. You will not learn anything new. You will compound fatigue and anxiety. You will walk in less sharp than you would have been if you’d stopped studying the previous afternoon.
What to review. One hour maximum. Spend it on:
- The one-page summary of the technical topic most likely to come up (from the job description and your research on the team).
- Your ten behavioral stories, from bullet points only — not transcripts. You want the shape of each story to be fresh, not recited.
- The numerical kit from Chapter 115: GPU throughput, KV cache sizing, tokens-per-day arithmetic. Thirty minutes of number review is worth more than anything else the night before.
What not to do. Do not read new material. Do not attempt a new system design from scratch. Do not run a full mock interview — it will consume two hours, leave you drained, and surface concerns you have no time to resolve. Do not read the company’s engineering blog for the first time (if you haven’t read it in the past two weeks, it’s too late).
Logistics. Confirm the interview time and format. If virtual, test your audio, camera, screen share, and the whiteboarding tool. Check that you know the time zone. Nothing derails confidence faster than a ten-minute scramble over a Zoom link that expired.
The one-page cheat sheet. This is the artifact that separates prepared candidates from very prepared candidates. One page, handwritten or printed:
- The four estimations you might need (tokens/sec per H100, KV cache formula, QPS-from-DAU chain, and one problem-specific number you’ve identified).
- Three key vocabulary phrases you want to use (from Chapter 122 — the ones you’ve been forgetting in mocks).
- The five phases of the interview framework (clarify, estimate, design, drill, ops) with time budgets.
- A one-sentence reminder of your strongest behavioral story.
You will not be allowed to look at this sheet during the interview. Write it anyway. The act of writing it consolidates the material, and the preparation ritual gives you a sense of control over the session.
Sleep. This is not optional advice. Sleep is the most performance-relevant variable on the day of the interview. Eight hours of sleep before an interview where you are expected to do back-of-envelope math, reason about distributed systems, and tell compelling professional stories under mild social pressure is not a luxury. It is the minimum condition for performing at the level you’ve been training for.
Eat before the interview. Arrive or log in five minutes early. Bring water. These are trivial details and they matter more than they should.
129.2 The first five minutes of every interview
The first five minutes set the frame for the entire session. The interviewer is forming their initial model of you during these minutes. You have almost no ability to upgrade that model later; you can only confirm or contradict it.
The opening move is simple and almost universally neglected: repeat back what you heard before you say anything else.
The interviewer says: “Design a recommendation system for a short-form video platform.”
The wrong response: launching into clarifying questions immediately, or — worse — launching into a design.
The right response: “Okay — so, a recommendation system for a short-form video platform. Before I start, let me make sure I understand the problem correctly: we’re building the system that decides which videos to show each user in their feed, not the content creation or video delivery pipeline. The scale is probably something like a consumer product with tens of millions of DAU. I want to spend the first five minutes clarifying requirements — does that work?”
This thirty-second reflection does three things. First, it gives the interviewer a chance to correct a misunderstanding before you’ve built twenty minutes of design on a wrong premise. Second, it buys you time to organize your thinking. Third, it signals immediately that you are a methodical, process-oriented engineer who won’t race past ambiguity.
Then clarify. The specific questions depend on the problem (Chapter 114, §114.3 has the full list), but the clarification move is always the same:
- Reflect what you heard.
- State the scope you’re assuming.
- Ask the four to six questions that will most change the design.
- Write down the answers.
- Ratify the requirements: “So I’m designing for 50M DAU, a 200ms latency target for feed refresh, with no hard freshness requirement — recommendations can be up to 24 hours stale. Does that capture the constraints?”
Only then start the design.
The interviewer who is not correcting you is a happy interviewer. The interviewer who says “actually, the latency target is 50ms” three minutes into your design is an interviewer who needed you to ask that question five minutes earlier.
129.3 Budgeting a 45-minute system design interview
The forty minutes of design time — after a brief intro and before the closing Q&A — is a resource that must be budgeted explicitly. Most candidates who run out of time do so not because they are slow, but because they never decided how to allocate the time.
The allocation: five minutes to clarify, five to estimate, fifteen for the high-level design, fifteen for the drill, and five for ops and failure modes.
This is the revised allocation from Chapter 114: it gives the design phase more room (fifteen minutes instead of ten) and collapses drill into a single sustained phase. The reason: for senior candidates, the design phase is where you demonstrate breadth, and the drill is where you demonstrate depth. Both matter. Compressing either collapses the signal.
How to stay on schedule. Keep the clock visible — on your phone, on your laptop, on the wall. Announce the plan at the start: “I’ll spend five minutes clarifying, five on estimation, fifteen on the high-level design, fifteen on a drill of whichever component you find most interesting, and keep five for ops and failure modes. Does that work?” This does two things: it commits you to the schedule, and it gives the interviewer a chance to tell you they want more time on a specific phase. If the interviewer says “I want to spend most of our time on retrieval,” you now know exactly where to compress.
The three most common time failures:
- Over-clarifying. You ask twelve questions instead of six. Cap clarification at five minutes, hard.
- Under-estimating. You skip the estimation phase to get to the diagram faster. This looks like impatience and costs you the numerical fluency signal. Always do the estimation.
- Sprawling on design. You spend twenty minutes drawing a perfect diagram. The diagram is not graded; the reasoning behind it is. Fifteen minutes means a complete block diagram with labeled technologies — not a fully annotated architecture review. Speed matters here.
129.4 When to whiteboard vs talk
The single most useful rule: always whiteboard components and data flow; never whiteboard prose or lists.
A diagram earns its place when the spatial relationships between components carry meaning. Boxes and arrows showing that the API gateway feeds the admission controller which feeds the serving fleet — this is worth drawing because the interviewer can see at a glance a dependency structure that would take three sentences to describe.
A list does not earn a drawing. Your clarifying questions do not need to go on a whiteboard. The tradeoffs between HNSW and IVF-PQ do not need a diagram. The six things you’re going to cover in the ops phase do not need a bullet list on the board. Prose goes in your mouth, not on the board.
The failure mode: the narrator. A narrator talks about the system instead of drawing it. “So, we’d have an API gateway that would handle auth and rate limiting, and then it would forward to an admission controller, and the admission controller would route to the serving fleet…” This is oral documentation. It tells the interviewer nothing about whether you can reason spatially about a system.
The success mode: the designer. A designer draws the box, labels it, draws the arrow, and says one sentence that explains the decision behind the design — not the description of what the box does. “Gateway handles OpenAI-compatible auth — I’m using Envoy here specifically because we get OpenTelemetry for free and the JWT validation plugin saves a week of work.” The box is already drawn; the sentence explains why, not what.
The rule of thumb: if you can point to something on the board while you say it, draw it first. If you cannot point to it, say it without drawing.
Corollary: if you’ve been talking for more than ninety seconds without drawing anything, pick up the pen.
The whiteboard matters in virtual interviews too. Most companies use a shared document or a shared whiteboard tool (Excalidraw, Miro, a Google Doc with tables). If they give you a drawing surface, use it. If they give you a text document, draw your diagram in ASCII or a table — it is imperfect but it is better than having nothing to point to.
129.5 The art of back-of-envelope under pressure
Three estimations you must be able to do in under sixty seconds. These cover 80% of what senior ML systems interviews require. Know the numbers before you walk in, not as trivia but as intuition — you should be able to feel whether an answer is plausible before you calculate it.
Estimation 1: Serving capacity of one H100.
The question behind the question: Given N requests per second at T output tokens per request, how many H100s do I need?
The template:
- Realistic decode throughput, single H100, 70B BF16 model, continuous batching: ~1,000 tokens/sec.
- Realistic decode throughput, single H100, 7B BF16 model, continuous batching: ~8,000 tokens/sec.
- Rule of thumb: tokens/sec scales roughly inversely with parameter count (halving the model roughly doubles throughput), with diminishing returns from batching above batch size ~32.
- To compute GPU count: total tokens/sec needed at peak ÷ per-GPU throughput × 1.3 (headroom).
In practice: 1M DAU, 5 sessions/day, 10 turns/session, 300 output tokens/turn → 15B tokens/day → 174k tokens/sec average → ~500k tokens/sec peak (3×) → 500 H100s for a 70B model.
60-second version: “1M DAU, 15k tokens per user per day, 15B tokens/day, 174k tokens/sec average, 520k at peak. At 1k tokens/sec per H100, that’s 520 GPUs at peak plus 30% headroom, call it 680 H100s.”
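This chain is worth scripting once as a pre-interview drill. A minimal Python sketch, assuming the template's planning constants (1,000 tokens/sec per H100 for a 70B BF16 model, a 3x peak factor, 30% headroom); the function is illustrative, not from any library:

```python
# Back-of-envelope H100 sizing for a decode-bound serving fleet.
# All constants are rough planning numbers, not measured benchmarks.

def gpus_needed(dau, tokens_per_user_per_day, per_gpu_tokens_per_sec,
                peak_factor=3.0, headroom=1.3):
    """Peak-hour GPU count: daily tokens -> avg rate -> peak rate -> GPUs."""
    tokens_per_day = dau * tokens_per_user_per_day
    avg_tokens_per_sec = tokens_per_day / 86_400        # seconds per day
    peak_tokens_per_sec = avg_tokens_per_sec * peak_factor
    return peak_tokens_per_sec / per_gpu_tokens_per_sec * headroom

# 1M DAU x 5 sessions x 10 turns x 300 output tokens = 15k tokens/user/day.
print(round(gpus_needed(1_000_000, 15_000, 1_000)))  # -> 677
```

With the 30% headroom folded in, the ~500 peak GPUs from the in-practice chain become roughly 680 provisioned GPUs.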
Estimation 2: KV cache memory per concurrent user.
The question behind the question: How much GPU memory does the KV cache consume, and how does that constrain concurrency?
The template:
- KV cache memory per token = 2 (K and V) × num_layers × (hidden_dim ÷ num_attention_heads) × num_kv_heads × bytes_per_element. Equivalently: 2 × num_layers × num_kv_heads × head_dim × bytes_per_element.
- For Llama 3 70B, BF16 (80 layers, 8 KV heads, head_dim 128): approximately 0.3 MB per token.
- For Llama 3 8B, BF16 (32 layers, 8 KV heads, head_dim 128): approximately 0.13 MB per token.
- An H100 has 80 GB HBM. Model weights for 70B BF16 ≈ 140 GB, or 70 GB per GPU across 2 GPUs, which leaves roughly 10 GB per GPU for KV cache.
- At ~0.16 MB per token per GPU (the 0.3 MB split across TP=2), 10 GB ≈ 65,000 tokens of KV cache per GPU.
- For a session with 2,000 prompt tokens + 500 output tokens, that’s 2,500 tokens per in-flight request.
- KV cache capacity per GPU: 65,000 ÷ 2,500 ≈ 26 concurrent requests per GPU.
In practice: this number tells you the batch size ceiling and therefore the throughput ceiling. If you need 200 concurrent requests per GPU for the throughput target, you need a smaller KV cache footprint (quantization, grouped- or multi-query attention, smaller context windows) or a wider tensor-parallel split to free HBM for cache.
60-second version: “H100 has 80 GB. 70B BF16 across two GPUs is 70 GB of weights per GPU, leaving about 10 GB for KV cache. That’s ~65k tokens of cache. At 2.5k tokens per request, I can keep ~26 requests in flight per GPU simultaneously. That caps my batch size and therefore caps throughput — I need to know the request distribution before I can say whether that’s enough.”
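The same arithmetic can be checked mechanically. A hedged Python sketch: the architecture numbers (80 layers, 8 KV heads, head dimension 128) are Llama 3 70B's published configuration, while the 10 GB KV budget in the example call is an assumption (80 GB HBM minus ~70 GB of weights per GPU at TP=2):

```python
# Per-token KV cache size and the per-GPU concurrency ceiling it implies.

def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Factor of 2 stores both K and V for every layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def concurrent_requests(kv_budget_gb, tokens_per_request, bytes_per_token_per_gpu):
    # How many in-flight requests fit in the KV budget of one GPU.
    return int(kv_budget_gb * 1024**3 // (tokens_per_request * bytes_per_token_per_gpu))

per_token = kv_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)
print(per_token / 2**20)        # -> 0.3125 (MB per token, whole model)

per_gpu = per_token / 2         # KV is sharded across TP=2
print(concurrent_requests(10, 2_500, per_gpu))   # assumed ~10 GB free -> 26
```

If the computed ceiling is far below your throughput target, that is the cue to reach for quantized KV, fewer KV heads, or a wider tensor-parallel split.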
Estimation 3: QPS from DAU.
The question behind the question: The problem statement says “N million daily active users” — what QPS does that translate to?
The template:
- Daily active users × sessions/day × requests/session ÷ 86,400 = average QPS.
- Peak QPS ≈ 3–5× average QPS (rule of thumb for consumer apps; enterprise apps have a sharper peak).
- For a chat app: 3 sessions/day, 10 requests/session = 30 requests/user/day.
- 1M DAU × 30 requests ÷ 86,400 = 347 average QPS. Peak ~1,000–1,750 QPS.
- For a search/retrieval API: often a single request per “query” but latency requirement is tighter.
Sanity check: if the answer comes out above 100,000 QPS from the template, double-check your sessions/day number — that’s approximately Twitter-scale for chat and is probably wrong for a first-pass estimate.
60-second version: “10M DAU, 3 sessions/day, 5 turns each — that’s 150M turns/day, 1,736 average QPS, call it 7k QPS at peak with a 4× peak factor. At 500ms p99 latency, Little’s Law gives me 3,500 in-flight requests simultaneously, which drives my replica count.”
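The full chain, including the Little's Law step, fits in a few lines. A sketch with the example's assumptions (4x peak factor, 500ms p99 latency):

```python
# DAU -> average QPS -> peak QPS -> in-flight requests (Little's Law).

def avg_qps(dau, sessions_per_day, requests_per_session):
    return dau * sessions_per_day * requests_per_session / 86_400

def in_flight(peak_qps, latency_sec):
    # Little's Law: L = lambda * W.
    return peak_qps * latency_sec

qps = avg_qps(10_000_000, 3, 5)   # 10M DAU, 3 sessions/day, 5 turns each
peak = qps * 4                    # 4x peak factor
print(round(qps), round(in_flight(peak, 0.5)))  # -> 1736 3472
```

Rounding the ~6.9k peak up to 7k gives the 3,500 in-flight figure quoted in the 60-second version.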
Practice all three until the numbers are reflexive. The goal is not to get an exact answer — the goal is to produce a defensible estimate within two minutes, name your assumptions, and proceed. An interviewer who sees you do this correctly is immediately shifted from “might be L5” to “probably is L5.”
129.6 When you don’t know: the five recovery moves
At some point in every interview, you will hit a gap. A question you can’t fully answer, a number you don’t know, a technology you’ve heard of but never used. How you handle that moment determines more of the outcome than any other single event in the session.
```mermaid
graph TD
    GAP["You hit a knowledge gap"]
    GAP --> Q1{"Do you know\nwhat you'd measure?"}
    Q1 -->|Yes| M1["Move 1: Say what\nyou'd measure first"]
    Q1 -->|No| Q2{"Can you name\nthe tradeoff?"}
    Q2 -->|Yes| M2["Move 2: Name the\ntradeoff explicitly"]
    Q2 -->|No| Q3{"Can you commit\nto a direction?"}
    Q3 -->|Yes| M3["Move 3: Pick a\ndirection and commit"]
    Q3 -->|No| Q4{"Know a\nsimilar system?"}
    Q4 -->|Yes| M4["Move 4: Reference\nthe analogous system"]
    Q4 -->|No| M5["Move 5: Ask the\ninterviewer a targeted\nclarifying question"]
    M1 --> OK["Continue interview\nwith credibility intact"]
    M2 --> OK
    M3 --> OK
    M4 --> OK
    M5 --> OK
    style GAP fill:var(--fig-accent-soft),stroke:var(--fig-accent)
    style M3 fill:var(--fig-surface),stroke:var(--fig-border)
```
Never fake knowledge — the five moves cover every gap scenario without requiring fabrication.
Move 1: Say what you would measure.
If you don’t know the answer, you almost always know what data would let you find it. “I don’t know the exact throughput characteristics of Mamba at 7B scale — I know it’s better than Transformers for long sequences because of the linear recurrence, but I don’t have a number I trust. What I’d do: run a microbenchmark with the ShareGPT distribution of sequence lengths, measure tokens/sec at batch sizes 1, 8, and 32, and compare to an equivalent Transformer. That’s about two days of work and it gives a directly comparable number.”
This move signals calibration. You don’t know the answer, but you know how to get the answer. That is the actual senior skill.
Move 2: Name the tradeoff explicitly.
If you’re not sure which of two approaches is correct, name the tradeoff: “I genuinely don’t know whether DiskANN or FAISS+HNSW is better for this workload — it depends on the index size and the recall/latency requirement. At under 200 GB, HNSW is probably better. Above 200 GB, DiskANN is probably better because it avoids OOM. I’d benchmark both and pick based on the p99 recall at the target QPS.”
This is not hiding the gap. It is showing that you understand why the question is hard. That is a better signal than a confident wrong answer.
Move 3: Pick a direction and commit.
Sometimes the right move is just to decide. “I don’t have enough information to know which is better, so I’m going to assume HNSW and proceed. If that turns out to be wrong — if the corpus is too large for RAM — I’ll revisit.” Committing to a direction and making it explicit that it is a commitment under uncertainty is better than stalling. Engineers make provisional decisions under uncertainty constantly. Showing you can do this is a senior signal.
Move 4: Reference a similar system you do know.
“I’ve never shipped a recommendation system at TikTok scale, but I’ve worked on recommendation systems at roughly 5M DAU, and the architecture we used was a two-tower retrieval followed by a cross-encoder reranker. I’d expect the same architecture to hold at 10× scale — the retrieval index gets distributed and the reranker gets batched more aggressively. The main thing I’d want to understand before committing is the latency budget for the reranker at this scale.”
Explicit analogical reasoning is a strong senior signal. It shows you can transfer knowledge from familiar ground to unfamiliar ground rather than claiming either omniscience or total ignorance.
Move 5: Ask the interviewer a targeted clarifying question.
“Before I answer that, can I ask a clarifying question? Is the constraint here primarily memory, or primarily latency? The answer changes which direction I’d go.”
This is a controlled use of a question to buy time while also producing useful information. It works best for genuinely underspecified questions. Do not use it as a stalling tactic — interviewers recognize that pattern. Use it when the question genuinely has two valid answers depending on a constraint you don’t have.
The sixth move that is never acceptable: fake knowledge. Never fabricate a number, never pretend you’ve used a system you haven’t, never claim a fact you’re uncertain about as if it were settled. The interviewer either knows the right answer (likely — they wrote the question) or will follow up with a question that exposes the fabrication. Either way, you lose. The five moves above cover every scenario where you’d be tempted to fabricate. Use one of them instead.
129.7 Reading the interviewer’s silence
Silence is the most underrated element of a live interview. Most candidates treat it as dead air and rush to fill it. Senior candidates read it.
There are three distinct silences in an interview, and they require different responses.
The thinking silence. The interviewer is processing what you just said. They’re writing a note, they’re formulating a follow-up question, or they’re just giving you the space to keep talking if you want to. This silence lasts three to seven seconds and usually ends with a question or a nod.
The right response: wait. Give the silence three full seconds before you add anything. If you add too quickly, you interrupt the interviewer’s processing. If you add nothing, the natural rhythm carries you forward to their next question.
The wrong response: immediately restating what you just said in different words. This is a nervous habit that wastes time and signals anxiety.
The bored silence. The interviewer has stopped engaging. Their responses are shorter. Their questions are perfunctory. They’re not pushing back on your design, not following up on your technical claims, not redirecting. This silence is diagnostic: you are either going in a direction they don’t find interesting, or you are narrating instead of reasoning, or you are at the wrong depth for the question.
The right response: change something. If you’ve been narrating, draw something. If you’ve been describing a component at a high level, go one level deeper. If you’ve been talking for five minutes without a check-in, pause and ask: “I’ve been going deep on the retrieval layer — is that the direction you want, or would you rather I move to the ranking layer?” A check-in question breaks the bored silence and gives the interviewer a chance to redirect.
The wrong response: interpreting the silence as approval and continuing unchanged. Bored silence rarely resolves itself. You have to change the input.
The disagreeing silence. The interviewer has heard something they think is wrong. The silence is thoughtful — they’re deciding whether to challenge you or let you continue. It often comes after a technical claim, and it has a different texture than bored silence: there is engagement in it, not absence.
The right response: notice it. “I want to check — does that reasoning land for you, or is there something wrong with it?” This gives the interviewer permission to express the disagreement directly, which is almost always better than letting it fester into a debrief note. If there is a disagreement and you surface it, you can engage it. If it stays implicit, it becomes “didn’t handle pushback well” in the write-up.
The wrong response: ignoring it and pressing on. A senior candidate who can’t read social signals is a liability in cross-functional work. The interviewer is checking whether you can.
Practice reading silence in mock interviews. Ask your mock interviewer to use long silences deliberately and vary them across the three types. The ability to distinguish them is a real skill that improves with practice.
129.8 The Q&A at the end
The Q&A at the end of every interview is part of the evaluation. Chapter 123 makes this point for the behavioral interview; it applies just as much to technical interviews.
The interviewer writes their score between when you leave and when they submit the debrief. The last thing they experienced in the session was your closing questions. End well.
Questions that signal seniority:
“What’s the dominant bottleneck in the system your team maintains right now — the thing that wakes up the on-call most often?” This signals that you care about real operational experience, not org-chart position. And it gets you real information you can use to calibrate the role.
“What’s the biggest architectural decision the team has made in the last 12 months that it would reverse if it could?” A question about regret gets at honesty and at what the team has learned. Teams that can’t identify any regrets are either perfect (unlikely) or incurious about their own mistakes (concerning).
“What does success look like for the person who takes this role in the first 12 months? Not the official JD version — the actual version that would make you say, ‘that hire worked out.’” This is the most valuable question you can ask. It gives you the real measurement criteria and it gives the interviewer a chance to describe what they actually need.
“What keeps you at this company versus going somewhere else?” A genuine question about what the interviewer finds compelling about the team. It’s personal and direct and generates real signal about whether the culture is good.
Questions that signal junior:
“What’s the tech stack?” You have the job description. If the tech stack was not in it, a brief check of the team’s engineering blog would have surfaced it. Asking this signals that you either didn’t do the research or are reaching for something to say.
“Is there a lot of room for growth here?” Unanswerable and generic. If you’re interested in growth, ask the specific version: “Has anyone on this team been promoted in the last year? What was the work that drove it?”
“What are the hours like?” The right question to ask a recruiter. Asking an interviewer about hours signals that you’re optimizing for comfort rather than contribution.
No questions at all. Always ask at least one question. Silence at the end of an interview signals that you’ve already mentally checked out.
129.9 The “why this company” question
This question appears in approximately 80% of behavioral interviews and some technical ones. It is routinely answered poorly, usually in one of two ways: generic enthusiasm (“I’ve always admired this company’s culture and engineering excellence”) or obvious sycophancy (“You’re the leader in AI infrastructure and every engineer wants to work here”).
Neither of these is wrong exactly. Both signal that you haven’t thought hard about the question.
The strong answer has three components:
Component 1: A specific technical reason. “The team is working on disaggregated prefill-decode at scale, and that’s a problem I’ve been thinking about from the application side — I’d like to be closer to the infrastructure layer to understand where the real bottlenecks are.” You know what the team is actually working on. You have a specific opinion about it. You have a reason why this particular problem is interesting to you at this particular time.
Component 2: A specific timing reason. “I’ve spent the last three years building on top of inference infrastructure. I think the next three years are more interesting one level down, closer to the hardware abstractions.” Why now? What has changed about your career goals or your perspective that makes this role the right move at this point?
Component 3: Something honest. “And honestly, the team’s reputation for technical depth is part of it — I want to be in an environment where the engineering bar is above mine, and the research and engineering output I’ve seen from this team suggests it is.” This is the part candidates usually only say generically. A specific, honest statement about what you’re looking for in the people around you is not sycophancy — it is a real answer.
What to avoid: “I’m really passionate about AI” (every candidate is), “this company is a leader in X” (obvious), “I love the mission” (generic unless followed by a specific example of how the mission intersects with your work). Lead with the technical reason. The culture and mission can follow, but only if they’re specific.
The test: would this answer make sense for a different company in the same space? If yes, it’s too generic. If no — if the answer is specific to this team’s work, this problem, this moment — it’s the right answer.
129.10 After the interview
The interview is over. You have two to four hours before your memory of the details degrades substantially. Use them.
Write down immediately:
- The design questions you were asked.
- The specific moments where you were uncertain or wrong.
- The technical questions you couldn’t answer well — the specific thing you didn’t know.
- The behavioral stories you told and which follow-up questions they generated.
- Anything the interviewer said that seemed like signal — a technical claim they volunteered, a question they repeated, a component they pushed back on.
This is not about self-criticism. It is about building the golden set for your next interview. Every interview is data about the current state of the question bank, the depth expected at this company, and the specific gaps in your preparation. If you don’t write it down within four hours, the specifics will be gone by morning.
The self-debrief. After writing, spend fifteen minutes scoring yourself on the five dimensions from Chapter 114 §114.1: judgment under ambiguity, numerical fluency, tradeoff awareness, operational realism, recovery and self-correction. Not to beat yourself up — to identify which dimension was weakest and focus the next preparation cycle on that.
This is the blameless postmortem model (Chapter 122, phrase 37) applied to your own performance. The goal is system improvement, not self-assessment.
When to reach out to the recruiter. If the company’s timeline is longer than a week and you haven’t heard anything, a brief, professional check-in after seven to ten business days is appropriate: “Hi [name], I wanted to follow up on my interview on [date]. I remain very interested and would appreciate any update on the timeline.” One email. Not two.
If the recruiter contacts you, respond within 24 hours. If you have competing offers with overlapping deadlines, tell the recruiter immediately — they would rather know than have you go dark.
The outcome. If you get an offer: congratulations. Read the offer carefully, understand the equity structure, negotiate if you have grounds, and make the decision deliberately.
If you don’t get an offer: ask for feedback. Most companies won’t give specific feedback for legal reasons, but some will. If you can get any signal — “the loop was weak on systems design” or “we were looking for more seniority in the technical questions” — write it down and add it to the next preparation cycle.
One rejection from one company tells you nothing about you. Five rejections with a consistent pattern tells you something specific. Track the pattern.
129.11 Red flags you should respond to
The interview experience is also data about the company. Not everything unusual is a red flag — interviews are chaotic and humans are fallible. But some patterns are worth paying attention to.
The interviewer is ten or more minutes late, with no acknowledgment. A single late interviewer in a five-person loop is probably a one-off. An interviewer who shows up twelve minutes late and doesn’t acknowledge it or apologize signals that the company’s culture around other people’s time may be careless. This is more informative about the day-to-day working culture than anything the interviewer says about culture.
How to respond: don’t make it weird. Proceed with professionalism. Note it as a data point. Ask about team culture in your closing question.
The question is ambiguous on purpose with no acknowledgment that it’s ambiguous. There is a difference between a deliberately under-specified question (which is a standard and valid interview technique — you’re supposed to clarify) and a question that was just not thought through. You can usually tell the difference by how the interviewer responds to your clarifying questions: a deliberately under-specified question leads to clean answers; a poorly constructed question leads to vague, inconsistent, or contradictory answers.
How to respond: if the interviewer is contradicting themselves on the requirements, say gently: “I want to make sure I’m working from a consistent set of requirements. Earlier I thought the latency target was 100ms, but now it sounds like 500ms — can we settle on one?” If the contradiction continues, pick the more interesting constraint and proceed.
The interviewer interrupts constantly, not to ask a question but to insert their own design. A small number of interviewers use interviews as opportunities to show off their own technical knowledge. They interrupt you mid-sentence to say “actually, I would have done it this way…” and then describe their preferred design. This is bad interview technique. More importantly, it is diagnostic of a person who will be difficult to work with in design reviews.
How to respond: engage the interruption as if it were a question. “That’s interesting — you’d use [X] here. Can you say more about the advantage over [Y]?” This shows technical engagement and redirects the interviewer back to evaluating you rather than presenting to you. Note this as a data point about the person and the team.
The interviewer seems checked out. They’re not writing notes, they’re barely making eye contact, their questions are minimal, they seem to be elsewhere mentally. This is the most common red flag and the hardest to interpret. It might mean your answers are so strong that there’s nothing to write. It might mean they’ve already made a negative decision. It might mean they had a bad morning. It might mean they do this at every interview.
How to respond: a check-in question. “I want to make sure I’m going in the direction you find most interesting — do you want me to keep going on the retrieval layer, or would you rather I move to the ranker?” This gives the checked-out interviewer an on-ramp back into the conversation. If they return, proceed. If they don’t, proceed anyway and give the best interview you can. The other interviewers in the loop will be part of the debrief.
The interviewer asks questions that reveal very little understanding of the domain. If you are interviewing for an ML infrastructure role and the interviewer asks you questions that are clearly generic software engineering questions without any ML specificity — and they seem unfamiliar with the terminology you’re using — this might mean you’re talking to a generalist SWE who was added to the loop without domain expertise. This is not necessarily bad, but it means the interviewer cannot accurately evaluate the depth of your ML systems knowledge.
How to respond: meet the interviewer where they are. If they’re a strong generalist SWE, treat the interview as a systems design interview and lean on the distributed systems vocabulary. Do not make the person feel inadequate for not knowing the ML-specific terms. And note it: this company may not be staffing their ML interview loop with ML engineers, which tells you something about the maturity of the ML practice.
The mental model
Eight points to carry out of this book:
- The night before is for consolidation, not new learning. Review your numerical kit, your behavioral bullet points, and the five-phase framework. Then sleep.
- The first five minutes are disproportionately high-signal. Reflect what you heard, clarify explicitly, write down the requirements, ratify them. Never start designing before you have ratified requirements.
- The 45-minute budget is fixed. Announce your allocation plan in the first sixty seconds. The drill starts at minute 25. If it starts at minute 35, the session is already compromised.
- Always whiteboard components and data flow. Never whiteboard prose. The goal is to be a designer, not a narrator.
- Three estimations cover 80% of what you’ll need: GPU throughput, KV cache per user, QPS from DAU. Know them cold.
- When you don’t know, use one of the five recovery moves. Never fabricate. Name what you’d measure, name the tradeoff, commit to a direction, reference an analogy, or ask a clarifying question. All five are better than silence or invention.
- Read the silence. Three silences, three responses: thinking (wait), bored (change direction), disagreeing (surface it).
- The Q&A and the debrief window are part of the evaluation. Ask one real question at the end. Write down what happened within four hours. Do the blameless postmortem.
Those are the eight bullet points of Chapter 129. The interview is tomorrow. You are ready.
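The three canonical estimations in the fifth point can be rehearsed as quick arithmetic. The sketch below uses illustrative numbers, not the canonical values from §129.5 — the DAU, model shape, and hardware bandwidth figures are all assumptions chosen to make the mechanics visible:

```python
# Back-of-envelope sketch of the three canonical estimations.
# Every input number below is an illustrative assumption.

# 1. QPS from DAU: daily active users -> peak queries per second.
dau = 10_000_000             # assumed daily active users
requests_per_user = 5        # assumed requests per user per day
seconds_per_day = 86_400
avg_qps = dau * requests_per_user / seconds_per_day
peak_qps = avg_qps * 3       # assumed 3x peak-to-average ratio

# 2. KV cache per user: memory for one LLM session's attention cache.
#    2 (K and V) * layers * kv_heads * head_dim * bytes * tokens
layers, kv_heads, head_dim = 32, 8, 128   # assumed model shape
bytes_per_value = 2                        # fp16
context_tokens = 4_096                     # assumed session context
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens
kv_gb = kv_bytes / 1e9

# 3. GPU throughput: decode tokens/s when memory-bandwidth-bound
#    (every token reads all weights once at batch size 1).
model_params = 7e9            # assumed 7B-parameter model
bytes_per_param = 2           # fp16 weights
hbm_bandwidth = 2e12          # assumed ~2 TB/s HBM bandwidth
tokens_per_sec = hbm_bandwidth / (model_params * bytes_per_param)

print(f"avg QPS ~{avg_qps:.0f}, peak QPS ~{peak_qps:.0f}")
print(f"KV cache per 4k-token session ~{kv_gb:.2f} GB")
print(f"memory-bound decode ~{tokens_per_sec:.0f} tokens/s")
```

The point of drilling these is not the specific outputs but the shapes: DAU-to-QPS always divides by 86,400 and multiplies by a peak factor; KV cache always scales linearly in layers, heads, head dimension, and context; single-stream decode throughput is bandwidth divided by model bytes.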
Read it yourself
- Jeff Atwood, “We Hire the Best, Just Like Everyone Else” — on the gap between interview performance and job performance, and why process matters.
- Gayle McDowell, Cracking the Coding Interview, chapter on the soft skills of interviews — the tactical framework applies equally to system design loops.
- Kahneman and Klein, “Conditions for Intuitive Expertise: A Failure to Disagree” (2009) — on why calibrated confidence under uncertainty is a learnable skill, not a trait.
- The Google SRE book, chapter on postmortems — for the mental model behind the post-interview self-debrief.
- Patrick McKenzie (patio11) on salary negotiation — read the archived posts. The tactical advice is dated; the mental model is permanently valid.
Practice
- Do one full mock interview — 45 minutes, full five phases — and record the audio. Time each phase. Note which phase ran over and why.
- Rehearse the three canonical estimations (§129.5) out loud, from memory, until you can do each in under sixty seconds.
- Practice the five recovery moves (§129.6) by having a mock interviewer ask you a question you genuinely don’t know the answer to, for five rounds in a row. The constraint: you must use a different move each round.
- In your next mock interview, deliberately pause three times and check in with the interviewer rather than continuing. Note their response to each check-in.
- Write five closing questions for your target company’s ML platform team, none of which appear on the “signals junior” list from §129.8.
- Write down the specific numbers from §129.5 on a piece of paper, from memory. Check against the text. Repeat until they are exact.
- Stretch: Write a pre-mortem before your next real interview: three things most likely to go wrong, the specific recovery move for each, and what “good enough” looks like for each phase. Read it the night before. Put it away before you walk in.