Part X · ML System Design Interview Playbook
Chapter 128

Behavioral interviews and the levels ladder

"The technical interview proves you can think. The behavioral interview proves you can be trusted with the scope."

Every chapter in this book up to this point has been about technical substance — architecture, estimation, tradeoffs, ops vocabulary. This chapter is about the other half of the loop. Behavioral questions make up 30–50% of a senior interview, and most ML engineers under-invest in them to the point of self-sabotage. They spend four weeks memorizing KV cache internals and ninety minutes on behavioral prep, then get passed over at the debrief because the interviewers couldn’t tell what level they actually operate at.

Behavioral interviews are not soft. They are a structured signal extraction exercise, and they are just as adversarially designed as any system design question. The interviewer is trying to determine, in 45 minutes, whether your instincts and your judgment match the scope of the role they’re hiring for. Every story you tell is evidence for a level claim, and the interviewers are adding up that evidence the whole time.

This chapter is the minimum you need. Not a complete guide to every company’s rubric, not a philosophy of authentic storytelling, not a therapist’s take on professional narrative. It is: what interviewers are grading, what each level expects, how to structure your answers, ten stories you must have ready, and the specific language that separates a senior from a mid-level answer.

Outline:

  1. Why the behavioral interview exists.
  2. The levels ladder: L3 through L7 in one page.
  3. The STAR framework and why candidates butcher it.
  4. The ten canonical stories you must have ready.
  5. The bar-raising question: “What would you do differently?”
  6. What L6+ candidates say that L4 candidates don’t.
  7. Red flags interviewers tag silently.
  8. The behavioral interview as a two-way conversation.

128.1 Why the behavioral interview exists

The premise behind behavioral interviewing is that past behavior is the best available predictor of future behavior. Not what you say you’d do — what you’ve actually done. An interviewer can write a hard technical question and observe your reasoning in real time. Behavioral questions do something different: they ask you to retrieve evidence from your own history, and the retrieval itself is data.

The interviewer is solving a problem: they need to know whether you will operate at the right level when you join the team. Technical fluency is necessary but not sufficient. A very strong L4 might have the same technical depth as a weak L6, but the L6 will define the problem, set the quality bar for the team, resolve a cross-team conflict without escalating, and build something that outlasts their tenure. That difference is not visible in a system design discussion. It is visible in the behavioral interview.

The concrete things behavioral interviewers score:

Scope signal. What was the radius of the work? Did you own a component, a system, a platform, a strategic direction? A one-person bug fix tells one story; a redesign of a data pipeline that 12 teams depended on tells another.

Ambiguity signal. Did you receive a clear spec and execute it, or did you identify the problem, define the requirements, and then execute? The higher the level, the more the interviewer wants to see you running toward the fog, not away from it.

Impact signal. Did the thing you built matter? Can you quantify it? “We shipped a feature” is not impact. “We reduced p99 serving latency by 40%, which unblocked three product launches and reduced infrastructure cost by $2M/year” is impact.

Leadership signal. Did you shape other people’s work? Did you change the direction of something bigger than yourself? Did you mentor someone visibly? Did you push back on a product direction you thought was wrong, and did that push-back matter?

Learning signal. What do you do when things go wrong? Interviewers listen for specific language: “I was wrong about X, and what I changed was Y.” That sentence is worth more than ten stories about success.

The behavioral interview is not a character test. It is a scope-and-judgment test. Every interviewer who runs one is trying to answer the question: if I extrapolate this person’s past behavior into the future, what level does the data support? Your job is to make that extrapolation obvious, concrete, and unambiguous.

128.2 The levels ladder: L3 through L7 in one page

Different companies use different numbering schemes — some have E3/E4/E5, some have IC1 through IC6, some use SDE I / II / Senior / Staff / Principal. The underlying behavioral expectations are the same everywhere and map onto a five-point ladder that is roughly consistent across the industry. This section uses L3–L7 as convenient labels.

The five dimensions interviewers grade behavioral stories against:

Levels ladder comparison across L3 through L7 on five dimensions: scope, ambiguity, cross-team influence, quality bar, and mentorship.

  • Scope (what you own). L3: component or feature. L4: system or service area. L5: cross-service platform area. L6: multi-team org-wide platform. L7: company-level technical strategy.
  • Ambiguity (problem definition). L3: clear spec given to you. L4: some definition required. L5: define problem and solution. L6: define the problem space. L7: define which problems matter.
  • Influence (without authority). L3: within team only. L4: adjacent-team collaboration. L5: drives cross-team alignment. L6: shapes org technical direction. L7: external and industry influence.
  • Quality bar (standard setting). L3: follows team standards. L4: contributes to standards. L5: sets quality bar for the team. L6: sets bar for the org. L7: sets bar for the industry.
  • Mentorship (developing others). L3: mentored, not mentoring. L4: helps peers informally. L5: actively grows junior engineers. L6: multiplies other senior engineers. L7: shapes hiring and career ladders.
  • Typical titles. L3: SWE I / MLE I. L4: SWE II / MLE II. L5: Senior. L6: Staff. L7: Principal / Sr Staff.
L5 (Senior) is the inflection point — below it, scope is handed to you; at and above it, scope is claimed by you. Every behavioral story should be placed on this ladder before you tell it.

The critical insight from this table: the difference between L4 and L5 is not technical depth. It is the transition from “I was given a well-defined problem and solved it” to “I identified a problem, defined it, and drove the solution.” An L4 story told in an L5 interview is not just under-leveled — it actively undermines the candidate’s case.

The other critical insight: L6 and L7 are multipliers, not individual contributors. An L6 story is almost always about how you changed the direction of other engineers’ work, not just your own. The L7 story is about changing what the organization believes is worth doing.

Map your ten prepared stories to this ladder before you walk into any interview. Know which level each story demonstrates. Bring mostly L5 stories to a senior interview; bring mostly L6 stories to a staff interview. Bringing the wrong level of stories to the wrong interview is the most common behavioral failure mode, and nobody ever tells the candidate that’s why they didn’t get the offer.

128.3 The STAR framework and why candidates butcher it

STAR stands for Situation, Task, Action, Result. Most people know this. Most people still butcher it.

The structure:

  • Situation: What was the context? One to two sentences. No more.
  • Task: What specifically was your responsibility? One sentence.
  • Action: What did you do? This is the long part — 60–70% of the answer.
  • Result: What happened? Quantified.

The butchering happens in four places.

Problem 1: The Situation takes too long. A candidate spends three minutes explaining the company background, the team structure, the product roadmap, and the business context before they get to the actual story. The interviewer does not care. They need enough context to understand why the decision mattered, not a quarterly business review. Cap the situation at thirty seconds.

Problem 2: The Action is “we.” “We re-architected the pipeline. We reduced latency by 40%. We got buy-in from leadership.” Every “we” is a missed opportunity to demonstrate what you specifically did. The interviewer is evaluating you, not your team. If the team re-architected the pipeline, what was your specific contribution? You designed the new partitioning scheme. You wrote the migration plan. You ran the stakeholder review. Those are the sentences that matter.

The rule: first-person ownership throughout the Action section. “I proposed the re-architecture. I designed the new partitioning scheme. I ran the A/B test that confirmed the latency improvement. I wrote the runbook and trained the on-call rotation.” The team helped. Your sentences are about what you drove.

Problem 3: The Result has no numbers. “It made a big impact” is not a result. “We reduced p99 serving latency from 800 ms to 480 ms, which reduced user-facing error rate by 60% and removed the serving cost as a blocker for the mobile launch that shipped two months later.” That is a result. Every story should have at least one concrete number in the Result section. If you genuinely don’t remember the number, say “I don’t have the exact figure memorized but it was approximately X% improvement, which was significant enough to show up in the user-facing metrics.”

Problem 4: The story has no tension. A story where everything went smoothly according to plan is not interesting and is not informative. Good stories have a moment where something went wrong, a decision that was genuinely difficult, a person who disagreed, or a constraint that forced a hard tradeoff. If your story has no tension, you are either misremembering it or you are picking the wrong story.

The anti-pattern that summarizes all four: “We were building a RAG system, and our team worked really hard to optimize the retrieval layer, and we improved recall quite a bit, and leadership was happy with the results.” Every word of that sentence is the wrong shape.

The correct shape: “We had a retrieval recall problem — our RAG system was surfacing relevant documents only 62% of the time, which was causing a 4× human review rate that was unsustainable. I identified that the embedding model was the bottleneck, not the vector search — the ANN was retrieving correctly but the embeddings themselves were semantically off. I proposed and ran a three-week fine-tuning experiment on domain-specific data, which got recall to 81%. I then worked with the product team to define a quality gate that made 80% the ship threshold. We shipped on schedule, the human review rate dropped to 1.2×, and the cost of running the pipeline dropped by roughly $40k/month.”

That is what STAR should sound like.

128.4 The ten canonical stories you must have ready

You cannot prepare answers for every behavioral question. But you can prepare ten core stories that, with slight reframing, cover the full range of questions. The interviewers at every major tech company are drawing from the same question bank. The ten stories below cover at least 90% of what you will be asked.

For each story, have it rehearsed to a length of two to four minutes. Know the numbers. Know the specific things you did. Know what went wrong. Know what you learned.


Story 1: The hardest technical decision you’ve made.

Why it’s asked: This is an L5+ probe. The interviewer wants to see whether you can own a decision under uncertainty, reason through tradeoffs, and commit to a path even when the right answer is not obvious.

Strong answer template: The context: a decision with two or more genuinely valid options, where the right answer depended on assumptions you had to make. Your analysis: what information you gathered, what tradeoffs you identified, how you weighted them. Your decision: what you chose, and why. The outcome: what happened, and whether you were right.

The trap: Candidates describe a decision that was not actually hard — the technically correct answer was obvious in hindsight and the “difficulty” was execution, not judgment. The story has to be a real fork in the road where a reasonable engineer could have chosen differently.

Weak candidate says: “We had to decide whether to use Postgres or DynamoDB. I researched both and recommended DynamoDB because we needed high write throughput. Leadership agreed.”

Strong candidate says: “We had to decide whether to migrate from Postgres to DynamoDB for our session store. The case for migration was real — we were hitting write throughput limits — but migration would have delayed the product launch by three months and disrupted three teams. I was the only one making the case to stay on Postgres and invest in read replicas plus a write-ahead log sharding pattern. I documented both options, ran a load test, and presented the trade: the Postgres path was 6 weeks of work and bought us 18 months, at which point we’d revisit. I was outvoted initially. I asked for 72 hours to run the load test before we committed. The load test showed the Postgres path actually met our peak load target. We stayed on Postgres. The migration was never needed — 18 months later, the product had pivoted and DynamoDB would have been wasted.”


Story 2: Time you disagreed with a senior person.

Why it’s asked: Tells the interviewer whether you have enough judgment and backbone to push back when you’re right. A candidate who has never disagreed with anyone senior is either not being honest or has never had an independent technical opinion.

Strong answer template: The disagreement must be substantive and technical, not interpersonal. The senior person held position X. You held position Y. You were both reasoning in good faith. You articulated your position specifically, you listened to their response, and the outcome was either: you changed your position because of new information, they changed theirs because of your reasoning, or you reached a principled impasse and escalated to data.

The trap: Candidates tell a story where they were obviously right and the senior person was obviously wrong. This signals poor judgment — real disagreements are rarely that asymmetric — and it signals a tendency to perceive seniority as an obstacle rather than a signal.

Weak candidate says: “My manager wanted to use a really outdated approach. I pushed back and was right.”

Strong candidate says: “Our VP of engineering wanted to build our vector search in-house instead of using an external vendor. His reasoning was cost and data privacy. My position was that the engineering time would be better spent on the application layer, where we had differentiated work to do. I acknowledged the cost argument was real — I ran the build-vs-buy analysis and the 3-year TCO of building was indeed lower, by about 15%. But the 2-year cost was higher by 40%, and the team had no ANN expertise, so the first year would be learning time, not shipping time. I presented the analysis, not the opinion. He maintained his position; I asked if we could run a 60-day vendor evaluation while starting the in-house scoping, so we could make the decision with real data. He agreed. The vendor evaluation showed the build timeline was 9 months; the vendor deployment was 3 weeks. We went with the vendor. I was right about the timeline estimate, but I was wrong about the 3-year TCO — in-house would have been cheaper at scale. I said so clearly in the retrospective.”


Story 3: Time you were wrong about something big.

Why it’s asked: Calibration. The most trustworthy engineers are the ones who acknowledge their mistakes cleanly, analyze why they were wrong, and change their behavior. A candidate who has never been wrong about anything big is a risk.

Strong answer template: A genuine mistake, not a failure that was obviously someone else’s fault. What you believed, why you believed it, what the real world showed you, what you changed.

The trap: Candidates describe a mistake that is actually a success story in disguise — they made a small error, caught it themselves, and the outcome was fine. The story has to be one where the mistake had real consequences, and where acknowledging it reflects genuine intellectual honesty, not just a rehearsed line about humility.

Weak candidate says: “I once underestimated how long a project would take. I learned to be more careful with estimates.”

Strong candidate says: “I was convinced our recommendation system’s quality problem was a model problem — we needed a better embedding model. I spent six weeks pursuing that direction, got 3% improvement on offline eval, and our online A/B test showed no signal. It was a distribution shift problem, not a model quality problem. Our prod features were computed on stale data, and the model had never seen the actual serving distribution. I had focused on the part of the problem I understood — model architecture — and ignored the data pipeline signals that should have told me earlier. The six weeks were largely wasted, and we had to start over on the data freshness investigation. I now always run a data quality audit before starting model work. That reflex has saved me twice since.”


Story 4: Project that failed and what you learned.

Why it’s asked: Mirrors Story 3 but at a project scope. The interviewer wants to see whether you can analyze failure at a systems level, not just a personal level.

Strong answer template: A project that failed to meet its goals. The specific reason it failed. What you would do differently. What changed in how you work as a result.

The trap: Candidates describe a project that “failed” but actually shipped a slightly worse product than planned. A real failure is one where the project was cancelled, the impact was zero, or the outcome was actively negative. Smaller-than-expected success is not failure. The interviewer can tell the difference.


Story 5: Time you had to influence without authority.

Why it’s asked: The senior-engineer superpower test. At L5+, most of the work you need to do requires people who don’t report to you to do things differently. Influence without authority is the whole job.

Strong answer template: A situation where you needed an outcome and you had no direct authority to compel it. What you did to build alignment: data, demos, coalition building, one-on-one conversations, writing a doc, making the cost of inaction visible. The outcome.

The trap: Candidates describe using their seniority to tell someone what to do, which is authority, not influence. The story only works if there was genuine resistance, and you had to change minds through reasoning rather than position.

Weak candidate says: “I wanted another team to adopt our new API, so I presented it at an all-hands and they adopted it.”

Strong candidate says: “I needed the platform team to prioritize a caching layer that would have unblocked four teams including mine. The platform team’s roadmap was full and they had three higher-priority asks. I didn’t have authority to change their priorities. What I did: I quantified the cost of absence — the four blocked teams were collectively spending 80 engineering-hours per month on workarounds. I put that number in writing with specifics. I got the four tech leads to co-sign a doc making the joint ask, so the platform team’s manager was hearing it from four leaders, not one. I offered to do the first version of the work myself and hand it to them for ownership, which removed the staffing cost objection. The ask went from ‘no, not this quarter’ to ‘yes, we’ll co-own it’ in three weeks. I scoped the initial implementation, they took it from there.”


Story 6: Time you pushed back on product or leadership.

Why it’s asked: Product direction and engineering feasibility are often in tension. The ability to push back professionally, with data, and to know when to push back versus when to execute on a direction you disagree with is a fundamental senior skill.

Strong answer template: A direction you disagreed with, the specific concern you raised, how you raised it, and what happened. The story should demonstrate both technical judgment and professional tact — you raised the concern clearly but didn’t blow up the relationship.

The trap: Candidates describe either never pushing back (which signals deference over judgment) or winning an argument that turns out to have been a personality conflict dressed up as a technical one.


Story 7: Biggest impact project and how you measured it.

Why it’s asked: This is the impact signal. What does the candidate point to as their most significant contribution, and can they measure it?

Strong answer template: The project, what it did, how you measured its impact (latency, cost, revenue, error rate, user metric), and why this was the biggest thing you worked on. The measurement piece is critical. “We improved the user experience” is not impact. “We reduced checkout latency from 1.8 seconds to 600 ms and checkout conversion improved by 12%, which the team valued at approximately $4M/year in incremental revenue” is impact.

The trap: Candidates pick a project that was technically interesting but had no business or user impact, or they pick a high-impact project they were adjacent to rather than central to.


Story 8: Time you coached or mentored someone.

Why it’s asked: L5+ signal. If you have no stories of developing other engineers, you look like an individual contributor who is not ready for a senior role that requires multiplying others.

Strong answer template: The person’s starting point, the specific skill or behavior you worked on with them, how you worked on it, and the observable outcome. The outcome should be concrete: they were promoted, they shipped something they couldn’t have shipped before, they resolved a type of problem they used to escalate.

The trap: Candidates describe answering questions from junior engineers, which is helpful but is not mentorship. Mentorship is intentional, ongoing, goal-directed. The story has to demonstrate that you thought about the person’s development explicitly, not just that you were available when they had questions.

Weak candidate says: “I was happy to help junior engineers whenever they came to me with questions.”

Strong candidate says: “I worked closely with a junior engineer who was technically strong but was struggling with ambiguous problems — she would wait for a clearer spec before starting. Over three months, I worked with her on three specific projects where I deliberately gave her an ambiguous problem statement, had her define the requirements herself, then reviewed her reasoning rather than her code. In our 1:1s I asked ‘what assumptions did you make and why’ rather than ‘what did you build.’ By the end of the quarter she was defining requirements for the team on one of our projects without prompting. She was promoted the following cycle.”


Story 9: Time you simplified something complex.

Why it’s asked: Simplification is a core engineering skill and a strong signal for L5+. The ability to find the 20% of a system that does 80% of the work, to remove a layer, to make a system easier to reason about — this is often more impactful than adding capability.

Strong answer template: The thing that was complex, why it was complex (accidental complexity or essential complexity), what you did to simplify it, and the outcome in terms of operational burden, velocity, or error rate.

The trap: Candidates describe removing a feature or functionality. Simplification is about reducing complexity while preserving or improving capability, not about cutting scope.


Story 10: Time you dealt with a tight deadline and incomplete information.

Why it’s asked: Production engineering is full of moments where you have to make a call with 60% of the information you’d like. The ability to bound the uncertainty, make a defensible decision, and execute is fundamental.

Strong answer template: The deadline, the information you had, the information you were missing, how you bounded the risk of acting on incomplete information, the decision you made, and what happened. Did you turn out to be right? Did you turn out to be wrong, and how did you handle it?

The trap: Candidates describe making a snap decision that worked out. The interesting part is not the outcome — it’s the reasoning under uncertainty. How did you decide what information to try to get versus what to bound? What was your explicit risk model?


Prepare all ten. Write them out once, in full, in STAR format. Then cut them down to bullet points — just the key facts, numbers, and decisions. Rehearse from the bullet points, not the transcript. You want the story to feel natural, not recited.

128.5 The bar-raising question: “What would you do differently?”

At the end of a strong behavioral story, the interviewer will sometimes ask: “If you could go back and do that project over, what would you do differently?”

This is the single most powerful question in the behavioral interview. It is the question that separates L5 answers from L6 answers more cleanly than almost anything else.

The L4 answer: “Honestly, I think we did pretty well. Maybe I would have communicated more frequently with stakeholders.” Translation: the candidate is not willing to critique their own work substantively, or they lack the perspective to see the real structural problem.

The L5 answer: “I would have run the data quality audit two weeks earlier. I spent six weeks optimizing the model when the real bottleneck was the feature freshness. Earlier diagnosis would have saved that time.” This is specific, it identifies a real failure, and it is phrased as a structural change, not a vague intention.

The L6 answer: “I would have scoped the project differently from the start. I defined success as ‘model quality improvement,’ which led the team down the model optimization path. The right framing was ‘user-facing metric improvement,’ which would have surfaced the data freshness problem in the first week. The lesson I took from that wasn’t ‘run audits earlier’ — it was ‘define the success metric before picking the technical approach.’ I’ve changed how I scope all my projects since then.” The L6 answer diagnoses the structural failure in the problem definition, not just the execution. It shows the candidate can reason about their own reasoning process.

The distinction: L5 identifies what to do differently. L6 identifies why they made the original mistake and what they changed in their mental model as a result.

Prepare this follow-up for every one of your ten stories. It is not optional. Every story should have a clear, specific, intellectually honest answer to “what would you do differently?” If the answer is “nothing,” you are either misremembering the project or you have not reflected on it hard enough to answer at the level the role requires.

128.6 What L6+ candidates say that L4 candidates don’t

The following phrases appear regularly in L6 answers and almost never in L4 answers. This is not a vocabulary trick — you should only use a phrase if you have a real example behind it. But if you have the experience and you are not using the language, you are underselling the level.

“I scoped it to…” Scoping is an active choice, and naming it explicitly signals that you made a deliberate decision about what was in and out of bounds. “I scoped the initial work to the serving layer only, because that was the highest-leverage piece and the team could execute it without dependencies. The training-side improvements were important but we deferred them.” An L4 executes the given scope. An L6 names the scope they chose and why.

“The counterfactual was…” “The counterfactual to our caching investment was paying approximately $800k more per year in compute and having higher tail latency. That framing made the caching work easy to prioritize.” L6 candidates think in counterfactuals. They can articulate what would have happened if they hadn’t done the work.

“The durable change was…” “The immediate impact was a 40% latency reduction. The durable change was that the team now has a culture of measuring before optimizing — that behavioral change has produced three more improvements in the 18 months since.” L6+ candidates distinguish between the output of a project and the lasting change to the system, the team, or the process that the project left behind.

“I delegated the [X] to [engineer] because…” Naming delegation explicitly signals that you were running a team-level effort, not just an individual effort. “I designed the architecture and led the recovery analysis, but I delegated the implementation of the migration tooling to two junior engineers because it was a strong growth opportunity for them and didn’t require my involvement.”

“I changed my prior on X when…” Calibration language. The willingness to name a belief you held and then updated is an L6 signal. “I came in with a prior that the model quality was the bottleneck. I updated that prior when I saw the serving latency data — the model was fine, the infrastructure was not.”

“The leverage was…” “There were ten things I could have worked on. The reason I chose this one is that it was the highest-leverage point — fixing it unblocked three downstream teams and reduced our incident rate by two-thirds.” L6 candidates explain their prioritization decisions, not just their execution decisions.

“I invested in making this repeatable because…” “Rather than doing the migration manually, I invested two extra weeks in building a migration framework that the team could use for the next three migrations. That turned out to be right — we’ve run the framework four times since and saved roughly 40 team-weeks.”

These phrases are common in L6 answers because they reflect the actual way senior engineers think. If you have examples that fit these framings and you are not using this language, your behavioral stories are under-representing your level.

128.7 Red flags interviewers tag silently

Interviewers rarely tell candidates which signals landed badly. The feedback in rejections is almost always vague — “we’re looking for someone with more experience at this scope” or “the loop wasn’t strong enough at the senior level.” But the actual signals that drove that conclusion are specific, and they show up repeatedly. These are the ones that appear most often in debrief notes.

Passive voice throughout the Action section. “The system was redesigned. The data pipeline was migrated. The performance issues were resolved.” Passive voice is a tell for credit diffusion. The interviewer notices immediately that the candidate is not claiming ownership of the actions. Always active, always first-person.

“We” everywhere in the Action section. Covered in §128.3. The team did the work; your story is about what you specifically did.

No numbers in the Result section. If the impact was real, there are numbers somewhere. Find them before the interview. “Improved significantly” tells the interviewer you either don’t know the result or you’re uncomfortable claiming it.

No failures, no mistakes, no disagreements. A career with no failures, no bad decisions, and no pushback from colleagues is a career that never happened. Interviewers who hear ten positive stories with no tension become suspicious that the candidate is either very junior or very selective in the truth. The absence of failure stories is itself a negative signal.

Blaming others for failures. “The project failed because the product manager kept changing requirements.” Maybe true. But the interesting question is: what did you do when the requirements changed? How did you manage the instability? The blame attribution signals that the candidate did not think of the failure as something they could have influenced.

Scope creep in the Situation section. The candidate spends five minutes setting context and the interviewer still doesn’t know what the story is about. Senior candidates front-load: one sentence of context, then the decision or action.

No “what I learned.” A story with no explicit learning is a story the candidate has not fully processed. Every significant professional experience should have a durable lesson. If the lesson is “I confirmed that my original instinct was right,” that is not a lesson.

Interviewers having to ask follow-up questions for basic facts. If the candidate doesn’t know the rough scale of their own project (how many users, how much cost, what the team size was), it reads as either low ownership or poor preparation.

Stories that are too recent or too old. Extremely recent stories (from the last three months) are often too raw to reflect on with any perspective. Very old stories (five-plus years ago) signal that the candidate’s most compelling work is behind them. A mix of recent and mature stories is best.

128.8 The behavioral interview as a two-way conversation

The behavioral interview is not an interrogation. It is a conversation, and conversations run in both directions. The questions you ask at the end of a behavioral interview — or at natural pauses in the conversation — are part of the evaluation.

The interviewers are watching whether you have thought carefully about what you are signing up for. Junior candidates ask about salary, benefits, and perks. Mid-level candidates ask about the tech stack. Senior candidates ask about the work itself, the team’s current constraints, and the growth opportunities.

Questions that signal seniority:

On team health and technical debt. “What is the biggest piece of technical debt the team is carrying, and what’s the plan for it?” This signals that you think about long-term system health, and it also surfaces real information about the team’s actual state.

On the scope of the role. “Where would you expect me to be spending most of my time in the first six months — individual contribution, cross-team coordination, or something else?” This signals that you think carefully about role definition and that you understand different work modes.

On the hardest problem. “What’s the hardest technical problem the team is facing right now that hasn’t been solved?” A question about challenge signals that you want interesting work, not an easy landing.

On growth path. “What does the growth path from this role look like? Have there been engineers who came in at this level and moved to the next in a reasonable timeframe?” This signals that you think about your own development and that you take promotion trajectories seriously.

On failure. “What was the most significant incident the team has had in the last year, and what changed as a result?” This signals operational maturity and a willingness to engage with failure as a learning mechanism.

Questions that signal junior:

  • “What’s the work-life balance like?” (Signals that you’re optimizing for comfort over contribution.)
  • “What’s the tech stack?” (Read the job description. Asking this signals you haven’t done basic research.)
  • “When will I hear back?” (This is a recruiter question, not an interviewer question. Ask the recruiter.)
  • Nothing. Silence at the end of a behavioral interview signals disengagement.

The end of the interview is not a formality. It is the last data point the interviewer collects. Use it.


The mental model

Seven points to carry into Chapter 129:

  1. Behavioral interviews grade scope, ambiguity, impact, leadership, and learning — not personality. Every answer is evidence for a level claim.

  2. Map your stories to the levels ladder before you walk in. A great L4 story in an L5 interview costs you the offer.

  3. STAR is the structure; first-person ownership is the obligation. Every action sentence should say “I” not “we.”

  4. Ten canonical stories cover 90% of what you’ll be asked. Prepare them all, write them down, rehearse from bullet points.

  5. “What would you do differently?” is the most important question. Prepare a specific, structural answer for every story.

  6. L6 language is not vocabulary — it is a way of thinking about scope, counterfactuals, delegation, and leverage. Use it only when you have the substance behind it.

  7. The questions you ask are part of the evaluation. Ask about the hardest technical problem, the technical debt, or the scope of the role — not about the perks.

Chapter 129 is the day itself — what to do the night before, how to run the first five minutes, and the tactical moves that carry you from “prepared” to “hired.”


Read it yourself

  • Laszlo Bock, Work Rules!. Google’s former SVP of People Operations on what structured behavioral interviews actually measure and why they predict performance.
  • Geoff Smart and Randy Street, Who. The hiring methodology that most systematic behavioral interviewers are drawing from, whether they know it or not.
  • The Amazon Leadership Principles — not because Amazon is special, but because they are the most explicit public articulation of what behavioral interviewers are grading for at every major tech company.
  • Gayle Laakmann McDowell and Jackie Bavaro, Cracking the PM Interview — read the behavioral sections. The same framework applies to engineering; the questions are nearly identical.
  • The “Levels.fyi” interview section — read recent interview reports for any company you’re targeting. The behavioral questions that appear there are the real questions.

Practice

  1. Write out all ten stories from §128.4 in full STAR format. This will take two to three hours. Do it.
  2. For each story, write a specific answer to “what would you do differently?” before your next interview.
  3. Record yourself telling Story 7 (biggest impact project). Watch it back. Count the number of times you say “we” when you mean “I.” Count the number of times you quantify the result. Target: zero “we” in the Action section, two or more numbers in the Result.
  4. Map each of your ten stories to the levels ladder from §128.2. Label each story with the level it demonstrates. Ensure you have at least three stories at L5 or above.
  5. Give your ten stories to a senior colleague and ask them: “What level does each of these stories demonstrate to you?” The answer may be lower than you expect.
  6. List five questions you would ask at the end of a behavioral interview for your target role. Make sure none of them are on the “signals junior” list.
  7. Stretch: Run a full mock behavioral interview with a senior engineer in your network. Ask them to push back on at least one story by saying “that sounds like a team effort — what specifically did you do?” and “if you could do it over, what would you change?” Practice the responses until they are automatic.
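For exercise 3, a rough script can do the counting for you once you have a transcript of your recording. A minimal sketch: `star_story_stats` and its pronoun lists are my own illustration, not from this chapter, and the number-detection regex is a crude heuristic, not a real measure of quantified impact.

```python
import re

def star_story_stats(transcript: str) -> dict:
    """Rough self-review stats for a STAR story transcript:
    counts first-person-singular vs first-person-plural pronouns,
    and the numeric claims that could anchor a Result section."""
    words = re.findall(r"[A-Za-z']+", transcript.lower())
    we_forms = {"we", "our", "ours", "us"}
    i_forms = {"i", "my", "mine", "me"}
    # Standalone numbers (optionally with commas, decimals, or a %),
    # plus a few scaling words that imply a quantified result.
    numbers = re.findall(
        r"\b\d[\d,.]*%?|\b(?:doubled|halved|tripled)\b",
        transcript.lower(),
    )
    return {
        "we_count": sum(w in we_forms for w in words),
        "i_count": sum(w in i_forms for w in words),
        "quantified_claims": len(numbers),
    }

story = ("We redesigned the ranking service. I profiled the feature store, "
         "I cut p99 latency from 480 ms to 120 ms, and we shipped it to "
         "40 million users.")
print(star_story_stats(story))
```

Run it on your Action section alone and your Result section alone: the chapter’s target is zero “we” in the former and two or more numbers in the latter. Treat the output as a prompt for re-recording, not a grade.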