Part IX · Build, Deploy, Operate
Chapter 101 · ~21 min read

Build systems for monorepos: Bazel, Pants, Buck, Nx

"npm can build your app. It cannot build your company."

Part IX is the build-and-operate half of the book. The preceding parts handled what the system does (ML, serving, retrieval, agents, observability). This part handles how the system gets to production — the compile, package, deploy, and operate loop that a platform team owns. It opens with the first problem every growing codebase hits: the language’s native build tool collapses under the weight of a monorepo, and a proper build system has to take over.

By the end of this chapter the reader can explain why Bazel, Pants, Buck, and Nx exist, what hermeticity and remote caching buy, how a BUILD file models a dependency graph, and what the “distroless static + CGO-off” rule for Go binaries actually means. The concepts here are prerequisites for Chapters 102 (containers), 105 (GitOps), and 106 (OCI lifecycle) — the build system is what produces the artifacts those chapters consume.

Outline:

  1. Why native build tools break at monorepo scale.
  2. What hermeticity actually means.
  3. Remote caching and the dependency graph as an artifact.
  4. Incremental builds and the content-addressable store.
  5. Bazel and the BUILD file model.
  6. Pants and Buck as alternatives.
  7. Nx for JavaScript monorepos.
  8. Cross-language builds — Go, Python, Java, Rust, containers.
  9. The distroless static + CGO-off rule for Go.
  10. When a build system is not worth the cost.
  11. The mental model.

101.1 Why native build tools break at monorepo scale

Every language ships with a build tool. npm for Node, pip for Python, go build for Go, mvn/gradle for Java, cargo for Rust. These tools are fine — excellent, even — for a single package with a single output. They start to fail the moment you have dozens or hundreds of interdependent packages in one repo.

The failures come in three categories. Wall-clock time: a full rebuild touches every package even when only one file changed. go build ./... in a 200-package monorepo takes minutes; Maven reactor builds take longer. Correctness: native tools rely on implicit dependencies — the filesystem, the installed Python version, PATH, GOPATH, whatever Docker image happens to be on the developer’s laptop. Two developers running the same command produce different outputs. Sharing: a work-from-home engineer re-downloads and re-compiles what the CI runner built an hour ago. Every engineer pays the full cost of every build.

A monorepo amplifies all three. With 10 services the problems are annoying; with 500 services they are company-ending. At some headcount — typically around 50 engineers and a few million lines of code — the native tool starts taking longer than the patience of the fastest engineer. That engineer starts writing shell scripts to skip unchanged packages. The scripts become a Makefile. The Makefile becomes a tangle of if [ -f .built ] guards and race conditions. The company is now maintaining an informal, incorrect build system. The correct move is to adopt a formal one.

The formal options fall into two camps. Google-inspired multi-language systems (Bazel, Pants, Buck) that try to orchestrate every language in one dependency graph. Language-specific systems (Nx, Turborepo, Gradle) that accept a single language family and optimize hard within it. The choice depends on how many languages the monorepo actually uses.

Three failure modes of native build tools at monorepo scale:

  • Wall-clock time: go build ./... rebuilds all 200 packages on every run.
  • Correctness: implicit dependencies on $PATH, the installed toolchain, ~/.npmrc.
  • Sharing: every engineer re-downloads and re-compiles everything.

All three failures compound in a monorepo — scale amplifies each one. A proper build system addresses all three by making builds hermetic, incremental, and remotely cached.

101.2 What hermeticity actually means

Hermeticity is the property that a build produces the same output given the same input, on any machine, at any time. It is the foundation of every other property of a modern build system — caching, reproducibility, parallelism. Without hermeticity none of them work.

The standard failure mode of make or npm run build is non-hermeticity. The build reads /usr/bin/gcc, whose version is not declared. It reads $PATH, which differs between machines. It writes timestamps into the output. It reads ~/.npmrc. It downloads packages from a registry that may change. The “same” build on two machines produces two different binaries, and it is impossible to say which is correct.

A hermetic build system makes all of this explicit. Every input is a declared file in the source tree or a pinned external dependency. The compiler is a pinned toolchain, vendored or downloaded from a content-addressable URL with a SHA256. Build actions run in a sandbox — typically a Linux mount namespace or a Docker container — that hides everything except the declared inputs. The output is determined entirely by the declared inputs. If two machines compute the same SHA256 over the same inputs and run the same action, they produce byte-identical outputs.
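The sandbox idea can be sketched in a few lines of Python. This is a toy stand-in (the function run_sandboxed and the file layout are invented for illustration); real build systems use mount namespaces or containers, but the principle is the same: the action sees only its declared inputs and an empty environment.

```python
import pathlib
import subprocess
import tempfile

def run_sandboxed(declared_inputs: dict[str, bytes], argv: list[str]) -> bytes:
    """Run argv in a directory containing only the declared inputs."""
    with tempfile.TemporaryDirectory() as sandbox:
        # Stage exactly the declared inputs; nothing else is visible.
        for rel_path, content in declared_inputs.items():
            path = pathlib.Path(sandbox, rel_path)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
        # env={} hides $PATH and friends; cwd confines relative file reads.
        result = subprocess.run(argv, cwd=sandbox, env={}, capture_output=True)
        return result.stdout

# The "action" sees exactly one file, regardless of the host filesystem.
out = run_sandboxed({"main.go": b"package main\n"}, ["/bin/cat", "main.go"])
assert out == b"package main\n"
```

A real sandbox also intercepts network access and undeclared toolchain paths; the toy version only demonstrates the input-staging half of the contract.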

The payoff is that the build becomes a pure function. action(inputs) -> outputs is deterministic and cacheable. Two engineers running bazel build //service/foo against the same commit get the same binary. CI produces the same binary developer laptops do. The binary in production was produced by the same action that was verified on a developer’s laptop. Every downstream property — reproducibility, provenance, supply-chain attestation (see Chapter 106) — follows from this one primitive.

A hermetic build is a pure function: source files, a pinned toolchain, and a sandboxed environment feed a build action, action(inputs) → outputs, yielding a deterministic binary that is addressed by its SHA256 and therefore cacheable.
Hermeticity makes the build a pure function — same inputs always produce the same output, which is the necessary condition for correctness and remote caching.

Non-hermeticity sneaks in constantly. A test that reads the current time. A code generator that embeds __FILE__. A Go binary with -ldflags="-X main.buildTime=$(date)". Each of these breaks the cache and has to be surgically removed. The discipline is not free, but the alternative is a build system that lies about whether it is up to date.
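The -ldflags timestamp is worth seeing concretely. A minimal sketch with an invented toy build function: embedding the clock turns the wall time into an undeclared input, so every run produces a different output hash and the action can never be cached.

```python
import hashlib
import time

def build(source: bytes, embed_time: bool) -> bytes:
    """Toy 'compiler': output depends on the source and, if embed_time
    is set, on an undeclared input (the wall clock)."""
    artifact = b"ELF:" + source
    if embed_time:
        artifact += f"buildTime={time.time_ns()}".encode()  # non-hermetic!
    return artifact

src = b"package main"
hermetic = {hashlib.sha256(build(src, False)).hexdigest() for _ in range(5)}
leaky = {hashlib.sha256(build(src, True)).hexdigest() for _ in range(5)}

assert len(hermetic) == 1  # one input set, one output: cacheable
assert len(leaky) > 1      # every run looks like a new artifact: cache-busting
```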

101.3 Remote caching and the dependency graph as an artifact

Once the build is hermetic, the dependency graph becomes a first-class artifact. Every action has a cache key: the hash of its declared inputs, its command line, and the toolchain it uses. The output of the action is stored, indexed by that key, in a content-addressable store. A remote build cache is just this store, served over HTTP or gRPC, shared across all machines.

The effect is transformative. The first engineer to run bazel build //... on a new commit pays the full build cost; everyone else downloads the outputs from the cache.

graph LR
  Dev1[Engineer A<br/>first build] -->|populates| RC[(Remote Cache)]
  CI[CI Runner<br/>first build] -->|populates| RC
  Dev2[Engineer B] -->|cache hit| RC
  Dev3[Engineer C] -->|cache hit| RC
  Dev4[Engineer D] -->|cache hit| RC
  style RC fill:var(--fig-accent-soft),stroke:var(--fig-accent)

The remote cache is a shared content-addressable store — CI and the first developer to build a commit populate it; every subsequent build on the same inputs downloads outputs instead of recomputing them. CI pipelines become cache populators. Developer laptops become cache consumers. A full rebuild on a warm cache is often dominated by network and disk I/O, not compilation. Bazel’s --remote_cache=grpc://... and Pants’ remote caching via the Remote Execution API are the standard interfaces. BuildBuddy, EngFlow, NativeLink, and Buildbarn are hosted or self-hosted implementations.
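The whole mechanism fits in a few lines. A minimal in-memory sketch (the ActionCache class is invented for illustration; a real remote cache speaks the Remote Execution API over gRPC):

```python
import hashlib

class ActionCache:
    """Toy content-addressed action cache: key = hash(command + input hashes)."""

    def __init__(self):
        self.store = {}   # action_key -> output bytes
        self.hits = 0
        self.misses = 0

    def build(self, command: str, inputs: dict[str, bytes]) -> bytes:
        key = hashlib.sha256(
            command.encode()
            + b"".join(hashlib.sha256(v).digest() for _, v in sorted(inputs.items()))
        ).hexdigest()
        if key in self.store:
            self.hits += 1            # cache hit: download instead of recompute
            return self.store[key]
        self.misses += 1
        # Stand-in for the real compile step.
        output = b"binary-for:" + command.encode()
        self.store[key] = output
        return output

cache = ActionCache()
inputs = {"main.go": b"package main"}
cache.build("go build", inputs)  # first build pays full cost, populates cache
cache.build("go build", inputs)  # identical inputs: served from the cache
assert (cache.hits, cache.misses) == (1, 1)
```

Changing any input byte or the command line yields a new key, which is exactly the invalidation rule the cache relies on.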

Remote execution goes one step further: not only are outputs cached remotely, but actions are also run remotely on a fleet of workers. A 5-minute link step on a laptop becomes a 30-second link step on a 64-core remote worker. The developer’s laptop just orchestrates — it sends action requests to the remote executor and receives outputs back. This is how Google-scale codebases, Chromium among them, build in minutes instead of hours.

The dependency graph is what makes this possible. Every action declares its inputs and outputs; the build system sees the entire DAG and knows exactly what can run in parallel. A monorepo with 100,000 targets has a DAG with 100,000 nodes, and the build system schedules them across a cluster of workers the same way Spark schedules tasks. The per-engineer cost of a full build stops scaling with the monorepo’s size and starts scaling with the size of the change.
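That scheduling is easy to sketch once the DAG is explicit. A toy wave-based scheduler (target names are hypothetical; the graph is assumed acyclic): everything whose dependencies are finished can run in parallel, on local cores or remote workers alike.

```python
def schedule(deps: dict[str, list[str]]) -> list[set[str]]:
    """Group targets into waves; every target in a wave can build in parallel."""
    done, waves = set(), []
    while len(done) < len(deps):
        ready = {t for t, ds in deps.items() if t not in done and set(ds) <= done}
        waves.append(ready)
        done |= ready
    return waves

deps = {"proto": [], "go_lib": ["proto"], "py_lib": ["proto"], "go_bin": ["go_lib"]}
assert schedule(deps) == [{"proto"}, {"go_lib", "py_lib"}, {"go_bin"}]
```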

101.4 Incremental builds and the content-addressable store

Incrementality is the property that changing one file only rebuilds the actions that transitively depend on it. Every modern build system does this, but the quality varies by orders of magnitude. The gold standard is action-level incrementality based on content hashes, not filesystem mtimes.

The model: every file in the source tree is hashed. Every action’s cache key is derived from the hashes of its declared inputs. When a file changes, its hash changes. Every action that depends on it gets a new cache key and must be re-run. Every action that does not depend on it keeps its old cache key and its cached output is reused. The walk is linear in the size of the diff, not linear in the size of the repo.

Filesystem mtime-based systems (Make, old go build) are fast but unreliable. An mtime can be forged by touch; it does not change when a file’s content is re-written to the same bytes; a git checkout that updates mtimes to “now” invalidates everything. Content-hash-based systems (Bazel, Pants, Buck) are slightly slower to walk the tree but immune to these failures.

# Bazel's action cache key is roughly:
action_key = SHA256(
    command_line +
    sorted(hash(input_file) for input_file in declared_inputs) +
    environment_variables +
    execution_platform_properties
)

Any change to any component invalidates the key. The on-disk store maps action_key -> output_files and output_file_hash -> output_file_content. The same content is never stored twice. This is the same CAS model used by Git, and it composes with remote caches — a laptop’s local CAS and a remote build cache share the same keyspace.

The practical effect: an engineer edits one line in one file, runs bazel test //..., and only the targets that transitively import the changed file are re-built and re-tested. In a large monorepo this is usually a handful of targets, and the test run takes seconds.
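That invalidation walk is just reverse reachability over the target graph. A sketch using a hypothetical slice of a BUILD graph: only targets that transitively depend on the changed one are dirtied; everything else keeps its cached output.

```python
# Hypothetical target graph: target -> declared dependencies.
deps = {
    "//pkg/config": [],
    "//pkg/logging": [],
    "//service/server_lib": ["//pkg/config", "//pkg/logging"],
    "//service/server": ["//service/server_lib"],
    "//tools/gen": [],
}

def dirty(changed: str) -> set[str]:
    """Return the changed target plus every transitive reverse dependency."""
    out, grew = {changed}, True
    while grew:
        grew = False
        for target, ds in deps.items():
            if target not in out and any(d in out for d in ds):
                out.add(target)
                grew = True
    return out

# Editing config dirties the library and binary above it; nothing else.
assert dirty("//pkg/config") == {"//pkg/config", "//service/server_lib", "//service/server"}
assert dirty("//tools/gen") == {"//tools/gen"}
```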

101.5 Bazel and the BUILD file model

Bazel is Google’s internal build system (Blaze) open-sourced in 2015. It is the most mature, most feature-complete, and most operationally painful of the options. Its conceptual model is simple: every directory that produces outputs has a BUILD (or BUILD.bazel) file that declares targets. A target is a named unit with inputs, an action, and outputs. Dependencies between targets form the DAG.

A minimal BUILD file for a Go binary:

load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library")

go_library(
    name = "server_lib",
    srcs = ["main.go", "handler.go"],
    importpath = "example.com/service/server",
    deps = [
        "//pkg/config:config",
        "//pkg/logging:logging",
        "@com_github_gorilla_mux//:mux",
    ],
)

go_binary(
    name = "server",
    embed = [":server_lib"],
    pure = "on",        # forces CGO_ENABLED=0
    static = "on",      # static linking
)

Every dependency is declared. deps lists Bazel labels — either internal (//pkg/config:config is “the config target in pkg/config/BUILD”) or external (@com_github_gorilla_mux//:mux is a target in a repository rule that fetches and wraps an external Go module). There is no implicit GOPATH lookup. The dependency graph is exactly what the BUILD files declare, nothing more.

The pain is that you have to write the BUILD files. For a repo with thousands of directories this is enormous tedium, and Bazel users rely on generators — gazelle for Go, rules_python’s generator for Python, and similar per-language tooling — to produce the files automatically from imports. The generators are their own source of footguns. BUILD files drift from the source code; the build breaks mysteriously; an engineer who does not understand the generator has to learn two tools to fix one problem. This is the tax of the model.

The payoff is that once the BUILD files exist, everything works. Cross-language builds (a Go service that embeds a Python ML model via cgo wrappers, or a Java client that depends on a proto generated by a Go service) are expressed in one graph. Remote caching is transparent. Incrementality is correct. Build outputs are hermetic. Bazel is hard to learn and harder to maintain, but past a certain codebase size, every alternative is worse.

101.6 Pants and Buck as alternatives

Pants v2 (a ground-up rewrite whose execution engine is written in Rust) is Bazel’s friendlier cousin. It targets Python-heavy polyglot monorepos — companies where most code is Python with some Go and some Java. Its key differentiator is that it does not require hand-written BUILD files for every directory. It infers dependencies from import statements (import foo.bar implies a dep on the target that defines foo.bar) and only requires manual declarations at module boundaries. The learning curve is much shallower than Bazel’s; a team can adopt Pants in a few days where Bazel takes weeks.

Pants v2 is built on the same remote execution API as Bazel, so the same remote caches work for both. The action model is the same. The difference is ergonomic: Pants’ inference engine makes the common case (“I added a new Python file to an existing package”) cost zero configuration, where Bazel requires editing the BUILD file and re-running gazelle. For Python-centric shops this is enormous.

Buck2 is Meta’s rewrite of their internal Buck build system, open-sourced in 2023. It is written in Rust (Pants v2 uses a Rust core for its scheduler, though its rules layer is Python), and is designed for extreme scale — Meta’s monorepo has tens of millions of targets. Buck2’s conceptual model is closer to Bazel’s (explicit targets in BUCK files) but its rule language is Starlark with a stricter execution model, and its dependency graph is more aggressively parallel. Buck2 is fast. On benchmarks it beats Bazel by meaningful margins on cold builds and by dramatic margins on warm builds.

The tradeoff across the three: Bazel has the biggest ecosystem (rules_go, rules_python, rules_docker, rules_oci, rules_scala, everything) and the most Stack Overflow answers. Pants has the best ergonomics for Python. Buck2 is the fastest but has the thinnest ecosystem outside Meta. For a new monorepo today, the honest answer is “Pants if you are Python-heavy; Bazel if you are polyglot and can pay the tax; Buck2 if you are operating at a scale where the build time matters more than the ecosystem.”

101.7 Nx for JavaScript monorepos

Nx (from Nrwl) is the JavaScript-native monorepo build system. It does not try to be polyglot. It is designed for TypeScript/JavaScript workspaces — React, Angular, Node services, Next.js apps — and it gets the JavaScript story right in a way the multi-language tools do not. Turborepo (from Vercel) is the lighter-weight competitor; it does less but is simpler to adopt.

Nx’s model uses the package.json / tsconfig.json ecosystem but adds a project graph, a task runner, and a remote cache. A project.json file in each package declares targets (build, test, lint, serve) and their inputs. Nx computes which targets are affected by a change (nx affected --target=test) and runs only those, in parallel, with caching. On a typical web monorepo with dozens of apps and libraries, this takes a 10-minute CI run down to under a minute on a warm cache.

{
  "name": "web-app",
  "targets": {
    "build": {
      "executor": "@nx/webpack:webpack",
      "dependsOn": ["^build"],
      "inputs": ["production", "^production"],
      "outputs": ["{workspaceRoot}/dist/{projectRoot}"]
    }
  }
}

^build means “build all dependencies first.” inputs declares what files affect this target’s cache key. outputs declares what to cache. Nx handles the dependency graph automatically from import statements and package.json dependencies, so most of the time a developer does not touch these files.

Nx Cloud is the hosted remote cache. It works out of the box with a token; CI populates it, developers consume it. The first nx build after git pull downloads pre-built outputs from Nx Cloud for every library that did not change. For a JavaScript monorepo this is usually “everything except the thing you just edited.”

Nx is not as powerful as Bazel (no hermeticity, no remote execution, JavaScript only), but it is dramatically cheaper to adopt and maintain. For a team that will never have more than one language family, it is the right choice.

101.8 Cross-language builds

Cross-language builds are why Bazel and Pants exist, and why Nx does not try to compete with them. A real production stack has: Go services, Python ML code, TypeScript frontends, protobuf schemas, Dockerfiles, Helm charts, Terraform, and half a dozen code generators gluing them together. Each of these has a native build tool. Orchestrating them together is where things break.

The classic approach is a Makefile with make build calling go build, pytest, tsc, protoc, and docker build in sequence. This works until it doesn’t. The failures: protoc runs every time even when nothing changed; docker build rebuilds because a source file’s mtime updated; Python tests run on the wrong Python version because CI and laptops have different pyenv states; and a generated .pb.go file drifts out of sync with the .proto it was generated from, so the Go build fails mysteriously.

Bazel’s answer is a single dependency graph across all languages. A proto_library target feeds a go_proto_library target feeds a go_library target feeds a go_binary target feeds an oci_image target (via rules_oci, see Chapter 106). Every step is hermetic. Every step is cached. Changing a .proto file invalidates exactly the targets that transitively depend on it, in every language. The Python library that also consumes the proto gets re-generated; the Go service that uses it gets re-built; the Docker image that packages the Go service gets re-layered; everything else is untouched.

proto_library(
    name = "inference_proto",
    srcs = ["inference.proto"],
    deps = ["@com_google_protobuf//:timestamp_proto"],
)

go_proto_library(
    name = "inference_go_proto",
    compilers = ["@io_bazel_rules_go//proto:go_grpc"],
    importpath = "example.com/api/inference",
    proto = ":inference_proto",
)

py_proto_library(
    name = "inference_py_proto",
    deps = [":inference_proto"],
)

One .proto file, two language bindings, one dependency graph, one cache. No Makefile. This is the core argument for Bazel: if you are paying the cost of multiple languages that share schemas, you may as well put them in one build graph.

101.9 The distroless static + CGO-off rule for Go

A Go-specific rule that matters for production. Every Go binary destined for a container should be built with CGO_ENABLED=0 and linked statically, and packaged in a distroless base image (or scratch). This is not optional; it is the standard.

The reasons. On Linux, Go binaries by default dynamically link against glibc via cgo for parts of net (hostname resolution) and os/user. A glibc-linked binary depends on the glibc in the container’s root filesystem. If you package it in Alpine (which uses musl libc), it crashes at runtime with obscure dynamic loader errors. If you package it in a stripped-down image without glibc at all, it crashes immediately. The fix is to build with CGO_ENABLED=0, which makes Go use the pure-Go implementations of net and os/user and produces a fully static binary.

CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o server ./cmd/server

In Bazel, the go_binary rule handles it declaratively:

go_binary(
    name = "server",
    embed = [":server_lib"],
    pure = "on",       # CGO_ENABLED=0
    static = "on",     # fully static link
)

Once the binary is static, it has zero runtime dependencies. You can drop it into gcr.io/distroless/static-debian12 (which contains little more than CA certificates, timezone data, and a passwd file) or into scratch (which has literally nothing). The resulting image is 10-30 MB instead of the 200-500 MB of a Debian base, and it has essentially no attack surface — no shell, no package manager, no libc, nothing for a malicious payload to exploit.

FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /out/server ./cmd/server

FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/server /server
USER nonroot
ENTRYPOINT ["/server"]

This is the standard. Chapter 102 covers why distroless and why nonroot; Chapter 106 covers how the image gets pushed and pinned by digest. For now: CGO off, static link, distroless base. Memorize it.

101.10 When a build system is not worth the cost

Adopting Bazel or Pants is a multi-week project and an ongoing operational burden. It is not always worth it. The honest evaluation:

You need a proper build system when: the monorepo has more than ~50 packages; builds take longer than developers’ patience (>5 minutes for a warm build); CI is dominated by rebuilding things that didn’t change; multiple languages share schemas or code generators; “it builds on my laptop but not on CI” is a recurring problem; you need byte-identical builds for supply-chain provenance (SLSA, in-toto).

You do not need a proper build system when: there are under ~20 packages; native tools’ incrementality is good enough; builds take under a minute; you have only one language; nobody is complaining about build speed. In this case, adopting Bazel is a distraction that costs engineering months for no measurable benefit. Stick with go build ./..., or pnpm workspaces, or pytest directly.

The middle ground — 20-50 packages, mild build pain — is where Nx and Turborepo (for JS) and Pants v2 (for Python) shine. Cheap to adopt, most of the benefit, a fraction of the tax. The Bazel tax is only justified at the high end of scale or at the high end of correctness requirements.
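The cost side of that judgment can be put in numbers. A back-of-envelope sketch under illustrative assumptions (team size, build cadence, warm-build time, and engineer cost are all invented, not from any vendor):

```python
engineers, builds_per_day, workdays = 10, 6, 21   # assumed team and cadence
cold_min, warm_min = 5.0, 0.5                      # assumed build times (minutes)
hourly_cost, cache_cost = 100.0, 500.0             # assumed $/engineer-hour, $/month

def monthly_savings(hit_rate: float) -> float:
    """Dollar value of engineer time saved per month by cache hits."""
    saved_minutes = (engineers * builds_per_day * workdays
                     * hit_rate * (cold_min - warm_min))
    return saved_minutes / 60 * hourly_cost

# Under these assumptions, even a 10% hit rate pays for the cache.
assert monthly_savings(0.10) > cache_cost
```

The conclusion is assumption-sensitive, which is the point: run the numbers for your own team before paying the adoption tax.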

The failure mode to avoid is adopting Bazel too early. A team of 10 with 30 packages adopts Bazel because a principal engineer likes it. Six months later they have 2,000 lines of Starlark, a full-time maintainer, and the same build times they had before. The decision should be driven by actual pain, not aspiration.

101.11 The mental model

Eight points to take into Chapter 102:

  1. Native language build tools collapse at monorepo scale. They are not hermetic, not shared, not incremental at the action level.
  2. Hermeticity is the foundation. Same inputs, same action, same output, on any machine. Everything else follows.
  3. Remote caching turns the dependency graph into a shared artifact. CI populates, developers consume.
  4. Content-hashed action keys, not mtimes. Correctness at the cost of slightly slower tree walks.
  5. Bazel, Pants, Buck2 are the polyglot options. Pick Pants for Python-heavy, Bazel for maximum ecosystem, Buck2 for speed.
  6. Nx and Turborepo are the JavaScript options. Cheap to adopt, most of the benefit, single language only.
  7. Cross-language builds are the Bazel-class use case. One dependency graph across Go, Python, TS, protobuf, containers.
  8. Go binaries in containers are always CGO=0, static, distroless. Memorize this rule; you will reach for it constantly.

In Chapter 102 the focus shifts from how the artifact is built to what the artifact actually is — the container image and the kernel primitives underneath it.


Read it yourself

  • Bazel documentation, especially the sections on “Build encyclopedia” and “Remote execution.” bazel.build/docs.
  • Pants v2 documentation, “Concepts” and “Remote caching.” pantsbuild.org.
  • Meta Engineering, Introducing Buck2, our open source build system (2023). Explains the design rationale and benchmarks vs Buck1.
  • Google’s Software Engineering at Google (Winters, Manshreck, Wright), chapter “Build Systems and Build Philosophy.” The Blaze/Bazel origin story.
  • Nrwl’s Effective Nx documentation on nx.dev, particularly the “Task pipeline configuration” section.
  • The rules_go and rules_oci Bazel rule repositories for concrete examples of Go builds producing OCI images.

Practice

  1. In a small Go monorepo (3-5 packages), time a full rebuild with go build ./... twice in a row. Why is the second one still slow? What part of the work is re-done unnecessarily?
  2. Write a BUILD.bazel file for a Go binary that depends on one internal library and one external module. Explain each field.
  3. Explain hermeticity to a skeptical engineer in two sentences. Give one concrete example of a non-hermetic build failure.
  4. A team has 15 engineers and a monorepo with 40 Python packages. Full pytest takes 8 minutes. Should they adopt Bazel? Argue both sides.
  5. Write a multi-stage Dockerfile that produces a CGO_ENABLED=0 static Go binary in a distroless/static image. Measure the final image size.
  6. Compute the cache hit rate needed for a remote cache to save more engineer-hours than it costs to operate, assuming a 10-engineer team, a 5-minute build, and a $500/month cache service. State your assumptions.
  7. Stretch: Set up a minimal BuildBuddy or NativeLink remote cache locally. Configure a Bazel workspace to use it. Run a clean build twice from two different directories; verify that the second run pulls actions from the cache instead of re-running them.