Part IX · Build, Deploy, Operate
Chapter 104 · ~20 min read

API contract design: OpenAPI vs gRPC vs GraphQL vs Connect

"The contract is the product. Everything else is implementation."

APIs are the surface where teams meet. A bad API is hard to use, hard to evolve, and hard to retire. A good one survives a decade of product pivots with nothing more than additive changes. The choice of protocol — REST with OpenAPI, gRPC, GraphQL, or Connect — determines most of the downstream ergonomics. And the choice of contract-first versus code-first determines whether the contract actually acts like one.

By the end of this chapter the reader can place any API somewhere in the protocol matrix and explain why public APIs reach for REST, service-to-service communication reaches for gRPC, client-driven aggregation reaches for GraphQL, and the new middle ground reaches for Connect. The reader also knows the breaking-change discipline that separates APIs that live for a decade from APIs that get deprecated in two years.

Outline:

  1. Contract-first vs code-first — the organizational split.
  2. REST with OpenAPI — the lingua franca for public APIs.
  3. gRPC and protobuf — the service-to-service default.
  4. GraphQL — the client-driven aggregation model.
  5. Connect — gRPC semantics over plain HTTP.
  6. Code generation pipelines — oapi-codegen, buf, protoc, graphql-codegen.
  7. Schema registries and contract testing.
  8. Breaking-change discipline.
  9. Versioning strategies — URL, header, schema evolution.
  10. Picking a protocol — decision matrix.
  11. The mental model.

104.1 Contract-first vs code-first

The single biggest design decision in API work is whether the contract or the code comes first. Contract-first means writing the schema (OpenAPI YAML, proto file, GraphQL SDL) and generating the server stubs and client SDKs from it. Code-first means writing the server in Python/Go/Java annotations and generating the contract as an artifact of the code.

Code-first is faster for the person writing the server. You write a FastAPI handler with type hints and Pydantic models, and FastAPI emits the OpenAPI spec automatically. You write a Flask-RESTX endpoint with decorators, same result. The contract appears as a byproduct, and for a team of one shipping fast this is great.
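The "contract as byproduct" idea can be sketched in a few lines of plain Python. The toy below derives an OpenAPI-style schema fragment from the server's own type definitions; the `Invoice` dataclass and the type mapping are illustrative, not any framework's real API, but the flow (code is the source, schema is derived) is the same one FastAPI runs at scale.

```python
from dataclasses import dataclass, fields

# Toy code-first flow: the server's own types are the source of truth,
# and the schema is generated from them.
@dataclass
class Invoice:
    id: str
    amount_cents: int
    currency: str

PY_TO_OPENAPI = {str: "string", int: "integer"}  # illustrative subset

def to_schema(cls) -> dict:
    return {
        "type": "object",
        "required": [f.name for f in fields(cls)],
        "properties": {f.name: {"type": PY_TO_OPENAPI[f.type]} for f in fields(cls)},
    }

print(to_schema(Invoice))
```

The point is directional: in this flow, editing the dataclass silently reshapes the contract, which is exactly the risk the next paragraphs describe.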

Contract-first is better for teams. Writing the schema first forces you to think about the API as an interface before thinking about the implementation. It surfaces design issues early — “wait, we have two different shapes of error object in this service” or “the status enum has three spellings depending on endpoint.” The schema becomes the single source of truth that clients, servers, and tests all generate from. Consumers can start building against the schema before the server exists; mocks can be generated automatically; breaking changes are visible in schema diffs.

The organizational insight is that the contract is an interface between teams. If team A owns the producer and team B owns the consumer, the contract is what they agree on. Generating it from A’s code means B is at the mercy of whatever A’s annotations produce — and B discovers changes when their SDK regeneration breaks. Writing it first means both teams review the schema as a design artifact, and any change to it is a scheduled, reviewed event.

graph LR
  Schema[Contract / Schema<br/>OpenAPI · proto · SDL] -->|codegen| ServerStub[Server stub<br/>Go / Python / Java]
  Schema -->|codegen| Client[Client SDK<br/>TypeScript / Go]
  Schema -->|codegen| Docs[Interactive docs<br/>Swagger UI / Redoc]
  Schema -->|codegen| Mocks[Mock server<br/>for consumers]

In contract-first design, the schema is the single source of truth — servers, clients, docs, and mocks are all derived artifacts, so a breaking change is visible as a schema diff before any code changes.

The rule of thumb: for internal single-team services, code-first is fine. For public APIs, cross-team APIs, or any API with external consumers, contract-first is the right default. The coordination cost pays for itself many times over.

104.2 REST with OpenAPI — the lingua franca for public APIs

REST over JSON is still the default for public-facing APIs. It has universal client support (curl, every HTTP library in every language, browser fetch), caches at every layer of the HTTP stack, works with every proxy and gateway, and is debuggable from the command line. OpenAPI (formerly Swagger) is the standard schema language for it.

An OpenAPI 3.1 spec looks like:

openapi: 3.1.0
info:
  title: Billing API
  version: 2.4.0
paths:
  /v1/invoices/{id}:
    get:
      operationId: getInvoice
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Invoice'
        '404':
          $ref: '#/components/responses/NotFound'
components:
  schemas:
    Invoice:
      type: object
      required: [id, amount_cents, currency, status]
      properties:
        id: { type: string }
        amount_cents: { type: integer, minimum: 0 }
        currency: { type: string, enum: [USD, EUR, GBP] }
        status: { type: string, enum: [draft, open, paid, void] }
        created_at: { type: string, format: date-time }

The spec is a YAML (or JSON) file. Tools consume it. oapi-codegen generates Go server stubs and clients. openapi-generator generates clients for 50+ languages. Swagger UI and Redoc render it as interactive HTML docs. Postman imports it. Stainless, Fern, and Speakeasy generate polished SDKs for enterprise customers. The ecosystem is deep.

The weaknesses of REST-over-JSON are well known. JSON is verbose compared to binary formats. There is no streaming story natively; you have to bolt on Server-Sent Events or WebSockets or long polling. Request/response semantics are stateless, which makes multi-step workflows awkward. The HTTP verb mapping to actions (POST /users vs PUT /users/123) is a convention, not a guarantee, and different teams disagree on the details (should partial update be PATCH or POST?).

But for external APIs, the tradeoffs still favor REST. Clients don’t need a special runtime. Debugging is curl -v. Caching is Cache-Control: max-age=300. Rate-limiting is a proxy rule. For the long tail of consumers — third-party developers, mobile clients on shaky networks, browser clients, command-line tools — nothing beats it.

104.3 gRPC and protobuf — the service-to-service default

gRPC is the modern default for service-to-service communication inside a cluster. Protocol Buffers (protobuf) define the schema and wire format, gRPC is the RPC framework on top of HTTP/2, and the combination is fast, well-typed, and has excellent code generation for every mainstream language.

A .proto file:

syntax = "proto3";
package billing.v1;

import "google/protobuf/timestamp.proto";

service BillingService {
  rpc GetInvoice(GetInvoiceRequest) returns (Invoice);
  rpc ListInvoices(ListInvoicesRequest) returns (ListInvoicesResponse);
  rpc StreamInvoiceUpdates(StreamInvoiceUpdatesRequest) returns (stream InvoiceUpdate);
}

message Invoice {
  string id = 1;
  int64 amount_cents = 2;
  string currency = 3;
  Status status = 4;
  google.protobuf.Timestamp created_at = 5;

  enum Status {
    STATUS_UNSPECIFIED = 0;
    STATUS_DRAFT = 1;
    STATUS_OPEN = 2;
    STATUS_PAID = 3;
    STATUS_VOID = 4;
  }
}

protoc with the appropriate language plugins generates server interfaces, client stubs, and message types. A Go server implements the generated BillingServiceServer interface; a client calls client.GetInvoice(ctx, &pb.GetInvoiceRequest{Id: "..."}). The wire format is binary protobuf, ~5-10× smaller than equivalent JSON and dramatically faster to serialize.
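The size difference is easy to see with a toy varint encoder. The sketch below hand-encodes just the amount_cents field (tag 2, wire type 0) the way protobuf would, and compares it with the JSON equivalent; it is an illustration of the wire format, not a real protobuf library.

```python
import json

def varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 data bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if n == 0:
            return bytes(out)

# Field 2 (amount_cents), wire type 0 (varint): key byte = (2 << 3) | 0
wire = bytes([(2 << 3) | 0]) + varint(4200)
as_json = json.dumps({"amount_cents": 4200}).encode()
print(len(wire), "bytes on the wire vs", len(as_json), "bytes of JSON")
```

Three bytes versus twenty-two: no field name on the wire (the tag number stands in for it) and the integer is packed, which is where most of the savings comes from.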

gRPC’s strengths for internal APIs: strong typing across languages (Python client, Go server, no manual contract syncing), streaming (server streaming, client streaming, bidi streaming), deadlines and cancellation as first-class concepts, HTTP/2 multiplexing so many concurrent RPCs share one connection, and mature load-balancing and service discovery integrations. For service meshes (Istio, Linkerd, see Chapter 109) gRPC is the lingua franca.

The weaknesses are equally well known. Browsers cannot natively speak gRPC (they don’t let JavaScript control the HTTP/2 trailers gRPC uses for status). grpc-web is a workaround but adds a proxy. Debugging is harder — curl doesn’t work; you use grpcurl, which requires the proto definitions. Public APIs rarely use gRPC for these reasons: the ergonomics for external consumers are worse than REST’s.

buf (buf.build) is the modern protobuf toolchain. It replaces protoc with a faster, better-designed CLI, adds lint rules (buf lint) and breaking-change detection (buf breaking), and integrates with the Buf Schema Registry for distribution. Any new proto codebase should start with buf, not raw protoc.

104.4 GraphQL — the client-driven aggregation model

GraphQL solves a specific problem: “the client needs data from multiple backend services and each backend exposes its own REST API, and the client ends up making N+1 requests to assemble one screen.” Instead of the backend deciding what the endpoint returns, the client sends a query describing exactly what fields it wants, and the GraphQL server fetches and assembles the response.

query GetOrderScreen($orderId: ID!) {
  order(id: $orderId) {
    id
    status
    total_cents
    customer {
      name
      email
      loyalty_tier
    }
    items {
      product_name
      quantity
      price_cents
    }
    shipping_address {
      street
      city
      country
    }
  }
}

The server resolves each field by calling whichever backend owns it. The client gets exactly what it asked for in one round trip.
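A resolver layer for the query above can be sketched as a plain function that fans out to the owning services and assembles one response. The two service stubs below are stand-ins for real backends; real GraphQL servers do this per field via resolver functions, but the shape of the work is the same.

```python
# Stand-ins for the backends that own each piece of data.
def orders_service(order_id: str) -> dict:
    return {"id": order_id, "status": "paid", "customer_id": "c1"}

def customers_service(customer_id: str) -> dict:
    return {"name": "Ada", "email": "ada@example.com", "loyalty_tier": "gold"}

def resolve_order_screen(order_id: str) -> dict:
    """One round trip for the client; the fan-out happens server-side."""
    order = orders_service(order_id)
    return {
        "order": {
            "id": order["id"],
            "status": order["status"],
            "customer": customers_service(order["customer_id"]),
        }
    }

print(resolve_order_screen("o1"))
```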

GraphQL shines in two scenarios. Client-driven aggregation: a mobile app or web UI needs data from a dozen microservices and wants one API to hit. A GraphQL gateway in front of REST or gRPC services provides exactly that. Facebook (which invented GraphQL) uses it for exactly this reason — the iOS and Android apps have wildly different data needs and GraphQL lets each decide. Schema-first exploration: a team building internal tools can point a GraphQL client at the API and explore the schema with autocomplete, which is faster than reading docs.

GraphQL also has real drawbacks. Caching is hard — HTTP caching works on URLs, but every GraphQL query is a POST to /graphql, so it looks the same to a cache. Solutions exist (Apollo Client’s normalized cache, persisted queries with deterministic IDs) but add complexity. The N+1 problem shows up on the server instead of the client — naive resolvers call the database once per field per item, and the fix (DataLoader-style batching) adds more complexity. Authorization is harder because any query might touch any field; you can’t just slap auth middleware on /users.

The pattern that works: GraphQL at the edge for client aggregation, gRPC or REST behind it for the actual services. The GraphQL layer is a thin translator, not the system of record. This is the “BFF” (Backend for Frontend) pattern, and it is the common-sense home for GraphQL in 2025.

104.5 Connect — gRPC semantics over plain HTTP

Connect is the Buf team’s protocol that solves the browser-gRPC problem elegantly. It uses protobuf for schemas (same .proto files as gRPC), supports unary and streaming RPCs, but speaks plain HTTP/1.1 and HTTP/2 without HTTP/2 trailers. A Connect endpoint is a POST to /package.Service/Method with the protobuf (or JSON) body, and the response is a status code plus the protobuf result.

The result: a Connect server can accept traffic from gRPC clients, Connect clients, or even curl (with JSON). The same server, same proto, no separate grpc-web proxy. Browsers just call the Connect endpoint with fetch.

// TypeScript client
import { createPromiseClient } from "@connectrpc/connect";
import { createConnectTransport } from "@connectrpc/connect-web";
import { BillingService } from "./gen/billing/v1/billing_connect";

const client = createPromiseClient(BillingService, createConnectTransport({
  baseUrl: "https://api.example.com",
}));

const invoice = await client.getInvoice({ id: "inv_123" });

// Go server
package main

import (
    "context"
    "net/http"

    "connectrpc.com/connect"
    billingv1 "example.com/gen/billing/v1"
    "example.com/gen/billing/v1/billingv1connect"
)

type BillingServer struct{}

func (s *BillingServer) GetInvoice(
    ctx context.Context,
    req *connect.Request[billingv1.GetInvoiceRequest],
) (*connect.Response[billingv1.Invoice], error) {
    // ...
}

func main() {
    mux := http.NewServeMux()
    path, handler := billingv1connect.NewBillingServiceHandler(&BillingServer{})
    mux.Handle(path, handler)
    http.ListenAndServe(":8080", mux)
}

The same server handles gRPC clients (from Go services), Connect clients (from TypeScript frontends), and JSON-over-HTTP clients (from curl or Python scripts). This is genuinely new — before Connect, you had to pick one of “nice browser experience” or “nice service-to-service experience” and accept the other being worse.
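The "JSON-over-HTTP clients" leg needs nothing but the standard library: a Connect unary call is an ordinary POST to /package.Service/Method with a JSON body. The sketch below builds that request (the host is a placeholder; the path comes from the proto above) and stops short of sending it, since that needs a live server.

```python
import json
import urllib.request

# Construct the HTTP request a Connect unary call amounts to.
req = urllib.request.Request(
    "https://api.example.com/billing.v1.BillingService/GetInvoice",
    data=json.dumps({"id": "inv_123"}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

Nothing protocol-specific on the client side — which is exactly why curl and scripts can talk to the same endpoint as gRPC clients.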

Connect is still newer than gRPC (released in 2022) but is mature enough for production. Adopt it for any new API where the consumer set includes both browsers and backend services. It does not replace REST+OpenAPI for cases where you need a wide ecosystem of third-party tools; OpenAPI still wins there.

104.6 Code generation pipelines

The value of contract-first depends on the code-generation pipeline. Broken generators produce more friction than writing the code by hand. The good pipelines are:

oapi-codegen (for OpenAPI + Go). Takes an OpenAPI YAML, produces Go types and server/client stubs. Supports several router flavors (chi, echo, gin, stdlib). Stable and widely used.

openapi-generator (for OpenAPI + anything). Java-based, generates clients and servers for dozens of languages. Quality varies by language (Go and TypeScript are good, Python and Rust are hit-or-miss). Good enough for generating client SDKs for public APIs.

buf (for protobuf + anything). buf generate reads a buf.gen.yaml that declares which plugins to run and writes the output. Replaces the old protoc --go_out=... command lines. Integrates with buf.build for remote plugins (no local install needed) and remote generation. The modern proto toolchain.

graphql-codegen (for GraphQL + TypeScript / Java / Kotlin / etc). Reads the schema and queries, produces typed client code. The TypeScript React plugin is the killer feature — typed hooks generated per query.

The discipline: generated code goes in a gen/ directory, is checked in (or regenerated in CI), and is never hand-edited. The generation step is a pre-commit or CI hook that fails if the committed output is out of sync with the schema. This is the same pattern as Wire’s wire_gen.go (Chapter 103).

# buf.gen.yaml
version: v2
plugins:
  - remote: buf.build/protocolbuffers/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/grpc/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/connectrpc/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/bufbuild/es
    out: gen/ts

One command, buf generate, produces Go, TypeScript, and whatever else. The pipeline is the contract’s operational backbone — when it breaks, development stops.

104.7 Schema registries and contract testing

For larger organizations, a schema registry becomes valuable. The registry stores every version of every schema (proto, OpenAPI, GraphQL), enforces backward-compatibility checks on PRs, and serves as the single source of truth that clients pull from. Buf Schema Registry does this for protobuf with remote generation. Apollo Studio / Hasura Cloud do it for GraphQL with schema change tracking. Stoplight and SwaggerHub do it for OpenAPI.

The value is organizational. Without a registry, every repo has its own copy of the schema; drift is inevitable; breaking changes slip through review because no one runs the check. With a registry, the schema is centralized, breaking-change checks run automatically, and every consumer knows where to get the latest version.

Contract testing is the complement: tests that verify a producer and consumer agree on the schema. Pact is the canonical tool. The consumer team writes tests that say “when I call GET /users/1, I expect a JSON with an id field and a name field.” The producer team runs those tests against their implementation in CI. If the producer changes the response shape, the consumer’s tests fail in the producer’s pipeline, before deploy. This flips the usual integration-test polarity — consumers define the contract, producers verify it.
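The mechanism can be sketched without Pact itself: the consumer publishes the shape it relies on, and the producer's CI checks its real response against it. The dict-based "contract" below is a toy, not Pact's actual format, but it captures the polarity — extra fields are fine, missing or mistyped ones fail.

```python
# Consumer-declared contract: field name -> expected Python type.
contract = {"id": str, "name": str}

def verify(response: dict, contract: dict) -> list[str]:
    """One violation per missing or mistyped field; extra fields are allowed."""
    return [
        f"missing or wrong type: {field}"
        for field, expected in contract.items()
        if not isinstance(response.get(field), expected)
    ]

ok = {"id": "u1", "name": "Ada", "email": "ada@example.com"}  # extra field: fine
print(verify(ok, contract))
print(verify({"id": "u1"}, contract))  # producer dropped "name": caught in CI
```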

Contract testing is especially valuable in microservice architectures where integration tests are expensive. Instead of standing up the whole system to test one interaction, each service tests against contracts. The downside is the operational overhead of maintaining the contracts and the tooling; for small teams it is often overkill, for large teams it is the only way to keep a mesh of microservices from constantly breaking each other.

104.8 Breaking-change discipline

Once an API has consumers, breaking changes are expensive. A breaking change means every consumer has to update their SDK, redeploy, and handle rollout coordination. If the consumer is external, it means customer communication, migration guides, and deprecation timelines. The discipline of “additive changes only” is what keeps an API usable for years.

The rules, across all three protocols:

Breaking vs safe API changes: adding optional fields is safe; removing or renaming fields breaks existing clients.

Safe (additive):
  • Add an optional field to a response
  • Add a new enum value (with unknown-value handling)
  • Add a new optional endpoint / RPC
  • Add a proto field with a new tag number
  • Widen a type (int32 → int64)

Breaking:
  ✗ Remove or rename a field
  ✗ Change a field type (string → int)
  ✗ Change semantics (amount before tax → after tax)
  ✗ Reuse a proto tag number
  ✗ Make an optional field required

Additive changes never break existing clients; any removal, rename, or type change does. Automate enforcement with buf breaking or openapi-diff in CI to catch violations before review.

Adding fields is safe. New optional fields don’t break existing clients. In protobuf, adding a new field with a new tag number is always safe. In OpenAPI, adding an optional property to a response is safe. In GraphQL, adding a new field to a type is safe.

Removing fields is breaking. Clients may be reading the field. Even if “nobody uses it,” some client will. Do not remove; deprecate and wait.

Changing the type of a field is breaking. string → int breaks every deserializer. A status enum changing from string to integer breaks every consumer.

Changing the semantics of a field is breaking. The field is still called amount_cents but now it’s “amount after tax” instead of “amount before tax.” The types still parse; the behavior is wrong. These are the worst kind of breaking changes because they show up as subtle bugs in production, not compile errors.

Renaming fields is breaking. Use the old name forever, or version the API.

Reordering parameters is breaking. Obvious for positional parameters; less obvious for proto tag numbers, which play the same role on the wire. Never reuse or change a tag number.
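These rules are easy to demonstrate from the consumer's side. The sketch below is a v1 JSON client that reads a fixed set of fields (field names follow the Invoice example above): an additive change sails through, a removal throws.

```python
import json

V1_FIELDS = ("id", "status")  # what the old client reads

def old_client_parse(payload: str) -> tuple:
    data = json.loads(payload)
    return tuple(data[f] for f in V1_FIELDS)  # KeyError if a field vanishes

additive = '{"id": "inv_1", "status": "open", "due_date": "2025-01-01"}'
print(old_client_parse(additive))   # extra field ignored: safe

removed = '{"id": "inv_1"}'         # "status" removed server-side
try:
    old_client_parse(removed)
except KeyError as exc:
    print("old client broke on:", exc)
```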

In protobuf, the reserved keyword prevents accidentally reusing a removed field:

message Invoice {
  reserved 5, 6;
  reserved "old_field_name";
  string id = 1;
  // new fields start at 7
}

buf breaking enforces all of these rules automatically in CI. For OpenAPI, openapi-diff does similar checks. For GraphQL, Apollo’s schema check integrates with CI. Automate the enforcement; human review alone is not enough.
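What these tools do reduces to a schema diff. A toy version that only checks for removed messages and fields (real tools also check type changes, tag reuse, required-ness, and much more):

```python
# Schema snapshots: message name -> set of field names.
v_old = {"Invoice": {"id", "amount_cents", "currency", "status"}}
v_new = {"Invoice": {"id", "amount_cents", "currency", "status", "due_date"}}

def breaking_changes(old: dict, new: dict) -> list[str]:
    issues = []
    for message, fields in old.items():
        if message not in new:
            issues.append(f"removed message: {message}")
        else:
            issues += [f"removed field: {message}.{f}"
                       for f in sorted(fields - new[message])]
    return issues

print(breaking_changes(v_old, v_new))  # additive change: no issues
print(breaking_changes(v_new, v_old))  # reverse direction: removal flagged
```

In CI the equivalent check runs against the previous committed schema, so the diff is computed automatically on every PR.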

104.9 Versioning strategies

When a breaking change is actually needed, you version. The options:

URL-versioned: /v1/invoices, /v2/invoices. Simple, obvious, widely used. Google, Stripe, most big public APIs. The versions exist side-by-side; old clients keep working on /v1; new clients use /v2. Deprecation happens on a timeline (Stripe famously never removes old versions, just labels them legacy).

Header-versioned: Accept: application/vnd.example.v2+json. More “REST-pure” but worse ergonomics — you can’t curl it without remembering the header syntax. Rare in practice.

Schema-evolution: protobuf’s model. No explicit version; the schema evolves additively forever, and breaking changes require a new message type (InvoiceV2) or a new service. Works for internal APIs where you control both ends. Doesn’t work for public APIs where clients update on their own schedule.

Date-versioned: Stripe’s approach. Every API version has a date (2024-11-01). Clients pin a date in a header; the server returns the response shape for that date. Works because Stripe has a giant compatibility layer internally. Overkill for most.

The honest recommendation: URL versioning for public APIs, schema evolution for internal APIs, never mix the two. And above all, version rarely. Every version doubles your maintenance burden. Before going from /v1 to /v2, ask whether the new shape can be expressed as additive changes to /v1. Most of the time it can.
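Side-by-side URL versions are literally two handlers mounted on two routes. A toy router (the response shapes are illustrative) shows why old clients never notice that v2 shipped:

```python
def invoice_v1(inv: dict) -> dict:
    return {"id": inv["id"], "amount": inv["amount_cents"] / 100}   # v1: float dollars

def invoice_v2(inv: dict) -> dict:
    return {"id": inv["id"], "amount_cents": inv["amount_cents"]}   # v2: integer cents

# Both versions stay mounted; deprecation is a timeline, not a deploy.
ROUTES = {"/v1/invoices": invoice_v1, "/v2/invoices": invoice_v2}

stored = {"id": "inv_1", "amount_cents": 4200}
print(ROUTES["/v1/invoices"](stored))
print(ROUTES["/v2/invoices"](stored))
```

The cost is equally visible: every behavior change now has to be implemented, tested, and maintained in both handlers, which is why versioning rarely is the core advice.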

104.10 Picking a protocol

A pragmatic decision matrix:

Protocol selection matrix — rows are producer types, columns are consumer types; pick the protocol that matches your primary consumer:

                        External / 3rd party    Browser (JS/TS)    Internal service     Mobile app
  Public API            REST + OpenAPI          REST + OpenAPI     gRPC or Connect      REST + OpenAPI
  Internal service      REST or Connect         Connect            gRPC                 gRPC or Connect
  Aggregation gateway   GraphQL BFF             GraphQL BFF        gRPC (behind BFF)    GraphQL BFF
Protocol selection reduces to two questions: who is the consumer and does the API need to be externally discoverable — REST+OpenAPI wins for external consumers, gRPC for internal services, Connect when both coexist.
  Consumer                  Producer                        Pick
  Browser (JS/TS)           Internal service                Connect
  Browser                   Aggregation of many services    GraphQL BFF + gRPC/REST behind
  Mobile app                Internal service                gRPC or Connect
  Third-party developers    Public API                      REST + OpenAPI
  Internal service          Internal service                gRPC
  CLI tool                  Any API                         REST + OpenAPI
  Webhooks (producer)       Anyone                          REST + OpenAPI
  Real-time streaming       Anyone                          gRPC streaming or WebSocket
  Batch jobs / ETL          Any API                         REST + OpenAPI (easy to script)

The first dimension is the consumer: who is calling this? The second is the producer: what kind of service is exposing this? The answer usually falls out immediately. The hard cases are when there are multiple consumer types — a service that is called by both a browser and other backends. That is where Connect shines; before Connect, you would have built two APIs (REST for browsers, gRPC for services) or one compromise (REST for both, giving up the service-to-service ergonomics).

Do not over-index on performance benchmarks. “gRPC is 5x faster than REST” is true on a microbenchmark and irrelevant for most workloads, where the network and database dominate. Pick the protocol that makes the team productive, not the one that wins a synthetic JSON-vs-protobuf race.

104.11 The mental model

Eight points to take into Chapter 105:

  1. Contract-first for shared APIs, code-first for internal quick services. The contract is an interface between teams.
  2. REST + OpenAPI is still the default for public APIs. Universal tooling, debuggable with curl, maximum ecosystem.
  3. gRPC + protobuf for service-to-service. Fast, typed, streaming, great codegen.
  4. GraphQL for client-driven aggregation. A BFF in front of microservices, not a replacement for them.
  5. Connect bridges browsers and backends. Same proto, same service, unary + streaming, no grpc-web.
  6. buf is the modern proto toolchain. Lint, breaking-change checks, remote generation.
  7. Breaking changes are forever. Add, never remove; reserved in proto; buf breaking in CI.
  8. Version rarely. Additive changes cover 90% of the cases; URL versioning handles the rest.

Chapter 105 goes down one more level to how services are built with modern Python tooling — uv, ruff, and black — since Python is half of every ML stack.


Read it yourself

  • The OpenAPI Specification (v3.1) at spec.openapis.org. The canonical reference, dense but definitive.
  • gRPC: Up and Running by Kasun Indrasiri and Danesh Kuruppu (O’Reilly, 2020). A practical walkthrough of gRPC patterns.
  • Buf’s documentation, especially “Breaking change detection” and “Best practices” (buf.build/docs).
  • Marc-Andre Giroux, Production Ready GraphQL. Focuses on the hard operational parts of GraphQL.
  • The Connect protocol documentation (connectrpc.com/docs). Short, readable, and explains the design rationale.
  • Stripe’s API documentation and changelog. A real example of how a decade-old public API evolves without breaking.

Practice

  1. Write an OpenAPI 3.1 spec for a trivial “todo list” API with GET /todos, POST /todos, GET /todos/{id}, PATCH /todos/{id}, DELETE /todos/{id}. Generate a Go server stub with oapi-codegen.
  2. Define the same API in protobuf. Generate Go and TypeScript clients with buf generate.
  3. Name three OpenAPI changes that are safe and three that are breaking.
  4. A frontend team complains that a list view needs 12 API calls per screen. What would you propose? Why?
  5. You want to expose a gRPC service to a browser. Compare grpc-web vs Connect. Why is Connect better?
  6. A proto message has reserved 3, 4, 5; reserved "old_name";. What does this mean and why?
  7. Stretch: Set up buf breaking in a CI pipeline for a toy proto repo. Make a breaking change and verify CI fails. Make a non-breaking change and verify CI passes.