Inter-service trust patterns
"Zero trust is not a product. It is an admission that the network is hostile even when the network is yours."
Every request inside a platform crosses trust boundaries. A user request hits the edge, gets authenticated, is routed to a gateway, passed to an application, which then calls three internal services, which each read from a database and maybe a blob store, and somewhere in there a token has to prove “this call is allowed.” The naive answer — “we’re all on the same VPC, just trust the network” — was the default in 2010 and is indefensible in 2026. The mature answer is a layered system where every hop verifies identity, every call is authenticated, and the blast radius of any compromised component is bounded.
By the end of this chapter, the trust model for a modern ML platform should be clear: the five primary patterns — mTLS, signed URLs, JWT chains, the API gateway as trust boundary, Kubernetes NetworkPolicies — and how they compose into defense-in-depth; and the spectrum from “trust the network” (the old model) to zero trust (the new model), along with where most real platforms sit on it. This chapter builds on the AuthN/AuthZ split from Chapter 74 and the identity-propagation patterns from Chapter 75.
Outline:
- The trust spectrum from perimeter to zero trust.
- mTLS and the service identity problem.
- Signed URLs, the S3 pattern.
- JWT chains and token exchange.
- API gateways as the trust boundary.
- Kubernetes NetworkPolicies as the network-layer lever.
- Secrets at rest and in transit.
- Defense in depth and what it actually means.
- A reference trust architecture.
- The mental model.
81.1 The trust spectrum from perimeter to zero trust
Every platform sits somewhere on a spectrum. At one end is perimeter security: a hard outer shell, a soft interior. Once you are inside the VPC, the firewall, or the private network, you can talk to anything. At the other end is zero trust: every call, even between two pods on the same node, is authenticated, authorized, encrypted, and audited.
Real platforms sit somewhere in between. The common pattern, especially for ML platforms that grew organically, is:
- Hard perimeter: WAF, DDoS protection, rate limiting at the edge. Mature.
- Authenticated API surface: every external request has a token. Mature.
- Internal auth: services call each other with a service account or shared secret. Partial — many teams skip it.
- mTLS between services: every internal call is encrypted and the caller’s identity is cryptographically verified. Increasingly common with service meshes.
- Per-call authorization: “user X can call service Y method Z.” Rare.
- Encrypted at rest, encrypted in transit, audit log everything: the full zero-trust picture. Rare outside of highly regulated industries.
The typical ML platform is in the first three stages. The AI platforms at regulated companies (healthcare, finance, government) are in stage four or five. The push to zero trust is driven by cloud breaches, container compromises, insider threats, and the blunt observation that “assume the network is hostile” is cheaper than “try to keep the network clean.”
The zero-trust argument in one sentence: you will eventually have a compromised workload, and when that happens, you want the blast radius to be one service, not the whole platform. Every trust layer you add shrinks the blast radius.
81.2 mTLS and the service identity problem
Mutual TLS is TLS where both sides present certificates. The server certificate proves the client is talking to the right service. The client certificate proves the server is talking to the right caller. Both sides verify the other’s cert against a trusted CA, and the handshake fails if either side cannot prove its identity.
This is the network-layer answer to “how do I know which service is calling me.” Without mTLS, the answer is either “the source IP” (easily spoofed, unreliable on Kubernetes where pod IPs change constantly) or “a header in the request” (easily spoofed if the network is not protected). With mTLS, the answer is “the certificate, which is signed by a CA I trust, and which I validated during the handshake.”
The deployment pattern: every service has a cert issued by an internal CA, rotated frequently (hours to days), with a short validity period. When service A calls service B, the TLS handshake includes A’s client cert; B verifies it, extracts the identity from the Subject Alternative Name or a custom SPIFFE ID, and can then authorize the call based on that identity.
The hard part is the operational machinery. Every service needs:
- A cert that is unique to its identity.
- A rotation mechanism that swaps certs before they expire with zero downtime.
- A way to bootstrap the initial identity without chicken-and-egg problems.
- A CA that is trusted by all services and whose compromise doesn’t kill everything.
Three common machineries:
Service mesh (Istio, Linkerd, Consul Connect). The mesh runs a sidecar (Envoy or a linkerd2-proxy) alongside every pod, intercepts all outbound and inbound traffic, and does mTLS transparently. The application code is unaware. The mesh handles cert issuance, rotation, identity, and enforcement. This is the easiest operationally for greenfield deployments but adds a sidecar to every pod (CPU, memory, latency).
SPIFFE / SPIRE. A standard for workload identity (the SPIFFE ID) and an implementation (SPIRE) that issues short-lived X.509 certs or JWT-SVIDs to every workload. Applications read their identity from a local socket and use it for mTLS. More code in the application but no sidecar.
Cloud-native options. AWS App Mesh, GCP Anthos Service Mesh, Azure Service Mesh. Managed versions of Istio/Envoy tied to the cloud provider. Less operational burden, more vendor lock-in.
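A SPIFFE ID is a plain URI of the form spiffe://&lt;trust-domain&gt;/&lt;workload-path&gt;, so extracting an identity from one is just a parse. A minimal sketch — the ns/sa path layout shown is a common convention (e.g. in Kubernetes deployments), not something the spec mandates:

```python
from urllib.parse import urlsplit

def parse_spiffe_id(spiffe_id: str):
    """Split a SPIFFE ID into (trust domain, workload path).

    Example form: spiffe://prod.example.com/ns/ai-models/sa/inference
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parts.netloc, parts.path
```

A verifier that trusts the CA can then authorize on the trust domain and path, e.g. “only workloads under /ns/ai-gateway/ may call this service.”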
The gotcha every team hits: mTLS works great in steady state but bootstrap is painful. The first deploy, the first cluster, the first new environment — all have a “how do I get my first cert without trusting something else first” moment. The answer is usually a trusted bootstrap channel (a K8s service account token → exchange for SPIFFE ID, or a cloud IAM role → exchange for internal identity).
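In Python’s stdlib ssl module, the “mutual” part of mTLS is essentially one setting on the server side: require a client certificate. A configuration sketch, with cert paths left as hypothetical parameters — after the handshake, the peer identity would be read from the SAN via the socket’s getpeercert():

```python
import ssl

def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server-side context that REQUIRES a valid client cert (the mutual part).

    cert_file/key_file: this service's cert and key, issued by the internal CA.
    ca_file: the internal CA bundle used to validate client certs.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject any client without a valid cert
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(ca_file)
    return ctx

def make_mtls_client_context(cert_file=None, key_file=None, ca_file=None):
    """Client-side context that presents a client cert and validates the server.

    PROTOCOL_TLS_CLIENT already defaults to CERT_REQUIRED plus hostname checks.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(ca_file)
    return ctx
```

In a mesh deployment none of this lives in application code — the sidecar owns it — but the same two knobs (require a client cert, trust only the internal CA) are what the mesh is configuring under the hood.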
81.3 Signed URLs, the S3 pattern
A signed URL is an HTTP URL with a query string that contains a cryptographic signature proving the bearer is authorized to perform a specific action on a specific resource for a limited time. The canonical form is the AWS S3 presigned URL:
https://my-bucket.s3.amazonaws.com/documents/abc.pdf
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIA.../20260410/us-east-1/s3/aws4_request
&X-Amz-Date=20260410T120000Z
&X-Amz-Expires=3600
&X-Amz-SignedHeaders=host
&X-Amz-Signature=a7f4...
The pattern is: a service that holds credentials signs a URL with those credentials, hands the signed URL to a client, and the client uses the signed URL directly against the origin (S3) without ever seeing the service’s credentials. The signature encodes the allowed method, the resource, the expiration time, and optionally the request conditions.
Why this matters: it allows the client to do heavy data-plane work (downloading a 500 MB model, uploading a 5 GB training file) directly against the object store without proxying the bytes through the application. The application’s only role is to verify the caller is allowed and mint the signed URL. The bytes go point-to-point between client and S3.
The pattern generalizes beyond S3. Any system where a trusted authority mints capability tokens that clients can then use directly is a “signed URL” pattern. Examples:
- S3 presigned URLs (the canonical case).
- GCS signed URLs.
- Azure Blob SAS tokens.
- CloudFront signed URLs / signed cookies for CDN content.
- HLS/DASH streaming tokens for video delivery.
- Any internal system where “service X mints a short-lived token that grants permission to do Y on Z.”
The properties to preserve:
- Short expiry. Minutes, not hours. If a signed URL leaks, the damage window should be small.
- Narrow scope. One resource, one action. Not “read all of S3.”
- Server-minted, never client-forged. The signing key lives only on the server.
- No server round-trip at use time. The whole point is that the origin (S3) validates the signature without calling back to the minting service.
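The mint/verify mechanics behind these properties can be sketched with stdlib HMAC — a toy capability-token scheme, not AWS SigV4; the key, URL, and query-parameter names are hypothetical:

```python
import hashlib, hmac, time
from urllib.parse import urlencode, parse_qs, urlsplit

SIGNING_KEY = b"server-side-secret"  # hypothetical; lives only on the minting service

def mint_signed_url(base_url, method, expires_in=300, key=SIGNING_KEY):
    """Mint a URL whose query string carries an expiry and an HMAC over
    (method, path, expiry). Narrow scope: one resource, one method."""
    expires = int(time.time()) + expires_in
    path = urlsplit(base_url).path
    msg = f"{method}\n{path}\n{expires}".encode()
    sig = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return base_url + "?" + urlencode({"expires": expires, "signature": sig})

def verify_signed_url(url, method, key=SIGNING_KEY, now=None):
    """Origin-side check: recompute the HMAC and check expiry. No round-trip
    to the minting service — only the shared key is needed."""
    parts = urlsplit(url)
    q = parse_qs(parts.query)
    expires = int(q["expires"][0])
    if (now if now is not None else time.time()) > expires:
        return False  # expired: the damage window is the expiry, nothing more
    msg = f"{method}\n{parts.path}\n{expires}".encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, q["signature"][0])
```

Because the method, path, and expiry are all under the signature, changing any of them invalidates the URL — which is exactly the narrow-scope property from the list above.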
The failure mode: signed URLs end up in browser histories, server access logs, CI artifacts. Teams then extend the expiry (because “5 minutes is too short for the large files”) until the URL is effectively a bearer token for an hour. This is fine only if the origin does nothing else dangerous with the grant. The discipline is short expiry plus narrow scope.
81.4 JWT chains and token exchange
The JWT (RFC 7519) is the standard format for bearer tokens on the modern web. An external user authenticates, gets a JWT from the auth provider, and presents it on every request. The gateway verifies it and propagates something forward to internal services.
The question is: what does the gateway propagate? Three patterns, in increasing sophistication.
Pattern 1: forward the user’s JWT unchanged. The gateway verifies it, then just passes it on as-is in an Authorization header to the backend service. Each backend service independently verifies it against the same provider. Simple. The downside is every backend becomes a trust boundary for the user’s full-power token, and if one backend is compromised, the attacker has the user’s token.
Pattern 2: mint an internal JWT. The gateway verifies the user’s external token, then mints a new internal JWT signed with an internal key, containing claims like sub=user_id, tenant=org_id, scopes=..., exp=now+5min. Backend services only verify the internal JWT, not the external one. The user’s external token never crosses the gateway. This is cleaner and limits the scope of the external token to the edge.
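Pattern 2’s minting step can be sketched with stdlib primitives — a toy HS256 JWT with hypothetical claim values; production code should use a maintained JWT library and, preferably, an asymmetric algorithm so backends hold only a public key:

```python
import base64, hashlib, hmac, json, time

INTERNAL_KEY = b"gateway-internal-signing-key"  # hypothetical; shared gateway/backends

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_internal_jwt(user_id, tenant, scopes, ttl=300, key=INTERNAL_KEY):
    """Gateway side: after verifying the external token, mint a short-lived
    internal JWT. The external token never crosses this point."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({
        "sub": user_id, "tenant": tenant, "scopes": scopes,
        "exp": int(time.time()) + ttl,
    }).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_internal_jwt(token, key=INTERNAL_KEY):
    """Backend side: check signature then expiry; return claims or None."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(expected, sig):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        return None
    return claims
```

The 5-minute exp is the load-bearing detail: even if an internal JWT leaks from a backend log, it is useless shortly afterward.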
Pattern 3: token exchange (RFC 8693). When service A wants to call service B on behalf of a user, it asks the auth server to exchange the user’s token for a new, narrower token scoped specifically to “service A calling service B on behalf of this user.” The auth server returns a short-lived, tightly scoped token that A can then present to B. B verifies the token and sees both the user identity and the calling service identity.
Token exchange is the rigorous answer to identity propagation (Chapter 75). It distinguishes three identities in one call: the end user, the calling service, and the target service. Each can be checked independently at the target. The audit trail is complete: “user U, via service A, called service B at 12:34.”
The downside of token exchange is latency and complexity. Every inter-service call potentially involves a round-trip to the auth server. The mitigation is caching exchanged tokens for their lifetime (typically minutes). The caching logic must be careful: expired tokens must not be used; revocation events must invalidate the cache.
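The caching discipline described above might look like the following sketch; the class name and the safety margin are assumptions, and revocation handling is assumed to call invalidate() from an event listener:

```python
import time

class ExchangedTokenCache:
    """Cache exchanged tokens keyed by (subject, target audience),
    honouring expiry with a safety margin."""

    def __init__(self, margin=30):
        self.margin = margin  # seconds: never serve a nearly-expired token
        self._cache = {}      # (subject, audience) -> (token, exp_ts)

    def get(self, subject, audience, now=None):
        now = now if now is not None else time.time()
        entry = self._cache.get((subject, audience))
        if entry and entry[1] - self.margin > now:
            return entry[0]
        self._cache.pop((subject, audience), None)  # drop stale entries eagerly
        return None  # caller must do a fresh exchange with the auth server

    def put(self, subject, audience, token, exp_ts):
        self._cache[(subject, audience)] = (token, exp_ts)

    def invalidate(self, subject):
        """Revocation event: drop every cached token for this subject."""
        for key in [k for k in self._cache if k[0] == subject]:
            del self._cache[key]
```

The margin matters: a token that is valid when fetched from the cache but expires mid-request produces confusing intermittent 401s downstream.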
For an ML platform, a reasonable default is pattern 2 (internal JWT) with selective use of pattern 3 for sensitive cross-service calls. Pattern 1 is the starter implementation; if you stay there past seed stage, it becomes a liability.
81.5 API gateways as the trust boundary
The API gateway (Chapter 73) is the hardest trust boundary in the system. Everything outside it is assumed hostile. Everything inside it is assumed less hostile but not friendly. The gateway’s job is to make “outside is hostile” concrete: authenticate, authorize, rate-limit, sanitize, transform, and emit an enriched request to the backend with a clean internal identity attached.
The gateway owns:
- Authentication. Verify the caller’s credentials (API key, JWT, mTLS client cert, OIDC flow, etc.). Reject anything unverified before it reaches any application code.
- Authorization (coarse). Check that the authenticated principal has permission to call this API at all. Fine-grained authorization (“can user U perform operation O on resource R”) usually lives in the application, but the gateway can handle endpoint-level allowlists.
- Rate limiting and quota enforcement. Per-key, per-user, per-endpoint. See Chapter 76.
- Input validation. Size limits, content-type checks, schema validation. Reject malformed inputs at the edge.
- Sanitization. Strip headers the caller should not be able to set (X-Forwarded-*, X-User-Id, internal-only headers).
- Identity enrichment. After authenticating, attach the canonical identity (user id, tenant id, scopes) as headers or as a signed internal JWT.
- Observability. Every request gets a trace ID, request ID, and standard log line.
A critical point interviewers test on: the gateway must strip or overwrite every header that the application treats as trusted identity information. If the application reads X-User-Id from the incoming request, and the gateway does not strip it from untrusted callers, an attacker sends their own X-User-Id: admin header and gets admin. This is not hypothetical; it has happened at many real companies. The gateway is the ONLY component that gets to set identity headers. It must always overwrite them from the outside.
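The stripping rule can be made concrete with a small sketch; the header names are the hypothetical ones from the example, and the exact set would come from whatever the backends treat as trusted:

```python
# Headers the backends treat as trusted identity — the gateway must own these.
IDENTITY_HEADERS = {"x-user-id", "x-tenant-id", "x-scopes"}
FORWARDING_PREFIX = "x-forwarded-"

def sanitize_and_enrich(incoming_headers, verified_identity):
    """Drop every identity/forwarding header the caller sent, then set the
    canonical values from the gateway's own authentication step."""
    clean = {
        k: v for k, v in incoming_headers.items()
        if k.lower() not in IDENTITY_HEADERS
        and not k.lower().startswith(FORWARDING_PREFIX)
    }
    # Only now, after stripping, attach the identity the gateway verified.
    clean["X-User-Id"] = verified_identity["user_id"]
    clean["X-Tenant-Id"] = verified_identity["tenant"]
    return clean
```

The order is the point: strip first, then set. A gateway that only sets the headers when they are absent still passes the attacker’s X-User-Id: admin straight through.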
The second critical point: the gateway is a single point in the trust model, but there are usually multiple gateways in practice — a public edge gateway, an internal API gateway, maybe a regional replica, a mesh ingress. Each one has to be hardened. The trust model is only as strong as the weakest gateway.
81.6 Kubernetes NetworkPolicies
Kubernetes NetworkPolicy resources let you declare pod-level firewall rules. “Pods with label role=frontend may talk to pods with label role=backend on port 8080. Nothing else may talk to backend pods.” This is Layer 3/4 (IP and port level) and is enforced by the CNI plugin (Calico, Cilium, etc.) via iptables or eBPF.
Why it matters: a NetworkPolicy is the last-resort barrier when application-layer auth fails. If the gateway forgets to strip a header, if mTLS is misconfigured, if a service has a bug — the NetworkPolicy still enforces “only these pods can even reach this port.” It is defense in depth at the network layer.
A minimal example:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: inference-ingress
namespace: ai-models
spec:
podSelector:
matchLabels:
app: llama-3-70b
policyTypes: [Ingress]
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ai-gateway
podSelector:
matchLabels:
role: gateway
ports:
- protocol: TCP
port: 8000
This says “pods labeled app: llama-3-70b accept ingress only from pods in the ai-gateway namespace labeled role: gateway, on port 8000.” Anything else is dropped by the CNI.
The operational reality is that most clusters do not have NetworkPolicies. They are tedious to maintain, easy to get wrong, and the enforcement is CNI-dependent. Teams that do deploy them typically:
- Default-deny at the namespace level (egress: [], ingress: [] on all pods).
- Selectively allow the actual traffic flows.
- Use higher-level tools (Cilium Network Policies, Istio Authorization Policies) for Layer 7 policies.
- Test them carefully in staging because a wrong policy will silently break things.
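The default-deny baseline can be written as a single policy per namespace; a sketch, assuming the same ai-models namespace as the earlier example:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ai-models
spec:
  podSelector: {}            # empty selector matches every pod in the namespace
  policyTypes: [Ingress, Egress]
  # No ingress or egress rules listed: all traffic is denied
  # until an explicit allow policy (like inference-ingress above) matches.
```

One practical consequence of default-deny egress: pods lose DNS resolution unless you also add an explicit allow for DNS traffic to the cluster’s DNS service, which is one of the first things a staging test of the policy will surface.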
NetworkPolicy is not a replacement for mTLS or application-layer auth. It is a supplement. The three layers compose: NetworkPolicy says “can this packet even arrive at this port,” mTLS says “is the sender who they claim to be,” application auth says “is this caller allowed to perform this operation.” Each layer blocks different classes of attack.
81.7 Secrets at rest and in transit
Any trust model is built on secrets — signing keys, client certs, database passwords, API tokens. The secret-management story is a load-bearing piece of the trust architecture and routinely done badly.
The minimum bar:
- Never commit secrets to source. Git history is forever; scan tools catch some but not all. Use a secret manager.
- Rotate regularly. Every secret has an expiry. Long-lived secrets (API keys that last years) are an anti-pattern.
- Short-lived credentials where possible. Rather than “API key good for a year,” mint a new credential on every workload start with a lifetime of hours.
- Encrypt at rest. The secret store encrypts with a KMS-backed key. The KMS key is the root of trust.
- Encrypt in transit. All secret fetches are TLS (and ideally mTLS).
- Audit access. Every secret read is logged with caller, time, secret name.
The common secret managers: HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, Kubernetes Secrets (with an external KMS for encryption). The choice matters less than the discipline.
For Kubernetes specifically, the built-in Secret object is base64-encoded, not encrypted, by default. The correct pattern is:
- Enable KMS encryption at rest in the etcd configuration (so Secrets are encrypted in the underlying storage).
- Mount Secrets as files or environment variables, not both in inconsistent ways.
- Limit which service accounts can read which Secrets (RBAC).
- Or use an external secret operator (External Secrets Operator, Sealed Secrets, Vault Agent Injector) that keeps secrets in a dedicated store and syncs them into pods on demand.
The failure mode to avoid: a single “secret of doom” that is shared across services and rotated never. If every service has the same master password and one of them leaks it, everything is compromised at once. The discipline is per-service credentials, rotated frequently, scoped narrowly.
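The per-service, short-lived, audited discipline can be illustrated with a toy in-memory store — a stand-in for Vault or a cloud secret manager, not an implementation of either:

```python
import secrets, time

class SecretStore:
    """Toy secret store: per-service short-lived credentials plus an audit
    record for every read."""

    def __init__(self):
        self._secrets = {}   # (service, name) -> {"value": ..., "expires_at": ...}
        self.audit_log = []  # every read: caller, secret name, timestamp

    def mint(self, service, name, ttl=3600):
        """Mint a fresh credential scoped to one service, with a hard expiry."""
        self._secrets[(service, name)] = {
            "value": secrets.token_urlsafe(32),  # unique per service, never shared
            "expires_at": time.time() + ttl,
        }

    def read(self, service, name, now=None):
        now = now if now is not None else time.time()
        self.audit_log.append({"caller": service, "secret": name, "time": now})
        entry = self._secrets.get((service, name))
        if entry is None or now >= entry["expires_at"]:
            return None  # expired or missing: force a re-mint, never reuse
        return entry["value"]
```

The two properties worth noticing: every read leaves an audit record even when it fails, and an expired credential is unusable rather than silently extended — the opposite of the “secret of doom” failure mode.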
81.8 Defense in depth
Defense in depth is the principle that no single control is trusted to handle all threats. Instead, multiple overlapping controls compose so that defeating any one does not open the whole system.
For inter-service trust, the layers are roughly:
- Network layer: NetworkPolicies, VPC security groups, subnet isolation. Blocks “can the packet arrive at all.”
- Transport layer: mTLS. Blocks “is the sender who they claim to be, and is the channel encrypted.”
- Application layer: JWT / token verification, authz checks. Blocks “is this caller allowed to perform this operation.”
- Data layer: encryption at rest, row-level security, per-tenant isolation. Blocks “even if the caller gets through, can they read data that isn’t theirs.”
- Audit layer: structured logs of every auth decision, every secret fetch, every admin operation. Does not block but enables detection and forensics.
- Monitoring layer: anomaly detection, rate spikes, unusual access patterns. Catches what the other layers miss.
The design question is always “if this layer is compromised, what is the next line of defense.” A healthy platform has an answer at every layer. An unhealthy platform has “the firewall” as the answer and nothing behind it.
A concrete mental exercise: imagine an attacker has achieved RCE inside one of your inference pods. What can they do? A perimeter-trust architecture answers: “everything, they have free run of the network.” A defense-in-depth architecture answers: “they can reach exactly the services NetworkPolicy allows, they need a valid cert to be accepted by mTLS, they need a valid token to get past application auth, they can only see data their pod’s identity is authorized to see, and their access is logged and alerted on.” The gap between those two answers is the value of every additional layer.
81.9 A reference trust architecture
Tying it together, a reasonable reference model for an ML platform:
The trust layers:
- Public edge: TLS termination, WAF, DDoS, bot detection.
- Public API gateway: authenticates external tokens, rate-limits, sanitizes headers, mints a short-lived internal JWT, adds a trace id.
- Service mesh: mTLS between all internal services with certs from SPIRE, sidecar enforcement.
- Internal services: verify the internal JWT, check per-operation authorization, enforce tenant isolation.
- NetworkPolicies: default-deny, explicit allowlists between namespaces.
- Data plane: signed URLs for large blobs (models, training data), mTLS for database connections, row-level security where applicable.
- Secrets: short-lived credentials from Vault or equivalent, KMS-backed, audited.
- Audit: structured logs through the entire chain, correlation by trace id, alerting on anomalies.
This is not exhaustive, but it is a realistic target for a mid-maturity ML platform. A startup platform hits layers 1–3 in the first year. A series-B platform adds mesh mTLS and NetworkPolicies. A regulated-industry platform adds the rest.
81.10 The mental model
Eight points to take into Chapter 82:
- The network is hostile even when it is yours. Assume compromise; design for blast radius.
- mTLS gives services cryptographic identity. The easiest operational path is a service mesh (Istio/Linkerd).
- Signed URLs let clients do data-plane work directly against origins without proxying bytes through the application.
- Token exchange (RFC 8693) is the rigorous answer to identity propagation in multi-service calls. Internal-JWT minting at the gateway is the pragmatic middle ground.
- The API gateway is the only component that gets to set identity headers. Strip untrusted inputs at the edge.
- Kubernetes NetworkPolicies are defense-in-depth at the network layer, not a replacement for application auth.
- Secrets must be short-lived, scoped narrowly, rotated frequently, and audited. A “master password” is an anti-pattern.
- Defense in depth means multiple layers compose. No single control is trusted; each one blocks a different class of attack.
In Chapter 82, the operations service pattern: how the durable “operation” becomes its own system of record, independent of the workflow engine underneath.
Read it yourself
- RFC 8446 (TLS 1.3) and RFC 5246 (TLS 1.2) — the handshake the whole mTLS story rests on.
- The SPIFFE specification (spiffe.io/docs) and the SPIRE server/agent architecture docs.
- RFC 8693, OAuth 2.0 Token Exchange — the standard for the token-exchange pattern.
- AWS S3 presigned URL documentation and the Signature V4 algorithm reference.
- The Istio Security documentation, especially the mTLS and AuthorizationPolicy pages.
- The Kubernetes NetworkPolicy reference and the Cilium / Calico policy docs.
- NIST SP 800-207, “Zero Trust Architecture” — the government-blessed zero trust definition.
Practice
- Draw the full trust path for a request to an LLM completion endpoint: user → edge → gateway → inference service. Mark what is verified at each hop and by whom.
- A service receives an X-User-Id header and trusts it. Construct the attack in concrete terms.
- Compare mTLS via Istio vs via SPIRE without a mesh. What does each give you that the other doesn’t?
- Write an S3 presigned URL generator in Python. Verify the signature is correct by testing against S3. What is the minimum expiry you would ship?
- For a workflow engine activity that needs to call an external API on the user’s behalf, write the token-exchange sequence: who calls whom, what claims flow where, what is cached and for how long?
- Write a default-deny Kubernetes NetworkPolicy for a namespace, then add the minimum allowlist to let an inference pod serve traffic from a gateway pod and call a Redis pod.
- Stretch: Build a tiny mutual-auth gRPC server where both client and server present SPIFFE-style X.509 certs, verify each other, and the server logs the caller’s SPIFFE ID. Use go-spiffe or the Python equivalent.