Part IX · Build, Deploy, Operate
Chapter 112

Edge ingress: Cloudflare Tunnels, Ingress controllers, service meshes

"The internet is not a fact of nature. Every packet that reaches your pod has been shaped, filtered, and authenticated by a stack of proxies you almost certainly did not write."

Every request to a production service arrives through a chain of proxies. A user types a URL; DNS resolves it to an anycast IP; the closest edge PoP of a CDN takes the TCP connection; TLS is terminated there; HTTP/2 or HTTP/3 is parsed; WAF rules run; the request is forwarded over another TLS connection to an origin load balancer; the origin LB picks a backend cluster; the cluster’s ingress controller picks a pod; the pod’s service mesh sidecar intercepts the call; the application finally sees the request. By the time the HTTP handler runs, the request has been touched by five or six independent proxies, each with its own configuration, its own failure modes, and its own observability story.

This chain is the edge ingress, and understanding it end-to-end is the difference between debugging a “503 from somewhere” incident in minutes versus hours. The chain is also where the trust boundary of the system lives — authentication, rate limiting, DDoS protection, TLS termination, header sanitization — and misunderstanding where the trust boundary sits is how “internal-only” services end up exposed to the public internet.

This chapter walks the chain. Layer-4 versus layer-7 concerns. Kubernetes Ingress controllers and the newer Gateway API. Service meshes as the intra-cluster version of the same problem. Cloudflare Tunnels and similar “no open ports” alternatives. And the trust boundary discipline that makes it all safe.

Outline:

  1. The request chain from client to pod.
  2. Layer-4 vs layer-7 ingress.
  3. Kubernetes Ingress controllers — nginx, Traefik, HAProxy, Envoy Gateway.
  4. The Gateway API as the modern replacement.
  5. Service meshes — Istio, Linkerd, Cilium.
  6. Cloudflare Tunnels and Tailscale Funnel.
  7. The trust boundary at the edge.
  8. TLS termination and certificate management.
  9. Rate limiting, WAF, and DDoS.
  10. Observability at the edge.
  11. The mental model.

112.1 The request chain from client to pod

Trace the full chain of one request. A user in Berlin types api.example.com/v1/chat/completions into an LLM client. What happens:

  1. DNS resolution. The browser asks the local resolver for api.example.com. The resolver eventually hits an authoritative nameserver, which returns an anycast IP managed by a CDN (Cloudflare, Fastly, CloudFront). The user’s machine connects to that IP; BGP routes the TCP SYN to the nearest edge PoP (probably Frankfurt for a Berlin user).

  2. TLS termination at the edge. The user’s TLS handshake completes with the CDN edge, not with the origin. The CDN presents the certificate for api.example.com. Inside the TLS session, HTTP/2 or HTTP/3 is negotiated.

  3. Edge filtering. The CDN runs WAF rules, rate limits, bot detection, and DDoS protection. Requests that fail any of these get a 4xx or get black-holed. Requests that pass continue to the origin.

  4. Origin forward. The CDN opens a connection to the origin (or reuses a pooled one). The connection is another TLS session, with the origin’s certificate. The CDN passes the request through, often adding headers like CF-Connecting-IP, X-Forwarded-For, X-Forwarded-Proto.

  5. Origin load balancer. The origin is usually a cloud load balancer (AWS ALB/NLB, GCP Load Balancer, Azure Application Gateway). The LB selects a backend pool (which might be a Kubernetes NodePort on a set of nodes, or a target group registered by a Kubernetes controller).

  6. Kubernetes ingress. Inside the cluster, an ingress controller (nginx, Traefik, Envoy Gateway, etc.) receives the request, matches it against Ingress or HTTPRoute rules, and forwards to the appropriate backend Service.

  7. Service → Pod. The Service is a stable virtual IP; kube-proxy (or Cilium’s eBPF data path) rewrites the packet to target a specific pod IP.

  8. Service mesh sidecar. If Istio or Linkerd is installed, the request is intercepted by a sidecar (Envoy for Istio, linkerd2-proxy for Linkerd) before reaching the application container. The sidecar enforces mTLS, retries, timeouts, authz policies, and emits telemetry.

  9. Application. Finally, the HTTP handler in the application code receives the request.

Each hop terminates and re-establishes some layer of the stack. TLS terminates at the CDN, re-originates to the origin, terminates at the LB, re-originates to the ingress, and depending on mesh configuration, might re-originate again pod-to-pod. Headers are added and sometimes stripped at each hop. Observability context (trace IDs, request IDs) has to propagate explicitly or it breaks.

The key takeaway: there is no single proxy. The “ingress” is a pipeline, and understanding the pipeline is what makes production debugging tractable.

Request chain from client to pod: CDN edge (TLS + WAF), origin load balancer (L4/L7), Kubernetes ingress controller (nginx/Envoy), service mesh sidecar (mTLS + policy), and finally the application handler — each hop adds latency and may modify headers. Each hop terminates and re-establishes TLS; headers accumulate (X-Forwarded-For, trace ID); debugging means knowing which hop failed.
A request touches five or six independent proxies before reaching the application handler — understanding each hop's job is how a "503 from somewhere" becomes a "503 from the ingress controller's upstream timeout" in minutes rather than hours.

112.2 Layer-4 vs layer-7 ingress

A fundamental split in how proxies handle traffic.

Layer 4 (transport layer) ingress operates on TCP/UDP connections. It knows the source IP, destination IP, source port, destination port. It does not parse the bytes on the wire — it just forwards them. SNI-based routing (reading the TLS Server Name Indication field) is a borderline case: technically layer 4, because the proxy doesn’t terminate TLS, but it still peeks at the first bytes to route based on the hostname.

Examples of L4 proxies: AWS Network Load Balancer (NLB), HAProxy in TCP mode, kube-proxy itself. They’re fast (no parsing, no TLS, no buffering), they preserve the client’s source IP (when configured), and they handle arbitrary protocols (not just HTTP).

The limit: you cannot do HTTP-specific things. No URL routing, no HTTP header manipulation, no HTTP rate limiting, no WAF rules (other than TCP-level DDoS mitigation).

Layer 7 (application layer) ingress parses the full HTTP request. It reads the method, path, headers, body (optionally), and can make routing and filtering decisions based on any of it. URL-based routing (/api/v1/users → users-service) is layer 7. Header-based rate limiting is layer 7. WAF rules are layer 7.

Examples of L7 proxies: AWS Application Load Balancer (ALB), nginx, Envoy, Traefik, HAProxy in HTTP mode, Cloudflare’s edge. They’re more expensive per request (parsing, TLS termination, sometimes buffering) but dramatically more powerful.

For Kubernetes-facing services, ingress is almost always layer 7 because HTTP is the dominant protocol and URL routing is a core requirement. Layer 4 ingress is used for non-HTTP services (gRPC over HTTP/2 is handled at L7, but other protocols like Postgres, Redis, raw TCP are L4) and for passthrough TLS when the application wants to terminate TLS itself.

The cloud LB products reflect this split: ALB/HTTPS LB for L7, NLB/TCP LB for L4. Both are legitimate, and production clusters often use both in different places.
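The L4 path needs no HTTP machinery at all. A minimal sketch of exposing a non-HTTP service (Postgres here) through an L4 cloud load balancer via a Service of type LoadBalancer — the service name is illustrative, and the AWS annotation shown is the legacy in-tree form, so check your cloud provider's controller docs for the current spelling:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-lb
  annotations:
    # Provider-specific hint (AWS shown): provision an NLB (L4) rather than a classic ELB.
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: postgres
  ports:
    - port: 5432        # port exposed by the LB
      targetPort: 5432  # port on the pod
      protocol: TCP
```

The LB forwards raw TCP byte streams; no HTTP parsing, no path routing, and TLS (if any) is the application's problem.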

112.3 Kubernetes Ingress controllers

The classic Kubernetes resource for L7 ingress is Ingress. It’s a declarative description of HTTP routing rules: hostnames, paths, backend services, TLS secret references. An Ingress controller is a component running in the cluster that reads Ingress objects and programs a proxy (nginx, Envoy, etc.) to actually handle the traffic.

A minimal Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com]
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: api-v1
                port: { number: 80 }

The controllers that matter:

ingress-nginx (the Kubernetes community project, not to be confused with NGINX Inc.’s commercial nginx-ingress). The most widely deployed controller. Mature, stable, enormous ecosystem. Configured heavily via annotations, which is both a strength (flexibility) and a weakness (the annotation zoo is huge, poorly typed, and inconsistent). Under the hood, each config change triggers an nginx reload, which can be disruptive at high request rates.
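A taste of the annotation zoo — these four annotations are real ingress-nginx annotation names, but behavior and defaults vary by controller version, so verify against the ingress-nginx annotations reference for your release:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"   # seconds to wait for the upstream
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"     # max request body size
    nginx.ingress.kubernetes.io/limit-rps: "20"            # per-client-IP requests per second
    nginx.ingress.kubernetes.io/ssl-redirect: "true"       # force HTTP -> HTTPS
```

Every one of these is a string, none is validated by the API server beyond being a string, and none is portable to another controller — which is precisely the problem the Gateway API addresses.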

Traefik. Born cloud-native. Auto-discovers routes from labels or CRDs, supports many providers (K8s, Docker, Consul, AWS), has a modern UI and good dashboards. Configuration is often cleaner than ingress-nginx. Smaller community. Some shops love it, some find it opinionated in annoying ways.

HAProxy. The old-school option. Extremely fast in raw requests-per-second throughput, battle-tested for decades, excellent observability. Configuration is more manual than the modern options. Still chosen by teams that prioritize raw performance and stability.

Envoy-based ingress (Envoy Gateway, Contour, Emissary). Envoy is the proxy that Istio and many L7 load balancers share under the hood (Linkerd is the notable exception, with its own Rust proxy). Envoy-based ingress controllers give you Envoy’s configuration model (xDS), which is richer than any annotation-based approach, at the cost of more complexity. Envoy Gateway is the emerging standard implementation for the Gateway API (§112.4).

Cloud-managed controllers. AWS Load Balancer Controller (creates and manages ALBs from Ingress or Gateway resources), GKE Ingress, AKS Application Gateway Ingress Controller. These program the cloud LB directly, so traffic flows through the managed LB rather than a cluster-internal proxy. Simpler in some ways — you don’t run the proxy yourself — but less flexible than a cluster-internal option.
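A cloud-managed example, sketched for the AWS Load Balancer Controller — the annotation names are from its documentation, the hostnames and service names are placeholders. The controller watches this Ingress and provisions an internet-facing ALB from it; no in-cluster proxy runs at all:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # ALB targets pod IPs directly
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-v1
                port: { number: 80 }
```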

For a typical production cluster, ingress-nginx is the default choice. It’s stable, widely supported, and the ecosystem of guides and troubleshooting content is massive. Teams with a sharper need for programmability or Gateway API support reach for Envoy Gateway or Contour. Teams on a specific cloud that want the simplest story reach for the cloud-managed controller.

112.4 The Gateway API

The Ingress API dates from 2015 and is showing its age. It has four main limitations:

  • The annotation zoo. Anything beyond the bare routing is configured via controller-specific annotations, which are unportable.
  • Limited protocol support. Ingress is HTTP-only in practice; gRPC, TCP, and UDP require workarounds.
  • Weak separation of concerns. The cluster operator, the app developer, and the security team all want different levels of access to the Ingress resource, but the resource has no role separation.
  • Resource duplication. Each service owner has to copy the same TLS config, the same listener, the same hostname boilerplate.

The Gateway API (CRDs under gateway.networking.k8s.io) is the replacement. It splits the concerns into three resources:

  1. GatewayClass — a cluster-wide resource, owned by the cluster operator, that says “here is a gateway implementation (Envoy Gateway, Contour, etc.).”
  2. Gateway — a namespace-scoped (but usually cluster-admin-owned) resource that says “here is a listener on this hostname, using this certificate, bound to this GatewayClass.”
  3. HTTPRoute (and TCPRoute, GRPCRoute, TLSRoute, UDPRoute) — a namespace-scoped resource, owned by the app team, that says “for requests matching this path/header/etc., route to this Service.”
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: api-gw
  namespace: gateway-system
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: https
      hostname: "*.example.com"
      port: 443
      protocol: HTTPS
      tls:
        certificateRefs:
          - name: wildcard-cert
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-v1
  namespace: api
spec:
  parentRefs:
    - name: api-gw
      namespace: gateway-system
  hostnames: [api.example.com]
  rules:
    - matches:
        - path: { type: PathPrefix, value: /v1 }
      backendRefs:
        - name: api-v1
          port: 80

Notice the split: the cluster operator owns Gateway (with the TLS cert and the listener), the app team owns HTTPRoute (with the routing rules) in their own namespace. The app team cannot accidentally break the TLS config; the cluster operator doesn’t have to coordinate on every new route.

The Gateway API is v1 stable as of 2023-2024 and is now the preferred model for new clusters. Most of the major ingress controllers (Envoy Gateway, Contour, ingress-nginx, Traefik) support it with varying maturity. Migration from Ingress is incremental — you can run both side by side during the transition.

Gateway API separates cluster-operator concerns (GatewayClass + Gateway with TLS) from app-team concerns (HTTPRoute with path rules) — an app team can add routes without touching the TLS configuration.

The features that Ingress never had and Gateway API has natively: header- and query-param-based routing, request and response header modification, URL rewrite and redirect rules, traffic splitting (for canary and blue/green), explicit gRPC route support, cross-namespace referencing via ReferenceGrant. All standardized, all portable across implementations.
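Traffic splitting is worth a concrete sketch, since it was impossible in portable Ingress. Assuming two Services, api-stable and api-canary (hypothetical names), a weighted HTTPRoute sends 5% of traffic to the canary:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-split
  namespace: api
spec:
  parentRefs:
    - name: api-gw
      namespace: gateway-system
  hostnames: [api.example.com]
  rules:
    - backendRefs:
        # Weights are relative; 95:5 means ~5% of requests hit the canary.
        - name: api-stable
          port: 80
          weight: 95
        - name: api-canary
          port: 80
          weight: 5
```

Promoting the canary is a weight change in a declarative resource — no controller-specific annotations, and the same manifest works on any conformant implementation.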

For any new cluster, start with Gateway API. For existing clusters on Ingress, migration is a quarter-long project and worth doing on the next natural refresh.

112.5 Service meshes

A service mesh is the intra-cluster version of the ingress problem. Instead of “how does traffic enter the cluster,” it’s “how do services inside the cluster call each other.” The mesh inserts a proxy (a sidecar container, or in some meshes a node-level agent) between every service, and that proxy handles the complexity: mTLS, retries, timeouts, circuit breakers, observability, policy.

Istio is the incumbent. It uses Envoy sidecars injected into every pod, a control plane (istiod) that configures the sidecars via xDS, and a rich policy/routing model. Strengths: enormous feature set, active development, works everywhere. Weaknesses: operationally heavy — many moving parts, upgrades are risky, and the sidecar-per-pod model roughly doubles pod count and adds latency per hop. Istio added an “ambient” mode (no sidecar, node-level proxies) to address the weight concern; ambient is still newer than the sidecar mode but is the recommended direction for new installs.

Linkerd is the opposite design philosophy. It prioritizes operational simplicity. Uses a custom Rust proxy (linkerd2-proxy) that is lighter than Envoy, a simple control plane, and a much smaller feature set. Install is one command, upgrades are painless, and the metrics story is excellent out of the box. The limit: fewer features than Istio. If you need complex traffic policy, header-based routing, or integration with external CAs, Linkerd may not cover it.

Cilium Service Mesh is the eBPF-native option. Cilium is already a popular CNI (Container Network Interface) that uses eBPF for cluster networking, and its service mesh extends that with L7 features. The pitch: no sidecar, minimal latency, integrated with the CNI so observability is unified. Younger than Istio and Linkerd, but maturing fast. For teams already using Cilium as their CNI, the service mesh is a natural extension.

Consul Connect (HashiCorp) and Kuma (CNCF) are also in the space, with smaller footprints.

What a mesh buys you:

  • mTLS everywhere. Service-to-service traffic is encrypted and authenticated without the application having to handle certificates.
  • Standardized retries and timeouts. No more “every service implements retries differently.”
  • Traffic policy. Canary deployments, blue/green, fault injection, traffic shifting — all as declarative CRDs.
  • Observability. Per-hop metrics, distributed tracing out of the box.
  • Authorization policy. “Service A can call /users on Service B but not /admin on Service C” expressed declaratively, enforced by the sidecar.
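The authorization-policy bullet can be made concrete with an Istio sketch — field names follow Istio’s security API (security.istio.io/v1beta1), and the namespaces, labels, and service account are illustrative:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: users-readonly
  namespace: users
spec:
  selector:
    matchLabels:
      app: users            # applies to the users service's pods
  action: ALLOW
  rules:
    - from:
        - source:
            # mTLS identity of the allowed caller (SPIFFE-style principal)
            principals: ["cluster.local/ns/frontend/sa/frontend"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/users*"]   # /users is allowed; /admin is not matched, so it's denied
```

The sidecar enforces this on every request; the application never sees the denied traffic.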

What a mesh costs:

  • Operational complexity. Another control plane, another CRD set, another upgrade surface.
  • Per-hop latency. Sidecars typically add 0.5–2 ms per hop. For most services this is negligible; for latency-sensitive hot paths (inference serving at p99 < 50 ms) it’s meaningful.
  • Resource overhead. Sidecars consume CPU and memory per pod.
  • Debugging complexity. When a request fails, is it the app, the sidecar, the mesh config, or the network?

When to adopt a mesh. If you have >50 services that talk to each other, if you need mTLS without retrofitting every app, if you need uniform traffic policies for canary deploys — adopt a mesh. If you have 5 services and a simple architecture, don’t. The operational cost isn’t worth it at small scale.

Which mesh. Linkerd for simplicity, Istio for features, Cilium for eBPF-native integration. For most teams getting started, Linkerd is the sane default.

112.6 Cloudflare Tunnels and Tailscale Funnel

An entirely different approach to ingress: don’t open ports at all. Instead, the origin makes an outbound connection to a central edge, and traffic flows back through that connection.

Cloudflare Tunnel (formerly Argo Tunnel) runs a small daemon (cloudflared) next to your service. cloudflared connects outbound to Cloudflare’s edge and holds an HTTP/2 or QUIC session open. Incoming public requests land at Cloudflare’s edge, Cloudflare looks up the tunnel for the hostname, and forwards the request over the persistent outbound connection to cloudflared, which then forwards to the local service.
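The cloudflared side is a small config file. A sketch of a config.yml, following the shape in Cloudflare’s tunnel documentation — the tunnel name and file paths are placeholders:

```yaml
tunnel: example-tunnel
credentials-file: /etc/cloudflared/example-tunnel.json
ingress:
  # Rules are evaluated top to bottom: route this hostname to a local service.
  - hostname: api.example.com
    service: http://localhost:8080
  # A catch-all rule is required; reject anything that didn't match a hostname.
  - service: http_status:404
```

Note the direction: this file configures which local services the tunnel will forward to — there is no listening socket for the internet to reach.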

The benefit: no inbound firewall rules, no public IP, no exposed ports. The origin is completely unreachable from the internet directly. An attacker can only reach the origin through Cloudflare’s edge, which enforces authentication (Cloudflare Access), WAF rules, rate limits, and so on. The attack surface is drastically smaller.

Tailscale Funnel is similar but built on Tailscale’s WireGuard-based overlay. You run Tailscale on the origin, enable Funnel on the hostname, and Tailscale’s edge forwards public traffic through the overlay to your node. Smaller surface (it’s newer and has fewer features than Cloudflare Tunnel) but the same model.

These tools are genuinely transformative for a certain class of deployments: self-hosted services, dev environments, homelabs, private platforms that need occasional public exposure, internal tools that shouldn’t be on the public internet. A Cloudflare Tunnel takes about 5 minutes to set up, requires no firewall configuration, and gives you an HTTPS URL with a valid certificate.

For high-scale production traffic (millions of QPS), Cloudflare Tunnel is usable but you’d normally pair it with Cloudflare’s other edge services (WAF, caching, etc.) — the tunnel is the last hop, not the whole story. For small-scale production and internal tools, it’s often the cleanest ingress story.

The limit: you trust Cloudflare (or Tailscale) with the edge. For some deployments this is fine — you’re already trusting them for CDN services anyway. For deployments that need a specific CDN or edge policy, it’s a constraint.

112.7 The trust boundary at the edge

The trust boundary is the question of which proxy is the last one that enforces authentication and authorization before the request reaches application code. Getting this wrong is how services intended for internal use end up exposed to the internet.

Three common patterns, each with a different trust boundary:

Pattern 1: Public traffic, edge WAF + origin LB + ingress controller. The trust boundary is the edge (Cloudflare / WAF). The origin LB accepts traffic from any source (limited by the cloud security group, usually to the CDN’s IP ranges). The ingress controller trusts traffic from the LB. The pod trusts traffic from the ingress controller. Authorization is at the edge (OAuth, JWT) and re-validated in the application.

The risk: if the origin LB’s security group is misconfigured to allow all IPs, the WAF is bypassable — an attacker hits the origin directly and the protection disappears. The mitigation: lock the security group to the CDN’s published IP ranges (Cloudflare publishes these, and cloud firewalls can consume them as IP lists). And/or: use mTLS between the CDN and the origin, with the CDN holding the client cert, so direct connections fail.

Pattern 2: Internal-only, cluster-internal ingress, no public exposure. The trust boundary is the VPC / cluster network. Traffic from outside the VPC simply cannot reach the pod. Authentication within the VPC is typically mTLS via the service mesh. The risk is lateral movement: if an attacker gets a foothold in the VPC (compromised pod, stolen credentials, SSRF into the internal network), every “internal-only” service is reachable.

The mitigation: treat the VPC as untrusted (zero-trust networking). Every service authenticates every caller, even inside the cluster. The mesh’s mTLS and authz policies are the enforcement layer.
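A common starting point for that posture, at the network layer, is a default-deny policy — this uses the standard networking.k8s.io API (the namespace name is a placeholder), with the mesh’s mTLS and authz policies layered on top:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: internal
spec:
  podSelector: {}         # empty selector = every pod in the namespace
  policyTypes: [Ingress]  # Ingress listed with no rules = all inbound traffic denied
```

With this in place, every allowed caller has to be named in an explicit allow policy, which is the zero-trust discipline made mechanical.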

Pattern 3: Public traffic via Cloudflare Tunnel. The origin has no inbound ports open at all. The trust boundary is Cloudflare’s edge, and Cloudflare Access enforces authentication (OAuth, mTLS, device posture) before any traffic reaches the tunnel. The risk surface is much smaller: no open port means no accidental exposure, no security group misconfiguration, no cloud LB surprise.

The most common mistake across all patterns: thinking a service is internal when it’s actually exposed. Some ways this happens:

  • A Service of type LoadBalancer creates a public cloud LB by default on most clouds, exposing whatever’s behind it to the internet.
  • An Ingress is added for “dev testing” and never removed, giving the internal service a public URL.
  • A security group rule allows all IPs “temporarily” and is never tightened.
  • A new ingress controller is deployed with default settings that expose it.

The defense is audit. Periodically, enumerate every endpoint reachable from the public internet and confirm each one is supposed to be. Tools like nmap against the origin’s IP ranges, amass for subdomain discovery, the cloud provider’s own “public resources” audit features. If you don’t know what’s public, you don’t know what your attack surface is.

112.8 TLS termination and certificate management

Where TLS terminates in the chain is a choice.

Termination at the CDN. The CDN holds the certificate, terminates the handshake, and reconnects to the origin over a separate TLS session. The origin sees requests as HTTP (or as TLS to an origin cert). This is the standard pattern for public-facing services using a CDN.

Termination at the origin LB. The cloud LB holds the certificate. The LB forwards to the backends over HTTP or a separate TLS session. Common when there’s no CDN and the LB is the public entry point.

Termination at the ingress controller. The ingress controller holds the certificate. The LB passes TLS through (SNI-based routing) or forwards unencrypted traffic from a private network. Common when the ingress controller is the public entry point (fewer hops, but more operational burden).

Termination at the pod. The pod handles TLS itself, using a cert mounted via a secret or CSI driver. Rare in practice — it moves the cert-management burden to each service. Sometimes required when end-to-end encryption is a compliance requirement.

mTLS in the mesh. Service-to-service TLS inside the cluster is a separate layer; the mesh handles it transparently. Termination-at-the-pod for ingress can coexist with mTLS in the mesh.

Certificate management is almost universally handled by cert-manager on Kubernetes. cert-manager is a controller that watches Certificate CRDs, requests certificates from a configured issuer (Let’s Encrypt, a private CA, Vault PKI, AWS Certificate Manager), and stores them as Kubernetes secrets that the ingress controller or pod can mount. Renewals are automatic; cert-manager renews well before expiry and updates the secret.

The common Let’s Encrypt pattern:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: api
spec:
  secretName: api-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.example.com

The letsencrypt-prod ClusterIssuer handles the ACME dance with Let’s Encrypt (HTTP-01 or DNS-01 challenge). The resulting cert lands in the api-tls secret; the ingress reference is by secret name. Renewals happen transparently ~30 days before expiry. It’s one of the quiet infrastructure wins of the past decade.

A failure mode to watch for: Let’s Encrypt rate limits. The public LE service allows roughly 50 certificates per registered domain per week. If a misconfigured cert-manager retries in a loop, you can blow through the limit and be locked out for up to a week. Use the LE staging environment during initial setup and testing; only switch to letsencrypt-prod once the pipeline is known-good.
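A sketch of the issuer itself, following cert-manager’s ACME issuer docs — the email is a placeholder, and the solver field names (ingressClassName vs. the older class) vary by cert-manager version, so check your release. This one points at the LE staging endpoint, which is the safe default while the pipeline is being proven out:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # Staging endpoint: untrusted certs, but generous rate limits for testing.
    # Swap to https://acme-v02.api.letsencrypt.org/directory once known-good.
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key   # ACME account key storage
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx   # HTTP-01 challenge served via the ingress
```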

112.9 Rate limiting, WAF, and DDoS

The edge is where hostile traffic gets filtered.

Rate limiting. Caps on requests per IP, per user, per API key, per route. Prevents abuse and absorbs small floods. Implemented at the CDN, the origin LB, the ingress controller, or all three. A layered approach is normal: the CDN does coarse per-IP limits, the ingress or API gateway does per-user and per-route limits, and the application might do business-logic limits (N calls per user per month).

The right layer depends on the dimension. Per-IP limits belong at the edge because they’re cheap to enforce and keep malicious traffic off the origin. Per-user limits belong at the API gateway or application because that’s where user identity is known. Per-route limits can be anywhere but are usually in the gateway.

WAF (Web Application Firewall) rules match patterns known to indicate attacks: SQL injection strings, XSS payloads, path traversal, known-bad user agents. Cloudflare and AWS WAF both ship with curated rule sets (OWASP Core Rule Set is the open source version). WAFs have false positives — legitimate requests that match a rule and get blocked — so you run them in “count mode” first, measure, and only move to “block mode” when confident.

WAFs are useful but not a substitute for application-level input validation. A WAF catches known patterns; a custom exploit or a business-logic flaw slips past. Treat WAF as a defense in depth, not as the primary defense.

DDoS protection. Large-scale floods (volumetric attacks, SYN floods, amplification attacks) are handled by the CDN or cloud provider’s DDoS service. Cloudflare, AWS Shield, GCP Cloud Armor, Akamai all offer this as part of their edge product. A single origin cannot absorb a multi-Gbps flood; the only defense is a distributed edge that can soak it up or black-hole the source IPs.

For most services, DDoS protection is included in the CDN product and no explicit action is needed. The exceptions are services that bypass the CDN (direct origin access), which lose the protection.

112.10 Observability at the edge

Every proxy in the chain emits logs and metrics. Getting the full picture requires correlating across hops.

Logs. Each proxy logs every request with fields like timestamp, source IP, method, path, status, latency, bytes in/out, user agent. The volume at the edge is huge — a CDN might log billions of lines a day — so log sampling and retention policies are essential. The pattern: full sampling for errors (4xx, 5xx), low sampling for success (0.1-1%), with the option to ramp up during incidents.

Metrics. Per-route, per-status, per-percentile latency. These are aggregated in real time and feed dashboards and alerts. Kubernetes ingress controllers expose Prometheus metrics natively; cloud LBs expose metrics via CloudWatch / Cloud Monitoring / Azure Monitor.

Distributed tracing. The killer observability feature for a multi-hop request chain. A trace ID is generated at the edge (or accepted from the client) and propagated through every hop as a header (traceparent in W3C Trace Context). Each proxy and each application records a span with its portion of the latency. When a request is slow, you can see exactly which hop contributed. Jaeger, Tempo, Honeycomb, Datadog APM — all handle this.

The hardest part of tracing is getting every proxy and every application to propagate the trace headers. Missing propagation creates broken traces with gaps. The modern approach: OpenTelemetry auto-instrumentation for the application, and ensure every proxy in the chain has trace propagation enabled (most do by default; some need explicit configuration).

Client-side metrics. Often forgotten. The CDN’s view of latency is latency-from-CDN-to-client. It doesn’t see the portion where the client is slow (JavaScript execution, rendering, actual network). For user-facing services, Real User Monitoring (RUM) and synthetic checks from external monitoring (Pingdom, UptimeRobot, Grafana Synthetic Monitoring) fill in the gap.

112.11 The mental model

Eight points to take into Chapter 113:

  1. The request chain has many hops. Every hop terminates and re-establishes some layer. Debugging requires knowing where each hop is.
  2. Layer 4 vs layer 7 — L4 is fast and protocol-agnostic, L7 is HTTP-aware and powerful. Most Kubernetes ingress is L7.
  3. Ingress controllers: ingress-nginx is the default, Envoy Gateway / Contour for Gateway API maturity, Traefik and HAProxy for their niches.
  4. Gateway API replaces Ingress for new clusters. It splits cluster-operator concerns from app-team concerns cleanly.
  5. Service meshes solve the same problem inside the cluster. Linkerd for simplicity, Istio for features, Cilium for eBPF.
  6. Cloudflare Tunnel (and Tailscale Funnel) are the “no open ports” alternatives. Dramatically smaller attack surface.
  7. The trust boundary is the last proxy that enforces auth. Audit what’s public. Treat the VPC as untrusted (zero-trust).
  8. cert-manager + Let’s Encrypt is the standard cert story. Use staging first to avoid rate-limit lockouts.

In Chapter 113, the focus shifts from runtime to the pipeline that builds everything: CI as a system, the capstone of Part IX.


Read it yourself

  • The Kubernetes Ingress and Gateway API documentation, especially the conformance pages.
  • The Envoy Proxy documentation — even if you don’t run Envoy directly, it’s the conceptual foundation for Istio, Contour, and Envoy Gateway.
  • The Istio and Linkerd official tutorials.
  • The Cloudflare Tunnel (cloudflared) documentation.
  • BPF Performance Tools (Gregg, Addison-Wesley, 2019), for the eBPF foundations used by Cilium.
  • The cert-manager documentation, especially the ACME issuer configuration and the troubleshooting guide.
  • Mark Nottingham’s guidance on HTTP API design (RFC 9205, Building Protocols with HTTP), and RFC 9110, HTTP Semantics.

Practice

  1. Trace a request from your browser to a pod in a cluster behind a CDN. List every hop, what it does, and what protocol it speaks.
  2. Write a Gateway API Gateway + HTTPRoute pair that routes /v1/* to api-v1 and /v2/* to api-v2, both on hostname api.example.com, terminating TLS at the gateway.
  3. Explain why an Ingress is an awkward place for cross-namespace routing, and how Gateway API’s ReferenceGrant fixes it.
  4. Compare Linkerd, Istio, and Cilium Service Mesh on four axes: feature set, operational complexity, latency overhead, ecosystem size.
  5. Set up a Cloudflare Tunnel for a local service running on localhost:8080. What does the tunnel configuration look like? What are the security properties?
  6. Design the trust boundary for a public API that uses CDN + origin ALB + ingress-nginx + pods. Which proxy enforces auth? What happens if an attacker bypasses the CDN?
  7. Stretch: Deploy cert-manager on a local K8s cluster, configure the Let’s Encrypt staging issuer, provision a cert for a real domain you own via DNS-01, and verify renewal by advancing the clock or setting a short renewal window.