IaC: Terraform, Pulumi, CDK
"If it's not in version control, it doesn't exist. If it's in version control but nobody runs the plan, it's lying to you"
Infrastructure as Code — IaC — is the discipline of expressing cloud infrastructure in text files that live in a repository and get applied by machines, not by humans clicking in a console. It is the oldest and least-glamorous part of platform engineering, and it is also the part that determines whether the rest of the platform can be rebuilt after a disaster, audited after an incident, or changed without a two-week meeting. Everything in this book above IaC — clusters, services, deploys, observability — assumes IaC works. If IaC is broken, nothing else matters.
Three tools dominate the landscape in 2026: Terraform (with OpenTofu as its open-source fork), Pulumi, and AWS CDK (with CDK8s as its Kubernetes cousin, covered in Chapter 108). They represent the two major philosophies of IaC: declarative-DSL (Terraform) and real-code (Pulumi, CDK). The choice between them is a long-running debate that produces more heat than light, because the honest answer is that both work and the right choice depends on what your team already knows.
This chapter covers what each tool is, what the state-management problem looks like (it’s the hardest part of all three), how modern CI/CD integrates with IaC via OIDC and plan-on-PR workflows, and the gotchas that actually bite in production.
Outline:
- What IaC is and why it matters.
- Declarative vs imperative IaC.
- Terraform — HCL, providers, modules.
- The state file problem.
- Pulumi — real code, real state.
- AWS CDK and CDK-family tools.
- OIDC for CI, the plan-on-PR / apply-on-merge workflow.
- Drift, imports, and reality reconciliation.
- Secrets in IaC (pointer to Chapter 111).
- Gotchas and operational patterns.
- The mental model.
110.1 What IaC is and why it matters
The pre-IaC world is console clicks. An engineer logs into the AWS console, creates a VPC, subnets, security groups, an RDS instance, an IAM role, a load balancer, a certificate. It takes three hours. Nobody writes down what they did. Three months later, a subtle setting needs changing and nobody remembers what the current state is. A year later, the engineer is gone and the next engineer has to reverse-engineer the whole thing from the console.
The IaC world is text. An engineer writes a Terraform (or Pulumi, or CDK) file that declares the same VPC, subnets, etc. The file is committed to git. A machine reads the file and creates (or updates) the cloud resources. Anyone can read the file to understand what exists. Changes are code reviews. Rollbacks are reverts. Rebuild after a disaster is terraform apply against a clean account.
The benefits compound. Auditability: git log is the change history. Reproducibility: the same code produces the same infrastructure. Drift detection: if someone clicks in the console, the next plan shows the delta. Blast-radius containment: changes go through review, not through a human’s muscle memory. Onboarding: new engineers read the IaC to understand the system, not the console.
The costs are also real. IaC is its own skill. The tools are imperfect. State management is a recurring problem. The plans are slow. Debugging a failed apply at 2 AM, when the state file is in a broken intermediate state, is its own special kind of pain.
But the cost of not having IaC is higher at any non-trivial scale. A platform team without IaC is a platform team that cannot be trusted with production. This has been consensus in the industry for a decade. The only remaining debate is which tool.
110.2 Declarative vs imperative IaC
The philosophical split.
Declarative IaC describes the desired end state. “Here is a VPC with these subnets, here is an RDS instance with this size, here is an IAM policy with these permissions.” The tool figures out what to do to get from the current state to the desired state. Creates are creates, modifies are modifies, deletes are deletes, and the order is computed by the tool. Terraform is the canonical example.
Imperative IaC describes steps. “Create a VPC. Create subnets in it. Create an RDS instance. Attach this policy.” The tool executes the steps in order. This is less common in modern IaC; it’s what old shell scripts did.
The line blurs with tools like Pulumi and CDK. They use real programming languages (TypeScript, Python, Go), which are imperative-looking, but the code builds up a declarative graph of resources and then hands it to a deployment engine that does the declarative reconciliation. From the user’s perspective they look like imperative code; from the engine’s perspective they’re declarative specs.
The declarative-DSL camp (Terraform/HCL) argues:
- The DSL is constrained, so you can’t write logic you shouldn’t.
- The declarative shape is easier to diff and review.
- The plan output is a pure function of the code and state.
- No dependency on a host language runtime.
The real-code camp (Pulumi, CDK) argues:
- Real languages have loops, conditions, functions, types, tests.
- IDE support is actually good (not just syntax highlighting).
- Abstraction is cheap — you can write a function that returns a resource composition and call it from many places.
- You can unit-test your infrastructure code.
Both camps are right. The declarative-DSL approach wins on readability for outsiders and has fewer ways to shoot yourself in the foot. The real-code approach wins on expressiveness and is better for internal platforms that generate complex stacks from high-level specs.
For a team with heavy platform engineering needs and strong developers, Pulumi or CDK can be the right choice. For everyone else, and especially for shared infrastructure code that operators (not developers) will touch, Terraform is the safer default.
```mermaid
graph TD
    Q1{Multi-cloud?} -->|Yes| V[Terraform / Pulumi]
    Q1 -->|No| Q2{AWS-only?}
    Q2 -->|Yes| Q3{Complex abstractions?}
    Q3 -->|Yes| CDK[AWS CDK]
    Q3 -->|No| TF[Terraform]
    Q2 -->|"No (GCP/Azure)"| TF2[Terraform]
    V -->|team is strong devs| Pulumi[Pulumi]
    V -->|ops team, readability| TF3[Terraform]
```
IaC tool selection reduces to three questions: is it multi-cloud, is it AWS-only, and will the team be developers who build complex abstractions? Terraform wins in most branches of the decision tree. The overwhelming majority of industry usage is Terraform-shaped, which matters for hiring and for community modules.
110.3 Terraform — HCL, providers, modules
Terraform’s model. A `.tf` file in HashiCorp Configuration Language (HCL) declares resources via providers. Running `terraform plan` computes the diff between the declared configuration and the recorded state of the world. Running `terraform apply` executes that diff.
A minimal example:
```hcl
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }

  backend "s3" {
    bucket         = "my-tf-state"
    key            = "platform/network.tfstate"
    region         = "us-east-1"
    dynamodb_table = "my-tf-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = { Name = "main", Env = "prod" }
}

resource "aws_subnet" "private" {
  for_each          = toset(["us-east-1a", "us-east-1b", "us-east-1c"])
  vpc_id            = aws_vpc.main.id
  availability_zone = each.value
  cidr_block        = "10.0.${index(["us-east-1a", "us-east-1b", "us-east-1c"], each.value)}.0/24"
}
```
The language features. Resources are typed objects with a schema from the provider. Providers are plugins (aws, google, kubernetes, github, cloudflare, hundreds of others). Modules are reusable bundles of resources (module "vpc" { source = "terraform-aws-modules/vpc/aws" }). Variables parameterize modules. Outputs expose values from a module for others to consume. Data sources read existing state without managing it.
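The pieces above compose into a standard shape. A hedged sketch of how variables, data sources, modules, and outputs fit together (the module source is the community VPC module mentioned above; the variable and output names are illustrative):

```hcl
# Variable: a parameter for this configuration.
variable "env" {
  type    = string
  default = "prod"
}

# Data source: read existing cloud state without managing it.
data "aws_availability_zones" "available" {
  state = "available"
}

# Module: a reusable bundle of resources, parameterized by inputs.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "main-${var.env}"
  cidr = "10.0.0.0/16"
  azs  = slice(data.aws_availability_zones.available.names, 0, 3)
}

# Output: expose a value for other modules or states to consume.
output "vpc_id" {
  value = module.vpc.vpc_id
}
```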
The ecosystem is enormous. The AWS provider alone has coverage for essentially every AWS service. The Kubernetes provider, Helm provider, and Kubectl provider let you manage cluster resources from Terraform (though this is usually a mistake — cluster resources belong in a GitOps system, see Chapter 107). Community modules on the Terraform Registry cover every common pattern.
The strengths of Terraform:
- Provider coverage. Everything has a provider.
- The plan is the review artifact. You stare at a plan and say yes or no.
- State is explicit. You know where it lives, you can back it up, you can import into it.
- HCL is simple. New engineers can read it in an hour.
- Massive community. The answer to almost every question is on Stack Overflow or a GitHub issue.
The weaknesses:
- HCL is limited. Complex logic becomes gross — nested `for` expressions, `dynamic` blocks, `try()` and `coalesce()`. At a certain complexity level, real code would be cleaner.
- Module interfaces are loose. A module accepts a map of inputs and produces a map of outputs, with no type safety beyond what HCL’s type system can express.
- The state file is a foot-gun (next section).
- Slow plans on large states. A state with 5000 resources can take minutes just to refresh.
The OpenTofu fork (born from HashiCorp’s license change in 2023) is now the preferred open-source choice. It’s a drop-in replacement for Terraform and tracks most of its features, with community-driven development and no license risk. Many teams have moved to OpenTofu or are planning to.
110.4 The state file problem
Every IaC tool has to track what it created. The mapping from “the resources I declared in code” to “the actual cloud resources that exist” is called the state. Terraform calls this a state file, Pulumi calls it a stack, CDK uses CloudFormation stacks.
Why state exists: cloud resources have IDs that are generated at create time (an EC2 instance ID, an S3 bucket ARN, an RDS DBI resource ID). The code doesn’t know the ID in advance — it only knows “I want an EC2 instance with these properties.” After the first apply, the tool needs to remember which ID corresponds to which code-level resource, so the next plan can diff correctly.
Terraform stores state in a backend: a local file (bad), an S3 bucket with DynamoDB locking (the standard), Terraform Cloud (the SaaS option), or various other backends. Pulumi stores state in its cloud backend by default, or a self-hosted backend (S3, Azure Blob, etc.). AWS CDK stores state in CloudFormation stacks, which AWS manages for you — no state file to handle, which is a real advantage.
The problems with state:
Locking. Two concurrent `terraform apply` runs against the same state can corrupt it. The standard solution is a DynamoDB table used as a lock (`dynamodb_table = "my-tf-locks"` in the backend config). Without a lock, CI pipelines that run plans and applies concurrently will race.
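The lock table itself is a tiny piece of bootstrap infrastructure. A sketch of it in Terraform (the table name matches the backend example earlier; the S3 backend expects a partition key named `LockID` of type string):

```hcl
# DynamoDB table used by the S3 backend for state locking.
# Usually created once, in a small bootstrap state, before any
# other state can use it as a lock.
resource "aws_dynamodb_table" "tf_locks" {
  name         = "my-tf-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```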
Corruption. State files get corrupted. A failed apply midway through can leave state inconsistent with reality. Terraform has `terraform state rm` and `terraform import` to surgically fix it, but these commands are dangerous — they edit the state directly, and a mistake can lose resource tracking entirely.
Drift. Someone clicks in the console and changes a resource that Terraform manages. The state still thinks the old value is correct. The next plan shows the drift as something to undo. If the manual change was intentional and urgent (“I had to increase the RDS instance size during an incident”), the operator has to update the code to match reality, not revert reality to match the code.
Secrets in state. Terraform state files contain resource attributes including sensitive ones — RDS passwords, API keys for secret stores, connection strings. The state file is effectively a secrets file. Store it encrypted, lock access to it, and never commit it to git.
State splitting. A state file that manages everything is too big. A state file that manages nothing is too small. Splitting state is a real design decision — one state per “layer” (network, databases, clusters, apps), one state per environment (dev, staging, prod), one state per cell (§109.4). Too many states means cross-state dependencies via data sources or outputs (slow, awkward). Too few means slow plans and high blast radius on a single bad apply.
A pragmatic split: one state per (environment, layer) pair. So prod/network, prod/clusters, prod/apps, staging/network, and so on. Cross-layer dependencies (the cluster reads the network’s VPC ID) happen via remote state data sources, which are a supported pattern.
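The cross-layer read looks like this in practice. A sketch of the clusters layer consuming the network layer's state (bucket and key match the backend example earlier; the output names and the role variable are assumptions about what the network state exposes):

```hcl
# In the clusters layer: read the network layer's outputs via its state.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tf-state"
    key    = "platform/network.tfstate"
    region = "us-east-1"
  }
}

# Consume an output the network state is assumed to declare
# (here, "private_subnet_ids").
resource "aws_eks_cluster" "main" {
  name     = "prod"
  role_arn = var.cluster_role_arn # illustrative variable

  vpc_config {
    subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
  }
}
```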
The state file is the hardest thing about Terraform. Every team rediscovers this. Plan your state layout from the start, back up your state files religiously, and be very careful about terraform state subcommands. There is no undo.
110.5 Pulumi — real code, real state
Pulumi’s pitch: write infrastructure in a real language, get type safety, get IDE support, get tests. The model is similar to CDK8s (Chapter 108): you write a program in TypeScript/Python/Go/C# that uses a Pulumi SDK to declare resources, and the Pulumi engine diffs against state and applies.
```typescript
import * as aws from "@pulumi/aws";

const vpc = new aws.ec2.Vpc("main", {
  cidrBlock: "10.0.0.0/16",
  enableDnsHostnames: true,
  tags: { Name: "main", Env: "prod" },
});

const azs = ["us-east-1a", "us-east-1b", "us-east-1c"];
const subnets = azs.map((az, i) => new aws.ec2.Subnet(`private-${az}`, {
  vpcId: vpc.id,
  availabilityZone: az,
  cidrBlock: `10.0.${i}.0/24`,
}));
```
The language features that matter:
- Type checking. `aws.ec2.Subnet` has a typed input schema. Invalid field names or wrong types fail at compile time, not at apply time.
- Real control flow. Loops, conditions, functions, imports from any npm package.
- Unit tests. You can write Jest/pytest tests that assert the resource graph has specific shapes before any cloud API is called.
- Standard packaging. Your stack is an npm/pip package with a `package.json`/`pyproject.toml`, a `node_modules`/virtualenv, and standard tooling.
State management in Pulumi is similar to Terraform but the default backend is the Pulumi SaaS. A stack (Pulumi’s unit of state) lives in the SaaS or in a self-hosted backend (S3, Azure Blob, filesystem). Locking is handled by the backend. The state model otherwise looks like Terraform’s — resource URN → cloud ID mapping, diff-based planning, import support.
Pulumi also supports converting Terraform code via a translator, which is useful for teams considering a migration but not committed. And Pulumi can consume Terraform providers via a shim, so anything Terraform can manage, Pulumi can manage.
The weaknesses:
- Smaller community than Terraform. Fewer Stack Overflow answers, fewer shared modules.
- More lock-in to the Pulumi backend, especially the SaaS.
- Harder to read for non-developers. A TypeScript file is more opaque to an operator than an HCL file.
- Compile step. You need `npm install`, a build, etc. Slower feedback loop.
Pulumi is the right choice when the team is strong on developers and will build complex abstractions over IaC. It’s the wrong choice for a platform team whose operators are not full-time programmers and whose priority is readability and community support.
110.6 AWS CDK and CDK-family tools
AWS CDK is Amazon’s answer to the real-code IaC question. It’s TypeScript/Python/Java/C#/Go code that compiles to CloudFormation templates, which AWS then applies. The distinguishing feature is the constructs library (the same library concept used by CDK8s, Chapter 108), which provides high-level, opinionated wrappers around CloudFormation resources.
```typescript
import { App, Stack } from "aws-cdk-lib";
import { Vpc, SubnetType } from "aws-cdk-lib/aws-ec2";
import { Cluster, FargateService } from "aws-cdk-lib/aws-ecs";

const app = new App();
const stack = new Stack(app, "PlatformProd");

const vpc = new Vpc(stack, "Vpc", {
  maxAzs: 3,
  subnetConfiguration: [
    { name: "public", subnetType: SubnetType.PUBLIC },
    { name: "private", subnetType: SubnetType.PRIVATE_WITH_EGRESS },
  ],
});

const cluster = new Cluster(stack, "Cluster", { vpc });
```
The pitch. The Vpc construct creates a VPC and the subnets, route tables, NAT gateways, and Internet Gateway — all the boilerplate you’d write by hand in Terraform. The Cluster construct knows how to wire an ECS cluster to the VPC. You write three lines and get a production-shaped network and cluster. Compared to raw CloudFormation (which is verbose and painful) or Terraform (which is less verbose but requires you to assemble the pieces yourself), CDK’s high-level constructs are the fastest way to express AWS patterns.
The catch. CDK only targets AWS. Multi-cloud teams cannot use CDK. Teams that want to manage Kubernetes resources use CDK8s (a sibling library that emits Kubernetes YAML, Chapter 108) but the ergonomics of combining CDK and CDK8s are not as clean as Pulumi or Terraform for mixed workloads. And CDK’s abstraction level means you sometimes don’t know what CloudFormation you’re generating until you synth and look at it.
CDK’s state management is CloudFormation, which is AWS-managed. No state files to worry about, no locking, no corruption recovery. That’s a genuine advantage. The tradeoff is that CloudFormation has its own failure modes (stack in UPDATE_ROLLBACK_FAILED, stack drift, resource import limitations) that CDK inherits.
CDK is the right choice for AWS-only teams that want high-level abstractions and don’t want to deal with Terraform state. It’s the wrong choice for multi-cloud teams or for anything that isn’t primarily AWS.
110.7 OIDC for CI, the plan-on-PR / apply-on-merge workflow
Running terraform apply from a laptop is a historical anachronism. Modern IaC runs from CI, which raises the question of how CI authenticates to the cloud. The old answer was long-lived IAM access keys stored as CI secrets. The modern answer is OIDC.
The setup. Your cloud provider trusts GitHub Actions (or GitLab CI, or Buildkite, or whatever) as an OIDC identity provider. Your CI pipeline requests a short-lived OIDC token during the job, hands it to the cloud provider’s STS API, and receives short-lived credentials in return. The credentials expire in an hour. No long-lived secrets exist anywhere.
The GitHub Actions pattern:
```yaml
permissions:
  id-token: write
  contents: read

jobs:
  terraform-plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -out=plan.out
      - uses: actions/upload-artifact@v4
        with: { name: plan, path: plan.out }
```
The IAM role github-actions-terraform has a trust policy that allows assumption only from a specific GitHub repository and branch (or PR), authenticated via the OIDC token. The role’s permissions are the set of cloud actions Terraform is allowed to perform. No secret is stored in GitHub. No credential is long-lived.
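The trust side can itself be managed in Terraform. A sketch of the OIDC provider and role (the thumbprint is the commonly published GitHub OIDC value and the repository name is illustrative; check your provider's current guidance before copying):

```hcl
# Register GitHub Actions as an OIDC identity provider in the account.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # illustrative
}

# The role CI assumes. The trust policy restricts assumption to
# tokens issued for a specific repository.
resource "aws_iam_role" "github_actions_terraform" {
  name = "github-actions-terraform"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Only jobs from this repository can assume the role;
          # tighten the pattern to specific branches or PRs as needed.
          "token.actions.githubusercontent.com:sub" = "repo:my-org/platform:*"
        }
      }
    }]
  })
}
```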
Plan-on-PR, apply-on-merge is the canonical workflow:
- A PR is opened. CI runs `terraform plan` and posts the plan as a PR comment.
- Reviewers look at the plan. Is this the change we meant?
- The PR is approved and merged to `main`.
- CI on `main` runs `terraform apply` automatically, using the plan from the merge commit.
The beauty of this flow:
- The review is the plan. You don’t have to trust the code; you trust the plan diff. Reviewers see exactly what will change.
- No manual applies. Nobody runs `apply` from a laptop.
- The apply role has more permissions than the plan role. Plan is read-only; apply has write. A PR can’t do harm until it’s merged.
- Locks prevent concurrent applies. The state lock ensures one apply at a time.
Refinements that production-grade teams add:
- Saved plans. The plan artifact from the PR is the exact plan that’s applied on merge. No “rebase introduced a new diff” surprises.
- Environment promotion. The same code is applied to dev, then staging, then prod, with manual approval gates between. Each environment has its own state.
- Destroy protections. A separate workflow for `terraform destroy`, with stronger approval gates. Or better: remove destroy from CI entirely.
- Drift detection jobs. Nightly `terraform plan` against `main` that alerts if the plan is non-empty. Catches manual console changes before the next PR’s plan mixes them with the intended change.
- Atlantis or Spacelift for more sophisticated workflows. These are IaC-specific CI tools that handle locking, PR comments, and multi-state orchestration.
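The nightly drift-detection job can be sketched as a scheduled workflow. `terraform plan -detailed-exitcode` exits 0 when the plan is empty and 2 when there are changes, which turns drift into a simple exit-code check (the schedule, account ID, and role name are illustrative):

```yaml
name: drift-detection
on:
  schedule:
    - cron: "0 5 * * *" # nightly

permissions:
  id-token: write
  contents: read

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      # -detailed-exitcode: 0 = no changes, 1 = error, 2 = drift.
      # A non-zero exit fails the job; wire job failure to your alerting.
      - run: terraform plan -detailed-exitcode -input=false
```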
This workflow is the single largest operational improvement a platform team can make to its IaC. It replaces “someone runs apply” with “CI runs apply after review,” and it removes the largest source of IaC incidents.
110.8 Drift, imports, and reality reconciliation
Drift is the gap between what IaC thinks exists and what actually exists. It happens because:
- A human edited a resource in the console during an incident.
- A different IaC system or tool created a resource that overlaps.
- An auto-scaling or auto-creating process (e.g., cloud-managed backup retention) modified a resource.
- A bug in the provider caused state to become inconsistent.
Detecting drift: run terraform plan against the current state. A non-empty plan against unchanged code is drift.
Fixing drift: two options.
Revert reality to match code. Apply the plan as-is. Appropriate when the drift was accidental.
Update code to match reality. Modify the Terraform code to reflect the current state, then apply (which becomes a no-op). Appropriate when the drift was intentional and needs to be preserved.
A harder problem: importing existing resources. An engineer created an RDS instance in the console six months ago. You want to bring it under Terraform management without destroying and recreating it. Terraform has terraform import resource_address resource_id, which reads the resource from the cloud and adds it to the state. The code has to declare the resource first; the import then binds it to the state.
Imports are fragile. The Terraform provider’s schema has to match the resource’s current configuration exactly, or the next plan will show a diff. Real imports are an iterative dance: import, plan, fix the code to match, plan again, repeat until the plan is empty. For complex resources (an RDS with parameter groups, option groups, subnet groups, security groups), the import dance can take hours.
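Terraform 1.5 and later also support declarative `import` blocks, which fold the binding step into the normal plan/apply flow instead of a one-off CLI command. A sketch of the RDS example (identifiers and attributes illustrative):

```hcl
# Declare the resource, initially as a best guess at its configuration...
resource "aws_db_instance" "legacy" {
  identifier     = "legacy-db"
  engine         = "postgres"
  instance_class = "db.m6g.large"
  # ...and keep filling in attributes until the plan is empty.
}

# ...then bind it to the existing cloud resource at plan time.
import {
  to = aws_db_instance.legacy
  id = "legacy-db"
}
```

Terraform can also generate starter configuration for pending import blocks (`terraform plan -generate-config-out=generated.tf`), which shortens the import dance for complex resources, though the generated code usually still needs hand-editing.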
Pulumi has import as well, with similar ergonomics. CDK’s import story is weaker — CloudFormation has a CreateChangeSet with import resources, but it’s limited in what can be imported.
The universal advice: import aggressively in the first year, then never again. The first year of a platform has lots of pre-IaC resources that need to be brought in. After that, treat “needs to be imported” as a bug and fix the process that let resources be created outside IaC.
110.9 Secrets in IaC
Chapter 111 is the full treatment. The short version: secret values do not live in IaC. IaC manages pointers to secrets (secret manager ARNs, external secret CRDs, vault paths), not the secret values themselves.
The wrong way: Terraform code with a hardcoded password in a variable, even if the variable is marked sensitive. The password ends up in state, in CI logs, in plan output.
The right way: a secret is created in a secret store (AWS Secrets Manager, Vault, etc.) outside IaC, or via a separate bootstrapping process. IaC declares “this RDS uses the password at ARN X.” The value is never in IaC.
For RDS specifically, the modern pattern is to use AWS Secrets Manager’s managed rotation, where the secret is created by IaC with a random initial value, RDS rotates it, and no human ever sees it. For Kubernetes-bound secrets, External Secrets Operator (ESO, Chapter 111) pulls from the store at runtime, so the IaC only needs to declare the ExternalSecret object.
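The two patterns, sketched in Terraform (resource names illustrative; `manage_master_user_password` is the AWS provider's knob for Secrets Manager managed rotation):

```hcl
# The secret's *container* can be managed by IaC; its value is not.
# The value is written out-of-band (bootstrap script, rotation Lambda),
# never in code and never in plan output.
resource "aws_secretsmanager_secret" "api_key" {
  name = "prod/payments/api-key"
}

# For RDS, the managed-rotation pattern: AWS generates and rotates the
# master password, and no human ever sees a plaintext value.
resource "aws_db_instance" "main" {
  identifier                  = "prod-main"
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app"
  manage_master_user_password = true
}
```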
110.10 Gotchas and operational patterns
A list of things that have bitten production.
Terraform state file divergence between main and a feature branch. A feature branch runs plan against the state from a commit that’s since been applied to main. The plan is wrong. The fix is to always re-plan on rebase, or to block applies until plans are against the current HEAD.
`count` vs `for_each`. `count` indexes resources by integer, so inserting a resource in the middle of the list renumbers everything and causes spurious destroys/creates. `for_each` indexes by key, which is stable under reordering. Always prefer `for_each` for non-trivial collections.
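The renumbering failure mode, sketched (bucket names illustrative):

```hcl
# With count: inserting "audit" in the middle shifts every later index,
# so Terraform wants to destroy and recreate "logs" under its new index.
variable "buckets" {
  default = ["assets", "audit", "logs"] # "audit" was inserted at index 1
}

resource "aws_s3_bucket" "by_index" {
  count  = length(var.buckets)
  bucket = "myco-${var.buckets[count.index]}"
}

# With for_each: each resource is keyed by name. Adding "audit" is a
# single create; "assets" and "logs" are untouched.
resource "aws_s3_bucket" "by_key" {
  for_each = toset(var.buckets)
  bucket   = "myco-${each.key}"
}
```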
Provider version pinning. A minor provider upgrade can change resource schemas subtly, producing drift that isn’t real. Pin provider versions explicitly and upgrade deliberately.
lifecycle { prevent_destroy = true }. Use this on irreplaceable resources (databases, S3 buckets with data). It prevents terraform destroy from nuking them. Do not use it on everything — it gets in the way of legitimate changes.
Terraform workspaces. Don’t use them for environment separation (dev/staging/prod). Workspaces share a single backend configuration and are easy to confuse. Use separate state files (via directory structure) per environment instead. Workspaces are for rare cases like “multiple tenants with identical configuration but different state.”
The depends_on escape hatch. Most dependencies are implicit (resource A references resource B). Explicit depends_on is needed when the dependency is on a side effect. Overuse produces slower, more brittle plans.
Long apply times. A state with 3000+ resources can take 10+ minutes just to refresh. Splitting state is the fix, but splitting is painful. Plan the split early.
The GitOps boundary. IaC manages infrastructure (VPCs, clusters, IAM, databases). GitOps (Chapter 107) manages cluster resources (Deployments, Services, ConfigMaps). Don’t use IaC to manage cluster resources; it’s slower, noisier, and doesn’t handle drift well. The line is: IaC stops at the cluster boundary.
Destroying is rare. In production, you almost never terraform destroy. The correct pattern for “I want to remove a resource” is to delete the code and let the next apply destroy it, which gives you the usual plan review. terraform destroy is for dev/test environments, not production.
110.11 The mental model
Eight points to take into Chapter 111:
- IaC is non-negotiable. Everything else in platform engineering assumes it works.
- Declarative vs real-code is a genuine philosophical split. Terraform is declarative-DSL; Pulumi and CDK are real-code. Both work.
- Terraform is the default for most teams. Pulumi when you have strong developers and complex platforms. CDK when you’re AWS-only.
- State management is the hardest part. Split state by (environment, layer). Back it up. Lock it. Fear the state subcommands.
- OIDC-for-CI replaces long-lived keys. Short-lived credentials, authenticated by the CI identity.
- Plan-on-PR, apply-on-merge is the canonical workflow. The plan is the review artifact. No manual applies.
- Drift is normal and has two fixes: revert reality or update code. Import existing resources aggressively in year one, then never.
- Secrets don’t live in IaC. IaC manages pointers to secrets; the secret store holds the values (Chapter 111).
In Chapter 111, the secrets problem gets the full treatment: where secrets actually live, how they rotate, and how they reach the workloads that need them.
Read it yourself
- The Terraform documentation, especially the backends, state, and imports sections.
- Terraform: Up & Running (Brikman, 3rd ed., O’Reilly, 2022).
- The Pulumi documentation, especially the state management and the automation API sections.
- The AWS CDK Developer Guide and the Constructs library.
- The OpenTofu project documentation (the open-source fork of Terraform).
- HashiCorp’s Terraform Best Practices and Terraform Recommended Practices guides.
- The GitHub Actions OIDC documentation for AWS, GCP, and Azure.
Practice
- Write a Terraform configuration for a VPC with three public and three private subnets across three AZs. Use `for_each`, not `count`.
- Configure an S3 + DynamoDB backend with encryption and locking. Explain each field in the backend config.
- Design a state-splitting scheme for a platform with 4 environments (dev, staging, prod, prod-eu) and 5 layers (network, identity, clusters, databases, apps). How many state files? How do they reference each other?
- Write the GitHub Actions workflow for plan-on-PR and apply-on-merge with AWS OIDC. Include the trust policy for the IAM role.
- Explain why `terraform destroy` should almost never be run in production and what the correct workflow is for removing a resource.
- Compare Terraform, Pulumi, and AWS CDK on four axes: type safety, readability, ecosystem size, state management. Which wins each?
- Stretch: take an existing manually created cloud resource (say, a real RDS instance you own) and bring it under Terraform management via `terraform import`. Document every step, every plan diff, and every code change needed to make the plan empty.