Feature stores: Feast, Tecton, the offline/online split
"A feature store is the solution to a problem that looks embarrassingly simple until you try to solve it yourself, at which point it eats a year."
A feature store is infrastructure whose entire job is to guarantee that the feature values a model sees in training are the same feature values it sees in production. That sentence sounds like a tautology until you realize that classical ML systems in the wild routinely fail this constraint, silently, and the result is a model that looks good in offline evaluation and degrades in production in ways that take weeks to diagnose. The technical name for the failure is training-serving skew, and it is the single most common cause of “we deployed a model and it performs worse than the offline numbers said it would.” The feature store exists because enough teams independently rediscovered this pain that a category of infrastructure emerged to solve it.
This chapter is the part-VII capstone. It explains why feature stores exist, what the offline/online split actually is, how point-in-time correctness is the hardest part, how Feast and Tecton solved it, and why the rise of LLMs made feature stores less relevant for one part of the ML world while keeping them essential for another. At the end, the chapter closes out Part VII with a recap of how the seven data-plane chapters (object storage, document stores, time-series, Kafka, caching, lakehouses, feature stores) fit together into a single storage narrative.
Outline:
- Classical ML and the feature pipeline.
- Training-serving skew and point-in-time correctness.
- The offline/online split.
- Feature views as the abstraction.
- Feast: the open-source minimal core.
- Tecton and the managed approach.
- What LLMs changed.
- Feature stores for ranking and retrieval.
- When a feature store is the wrong answer.
- Part VII recap.
- The mental model.
91.1 Classical ML and the feature pipeline
Classical ML — recommender systems, ranking models, fraud detection, churn prediction, pricing, personalization — runs on features, which are scalar or small-vector summaries of an entity’s state. For a user, features might be “number of sessions in the last 7 days,” “median session duration over 30 days,” “last country seen,” “has_active_subscription.” For a merchant, features might be “transactions per hour,” “fraud rate over 24 hours,” “average ticket size.” Features are the vocabulary in which the model reasons about the world, and the model itself is usually a relatively simple regression or tree-based ensemble over those features.
The feature pipeline is the code that computes features from raw data. In practice it’s usually a Spark or SQL job that reads event logs and transactional tables, aggregates them by entity and time window, and writes the result to a dataset used for training. When the model is retrained, the pipeline runs again over the latest data. So far, this is just ETL.
The problem shows up at serving time. When the model is deployed, the inference path needs to fetch features for live entities — a user just made a request, give me their features, run them through the model, return a prediction. The live features are not the same as the training features, in two deep ways:
- Different compute paths. The training features came from a batch job that ran on historical data; the serving features have to come from a low-latency lookup. Whatever produced the training features — a Spark aggregation, a complex SQL window function — cannot be naively re-run on a single row in the serving path.
- Different timing. In training, the feature for a row labeled “event at time T” should reflect state as of time T, not today. If the feature “number of sessions in last 7 days” was computed on today’s data and labeled with last month’s event, the model learns from the future: the feature leaks information the model wouldn’t have at serving time.
Both problems are easy to state and hard to solve. A feature store is the infrastructure that exists to solve them.
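The timing problem is easy to demonstrate concretely. The toy sketch below (made-up session data, hypothetical helper name) computes "sessions in the last 7 days" two ways: as of the training row's event time, and as of the day the batch job runs. The values differ, and the second one leaks the future into training:

```python
# Toy demonstration of timing skew / label leakage, using made-up session data.
from datetime import datetime, timedelta

sessions = [  # (user_id, session_time)
    ("u1", datetime(2024, 1, 10)),
    ("u1", datetime(2024, 1, 20)),
    ("u1", datetime(2024, 1, 21)),
]

def sessions_last_7d(user_id, as_of):
    """Point-in-time correct: count only sessions visible at `as_of`."""
    return sum(
        1 for uid, t in sessions
        if uid == user_id and as_of - timedelta(days=7) <= t < as_of
    )

label_time = datetime(2024, 1, 12)   # the training row's event time
batch_run = datetime(2024, 1, 22)    # when the nightly job actually runs

correct = sessions_last_7d("u1", as_of=label_time)  # sees only Jan 10
leaky = sessions_last_7d("u1", as_of=batch_run)     # sees Jan 20 and 21 too
```

The leaky value reflects sessions that had not happened yet at the label's event time, which is exactly the kind of feature a model will overweight in training and then find missing in production.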
91.2 Training-serving skew and point-in-time correctness
Training-serving skew is the general name for “the feature values the model sees in training differ from the values it sees at serving time in ways that hurt performance.” It comes in several flavors:
Implementation skew. The training feature is computed by SQL, the serving feature is computed by Python, and the two implementations are subtly different. The SQL rounds differently, or handles NULLs differently, or treats time zones differently. The model learned on one distribution and sees another.
Timing skew. The training features are “as of yesterday” because the batch job runs daily. The serving features are “as of now.” The distribution differs, especially for fast-moving features like “sessions in the last hour.”
Data source skew. The training pipeline reads from the data warehouse, which has one version of the events. The serving pipeline reads from a Redis cache, which has another version with different late-arriving events. They disagree.
Label leakage. The most dangerous. The training feature incorporates data from the future relative to the training label. In training, this looks like a very predictive feature. In serving, the data simply doesn’t exist yet, and the feature collapses to a default value. The model’s performance craters.
Point-in-time correctness is the technical property that eliminates label leakage. Formally: for each training row with event time T, the feature value used for that row must be the value that would have been observable at time T. Not the latest value, not the batch-job value, the as-of-T value.
This is hard because it requires the feature store to be able to answer historical queries: “what was the value of feature X for entity Y at time T?” — for every (X, Y, T) tuple in the training set. For a training set with millions of rows, each of which asks a different (Y, T) question, this is a lot of historical queries. Naively, it’s an enormous number of point lookups over a time-series database. In practice, it’s a sophisticated join operation on a table that carries the full history of feature values.
The canonical implementation: feature values are stored as a slowly-changing-dimension table with (entity_id, valid_from, valid_to, feature_value) rows. To compute point-in-time features for a training set, you join the training set to the feature table on entity_id and require that the training row’s event_time falls in the feature row’s valid interval. This is an as-of join (sometimes called a point-in-time join) and it’s the core operation a feature store has to do efficiently.
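The as-of join described above has a direct equivalent in pandas, `merge_asof`, which for each training row takes the latest feature value at or before the row's event time, per entity. A minimal sketch with invented data:

```python
import pandas as pd

# Feature history: one row per (entity, time the value became valid).
features = pd.DataFrame({
    "entity_id": ["u1", "u1", "u2"],
    "valid_from": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-01-05"]),
    "sessions_7d": [3, 8, 1],
})

# Training set: labeled events at arbitrary times.
training = pd.DataFrame({
    "entity_id": ["u1", "u1", "u2"],
    "event_time": pd.to_datetime(["2024-01-10", "2024-01-20", "2024-01-06"]),
    "label": [0, 1, 0],
})

# merge_asof requires both frames to be sorted on their time keys.
joined = pd.merge_asof(
    training.sort_values("event_time"),
    features.sort_values("valid_from"),
    left_on="event_time",
    right_on="valid_from",
    by="entity_id",
    direction="backward",  # latest value at or before event_time
)
```

The row with event time 2024-01-20 picks up the value 8 (valid from 2024-01-15), not the earlier 3 — the as-of semantics in one call. A feature store does the same join at warehouse scale.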
91.3 The offline/online split
The classical feature store architecture is built around two storage layers that serve different purposes:
The offline store holds the full history of feature values. It is typically backed by a lakehouse (Chapter 90) or a warehouse, with feature tables containing (entity_id, timestamp, feature_columns…) rows. The offline store is optimized for large, time-aware queries: “give me features for 10 million training rows, point-in-time correct.” It has high latency per row (seconds or minutes for a batch) but huge throughput.
The online store holds the latest feature values for each entity. It is typically backed by a low-latency key-value store like Redis, DynamoDB, or Cassandra. The online store is optimized for point lookups: “give me the current features for entity Y.” Latency is sub-millisecond; there is no history. When a new feature value is computed, it overwrites the previous one.
The two stores are kept in sync by the feature pipeline. A batch job writes to the offline store (building up history) and to the online store (overwriting the latest values). A streaming job, if features are computed in real time, does the same thing on a per-event basis. The feature store’s job is to guarantee that the two stores agree on what “the latest” value is, so that training (offline) and serving (online) see the same definitions.
The split is the key architectural insight. You cannot use a single store for both workloads. A warehouse is too slow for serving. A key-value store has no history and cannot do point-in-time joins. The feature store’s value is in presenting a unified API over the two stores — the application asks “give me features for this entity” and gets the right answer, whether the caller is a training job or an inference service.
In practice, a feature definition is written once and applied automatically to both stores. The feature pipeline code doesn’t split; only the destinations do. This is what “feature registry” means in a feature store context: a central catalog of feature definitions that gets materialized into both stores.
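The dual-materialization idea can be sketched in a few lines. This is an in-memory toy, not any real feature-store API — the point is only that one pipeline output feeds two destinations with different semantics:

```python
# Minimal sketch of a dual-write materialization step (hypothetical in-memory
# stores, not a real feature-store API): one pipeline output, two destinations.
offline_store = []   # append-only history of (entity_id, ts, value)
online_store = {}    # latest-value-wins: entity_id -> value

def materialize(batch):
    """Apply one pipeline run to both stores from a single feature definition.

    Assumes batches arrive in timestamp order, so the last write is the latest.
    """
    for entity_id, ts, value in batch:
        offline_store.append((entity_id, ts, value))  # history accretes
        online_store[entity_id] = value               # latest overwrites

materialize([("u1", "2024-01-01", 3)])
materialize([("u1", "2024-01-15", 8), ("u2", "2024-01-05", 1)])
```

After both runs, the offline store holds three historical rows while the online store holds only the current value per entity — the asymmetry the offline/online split is built around.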
91.4 Feature views as the abstraction
The unit of feature-store plumbing is a feature view (or “feature group” in Tecton, “feature service” in some systems). A feature view is a named set of related features that share:
- An entity (the thing the features describe — user, merchant, product).
- An event timestamp column (used for point-in-time correctness).
- A source (the table or stream that produces the features).
- A freshness SLA (how stale the online values are allowed to be).
- A TTL (how far back in time a feature value remains valid for point-in-time joins — values older than the TTL relative to the event timestamp are treated as missing).
Example, in Feast syntax:
```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64, String

user = Entity(name="user", join_keys=["user_id"])

user_features_source = FileSource(
    path="s3://my-bucket/features/user_features.parquet",
    timestamp_field="event_timestamp",
)

user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=90),
    schema=[
        Field(name="sessions_7d", dtype=Int64),
        Field(name="avg_session_duration_s", dtype=Float32),
        Field(name="last_country", dtype=String),
    ],
    source=user_features_source,
)
```
This definition creates the infrastructure: offline materialization from the Parquet source, online materialization to whatever online store is configured, and the API to fetch features from either store.
At training time:
```python
training_df = store.get_historical_features(
    entity_df=labels_df,  # has (user_id, event_timestamp, label) columns
    features=[
        "user_stats:sessions_7d",
        "user_stats:avg_session_duration_s",
        "user_stats:last_country",
    ],
).to_df()
```
The feature store does the as-of join: for each row in labels_df, it finds the feature values that were valid at that row’s event timestamp and joins them in. Point-in-time correctness is automatic.
At serving time:
```python
features = store.get_online_features(
    features=[
        "user_stats:sessions_7d",
        "user_stats:avg_session_duration_s",
        "user_stats:last_country",
    ],
    entity_rows=[{"user_id": user_id}],
).to_dict()
```
The feature store does a key-value lookup in the online store. No history, no joins, just the latest values. Sub-millisecond latency.
The same feature view powers both. The definitions live in one place, the pipelines are coupled, and training-serving skew is eliminated by construction.
91.5 Feast: the open-source minimal core
Feast (short for Feature Store) is the most widely used open-source feature store. It started as a collaboration between Gojek and Google Cloud, was later stewarded by Tecton, and is now a standalone open-source project. Feast’s philosophy is to be a thin, modular layer that plugs into whatever storage infrastructure a team already has, rather than a full platform. The offline store can be Parquet on S3, BigQuery, Snowflake, Redshift, or a lakehouse table. The online store can be Redis, DynamoDB, Cassandra, Postgres, or Datastore. Feast provides the glue, the Python SDK, the feature registry, and the materialization jobs.
Feast’s architecture:
- Registry: a Python-parseable file or remote service that holds feature view definitions, entities, and feature services.
- Offline store: pluggable, abstracted behind an interface. Feast knows how to do point-in-time joins on each supported backend.
- Online store: pluggable, abstracted similarly. Feast materializes feature values from the offline store into the online store on a schedule.
- Materialization jobs: periodic jobs that read the offline store and write to the online store for the entities and features defined in the registry.
- Python SDK: the library applications use to fetch features.
Feast is minimal by design. It doesn’t own the compute (you bring your own Spark or SQL), doesn’t own the storage (you bring your own offline and online stores), and doesn’t own the orchestration (you use Airflow or whatever). It owns the feature definition schema and the feature lookup semantics. This minimalism is the reason it’s widely adopted — it fits into any stack — but it’s also the reason teams often layer their own platform on top, because the bare Feast installation leaves a lot of ergonomics to the team.
For a medium-size ML team that already has a warehouse and a Redis cluster, Feast is the right choice. For a team that wants a full managed platform with streaming, monitoring, and SLAs, Feast alone is not enough.
91.6 Tecton and the managed approach
Tecton is a commercial feature store, founded by members of the team that built Uber’s Michelangelo platform. It takes the opposite philosophy from Feast: Tecton owns the full stack from feature definitions to compute to serving. You write a feature definition in Tecton’s DSL, Tecton runs the compute (Spark or Rift, its vectorized engine), Tecton writes to its own offline and online stores, and Tecton’s API serves the features.
The practical differences:
- Streaming features. Tecton has strong support for streaming feature pipelines (Kafka → Spark Structured Streaming → online store) with end-to-end latency measured in seconds. Feast can do streaming, but you build more of the pipeline yourself.
- Embedded compute. Tecton computes features on ingest, materializes to the online store, and serves via low-latency REST. Feast requires you to run the materialization jobs separately.
- Feature monitoring. Tecton ships with feature drift monitoring, freshness SLA tracking, and quality dashboards. Feast has basic hooks for these but you bring your own observability.
- On-demand features. Tecton supports “on-demand” features that are computed at serving time from request-level data combined with precomputed features. Useful for features like “user’s current latitude rounded to a grid cell” where the raw input comes with the request.
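The on-demand pattern is easy to illustrate without any platform DSL. The sketch below is plain Python with invented names, not Tecton’s actual API: a request-time transformation (grid-cell rounding) is merged with features fetched from the store:

```python
# Plain-Python sketch of an on-demand feature (hypothetical names, not a
# platform DSL): combine request-level data with precomputed store features.
import math

def grid_cell(lat, lon, cell_deg=0.01):
    """Round a coordinate to a grid-cell id at request time."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def on_demand_features(request, precomputed):
    """Merge request-derived features with precomputed ones from the store."""
    return {
        **precomputed,  # e.g. sessions_7d fetched via get_online_features
        "grid_cell": grid_cell(request["lat"], request["lon"]),
    }

feats = on_demand_features(
    request={"lat": 37.7749, "lon": -122.4194},
    precomputed={"sessions_7d": 8},
)
```

The key property is that `grid_cell` cannot be precomputed — the raw latitude only exists once the request arrives — so it has to run in the serving path, stitched together with the precomputed values.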
- Cost. Tecton is SaaS and expensive. Feast is free (minus the storage and compute you’re already paying for).
The choice between Feast and Tecton is roughly the choice between “we have a platform team and want control” and “we don’t have a platform team and want velocity.” Both are legitimate. For large organizations with their own infrastructure, Feast or a custom-built feature store is typical. For startups and mid-size companies wanting to move fast, Tecton makes sense.
Other feature stores worth knowing: Vertex AI Feature Store (GCP’s managed offering), SageMaker Feature Store (AWS), Hopsworks (a European open-source option with stronger time-travel semantics), Databricks Feature Store (native to the Databricks lakehouse). Each fits specific ecosystems.
91.7 What LLMs changed
The rise of LLMs changed the relevance of feature stores for a specific part of the ML world. For pure LLM serving — a user sends a prompt, the model generates text — there are no “features” in the classical sense. The model’s inputs are tokens, and the context is the literal prompt. You don’t have a feature pipeline, a feature view, or an offline/online split. The feature store is simply not needed.
This is why a lot of 2023-2024 discourse about ML infrastructure skipped over feature stores entirely. When your whole ML stack is a vLLM deployment in front of Llama 3, a feature store is irrelevant.
But the LLM era didn’t eliminate classical ML. The systems that generate billions of dollars per year — recommendation, ranking, ads, search, fraud, pricing — still run classical ML models. Those models still need features. Those systems still have training-serving skew problems. Those teams still use feature stores. The feature store became less universally important, not less important in its domain.
There is also a hybrid emerging. LLM-powered systems sometimes want classical features to steer or guide the LLM. A RAG system might use a user-preferences feature to bias retrieval. A conversational commerce agent might use a user’s historical spend as an input to its pricing function. In these hybrid systems, the feature store serves the classical features and the LLM serves the generative step, and the two meet at a well-defined interface. Feast and Tecton both support this pattern, often by exposing features as tool inputs to an agent framework.
For senior ML systems interviews in 2026, the right framing is: feature stores solve training-serving skew and point-in-time correctness for classical ML, which is still the dominant business ML application. LLMs are an additional workload, not a replacement. A candidate who handwaves away feature stores as “not relevant anymore” is signaling that they don’t understand where the money is made.
91.8 Feature stores for ranking and retrieval
The most defensible current use case for a feature store in a modern ML system is ranking: the second-stage model that takes a retrieval candidate set and re-orders it based on user and item features. Every recommendation system, search engine, and ads platform has one.
The ranking model’s input is a high-dimensional feature vector combining:
- User features: demographics, recent behavior, derived preferences, engagement rates.
- Item features: content type, popularity, recency, author/publisher identity.
- Context features: device, time of day, location, session state.
- Interaction features: click-through rate for this user-item pair, recency of last interaction.
These features need point-in-time correctness at training time (the model sees features as they were when the user saw the item, not as they are now) and low-latency access at serving time (the ranker runs on a candidate set of hundreds of items, all features fetched in parallel, in tens of milliseconds). A feature store is the natural infrastructure for this.
The serving-time pattern: the request arrives, the retriever produces a candidate set, the ranker fetches features for (user, item_1), (user, item_2), … (user, item_N) in a single batched get_online_features call, scores each candidate, and returns the top-K. The feature store must support batch lookups across entity combinations, and the online store must have sub-millisecond p99 latency to fit in the overall ranking budget (typically 50-150 ms).
Feast and Tecton both optimize for this pattern. The feature fetch is usually the second-largest consumer of the latency budget in a ranking pipeline, after the model itself. Getting it right — low latency, high hit rate, correct values — is the difference between a responsive ranker and a slow one.
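The fetch-then-score shape of the ranking path can be sketched with an in-memory stand-in for the online store (hypothetical names and a trivial scoring function; a real system would batch through get_online_features against Redis or DynamoDB and run the actual ranking model):

```python
# In-memory stand-in for the online store; a real system hits Redis/DynamoDB.
user_features = {"u1": {"sessions_7d": 8}}
item_features = {f"i{n}": {"popularity": n % 10} for n in range(500)}

def batch_fetch(user_id, item_ids):
    """One batched lookup covering every (user, item) pair in the candidate set."""
    u = user_features[user_id]
    return [{**u, **item_features[i], "item_id": i} for i in item_ids]

def rank(user_id, item_ids, top_k=10):
    rows = batch_fetch(user_id, item_ids)
    # Stand-in scoring; a real ranker would run the model over these vectors.
    rows.sort(key=lambda r: r["popularity"] * r["sessions_7d"], reverse=True)
    return [r["item_id"] for r in rows[:top_k]]

top = rank("u1", list(item_features), top_k=10)
```

The structural point is the single batched fetch: one round trip for all 500 candidates, not 500 point lookups, which is what keeps the feature fetch inside a tens-of-milliseconds budget.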
91.9 When a feature store is the wrong answer
A feature store is not the right solution when:
- You don’t have training-serving skew. If your serving features come from the same query as your training features (e.g., both run a SQL query against Postgres on demand), you don’t have skew. You’re doing “features on demand” and a feature store adds complexity without value.
- You have a very small feature set. If your model has 10 features and one batch job, a feature store is over-engineered. A single well-written Python module plus a Redis hash is enough.
- Your features are pure LLM inputs. Tokens and prompts are not features. No store needed.
- You have exactly one model. Feature stores pay off because they let multiple models share feature definitions. A single model with a single pipeline can get away without one.
- You don’t have point-in-time correctness problems. If your features don’t depend on history, or if the history is so short it doesn’t matter, point-in-time isn’t an issue and a KV store suffices.
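For the small-feature-set case above, “a single well-written Python module” mostly means one feature function shared by both the batch job and the serving path, so implementation skew is impossible by construction. A sketch, with a plain dict standing in for the Redis hash:

```python
# One feature function used by BOTH the training job and the serving path,
# eliminating implementation skew by construction. The dict stands in for a
# Redis hash (HSET/HGETALL) in a real deployment; names are illustrative.
def compute_features(raw_events):
    """Single source of truth for the feature logic."""
    n = len(raw_events)
    total = sum(e["duration_s"] for e in raw_events)
    return {"session_count": n, "avg_duration_s": total / n if n else 0.0}

online_cache = {}  # stand-in for a per-user Redis hash

def refresh(user_id, raw_events):  # the batch-job path
    online_cache[user_id] = compute_features(raw_events)

def serve(user_id):  # the inference path
    return online_cache[user_id]

refresh("u1", [{"duration_s": 60}, {"duration_s": 120}])
```

At this scale, the shared module gives you the same skew guarantee a feature store would, without any of the infrastructure.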
The right way to think about feature stores: invest when you have three or more models sharing features, when you have genuinely point-in-time-sensitive training, and when you have a serving path that can’t accept warehouse-query latency. Below that bar, a feature store is overhead. Above it, you will build one whether you know it or not — better to use an existing one than accrete a custom solution.
91.10 Part VII recap
Part VII covered the data plane of an ML system — the storage and data-movement layer that lives underneath the inference and training stacks. The seven chapters fit together as a coherent architecture:
Chapter 85 established the bulk-bytes substrate: object storage (the S3 model) as the universal key-value store for unstructured and semi-structured data. Model weights, Parquet files, checkpoints, logs, and archives all live here. Object storage is the foundation of every chapter that follows.
Chapter 86 covered structured operational metadata: document stores (MongoDB) and key-value stores (DynamoDB). These hold the per-request metadata, user sessions, feature flags, model registry entries — the control plane of an ML serving system.
Chapter 87 addressed the special shape of time-indexed data: time-series databases (TimescaleDB, Prometheus TSDB, InfluxDB). Every metric, every latency sample, every GPU exporter reading ends up here, stored efficiently with time-bounded retention and compression.
Chapter 88 covered data in motion: Kafka and the log abstraction. The streaming layer that decouples producers from consumers and moves events between services — inference request logs, CDC streams, feature pipelines, telemetry fan-out.
Chapter 89 covered volatile acceleration: caches (Redis, patterns, stampede protection). The low-latency layer that absorbs hot-path reads, reduces backend load, and keeps p99 latencies predictable.
Chapter 90 covered the lakehouse: Parquet + Iceberg/Delta/Hudi. The layer that turns object storage into a queryable, transactional, time-traveling store of training data, feature tables, and analytical history. The replacement for the data warehouse for most ML workloads.
Chapter 91 — this chapter — covered the ML-specific layer on top: feature stores. The abstraction that unifies offline and online storage around point-in-time-correct feature definitions, eliminating training-serving skew for classical ML.
Stacked from the bottom: object storage → time-series / document stores → Kafka log → lakehouse tables → feature store → cache → application. Each layer solves a specific problem the ones above it cannot. Each layer fails differently, costs differently, and scales differently. A senior ML systems engineer has an opinion on all seven and a preferred implementation for each. Part VIII will move up into observability — how you see what’s happening across all of this in real time.
```mermaid
graph TD
    App["Application / Inference Service"]
    Cache["Ch 89 — Cache<br/>(Redis, patterns)"]
    FS["Ch 91 — Feature Store<br/>(Feast / Tecton)"]
    LH["Ch 90 — Lakehouse<br/>(Iceberg / Delta on S3)"]
    Kafka["Ch 88 — Kafka<br/>(streaming log)"]
    TSDB["Ch 87 — Time-Series DB<br/>(Prometheus / Timescale)"]
    Doc["Ch 86 — Document Store<br/>(Mongo / Dynamo)"]
    ObjS["Ch 85 — Object Storage<br/>(S3 model)"]
    App --> Cache
    App --> FS
    FS --> LH
    FS --> Cache
    LH --> ObjS
    Kafka --> LH
    Kafka --> FS
    TSDB --> ObjS
    Doc --> App
    style Cache fill:var(--fig-accent-soft),stroke:var(--fig-accent)
    style ObjS fill:var(--fig-surface),stroke:var(--fig-border)
```
The seven data-plane chapters form a layered dependency graph — object storage is the substrate everything else builds on, Kafka is the motion layer, the lakehouse is the structured history layer, and the feature store is the ML-specific serving adapter at the top.
91.11 The mental model
Eight points to take into Chapter 92:
- Feature stores solve training-serving skew and point-in-time correctness for classical ML systems.
- The offline/online split is the core architectural pattern: lakehouse/warehouse for history, KV store for latest.
- A feature view is the unit of definition: entity, timestamp, source, features. One definition, two materializations.
- Point-in-time correctness requires as-of joins at training time: match each training row to the feature values valid at its timestamp.
- Feast is the open-source minimal core — modular, pluggable, bring-your-own storage.
- Tecton is the managed, full-stack alternative with streaming features, on-demand features, and monitoring built in.
- LLMs reduced the universal relevance of feature stores but didn’t eliminate them — classical ML is still the dominant business-ML workload.
- Use a feature store when you have multiple models, genuine point-in-time needs, and a serving latency budget incompatible with warehouse queries.
In Chapter 92, Part VIII begins: observability across the whole system.
Read it yourself
- The Feast documentation and the Feast architectural overview on feast.dev.
- The Tecton blog’s series on feature platforms, especially the posts on online/offline consistency and the history of Michelangelo.
- Jeremy Hermann and Mike Del Balso, Meet Michelangelo: Uber’s Machine Learning Platform (2017). The blog post that introduced the feature store idea to the industry.
- Chip Huyen, Designing Machine Learning Systems, Chapter 5 on feature engineering and the feature store chapter.
- The Hopsworks documentation on time-travel feature semantics.
- Simba Khadder’s blog posts on the feature store landscape (Featureform founder).
Practice
- Walk through a concrete training-serving skew scenario. A team has features computed by a nightly Spark job. At serving time, features come from Redis. Identify three ways this could produce skew.
- Implement an as-of join in SQL between a training set with (entity_id, event_timestamp, label) and a feature table with (entity_id, valid_from, feature_value). How does it differ from a regular join?
- Design a feature view in Feast for “rolling 7-day transaction count per merchant.” What are the source, entity, schema, and materialization pattern?
- Compare Feast and Tecton on these axes: who owns compute, who owns storage, how streaming features work, how freshness is monitored.
- Construct an argument for why LLM-only serving systems don’t need feature stores, then construct a counter-argument for why a hybrid LLM+classical system might.
- A ranking model needs to fetch 500 (user, item) feature vectors in under 30 ms. How does the feature store API have to be designed? What’s the backend choice?
- Stretch: Stand up Feast with a local Parquet offline store and a local Redis online store. Define a feature view, materialize, and do both a historical-features fetch (for training) and an online-features fetch (for serving). Verify they return the same feature values at the same timestamp.