What you'll learn
- How to establish baseline distributions for an ML system's inputs and predictions.
- How to create a reference window and use it for drift detection.
- Which tests and distances to use (KS, PSI), and why.
1. Baseline first: define normal
Monitoring and observability start with a clear definition of "normal." In production ML, that means:
- a reference window of data (e.g., last 14 days before launch)
- feature profiles (summary stats + histograms)
- an initial prediction score profile (if available)
This is the known state we compare later windows against. Monitoring tells us that something changed; observability helps us ask why.
1.1 Example schema (rides table)
We'll use a simple ride-sharing schema throughout the guide:
| column | type | notes |
|---|---|---|
| ride_id | string | unique id |
| timestamp | datetime | event time (UTC) |
| pickup_zone | string | city grid cell id |
| dropoff_zone | string | city grid cell id |
| trip_distance_km | float | continuous |
| surge_multiplier | float | continuous (>=1) |
| fare_amount | float | continuous |
| driver_eta_min | float | model output (optional in Ch1) |
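To make "feature profiles" concrete, here's a minimal sketch of how a reference profile could be computed for a continuous column. `profile_feature` is a hypothetical helper (not from any library), and the data below is synthetic:

```python
import numpy as np
import pandas as pd

def profile_feature(series: pd.Series, bins: int = 20) -> dict:
    """Summary stats plus frozen histogram bin edges for one continuous feature."""
    # Freezing the bin edges now lets later windows be binned identically,
    # which is exactly what two-sample comparisons and PSI need.
    counts, edges = np.histogram(series.dropna(), bins=bins)
    return {
        "mean": float(series.mean()),
        "std": float(series.std()),
        "p50": float(series.quantile(0.50)),
        "p95": float(series.quantile(0.95)),
        "bin_edges": edges,    # length bins + 1
        "bin_counts": counts,  # length bins
    }

# Illustrative reference window (synthetic trip distances)
rng = np.random.default_rng(0)
ref = pd.DataFrame({"trip_distance_km": rng.normal(6.5, 2.0, 1000)})
profile = {col: profile_feature(ref[col]) for col in ["trip_distance_km"]}
```

In a real pipeline you'd profile every monitored column over the 14-day reference window and persist the result alongside the model version.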
2. Visualizing the baseline
Below are baseline histograms and descriptive stats. These serve as your reference profiles for the input features P(X): trip distance, surge multiplier, and fare.
Why histograms? Two-sample tests (e.g., KS for continuous features; chi-squared for categorical) tell you whether today's window plausibly came from the same distribution as the baseline window. But pictures (plus summary stats) help engineers reason quickly about where the change is (center, spread, tails).
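As a sketch of how those two tests might be run with SciPy (the samples and zone counts below are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(6.5, 2.0, 5000)  # e.g. baseline trip_distance_km
today = rng.normal(7.2, 2.3, 2000)     # today's window, with a shift

# KS: compares the two empirical CDFs; a small p-value suggests the
# windows were not drawn from the same distribution.
ks_stat, ks_p = stats.ks_2samp(baseline, today)

# Chi-squared for a categorical feature (e.g. pickup_zone): compare observed
# counts in today's window against counts expected under the baseline mix.
zones = [f"Z{i:03d}" for i in range(5)]
base_counts = np.array([1200, 1000, 900, 1100, 800])
today_counts = np.array([300, 260, 400, 250, 190])
expected = base_counts / base_counts.sum() * today_counts.sum()
chi2_stat, chi2_p = stats.chisquare(today_counts, f_exp=expected)
```

Both tests flag the shift here; with production-sized samples, even tiny shifts reach significance, which is one reason effect-size measures like PSI are used alongside p-values.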
3. Today vs. Baseline: measuring shift
When labels lag, compare inputs P(X) and model outputs P(ŷ) over time. That's standard in industry monitoring stacks.
We'll use:
- KS test (continuous): simple, non-parametric, compares empirical CDFs
- PSI (binned, symmetric): widely used for production drift dashboards; easy thresholds for alerting
Typical PSI thresholds:
- < 0.10: stable
- 0.10–0.25: moderate shift (watch)
- ≥ 0.25: major shift (investigate, retrain or fix)
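A minimal PSI implementation might look like this. The quantile bins come from the baseline window, and the `eps` clip is an assumption to avoid log(0) on empty bins:

```python
import numpy as np

def psi(baseline: np.ndarray, today: np.ndarray, bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index between a baseline window and today's window."""
    # Bin edges come from baseline quantiles; both windows are binned identically.
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    b = np.histogram(baseline, bins=edges)[0] / len(baseline)
    # Clip today's values into the baseline range so nothing falls outside the bins.
    t = np.histogram(np.clip(today, edges[0], edges[-1]), bins=edges)[0] / len(today)
    b, t = np.clip(b, eps, None), np.clip(t, eps, None)
    return float(np.sum((t - b) * np.log(t / b)))

rng = np.random.default_rng(7)
stable = psi(rng.normal(6.5, 2.0, 20000), rng.normal(6.5, 2.0, 8000))   # near 0
shifted = psi(rng.normal(6.5, 2.0, 20000), rng.normal(7.2, 2.3, 8000))  # well above stable
```

Because PSI is symmetric in the two windows and bounded away from p-value pathologies, it works well as a dashboard metric with the fixed thresholds above.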
4. Run it yourself (data + code)
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
N0, N1 = 20000, 8000  # baseline window, today's window

# Baseline distributions
trip0 = np.clip(rng.normal(6.5, 2.0, N0), 0.5, None)
surge0 = np.clip(rng.lognormal(mean=0.05, sigma=0.15, size=N0), 1.0, None)
fare0 = np.clip(35 + trip0 * 3.2 + rng.normal(0, 5, N0), 5, None)
df0 = pd.DataFrame({
    "ride_id": [f"b_{i}" for i in range(N0)],
    "timestamp": pd.date_range("2025-09-01", periods=N0, freq="min"),
    "pickup_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N0),
    "dropoff_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N0),
    "trip_distance_km": trip0,
    "surge_multiplier": surge0,
    "fare_amount": fare0,
})

# Today's window with a subtle shift (slightly longer trips, heavier tail)
trip1 = np.clip(rng.normal(7.2, 2.3, N1), 0.5, None)
surge1 = np.clip(rng.lognormal(mean=0.08, sigma=0.18, size=N1), 1.0, None)
fare1 = np.clip(36 + trip1 * 3.4 + rng.normal(0, 6, N1), 5, None)
df1 = pd.DataFrame({
    "ride_id": [f"t_{i}" for i in range(N1)],
    "timestamp": pd.date_range("2025-10-01", periods=N1, freq="min"),
    "pickup_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N1),
    "dropoff_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N1),
    "trip_distance_km": trip1,
    "surge_multiplier": surge1,
    "fare_amount": fare1,
})

df0.to_csv("rides_baseline.csv", index=False)
df1.to_csv("rides_today.csv", index=False)
print("Wrote rides_baseline.csv and rides_today.csv")
```
5. What to alert on in Chapter 1
- PSI ≥ 0.25 on any high-importance feature → Alert
- 0.10 ≤ PSI < 0.25 → Warn, annotate and watch next window
- KS p-value < 0.01 for major features → annotate "drift suspected"
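The rules above can be folded into a single status function; this is a hypothetical policy sketch, with strings and thresholds mirroring the bullets:

```python
def drift_status(psi_value: float, ks_pvalue: float) -> str:
    """Map PSI and KS evidence for one feature to a Chapter 1 alert status."""
    if psi_value >= 0.25:
        return "alert"            # major shift: investigate, retrain or fix
    if psi_value >= 0.10:
        return "warn"             # moderate shift: annotate, watch next window
    if ks_pvalue < 0.01:
        return "drift suspected"  # distributional evidence without a large PSI
    return "ok"
```

In practice you'd run this per feature per window and route the non-"ok" statuses to your alerting channel.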
Why this mix? Labels can be delayed; monitoring P(X) and P(ŷ) is crucial in those situations. Summary stats plus tests help quickly narrow down where and how the distributions changed.
6. Where this connects (foreshadow)
This chapter ends with a subtle alert (PSI rising) that will carry into Chapter 2: Covariate Shift. We'll add spatial hexbins and a control-room view of evolving frequencies across zones, then step into concept drift later. Observability widens from "that it changed" to "why it changed".