The City That Learned Too Fast

Baseline distributions & drift detection (PSI, KS test)

What you'll learn

  • How to establish baseline distributions for an ML system's inputs and predictions.
  • How to create a reference window and use it for drift detection.
  • Which tests and distances to use (KS, PSI), and why.

1. Baseline first: define normal

Monitoring and observability start with a clear definition of "normal." In production ML, that means:

  • a reference window of data (e.g., last 14 days before launch)
  • feature profiles (summary stats + histograms)
  • an initial prediction score profile (if available)

This is the known state we compare later windows against. Monitoring tells us that something changed; observability helps us ask why.
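As a sketch, a baseline profile for one continuous feature can be captured as summary stats plus frozen histogram bins (`build_profile` is our own illustrative helper, not a library function):

```python
import numpy as np
import pandas as pd

def build_profile(series: pd.Series, n_bins: int = 10) -> dict:
    """Summary stats plus quantile-binned histogram for one feature."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1))  # quantile bins
    counts, _ = np.histogram(series, bins=edges)
    return {
        "mean": series.mean(),
        "std": series.std(),
        "p50": series.quantile(0.5),
        "bin_edges": edges,                    # frozen: reused to bin later windows
        "bin_fracs": counts / counts.sum(),    # baseline bin proportions
    }

# Example: profile a synthetic baseline feature
rng = np.random.default_rng(0)
profile = build_profile(pd.Series(rng.normal(6.5, 2.0, 10_000)))
```

The key design choice is freezing the bin edges on the reference window, so later windows are binned on the same grid and their proportions are directly comparable.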

1.1 Example schema (rides table)

We'll use a simple ride-sharing schema throughout the guide:

rides_baseline schema

  column             type      notes
  ride_id            string    unique id
  timestamp          datetime  event time (UTC)
  pickup_zone        string    city grid cell id
  dropoff_zone       string    city grid cell id
  trip_distance_km   float     continuous
  surge_multiplier   float     continuous (≥ 1)
  fare_amount        float     continuous
  driver_eta_min     float     model output (optional in Ch1)

2. Visualizing the baseline

Below are baseline histograms and descriptive stats. These serve as your reference profiles for the P(X) features: trip distance, surge multiplier, and fare.


Why histograms? Two-sample tests (e.g., KS for continuous features; chi-squared for categorical) tell you whether today's window plausibly came from the same distribution as the baseline window. But pictures (plus summary stats) help engineers reason quickly about where the change is: center, spread, or tails.
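For intuition, here is a quick SciPy sketch of the two-sample KS test on synthetic data (the numbers are illustrative, not the rides table):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
baseline = rng.normal(6.5, 2.0, 20_000)   # reference window
same = rng.normal(6.5, 2.0, 8_000)        # drawn from the same distribution
shifted = rng.normal(7.2, 2.3, 8_000)     # shifted center and wider spread

stat_same, p_same = ks_2samp(baseline, same)
stat_shift, p_shift = ks_2samp(baseline, shifted)
print(f"same dist:    D={stat_same:.3f}  p={p_same:.3g}")
print(f"shifted dist: D={stat_shift:.3f}  p={p_shift:.3g}")
```

Note that with windows this large, KS will flag even tiny shifts as statistically significant, which is one reason dashboards pair it with an effect-size measure like PSI.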

3. Today vs. Baseline: measuring shift

When labels lag, compare inputs P(X) and model outputs P(ŷ) over time. That's standard practice in industry monitoring stacks.

We'll use:

  • KS test (continuous): simple, non-parametric, compares empirical CDFs
  • PSI (binned, symmetric): widely used for production drift dashboards; easy thresholds for alerting
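PSI has no single canonical library implementation; below is a minimal sketch of the common binned formulation, with bin edges frozen on the baseline and a small epsilon guarding empty bins:

```python
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray,
        n_bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index over quantile bins frozen on the baseline."""
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range values in `current`
    b = np.histogram(baseline, bins=edges)[0] / len(baseline)
    c = np.histogram(current, bins=edges)[0] / len(current)
    b, c = np.clip(b, eps, None), np.clip(c, eps, None)  # avoid log(0)
    return float(np.sum((c - b) * np.log(c / b)))

rng = np.random.default_rng(7)
base = rng.normal(6.5, 2.0, 20_000)
psi_stable = psi(base, rng.normal(6.5, 2.0, 8_000))   # same distribution
psi_shifted = psi(base, rng.normal(7.2, 2.3, 8_000))  # shifted distribution
print(f"stable:  {psi_stable:.3f}")
print(f"shifted: {psi_shifted:.3f}")
```

Extending the outer edges to ±inf means values outside the baseline range still land in a bin rather than being silently dropped.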

PSI thresholds (rule of thumb):

  • PSI < 0.10: stable
  • 0.10 ≤ PSI < 0.25: moderate shift (watch)
  • PSI ≥ 0.25: major shift (investigate; retrain or fix)

4. Run it yourself (data + code)

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

N0, N1 = 20000, 8000  # baseline window size, today's window size

# Baseline distributions
trip0 = np.clip(rng.normal(6.5, 2.0, N0), 0.5, None)
surge0 = np.clip(rng.lognormal(mean=0.05, sigma=0.15, size=N0), 1.0, None)
fare0 = np.clip(35 + trip0 * 3.2 + rng.normal(0, 5, N0), 5, None)

df0 = pd.DataFrame({
    "ride_id": [f"b_{i}" for i in range(N0)],
    "timestamp": pd.date_range("2025-09-01", periods=N0, freq="min"),
    "pickup_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N0),
    "dropoff_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N0),
    "trip_distance_km": trip0,
    "surge_multiplier": surge0,
    "fare_amount": fare0,
})

# Today's window with a subtle shift (slightly longer trips, heavier tails)
trip1 = np.clip(rng.normal(7.2, 2.3, N1), 0.5, None)
surge1 = np.clip(rng.lognormal(mean=0.08, sigma=0.18, size=N1), 1.0, None)
fare1 = np.clip(36 + trip1 * 3.4 + rng.normal(0, 6, N1), 5, None)

df1 = pd.DataFrame({
    "ride_id": [f"t_{i}" for i in range(N1)],
    "timestamp": pd.date_range("2025-10-01", periods=N1, freq="min"),
    "pickup_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N1),
    "dropoff_zone": rng.choice([f"Z{i:03d}" for i in range(40)], size=N1),
    "trip_distance_km": trip1,
    "surge_multiplier": surge1,
    "fare_amount": fare1,
})

df0.to_csv("rides_baseline.csv", index=False)
df1.to_csv("rides_today.csv", index=False)
print("Wrote rides_baseline.csv and rides_today.csv")
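With the two windows in hand, a per-feature drift check is a short loop. This sketch regenerates two of the continuous columns inline (same recipes as the generator above) so it runs standalone; in practice you would read rides_baseline.csv and rides_today.csv instead:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
# (baseline array, today array) per feature
cols = {
    "trip_distance_km": (
        np.clip(rng.normal(6.5, 2.0, 20_000), 0.5, None),
        np.clip(rng.normal(7.2, 2.3, 8_000), 0.5, None),
    ),
    "surge_multiplier": (
        np.clip(rng.lognormal(mean=0.05, sigma=0.15, size=20_000), 1.0, None),
        np.clip(rng.lognormal(mean=0.08, sigma=0.18, size=8_000), 1.0, None),
    ),
}

results = {}
for name, (base, today) in cols.items():
    stat, p = ks_2samp(base, today)
    results[name] = (stat, p)
    print(f"{name}: KS D={stat:.3f}, p={p:.1e}")
```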

5. What to alert on in Chapter 1

  • PSI ≥ 0.25 on any high-importance feature → Alert
  • 0.10 ≤ PSI < 0.25 → Warn; annotate and watch the next window
  • KS p-value < 0.01 on major features → annotate "Drift suspected"
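These chapter-1 rules can be captured in a small helper (the function name and status strings are ours):

```python
def drift_status(psi_value: float, ks_pvalue: float) -> str:
    """Map a feature's PSI and KS p-value to a chapter-1 status label."""
    if psi_value >= 0.25:
        return "ALERT: major shift"
    if 0.10 <= psi_value < 0.25:
        return "WARN: moderate shift"
    if ks_pvalue < 0.01:
        return "ANNOTATE: drift suspected"
    return "OK"

print(drift_status(0.31, 0.5))    # ALERT: major shift
print(drift_status(0.12, 0.5))    # WARN: moderate shift
print(drift_status(0.05, 0.001))  # ANNOTATE: drift suspected
```

Note the ordering: PSI (an effect-size measure) takes precedence, and the KS p-value only annotates when PSI is quiet, since with large windows KS flags shifts that may be too small to act on.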

Why this mix? Labels can be delayed, and monitoring P(X) and P(ŷ) is crucial in those situations. Summary stats plus tests help quickly narrow down where and how things changed.

6. Where this connects (foreshadow)

This chapter ends with a subtle alert (PSI rising) that will carry into Chapter 2: Covariate Shift. We'll add spatial hexbins and a control-room view of evolving frequencies across zones, then step into concept drift later. Observability widens from "that it changed" to "why it changed".