Alerting on Performance Regressions Without Alert Fatigue

By Grace Yoon · 7 min read
Deploy-correlated p75 alerting catches regressions in minutes. Here is how to set thresholds that fire when it matters and stay quiet when it does not.

A deploy goes out at 3pm on a Thursday. By 3:45pm, mobile p75 LCP on the product listing page has climbed 900ms. Nobody sees it until Monday morning, when the weekend analytics summary shows a spike in mobile bounce rate. The post-mortem produces two questions that should have been answerable Friday morning: which deploy caused this, and what changed? By Monday, those questions take hours to answer because the team has shipped two more deploys in the meantime, and the specific code change that caused the regression is buried in the git log.

This is the canonical performance regression scenario that deploy-correlated alerting prevents. It is also the scenario that most alerting configurations fail to catch, because most performance alerting is either too sensitive (it fires on normal traffic variation) or too coarse (it detects only catastrophic failures, not a 900ms p75 regression).

Why Standard Alerting Fails for Performance

Standard infrastructure alerting — error rate thresholds, HTTP 5xx counts, response time averages — does not translate cleanly to web performance. A p75 LCP regression does not produce errors. It does not affect server-side response times. It does not show up in your APM tool's service map. It is a client-side rendering performance change that may coexist with a perfectly healthy server stack.

Threshold-based alerting on performance metrics has a specific failure mode: static thresholds set wide enough to avoid false positives are too wide to catch meaningful regressions. If your p75 LCP has historically ranged from 2.0 to 2.4 seconds, setting an alert threshold at 3.5 seconds will not catch a 600ms regression that moves p75 from 2.2 to 2.8 seconds. That 600ms difference is real, it affects user experience, and it is invisible to the threshold.

Percentage-change alerting (fire when p75 increases by more than X% from recent baseline) is better, but still suffers from baseline contamination: if a slow trend has been building over weeks, your recent baseline is already elevated, and the percentage threshold may not fire even on a significant instantaneous regression on top of that drift.
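Both failure modes can be made concrete with a short sketch. The function names and the 25% change threshold are illustrative assumptions, not recommended values; the 3.5-second static threshold and the 2.2-to-2.8-second regression come from the example above.

```python
def static_threshold_alert(p75_ms: float, threshold_ms: float = 3500.0) -> bool:
    """Fire only when p75 crosses a fixed absolute threshold."""
    return p75_ms > threshold_ms


def pct_change_alert(p75_ms: float, baseline_ms: float,
                     max_increase: float = 0.25) -> bool:
    """Fire when p75 exceeds the recent baseline by more than max_increase.

    max_increase=0.25 is an assumed value chosen for illustration.
    """
    return (p75_ms - baseline_ms) / baseline_ms > max_increase


# A 600ms regression from 2.2s to 2.8s never reaches the 3.5s static threshold.
print(static_threshold_alert(2800))              # False

# Against a clean 2.2s baseline the same regression is a ~27% increase.
print(pct_change_alert(2800, baseline_ms=2200))  # True

# After weeks of drift the baseline sits at 2.5s. Another 600ms on top of
# that is only a 24% increase, so the percentage rule stays silent.
print(pct_change_alert(3100, baseline_ms=2500))  # False
```

The regression is the same 600ms in both percentage cases; only the contaminated baseline changes the outcome.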

Deploy-Correlated Alerting: The Core Pattern

Deploy-correlated alerting ties performance metric changes directly to deployment events rather than to absolute thresholds or historical baselines. The mechanism: when a deploy marker is injected into your monitoring pipeline, the alerting system begins comparing post-deploy p75 to pre-deploy p75 for the same pages, same segment (mobile vs. desktop), same time-of-day cohort. A significant divergence between the pre- and post-deploy windows triggers the alert.
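A minimal sketch of the pre/post comparison, assuming LCP samples for one page and one segment have already been split at the deploy marker. The function names, the nearest-rank percentile method, and the 300ms cutoff for a "significant" divergence are assumptions made for illustration; a production system would more likely use a statistical test than a fixed delta.

```python
import math


def p75(values_ms):
    """Nearest-rank 75th percentile of a non-empty list of LCP samples."""
    ordered = sorted(values_ms)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def compare_deploy(pre_ms, post_ms, min_delta_ms=300):
    """Compare p75 before and after a deploy marker for one page and segment.

    min_delta_ms is an assumed cutoff, not a recommended value.
    """
    before, after = p75(pre_ms), p75(post_ms)
    delta = after - before
    return {
        "pre_p75": before,
        "post_p75": after,
        "delta_ms": delta,
        "regressed": delta >= min_delta_ms,
    }


# Made-up LCP samples (ms) from the windows on either side of a deploy.
pre = [2100, 2300, 2500, 2600, 2400, 2200, 2700, 2000]
post = [3000, 3200, 3400, 3500, 3300, 3100, 3600, 2900]

result = compare_deploy(pre, post)
print(result)
# {'pre_p75': 2500, 'post_p75': 3400, 'delta_ms': 900, 'regressed': True}
```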

Three choices in the pre/post window design determine whether the comparison can be trusted:

Window duration. Post-deploy windows need enough sessions to reach statistical confidence. For high-traffic pages with thousands of sessions per hour, a 15-minute post-deploy window may be sufficient. For lower-traffic pages or segments, a 30-60 minute window is more appropriate. Too short, and normal sampling variance triggers false positives. Too long, and the detection lag grows.

Comparison cohort. Comparing post-deploy Friday afternoon p75 to pre-deploy Friday morning p75 is straightforward. Comparing post-deploy Monday morning p75 to pre-deploy Friday afternoon p75 is problematic — the traffic profile (user mix, device distribution, geographic distribution) may differ significantly across the weekend. The comparison cohort should either be the immediate pre-deploy window or the same time slot from the prior week.

Segment filtering. Deploy-correlated alerts should fire at the segment level where regressions are visible, not just at the aggregate level. A regression that affects only mobile 4G users may not move the overall aggregate p75 enough to trigger an aggregate-level alert, but it may represent a 1.5-second mobile regression for 35% of your sessions.
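The segment point can be illustrated with toy data. In this sketch the session tuples, segment names, and numbers are all invented, and the sample is deliberately tiny and contrived (the slow segment is 20% of sessions and sits entirely above the aggregate p75 cutoff) so that the effect is easy to see by hand.

```python
import math
from collections import defaultdict


def p75(values_ms):
    """Nearest-rank 75th percentile of a non-empty list."""
    ordered = sorted(values_ms)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]


def p75_delta_by_segment(pre_sessions, post_sessions):
    """Return the p75 change per segment, plus the aggregate under "all".

    Each session is a (segment, lcp_ms) tuple. Segment names are
    hypothetical, and "all" is reserved here for the aggregate.
    """
    def bucket(sessions):
        groups = defaultdict(list)
        for segment, lcp_ms in sessions:
            groups[segment].append(lcp_ms)
            groups["all"].append(lcp_ms)
        return groups

    pre, post = bucket(pre_sessions), bucket(post_sessions)
    # Only compare segments that have data in both windows.
    return {seg: p75(post[seg]) - p75(pre[seg]) for seg in pre if seg in post}


desktop = [1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500]
pre = [("desktop", v) for v in desktop] + [("mobile_4g", 3000), ("mobile_4g", 3200)]
post = [("desktop", v) for v in desktop] + [("mobile_4g", 4500), ("mobile_4g", 4700)]

deltas = p75_delta_by_segment(pre, post)
print(deltas)
# {'desktop': 0, 'all': 0, 'mobile_4g': 1500}
```

The aggregate p75 does not move at all, while the mobile 4G segment regresses by 1.5 seconds. Real traffic will rarely be this clean, but the direction of the effect is the same: an aggregate-only alert can stay silent while one segment degrades badly.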

A Scenario: Catching the Regression in Time

A growing e-commerce platform ships a dependency update that pulls in a newer version of a UI component library. The update adds approximately 28KB to their main JavaScript bundle, an amount that passes the bundle size budget check in CI because the budget allows 30KB of headroom. Lighthouse scores in CI are unchanged: the controlled emulation environment does not pick up the execution cost difference on real mobile hardware.

In production, the larger bundle parses more slowly on mid-range Android devices. Mobile p75 LCP climbs from 2.6 seconds to 3.5 seconds within the first 20 minutes post-deploy, as real-user data accumulates. A deploy-correlated alert fires, linking the p75 increase to that specific deploy hash. The on-call frontend engineer has the regression attributed and a rollback initiated within 35 minutes of deploy time. Total user exposure to the regression: approximately 400 sessions, mostly during afternoon peak traffic.

Without the deploy-correlated alert, the regression would have gone unnoticed until the Monday morning weekly review. The cost: exposure across the entire weekend's traffic, plus a multi-day debugging effort to identify which of the four deploys shipped over the two-day window was responsible.

Avoiding Alert Fatigue

Alert fatigue is the terminal disease of monitoring systems. When an alerting system fires too frequently on non-actionable conditions, engineers learn to ignore it. The fatigue is rational: if the signal-to-noise ratio is low, the cost of investigating every alert exceeds the cost of occasionally missing a real regression. The outcome is that real regressions are missed.

For performance alerting specifically, the primary causes of alert fatigue are:

Alerting on absolute thresholds during high-variance periods. Time-of-day traffic variation affects p75. Peak-hour sessions may skew toward slower device/connection profiles than off-peak sessions. An alert threshold calibrated to off-peak baselines will fire during peak hours on volatile pages without any actual regression occurring.

Alerting on too many segments simultaneously. If you have fifteen page × segment combinations each alerting independently, you will get fifteen alerts for a single regression that affects mobile across all your pages. Alert correlation — grouping related alerts into a single incident with a shared root-cause hypothesis — is essential at any non-trivial scale.
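One simple form of alert correlation is to group alerts by the deploy and segment they share. The sketch below assumes each alert already carries a deploy identifier; the Alert fields, deploy ID, and page paths are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    deploy_id: str   # hypothetical deploy identifier
    page: str
    segment: str
    delta_ms: int    # p75 increase observed after the deploy


def correlate(alerts):
    """Group alerts that share a deploy and segment into one incident each."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[(alert.deploy_id, alert.segment)].append(alert)

    incidents = []
    for (deploy_id, segment), members in grouped.items():
        incidents.append({
            "deploy_id": deploy_id,
            "segment": segment,
            "pages": sorted(a.page for a in members),
            "worst_delta_ms": max(a.delta_ms for a in members),
        })
    return incidents


alerts = [
    Alert("deploy-a", "/products", "mobile", 900),
    Alert("deploy-a", "/cart", "mobile", 700),
    Alert("deploy-a", "/search", "mobile", 650),
]

incidents = correlate(alerts)
print(len(incidents))         # 1
print(incidents[0]["pages"])  # ['/cart', '/products', '/search']
```

Three page-level alerts collapse into one incident whose root-cause hypothesis is already stated: this deploy, this segment.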

Alerting without sufficient session volume. A p75 calculated from 12 sessions is not statistically reliable. An alerting system that fires on p75 changes from small samples will generate false positives constantly on low-traffic pages. Minimum session count thresholds before evaluating p75 change — typically 50-100 sessions per window — reduce noise significantly.
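Volume gating can be sketched as a guard that runs before any p75 comparison. The function names, the return labels, and the 300ms delta are assumptions; the default of 50 sessions is the low end of the range mentioned above.

```python
import math


def p75(values_ms):
    """Nearest-rank 75th percentile of a non-empty list."""
    ordered = sorted(values_ms)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]


def evaluate_with_volume_gate(pre_ms, post_ms, min_sessions=50, min_delta_ms=300):
    """Refuse to judge a p75 change until both windows have enough sessions."""
    if len(pre_ms) < min_sessions or len(post_ms) < min_sessions:
        return "insufficient_data"
    delta = p75(post_ms) - p75(pre_ms)
    return "regression" if delta >= min_delta_ms else "ok"


# 12 sessions per window: the apparent 1.5s jump is not evaluated at all.
print(evaluate_with_volume_gate([2000] * 12, [3500] * 12))  # insufficient_data

# 100 sessions per window with a 900ms shift: the comparison runs and fires.
pre = list(range(2000, 2100))
post = list(range(2900, 3000))
print(evaluate_with_volume_gate(pre, post))                 # regression
print(evaluate_with_volume_gate(pre, pre))                  # ok
```

Returning a distinct "insufficient_data" state, rather than "ok", keeps a low-traffic page from being silently recorded as healthy.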

We are not saying you should alert on fewer things. The right goal is alerting on more things with higher precision: catching real regressions reliably and staying quiet during normal variance. That requires deploy correlation (so you are comparing apples to apples), minimum volume gating (so you are not reacting to sampling noise), and segment-level grouping (so a broad regression generates one clear incident, not fifteen simultaneous alerts).

Performance Alerting as Part of Deploy Workflow

The most effective integration of performance alerting is as a standard part of the deploy workflow rather than as a background monitoring system that engineers check occasionally. When a deploy is marked as live, an automatic performance verification window opens. The team expects to hear within 30 minutes whether the deploy caused a measurable p75 change on the affected pages. If the alert fires, the deploy is investigated before the next one ships. If it does not fire, the performance baseline is updated.
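The verification window described here behaves like a small state machine: open on deploy, then either hold the next deploy or promote the new baseline. The following sketch assumes a 30-minute window and a 300ms regression cutoff; the type, function names, action labels, deploy ID, and timestamp are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Verification:
    deploy_id: str
    opens_at: datetime
    closes_at: datetime


def open_verification(deploy_id, live_at, window_minutes=30):
    """Open a verification window at the moment the deploy is marked live."""
    return Verification(deploy_id, live_at,
                        live_at + timedelta(minutes=window_minutes))


def close_verification(baseline_p75_ms, post_p75_ms, max_delta_ms=300):
    """Decide what happens when the window closes.

    A regression keeps the old baseline and asks for investigation before
    the next deploy ships. A clean result promotes the post-deploy p75 to
    become the new baseline.
    """
    delta = post_p75_ms - baseline_p75_ms
    if delta > max_delta_ms:
        return {"action": "investigate", "baseline_p75_ms": baseline_p75_ms,
                "delta_ms": delta}
    return {"action": "update_baseline", "baseline_p75_ms": post_p75_ms,
            "delta_ms": delta}


window = open_verification("deploy-a", datetime(2024, 1, 4, 15, 0))
print(window.closes_at)                # 2024-01-04 15:30:00

print(close_verification(2600, 3500))
# {'action': 'investigate', 'baseline_p75_ms': 2600, 'delta_ms': 900}
print(close_verification(2600, 2650))
# {'action': 'update_baseline', 'baseline_p75_ms': 2650, 'delta_ms': 50}
```

Keeping the old baseline when a regression fires matters: promoting a regressed p75 would reintroduce the baseline contamination problem that deploy correlation is meant to avoid.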

This integration pattern changes the relationship between engineering teams and performance data. Performance alerting stops being a separate tool that only the SRE team monitors and becomes a shared signal that the shipping team sees alongside their deployment pipeline. The feedback loop tightens: engineers who see a p75 regression correlated to their specific deploy in real time make different future decisions about bundle size, third-party script additions, and rendering performance than engineers who see performance data in a weekly summary report.

The goal is making performance regressions as visible as error rate spikes — caught immediately, attributed precisely, fixed before the next deploy ships. That visibility does not come from stricter thresholds or more dashboard panels. It comes from making the alert context — which deploy, which segment, how much change — specific enough that every fired alert has an obvious next action attached to it.
