Product Analytics (IT) · Product Analyst

How would you isolate the causal effect of replacing pagination with infinite scroll on content consumption depth and monetization, given that the rollout is gradual across regions and users migrate between devices, creating contamination between the test and control groups?


Answer to the question

The shift from pagination to infinite scroll in the 2010s, popularized by Facebook and Twitter, dramatically changed content consumption patterns. Early product analysts relied on naive before-and-after comparisons that ignored seasonal trends and user self-selection. The problem has grown more complex with the rise of cross-platform ecosystems in which users seamlessly migrate between devices running different interface versions.

The task is to isolate the causal effect of introducing infinite scroll on scroll-depth metrics and ad monetization. Critical confounders include the gradual geographic rollout, which creates staggered treatment timing, and cross-device user migration, which contaminates the treatment and control groups. Simple cross-region comparisons are invalid because of structural differences in audience behavior, and analysis at the individual session level ignores carryover effects between devices and distorts retention estimates.

We apply staggered difference-in-differences with heterogeneity-robust estimators (Callaway-Sant'Anna or Sun-Abraham), which correctly handle the phased rollout. To combat cross-device contamination, we cluster standard errors at the user level and include user fixed effects, treating actual usage of the feature as the treatment and the regional rollout schedule as an instrumental variable (IV). When analyzing revenue, we run a mediation analysis to separate the direct effect of the layout change on ad visibility from the indirect effect through increased engagement. We validate the parallel-trends assumption on pre-rollout data, using CausalImpact to build a synthetic control.
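To make the staggered-DiD idea concrete, here is a minimal sketch on simulated data using a two-way fixed-effects (TWFE) regression, implemented by double-demeaning over users and periods. All numbers (cohort dates, effect size, noise levels) are invented for illustration; with a homogeneous effect TWFE recovers the truth, while the Callaway-Sant'Anna and Sun-Abraham estimators refine this when effects differ across adoption cohorts.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_periods = 500, 12
# staggered rollout: each user's region adopts the feature at period 4, 6, or 8
adopt = rng.choice([4, 6, 8], size=n_users)

user_fe = rng.normal(0, 1, n_users)            # stable user-level differences
time_fe = np.linspace(0, 0.5, n_periods)       # common seasonal trend
tau = 0.22                                     # true effect on scroll depth

# long balanced panel: one row per user-period
u = np.repeat(np.arange(n_users), n_periods)
t = np.tile(np.arange(n_periods), n_users)
D = (t >= adopt[u]).astype(float)              # 1 once the user's cohort is treated
y = user_fe[u] + time_fe[t] + tau * D + rng.normal(0, 0.5, u.size)

def twfe_demean(v):
    """Remove user and period means (balanced panel, so means are exact)."""
    vu = np.bincount(u, weights=v) / n_periods
    vt = np.bincount(t, weights=v) / n_users
    return v - vu[u] - vt[t] + v.mean()

y_dm, D_dm = twfe_demean(y), twfe_demean(D)
tau_hat = (D_dm @ y_dm) / (D_dm @ D_dm)        # within-estimator slope
print(f"TWFE estimate: {tau_hat:.3f} (true effect {tau})")
```

The naive before-and-after comparison in the same simulation would absorb the time trend into the "effect"; the double-demeaning strips out both user composition and seasonality.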

Real-world situation

In a media application with 5 million MAU, a shift from classic pagination to infinite scroll was planned to increase time spent in the app. The measurement challenge was the gradual rollout: Moscow and St. Petersburg first, with the remaining regions a month later. In addition, users actively switched between the mobile app (which had the new feature) and the tablet app (which kept the old version), causing strong contamination between groups.

The first option was a simple before-and-after comparison of metrics within a single region. Pros: fast to compute and minimal data requirements. Cons: impossible to separate the feature effect from seasonal news-cycle effects and natural user-base growth; the resulting figures were inflated by 40% because of holiday traffic.

The second option was a clean geographic A/B test of Moscow against the other regions. Pros: clear separation of groups at assignment time. Cons: structural differences in behavior (Muscovites read more business news), and user migration across regions and devices leaked up to 15% of treated users into the control group, invalidating the estimates.

The chosen solution was staggered DiD with user fixed effects and standard errors clustered at the regional level. We took the moment a user first opened the app with the new version as the start of treatment and used the regional rollout schedule as an instrument for actual usage. This let us model cross-device contamination as partial overlap between treatment and control and obtain unbiased estimates.
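The instrumenting step can be sketched with simulated data. Here "activity" stands in for unobserved baseline engagement that drives both early adoption and outcomes, and the rollout schedule `z` serves as the instrument; all coefficients are invented for illustration. Naive OLS of depth on usage is biased upward by self-selection, while the simple Wald/IV ratio using the schedule recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
activity = rng.normal(0, 1, n)                  # unobserved baseline engagement
z = rng.integers(0, 2, n).astype(float)         # rollout schedule (instrument)
# actual feature usage: driven by the rollout, but active users adopt more
d = ((1.5 * z + 0.3 * activity + rng.normal(0, 1, n)) > 0.75).astype(float)
beta = 0.20                                     # true effect on scroll depth
y = beta * d + 0.5 * activity + rng.normal(0, 1, n)

# naive OLS slope of y on d (biased: activity raises both adoption and y)
C = np.cov(d, y)
naive_ols = C[0, 1] / C[0, 0]

# Wald / IV estimate: cov(z, y) / cov(z, d)
iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]
print(f"naive OLS: {naive_ols:.3f}, IV: {iv:.3f}, true: {beta}")
```

The Wald ratio is the just-identified special case of 2SLS; with covariates and user fixed effects the same logic runs through a two-stage regression.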

The final result was a net increase in scroll depth of 22% (versus 35% in the naive estimate), while RPM fell by 8% because ad slots became less visible. The team therefore shipped a hybrid "load more" mode with a forced ad block every 10 cards, which delivered an 18% gain in viewing depth while keeping monetization at the baseline level.

What candidates often overlook

How to correctly handle spatial correlation of errors when implementing geographical rollouts?

Candidates often cluster standard errors only at the user level, overlooking that regional shocks (weather, local news) correlate errors within a geography. Use double clustering (user + region) or Conley spatial standard errors when precise coordinates are available. Without this, confidence intervals may be too narrow, producing false positives when testing the significance of effects.
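A minimal sketch of double clustering on simulated data, using the Cameron-Gelbach-Miller combination V = V_user + V_region-period − V_intersection (here the intersection clusters are single observations, i.e. the heteroskedasticity-robust piece). The shock magnitudes are invented; the point is that user-only clustering visibly understates the standard error when strong regional shocks are present.

```python
import numpy as np

rng = np.random.default_rng(2)

n_regions, n_periods, users_per_region = 20, 10, 50
n_users = n_regions * users_per_region
region_of_user = np.repeat(np.arange(n_regions), users_per_region)
adopt = rng.integers(3, 8, n_regions)              # staggered regional rollout

u = np.repeat(np.arange(n_users), n_periods)
t = np.tile(np.arange(n_periods), n_users)
r = region_of_user[u]

D = (t >= adopt[r]).astype(float)
shock = rng.normal(0, 1.0, (n_regions, n_periods))  # regional shocks (weather, news)
user_fe = rng.normal(0, 0.7, n_users)
y = 0.2 * D + shock[r, t] + user_fe[u] + rng.normal(0, 0.5, u.size)

X = np.column_stack([np.ones_like(y), D])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)

def cluster_cov(groups):
    """Cluster-robust sandwich: sum score vectors within each cluster."""
    _, idx = np.unique(groups, return_inverse=True)
    scores = X * resid[:, None]
    sums = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(sums, idx, scores)
    return bread @ (sums.T @ sums) @ bread

V_user = cluster_cov(u)                        # one-way: user clusters
V_rt   = cluster_cov(r * n_periods + t)        # one-way: region-by-period
V_obs  = cluster_cov(np.arange(y.size))        # intersection: single observations
V_two  = V_user + V_rt - V_obs                 # CGM double clustering

se_user, se_two = np.sqrt(V_user[1, 1]), np.sqrt(V_two[1, 1])
print(f"user-clustered SE: {se_user:.3f}, double-clustered SE: {se_two:.3f}")
```

Note that the CGM combination is not guaranteed positive semi-definite in small samples; production code should check the diagonal before taking square roots.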

How to deal with endogeneity in the rate of app updates, given that active users receive infinite scroll before passive ones?

This is a self-selection issue in staggered adoption. The usual intent-to-treat (ITT) estimate by region is conservative, while treatment-on-the-treated (TOT) requires an instrument. Use region/time assignment as an instrumental variable for actual feature usage, or apply inverse probability weighting (IPW) with propensity scores built on historical activity. Otherwise the estimate will be biased toward power users with high baseline engagement.
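The IPW route can be sketched as follows on simulated data: active users adopt earlier, a logistic propensity model on observed activity is fit with a few Newton steps, and a Hajek-weighted comparison corrects the selection bias. All coefficients and the single-covariate propensity model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
activity = rng.normal(0, 1, n)                      # observed historical activity
p_true = 1 / (1 + np.exp(-activity))
d = (rng.random(n) < p_true).astype(float)          # active users adopt earlier
tau = 0.15                                          # true treatment effect
y = tau * d + 0.5 * activity + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), activity])

def fit_logit(X, d, iters=20):
    """Logistic regression via Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        H = X.T @ (X * (p * (1 - p))[:, None])      # Hessian
        b += np.linalg.solve(H, X.T @ (d - p))      # Newton step
    return b

p_hat = 1 / (1 + np.exp(-X @ fit_logit(X, d)))      # estimated propensity scores

# Hajek (normalized) IPW estimate of the average treatment effect
w1, w0 = d / p_hat, (1 - d) / (1 - p_hat)
ate_ipw = (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()
naive = y[d == 1].mean() - y[d == 0].mean()         # biased toward power users
print(f"naive diff: {naive:.3f}, IPW: {ate_ipw:.3f}, true: {tau}")
```

In practice one would also trim extreme propensity scores and check covariate balance after weighting.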

How to separate the effect of UX improvement from the technical change in ad slot visibility when analyzing revenue?

Mediation analysis or two-stage least squares (2SLS) is required. In the first stage we estimate the effect of infinite scroll on scroll depth (pure UX); in the second, the effect of depth on ad impressions. The direct layout effect (fewer ads visible on screen) is estimated separately via do-calculus or a synthetic control with dummy ad slots. Without this separation, one might mistakenly reject a successful feature because of an apparent drop in monetization that is actually caused by the layout change.
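A linear mediation sketch on simulated data shows how the decomposition works: a randomized flag raises depth (indirect, positive path) while the layout directly hurts impressions (negative path), so the total effect looks near zero even though the UX change is a win. All coefficients are invented; in linear models the product-of-coefficients identity total = direct + indirect holds exactly in-sample.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z = rng.integers(0, 2, n).astype(float)                 # infinite-scroll flag
depth = 0.30 * z + rng.normal(0, 1, n)                  # mediator: scroll depth
imps = 0.50 * depth - 0.20 * z + rng.normal(0, 1, n)    # ad impressions

def ols(X, y):
    """Slopes from OLS with intercept (intercept dropped from output)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0][1:]

a_hat = ols(z, depth)[0]                       # stage 1: flag -> depth
b_hat, c_hat = ols(np.column_stack([depth, z]), imps)   # stage 2
indirect = a_hat * b_hat                       # via engagement (positive)
direct = c_hat                                 # layout effect on visibility (negative)
total = ols(z, imps)[0]                        # what a naive regression sees
print(f"total: {total:.3f} = direct {direct:.3f} + indirect {indirect:.3f}")
```

The direct path is only causally identified here because the errors of the two equations are independent by construction; with an unobserved depth-impressions confounder one needs the 2SLS or synthetic-control variants described above.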