Product Analytics (IT): Product Analyst

What method should be used to evaluate the causal effect of reducing the intensity of email newsletters from daily to three times a week on 30-day user retention and monetization, when the change is applied to the entire user base simultaneously without the possibility of A/B testing, and the audience is heterogeneous in engagement levels (churned, active, power users) and subject to seasonal activity fluctuations?


Answer to the question.

Historically, email marketing developed within a paradigm of maximizing touchpoints, where higher communication frequency correlated with higher revenue up to a saturation point. With the development of engagement-fatigue theory and stricter spam filtering (SpamAssassin, Gmail's Promotions tab), the need for frequency optimization arose, but classic before/after comparisons proved unreliable due to nonlinear satiation effects and external shocks.

The evaluation problem lies in the impossibility of creating a control group during a global rollout, in self-selection bias (different segments respond differently to reduced touchpoints), and in confounders (seasonality, macroeconomic trends, parallel marketing activities). Standard correlational analytics mixes causal effects with general trends of product growth or decline.

The optimal solution combines quasi-experimental methods. We apply Difference-in-Differences (DiD) with Propensity Score Matching (PSM) on historical engagement metrics (open rate, click rate, recency). For each segment we build a synthetic control via the Synthetic Control Method, using correlated time series (organic traffic, direct app visits) as covariates. For inference we use CausalImpact, based on Bayesian Structural Time Series, which models the counterfactual with credible intervals. Additionally, we apply Causal Forests to estimate heterogeneous treatment effects across RFM segments. Validation is performed through placebo tests on the pre-intervention period to check the parallel-trends assumption, and through sensitivity analysis to assess robustness to unobserved confounding.
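A minimal sketch of the 2x2 DiD estimator at the heart of this design, on simulated data (group means, variances, and sample sizes below are illustrative assumptions, not figures from the case):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical weekly revenue-per-user observations. "Treated" users saw the
# frequency change; "control" users did not. All parameters are invented.
treated_pre  = rng.normal(10.0, 1.0, 200)   # treated group, pre-intervention
treated_post = rng.normal(10.5, 1.0, 200)   # treated group, post-intervention
control_pre  = rng.normal(8.0, 1.0, 200)    # control group, pre-intervention
control_post = rng.normal(9.3, 1.0, 200)    # control group, post-intervention

def did_estimate(t_pre, t_post, c_pre, c_post):
    """Classic 2x2 difference-in-differences point estimate:
    (treated change) minus (control change), which nets out the common trend."""
    return (t_post.mean() - t_pre.mean()) - (c_post.mean() - c_pre.mean())

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {effect:.2f}")
```

The control group's +1.3 drift plays the role of the seasonal trend; DiD subtracts it out, recovering the assumed negative treatment effect instead of the naive positive before/after difference.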

Real-life situation.

An EdTech platform with 2 million users faced a 40% increase in unsubscribe rate over a quarter and decided to reduce the frequency of educational digests from daily to three times a week. The problem was the need to prove to the CEO that reducing frequency would not destroy revenue from power users, while the change was initiated on December 15 — a week before the traditional pre-holiday peak in course purchases, creating a strong temporal confounder.

The first approach considered was a simple comparison of average checks for the week before and after, via t-test. The advantages lay in speed of implementation and clarity for business stakeholders. The downsides were critical: ignoring the seasonal growth of purchases in December produced a false-positive 15% increase in LTV, while the true effect of reduced communications could have been zero or negative.

The second option involved cohort analysis with a 30-day lag, comparing the November and December cohorts. Pros: it accounts for the user life cycle and yields seasonality-adjusted metrics. Cons: different cohorts had different baseline conversions, and the December cohort was distorted by holiday promotional campaigns, creating insurmountable selection bias and making it impossible to isolate the pure effect of newsletter frequency.

The third option: constructing a Synthetic Control on geographical data, using CIS regions with low email-channel penetration (where users rely on push and SMS) as a control group for regions highly dependent on email digests. Advantages: the ability to model the counterfactual "what would have happened without the change" at the level of aggregated time series. Disadvantages: the parallel-trends assumption was violated by regional differences in holiday learning traditions, and city-level data was heavily contaminated by noise from user migration between regions during the New Year holidays.
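The donor-weighting step of the Synthetic Control Method can be sketched as a non-negative least squares fit over the pre-intervention period (the donor regions, series lengths, and true weights below are invented for illustration):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)

# Illustrative pre-period weekly metric: 30 weeks for 5 donor regions
# (random walks around a common level), plus a treated region that is,
# by construction, a convex combination of the first three donors.
T_pre = 30
donors = rng.normal(0, 1, (T_pre, 5)).cumsum(axis=0) + 50
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])       # assumed, for the simulation
treated = donors @ true_w + rng.normal(0, 0.1, T_pre)

# Non-negative least squares finds donor weights that reproduce the
# treated series; normalizing makes them a convex combination.
w, _ = nnls(donors, treated)
w = w / w.sum()

synthetic = donors @ w                              # the counterfactual series
rmse = float(np.sqrt(((treated - synthetic) ** 2).mean()))
print("weights:", np.round(w, 2), " pre-period RMSE:", round(rmse, 3))
```

After the intervention date, `synthetic` (extended with post-period donor data) serves as the "what would have happened" series; the gap between actual and synthetic is the effect estimate.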

The fourth option (chosen): Difference-in-Differences with exact matching on historical activity (opens, clicks, purchases over the 90 days before the change). Power users (who opened >70% of emails) served as the treatment group and dormants (who opened <5%) as the control, since the latter effectively experienced no change in frequency. Advantages: strict control over observed characteristics through PSM and the ability to validate parallel trends on data from previous quarters. Disadvantages: the assumption of no differential trends between active and inactive users required additional verification. For robustness, we applied CausalImpact, using mobile-app metrics (sessions, in-app purchases) as control time series that do not correlate directly with email frequency but reflect the overall product trend.
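The matching step can be sketched as 1-nearest-neighbor matching on a single historical-activity covariate (the data-generating process and the assumed -0.08 treatment effect are invented for illustration; the real pipeline matched on several metrics via PSM):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-user data: 90-day open rate drives both selection into
# "treatment" and the retention outcome, confounding the naive comparison.
n = 500
open_rate = rng.uniform(0, 1, n)
p_treat = 1 / (1 + np.exp(4 * (0.5 - open_rate)))        # engaged users more exposed
treated = rng.uniform(0, 1, n) < p_treat
retention = 0.3 + 0.5 * open_rate - 0.08 * treated + rng.normal(0, 0.05, n)

naive = retention[treated].mean() - retention[~treated].mean()

# 1-nearest-neighbor matching (with replacement) on the covariate
treat_idx = np.where(treated)[0]
ctrl_idx = np.where(~treated)[0]
dist = np.abs(open_rate[ctrl_idx][None, :] - open_rate[treat_idx][:, None])
matched = ctrl_idx[dist.argmin(axis=1)]

att = (retention[treat_idx] - retention[matched]).mean()  # effect on the treated
print(f"naive diff: {naive:.3f}  matched ATT: {att:.3f}")
```

The naive comparison is pushed upward by the engagement confounder; matching on the covariate recovers an estimate close to the assumed negative effect.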

The final result: for power users, the frequency reduction led to a statistically significant 8% decrease in 30-day retention (p < 0.05, 95% CI [5%, 11%]) but a 3% increase in lifetime value, driven by fewer users unsubscribing or marking emails as spam. For moderately active users, the effect was statistically indistinguishable from zero. The recommendation to the business: via segmentation, return to daily frequency only for the top 10% of users by engagement score, while keeping three emails a week for the rest of the base.

What candidates often overlook.

How to distinguish the effect of newsletter frequency from the effect of content quality when, alongside reducing frequency, the team improved copywriting and the design of emails?

The answer requires mediation analysis and instrumental variables (IV). Build a two-stage model: first estimate the effect of the frequency change on the probability of opening an email (controlling for content quality via metrics such as readability score or engagement rate in the control period), then estimate the effect of opening on conversion. The mediation package in R, or the Mediation class in Python's statsmodels, decomposes the total effect into a direct effect (frequency) and an indirect effect (quality). A critical nuance for a novice specialist: if content quality is itself affected by frequency (a post-treatment variable, since the copywriting team's freed-up resources improve quality), naively conditioning on it biases the estimate; instead, apply front-door adjustment or use lagged quality metrics (quality at lag 1) as an instrument to isolate the pure effect of frequency.
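The product-of-coefficients logic behind this two-stage decomposition can be sketched with simulated linear data (all coefficients are illustrative; note that the identity total = direct + indirect holds exactly only in the linear OLS case):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative simulation: frequency change -> opens (mediator) -> conversion.
# Assumed structural coefficients: a = -0.15, direct = 0.05, b = 0.4.
n = 2000
freq_reduced = rng.integers(0, 2, n).astype(float)           # treatment indicator
opens = 0.5 - 0.15 * freq_reduced + rng.normal(0, 0.1, n)    # mediator
conv  = 0.1 + 0.05 * freq_reduced + 0.4 * opens + rng.normal(0, 0.1, n)

def ols_slope(x, y):
    """OLS slope of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

total = ols_slope(freq_reduced, conv)          # total effect c
a = ols_slope(freq_reduced, opens)             # treatment -> mediator path
# Direct effect: regress conversion on treatment AND mediator jointly
X = np.column_stack([np.ones(n), freq_reduced, opens])
_, direct, b = np.linalg.lstsq(X, conv, rcond=None)[0]
indirect = a * b                               # product of coefficients

print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

Here the small positive direct effect of reduced frequency is masked by the negative indirect path through fewer opens, which is exactly the decomposition the interview answer calls for.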

How to correctly interpret results when violating SUTVA (Stable Unit Treatment Value Assumption), when users share promo codes from emails on social networks, creating spillover effects between treatment and control groups?

Candidates often ignore network interference, assuming independence of observations. The solution is to move from individual-level analysis to the cluster level (cluster-robust standard errors) or to use methods of causal inference under interference. Define clusters via social graphs (if connection data are available) or geographical proximity, then apply exposure mapping to the observational data. To assess spillover, use neighborhood-based treatment definitions or exposure models based on the share of treated neighbors. It is essential to understand that with positive spillovers (viral promo codes), standard estimates understate the effect, since the control group partially receives the "treatment" through the network. Estimates must be adjusted through inverse probability weighting that accounts for each unit's degree of neighbor exposure.
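Why cluster-level inference matters can be shown with a small simulation: a shared within-cluster shock (mimicking promo-code sharing inside a community) makes naive standard errors badly overconfident. Cluster counts and variances below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# 40 clusters (e.g., social communities) of 25 users each; each cluster has
# a common shock, inducing intra-cluster correlation in the outcome.
n_clusters, per = 40, 25
cluster_shock = rng.normal(0, 1.0, n_clusters)
y = (cluster_shock[:, None] + rng.normal(0, 1.0, (n_clusters, per))).ravel()
clusters = np.repeat(np.arange(n_clusters), per)

# Naive SE of the overall mean treats all 1000 users as independent
naive_se = y.std(ddof=1) / np.sqrt(y.size)

# Cluster-robust SE: with balanced clusters, treat cluster means as the
# i.i.d. units, so the effective sample size is the number of clusters
cluster_means = np.array([y[clusters == c].mean() for c in range(n_clusters)])
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)

print(f"naive SE={naive_se:.3f}  cluster-robust SE={cluster_se:.3f}")
```

The cluster-robust SE comes out several times larger than the naive one; confidence intervals and p-values computed without clustering would be spuriously tight.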

How to conduct sensitivity analysis to assess the robustness of results to unobserved confounders (unobserved confounding), such as a simultaneous advertising campaign on Facebook targeting the same audience?

The standard approach in product analytics is the E-value (VanderWeele & Ding), which quantifies the minimum strength of association an unobserved confounder would need with both treatment and outcome to explain away the observed association. Bounding analysis (Rosenbaum bounds) is also applied for rank-based tests. For a novice specialist it is critical to understand the technique of negative controls: outcomes that should not be affected by the treatment (for example, the number of sessions in the mobile app when only the email channel is changed) but that correlate with the presumed confounder. If "reducing email newsletters" appears to affect time in the app (which it should not), this signals a common confounder (for example, a shared marketing budget or seasonality).
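The E-value itself is a one-line formula. A minimal sketch using the standard VanderWeele & Ding expression for a risk ratio, applied here to an illustrative RR of 0.92 (not a figure from the case):

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (VanderWeele & Ding, 2017):
    E = RR* + sqrt(RR* * (RR* - 1)), where RR* is RR or 1/RR, whichever > 1.
    It is the minimum confounder-treatment and confounder-outcome association
    strength needed to fully explain away the observed estimate."""
    rr_star = max(rr, 1 / rr)
    return rr_star + math.sqrt(rr_star * (rr_star - 1))

# Illustrative: a protective RR of 0.92 yields an E-value around 1.39,
# i.e., a fairly weak confounder could already explain the result away.
print(round(e_value(0.92), 2))
```

A small E-value like this is itself informative: it tells stakeholders the estimate is fragile to even modest unmeasured confounding, such as an overlapping Facebook campaign.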