To measure the incrementality of offline channels, Geo-Lift testing is applied using the Synthetic Control Method. The key idea is to split geographic regions into treatment (where the advertisement airs) and control (where the campaign does not launch), and then build a weighted combination of control regions that mimics the behavior of the treatment regions before the campaign launch, with a pre-period correlation of at least 0.95.
Time series are analyzed with Google's CausalImpact library, which estimates the causal effect while accounting for covariates (weather data, economic indicators, competitor activity). Data is aggregated in BigQuery, and preprocessing is done in Python with pandas and scikit-learn, where the optimal synthetic-control weights are selected via support vector machines (SVM) or Lasso regression.
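The weight-selection step can be sketched with Lasso, one of the two regressors named above. Everything below is synthetic and illustrative (the region count, scale, and "true" weights are made up); in practice the columns would be pre-period install series for each candidate control region pulled from BigQuery.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical pre-period daily installs: 120 days x 40 control regions
n_days, n_controls = 120, 40
controls = rng.normal(1000, 100, size=(n_days, n_controls))

# Simulate a treatment region as a noisy mix of three controls,
# so we know the "true" weights the model should approximate
true_w = np.zeros(n_controls)
true_w[[3, 7, 21]] = [0.5, 0.3, 0.2]
treatment = controls @ true_w + rng.normal(0, 5, n_days)

# Lasso with positive weights picks a sparse combination of control
# regions; the L1 penalty drives irrelevant regions' weights to zero
model = Lasso(alpha=1.0, positive=True, fit_intercept=False)
model.fit(controls, treatment)
synthetic = model.predict(controls)

# Pre-period fit quality: the correlation the text's 0.95 threshold refers to
corr = np.corrcoef(treatment, synthetic)[0, 1]
```

The pre-period correlation between the treatment series and its synthetic twin is the acceptance criterion: if it falls below ~0.95, the control pool or the regularization strength needs revisiting before the campaign launches.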
The company plans a large-scale television campaign with a budget of 50 million rubles in ten major cities but faces a critical issue in measuring effectiveness: standard trackers like AppsFlyer or Adjust only capture digital touchpoints, making it impossible to track the transition from the TV screen to app installation. Additional complexity arises from simultaneous aggressive promotional activity from a competitor and abnormal weather conditions in the target regions, which could distort direct comparisons with previous periods.
The first solution considered was a correlation analysis of time series using the ARIMA model, where forecasts based on historical data are compared with actual install metrics. The pros of this approach include low implementation costs in Python with the statsmodels library and no need to split the advertising budget between regions. The cons are the inability to separate the TV effect from external shocks (competitors' actions, weather), leading to the risk of false attribution of growth to television advertising despite the lack of causal connection.
The second option was addressable TV with classical A/B testing at the household level, where the advertisement would be shown to only part of the audience with the possibility of direct attribution through panel data. The pros consist of strict causality and the ability to measure long-term cohort LTV. The cons include technical complexity of integration with data providers (GfK, TNS), high costs, and long preparation periods (3-4 months), as well as inapplicability to traditional broadcast TV, which reaches the entire population of the region without the ability to target individual users.
The third approach was Geo-Lift Testing with synthetic control, where the campaign is launched in treatment regions, and a weighted combination of similar control regions is used to mimic their behavior. The pros of this method include the ability to establish causality through natural experiments and resilience to common external shocks affecting both groups. The cons include the need for careful selection of control regions with similar seasonality, sensitivity to user migration between cities, and the requirement for a minimum of 12 months of historical data for building quality synthetic control.
The third solution was chosen since the company had detailed data on 40 regions over 18 months stored in BigQuery, allowing the construction of synthetic control with a correlation coefficient above 0.95 for the pre-campaign period. Analysis was performed in the Jupyter environment using the pycausalimpact library, and data preprocessing was executed in SQL and pandas with normalization by audience size.
As a result, a statistically significant incremental increase in organic installs of 23% was found within 14 days after the campaign started, with a 95% confidence interval [15%; 31%], leading to an ROI of 145% and allowing the marketing team to justify an increase in the TV channel budget for the next quarter.
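An interval like the [15%; 31%] reported above can be obtained in several ways; one simple, assumption-light option is a bootstrap over region-level lift estimates. The per-region numbers below are simulated for illustration, not the campaign's actual data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical per-region incremental lifts (%), one per treatment region
lifts = rng.normal(23, 12, 10)

# Resample regions with replacement and recompute the mean lift each time
boots = np.array([
    rng.choice(lifts, size=lifts.size, replace=True).mean()
    for _ in range(10_000)
])

# Percentile bootstrap 95% confidence interval for the mean lift
lo, hi = np.percentile(boots, [2.5, 97.5])
```

Resampling at the region level (rather than the day level) respects the fact that the region, not the day, is the unit of randomization, which connects to the clustering issue discussed below.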
How to handle adstock effects (lag and carryover effect) when analyzing offline campaigns, when the influence of advertising doesn't manifest immediately but is distributed over time?
Candidates often use a simple "day of airing vs. day of installation" comparison, ignoring that TV advertising decays with a half-life. The adstock transformation should be applied: $A_t = X_t + \lambda \cdot A_{t-1}$, where $\lambda$ is the decay coefficient (typically 0.3-0.8 for TV), estimated via maximum likelihood or a grid search in scikit-learn. It is also important to account for the carryover effect of previous campaigns; otherwise, the current lift will be overestimated. Cross-validation on previous campaigns with different lags is used to validate $\lambda$.
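The transformation and the grid search for $\lambda$ fit in a few lines. This is a self-contained toy: the spend series is random, the true decay (0.5) and the linear response are simulated, and the grid search recovers the decay by minimizing squared error with an OLS slope fitted per candidate $\lambda$.

```python
import numpy as np

def adstock(x, lam):
    """Geometric adstock: A_t = X_t + lam * A_{t-1}, with A_0 = X_0."""
    out = np.empty(len(x))
    carry = 0.0
    for t, xt in enumerate(x):
        carry = xt + lam * carry
        out[t] = carry
    return out

rng = np.random.default_rng(2)
spend = rng.uniform(0, 100, 90)                      # synthetic daily TV pressure
installs = 3.0 * adstock(spend, 0.5) + rng.normal(0, 5, 90)  # true lambda = 0.5

def sse(lam):
    """Squared error after fitting the best through-origin slope for this lambda."""
    a = adstock(spend, lam)
    beta = (a @ installs) / (a @ a)                  # OLS slope
    return np.sum((installs - beta * a) ** 2)

# Grid search over the typical TV range named in the text
best_lam = min(np.arange(0.1, 0.91, 0.05), key=sse)
```

In production the same search would sit inside cross-validation over past campaigns, as described above, rather than on a single series.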
Why can't simple average comparisons (t-test) be used between treatment and control regions in Geo-Lift testing, even if the regions are randomly selected?
The problem lies in the heterogeneity of variances between regions (different baseline conversion rates, different population sizes) and the presence of cluster correlation (intra-regional dependence of observations). The standard t-test assumes independence of observations and equal variances, leading to inflated statistical significance (false positives). The correct approach is to use Clustered Standard Errors at the regional level or hierarchical Bayesian models in PyMC3 / Stan, which account for the data structure. It is also necessary to check the balance of covariates (propensity score matching) before the test to ensure that the synthetic control is adequate.
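The inflation of significance can be demonstrated directly with a small numpy simulation (all parameters here are illustrative). Each region gets a shared random effect, so observations within a region are correlated; the naive standard error, which treats all observations as independent, comes out far smaller than the honest region-level one.

```python
import numpy as np

rng = np.random.default_rng(3)

# 20 regions x 100 users, with a shared regional random effect
# (intra-cluster correlation) and NO true treatment effect
n_regions, n_users = 20, 100
region_effect = rng.normal(0, 1.0, n_regions)
y = region_effect[:, None] + rng.normal(0, 1.0, (n_regions, n_users))
treated = np.arange(n_regions) < 10          # first 10 regions "treated"

# Naive SE: pretends all 2000 user observations are independent
naive_se = np.sqrt(
    y[treated].var(ddof=1) / (10 * n_users)
    + y[~treated].var(ddof=1) / (10 * n_users)
)

# Cluster-level SE: aggregate to one mean per region first, so the
# effective sample size is the number of regions, not users
m = y.mean(axis=1)
cluster_se = np.sqrt(m[treated].var(ddof=1) / 10 + m[~treated].var(ddof=1) / 10)
```

With these settings the cluster-level standard error is several times larger than the naive one; a t-test using the naive SE would therefore declare "significant" lifts that are pure regional noise.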
What is the fundamental distinction between Marketing Mix Modeling (MMM) and Geo-Lift Testing, and when is each method preferred?
MMM (for example, via the Robyn library from Meta or LightweightMMM from Google) is a correlational model that evaluates the contribution of all channels simultaneously through regression with regularization, but it is sensitive to endogeneity and cannot establish strict causality without instrumental variables. Geo-Lift is a quasi-experiment that establishes causality through exogenous variation (presence/absence of advertising in the region). MMM is preferred for budget optimization across multiple channels and planning, whereas Geo-Lift is necessary for validating specific hypotheses and calibrating MMM. The optimal practice is to use Geo-Lift to calibrate priors in Bayesian MMM, which is implemented through pymc-marketing.