Product Analytics (IT) · Product Analyst

What method should be used to quantitatively assess the causal effect of the implementation of an opt-in consent mechanism for the use of third-party cookies on the accuracy of marketing channel attribution and observed conversion, considering that opting out of tracking correlates with the level of user privacy consciousness (self-selection), and that a control group with full tracking is legally unavailable?


Answer to the question

The historical context is shaped by the evolution of privacy regulations (GDPR, CCPA, the ePrivacy Directive), which require companies to obtain explicit user consent for data processing. Until 2018, analysts relied on deterministic attribution with full tracking of the user journey. The adoption of consent management platforms (CMPs), however, has produced systematic data loss that is missing not at random (MNAR), distorting funnels and LTV metrics.

The problem lies in endogenous self-selection: users who opt out of cookies differ systematically in behavior (higher price sensitivity, ad-blocker use, lower ad click-through), creating selection bias in the observed data. A naive comparison of cohorts with and without consent therefore overestimates channel effectiveness, because the "lost" users are not a random sample.

The solution relies on causal inference with instrumental variables (IV) or a regression discontinuity design (RDD) built around thresholds of consent propensity (propensity score). Two-stage least squares (2SLS) is applied, where the design of the CMP banner (e.g., the position of the "Accept" button) serves as an instrument: it shifts the probability of consent but affects conversion only through consent (the exclusion restriction). To assess the long-term effect, the Synthetic Control Method is used, constructing a weighted combination of high-consent regions or segments as "donors" to model the counterfactual scenario without strict consent. Additionally, probabilistic attribution based on first-party data and server-side tracking is implemented, recovering part of the "lost" journeys through probabilistic models (Markov chains or Shapley values for channels).
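A minimal 2SLS sketch of the idea on simulated data (all coefficients and sample sizes are hypothetical): a randomized banner variant `z` shifts consent but, by assumption, affects conversion only through consent, so the second stage recovers the causal effect that naive OLS overstates because of the unobserved privacy-consciousness confounder:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Unobserved confounder: privacy consciousness lowers both consent
# and conversion (hypothetical magnitudes).
privacy = rng.normal(size=n)
# Instrument: randomized CMP banner variant (1 = prominent "Accept").
z = rng.integers(0, 2, size=n)
# Consent depends on the banner variant and on privacy consciousness.
consent = (0.3 * z - 0.8 * privacy + rng.normal(size=n) > 0).astype(float)
# Conversion: the true causal effect of consent (tracking) is 0.05.
y = 0.10 + 0.05 * consent - 0.03 * privacy + rng.normal(scale=0.1, size=n)

def ols(X, target):
    return np.linalg.lstsq(X, target, rcond=None)[0]

naive = ols(np.column_stack([np.ones(n), consent]), y)[1]

# Stage 1: predict consent from the randomized instrument.
Z = np.column_stack([np.ones(n), z])
consent_hat = Z @ ols(Z, consent)
# Stage 2: regress conversion on the predicted (exogenous) consent.
iv = ols(np.column_stack([np.ones(n), consent_hat]), y)[1]

print(f"naive OLS estimate: {naive:.3f}")  # inflated by self-selection
print(f"2SLS (IV) estimate: {iv:.3f}")     # close to the true 0.05
```

The naive estimate absorbs the correlation between consent and privacy consciousness; the IV estimate uses only the randomized variation in consent induced by the banner.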

Real-life situation

An e-commerce platform team faced a crisis after implementing a GDPR-compliant consent banner in the EU region: the share of opt-outs from tracking reached 60%, and observed conversion to paying users dropped by 35%. The business anticipated a catastrophic decline in marketing effectiveness; the task was to separate a true drop in demand from the artifact of lost attribution data.

The first option considered was a simple comparison of metrics before and after implementation (pre-post analysis). Pros: immediate implementation and clear interpretation. Cons: complete disregard for seasonality (the launch coincided with the start of the summer decline), external competitive campaigns, and changes in iOS App Tracking Transparency algorithms, making the result invalid.

The second option was the comparison of EU traffic with traffic from non-EU countries (geo-experiment). Pros: availability of a control group with full tracking. Cons: fundamental incommensurability of regions due to differences in purchasing behavior, currency fluctuations, and different stages of market development, leading to an estimation bias of 15-20%.

The third option was the application of CausalImpact using a Bayesian structural time-series model. Pros: consideration of temporal dependencies and seasonality. Cons: sensitivity to the choice of covariates (predictors) and the assumption of the absence of synchronous shocks, which is risky in a period of global changes in privacy policies.
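The CausalImpact approach can be approximated with a structural time-series model from statsmodels: fit a local level plus a control-series regressor on the pre-period only, then forecast the post-period counterfactual. A sketch on simulated data (all series and effect sizes are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T_pre, T_post = 60, 20
t = np.arange(T_pre + T_post)

# Hypothetical control series (unaffected segment) and a treated series
# that tracks it; the intervention causes a true drop of 6 units.
control = 100 + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1, size=t.size)
treated = 0.8 * control + 10 + rng.normal(scale=1, size=t.size)
treated[T_pre:] -= 6

# Local level + regression on the control, fitted on the pre-period only.
model = sm.tsa.UnobservedComponents(
    treated[:T_pre], level="llevel", exog=control[:T_pre, None]
)
fit = model.fit(disp=False)

# Counterfactual forecast for the post-period from the control series.
pred = fit.get_forecast(steps=T_post, exog=control[T_pre:, None]).predicted_mean
effect = (treated[T_pre:] - pred).mean()
print(f"estimated causal effect: {effect:+.2f}")  # near the true -6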

The chosen solution was the Synthetic Control Method (SCM) using segments of users with a high historical consent rate (donors) to construct a weighted synthetic EU. Additionally, instrumental variables were applied at the cohort level: randomized A/B tests of banner design (button color, defaults) were used as instruments to estimate the Local Average Treatment Effect (LATE). This allowed isolating the pure effect of data presence rather than banner design.
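A sketch of the synthetic-control weight fit, assuming hypothetical weekly conversion-rate series for the treated (EU) unit and three high-consent donor segments: weights are constrained to the simplex (non-negative, summing to one) and chosen to minimize pre-rollout error, and the post-rollout gap between treated and synthetic series estimates the effect:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical weekly conversion rates: 40 pre-consent weeks, 12 post.
T_pre, T_post = 40, 12
t = np.arange(T_pre + T_post)
season = 0.02 * np.sin(2 * np.pi * t / 52)

donors = np.column_stack([
    0.10 + season + rng.normal(scale=0.003, size=t.size),
    0.14 + 0.5 * season + rng.normal(scale=0.003, size=t.size),
    0.08 + 1.5 * season + rng.normal(scale=0.003, size=t.size),
])
# Treated (EU) series matches a mix of donors pre-rollout; the true
# effect of strict consent is an 8% relative drop post-rollout.
treated = donors @ np.array([0.5, 0.3, 0.2])
treated[T_pre:] *= 0.92

def pre_mse(w):
    return np.mean((treated[:T_pre] - donors[:T_pre] @ w) ** 2)

k = donors.shape[1]
res = minimize(
    pre_mse, x0=np.full(k, 1 / k), method="SLSQP",
    bounds=[(0, 1)] * k,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
)
w = res.x
synthetic = donors @ w  # counterfactual EU without strict consent

effect = treated[T_pre:].mean() / synthetic[T_pre:].mean() - 1
print(f"donor weights: {np.round(w, 2)}")
print(f"estimated post-period effect: {effect:+.1%}")  # near -8%
```

The simplex constraint keeps the counterfactual an interpolation of real donors rather than an extrapolation, which is the core of SCM's credibility.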

The final result showed that the true decline in conversion was only 8% (instead of 35%), with the remainder being an artifact of lost attribution. The MTA (Multi-Touch Attribution) model was recalibrated using incrementality-based calibration through geo-based holdouts, restoring the accuracy of ROAS forecasting to within ±3% of pre-consent values.

What candidates often miss

How to correct bias in attribution when part of the users gives partial consent (only necessary cookies), creating incomplete user journeys?

Candidates often suggest simply excluding non-consenting users from the analysis, which reinforces selection bias. The correct approach is multiple imputation by chained equations (MICE), which assumes data are missing at random (MAR), combined with pattern-mixture models and sensitivity analyses for the MNAR case. The probability of conversion should be modeled as a function of observed behavioral signals (first-party events) even in the absence of third-party identifiers, using surrogate outcomes to recover the causal estimand.
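As an illustration, scikit-learn's IterativeImputer implements MICE-style chained equations. In this hypothetical example a third-party `ad_touches` feature is missing for non-consenting users and is imputed from first-party signals; note that the toy missingness here is random (MAR at worst), whereas genuinely MNAR data would additionally require the pattern-mixture sensitivity analyses described above:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical data: first-party signals observed for everyone,
# third-party ad-touch counts observed only with consent.
sessions = rng.poisson(5, size=n).astype(float)
pages = sessions * rng.uniform(1, 4, size=n)
ad_touches = 0.5 * sessions + rng.normal(scale=1.0, size=n)

consent = rng.random(n) < 0.4
X = np.column_stack([sessions, pages, ad_touches])
X_obs = X.copy()
X_obs[~consent, 2] = np.nan  # 60% of ad_touches missing

# Chained-equations imputation from the observed first-party columns.
imputer = IterativeImputer(random_state=0, max_iter=10)
X_imp = imputer.fit_transform(X_obs)

rmse = np.sqrt(np.mean((X_imp[~consent, 2] - X[~consent, 2]) ** 2))
print(f"imputation RMSE on missing ad_touches: {rmse:.2f}")
```

Because `ad_touches` is correlated with the fully observed `sessions`, the imputed values beat a naive mean fill; in production one would impute multiple draws and pool estimates, per Rubin's rules.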

Why can standard metrics such as click-through rate (CTR) show growth after the introduction of strict consent, and how should this be interpreted?

This is a classic case of survivorship bias: only highly motivated users who consent to tracking remain, and they already had a high CTR. Candidates miss the need to assess the intention-to-treat (ITT) effect on the whole population, rather than just the per-protocol group. It is necessary to apply complier average causal effect (CACE) analysis using the randomization of consent banner design as an instrument to estimate the effect on “compliers.”
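A small simulation (hypothetical population shares and effect sizes) showing why per-protocol comparisons mislead while the Wald ratio of the ITT effect to the consent-uptake difference recovers the complier average causal effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Banner variant z is randomized. Compliers consent only under the
# low-friction variant; always-takers consent regardless; never-takers
# never consent (hypothetical shares).
kind = rng.choice(["always", "complier", "never"], p=[0.3, 0.3, 0.4], size=n)
z = rng.integers(0, 2, size=n)
d = (kind == "always") | ((kind == "complier") & (z == 1))  # consent

# Always-takers are highly engaged and convert more at baseline; the
# true causal effect of consent for compliers is +0.02.
base = np.where(kind == "always", 0.15, 0.08)
y = rng.random(n) < base + 0.02 * d * (kind == "complier")

per_protocol = y[d].mean() - y[~d].mean()           # biased by selection
itt = y[z == 1].mean() - y[z == 0].mean()           # effect of the banner
cace = itt / (d[z == 1].mean() - d[z == 0].mean())  # effect for compliers

print(f"per-protocol: {per_protocol:+.4f}")  # large, spurious
print(f"ITT:          {itt:+.4f}")
print(f"CACE (Wald):  {cace:+.4f}")          # near the true +0.02
```

The per-protocol gap is dominated by the always-takers' higher baseline, exactly the survivorship pattern described above; the Wald estimator strips it out by conditioning only on randomized variation.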

How to distinguish the effect of data loss from the true decline in demand when implementing a consent mechanism where it is legally impossible to create a control group without the banner?

Here it is critical to apply difference-in-differences (DiD) with a staggered adoption design, or synthetic control using "early" and "late" adopters across jurisdictions. Candidates often overlook the parallel-trends assumption, which should be validated through event-study specifications with leads and lags. It is also important to use proxy variables (e.g., aggregate credit card spending or panel data from providers) as an alternative source of truth to validate internal metrics, adjusting for differential-privacy noise.
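A minimal DiD-plus-event-study sketch on simulated data (all magnitudes hypothetical): an "early" jurisdiction adopts strict consent at week 20 while a "late" one has not yet adopted, the DiD contrast recovers the treatment effect, and lead coefficients near zero support the parallel-trends assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical weekly conversion rates over the 40-week comparison
# window; both jurisdictions share a common (parallel) trend.
weeks = np.arange(40)
common = 0.12 + 0.001 * weeks
early = common + rng.normal(scale=0.002, size=40)
late = common + 0.01 + rng.normal(scale=0.002, size=40)
early[20:] -= 0.015  # true effect of the consent rollout at week 20

post = weeks >= 20
# DiD: change in the early adopter minus change in the late adopter.
did = ((early[post].mean() - early[~post].mean())
       - (late[post].mean() - late[~post].mean()))
print(f"DiD estimate: {did:+.4f}")  # near the true -0.015

# Event-study check: pre-period gaps, normalized to the last pre-week,
# should hover near zero if trends were parallel before adoption.
gap = early - late
leads = gap[:20] - gap[19]
print(f"max |lead|: {np.abs(leads).max():.4f}")  # small => no pre-trend break
```

If the leads drifted away from zero, the parallel-trends assumption would be suspect and the DiD contrast would conflate pre-existing divergence with the consent effect.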