Product Analyst (IT)

What method would you use to measure demand cannibalization between a new ML recommendation block on the main screen and existing navigation categories to determine the true incremental GMV increase?


Answer to the question

Background

Large products continuously add new entry points for content: personalized blocks, AI recommendations, or alternative navigation patterns. Without a cannibalization analysis, the team may mistakenly credit a new feature with success when, in reality, users have merely shifted between screens with no increase in overall revenue.

The Problem

It is necessary to separate the incremental effect (new transactions that would not have occurred without the new block) from cannibalization (transactions that moved from old categories to the new block). A standard A/B test at the user level does not solve this issue, as users see both channels simultaneously, and the choice between them creates endogeneity.

Solution

We use a geo-experiment with synthetic control, or clustered randomization by session. Geographic regions are randomly assigned to test and control groups, and the change in GMV is measured not only at the platform level but also disaggregated by navigation category. Difference-in-Differences with categories as panel units then lets us subtract the cannibalized revenue from the overall uplift.
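As a minimal sketch of the category-level Difference-in-Differences step (all numbers are toy values invented for illustration, not from the case study):

```python
import pandas as pd

# Toy (group, category, period) panel; all numbers are invented.
rows = [
    ("test",    "top3",      "pre",  100.0),
    ("test",    "top3",      "post", 130.0),
    ("test",    "long_tail", "pre",   80.0),
    ("test",    "long_tail", "post",  70.0),
    ("control", "top3",      "pre",  100.0),
    ("control", "top3",      "post", 105.0),
    ("control", "long_tail", "pre",   80.0),
    ("control", "long_tail", "post",  82.0),
]
df = pd.DataFrame(rows, columns=["group", "category", "period", "gmv"])

def did(frame: pd.DataFrame) -> float:
    """DiD estimate: (test post - pre) minus (control post - pre)."""
    g = frame.pivot_table(index="group", columns="period",
                          values="gmv", aggfunc="sum")
    return (g.loc["test", "post"] - g.loc["test", "pre"]) \
         - (g.loc["control", "post"] - g.loc["control", "pre"])

total_effect = did(df)  # net platform-level uplift
# Disaggregating by category separates the block's gains
# from the cannibalized categories' losses:
per_category = {c: did(part) for c, part in df.groupby("category")}
```

In this toy panel the per-category effects (+25 for the block's top items, -12 for the long tail) sum to the total effect of 13, which makes the cannibalized share explicit.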

Real-life Situation

Problem Description

In a mobile e-commerce application, a new "Recommended for You" block based on TensorFlow ranking models was launched. After a month, clicks into navigation categories had dropped by 25%, while overall GMV had grown by only 5%. The product team debated whether this was cannibalization or genuine optimization of the user journey: how much of the 5% was true growth, and how much was existing demand that had simply moved?

Considered Solutions

Solution one: a simple before/after comparison of overall GMV. This approach assumes that without the new block the metrics would have stayed flat. Pros: maximum speed, no experimental infrastructure required. Cons: it ignores seasonality, marketing campaigns, and organic growth trends, which biases the estimate by 15-20%.

Solution two: Classic A/B test at the user_id level with a 50/50 split via the Splitting service. Here, it is assumed that if the block is hidden from the control group, the difference in GMV will show the true effect. Pros: ease of implementation, familiar statistics. Cons: users in the test may still find products through search or categories, creating direct cannibalization within the test group, while the control group without the block generates less data for category comparison.
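The 50/50 user-level split this solution relies on is typically implemented with deterministic hash bucketing. A minimal sketch (the salt and function name are illustrative, not the actual Splitting service API):

```python
import hashlib

def bucket(user_id: str, salt: str = "reco_block_exp") -> str:
    """Deterministic 50/50 split: the same user always lands in the same arm."""
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"
```

Salting per experiment keeps assignments independent across concurrent experiments; changing the salt reshuffles all users.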

Solution three: a geo-experiment with the Synthetic Control Method (SCM). We selected 20 cities with similar GMV dynamics, randomizing 10 into the test group (block shown) and 10 into the control group (block hidden). For the control, we constructed a weighted combination of cities that closely tracked each test city's GMV in the pre-launch period. Pros: measures the effect at the aggregated market level and naturally accounts for cannibalization between categories within a city. Cons: requires a sufficiently large pool of comparable cities, is sensitive to regional promotions, and makes statistical inference non-trivial (standard errors are typically obtained via placebo tests).
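A sketch of the synthetic-control weight fitting described above: non-negative weights summing to one are chosen to minimize the pre-period gap between the treated city and a weighted combination of donor cities. All data here is simulated; in practice the inputs would be pre-launch GMV series per city.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T_PRE = 30  # days of pre-launch data

# Simulated pre-period GMV paths for 5 donor (control) cities.
donors = rng.normal(100.0, 5.0, size=(5, T_PRE)).cumsum(axis=1)
# The "treated" city is, by construction, a mix of the first three donors.
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
treated = true_w @ donors + rng.normal(0.0, 0.5, T_PRE)

def fit_scm_weights(treated: np.ndarray, donors: np.ndarray) -> np.ndarray:
    """Non-negative weights summing to 1 that minimize pre-period MSE."""
    k = donors.shape[0]
    res = minimize(
        lambda w: np.mean((treated - w @ donors) ** 2),
        x0=np.full(k, 1.0 / k),
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x

w = fit_scm_weights(treated, donors)
synthetic = w @ donors  # counterfactual GMV path for the treated city
```

Post-launch, the gap between the treated city's actual GMV and `synthetic` is the estimated effect; the simplex constraint keeps the counterfactual an interpolation of real cities rather than an extrapolation.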

Chosen Solution and Rationale

We settled on the third option — the geo-experiment with the Synthetic Control Method. The key factor was the inability to measure cannibalization within a single user through a standard A/B test since even in the control group without the block, we do not see the "counterfactual" fate of the transactions that would have flowed into the block in the test group. The geo-level allowed us to see how the buying structure changes across categories in general.

Result

We found that of the 5% overall GMV increase, 3.2 percentage points were cannibalization (demand flowing from the long tail of categories to the top-3 products in the block), and only 1.8 points were a truly incremental effect. Based on this, we adjusted the ranking algorithm to penalize already-popular products, which raised the net increase to 4.1%.
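The decomposition above is simple arithmetic once the category-level estimates are in hand (figures from the case):

```python
# Decomposing the observed uplift (figures from the case above).
total_uplift = 0.050   # overall GMV growth in the test geos
cannibalized = 0.032   # GMV that merely moved from categories into the block
incremental = total_uplift - cannibalized            # net new demand
cannibalization_share = cannibalized / total_uplift  # fraction of the headline lift
```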

What Candidates Often Miss

Question 1: Why can’t you just look at the correlation between clicks in the new block and the drop in clicks in the category at the level of the user session?

The answer lies in the endogeneity of self-selection. Users clicking in the new block have a different intent structure (high purchase intent vs. browsing) than those going to categories. A direct correlation will lead to Simpson's paradox: in aggregated data, it may seem that the block "stole" traffic, but at the cohort level with high intent, we will see that they would have bought anyway, just faster. It is necessary to use Causal Forest or Propensity Score Matching to compare users with identical behavior history "before" exposure to the block.
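A minimal propensity-score-matching sketch on simulated data illustrates this self-selection bias: users who click the block have higher intent, so a naive comparison overstates its effect even when the true effect is zero by construction. All names and parameters are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
n = 2000
# Pre-exposure features: column 0 is a purchase-intent score (toy data).
X = rng.normal(size=(n, 2))
# Self-selection: high-intent users click the new block more often.
clicked = rng.random(n) < 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 0.5)))
# Outcome depends on intent only; the block's true effect is zero.
gmv = 50.0 + 20.0 * X[:, 0] + rng.normal(0.0, 5.0, n)

# Naive comparison is badly biased upward by self-selection.
naive = gmv[clicked].mean() - gmv[~clicked].mean()

# 1:1 nearest-neighbour matching on the estimated propensity score.
ps = LogisticRegression().fit(X, clicked).predict_proba(X)[:, 1]
nn = NearestNeighbors(n_neighbors=1).fit(ps[~clicked].reshape(-1, 1))
_, idx = nn.kneighbors(ps[clicked].reshape(-1, 1))
matched = gmv[clicked].mean() - gmv[~clicked][idx.ravel()].mean()
```

Because the true effect is zero, `matched` should land far closer to zero than `naive`, which absorbs the full intent gap between clickers and non-clickers.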

Question 2: How do you calculate the minimum detectable effect (MDE) for a cannibalization experiment if the effect can be negative for some categories and positive for others?

Here candidates err by applying the standard formula for an average effect. With cannibalization, between-category variance increases because the effects have opposite signs: some categories lose while others win. One should use linear mixed models with a random category effect and compute power for a combined metric: total GMV minus the weighted GMV drop in cannibalized categories, with a risk-compensation factor.
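A sketch of how this variance inflation feeds into the MDE, using the standard two-sample formula (the sigma values below are purely illustrative):

```python
from scipy.stats import norm

def mde(sigma: float, n_per_group: int,
        alpha: float = 0.05, power: float = 0.8) -> float:
    """Minimum detectable effect for a two-sample test on means."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z * (2 * sigma ** 2 / n_per_group) ** 0.5

# The combined metric (total GMV minus the weighted drop in cannibalized
# categories) mixes winning and losing categories, inflating its standard
# deviation relative to plain GMV.
mde_plain = mde(sigma=100.0, n_per_group=5000)
mde_combined = mde(sigma=140.0, n_per_group=5000)
```

Since MDE scales linearly with sigma, a 40% variance-driven increase in the metric's standard deviation raises the smallest detectable effect by 40% at the same sample size.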

Question 3: What is the fundamental difference between experimental measurement of cannibalization in a product and addressing the interference problem in social networks?

In product analytics, cannibalization is a form of "demand transfer" within a single subject (user), which is rarely modeled as interference between units. In social networks (e.g., Facebook), interference is the spillover between users through the social graph. To deal with cannibalization, we use clustering by time or behavior type rather than graph-based randomization. It is important to understand that here the treatment assignment is exposure to the new UI rather than communication between users, so methods like Ego-cluster randomization are not applicable; instead, we use Switchback experiments at the user segment level.
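A switchback assignment at the (segment, time-window) level can be sketched as deterministic hashing of the cell; the function name and window size are illustrative assumptions, not a real service API:

```python
import hashlib
from datetime import datetime, timedelta, timezone

def switchback_arm(segment: str, ts: datetime, window_hours: int = 4) -> str:
    """Assign a (segment, time-window) cell deterministically to an arm."""
    window = int(ts.timestamp() // (window_hours * 3600))
    digest = hashlib.sha256(f"{segment}:{window}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
arms = [switchback_arm("new_users", t0 + timedelta(hours=4 * k)) for k in range(12)]
```

Each segment alternates pseudo-randomly between arms over time, so within-segment demand transfer shows up as the difference between that segment's treated and untreated windows.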