Answer to the Question

Historical Context

The evolution of e-commerce has led to omnichannel logistics, in which Click&Collect and pick-up points have become tools for cutting last-mile delivery costs. Unlike digital features, however, these changes are geographically discrete and subject to self-selection: customers who value their time highly ignore pick-up points, while cost-conscious users migrate away from courier delivery. Traditional user-level A/B testing is not feasible here, because treatment cannot be randomized at the location level and network effects operate within neighborhoods.

Problem Statement

The analysis faces three key challenges. Firstly, endogeneity of location: points are opened in areas with high order densities, creating reverse causality (high demand → opening of pick-up points). Secondly, cannibalization: some users simply change their method of receiving orders from delivery to self-pick-up without increasing overall spend. Thirdly, SUTVA violation (Stable Unit Treatment Value Assumption): one user sees the opening of a point near their home and encourages neighbors through social media, creating cross-contamination between “treated” and “control” neighborhoods.

Detailed Solution

A multi-level quasi-experimental evaluation strategy is recommended. At the macro level (cities), we apply the Synthetic Control Method: we build a weighted combination of “donor” cities without pick-up points that tracks the test city's metrics as closely as possible before the intervention. The weights are non-negative, sum to one, and are fitted by convex optimization on 12-18 months of pre-intervention data, including seasonality, macroeconomic indicators, and category structure.
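
A minimal sketch of the weight-fitting step, assuming the classic synthetic-control constraints (non-negative weights that sum to one) and using projected gradient descent in plain Python in place of a dedicated convex solver. The normalized city series, the function names, and the step count are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # threshold from the largest k that keeps u_k positive
    return [max(x - theta, 0.0) for x in v]


def synthetic_series(weights, donors):
    """Weighted combination of donor series, period by period."""
    n_periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(n_periods)]


def fit_weights(treated_pre, donors_pre, steps=2000):
    """Minimise the squared pre-period gap between treated and synthetic city."""
    n = len(donors_pre)
    w = [1.0 / n] * n
    # Conservative step size: 1 / (2 * squared Frobenius norm of the donor matrix).
    lr = 1.0 / (2.0 * sum(x * x for d in donors_pre for x in d))
    for _ in range(steps):
        resid = [s - y for s, y in zip(synthetic_series(w, donors_pre), treated_pre)]
        grad = [2.0 * sum(r * x for r, x in zip(resid, d)) for d in donors_pre]
        w = project_to_simplex([wi - lr * g for wi, g in zip(w, grad)])
    return w


# Hypothetical normalized pre-period metric for three donor cities.
donors_pre = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]
# Test city built as 0.25 * donor 0 + 0.75 * donor 1, so the fit should recover that mix.
treated_pre = [0.25, 0.75, 0.25, 0.75]
weights = fit_weights(treated_pre, donors_pre)

# Post-period: the gap between the observed and synthetic series is the effect estimate.
donors_post = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
treated_post = [0.45, 0.95]
gap = [y - s for y, s in zip(treated_post, synthetic_series(weights, donors_post))]
```

In practice the pre-period fit should be inspected before trusting the post-period gap: if no donor mix tracks the test city well, the counterfactual is not credible.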

At the micro level (users), we use Difference-in-Differences combined with Propensity Score Matching to control for observable characteristics, but the key step is introducing an Instrumental Variable (IV). The instrument is the shortest road-network distance from the user’s home to the nearest pick-up point. It correlates with the choice of self-pick-up (the first stage of two-stage least squares, 2SLS) and is assumed to be uncorrelated with latent purchase propensity and to affect purchases only through pick-up usage (the exclusion restriction, which cannot be tested directly). Under these assumptions the estimate identifies the Local Average Treatment Effect (LATE) for compliers, that is, users whose choice of self-pick-up is driven by proximity.
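
The IV logic can be illustrated with a toy example. This sketch assumes a single binary instrument (“lives near a point”) instead of continuous road distance, in which case 2SLS reduces to the ratio cov(z, y) / cov(z, d). The data are invented: each arm contains one never-taker, two compliers with a true effect of +2 orders, and one always-taker with a high baseline.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(a, b):
    a_bar, b_bar = mean(a), mean(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / len(a)


def iv_estimate(instrument, treatment, outcome):
    """Just-identified IV (equals 2SLS with one instrument): cov(z, y) / cov(z, d)."""
    return covariance(instrument, outcome) / covariance(instrument, treatment)


def naive_difference(treatment, outcome):
    """Raw gap between pick-up users and non-users, ignoring self-selection."""
    users = [y for d, y in zip(treatment, outcome) if d == 1]
    non_users = [y for d, y in zip(treatment, outcome) if d == 0]
    return mean(users) - mean(non_users)


# Hypothetical users: z = lives near a point, d = uses pick-up, y = orders per quarter.
near_point  = [0, 0, 0, 0, 1, 1, 1, 1]
uses_pickup = [0, 0, 0, 1, 0, 1, 1, 1]
orders      = [4, 4, 4, 9, 4, 6, 6, 9]

late = iv_estimate(near_point, uses_pickup, orders)   # 2.0, the complier effect
naive = naive_difference(uses_pickup, orders)         # 3.5, inflated by always-takers
```

The naive comparison overstates the effect because heavy buyers self-select into pick-up; the IV ratio recovers the effect for compliers only, which is exactly why LATE needs careful explanation to stakeholders.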

To account for hybrid orders (intermediate warehouse), we build CausalForest models that split the effect across subpopulations: immediate adopters, delayed users, and never-takers. Finally, we cluster standard errors at the district level and check sensitivity to spillover effects by analyzing concentration within a 500-meter radius.
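
To show why clustering matters, here is a rough sketch of a cluster-robust standard error for a single-regressor model, with no small-sample correction. The district labels and outcomes are made up, and the function name is hypothetical; a real analysis would normally rely on a statistics package.

```python
from collections import defaultdict
from math import sqrt


def slope_with_clustered_se(x, y, cluster):
    """OLS slope of y on x (with intercept) plus clustered and unclustered robust SEs."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    x_c = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in x_c)
    beta = sum(xc * (yi - y_bar) for xc, yi in zip(x_c, y)) / sxx
    alpha = y_bar - beta * x_bar
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

    # Sum the score x_c * e within each district before squaring.
    scores = defaultdict(float)
    for xc, e, g in zip(x_c, resid, cluster):
        scores[g] += xc * e
    se_cluster = sqrt(sum(s * s for s in scores.values())) / sxx

    # Heteroskedasticity-robust SE that treats every user as independent.
    se_unclustered = sqrt(sum((xc * e) ** 2 for xc, e in zip(x_c, resid))) / sxx
    return beta, se_cluster, se_unclustered


# Hypothetical data: users in the same district share the same shock.
treated  = [1, 1, 1, 1, 0, 0, 0, 0]
orders   = [3, 3, 1, 1, 0, 0, 2, 2]
district = ["A", "A", "B", "B", "C", "C", "D", "D"]

beta, se_cluster, se_unclustered = slope_with_clustered_se(treated, orders, district)
```

Because outcomes move together within districts, the clustered standard error (1.0 here) is larger than the unclustered one (about 0.71); ignoring the clustering would overstate precision.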

Real-life Situation

Context: A large fashion marketplace planned to launch a network of 120 Pickup Points in 15 medium-sized test cities (500-800k population) with the goal of reducing logistics costs by 25%. Management required an assessment of whether the presence of pick-up points increases purchase frequency among existing customers or simply shifts traffic from courier delivery.

Option 1: Simple comparison of “city with pick-up points vs city without pick-up points”

Pros: Very simple to implement, does not require historical data, and gives the business a quick answer.

Cons: Cities with pick-up points are wealthier and more active to begin with (selection bias), and differences in seasonality and competitive environment can bias the effect estimate by up to 40%. The result is unreliable for scaling.

Option 2: Before-After analysis only in test cities

Pros: Controls for intercity differences and focuses on the change in trend.

Cons: Does not account for overall market trends in e-commerce growth (in the pandemic year, the baseline trend could have been +30% year-on-year). The post-period may also coincide with local holiday promotions, distorting the picture.

Option 3: Synthetic Control at the city level + IV at the user level

Pros: Synthetic Control builds a counterfactual scenario of “what would have happened without pick-up points”, correcting for macro trends, while Instrumental Variables (distance to the point as an as-if-random shock for “lazy” users) isolate the causal effect from mere correlation.

Cons: Requires at least 12 months of pre-intervention data for each city, LATE is difficult to interpret for non-technical stakeholders, and the approach is computationally intensive.

Chosen solution and rationale

We chose a combination of Synthetic Control for intercity validation and Two-Stage Least Squares (2SLS) with a geographical instrument for user metrics. This allowed us to separate the effect of the infrastructure (structural effect) from the effect of conscious choice (behavioral self-selection). It was critically important to show that even “lazy” users living 200 meters from the new point began to purchase more often without any change in their economic characteristics.

Final Result

The evaluation showed a true incremental increase in purchase frequency of 12% among users living within the catchment area of the pick-up points (intention-to-treat, ITT). Cannibalization of courier delivery amounted to 18% and was offset by an 8% increase in average ticket size, attributed to the absence of delivery fees. However, the effect was heterogeneous: it was significant only for categories such as “shoes” and “accessories”, while no significant effect was found for “home appliances”. This allowed us to adjust the point opening strategy, focusing on fashion malls and dropping points in residential areas where appliances predominated.

What Candidates Often Miss

How to distinguish the effect of opening pick-up points from the effect of a marketing campaign announcing these points when the campaign launches simultaneously with the opening?

Answer: The common mistake is to ignore treatment contamination through the marketing channel. It is necessary to use the Difference-in-Difference-in-Differences (DDD) method or to split the sample into two comparison groups: cities with a campaign (media support) but without physical pick-up points (just a “coming soon” announcement) and cities with full implementation. If an effect is observed only in the second group, this supports the causal role of logistics rather than communication. It is also important to track brand search as a control variable: if it grows equally in both groups, the extra revenue in the test group can be attributed to service convenience rather than awareness.
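
The group comparison can be sketched as two nested difference-in-differences rather than a full DDD regression. This illustration assumes a third, untouched baseline group in addition to the two groups named in the answer; all city-level numbers are invented.

```python
def mean(xs):
    return sum(xs) / len(xs)


def diff_in_diff(pre_a, post_a, pre_b, post_b):
    """Change in group A minus change in group B."""
    return (mean(post_a) - mean(pre_a)) - (mean(post_b) - mean(pre_b))


# Hypothetical weekly revenue index per city group (assumed data).
baseline = {"pre": [100.0, 102.0], "post": [105.0, 107.0]}        # no campaign, no points
campaign_only = {"pre": [98.0, 100.0], "post": [108.0, 110.0]}    # "coming soon" ads only
full_launch = {"pre": [101.0, 103.0], "post": [121.0, 123.0]}     # ads plus open points

# Effect of the announcement alone: campaign-only cities vs untouched cities.
campaign_effect = diff_in_diff(
    campaign_only["pre"], campaign_only["post"], baseline["pre"], baseline["post"]
)
# Effect of the physical points on top of the campaign.
logistics_effect = diff_in_diff(
    full_launch["pre"], full_launch["post"], campaign_only["pre"], campaign_only["post"]
)
```

With these numbers the campaign accounts for 5 index points and the points themselves for a further 10, so the two channels are separated rather than lumped into one estimate. The comparison is only valid if the groups would have followed parallel trends without the intervention.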

Why can't we use simple matching of users by distance to the pick-up point (closer than 500m vs farther than 2km) as a proxy for treatment and control, even when controlling for demographics?

Answer: This violates the positivity assumption and the selection-on-observables (ignorability) assumption. Users who choose to live near shopping centers (where pick-up points are usually located) systematically differ in income, employment, and lifestyle from residents of the outskirts. Even with Propensity Score Matching, hidden bias remains from unobserved confounders (for example, family budget planning). A more defensible approach is Regression Discontinuity Design (RDD), treating delivery zone boundaries or administrative neighborhood borders as an as-if-random threshold: on one side of the street, houses are 300 meters from the point (treatment), and on the other side they are 900 meters away (control), while socioeconomic characteristics are assumed to be nearly identical.
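
A rough sketch of the RDD estimate, assuming the running variable is the signed distance to the zone boundary in meters (positive on the treated side) and using separate linear fits on each side within a hypothetical 300-meter bandwidth. Names and data are illustrative only.

```python
def fit_line(xs, ys):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def rdd_jump(running, outcome, cutoff=0.0, bandwidth=300.0):
    """Difference between the right and left fitted values at the cutoff."""
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Hypothetical users: signed distance to the boundary and orders per month.
# The last user lies outside the bandwidth and is ignored by the estimator.
signed_distance = [-250.0, -150.0, -50.0, 50.0, 150.0, 250.0, 900.0]
orders_per_month = [1.5, 1.7, 1.9, 2.7, 2.9, 3.1, 9.9]

jump = rdd_jump(signed_distance, orders_per_month)
```

Here the fitted lines meet the boundary at 2.0 and 2.6, giving a jump of about 0.6 orders. The estimate is local to users near the boundary, and it should be checked for sensitivity to the bandwidth choice.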

How to properly account for the time lag between the opening of pick-up points and habit formation, if standard attribution windows (7-30 days) underestimate the long-term effect?

Answer: A classic error is the use of a fixed post-period. It is necessary to apply an Event Study Design with dynamic lags, modeling the effect separately for the 1st, 3rd, and 6th months after the opening. This captures treatment effect heterogeneity over time: the effect often increases as the habit forms (learning curve) and then plateaus. It is also useful to model the time until first use of a pick-up point with Cox Proportional Hazards models that account for competing risks (users may churn before adopting). Moreover, survivorship bias should be adjusted for: users who began using pick-up points may have low churn rates by definition, so they should be compared with a control group with similar survival patterns, not with the entire base.
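
A simplified event-study sketch, assuming a balanced panel of treated and control averages indexed by month relative to the opening. The treated-minus-control gap in each month is normalized to the gap in the reference month (-1). All figures are hypothetical.

```python
from collections import defaultdict


def event_study(rows, reference_month=-1):
    """rows: (is_treated, relative_month, value). Assumes both groups appear in every month."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for is_treated, rel_month, value in rows:
        totals[(is_treated, rel_month)] += value
        counts[(is_treated, rel_month)] += 1
    months = sorted({m for _, m in counts})
    gap = {
        m: totals[(True, m)] / counts[(True, m)] - totals[(False, m)] / counts[(False, m)]
        for m in months
    }
    # Dynamic effect: gap in each month relative to the pre-opening reference gap.
    return {m: gap[m] - gap[reference_month] for m in months}


# Hypothetical orders per user by month relative to the opening.
rel_months = [-2, -1, 1, 3, 6]
treated_avg = [2.5, 2.5, 2.7, 2.9, 3.0]
control_avg = [2.0, 2.0, 2.1, 2.1, 2.2]

rows = [(True, m, v) for m, v in zip(rel_months, treated_avg)]
rows += [(False, m, v) for m, v in zip(rel_months, control_avg)]
effects = event_study(rows)
```

In this toy panel the effect is zero before the opening (a basic pre-trend check), about 0.1 in month 1, and about 0.3 in months 3 and 6, which matches the pattern of a habit that builds and then plateaus. A fixed 30-day window would have captured only the first of these values.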