Historical Context. In recent years, BNPL (Buy Now, Pay Later) has become a standard fintech integration in retail, allowing users to split payments interest-free. Analysts face a fundamental problem: a randomized experiment is impossible, since denying credit approval at random is not feasible for ethical and legal reasons, and users self-select based on creditworthiness. This creates classic endogeneity: the observed correlation between BNPL usage and high order values stems from pre-existing characteristics of creditworthy customers rather than from the product itself.
Problem Statement. The main challenges are a sharp discontinuity in user characteristics at the approval threshold (e.g., 700 points), seasonality (Black Friday, the pre-New Year period), cannibalization of future sales (intertemporal substitution), and elevated returns due to impulse purchases. The task is to isolate the pure incremental effect, the LATE (Local Average Treatment Effect), for users at the margin of approval while minimizing the influence of confounders.
Detailed Solution. The optimal approach is a Sharp Regression Discontinuity Design (RDD) at the scoring threshold with a bandwidth of ±30-50 points. The methodology relies on the assumption of local randomness: users with 695 and 705 points are statistically indistinguishable on observed and unobserved characteristics, yet fall into different groups (control and treatment). Additionally, Difference-in-Differences (DiD) is applied within this bandwidth to track dynamics before and after launch while controlling for seasonality. To assess cannibalization, an event study with lags is used (spending in months t-3, t-2 before first BNPL use). If the approval threshold serves as an instrument but there is non-compliance (approved users who do not use BNPL), Fuzzy RDD is estimated via Two-Stage Least Squares (2SLS). To validate the design, it is important to check covariate balance (covariate balance tests) and the density of the running variable at the threshold (McCrary test).
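As a minimal sketch of the core estimator, the snippet below fits a local linear Sharp RDD on simulated data. All numbers (the score distribution, the 650-point cutoff, the ±40 bandwidth, and the +15 "true" jump) are illustrative assumptions chosen to mirror the case, not the case's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data mirroring the case: internal credit score, average
# order value (AOV), a 650-point cutoff, and a true causal jump of +15
# (all numbers are illustrative assumptions).
n = 20_000
score = rng.uniform(500, 800, n)
cutoff, bw = 650.0, 40.0
treated = (score >= cutoff).astype(float)
aov = 100 + 0.3 * (score - cutoff) + 15 * treated + rng.normal(0, 10, n)

# Sharp RDD: local linear fit inside the +/-40-point bandwidth,
# allowing a separate slope on each side of the cutoff.
mask = np.abs(score - cutoff) <= bw
x = score[mask] - cutoff
d = treated[mask]
X = np.column_stack([np.ones(mask.sum()), d, x, d * x])
beta, *_ = np.linalg.lstsq(X, aov[mask], rcond=None)
late = beta[1]  # discontinuity in AOV at the cutoff = local treatment effect
print(f"estimated LATE: {late:.2f}")
```

The coefficient on the treatment indicator recovers the jump at the cutoff; in production one would add robust standard errors and a data-driven bandwidth rather than a fixed ±40.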
An electronics marketplace integrated BNPL from a partner bank with a strict approval threshold of 650 points on the internal scale. The business recorded a 35% increase in average order value for users with BNPL but suspected that this was a self-selection effect of more affluent customers. A decision was needed regarding an increase in credit limit, but an assessment of the true causal effect was required.
Option 1: Simple comparison of 'used BNPL' vs 'did not use' without considering the threshold. Pros: Maximally simple implementation in SQL, does not require complex statistics. Cons: Critical selection bias — approved users have higher income and purchasing history, which leads to an inflated effect estimate of up to +40%, unrelated to the product. The result is unsuitable for decision-making.
Option 2: Before-After analysis for the entire audience without group division. Pros: Considers overall growth trends of the platform and is simple to interpret. Cons: It is impossible to separate the effect of BNPL from seasonal spikes (holiday sales) and simultaneous marketing campaigns. The estimate is biased due to temporary demand shocks.
Option 3: Regression Discontinuity Design (RDD) at the 650-point threshold with a bandwidth of ±40 points. Pros: Uses the sharp discontinuity in approval probability as a natural experiment, estimating the effect for 'marginal' users who just barely passed or missed the threshold, and controls for unmeasured characteristics in the local neighborhood. Cons: Estimates only a local effect (LATE) that cannot be extrapolated to all high-score users; requires a large sample near the threshold for statistical power.
Chosen Solution: A combination of Sharp RDD for users in the 610-690 point band with Propensity Score Matching on historical spending and product categories, supplemented by Difference-in-Differences to track dynamics over 90 days post-purchase. Week fixed effects were introduced to control for seasonality. This made it possible to isolate the pure product effect from borrower characteristics.
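The DiD component with week fixed effects can be sketched as follows on simulated panel data. The user counts, launch week, and +12 effect size are hypothetical assumptions for illustration; the week dummies absorb seasonal shocks such as sales weeks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weekly panel for users in the 610-690 score band:
# spending with week-level seasonality, user heterogeneity, and an
# assumed DiD treatment effect of +12 for approved users after the
# BNPL launch in week 6.
n_users, n_weeks, launch = 400, 12, 6
treated_user = rng.integers(0, 2, n_users)   # 1 = above the 650 cutoff
week_effect = rng.normal(0, 5, n_weeks)      # seasonality (e.g. sales weeks)
user_effect = rng.normal(0, 3, n_users)

uid = np.repeat(np.arange(n_users), n_weeks)
week = np.tile(np.arange(n_weeks), n_users)
post = (week >= launch).astype(float)
treat = treated_user[uid].astype(float)
spend = (50 + user_effect[uid] + week_effect[week]
         + 12 * treat * post + rng.normal(0, 4, n_users * n_weeks))

# DiD with week fixed effects: spend ~ treat + week dummies + treat:post.
# The 'post' main effect is absorbed by the week dummies.
week_dummies = (week[:, None] == np.arange(1, n_weeks)).astype(float)
X = np.column_stack([np.ones_like(spend), treat, week_dummies, treat * post])
beta, *_ = np.linalg.lstsq(X, spend, rcond=None)
did_effect = beta[-1]
print(f"DiD estimate of the BNPL effect: {did_effect:.2f}")
```

In a real analysis, standard errors should be clustered at the user level, since repeated observations of the same user are not independent.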
Final Result: A statistically significant increase in average order value of 17% (ITT — Intent-to-Treat) for marginal users was identified, but an 11% increase in return rates due to impulse purchases was also noted. The effect turned out to be heterogeneous: high for electronics (+24%), zero for household chemicals. Based on the data, the approval threshold for risky product categories was adjusted, reducing the return rate by 4% without loss of revenue.
How to distinguish the 'novelty' effect from sustainable behavior change when using RDD?
It is necessary to conduct a dynamic RDD, estimating the effect over separate time windows (cohort-level RDD): weeks 1-2 (novelty) versus months 3-6 (sustained behavior). If the coefficients differ significantly (checked via a Chow test), either only the long-term window is used or a time-treatment interaction is introduced. It is also important to verify parallel pre-trends, i.e., the absence of discontinuities in the outcome (spending) in periods before crossing the threshold, which confirms design validity and rules out anticipation effects.
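The window comparison can be sketched as a pooled RDD with a treatment-by-window interaction on simulated data. The +20 novelty effect decaying to a sustained +8 is an assumption for illustration; the interaction coefficient directly measures the decay.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: the RDD outcome measured in two windows per user.
# Assumed truth: a +20 novelty effect in weeks 1-2 decaying to a
# sustained +8 by months 3-6.
n = 8000
score = rng.uniform(610, 690, n)
x = score - 650
d = (score >= 650).astype(float)
late_window = rng.integers(0, 2, n).astype(float)   # 1 = months 3-6
effect = np.where(late_window == 1, 8.0, 20.0)
y = 100 + 0.2 * x + effect * d + rng.normal(0, 10, n)

# Pooled RDD with a treatment x window interaction; a significantly
# negative interaction means the effect decays after the novelty period.
X = np.column_stack([np.ones(n), d, x, d * x, late_window, d * late_window])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
novelty, decay = beta[1], beta[5]
print(f"weeks 1-2 effect: {novelty:.1f}, change by months 3-6: {decay:.1f}")
```

A t-test on the interaction coefficient plays the same role here as the Chow test on split-sample estimates: both ask whether the early and late effects are equal.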
How to correctly assess the cannibalization of future sales (intertemporal substitution) when implementing BNPL?
Standard RDD only estimates the static effect at the time of purchase. To measure cannibalization, an event study with leads and lags relative to the moment of first BNPL use is constructed: spending is analyzed in months t-3, t-2, t-1 (before) and t+1, t+2, t+3 (after). If the sum of coefficients on the lags (post-periods) is negative and significant while spending spikes at t=0, this indicates borrowing from the future: the user had planned the purchase and merely accelerated it thanks to BNPL. Significant lead (pre-period) coefficients would instead signal anticipation effects and cast doubt on the design. Jordà-style local projections can then be used to estimate dynamic multipliers and the net incremental effect over a longer horizon.
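A minimal event-study sketch on simulated data, assuming a +30 spike at the moment of first BNPL use followed by depressed spending in the three following months (the pull-forward pattern). The dummy for t-1 is omitted as the baseline, per the usual convention.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical monthly panel around first BNPL use (event time 0).
# Simulated pattern: a +30 spike at t=0 followed by -8 in each of
# t+1..t+3 (a pulled-forward purchase), with flat pre-periods.
n_users = 2000
event_time = np.tile(np.arange(-3, 4), n_users)   # t-3 .. t+3 per user
effect = np.zeros(event_time.size)
effect[event_time == 0] = 30.0
effect[event_time > 0] = -8.0
spend = 60 + effect + rng.normal(0, 5, event_time.size)

# Regress spending on event-time dummies, with t-1 as the omitted baseline.
times = [-3, -2, 0, 1, 2, 3]
X = np.column_stack([np.ones(event_time.size)]
                    + [(event_time == t).astype(float) for t in times])
beta, *_ = np.linalg.lstsq(X, spend, rcond=None)
coef = dict(zip(times, beta[1:]))

post_sum = coef[1] + coef[2] + coef[3]  # negative => borrowing from the future
print(f"t=0 spike: {coef[0]:.1f}, sum of post coefficients: {post_sum:.1f}")
```

If the post-period sum roughly offsets the t=0 spike, the net incremental effect of BNPL is close to zero: the product mainly shifts the timing of purchases rather than creating demand.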
Why is it not possible in this case to use simple propensity score matching (Propensity Score Matching) without RDD, and what assumptions are violated?
PSM requires the Unconfoundedness (Ignorability) assumption, which fails when unmeasured characteristics affect approval (for example, 'financial discipline' or informal income sources not captured in scoring). These latent variables correlate with both approval and spending, creating bias. RDD weakens this requirement to local randomization around the threshold, where unmeasured characteristics are distributed as good as randomly. Candidates often overlook the need to check the density of the score (McCrary test) and covariate balance in the neighborhood of the threshold, which is critical for the validity of the conclusions.
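A simplified density check in the spirit of the McCrary test can be sketched as a bin-count comparison around the cutoff (the real test fits local-linear densities on each side rather than comparing two raw bins; the simulated scores below contain no manipulation by construction).

```python
import numpy as np

rng = np.random.default_rng(4)

# Simplified density check in the spirit of the McCrary test: compare
# the mass of users in narrow bins just below vs. just above the 650
# cutoff. Score manipulation would show up as excess mass just above
# the threshold. (Simulated scores with no manipulation.)
score = rng.normal(650, 60, 50_000)
cutoff, h = 650.0, 5.0                 # bin half-width in score points
below = np.sum((score >= cutoff - h) & (score < cutoff))
above = np.sum((score >= cutoff) & (score < cutoff + h))

# Two-proportion z-statistic for equal mass on both sides of the cutoff.
total = below + above
z = (above - below) / np.sqrt(total)
print(f"below: {below}, above: {above}, z = {z:.2f}")
```

A |z| well above 1.96 would flag bunching at the threshold (e.g., managers nudging scores over the line) and undermine the local-randomization assumption; a proper McCrary local-linear density test should be run before trusting any RDD estimate.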