Historical Context
In fintech products, identity verification (KYC) is a regulatory requirement that creates significant friction in the user experience. Classic methods of effectiveness evaluation require randomized controlled experiments, which are not feasible for legal and ethical reasons when a mandatory requirement is rolled out at scale. Historically, analysts relied on simple cohort reports that accounted neither for the endogeneity of self-selection nor for external market shocks.
Problem Statement
It is necessary to isolate the pure effect of undergoing KYC from natural user attrition, seasonal activity fluctuations, and differences in baseline characteristics between users who verify on day one and those who delay the procedure. The problem is complicated by the fact that late adopters may systematically differ in motivation and financial behavior, creating survivorship bias.
Detailed Solution
Apply a combination of Difference-in-Differences (DiD) with Propensity Score Matching (PSM) to build a comparable control group from users with delayed verification. Use the Synthetic Control Method as a robustness check, creating a weighted combination of untouched segments (e.g., users from regions with delayed regulatory requirements). To account for seasonality, include month-of-year fixed effects and apply an Event Study Design in relative time to check the parallel trends assumption.
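The core 2x2 DiD logic can be sketched in miniature. The retention figures below are purely illustrative toy values (not numbers from the case), and a real analysis would use a regression with fixed effects rather than raw group means:

```python
# Minimal 2x2 difference-in-differences sketch on toy data.
# did = (treated_post - treated_pre) - (control_post - control_pre)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic two-group, two-period DiD point estimate."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre)
    )

# Hypothetical 30-day retention rates (fractions):
treated_pre  = [0.42, 0.40, 0.44]   # 18+ users before the KYC launch
treated_post = [0.35, 0.33, 0.34]   # 18+ users after the KYC launch
control_pre  = [0.50, 0.52, 0.51]   # delayed-KYC users, same periods
control_post = [0.48, 0.50, 0.49]   # the seasonal dip hits them too

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
```

Here the treated group drops by 8 points while the control drops by only 2, so the net KYC effect is the remaining -6 points; the control group's decline absorbs the common seasonal shock.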
The company launched mandatory two-factor identification with document checks for all users over 18 in March, coinciding with tax season. The business noticed a drop in activity but could not separate the effect of KYC from the seasonal decline and mass push notifications sent by competitors. Analysts needed to assess the net impact on 30-day retention and ARPU over 60 days post-implementation.
Option 1: Simple Comparison of Metrics Before and After (Pre-Post Analysis)
Analysts calculate the average retention for the month before KYC and compare it with post-implementation figures. The pros of this approach are maximum simplicity and speed of response, with no need for complex models. The cons include ignoring seasonality (March vs. April), external competitive activity, and natural growth or decline trends in the user base, leading to an estimation bias of up to 40%.
Option 2: Naive DiD with Young Users (16-17 years) as Control
The team proposes comparing changes in the target group (18+) with changes in a group not subjected to KYC. The pros involve accounting for general market trends and seasonality. The cons are critical: teenagers and adults exhibit fundamentally different financial behaviors, violating the parallel trends assumption; moreover, the two cohorts are subject to different life-cycle effects.
Option 3: Synthetic Control with Time Lag
An artificial control group is created as a weighted combination of users from pilot regions where KYC has not yet been implemented, with weights fitted on the previous 6 months of activity. The pros are reduced reliance on any single control group and automatic accounting for seasonal patterns through the long pre-period. The cons are high data-volume requirements, weights that are hard to interpret, and sensitivity to outliers in the historical periods.
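The weighting idea can be illustrated with a deliberately simple sketch: two donor regions and a one-dimensional grid search for the convex weight that best reproduces the treated region's pre-period activity. All series are hypothetical toy numbers; real synthetic control uses many donors and constrained optimization:

```python
# Minimal synthetic-control sketch: find the convex weight on two donor
# regions that best matches the treated region's pre-period activity.

def best_weight(treated, donor_a, donor_b, grid=1000):
    """Grid-search w in [0, 1] minimizing pre-period squared error of
    w*donor_a + (1-w)*donor_b against the treated series."""
    best_w, best_err = 0.0, float("inf")
    for i in range(grid + 1):
        w = i / grid
        err = sum((t - (w * a + (1 - w) * b)) ** 2
                  for t, a, b in zip(treated, donor_a, donor_b))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Toy pre-period monthly activity: treated sits between the donors.
treated = [100, 105, 110, 108]
donor_a = [120, 126, 132, 130]   # region with a later KYC rollout
donor_b = [ 80,  84,  88,  86]   # another untouched region

w = best_weight(treated, donor_a, donor_b)
synthetic = [w * a + (1 - w) * b for a, b in zip(donor_a, donor_b)]
```

After the treatment date, the gap between the treated series and this synthetic counterpart is the estimated effect; a good pre-period fit (as here) is the precondition for trusting it.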
Chosen Solution and Justification
A hybrid approach was selected: PSM-DiD using users who technically delayed KYC for 2-3 weeks as the control group, plus Synthetic Control for validation. This solution balanced observed characteristics (age, device, historical activity) through PSM, while DiD captured temporal effects. Synthetic control confirmed that results were not sensitive to the choice of a specific control group.
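The PSM step of the hybrid can be sketched as greedy nearest-neighbour matching on propensity scores. The scores below are hypothetical inputs; in practice they would come from a model (e.g., logistic regression) of "verified on day 1" on age, device, and historical activity, as described above:

```python
# Minimal nearest-neighbour propensity-score matching sketch.

def match_nearest(treated_scores, control_scores):
    """Greedy 1:1 matching without replacement.
    Returns a list of (treated_idx, control_idx) pairs."""
    available = dict(enumerate(control_scores))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        # Pick the still-available control with the closest score.
        c_idx = min(available, key=lambda i: abs(available[i] - t_score))
        pairs.append((t_idx, c_idx))
        del available[c_idx]  # match without replacement
    return pairs

treated = [0.81, 0.35, 0.60]          # toy scores: day-1 verifiers
control = [0.34, 0.79, 0.58, 0.90]    # toy scores: 2-3 week delayers
pairs = match_nearest(treated, control)
```

The matched pairs then feed the DiD comparison, so remaining differences in trends are attributed to KYC rather than to observable composition.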
Final Outcome
The analysis showed that KYC reduced 7-day retention by 18% in the first week but increased the average ticket by 22% by excluding fraudulent transactions. The net effect on 90-day LTV turned out to be neutral (-2%, statistically insignificant). Based on these data, the product team split the verification process into three micro-steps, reducing friction by 35% without losing anti-fraud efficiency.
How to correctly handle right censoring of data when analyzing the long-term effect of KYC if the observation window is limited and cohorts undergo verification asynchronously?
Candidates often overlook that users who undergo KYC later have less time to exhibit behavior within the observation window, which biases naive comparisons. Survival-analysis methods such as the Cox proportional hazards model or the Kaplan-Meier estimator account for censored observations. Alternatively, for metrics like LTV, Tobit regression or other censored-data models can be used. It is also important to apply a staggered adoption design in DiD with correct handling of not-yet-treated ("clean") cohorts, since the standard two-period DiD yields biased estimates under phased implementation.
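The censoring mechanics can be shown with a bare-bones Kaplan-Meier estimator: users whose observation window closes before churn stay in the risk set until their censoring time but never count as events. Durations below are toy values in days:

```python
# Minimal Kaplan-Meier sketch with right censoring.

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each event time.
    observed[i] = False means user i was censored at durations[i]."""
    s, curve = 1.0, []
    event_times = sorted({t for t, obs in zip(durations, observed) if obs})
    for t in event_times:
        # Everyone still under observation at time t is at risk,
        # including users who will later be censored.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, obs in zip(durations, observed)
                     if d == t and obs)
        s *= 1 - deaths / at_risk
        curve.append((t, s))
    return curve

durations = [5, 8, 8, 12, 15, 15]                    # days until churn/cutoff
observed  = [True, True, False, True, False, False]  # False = censored
curve = kaplan_meier(durations, observed)
```

Dropping the censored users instead (or treating them as churned) would shift the whole survival curve, which is exactly the bias the question is probing for.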
Why might the standard propensity score matching (PSM) method provide biased estimates in the context of mandatory verification, and what modifications are necessary to account for temporal dynamics?
Standard PSM ignores temporal dependence and hidden confounders, such as user motivation or expected transaction volume. In the context of KYC, it is critical to use Time-Dependent Propensity Score Matching, where scores are calculated separately for each period, or Inverse Probability of Treatment Weighting (IPTW) with time-varying covariates. Additionally, it is necessary to check the common-support (overlap) condition to avoid extrapolating beyond observed data, and Coarsened Exact Matching (CEM) can increase robustness in small samples.
How to distinguish the true effect of undergoing KYC from the anticipation effect and verify adherence to the parallel trends assumption?
To separate the effects, it is necessary to apply an Event Study Design with dummy variables for relative time before and after the event. If the coefficients on lead variables (periods before KYC) are statistically significantly different from zero, this indicates an anticipation effect or a violation of parallel trends. To check robustness, placebo tests should be used, shifting the implementation date to earlier periods, along with falsification tests on outcome variables that should not have changed. If trends are violated, Synthetic Difference-in-Differences (SDiD) can be applied to correct for mismatched trends through reweighting.
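The lead/lag logic can be sketched in the simplest possible setting: with one treated and one control group and no covariates, the event-study coefficient for each relative period equals the treated-minus-control gap in that period, normalized to the gap at t = -1. All numbers below are toy values:

```python
# Minimal event-study sketch: per-period treated-vs-control gaps,
# normalized to the period just before treatment (t = -1).

def event_study(treated_means, control_means, base=-1):
    """Return {relative_period: normalized gap}."""
    gap = {t: treated_means[t] - control_means[t] for t in treated_means}
    return {t: g - gap[base] for t, g in gap.items()}

# Relative periods -3..-1 are leads (pre-KYC), 0..2 are lags (post-KYC).
treated = {-3: 0.50, -2: 0.51, -1: 0.50, 0: 0.41, 1: 0.43, 2: 0.45}
control = {-3: 0.48, -2: 0.49, -1: 0.48, 0: 0.47, 1: 0.48, 2: 0.49}

coefs = event_study(treated, control)
# Leads near zero => no visible anticipation, parallel trends plausible.
leads_clean = all(abs(coefs[t]) < 0.01 for t in (-3, -2))
```

In this toy example the leads sit at zero while the lags jump negative at t = 0, the pattern one hopes to see: a clean pre-trend followed by a sharp treatment effect. Statistically significant lead coefficients would instead flag anticipation or diverging trends.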