System Architecture

How would you design a globally distributed, low-latency authorization decision engine that evaluates relationship-based access control (ReBAC) policies across billions of objects and subjects, maintains sub-10ms evaluation latency at the 99th percentile through distributed caching, ensures causal consistency during cascading permission revocations, and prevents thundering-herd stampedes during cache invalidation storms, all without introducing centralized coordination bottlenecks?


Answer to the question

The architecture centers on a distributed Zanzibar-inspired authorization service composed of three tiers: stateless Check Engine evaluation nodes, a globally distributed graph storage Snapshot Database, and an event-driven Watch Pipeline for cache invalidation. This design separates the read-heavy authorization checks from the write-heavy permission updates, enabling independent scaling of evaluation capacity while maintaining strong consistency guarantees for relationship mutations. The system employs Google's zookie pattern to bound staleness without sacrificing the performance benefits of edge caching.

Check Engine nodes deployed at edge locations evaluate authorization queries using local in-memory caches and compact permission bitmaps. These nodes load namespace configurations from a replicated etcd cluster and resolve relationship tuples from a geo-partitioned CockroachDB or Spanner instance, which provides external consistency through TrueTime or hybrid logical clocks. Each node also maintains Bloom filters so that lookups for relationships that definitely do not exist are rejected locally instead of becoming database queries.
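A minimal sketch of that Bloom-filter gate. The `BloomFilter` class, the `object#relation@subject` tuple-key format, and the `check_relation` helper are illustrative assumptions, not part of any real Zanzibar API:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a False from might_contain is a definite miss."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def check_relation(bloom: BloomFilter, db: set, tuple_key: str) -> bool:
    if not bloom.might_contain(tuple_key):
        return False  # definite miss: skip the database round trip entirely
    return tuple_key in db  # possible hit: confirm against storage
```

Because Bloom filters admit false positives but never false negatives, the `db` lookup still arbitrates any "maybe present" answer; only the definite-absent case short-circuits.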

To handle the "new enemy problem" where recent revocations might be invisible to edge caches, the system implements zookie tokens—opaque timestamps encoding snapshot consistency—that force cache misses for sensitive operations within a configurable uncertainty window. Clients receive these tokens with initial checks and must replay them when accessing high-value resources, ensuring that recently revoked permissions are immediately visible without requiring global cache invalidation. This mechanism balances the need for low latency with the security requirement of immediate revocation visibility.
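A sketch of how a zookie might be minted and later used to decide whether a cached decision is too stale to trust. The base64/JSON token format is a stand-in for Zanzibar's opaque encoding, and `must_bypass_cache` is a hypothetical helper:

```python
import base64
import json


def issue_zookie(snapshot_ts: float) -> str:
    """Encode the snapshot timestamp of an evaluation as an opaque token."""
    payload = json.dumps({"ts": snapshot_ts}).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_zookie(zookie: str) -> float:
    return json.loads(base64.urlsafe_b64decode(zookie))["ts"]


def must_bypass_cache(zookie: str, cache_snapshot_ts: float) -> bool:
    # If the client's token is newer than the snapshot the local cache
    # reflects, the cached decision may predate a revocation: force a
    # read against the source of truth instead of serving from cache.
    return decode_zookie(zookie) > cache_snapshot_ts
```

The check is a single timestamp comparison, so the common case (cache at least as fresh as the token) stays on the fast path.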

Cache invalidation leverages an Apache Kafka-backed Watch Pipeline that propagates tuple changes to all edge Check Engines using consistent hashing, ensuring that revocation storms trigger staggered cache refreshes rather than synchronized database bombardment. The pipeline employs jittered exponential backoff to prevent thundering herds when widely-shared objects experience permission changes. This architecture ensures that the system maintains sub-10ms latency for cache hits while guaranteeing causal consistency for permission updates across geographically distributed nodes.
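One way to sketch the staggered, jittered refresh schedule. Note an assumption beyond the text: this version hashes the node ID together with the object ID, so that refreshes of the *same* widely-shared object also spread across edge nodes rather than firing in lockstep:

```python
import hashlib
import random


def invalidation_delay(node_id: str, object_id: str,
                       window_s: float = 5.0, jitter_s: float = 0.5) -> float:
    """Delay before an edge node refreshes its cache entry for object_id.

    A deterministic hash places each (node, object) pair somewhere in
    [0, window_s); a small seeded jitter breaks any residual alignment.
    """
    h = int.from_bytes(
        hashlib.sha256(f"{node_id}/{object_id}".encode()).digest()[:8], "big")
    base = (h / 2**64) * window_s
    rng = random.Random(h)  # seeded, so the schedule is reproducible
    return base + rng.uniform(0.0, jitter_s)
```

When the Watch Pipeline announces a tuple change, each node schedules its refresh at this offset instead of invalidating immediately, converting a synchronized stampede into a trickle over the window.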

A real-world scenario

A global document collaboration platform serving 50 million enterprise users experienced catastrophic latency spikes during peak hours when evaluating complex sharing hierarchies. Each document access required traversing nested group memberships and inherited permissions stored in a monolithic PostgreSQL cluster, resulting in 500ms+ query times and frequent timeouts during bulk permission updates when employees changed departments or projects. The engineering team required a solution capable of sub-10ms latency while maintaining strict security guarantees during cascading revocations across nested folder structures.

The first approach evaluated maintaining a centralized PostgreSQL cluster with aggressive Redis materialized views caching permission paths. Pros included strong ACID guarantees ensuring immediate visibility of revocations and straightforward transaction semantics for complex multi-table updates. Cons involved severe write bottlenecks during bulk permission changes, inevitable cache stampede risks when popular documents updated, and fundamental inability to scale read throughput geographically without complex read replicas that introduced unacceptable replication lag for security-critical decisions.

The second approach suggested a fully denormalized Apache Cassandra deployment with application-side permission resolution and TTL-based cache expiration. Pros encompassed excellent write throughput for relationship mutations and inherent multi-region availability without single points of failure. Cons revealed unacceptable eventual consistency tradeoffs where revoked permissions remained visible for minutes due to gossip protocol delays, and the lack of atomic cascading deletes created security holes where users retained access to resources after being removed from parent groups, violating the principle of least privilege.

The team ultimately selected a Zanzibar-style architecture utilizing CockroachDB for relationship tuple storage, Envoy sidecars as Policy Enforcement Points, and horizontally scaled Check Engine nodes with least-recently-used (LRU) caches fronted by Bloom filters. This choice balanced the need for strong consistency in permission writes, via CockroachDB's default serializable isolation, with edge performance requirements through local evaluation caches and Apache Kafka-driven invalidation streams. The result reduced p99 authorization latency from 500ms to 4ms, supported 15 million checks per second globally, and ensured permission revocations propagated to all edge nodes within 150 milliseconds while maintaining 99.99% availability.

What candidates often miss

How do you prevent authorization checks from returning stale "allow" decisions immediately after a permission revocation without sacrificing the performance benefits of distributed caching?

Candidates frequently overlook the zookie pattern or version vectors, instead proposing global cache invalidation or database reads for every check. The solution requires the authorization service to return a consistency token with each decision that encodes the snapshot timestamp of the data used. For sensitive operations or after recent revocation events, the client must present this token, forcing the Check Engine to verify against the central store if its local cache predates the token timestamp. This ensures causal consistency without requiring global cache invalidation or sacrificing read performance for the majority of requests.

How would you architect the cache invalidation mechanism to avoid thundering herd effects when a widely-shared object has its permissions modified, potentially triggering millions of simultaneous cache refreshes?

The key technique involves consistent hashing of cache keys combined with jittered exponential backoff and request coalescing at the edge nodes. When the Watch Pipeline broadcasts a tuple change, edge nodes do not immediately invalidate but instead schedule invalidation using a hash of the object ID to spread refreshes over a time window. Additionally, each Check Engine maintains a single-flight group (as in Go's singleflight package) for in-progress checks, ensuring that concurrent requests for the same object share one backend query result and preventing database overload during popular-object updates.
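A minimal Python analogue of the single-flight idea (Go's golang.org/x/sync/singleflight is the canonical implementation; this `Singleflight` class is an illustrative sketch):

```python
import threading


class Singleflight:
    """Coalesce concurrent calls for the same key into one backend query."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> {"event": Event, "result": ...}

    def do(self, key, fn):
        with self._lock:
            entry = self._calls.get(key)
            if entry is None:
                # First caller becomes the leader and will run fn().
                entry = {"event": threading.Event(), "result": None}
                self._calls[key] = entry
                leader = True
            else:
                leader = False
        if leader:
            try:
                entry["result"] = fn()
            finally:
                entry["event"].set()  # wake every waiting follower
                with self._lock:
                    self._calls.pop(key, None)
        else:
            entry["event"].wait()  # followers block until the leader finishes
        return entry["result"]
```

With this in place, a burst of identical checks for a hot object costs one database round trip instead of one per request.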

Why is using a simple directed graph traversal insufficient for modeling ReBAC policies, and how do you handle intersection and exclusion constraints in a distributed evaluation environment?

Simple graph traversal fails to capture the set operations required for sophisticated policies like "allow only if user is in group A AND NOT in group B". The solution implements a userset-rewrite system in which namespace configurations compile to expression trees combining union, intersection, and exclusion operators, evaluated via reverse-index lookups. For intersection constraints, the system queries both sets and computes the intersection at the Check Engine, while exclusions employ short-circuit evaluation with early termination. This approach ensures that complex boolean logic evaluates efficiently without requiring multiple round trips to the database.
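A toy evaluator for the set operations described above; the nested-tuple expression encoding and the `lookup` callback are hypothetical, not any real namespace-configuration syntax:

```python
def evaluate(expr, lookup):
    """Evaluate a compiled userset expression to a set of subjects.

    expr is a nested tuple:
      ("rel", object_id, relation)  -> subjects with that direct relation
      ("union", a, b) | ("intersect", a, b) | ("exclude", a, b)
    lookup(object_id, relation) returns the stored subject set.
    """
    op = expr[0]
    if op == "rel":
        _, obj, relation = expr
        return lookup(obj, relation)
    left = evaluate(expr[1], lookup)
    if op == "exclude" and not left:
        return set()  # short-circuit: empty base set, skip the exclusion fetch
    right = evaluate(expr[2], lookup)
    if op == "union":
        return left | right
    if op == "intersect":
        return left & right
    if op == "exclude":
        return left - right
    raise ValueError(f"unknown operator: {op}")
```

The "in group A AND NOT in group B" policy becomes `("exclude", ("rel", obj, "group_a"), ("rel", obj, "group_b"))`, and a membership check is just `user in evaluate(expr, lookup)`.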