Descriptors were formalized in Python 2.2 alongside new-style classes to provide a unified protocol for controlling attribute access. Before that release, behaviors such as binding a function to an instance were special-cased inside the interpreter; property, classmethod, and staticmethod arrived in the same release as built-in users of the new protocol. The descriptor protocol let user-defined classes implement behaviors that previously required interpreter support. Passing None for the instance parameter is how the protocol distinguishes class-level from instance-level access without fragmenting into multiple methods.
Without a mechanism to detect when access occurs on the class itself, descriptors would be forced to return themselves unconditionally, preventing the implementation of class-level properties or schema introspection. Alternatively, the protocol would require separate hook methods for class versus instance access, significantly complicating the object model. The challenge lay in designing a single method signature capable of handling both access patterns elegantly while maintaining backward compatibility and minimal performance overhead.
The __get__(self, instance, owner) method receives None for the instance parameter when the attribute is accessed as Class.attribute, and the instance itself when it is accessed as instance.attribute. The owner parameter receives the class through which the lookup was made (type(instance) for instance access), which may be a subclass of the class where the descriptor was defined. This allows descriptors to branch: returning metadata or the descriptor itself when instance is None, or returning a computed value when an instance exists. The same signature makes it possible to write pure-Python equivalents of classmethod and staticmethod, and supports advanced patterns like class-level validation schemas.
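The branching described above can be sketched with a small probe descriptor. The names AccessProbe, Base, and Child are hypothetical and exist only for illustration; the sketch reports which kind of access occurred and which class was passed as owner.

```python
class AccessProbe:
    """Non-data descriptor that reports how it was accessed (illustrative only)."""

    def __get__(self, instance, owner=None):
        if instance is None:
            # Reached via Class.attr: no instance is available.
            return ("class access", owner)
        # Reached via instance.attr: owner is type(instance).
        return ("instance access", owner)


class Base:
    attr = AccessProbe()


class Child(Base):
    pass


on_base = Base.attr        # ("class access", Base)
on_child = Child.attr      # ("class access", Child), not Base
on_instance = Child().attr # ("instance access", Child)
```

Note that owner is Child for both Child.attr and Child().attr even though the descriptor is defined on Base.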
A data engineering team required a declarative validation framework where field definitions provided metadata when inspected on the class for automatic OpenAPI documentation generation, but performed data validation when accessed on instances. The initial implementation using naive descriptors failed because accessing User.email on the class returned the raw descriptor object, offering no type information or constraints.
One approach considered was implementing separate class methods for metadata retrieval. This involved creating a get_schema() method that manually inspected the class dictionary to extract field information. While explicit and easy to understand for junior developers, this created a dangerous disconnect between field definitions and their introspection capabilities. Pros: Straightforward implementation requiring no advanced Python knowledge. Cons: Violated the DRY principle, demanded maintenance of parallel logic structures, and proved error-prone when field definitions evolved.
The second approach leveraged the descriptor protocol's None convention by checking if instance is None inside __get__. When this condition was true, the descriptor returned a FieldSchema object containing type constraints and validators; otherwise, it performed validation and returned the actual value. Pros: Unified API under a single attribute name, followed Pythonic conventions, and provided automatic inheritance support. Cons: Required deep understanding of the CPython attribute lookup mechanism and proved harder to debug for developers unfamiliar with descriptor internals.
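A minimal sketch of this second approach follows. It is not the team's actual code: FieldSchema, Field, and their parameters are hypothetical names, validators are assumed to be plain callables returning True or False, and validation is performed on assignment rather than on every read, which is a simplification chosen here to keep reads cheap.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class FieldSchema:
    """Hypothetical metadata container returned on class-level access."""
    name: str
    type_: type
    validators: Tuple[Callable[[Any], bool], ...] = ()


class Field:
    """Data descriptor: schema on the class, validated value on instances."""

    def __init__(self, type_, *validators):
        self.type_ = type_
        self.validators = validators
        self.name = None  # filled in by __set_name__

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access: expose metadata for introspection.
            return FieldSchema(self.name, self.type_, self.validators)
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value):
        # Assumption of this sketch: validate when the value is stored.
        if not isinstance(value, self.type_):
            raise TypeError(f"{self.name} must be {self.type_.__name__}")
        for check in self.validators:
            if not check(value):
                raise ValueError(f"{self.name} failed validation: {value!r}")
        instance.__dict__[self.name] = value


class User:
    email = Field(str, lambda v: "@" in v)

    def __init__(self, email):
        self.email = email  # goes through Field.__set__


schema = User.email                 # a FieldSchema describing the field
value = User("ada@example.com").email  # the stored, validated string
```

Because Field defines __set__, it is a data descriptor, so storing the value in instance.__dict__ under the same name does not shadow it.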
A third option involved using a metaclass to intercept class creation and inject synthetic properties for schema access. While this offered complete control over class behavior, it introduced significant complexity into the class hierarchy and complicated debugging efforts. Pros: Total behavioral control. Cons: Over-engineered for the requirements, affected method resolution order calculations, and increased import time overhead substantially.
The team selected the second solution because it relied on an existing language mechanism without introducing additional abstraction layers. The None check provided sufficient context to distinguish between documentation-time and runtime access patterns while reducing the codebase by forty percent compared to the explicit method approach.
The resulting framework allowed User.email to return a comprehensive schema object, while user.email returned the validated string value. This dual behavior enabled automatic OpenAPI specification generation through simple class inspection, reducing documentation maintenance by ninety percent and eliminating an entire category of synchronization bugs between implementation and documentation.
How do data descriptors (those defining __set__ or __delete__ in addition to __get__) differ from non-data descriptors in the attribute lookup precedence, and why does this distinction prevent instance dictionaries from shadowing class attributes in some cases but not others?
Data descriptors define __set__ or __delete__ (usually alongside __get__), while non-data descriptors define only __get__. In Python's attribute resolution mechanism, a data descriptor found on the type takes precedence over the instance's __dict__. This means that assignment to instance.attr always invokes the descriptor's __set__ method, and reads always invoke __get__, even if the instance already has that key in its dictionary. Conversely, non-data descriptors allow the instance dictionary to shadow them; if you assign instance.attr = value, the instance gains a new entry in __dict__, and subsequent accesses retrieve this value instead of invoking the descriptor. This distinction is crucial for implementing cached properties (non-data) versus read-only attributes (data). Candidates frequently overlook that merely defining __set__ changes lookup semantics even if the method simply raises AttributeError, which is how a property without a setter enforces read-only access.
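The precedence rules can be checked with a short sketch. NonData, ReadOnly, and Demo are hypothetical names; the second assignment writes to __dict__ directly only to plant a key that would shadow the attribute if shadowing were allowed.

```python
class NonData:
    """Defines only __get__, so the instance dictionary can shadow it."""

    def __get__(self, instance, owner=None):
        return "from descriptor"


class ReadOnly:
    """Defines __set__, so it is a data descriptor and always wins."""

    def __get__(self, instance, owner=None):
        return "from descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only attribute")


class Demo:
    plain = NonData()
    locked = ReadOnly()


d = Demo()
d.plain = "from instance dict"     # no __set__: stored in d.__dict__
d.__dict__["locked"] = "ignored"   # bypasses __set__ to plant a competing key

plain_value = d.plain    # instance dictionary shadows the non-data descriptor
locked_value = d.locked  # data descriptor takes precedence over d.__dict__
```

Here plain_value comes from the instance dictionary, while locked_value still comes from the descriptor despite the planted key.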
Why should custom descriptors implement __set_name__ rather than take the attribute name in __init__, and what happens when the same descriptor instance is assigned to multiple class attributes or used with inheritance?
A descriptor's __init__ runs while the class body is still executing, before the new object is bound to a name, so it cannot discover its attribute name on its own. The only option is to pass the name explicitly (x = MyDescriptor('x')), which duplicates information and breaks quietly when the attribute is renamed. The __set_name__ method, introduced in Python 3.6, is called during class creation, once for each attribute in the class body whose value defines it, and receives the owner class and the attribute name. It is not called again for subclasses, which simply inherit the already-named descriptor, and it is not called for descriptors attached after the class has been created; those need a manual call. Sharing one instance across several names (e.g., x = y = MyDescriptor()) remains a pitfall: __set_name__ fires once per name, and a descriptor that stores a single name keeps only the last one, so each attribute should get its own instance. Without a reliable name, descriptors cannot produce accurate error messages or key their per-instance storage correctly, and the resulting bugs tend to surface far from their cause.
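The following sketch records every __set_name__ call a descriptor receives, to show when the hook does and does not fire. Named, Model, and SubModel are hypothetical names used only for illustration.

```python
class Named:
    """Descriptor that logs each (owner, name) pair passed to __set_name__."""

    def __init__(self):
        self.names = []

    def __set_name__(self, owner, name):
        self.names.append((owner.__name__, name))

    def __get__(self, instance, owner=None):
        # Return the descriptor itself so the log is easy to inspect.
        return self


class Model:
    x = y = Named()  # one instance bound to two names: the hook fires twice
    z = Named()      # one instance, one name


class SubModel(Model):
    pass             # inherits the descriptors; no new __set_name__ calls


Model.late = Named() # attached after class creation: the hook is not called
```

A descriptor attached late, as with Model.late, would need an explicit call such as Model.late.__set_name__(Model, "late") to learn its name.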
How does the descriptor protocol interact with __slots__, and what specific failure mode occurs when a custom descriptor in a slotted class shares its name with a slot?
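Python's __slots__ mechanism uses data descriptors internally: for __slots__ = ['name'], CPython creates a member descriptor for name in the class dictionary that reads and writes a fixed offset in the instance layout instead of a per-instance dictionary. If the same class body also binds name to something else, such as a property or a custom descriptor, class creation fails with ValueError ('name' in __slots__ conflicts with class variable), because the slot descriptor and the custom object would both need the same key in the class dictionary. Replacing the slot descriptor after class creation (Class.name = MyDescriptor()) does succeed, but the slot's storage then becomes unreachable through normal attribute access, and a descriptor that falls back to instance.__dict__ raises AttributeError because slotted instances have no __dict__ unless a base class provides one. Candidates often miss that slot descriptors are data descriptors with specialized C implementations. The solution is either to use a distinct slot name (for example _name) for storage behind the custom descriptor, or to capture the original slot descriptor and delegate to its __get__ and __set__ methods; calling getattr or setattr under the descriptor's own name from inside the descriptor would recurse infinitely.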
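Both the failure and one possible workaround can be sketched as follows. Upper, Broken, and Point are hypothetical names; the workaround shown combines a distinct slot name with delegation to the slot's own descriptor, which is one reasonable arrangement rather than the only one.

```python
class Upper:
    """Hypothetical descriptor that stores upper-cased text in a slot."""

    def __init__(self, slot):
        self.slot = slot  # the member descriptor CPython created for the slot

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.slot.__get__(instance, owner)

    def __set__(self, instance, value):
        # Delegate storage to the slot descriptor; no getattr/setattr on
        # our own name, so there is no recursion.
        self.slot.__set__(instance, value.upper())


try:
    class Broken:
        __slots__ = ("name",)
        name = property(lambda self: "x")  # same name as a slot
except ValueError as exc:
    conflict_message = str(exc)
else:
    conflict_message = None


class Point:
    __slots__ = ("_name",)  # storage lives under a distinct slot name


# Attach the public descriptor after class creation, wrapping the slot.
Point.name = Upper(Point.__dict__["_name"])

p = Point()
p.name = "origin"
stored = p._name   # value as kept in the slot
public = p.name    # value as seen through the custom descriptor
```

The class statement for Broken never completes, so the conflict is reported at definition time rather than on first attribute access.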