When a Python class defines __eq__ without also defining __hash__, Python implicitly sets the class's __hash__ to None. Instances are then unhashable and cannot be used as dict keys or set members. The underlying invariant is that objects comparing equal via __eq__ must have the same hash value. Violating it does not raise an error by itself, but it makes hash-based collections behave incorrectly: lookups miss existing keys and equal objects are stored as duplicates. Consequently, attempting to use an object of such a class as a mapping key raises TypeError: unhashable type.
A development team was building a session management service where User objects served as keys in an in-memory cache dict to store active sessions. The User class implemented __eq__ to compare instances based on user_id, ensuring that two different objects representing the same database user were treated as equal. The initial implementation looked like this:
    class User:
        def __init__(self, user_id, name):
            self.user_id = user_id
            self.name = name

        def __eq__(self, other):
            # Let Python try the reflected comparison for unrelated types.
            if not isinstance(other, User):
                return NotImplemented
            return self.user_id == other.user_id
Initially, the team did not implement __hash__, assuming default behavior would suffice. However, when the service attempted to cache a session using cache[user] = session_data, Python raised TypeError: unhashable type: 'User', crashing the service.
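The failure can be reproduced in a few lines. The following is a minimal sketch rather than the team's actual service code: the try_cache helper and the session payload are hypothetical, introduced only to capture the error message instead of letting it crash the script.

```python
class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return self.user_id == other.user_id


def try_cache(user):
    """Hypothetical helper: try to use `user` as a dict key.

    Returns the TypeError message on failure, or None on success.
    """
    cache = {}
    try:
        cache[user] = {"token": "abc"}  # illustrative session payload
    except TypeError as exc:
        return str(exc)
    return None


# Defining __eq__ without __hash__ leaves the class attribute set to None.
hash_attr = User.__hash__
error_message = try_cache(User(1, "Ada"))
print(hash_attr, error_message)
```

Checking User.__hash__ is None is a quick way to confirm that a class has been marked unhashable before it reaches a cache or a set.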
The team considered three solutions. The first approach used id(self) as the hash value. This was rejected because it violates the critical invariant: two distinct User instances with the same user_id would have different hashes despite being equal via __eq__. The dict would treat them as different keys, so a lookup with a freshly loaded User object would miss the cached session, and the cache could accumulate duplicate entries for the same logical user.
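To see why the id-based hash was rejected, consider the sketch below. IdHashedUser is a hypothetical stand-in for the User class with the rejected __hash__ added; the session strings are illustrative.

```python
class IdHashedUser:
    def __init__(self, user_id):
        self.user_id = user_id

    def __eq__(self, other):
        if not isinstance(other, IdHashedUser):
            return NotImplemented
        return self.user_id == other.user_id

    def __hash__(self):
        # Rejected approach: the hash ignores user_id entirely.
        return id(self)


first = IdHashedUser(42)
second = IdHashedUser(42)   # same logical user, different object

cache = {first: "session-a"}
found = second in cache     # equal objects, but the hashes differ
cache[second] = "session-b" # stored as a second, duplicate entry
print(first == second, found, len(cache))
```

The two objects compare equal, yet the dict never calls __eq__ on them because it only compares keys whose hashes already match.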
The second approach used hash(self.user_id) as the hash value. This satisfies the invariant, since equal users share the same user_id and therefore the same hash. However, it requires that user_id never change while the object is in use as a key: if the ID changed after insertion, the hash would change with it and the object would become "lost" in the dictionary.
The third option abandoned using User objects as keys, instead using the string user_id directly. While safe and simple, this sacrificed type safety and required maintaining a separate mapping from IDs to User objects, complicating the codebase with additional lookup logic.
The team chose the second solution, adding the following implementation to the class:
    def __hash__(self):
        return hash(self.user_id)
They also made user_id a read-only property so that it cannot be reassigned through the public attribute. This preserved the ability to use User instances as keys while maintaining correct equality semantics. The result was a robust cache that correctly identified users regardless of object identity.
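Put together, the chosen design might look like the sketch below. The private attribute name _user_id and the sample session data are assumptions made for illustration; the source does not show how the property was implemented. Note that a property only guards against reassignment through the public name; Python does not prevent code from writing to _user_id directly.

```python
class User:
    def __init__(self, user_id, name):
        self._user_id = user_id  # assumed private backing attribute
        self.name = name

    @property
    def user_id(self):
        # No setter is defined, so `user.user_id = ...` raises AttributeError.
        return self._user_id

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return self._user_id == other._user_id

    def __hash__(self):
        # Equal users share a user_id, so they share a hash.
        return hash(self._user_id)


cache = {User(7, "Ada"): {"token": "abc"}}

# A different object for the same database user finds the same entry.
session = cache[User(7, "Ada (reloaded)")]
print(session)
```

The name field is deliberately left out of both __eq__ and __hash__, so it can change without affecting the object's position in the cache.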
Why does Python automatically set __hash__ to None when __eq__ is defined but __hash__ is not?
When a class defines __eq__, the default identity-based hash inherited from object becomes logically invalid. The default __hash__ is derived from the object's identity, so two distinct objects generally have different hashes. If __eq__ is overridden to compare values, two different instances could be equal but would have different hashes, violating the fundamental rule that a == b implies hash(a) == hash(b). Python prevents this inconsistency by setting __hash__ to None, explicitly marking the class as unhashable rather than allowing a dangerous default that would cause failed lookups, duplicate keys, or unreachable entries.
What happens if a mutable object is used as a dictionary key after implementing __hash__ based on mutable fields?
If __hash__ depends on mutable state, the hash value can change after the object is inserted into a dict. A dictionary places each entry according to the key's hash at insertion time and keeps that stored hash. If the key's hash later changes due to mutation, subsequent lookups compute a different hash and probe a different location, making the original entry unreachable. The entry still occupies the dict and still appears when iterating over it, but it cannot be found or deleted via normal key access. It stays alive for as long as the dict does, which is effectively a memory leak as well as a logical inconsistency. Python does not enforce immutability here; it is a contract the class author must uphold, which is why __hash__ should be based only on values that never change during the object's lifetime.
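The "lost key" effect is easy to demonstrate. In this sketch, MutableKey is a hypothetical class whose hash depends on a field that is then mutated after insertion.

```python
class MutableKey:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, MutableKey):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Hash depends on mutable state: this is the hazard.
        return hash(self.value)


key = MutableKey(1)
table = {key: "payload"}

key.value = 2  # mutate after insertion; hash(key) changes

# Lookup with the very same object now uses the new hash and misses.
found_by_same_object = key in table

# Lookup by the old value matches the stored hash, but __eq__ compares
# against the mutated key and fails.
found_by_old_value = MutableKey(1) in table

# The entry is still there; it is only unreachable by key.
entries_remaining = len(table)
keys_seen_by_iteration = list(table)
print(found_by_same_object, found_by_old_value, entries_remaining)
```

The entry can still be recovered by iterating over the dict, but not through key access such as table[key] or del table[key].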
How does the @dataclass decorator handle __eq__ and __hash__ generation, and what is the risk of using unsafe_hash=True?
By default, @dataclass generates __eq__ based on field values and, because the class is not frozen, sets __hash__ to None, making instances unhashable. This conservative default prevents bugs with mutable dataclasses. To enable hashing, you can set frozen=True (which blocks field assignment and, together with the default eq=True, generates a __hash__ from the fields), define __hash__ explicitly in the class body, or set unsafe_hash=True. The unsafe_hash=True parameter forces Python to generate __hash__ based on field values even though the fields remain assignable. This is dangerous because if any field changes after the object is used as a dictionary key, the hash changes and the key becomes unreachable, leading to the "lost key" problem described previously. Candidates often miss that unsafe_hash is not merely a warning but a functional risk that breaks dictionary invariants.
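The three dataclass configurations can be compared side by side. The class names PlainPoint, FrozenPoint, and UnsafePoint are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class PlainPoint:            # eq=True, frozen=False: __hash__ is None
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:           # assignment blocked, __hash__ generated
    x: int
    y: int


@dataclass(unsafe_hash=True)
class UnsafePoint:           # __hash__ generated, fields still assignable
    x: int
    y: int


plain_is_unhashable = PlainPoint.__hash__ is None

# Frozen instances work as keys, and equal instances find the same entry.
frozen_lookup = {FrozenPoint(1, 2): "ok"}[FrozenPoint(1, 2)]

try:
    FrozenPoint(1, 2).x = 5
    frozen_blocked = False
except FrozenInstanceError:
    frozen_blocked = True

# unsafe_hash: mutating a field after insertion strands the entry.
p = UnsafePoint(1, 2)
table = {p: "payload"}
p.x = 99
lost = p not in table
print(plain_is_unhashable, frozen_lookup, frozen_blocked, lost)
```

frozen=True is the safer route when instances are meant to be keys, with the caveat that the generated hash still relies on every field value being hashable itself.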