
Through which reconstruction mechanism does **Python**'s `pickle` module allow classes to bypass `__init__` by supplying arguments directly to `__new__`?


Answer to the question

The pickle protocols evolved to reconstruct objects without re-running their constructors, which matters when `__init__` has side effects or performs expensive computations. For ordinary classes, unpickling creates the instance through `__new__` and then restores its state; `__init__` is not called, which avoids reopening resources such as file handles or database connections. Protocol 2 introduced `__getnewargs__` so that a class could supply positional arguments to `__new__`, and protocol 4 extended this with `__getnewargs_ex__` to support keyword arguments, providing finer control over object reconstruction.

When unpickling an object, Python has to recreate both the instance and its state. If `__init__` performs validation, opens network sockets, or modifies global state, re-executing it during unpickling would be incorrect or inefficient, so the default path creates the instance with `cls.__new__(cls)` and no further arguments. That default breaks down when `__new__` itself requires arguments or fixes part of the state. The challenge is to restore the object from the stored data alone, through the lower-level `__new__` constructor, without triggering initialization side effects.
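The failure mode can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical `Endpoint` class whose `__new__` requires an argument and which defines no pickle hooks: pickling succeeds, but loading fails because the unpickler calls `__new__` with the class alone.

```python
import pickle


class Endpoint:
    """Hypothetical class whose __new__ requires an argument."""

    def __new__(cls, dsn):
        instance = super().__new__(cls)
        instance.dsn = dsn
        return instance


def try_roundtrip(obj):
    """Return the restored object, or the TypeError raised along the way."""
    try:
        return pickle.loads(pickle.dumps(obj, protocol=4))
    except TypeError as exc:
        return exc


# The unpickler calls Endpoint.__new__(Endpoint), so 'dsn' is missing.
result = try_roundtrip(Endpoint("postgresql://localhost"))
print(type(result).__name__, result)
```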

The `__getnewargs_ex__` dunder method (or `__getnewargs__`, which returns positional arguments only) lets a class return an `(args, kwargs)` pair that pickle passes to `__new__`; `__init__` is not called. The method runs at pickling time and its return value is stored in the serialized bytes; at load time the unpickler calls `cls.__new__(cls, *args, **kwargs)` with those values. The instance is therefore created with the correct initial state without invoking initialization logic that would be inappropriate for a restored object.

```python
import pickle


def create_socket(dsn, timeout):
    # Placeholder for the real connection logic (hypothetical helper)
    return object()


class DatabaseConnection:
    def __new__(cls, dsn, timeout=30):
        instance = super().__new__(cls)
        instance.dsn = dsn
        instance.timeout = timeout
        return instance

    def __init__(self, dsn, timeout=30):
        # Expensive operation we want to skip during unpickling
        self.socket = create_socket(dsn, timeout)

    def __getnewargs_ex__(self):
        # Return args and kwargs for __new__
        return ((self.dsn,), {'timeout': self.timeout})

    def __getstate__(self):
        # Don't pickle the socket
        return {'dsn': self.dsn, 'timeout': self.timeout}

    def __setstate__(self, state):
        self.dsn = state['dsn']
        self.timeout = state['timeout']
        # Re-establish the socket if needed, or leave it for lazy init
        self.socket = None


# Usage
conn = DatabaseConnection('postgresql://localhost', timeout=60)
serialized = pickle.dumps(conn, protocol=4)
restored = pickle.loads(serialized)  # __init__ not called
```

Situation from life

A data processing pipeline caches Redis connection objects that hold open TCP sockets and authentication tokens. These cache entries are serialized to disk for persistence between application restarts. The live sockets cannot be persisted, and any reconstruction path that re-runs the constructor on load would attempt to create new socket connections immediately, which fails in offline environments or leaks resources. The scenario requires a serialization strategy that preserves connection parameters while deferring network establishment until the application explicitly requests it.

Implement `__getstate__` to return only the connection parameters (host, port, auth), and `__setstate__` to set the attributes manually and optionally reopen the connection. This approach is explicit and compatible with older pickle protocols. However, the default reconstruction calls `__new__` without arguments, so it fails if `__new__` requires the connection parameters.
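A minimal sketch of this first option, assuming a hypothetical `StatefulConnection` class in which `object()` stands in for a live socket:

```python
import pickle


class StatefulConnection:
    """Hypothetical stand-in for a Redis connection wrapper."""

    def __init__(self, host, port, auth=None):
        self.host = host
        self.port = port
        self.auth = auth
        self.socket = object()  # placeholder for a live socket

    def __getstate__(self):
        # Persist only the connection parameters, never the socket.
        return {"host": self.host, "port": self.port, "auth": self.auth}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.socket = None  # reconnect lazily, on first use


conn = StatefulConnection("localhost", 6379, auth="secret")
restored = pickle.loads(pickle.dumps(conn))
print(restored.host, restored.port, restored.socket)
```

This works here only because `StatefulConnection` inherits a `__new__` that needs no arguments.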

Implement `__reduce__` to return a tuple of `(callable, args, state)`, where the callable is a class method or `__new__` itself. This provides complete control over reconstruction, but it is verbose and requires managing the state dictionary by hand, which increases code complexity and the risk of version mismatches between the class structure and the pickled data.
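A sketch of the `__reduce__` variant under the same assumptions. `ReducedConnection` and the module-level factory `_blank` are hypothetical names; the factory is used instead of a class method only to keep the example short.

```python
import pickle


def _blank(cls):
    """Module-level factory so pickle can locate it by name."""
    return cls.__new__(cls)


class ReducedConnection:
    init_calls = 0  # counts constructor runs, for demonstration only

    def __init__(self, host, port, auth=None):
        type(self).init_calls += 1
        self.host = host
        self.port = port
        self.auth = auth
        self.socket = object()  # placeholder for a live socket

    def __reduce__(self):
        # The state dictionary is assembled by hand and must be kept
        # in sync with the attributes the class actually uses.
        state = {"host": self.host, "port": self.port,
                 "auth": self.auth, "socket": None}
        return (_blank, (type(self),), state)


conn = ReducedConnection("localhost", 6379, auth="secret")
restored = pickle.loads(pickle.dumps(conn))
print(restored.host, restored.socket, ReducedConnection.init_calls)
```

Because no `__setstate__` is defined, pickle applies the third element of the tuple by updating the new instance's `__dict__`.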

Implement `__getnewargs_ex__` to return `((host, port), {'auth': token})`, allowing pickle to call `cls.__new__(cls, host, port, auth=token)` directly, with no call to `__init__`. This solution was chosen because it uses a feature introduced with protocol 4, cleanly separates the 'create blank instance' phase from the 'initialize resources' phase, and avoids the boilerplate of `__reduce__`. The result is a robust caching system where connection objects are restored with their configuration intact but sockets remain closed until explicitly needed, preventing resource exhaustion during batch unpickling operations.
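The chosen option might look like the sketch below. `LazyConnection` is a hypothetical class, `object()` stands in for opening a real socket, and returning `None` from `__getstate__` is one possible way to say that no state beyond the `__new__` arguments needs to be stored.

```python
import pickle


class LazyConnection:
    """Hypothetical wrapper: parameters are set in __new__, I/O in __init__."""

    def __new__(cls, host, port, auth=None):
        instance = super().__new__(cls)
        instance.host = host
        instance.port = port
        instance.auth = auth
        instance.socket = None  # nothing is opened here
        return instance

    def __init__(self, host, port, auth=None):
        self.socket = object()  # placeholder for opening a real socket

    def __getnewargs_ex__(self):
        # (args, kwargs) passed to __new__ when the object is loaded
        return ((self.host, self.port), {"auth": self.auth})

    def __getstate__(self):
        # No extra state: __new__ receives everything it needs.
        return None


entries = [LazyConnection("cache-1", 6379, auth="t1"),
           LazyConnection("cache-2", 6380, auth="t2")]
restored = pickle.loads(pickle.dumps(entries, protocol=4))
print([(c.host, c.port, c.socket) for c in restored])
```

After the round trip every restored entry keeps its parameters while `socket` stays `None`, so a batch load opens no connections.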

What candidates often miss

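Why is `__init__` not called when `__getnewargs_ex__` is used, and why is `__setstate__` alone not always enough?

When pickle serializes an object through the default `object.__reduce_ex__`, it looks for `__getnewargs_ex__` (or `__getnewargs__`) and stores the returned arguments. On load, the unpickler calls `cls.__new__(cls, *args, **kwargs)` with those values and then applies the state via `__setstate__` if it is defined (otherwise by updating `__dict__`), skipping `__init__` entirely. Without these methods the same path runs with no arguments, `cls.__new__(cls)`, and `__init__` is still not called. Candidates often assume that `__init__` runs during unpickling and that `__setstate__` patches the instance afterwards; in fact `__init__` only runs if a custom `__reduce__` names the class itself as the callable. `__setstate__` alone falls short in a different case: when `__new__` requires arguments or fixes part of the object's value, the instance cannot even be created before `__setstate__` gets a chance to run.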
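Counting calls is a quick way to check which hooks actually run on load. In the sketch below, `Plain` and `WithNewArgs` are hypothetical classes; under CPython 3 the counters show `__init__` running only for the original construction in both cases, while `__new__` receives the stored arguments a second time.

```python
import pickle


class Plain:
    init_calls = 0

    def __init__(self, value):
        type(self).init_calls += 1
        self.value = value


class WithNewArgs:
    init_calls = 0
    new_calls = []  # records the arguments each __new__ call received

    def __new__(cls, value, *, scale=1):
        cls.new_calls.append((value, scale))
        return super().__new__(cls)

    def __init__(self, value, *, scale=1):
        type(self).init_calls += 1
        self.value = value
        self.scale = scale

    def __getnewargs_ex__(self):
        return ((self.value,), {"scale": self.scale})


plain_copy = pickle.loads(pickle.dumps(Plain(3), protocol=4))
args_copy = pickle.loads(pickle.dumps(WithNewArgs(3, scale=2), protocol=4))

print("Plain.__init__ calls:", Plain.init_calls)
print("WithNewArgs.__init__ calls:", WithNewArgs.init_calls)
print("WithNewArgs.__new__ calls:", WithNewArgs.new_calls)
```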

What occurs if `__getnewargs_ex__` returns a value that is not a tuple of two elements?

`__getnewargs_ex__` must return a tuple of length 2: `(args_tuple, kwargs_dict)`. The check happens at pickling time, inside the default `object.__reduce_ex__`, not during unpickling. In current CPython, a non-tuple return value such as `None` raises `TypeError`, a tuple whose length is not 2 raises `ValueError`, and a pair whose first element is not a tuple or whose second element is not a dict raises `TypeError`. Returning a flat tuple of arguments, as `__getnewargs__` expects, therefore fails in `pickle.dumps` rather than at load time.
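The sketch below probes two malformed return values. `FlatArgs` and `NotATuple` are hypothetical classes, and the exact exception types and messages are CPython implementation details, so the helper accepts either `TypeError` or `ValueError`.

```python
import pickle


class FlatArgs:
    """Returns a bare args tuple, in the style of __getnewargs__."""

    def __new__(cls, host):
        instance = super().__new__(cls)
        instance.host = host
        return instance

    def __getnewargs_ex__(self):
        return (self.host,)  # wrong shape: should be ((self.host,), {})


class NotATuple(FlatArgs):
    def __getnewargs_ex__(self):
        return None  # wrong type


def dump_error(obj):
    """Return the exception raised by pickle.dumps, or None on success."""
    try:
        pickle.dumps(obj, protocol=4)
    except (TypeError, ValueError) as exc:
        return exc
    return None


for bad in (FlatArgs("localhost"), NotATuple("localhost")):
    print(type(bad).__name__, "->", repr(dump_error(bad)))
```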

How does `__getnewargs_ex__` interact with `__reduce_ex__` when both are defined?

`__reduce_ex__` is the higher-level method that orchestrates serialization. The default implementation, `object.__reduce_ex__`, calls `__getnewargs_ex__` when a class defines it and builds the reduction tuple from the returned arguments; under protocol 4 and later, the pickler encodes the result with the `NEWOBJ_EX` opcode. If the class overrides `__reduce_ex__` (or `__reduce__`) and returns a custom callable that does not use the standard reconstruction path, the override takes precedence and `__getnewargs_ex__` is ignored unless the override calls it explicitly.
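Both behaviors can be observed by inspecting the opcode stream with `pickletools.genops`. In this sketch, `Standard`, `Overridden`, and `_rebuild_overridden` are hypothetical names; the override deliberately rebuilds with `port=1` so that a round trip shows whether `__getnewargs_ex__` was consulted.

```python
import pickle
import pickletools


class Standard:
    def __new__(cls, host, *, port=6379):
        instance = super().__new__(cls)
        instance.host, instance.port = host, port
        return instance

    def __getnewargs_ex__(self):
        return ((self.host,), {"port": self.port})


def _rebuild_overridden(host):
    # Module-level factory used by the custom reduction below.
    return Overridden(host, port=1)


class Overridden(Standard):
    def __reduce_ex__(self, protocol):
        # Custom reduction: the inherited __getnewargs_ex__ is bypassed.
        return (_rebuild_overridden, (self.host,))


def opcodes(obj, protocol=4):
    """Return the set of opcode names used to pickle obj."""
    data = pickle.dumps(obj, protocol)
    return {op.name for op, _arg, _pos in pickletools.genops(data)}


standard = Standard("h", port=7000)
overridden = Overridden("h", port=7000)
print("NEWOBJ_EX" in opcodes(standard), "NEWOBJ_EX" in opcodes(overridden))
print(pickle.loads(pickle.dumps(standard, 4)).port,
      pickle.loads(pickle.dumps(overridden, 4)).port)
```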