In Python, when working with threads, objects in memory are accessible to all threads without copying. However, when working with processes through multiprocessing, standard Python objects (such as lists and dictionaries) are passed between processes via serialization (usually through pickle), and each process gets its own copy of the object. Changes to the object in one process do not affect the copy in another process.
multiprocessing.Manager().dict() should be used for communication between processes.

```python
from multiprocessing import Process, Manager

def worker(d):
    d['a'] = 42

if __name__ == '__main__':
    # A plain dict would not work between processes; a managed dict does
    managed_dict = Manager().dict()
    p = Process(target=worker, args=(managed_dict,))
    p.start()
    p.join()
    print(managed_dict['a'])  # 42
In what cases will editing a regular list (list) in one process be reflected in another process?
Never, when using multiprocessing. Regular Python objects are not shared between processes; to share state, special primitives such as Manager().list() and Manager().dict() must be used to synchronize it between processes.
Story
The web crawler development team initialized job queues as lists and distributed them among processes via multiprocessing. Jobs were "lost" because processes did not see each other's updates. They fixed it by using Manager().list().
Story
In an analytics service, developers tried to collect logs into nested dicts from multiple threads. Under high load this led to race conditions and data corruption, because no locks guarded the shared structure.
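A hedged sketch of the missing piece: a single lock guarding a nested dict of counters (the service/level names and counter layout are assumptions for illustration):

```python
import threading
from collections import defaultdict

stats = defaultdict(lambda: defaultdict(int))
lock = threading.Lock()

def record(service, level):
    # += on a dict entry is a read-modify-write; without the lock,
    # concurrent threads can interleave and silently lose increments.
    with lock:
        stats[service][level] += 1

def worker():
    for _ in range(1000):
        record('api', 'error')

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats['api']['error'])  # 8000: no increments lost
```

Removing the `with lock:` line makes the final count nondeterministic, which is exactly the corruption the team observed.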
Story
In an ETL service, data objects collected during the processing stage were gathered across multiple processes, and duplicated/lost data was discovered: the programmers had not accounted for each process working with its own copy of the structure rather than a shared one.