Python Developer

What happens when passing composite objects (such as a list of dictionaries) between threads or processes in Python? What is important to remember when programming multithreaded and multiprocess applications related to mutable collections?


Answer:

In Python, when working with threads, objects in memory are accessible to all threads without copying. However, when working with processes through multiprocessing, standard Python objects (such as lists and dictionaries) are passed between processes via serialization (usually through pickle), and each process gets its own copy of the object. Changes to the object in one process do not affect the copy in another process.

Key differences:

  • Multithreading (threading): All threads operate in a shared address space. Any objects (including nested collections) are shared.
  • Multiprocessing (multiprocessing): Each process has its own memory; objects are passed via serialization. Special structures, such as multiprocessing.Manager().dict(), should be used for communication between processes.
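The contrast above can be demonstrated with a minimal sketch: the same worker function mutates a list, first from a thread (shared memory) and then from a child process (its own copy).

```python
import threading
from multiprocessing import Process

def append_item(items):
    items.append("from worker")

if __name__ == "__main__":
    # Threads share the address space: the main thread sees the append.
    shared = []
    t = threading.Thread(target=append_item, args=(shared,))
    t.start(); t.join()
    print(len(shared))  # 1

    # A child process mutates its own copy: the parent's list is unchanged.
    copied = []
    p = Process(target=append_item, args=(copied,))
    p.start(); p.join()
    print(len(copied))  # 0
```

The thread's mutation is visible immediately, while the process's mutation silently disappears; that asymmetry is the source of most of the bugs described below.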

Example:

```python
# Multiprocessing: use a Manager proxy so changes cross the process boundary
from multiprocessing import Process, Manager

def worker(d):
    d['a'] = 42

if __name__ == '__main__':
    # A plain dict would not work between processes
    managed_dict = Manager().dict()
    p = Process(target=worker, args=(managed_dict,))
    p.start()
    p.join()
    print(managed_dict['a'])  # 42
```
  • If a regular dict were used, changes in the process would be invisible in the main process.

Trick question.

In what cases will editing a regular list (list) in one process be reflected in another process?

Answer:

Never, when using multiprocessing: regular Python objects are not shared between processes. To share state, special primitives must be used, such as Manager objects (for example, Manager().list(), Manager().dict()) or lower-level shared-memory tools (multiprocessing.Value, Array, shared_memory), which synchronize state between processes.
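A minimal sketch of the contrast: with a Manager().list() proxy, an edit made in a child process really is visible in the parent, unlike with a regular list.

```python
from multiprocessing import Process, Manager

def edit_first(shared_list):
    # The proxy forwards this mutation to the manager process
    shared_list[0] = "edited"

if __name__ == '__main__':
    with Manager() as manager:
        items = manager.list(["original"])
        p = Process(target=edit_first, args=(items,))
        p.start(); p.join()
        print(items[0])  # edited
```

With a plain `["original"]` list instead of the proxy, the parent would still see `"original"` after the child exits.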

Examples of real-world bugs caused by not knowing these nuances.


Story

A web-crawler team initialized job queues as plain lists and distributed them among processes via multiprocessing. Jobs were "lost" because the processes could not see each other's updates. The fix was to switch to Manager().list().
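A sketch of the fix described above, with hypothetical URLs: the job queue lives in a Manager().list(), so every worker process pops from (and appends to) the same shared state.

```python
from multiprocessing import Process, Manager

def crawl(jobs, results):
    # Pop jobs until the shared list is empty; all workers see the same list
    while jobs:
        try:
            url = jobs.pop()
        except IndexError:
            break  # another worker took the last job first
        results.append(f"fetched {url}")

if __name__ == '__main__':
    with Manager() as manager:
        jobs = manager.list(["https://a.example", "https://b.example"])
        results = manager.list()
        workers = [Process(target=crawl, args=(jobs, results)) for _ in range(2)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(len(results))  # 2
```

In practice a multiprocessing.Queue is often the more idiomatic choice for a job queue, but the Manager proxy is the smallest change from the original list-based design.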


Story

In an analytics service, logs were aggregated into nested dicts from multiple threads. Under high load this led to race conditions and data corruption, because the concurrent updates were not protected by locks.
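The fix for this class of bug is a threading.Lock around the read-modify-write. A minimal sketch with a hypothetical counter dict:

```python
import threading
from collections import defaultdict

log_counts = defaultdict(int)
lock = threading.Lock()

def record(service, n):
    for _ in range(n):
        # d[k] += 1 is a read-modify-write, not atomic even under the GIL
        with lock:
            log_counts[service] += 1

threads = [threading.Thread(target=record, args=("api", 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log_counts["api"])  # 40000
```

Without the lock, interleaved updates can silently drop increments, which is exactly the kind of corruption described in the story.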


Story

In an ETL service, data objects produced during the processing stage were gathered across multiple processes, and duplicated/lost records were discovered: the developers had not taken into account that each process was mutating its own copy of the structure rather than a shared one.
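One robust pattern for this case is to avoid shared mutable state entirely: have each worker return its result, and let Pool.map collect them in the parent. A minimal sketch with a hypothetical transform step:

```python
from multiprocessing import Pool

def transform(record):
    # Each worker returns its result instead of mutating a shared structure
    return {"id": record["id"], "value": record["value"] * 2}

if __name__ == '__main__':
    records = [{"id": i, "value": i} for i in range(5)]
    with Pool(2) as pool:
        results = pool.map(transform, records)  # order matches the input
    print(len(results))  # 5
```

Because the results travel back to the parent via pickle, there is no shared copy to diverge, which eliminates the duplication/loss the team observed.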