
Explain how the built-in data serialization mechanisms in Python (the pickle and json modules) work. How do they differ, what are they used for, and what dangers does careless serialization pose?


Answer

In Python, the pickle and json modules handle serialization (converting objects into a byte sequence or a string for storage or transmission) and deserialization (rebuilding objects from that representation):

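  • pickle serializes almost any Python object (built-in types, sets, tuples, instances of your own classes) into a binary byte sequence. Functions and classes are stored only as a reference to their importable name, not as code, and some objects (lambdas, open files, sockets) cannot be pickled at all. The format is specific to Python.
  • json serializes only basic data structures (dict, list, tuple, str, int, float, bool, None) into a text string in a language-independent format that other languages can read. Tuples come back as lists, dict keys are converted to strings, and other types (sets, datetimes, custom objects) raise TypeError unless you convert them yourself.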
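The type differences above become visible in a short round trip. The sketch below is illustrative only. The `record` dictionary is an arbitrary example value, not taken from the original text.

```python
import json
import pickle
from datetime import date

# Example value with a tuple and a non-string key (arbitrary, for illustration).
record = {"id": 7, "tags": ("a", "b"), 1: "int key"}

# json produces text; the tuple turns into a list and the int key into a string.
text = json.dumps(record)
from_json = json.loads(text)
print(type(text).__name__, from_json)   # str {'id': 7, 'tags': ['a', 'b'], '1': 'int key'}

# pickle produces bytes and restores the exact Python types.
blob = pickle.dumps(record)
from_pickle = pickle.loads(blob)
print(type(blob).__name__, from_pickle == record)   # bytes True

# Types outside JSON's data model are rejected rather than silently converted.
try:
    json.dumps({"when": date(2024, 1, 1), "ids": {1, 2}})
    json_rejected = False
except TypeError:
    json_rejected = True
print(json_rejected)   # True
```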

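Using pickle to store or exchange data with untrusted parties is dangerous. Unpickling can execute arbitrary code embedded in the data, so never call pickle.load()/pickle.loads() on input you do not trust. json.loads() only builds plain data structures and never executes code, although the decoded data still needs validation.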
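The risk comes from `__reduce__`. A pickle can instruct the unpickler to call any importable callable with arguments chosen by whoever created the data. The sketch below uses `eval` on a harmless expression as a stand-in for a real payload, and the class name `Exploit` is hypothetical. The second half follows the "restricting globals" approach from the `pickle` documentation, which overrides `Unpickler.find_class` with an allowlist. This narrows the attack surface but does not make it safe to unpickle untrusted data.

```python
import builtins
import io
import pickle

class Exploit:
    # __reduce__ tells pickle how to rebuild the object: "call this callable with these args".
    def __reduce__(self):
        # Harmless stand-in; a real attacker could return something like (os.system, ("...",)).
        return (eval, ("6 * 7",))

blob = pickle.dumps(Exploit())
result = pickle.loads(blob)   # eval("6 * 7") runs inside loads()
print(result)                 # 42

# Mitigation sketch: only a small allowlist of harmless builtins may be looked up.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print(exc)   # global 'builtins.eval' is forbidden

# Plain data needs no global lookups, so it still loads.
print(restricted_loads(pickle.dumps({"x": [1, 2]})))   # {'x': [1, 2]}
```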

Example:

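```python
import datetime
import json
import pickle

# Pickle (binary serialization): handles types JSON cannot, such as sets and datetimes.
# A lambda here would fail: pickle stores functions only by importable name.
data = {'x': 10, 'tags': {'a', 'b'}, 'created': datetime.datetime.now()}
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# JSON (only simple objects)
data = {'x': 10, 'y': [1, 2, 3]}
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f)
```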
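When a JSON file must hold values such as datetimes or sets, the conversion has to be written explicitly. The minimal sketch below uses the real `default=` parameter of `json.dumps` and `object_hook=` parameter of `json.loads`. The `"__datetime__"` and `"__set__"` marker keys are a hypothetical convention for this example, not a standard.

```python
import json
from datetime import datetime

def encode_extra(obj):
    # json.dumps calls this only for objects it cannot serialize natively.
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    if isinstance(obj, set):
        return {"__set__": sorted(obj)}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

def decode_extra(d):
    # json.loads calls this for every decoded JSON object (dict), innermost first.
    if "__datetime__" in d:
        return datetime.fromisoformat(d["__datetime__"])
    if "__set__" in d:
        return set(d["__set__"])
    return d

data = {"x": 10, "created": datetime(2024, 5, 1, 12, 30), "tags": {"b", "a"}}
text = json.dumps(data, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
print(restored == data)   # True
```

Unlike pickle, only types that are explicitly listed in the hooks can be reconstructed, so a malicious JSON document can still only produce dicts, lists, strings, numbers, datetimes, and sets.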

Trick Question

Question: Can pickle be used to serialize any Python object and save it between sessions? Why is this mechanism not recommended for saving user data?

Answer:

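No. Not every object can be pickled: lambdas, open files, and sockets fail, and functions and classes are saved only by name. Even where it works, pickle is a poor choice for user data. Besides the security risk (loading a pickle from an untrusted source can execute arbitrary code), it is fragile across versions. If a class is renamed, moved, or changes its attributes, old pickles may fail to load or produce objects in an inconsistent state, and pickles written with a newer protocol cannot be read by older Python versions.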

Example:

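```python
# Loading a pickle file written before the class structure changed
import pickle

with open('old_version.pkl', 'rb') as f:
    # May raise AttributeError/ModuleNotFoundError (class renamed or moved)
    # or return an object that lacks attributes the new code expects.
    obj = pickle.load(f)
```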
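The version problem can be reproduced without any files. This sketch assumes a hypothetical in-memory module `app_models` holding a hypothetical `Profile` class, which is redefined to simulate later releases. It shows that pickle stores only the class's import path plus the instance's attribute dict, and that `__init__` is not called on load. `__setstate__`, a documented pickle hook, is one way to default missing fields.

```python
import pickle
import sys
import types

# Hypothetical module "app_models", created in memory so the sketch is self-contained.
models = types.ModuleType("app_models")
sys.modules["app_models"] = models

def register(cls):
    # Make cls importable as app_models.<name>; pickle records only this path.
    cls.__module__ = "app_models"
    setattr(models, cls.__name__, cls)
    return cls

@register
class Profile:                       # version 1, used when the data was saved
    def __init__(self, name):
        self.name = name

saved = pickle.dumps(Profile("alice"))

@register
class Profile:                       # version 2 adds a field
    def __init__(self, name, email=""):
        self.name = name
        self.email = email

old = pickle.loads(saved)            # __init__ is not called, so no email attribute
print(hasattr(old, "email"))         # False

@register
class Profile:                       # version 3 fills in missing fields on load
    def __init__(self, name, email=""):
        self.name = name
        self.email = email

    def __setstate__(self, state):
        state.setdefault("email", "")
        self.__dict__.update(state)

migrated = pickle.loads(saved)
print(migrated.name, repr(migrated.email))   # alice ''

del models.Profile                   # class renamed or removed
try:
    pickle.loads(saved)
    load_failed = False
except (AttributeError, pickle.UnpicklingError):
    load_failed = True               # the stored "app_models.Profile" no longer resolves
print(load_failed)                   # True
```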

History

Example 1

In a large project, pickle was used to store user profiles. After a Python upgrade and changes to the profile classes, the stored objects were no longer compatible with the code loading them. This caused a system failure and data loss for most users.


Example 2

In a web service, pickle was used to store user sessions. An attacker submitted a crafted pickled object as session data, and unpickling it executed the attacker's code on the server.


Example 3

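An attempt to send functions over the network with pickle failed in several environments. The standard pickle module cannot serialize lambdas at all, and regular functions are pickled only as a reference to their module and name. The receiving machine therefore needs the same code, importable under the same path, on a compatible Python version.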
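The "by reference" behavior can be checked directly. This sketch uses `math.sqrt` as a stand-in for any module-level function. The payload contains only the module and function names, so unpickling simply imports the same object again. A lambda has no importable name and is refused. Depending on the Python version, the error type is `PicklingError` or `AttributeError`, so the sketch catches both.

```python
import math
import pickle

# A module-level function is pickled as "module + name", with no bytecode.
blob = pickle.dumps(math.sqrt)
refers_by_name = b"math" in blob and b"sqrt" in blob
print(refers_by_name)            # True

restored = pickle.loads(blob)    # re-imports math and looks up sqrt
print(restored is math.sqrt)     # True

# A lambda cannot be looked up by name, so the standard pickle module refuses it.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_refused = False
except (pickle.PicklingError, AttributeError):
    lambda_refused = True
print(lambda_refused)            # True
```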