The @dataclass decorator, introduced in Python 3.7, reduces boilerplate when creating simple data-holding classes. It reads the class's type annotations and automatically generates __init__, __repr__, __eq__, and, optionally, other methods.
Background:
Before dataclasses, developers wrote such classes by hand, implementing constructors, comparison methods, and __repr__ themselves, and often turned to named tuples or libraries such as attrs. The introduction of @dataclass standardized and simplified this process.
Problem:
Hand-written boilerplate, with the same list of fields duplicated across the constructor and the comparison methods, often led to errors and made large applications harder to maintain.
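To make the duplication concrete, here is a rough sketch of what a two-field class looks like when written by hand. The class name ManualPoint is hypothetical and chosen only for illustration; note how x and y have to be repeated in every method.

```python
class ManualPoint:
    """Hand-written data class (hypothetical name, for illustration only)."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        # The field list is repeated here...
        return f"ManualPoint(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        # ...and here; forgetting a field in one place is an easy bug.
        if not isinstance(other, ManualPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = ManualPoint(10, 20)
p2 = ManualPoint(10, 20)
print(p1 == p2)  # True
print(p1)        # ManualPoint(x=10, y=20)
```

Adding a third field means touching all three methods, which is exactly the maintenance cost described above.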
Solution:
Annotating the fields and applying the @dataclass decorator lets Python generate the constructor, the repr, and the comparison methods automatically.
Code example:
    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

    p1 = Point(10, 20)
    p2 = Point(10, 20)
    print(p1 == p2)  # True, __eq__ is generated automatically
    print(p1)        # Point(x=10, y=20), __repr__ is generated automatically
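Beyond __init__, __repr__, and __eq__, the decorator can generate further methods on request. The sketch below uses two of its standard parameters, order=True and frozen=True; the Version class is a hypothetical example, not part of the original text.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Version:  # hypothetical class used only for illustration
    major: int
    minor: int


# order=True generates __lt__, __le__, __gt__, __ge__;
# instances compare like (major, minor) tuples.
is_older = Version(1, 2) < Version(1, 10)
print(is_older)  # True

ordered = sorted([Version(2, 0), Version(1, 5)])
print(ordered[0].major, ordered[0].minor)  # 1 5

# frozen=True makes instances read-only after construction.
assignment_blocked = False
try:
    ordered[0].major = 3
except FrozenInstanceError:
    assignment_blocked = True
print(assignment_blocked)  # True
```

Because the class is both frozen and comparable for equality, its instances are also hashable and can be used as dictionary keys or set members.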
Key features:
Does @dataclass change how inheritance behaves?
Yes. When inheriting from a dataclass, special attention is needed: the fields of the base class come before the fields of the derived class in the generated __init__. As a result, a derived field without a default that follows a base field with a default raises a TypeError when the class is defined. If the base and derived classes declare a field with the same name, the derived definition overrides the base one, but the field keeps its original position in the order.
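The ordering rules above can be sketched as follows. The classes Base, Child, and Broken are hypothetical names used only to illustrate the behaviour.

```python
from dataclasses import dataclass, fields


@dataclass
class Base:
    x: int
    y: int = 0


@dataclass
class Child(Base):
    y: int = 5   # redefines Base.y but keeps its original position
    z: int = 10  # new field, appended after the inherited ones


field_names = [f.name for f in fields(Child)]
print(field_names)  # ['x', 'y', 'z']

child = Child(1)
print(child.x, child.y, child.z)  # 1 5 10

# A required field placed after an inherited default is rejected
# as soon as the decorator processes the class.
error_message = None
try:
    @dataclass
    class Broken(Base):
        w: int  # no default, but Base.y already has one
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

A common way around the TypeError is to give the derived field a default as well, or to keep defaults out of the base class.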
Can mutable default values be used in dataclass fields?
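No. You cannot use a mutable object such as a list directly as a default: @dataclass rejects list, dict, and set defaults with a ValueError when the class is defined. Use field(default_factory=list) instead, so that each instance gets its own new collection. The check exists because a plain default is a single object that all instances of the class would otherwise share.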
Example:
    from dataclasses import dataclass, field

    @dataclass
    class User:
        values: list = field(default_factory=list)
Is @dataclass efficient in every scenario? Is it suitable for compact storage of large amounts of data?
No. A dataclass is not the most memory-efficient option. By default, each instance keeps its attributes in a per-instance __dict__, exactly like a regular class, so the decorator alone saves no memory. For storing millions of objects, it is better to use __slots__, namedtuple, or specialized structures. You can combine the two approaches by passing slots=True (Python 3.10+), or by declaring __slots__ manually.
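    from dataclasses import dataclass

    @dataclass
    class Cart:
        items: list = []  # error! ValueError: mutable default <class 'list'> is not allowed

    # The class definition above fails, so the lines below never run.
    # In a plain (non-dataclass) class, the same default would be one list
    # shared by every instance.
    c1 = Cart()
    c2 = Cart()
    c1.items.append("a")
    print(c2.items)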
Pros:
Cons:
    from dataclasses import dataclass, field

    @dataclass
    class Cart:
        items: list = field(default_factory=list)

    c1 = Cart()
    c2 = Cart()
    c1.items.append("a")
    print(c2.items)  # [], each instance gets its own list
Pros:
Cons:
Requires knowing about field(default_factory=...), which has to be learned separately.