LEARN · DEBUGGING GUIDE

Debugging Mutable Default Arguments in Python Dataclasses

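Mutable default values are created once and then shared across instances, which silently corrupts data. Dataclasses reject the most obvious cases (`[]`, `{}`, `set()`) with a `ValueError`, but other mutable defaults still slip through. This guide shows how to detect, fix, and prevent this classic Python footgun.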

Intermediate · Python · 7 min read

What this usually means

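The root cause is that Python evaluates a default value once, when the function or class body is defined, not each time an instance is created. A mutable default is therefore a single object shared by every instance that does not receive an explicit value. This is not specific to dataclasses: a regular class whose `__init__` declares `items=[]` has the problem in its plainest form. Dataclasses add a partial guard. `@dataclass` raises `ValueError` at class-definition time when a field default is a `list`, `dict`, or `set` (from Python 3.11, when it is any unhashable object). A mutable object that passes that check, such as an instance of an ordinary custom class, is still shared silently, and because the default sits next to a type annotation it is easy to overlook. The fix is to use `field(default_factory=list)` instead of `[]` as the default.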
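
A minimal sketch of both behaviors described above. `PlainCart` and `DataCart` are hypothetical names used only for illustration: the hand-written `__init__` shares one list, while `@dataclass` refuses the equivalent declaration.

```python
from dataclasses import dataclass


class PlainCart:
    """Hand-written class: the [] default is evaluated once, at def time."""

    def __init__(self, items=[]):  # deliberately buggy
        self.items = items


a, b = PlainCart(), PlainCart()
a.items.append("apple")
shared = a.items is b.items  # True: both instances hold the same list

# The dataclass decorator rejects the same default at class-definition time.
error_message = None
try:
    @dataclass
    class DataCart:
        items: list = []
except ValueError as exc:
    error_message = str(exc)  # the message points at default_factory
```

Because the dataclass version fails at import time, a shared-state bug in a running system usually comes from one of the cases the decorator does not check.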

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

  1. Check whether any dataclass field is assigned a mutable object directly, e.g. `config: Config = Config()`. A literal container default such as `items: list = []` cannot be the culprit in a dataclass that imports cleanly, because `@dataclass` rejects it with `ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory`. Look instead for custom objects, unannotated class attributes such as `items = []`, and hand-written `__init__` signatures.
  2. Print `id()` of the suspect attribute for two different instances: `print(id(instance1.items))` and `print(id(instance2.items))`. If the IDs match, the instances share state.
  3. Run a simple test: create two instances, modify the attribute on one, and check whether the other is affected.
  4. Search the codebase for `= []`, `= {}`, or `= set()`: `grep -rnE '= *(\[\]|\{\}|set\(\))' --include='*.py' .`, then review the hits inside class bodies and function signatures.
  5. Use `dataclasses.fields()` to inspect field defaults programmatically: for each `f` in `fields(MyClass)`, flag the field if `f.default is not dataclasses.MISSING` and `f.default` is not an instance of an immutable type such as `int`, `str`, `tuple`, or `frozenset`.
  6. Review recent commits that added classes with default empty containers.
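
The identity check from steps 2 and 3 can be wrapped in a small helper. This is a sketch that assumes the class can be constructed with no arguments; `shares_state`, `Cart`, and its attributes are hypothetical names.

```python
from dataclasses import dataclass, field


def shares_state(cls, attr):
    """Return True if two default-constructed instances hold the same object.

    Only meaningful for mutable attributes: immutable defaults such as
    None or small ints are legitimately the same object everywhere.
    """
    first, second = cls(), cls()
    return getattr(first, attr) is getattr(second, attr)


@dataclass
class Cart:
    tags = []  # no annotation: a plain class attribute, shared by all instances
    items: list = field(default_factory=list)  # fresh list per instance


tags_shared = shares_state(Cart, "tags")    # True
items_shared = shares_state(Cart, "items")  # False
```

The unannotated `tags` attribute is not a dataclass field at all, which is why the decorator never inspects it.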
( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • Dataclass definitions in your codebase, especially newly added ones: `grep -rn '@dataclass' --include='*.py' .`.
  • Any default of `[]`, `{}`, `set()`, or similar: `grep -rnE '= *(\[\]|\{\}|set\(\))' --include='*.py' .`.
  • Pull requests or commits that introduced dataclasses with default mutable values.
  • Test files that create multiple instances of the same dataclass and modify default attributes.
  • Production monitoring dashboards showing data anomalies (e.g., duplicated entries in lists).
  • Python documentation for `dataclasses.field` and `default_factory`.
  • IDE warnings: PyCharm and VS Code may flag some mutable defaults, depending on the inspections and extensions enabled.
( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Assigning a mutable object directly as a default. Dataclasses reject `[]` and `{}` with `ValueError`, so what survives is usually a custom object, an unannotated class attribute such as `items = []`, or a default in a hand-written `__init__`.
  • Copy-pasting a regular class with mutable defaults into a dataclass without adapting the defaults.
  • Assuming that dataclasses automatically deep-copy defaults per instance (they don't).
  • Using a mutable object from a third-party library as a default without wrapping it in `field(default_factory=...)`.
  • Migrating from a manually written `__init__` to the dataclass decorator and forgetting to change mutable defaults to `default_factory`.
  • Not catching the issue in code review because the default looks innocent.
( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Use `field(default_factory=list)` instead of `= []` for mutable defaults.
  • For custom mutable objects, pass the class or a lambda as the factory: `field(default_factory=MyList)` or `field(default_factory=lambda: [])`.
  • Use immutable defaults where possible: `items: tuple = ()` instead of a `list`.
  • Set the default to `None` and initialize in `__post_init__`: inside `def __post_init__(self):`, write `if self.items is None: self.items = []`.
  • Use static analysis where it applies: `pylint`'s `dangerous-default-value` (W0102) flags mutable defaults in function signatures, including hand-written `__init__` methods, but it does not inspect dataclass field defaults.
  • Add a unit test that explicitly checks for shared mutable state between instances.
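
The first three patterns side by side, as a sketch. `Order` and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    # Fresh list and dict for every instance.
    items: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    # A lambda when the starting value is not a bare constructor call.
    quantities: list = field(default_factory=lambda: [0] * 3)
    # Immutable default: safe to share because it cannot be mutated.
    coupons: tuple = ()


first, second = Order(), Order()
first.items.append("apple")
first.metadata["gift"] = True
first.quantities[0] = 5
# second still has its own untouched list, dict, and quantities.
```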
( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • Create two instances, call a method that mutates the field on one, and assert the other's field is unchanged.
  • Print `id()` of the field for both instances; they should differ after the fix.
  • Run the full test suite to ensure no cross-test contamination.
  • Inspect the fixed field's entry in `dataclasses.fields(MyClass)` (index `[0]` if it is the first field); its `.default` should be `dataclasses.MISSING` when `default_factory` is used.
  • Deploy to staging and run a smoke test that exercises the dataclass under load.
  • Confirm that each instance has its own copy of the mutable object; an identity check such as `a.items is not b.items` is usually enough, with memory profiling as a fallback.
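
A regression test along these lines covers the mutation check, the identity check, and the `MISSING` check. It is a sketch: `Cart` stands in for your own dataclass, and the test functions follow the pytest naming convention but are plain functions with no pytest dependency.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class Cart:
    items: list = field(default_factory=list)

    def add_item(self, item):
        self.items.append(item)


def test_instances_do_not_share_items():
    first, second = Cart(), Cart()
    first.add_item("apple")
    assert second.items == []               # the mutation did not leak
    assert first.items is not second.items  # distinct objects


def test_items_uses_default_factory():
    (items_field,) = [f for f in dataclasses.fields(Cart) if f.name == "items"]
    assert items_field.default is dataclasses.MISSING
    assert items_field.default_factory is list


# Run directly when no test runner is available.
test_instances_do_not_share_items()
test_items_uses_default_factory()
```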
( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Using `field(default_factory=[])`. A list is not callable, so the class definition succeeds but every instantiation that relies on the default raises `TypeError: 'list' object is not callable`. Always pass a callable, not a value.
  • Thinking that `copy.deepcopy` is applied automatically; it's not.
  • Fixing only one field while leaving other mutable defaults in the same class.
  • Using the `None` plus `__post_init__` pattern but forgetting the `if self.items is None` check, which overwrites values the caller passed in.
  • Adding `field(default_factory=list)` but forgetting to import `field` from `dataclasses`.
  • Ignoring warnings from IDEs or linters about mutable defaults.
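
The first mistake is easy to reproduce. In this sketch (`Broken` is a hypothetical class) the definition is accepted and the failure only appears when an instance relies on the default.

```python
from dataclasses import dataclass, field


@dataclass
class Broken:
    # Wrong: [] is a value, not a callable. The class still builds.
    items: list = field(default_factory=[])


try:
    Broken()  # the generated __init__ tries to call the list
    failure = None
except TypeError as exc:
    failure = str(exc)

# An explicit argument bypasses the factory, so this path hides the bug.
explicit = Broken(items=["apple"])
```

Since callers that always pass `items` never hit the error, a test that constructs the class with no arguments is the quickest way to surface it.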
( 07 ) War story

Shared Shopping Cart: A Dataclass Mutable Default Production Meltdown

Backend Engineer · Python 3.9, FastAPI, PostgreSQL, Docker

Timeline

  1. 10:15 PagerDuty alert: Shopping cart items duplicated for user_id=4521.
  2. 10:18 Checked logs: cart.items list had duplicate entries, e.g., ['item1', 'item1'].
  3. 10:22 Reproduced locally with a simple script: creating two cart instances and adding items to one affected the other.
  4. 10:25 Identified the Cart dataclass: `items: List[str] = []`.
  5. 10:28 Confirmed shared state by printing `id()` of items for two instances: same ID.
  6. 10:30 Deployed hotfix: changed to `items: List[str] = field(default_factory=list)`.
  7. 10:35 Monitored logs; duplicate entries stopped. Verified with synthetic test.
  8. 10:45 Postmortem: found 3 other dataclasses with similar issues in codebase; fixed all.

The alert hit at 10:15 AM on a Thursday. A user's shopping cart showed duplicate items—every item appeared twice. I jumped into the logs and saw that the `cart.items` list had entries like `['item1', 'item1']`. My first thought was a race condition in the API endpoint, but the endpoint was synchronous and single-threaded per request. Then I noticed the pattern: all duplicates were the same item, not two different items added simultaneously.

I reproduced the bug locally with a simple script: `cart1 = Cart(); cart2 = Cart(); cart1.add_item('item1'); print(cart2.items)`. Sure enough, `cart2.items` showed `['item1']`. I printed `id(cart1.items)` and `id(cart2.items)`—identical. That's when I remembered the classic Python mutable default argument trap. The `Cart` dataclass had `items: List[str] = []`, so the empty list was created once at class definition and shared across all instances.

I pushed a hotfix changing the default to `field(default_factory=list)`, redeployed, and the duplicates stopped immediately. The postmortem revealed three other dataclasses with the same anti-pattern. We added a linter rule to catch mutable defaults in dataclasses and wrote a unit test that creates two instances, modifies one, and asserts the other is unchanged. The lesson: never use `[]` or `{}` as a default in a dataclass (or any class). Always use `default_factory` or `None` with `__post_init__`.

Root cause

Mutable default argument `[]` in a dataclass field, causing the same list object to be shared across all instances.

The fix

Changed `items: List[str] = []` to `items: List[str] = field(default_factory=list)`.

The lesson

Always use `field(default_factory=...)` for mutable defaults in dataclasses. Add linter rules and unit tests to catch this early.

( 08 ) Why Python Evaluates Defaults at Definition Time

Python evaluates default argument values once, when the `def` statement (or class body) runs, not each time the function is called. The evaluated objects are stored on the function and reused for every call that omits the argument. For immutable defaults like integers or strings, this is harmless because they cannot be changed in place. For mutable objects like lists or dicts, every caller receives the same object, so a mutation made through one call is visible to all the others.

In dataclasses, the `__init__` method is generated automatically, and a plain field default becomes the default of the matching `__init__` parameter. For the literal cases the decorator steps in: `items: list = []` raises `ValueError` before the class is ever created. A default that passes the check, such as `config: Config = Config()` with an ordinary custom class, is created once at class definition time. Every instance built without an explicit `config` argument gets that same object as `self.config`, which is why modifying it through one instance also changes it for every other instance.
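
The single evaluation is visible on the function object itself, because Python keeps evaluated positional defaults in the `__defaults__` tuple. A minimal sketch with a hypothetical `append_item` function:

```python
def append_item(item, bucket=[]):  # deliberately buggy default
    bucket.append(item)
    return bucket


# The default list already exists before the first call.
default_before = list(append_item.__defaults__[0])  # a copy of the empty default

first = append_item("a")
second = append_item("b")  # reuses the same list, which already holds "a"

# Both return values are the very object stored on the function.
same_object = first is second is append_item.__defaults__[0]
```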

( 09 ) The Correct Use of field(default_factory=...)

The `dataclasses.field()` function provides a `default_factory` parameter that takes a zero-argument callable (a function or a class), invoked each time an instance needs a default. For example, `field(default_factory=list)` calls `list()` whenever an instance is created without an explicit value, producing a fresh empty list.

Common mistakes: passing `default_factory=[]` (a list is not callable, so instantiation fails with `TypeError`) or forgetting to import `field`. A lambda such as `default_factory=lambda: []` works but is less readable than plain `list`. For dicts, use `default_factory=dict`; for sets, use `default_factory=set`.
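
To see when the factory actually runs, this sketch uses a hypothetical `new_items` factory that records each call. Nothing runs at class definition, and an explicit argument skips the factory.

```python
from dataclasses import dataclass, field

call_log = []  # one entry per factory invocation


def new_items():
    """Hypothetical factory that records how often it runs."""
    call_log.append("called")
    return []


@dataclass
class Cart:
    items: list = field(default_factory=new_items)


calls_after_definition = len(call_log)  # defining the class calls nothing
Cart()
Cart()
Cart(items=["apple"])                   # explicit value: factory is skipped
calls_after_instances = len(call_log)
```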

( 10 ) Alternative Approaches: None + __post_init__

Some teams prefer setting the default to `None` and initializing in `__post_init__`. Example: declare `items: list | None = None` (or `Optional[list]` before Python 3.10), then in `__post_init__` write `if self.items is None: self.items = []`. This is explicit and works well, but it adds boilerplate, and it loosens the type: the annotation must admit `None` even though the attribute is never `None` after construction, and a caller who passes `None` explicitly silently gets an empty list.

The `default_factory` approach is more idiomatic for dataclasses. However, if you need to initialize based on other fields, `__post_init__` is necessary. For simple cases, stick with `default_factory`.
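
A sketch of the sentinel pattern, written with `Optional` so it also runs on Python versions before 3.10. `Cart` is a hypothetical class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cart:
    items: Optional[List[str]] = None

    def __post_init__(self):
        # Runs after the generated __init__; swap the sentinel for a fresh list.
        if self.items is None:
            self.items = []


first, second = Cart(), Cart()
first.items.append("apple")
explicit_none = Cart(items=None)  # indistinguishable from omitting the argument
```

The last line shows the trade-off: an explicit `None` and a missing argument produce the same result, which may or may not be what callers expect.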

( 11 ) Detecting Mutable Defaults with Static Analysis

Tool coverage is uneven, so check what yours actually reports. `pylint`'s `dangerous-default-value` (W0102) catches mutable defaults in function signatures, including hand-written `__init__` methods, but it does not look at dataclass field defaults, and `mypy` checks types rather than default mutability. IDE inspections in PyCharm or VS Code may flag some cases. A simple grep catches the literal ones: `grep -rnE '= *(\[\]|\{\}|set\(\))' --include='*.py' .`.

Consider adding a custom unit test that introspects the dataclasses in the codebase and checks for mutable defaults. Use `dataclasses.fields()` and inspect `field.default` for each field. If it is neither `dataclasses.MISSING` nor an instance of a known immutable type, fail the test. This catches regressions early, including custom objects that the decorator's own check lets through.
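
One way to write that introspection check, as a sketch. The allow-list of immutable types is an assumption to adapt to your codebase, and `suspicious_defaults`, `Config`, and `Service` are hypothetical names.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum

# Assumption: types treated as safe to share. Extend for your codebase.
# Note that a tuple can still contain mutable items.
IMMUTABLE_TYPES = (type(None), bool, int, float, complex, str, bytes,
                   tuple, frozenset, Enum)


def suspicious_defaults(cls):
    """Return names of fields whose plain default is not a known immutable."""
    return [
        f.name
        for f in dataclasses.fields(cls)
        if f.default is not dataclasses.MISSING
        and not isinstance(f.default, IMMUTABLE_TYPES)
    ]


class Config:
    """Hypothetical mutable settings object."""

    def __init__(self):
        self.values = {}


@dataclass
class Service:
    name: str = "api"
    retries: int = 3
    config: Config = Config()  # shared instance: should be flagged


flagged = suspicious_defaults(Service)  # only "config" is reported
```

Fields that use `default_factory` have `default` set to `MISSING`, so they pass the check without any special handling.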

( 12 ) When the Mutable Default is Not a Built-in

Sometimes the default is a custom mutable object, e.g., `config: Config = Config()`. This is the case dataclasses are least likely to catch: an ordinary class is hashable by default, so the decorator accepts it and every instance shares the same `Config`. (On Python 3.11 and later an unhashable default, for instance an instance of a class that defines `__eq__` without `__hash__`, raises `ValueError` instead.) The fix is `default_factory=Config`. If `Config` takes arguments, use a lambda that supplies them or a named factory function.

For mutable objects from third-party libraries, the same rule applies: wrap them in `default_factory`. If the object is expensive to create, consider lazy initialization in `__post_init__`, or an explicit shared instance in the rare case where sharing is intended.
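
A sketch of the shared custom object and its fix. `Config`, `SharedService`, `IsolatedService`, and the `region` argument are hypothetical.

```python
from dataclasses import dataclass, field


class Config:
    """Hypothetical settings object; mutable, and hashable by default."""

    def __init__(self, region="eu"):
        self.region = region
        self.flags = {}


@dataclass
class SharedService:
    config: Config = Config()  # one Config for every instance


@dataclass
class IsolatedService:
    # A lambda supplies constructor arguments; plain Config works without them.
    config: Config = field(default_factory=lambda: Config(region="us"))


s1, s2 = SharedService(), SharedService()
s1.config.flags["beta"] = True  # also visible through s2

i1, i2 = IsolatedService(), IsolatedService()
i1.config.flags["beta"] = True  # i2 is unaffected
```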

Frequently asked questions

Does this bug affect regular classes with __init__?

Yes. The same issue occurs in any class or function with mutable default arguments, and a hand-written `__init__` gets no warning at all. Dataclasses reject `list`, `dict`, and `set` defaults but still accept other mutable objects. The fix is the same in both cases: use a sentinel like `None` and initialize in `__init__`, or use a factory. In dataclasses, `field(default_factory=...)` is the most convenient.

Can I use `field(default_factory=lambda: [])` instead of `list`?

Yes, but it's unnecessary. `list` is already a callable that returns a new empty list, so `default_factory=list` is cleaner. Use a lambda only when the default needs more than a bare constructor call (e.g., `lambda: [0] * 10`).

What if I want the default to be a shared mutable object intentionally?

That is rarely a good idea because it leads to surprising behavior. If you must share state, make it explicit: annotate a class-level attribute with `typing.ClassVar`, or use a singleton, and document it. In almost all cases you want per-instance state.
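
If sharing really is intended, a `ClassVar` annotation states that in the code. This is a sketch with a hypothetical `Plugin` registry: dataclasses do not treat `ClassVar` attributes as fields, so the shared dict is accepted and stays out of `__init__`.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar, Dict, List


@dataclass
class Plugin:
    name: str
    # Deliberate class-level state, shared by every instance.
    registry: ClassVar[Dict[str, "Plugin"]] = {}
    hooks: List[str] = field(default_factory=list)  # per-instance state

    def __post_init__(self):
        Plugin.registry[self.name] = self


a = Plugin("auth")
b = Plugin("billing")
field_names = [f.name for f in fields(Plugin)]  # registry is not listed
```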

Does using `list` as default_factory create a new list every time?

Yes. `list()` is called each time a new instance is created, producing a distinct empty list. This is exactly what you need to avoid shared state.

How do I find all such bugs in an existing codebase?

Run a script that imports all modules, then uses `dataclasses.fields()` on every dataclass to check whether any field has a mutable default. Alternatively, use `grep` as shown earlier. For hand-written `__init__` methods and other functions, `pylint`'s `dangerous-default-value` check (W0102) reports mutable defaults.