ADR-0017: Standardize backend testing on pytest, Testcontainers and full request flows
- Status: Accepted
- Date: 2026-03-25
- Deciders: avm
- Supersedes:
- Superseded by:
Related ADRs
Context
The layered testing strategy already exists, but backend-specific rules for the test harness, doubles, and test data were still easy to dilute. Without a hard policy, teams quickly fall into several bad patterns:
- using SQLite as a cheap substitute for PostgreSQL and hiding real storage constraints;
- skipping Redis tests or replacing Redis with in-memory fakes;
- mocking internal services and repositories so heavily that tests stop validating the real execution path;
- building test payloads from raw dict objects and weakening contract boundaries;
- scattering fixtures and helpers through random files instead of mirroring the project structure.
Decision
Backend testing is standardized as follows.
Test runtime:
- pytest is the only backend test runner;
- integration and persistence tests use real PostgreSQL and Redis through Testcontainers;
- SQLite is forbidden as the backend test database and is not an accepted substitute for PostgreSQL;
- migrations and persistence behavior are validated against the same relational model used by the production path.
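The runtime rules above could be wired into a shared conftest.py roughly as follows. This is a minimal sketch, not the project's actual harness: the fixture names, the image tags, and the `redis_url` helper are assumptions made for illustration, and it presumes the `testcontainers` package (with its PostgreSQL and Redis extras) and a running Docker daemon.

```python
# conftest.py (hypothetical location: src/backend/tests/conftest.py)
from collections.abc import Iterator

import pytest

# Image tags are assumptions; pin them to whatever the production path runs.
POSTGRES_IMAGE = "postgres:16-alpine"
REDIS_IMAGE = "redis:7-alpine"


def redis_url(host: str, port: int, db: int = 0) -> str:
    """Build a Redis connection URL from the container's mapped host and port."""
    return f"redis://{host}:{port}/{db}"


@pytest.fixture(scope="session")
def postgres_dsn() -> Iterator[str]:
    # Imported inside the fixture so that collecting pure unit tests
    # does not require the Docker tooling to be installed.
    from testcontainers.postgres import PostgresContainer

    # One real PostgreSQL server per test session; stopped when the block exits.
    with PostgresContainer(POSTGRES_IMAGE) as postgres:
        yield postgres.get_connection_url()


@pytest.fixture(scope="session")
def redis_dsn() -> Iterator[str]:
    from testcontainers.redis import RedisContainer

    # One real Redis server per test session, reachable on a mapped port.
    with RedisContainer(REDIS_IMAGE) as redis:
        host = redis.get_container_host_ip()
        port = int(redis.get_exposed_port(6379))
        yield redis_url(host, port)
```

Session scope keeps container start-up cost to one launch per run. Per-test isolation (for example, rolling back a transaction or flushing Redis between tests) would still need its own function-scoped fixtures on top of these.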
Allowed execution path:
- backend integration and API tests validate the full path request -> endpoint -> service -> repo -> db -> repo -> service -> endpoint -> response;
- internal services, repositories, Unit of Work objects, and persistence adapters are not mocked in normal backend tests;
- when testing an external API client, substitution happens through a fake external service with predictable responses rather than by mocking internal methods;
- only external API boundaries and external side effects outside the service's ownership may be mocked.
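One way to read the external-boundary rule: the client code runs unmodified, and only the remote end is swapped for a local fake with fixed responses. The sketch below uses the standard library only. The rates API, its `/rates/<currency>` route, and the `fetch_rate` client are hypothetical stand-ins, not part of this project.

```python
import http.client
import json
import threading
from collections.abc import Iterator
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


class FakeRatesHandler(BaseHTTPRequestHandler):
    """Hypothetical external rates API with fixed, predictable responses."""

    def do_GET(self) -> None:
        if self.path == "/rates/EUR":
            status, payload = 200, {"currency": "EUR", "rate": "1.10"}
        else:
            status, payload = 404, {"error": "unknown currency"}
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        # Silence per-request logging so test output stays readable.
        pass


@contextmanager
def fake_rates_service() -> Iterator[str]:
    """Run the fake on an OS-assigned port and yield its base URL."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), FakeRatesHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_address[1]}"
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def fetch_rate(base_url: str, currency: str) -> str:
    """Stand-in for the real external client; none of its internals are mocked."""
    parts = urlsplit(base_url)
    connection = http.client.HTTPConnection(parts.hostname, parts.port, timeout=5)
    try:
        connection.request("GET", f"/rates/{currency}")
        response = connection.getresponse()
        if response.status != 200:
            raise LookupError(currency)
        return json.load(response)["rate"]
    finally:
        connection.close()
```

In a test suite the context manager would typically be wrapped in a conftest.py fixture that yields the base URL, which the service under test receives through its normal configuration.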
Test data and factories:
- test data is created through faker and polyfactory;
- raw dict objects are not the main way to build test input and output;
- payloads and expected values should be assembled through typed models, factories, dataclasses, or Pydantic structures;
- fixtures and factories must provide repeatable, typed arrangements.
Test organization:
- the test tree mirrors the project structure;
- backend tests live under src/backend/tests/ and reflect apps/, core/, and runtime/;
- fixtures are declared only in conftest.py files at the appropriate scope;
- tests are written as function-based pytest tests only;
- class-based test containers, unittest.TestCase, and grouping through test classes are forbidden.
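The mirroring rule can be expressed as a small path mapping, which could later back a layout check. This is a sketch under two assumptions that the ADR does not state: that apps/, core/, and runtime/ sit directly under src/backend/, and that each module `name.py` is covered by `test_name.py`. The example module path is hypothetical.

```python
from pathlib import PurePosixPath

BACKEND_ROOT = PurePosixPath("src/backend")
TESTS_ROOT = BACKEND_ROOT / "tests"
# Packages the test tree is expected to reflect.
MIRRORED_PACKAGES = {"apps", "core", "runtime"}


def expected_test_path(source_file: str) -> PurePosixPath:
    """Map a backend module to the test module that mirrors it."""
    # relative_to raises ValueError when the file is not under src/backend.
    relative = PurePosixPath(source_file).relative_to(BACKEND_ROOT)
    if relative.parts[0] not in MIRRORED_PACKAGES:
        raise ValueError(f"{source_file} is outside the mirrored packages")
    # Same directory layout under tests/, with a test_ prefix on the file name.
    return TESTS_ROOT / relative.with_name(f"test_{relative.name}")
```

For instance, `expected_test_path("src/backend/apps/users/service.py")` maps to `src/backend/tests/apps/users/test_service.py`.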
Consequences
Positive
- integration tests validate the real backend flow instead of a pre-mocked call chain;
- persistence and transaction behavior are exercised against production-like PostgreSQL and Redis;
- test data keeps typed boundaries and survives refactors better;
- test structure stays predictable and mirrors the service structure.
Negative
- integration tests are heavier and slower than tests that run against SQLite or in-memory doubles;
- fake external services require more setup than direct monkeypatching.
Neutral
- unit tests for pure domain logic remain valid, but they should still use typed factories and the function-based pytest style.
Alternatives considered
- using SQLite for cheap repository tests;
- mocking services and repositories in most endpoint tests;
- using raw dict payload builders as the main form of test data;
- allowing test classes and arbitrary fixture placement next to individual test files.
Follow-up work
- [ ] add backend test tooling for Testcontainers with PostgreSQL and Redis
- [ ] add static checks against SQLite and class-based pytest tests
- [ ] add faker and polyfactory factories to the template baseline