# The Line Between Unit and Integration Tests Is Thinner Than You Think
Last November I got into a debate with Airam and Joan over lunch about whether a test that calls a use case with an in-memory repository is a unit test or an integration test. The conversation lasted an hour and we did not agree. Last week Diego and I had the exact same argument over a pull request. Same tension, same lack of resolution.
This post is me trying to actually figure out what I think.
## What the Textbooks Say
The classical definition is clean. Almost suspiciously clean.
Unit test: tests a single unit (function, class, method) in complete isolation. All dependencies are mocked or stubbed. Fast. Deterministic. If it fails, you know exactly what broke.
Integration test: tests two or more units working together. Real collaborators, real interactions. Slower. When it fails, you know something broke but you might need to investigate where.
The test pyramid says: many unit tests at the base, fewer integration tests in the middle, even fewer end-to-end tests at the top. The logic is economic — unit tests are cheap to write, cheap to run, and pinpoint failures precisely.
This framing has been the dominant model for decades. It works. I am not here to say it is wrong.
But I am here to say it leaves a question unanswered.
## The Behavior Question
Kent C. Dodds flipped the pyramid into a trophy and said something that stuck with me: "The more your tests resemble the way your software is used, the more confidence they can give you."
This sounds obviously correct. If a user clicks a button and expects a toast notification, the most valuable test is one that renders the component, simulates a click, and asserts that the toast appears. Not one that checks whether `showToast` was called with the right arguments.
Testing behavior, not implementation.
But here is the tension nobody talks about clearly enough: testing behavior almost always means testing multiple units together. The click handler calls a function, which calls a service, which updates state, which triggers a re-render, which shows the toast. If your test covers that entire chain, it is not a unit test by any classical definition. It is an integration test.
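The chain just described can be sketched without any UI framework. Everything here is hypothetical stand-in code (`saveNote`, `showToast`, `onSaveClick` are invented names, and the "state" is a plain object rather than real React state), but it shows the distinction: the behavior test drives the entry point and checks what the user would see, while an implementation test would assert that `showToast` was called.

```typescript
type AppState = { notes: string[]; toasts: string[] };

const state: AppState = { notes: [], toasts: [] };

// Internal detail an over-mocked test would assert on.
function showToast(s: AppState, message: string): void {
  s.toasts.push(message);
}

// "Service" layer: persists the note, then triggers user feedback.
function saveNote(s: AppState, text: string): void {
  s.notes.push(text);
  showToast(s, "Note saved");
}

// "Click handler": the entry point a behavior-level test goes through.
function onSaveClick(s: AppState, text: string): void {
  saveNote(s, text);
}

// Behavior test: drive the entry point, assert on the observable outcome.
// No spy on showToast, no knowledge of the call chain in between.
onSaveClick(state, "buy milk");
console.assert(state.toasts.includes("Note saved"));
```

By any classical definition this exercises three "units" at once, yet it reads and runs like a single focused test.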
So are we saying integration tests are better than unit tests? That is not what Dodds says, but it is where the logic leads if you are not careful.
## Where I Keep Getting Stuck
In the project I work on daily, we follow hexagonal architecture. Clean layers: domain entities, application use cases, infrastructure adapters. The dependency rule flows inward. The domain has zero external dependencies.
Here is the question that started the argument with Airam and Joan:
```typescript
// Is this a unit test or an integration test?
describe("GetBudgetStatus", () => {
  it("excludes transactions from excluded accounts", async () => {
    const budgetRepo = new InMemoryBudgetRepository();
    const txRepo = new InMemoryTransactionRepository();
    const accountRepo = new InMemoryAccountRepository();
    // ... seed data ...
    const useCase = new GetBudgetStatus(budgetRepo, txRepo, accountRepo);
    const result = await useCase.execute(budgetId);
    expect(result.totalSpent).toBe(500); // excludes the excluded account
  });
});
```
Airam's position: it is an integration test. It tests multiple collaborators working together — the use case, three repositories, the domain logic inside Account and Transaction entities. Even though the repositories are in-memory stubs, the test exercises the interaction between real objects.
Joan's position: it is a unit test. The "unit" is the use case. The repositories are test doubles. No I/O, no database, no network. It runs in milliseconds. By any practical definition, this is a unit test.
I kept going back and forth. They were both right. That was the problem.
## The Definitions Are Broken (Or At Least Incomplete)
Here is what I think is actually happening: the classical definitions assume a world where "unit" means "a single class or function." But in practice, the useful boundary is not the class — it is the behavior.
A use case is a unit of behavior. It takes an input, orchestrates some domain logic, and produces an output. The fact that it collaborates with entities and repositories internally does not make testing it an integration test — any more than testing a function that calls two private helper functions would be an integration test.
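Here is what that shape looks like in miniature. This is a hedged sketch, not the project's real code: the port, the in-memory adapter, and the `GetTotalSpent` use case are all simplified inventions. The point is that the in-memory repository is a real, working implementation of the port, not a mock with scripted expectations.

```typescript
// The port: a boundary the use case depends on.
interface TransactionRepository {
  sumForBudget(budgetId: string): Promise<number>;
}

// In-memory adapter: a genuine implementation, not a test double
// that knows which methods will be called in which order.
class InMemoryTransactionRepository implements TransactionRepository {
  constructor(private txs: { budgetId: string; amount: number }[]) {}
  async sumForBudget(budgetId: string): Promise<number> {
    return this.txs
      .filter((t) => t.budgetId === budgetId)
      .reduce((sum, t) => sum + t.amount, 0);
  }
}

// The use case: the unit of behavior. Input in, output out.
class GetTotalSpent {
  constructor(private txRepo: TransactionRepository) {}
  async execute(budgetId: string): Promise<number> {
    return this.txRepo.sumForBudget(budgetId);
  }
}

// Behavior test: seed data, execute, assert on the result.
const repo = new InMemoryTransactionRepository([
  { budgetId: "b1", amount: 300 },
  { budgetId: "b1", amount: 200 },
  { budgetId: "b2", amount: 999 },
]);
const useCase = new GetTotalSpent(repo);
useCase.execute("b1").then((total) => console.assert(total === 500));
```

Refactor the internals of `GetTotalSpent` however you like; as long as the output for that input stays 500, this test keeps passing.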
The alternative — mocking every dependency of the use case — gives you tests that are tightly coupled to implementation details. Change the internal wiring of the use case and every test breaks, even if the behavior is identical.
```typescript
// Over-mocked: coupled to implementation, not behavior
it("calls findExcludedAccountIds before sumByCategory", async () => {
  const mockAccountRepo = { findExcludedAccountIds: vi.fn().mockResolvedValue(["acc-1"]) };
  const mockTxRepo = { sumByCategory: vi.fn().mockResolvedValue([]) };
  // ... 15 more mocks ...
  await useCase.execute(budgetId);
  // (toHaveBeenCalledBefore comes from jest-extended, not core Vitest)
  expect(mockAccountRepo.findExcludedAccountIds).toHaveBeenCalledBefore(
    mockTxRepo.sumByCategory
  );
});
// This test knows too much about HOW the use case works.
// Refactor the internals and it breaks even though the output is identical.
```
That test has 100% isolation and zero value. It will pass today and break tomorrow when you reorder two lines that have no effect on the result. Meanwhile, the test with in-memory repositories tells you whether the behavior is correct and it survives refactoring.
## Where I Landed (For Now)
After four months of having this argument with different people, here is my current mental model:
Test boundaries should match behavior boundaries, not class boundaries.
In hexagonal architecture, the natural boundaries are:
- **Domain entities and value objects** — pure logic, no dependencies. These are genuine unit tests. Test them in isolation. Mock nothing, because there is nothing to mock.
- **Use cases** — orchestrate domain logic through ports. Test them with in-memory implementations of the ports. Call this whatever you want — unit test, integration test, I genuinely do not care about the label anymore. What matters: they test behavior, they run fast, they survive refactoring.
- **Infrastructure adapters** — talk to real external systems. These need actual integration tests (hit a real database, call a real API). Expensive, slow, worth having a few of.
- **UI components** — render, interact, assert. React Testing Library's philosophy is right here. Test what the user sees, not what the component does internally.
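The first boundary is worth a concrete illustration, because it is the one case where the classical "unit test" definition and mine agree completely. `Money` here is a hypothetical value object, not something from the post's project:

```typescript
// A pure value object: no ports, no I/O, nothing to mock.
class Money {
  constructor(readonly cents: number) {
    if (!Number.isInteger(cents)) {
      throw new Error("cents must be an integer");
    }
  }
  add(other: Money): Money {
    return new Money(this.cents + other.cents);
  }
  isNegative(): boolean {
    return this.cents < 0;
  }
}

// A genuine unit test: pure input, pure output, zero test doubles.
const balance = new Money(-150).add(new Money(100));
console.assert(balance.cents === -50);
console.assert(balance.isNegative());
```

No one argues about what to call these tests. The arguments only start one layer up.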
The test pyramid still applies, but the layers map to architecture layers, not to class-level isolation.
## The Diego Conversation
Diego pushed back on something specific: "If you test use cases with in-memory repos, you are not testing the MongoDB queries. What if the aggregation pipeline has a bug?"
He is right. And that is exactly why you also need infrastructure-level tests that hit the real database. The in-memory repo tests verify the business logic. The MongoDB repo tests verify the query logic. Different concerns, different test levels.
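One pattern that keeps the two levels honest — my suggestion here, not something from the argument with Diego — is a contract suite written against the port and run against every adapter. The interface and both names below are hypothetical; the real project's ports look different:

```typescript
interface AccountRepository {
  save(id: string, excluded: boolean): Promise<void>;
  findExcludedAccountIds(): Promise<string[]>;
}

// The fast adapter the use-case tests rely on.
class InMemoryAccountRepository implements AccountRepository {
  private accounts = new Map<string, boolean>();
  async save(id: string, excluded: boolean): Promise<void> {
    this.accounts.set(id, excluded);
  }
  async findExcludedAccountIds(): Promise<string[]> {
    return Array.from(this.accounts.entries())
      .filter(([, excluded]) => excluded)
      .map(([id]) => id);
  }
}

// The contract: every adapter must pass it. The in-memory one runs on
// every save; a MongoDB-backed adapter would run the same function in
// the slower integration suite.
async function accountRepositoryContract(repo: AccountRepository): Promise<void> {
  await repo.save("a1", true);
  await repo.save("a2", false);
  const excluded = await repo.findExcludedAccountIds();
  if (excluded.length !== 1 || excluded[0] !== "a1") {
    throw new Error("adapter violates the AccountRepository contract");
  }
}

accountRepositoryContract(new InMemoryAccountRepository());
```

That way the in-memory double cannot silently drift from what the real adapter does: if it drifts, the shared contract fails on one of them.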
The mistake is thinking you have to choose one or the other. You need both. The question is just how many of each.
My ratio in practice: for every MongoDB repository, I have 2-3 tests that verify the query logic against a real database. For the use case that depends on that repository, I have 5-10 tests with in-memory doubles that cover all the business rule edge cases. The repository tests are slow and I run them less often. The use case tests are fast and I run them on every save.
## What I Actually Think About the Pyramid
The test pyramid is a useful heuristic with a misleading name at the base. "Unit test" suggests class-level isolation. What it should say is "fast, focused, reliable test." Whether that test exercises one class or five classes working together through in-memory collaborators is an implementation detail of the test itself.
Kent C. Dodds's testing trophy is a correction in the right direction — test behavior, not implementation — but it can be misread as "just write integration tests." That is not what he means, and it is not what I am saying either.
What I am saying: the line between unit and integration is thinner than the textbooks suggest, and drawing it at the class boundary produces worse tests than drawing it at the behavior boundary.
Airam, Joan, Diego — if you are reading this, I still do not have a clean answer. But I think I have a useful one. And I suspect next time we argue about it, we will argue about something else instead. That is progress.