Automated Tests in Python

Making your tests work harder and smarter... to make your life easier.

Like any good developer, you write tests all the time. Regardless of whether you practice test-driven development, write your tests last, or do something in between, you are already sold on the benefits. In modern programming, this is expected in most engineering teams - a great situation!

What's not universally clear: when most people say "unit tests", they really mean "automated tests" - and for Python applications, there are three distinct kinds of automated test you can write. Each has its own strengths, and each is appropriate in different situations:

- Unit tests
- Integration tests
- End-to-end tests

Let's look at what these are.

Unit Tests

A unit test covers a specific component in isolation. This "component" can be a function, a class, or some composition of them together. As I define it here, a unit test checks for a specific response to a particular input. For example, suppose you have implemented a callable named wordcount, which takes a single string argument - a body of text - and returns a dictionary. The dictionary's keys are words in the body of text; the values are the number of times that word appears:

    >>> wordcount('foo bar foo ')
    {'foo': 2, 'bar': 1}

A simple unit test for this might look like:

    import unittest

    from wordcount import wordcount

    class TestUnit(unittest.TestCase):

        def test_wordcount(self):
            self.assertDictEqual(
                {'foo': 2, 'bar': 1},
                wordcount('foo bar foo '))

The key idea of a unit test - testing a component in isolation - extends in some non-obvious ways. Strictly speaking, an automated test is not a unit test if it does any of the following:

- Reads from or writes to a database
- Makes a network connection - calling an external API, loading a resource over HTTP, and so on
- Depends on any other external resource or service (an S3 bucket, for example)

These are more properly thought of as integration tests. In practice, we've all seen (and written) tests that we call "unit tests", and which do one or more of the above. More on this later.

Integration Tests

An integration test checks how two different components or subsystems interact with each other. Like a unit test, it generally checks for a specific response to a particular input, though what constitutes "input" can be a little more complicated.

One category of integration test has to do with how your application works (or doesn't) with some external resource:

- Reading from and writing to a database like PostgreSQL, MySQL or MongoDB
- Calling a third-party API over the network
- Storing and fetching objects in a service like S3

(Another item on this list: writing a test to verify two microservices work together correctly. This kind of integration test has some fun surprises, so it has its own section below.)

Unlike a unit test, this all depends on something outside of the immediate codebase being exercised. Because it tests the actual integration, such a test will catch bugs that unit tests never will. The downside is speed (such tests will, generally speaking, run much slower), but also reliability (if your network connection hiccups, you can't run that S3 test). There is also the ops/sysadmin factor: an integration test may not run if the dev environment isn't set up with a test database, sandboxed AWS credentials, etc., while unit tests can generally be made to run from a fresh source control checkout and virtualenv.
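As an illustration, here is a minimal sketch of such a test. It assumes the application stores files in S3 via boto3, and that a sandboxed test bucket is named by a TEST_BUCKET environment variable - both assumptions for the example, not requirements:

    import os
    import unittest

    import boto3

    class TestS3Storage(unittest.TestCase):

        def test_round_trip(self):
            # This needs real (sandboxed) AWS credentials and a working
            # network connection, so it is an integration test.
            bucket = os.environ['TEST_BUCKET']
            s3 = boto3.client('s3')
            s3.put_object(Bucket=bucket, Key='probe.txt', Body=b'hello')
            obj = s3.get_object(Bucket=bucket, Key='probe.txt')
            self.assertEqual(b'hello', obj['Body'].read())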

"Internal" integration tests

Another form of integration test is more internal to your application - testing how several components or services within the application integrate. Let's pick a nice boring example: an employee expense-tracking system. Your relevant classes are Employee and Expense:

    class Expense:

        def __init__(self, description, amount):
            self.description = description
            self.amount = amount
            self.is_paid = False

    class Employee:

        def __init__(self, name, employee_id):
            self.name = name
            self.id = employee_id

        # Etc.

Now say we have some services used in the application - one of which is an expense tracker service:

    class _ExpenseTracker:

        def __init__(self):
            # Maps employee IDs to a list of expenses
            self._expenses = dict()

        def addExpense(self, employee: Employee, expense: Expense):
            # Record an expense for this employee.
            self._expenses.setdefault(employee.id, []).append(expense)

        # And other methods...

    # There can be only one.
    # (This is one good way to do singletons in Python, by the way.)
    expensetracker = _ExpenseTracker()

And also a reimbursement service:

    from collections import defaultdict

    class _Reimburser:

        def __init__(self):
            self._payout_history = defaultdict(int)

        def reimburseEmployee(self, employee: Employee):
            # Reimburse the employee.
            ...

        # And other methods...

    reimburser = _Reimburser()

(I'm eliding some code for space, but this is fully implemented if you'd like more detail.) Then an integration test might look like this:

    import unittest

    from expensetracking import Expense, Employee, reimburser, expensetracker

    class TestPayouts(unittest.TestCase):

        def test_expenses_paid(self):
            employee = Employee('Aaron Maxwell', 128)
            expensetracker.addExpense(employee, Expense('frisbee', 7.25))
            expensetracker.addExpense(employee, Expense('hockey stick', 49.95))
            expensetracker.addExpense(employee, Expense('cool sunglasses', 29.99))
            # Total of all expenses so far is $87.19.
            self.assertEqual(87.19, expensetracker.totalUnpaidExpensesForEmployee(employee))
            # And I have not been reimbursed yet at all.
            self.assertEqual(0, expensetracker.totalPaidExpensesForEmployee(employee))
            self.assertEqual(0, reimburser.totalPaidForEmployee(employee))
            # Now the reimburser service starts the reimbursement process.
            reimburser.reimburseEmployee(employee)
            self.assertEqual(0, expensetracker.totalUnpaidExpensesForEmployee(employee))
            self.assertEqual(87.19, expensetracker.totalPaidExpensesForEmployee(employee))
            self.assertEqual(87.19, reimburser.totalPaidForEmployee(employee))

(This is an abridged version of the full test - use the source.)

End-To-End Tests

An end-to-end test extends further than an integration test, to validate an entire flow in your application. Imagine you are implementing a business networking website. A typical flow for a new user may look like:

- Register a new account
- Confirm their email address
- Fill out a profile
- Search for and connect with another member
- Send that connection a message

An end-to-end test will exercise all these steps in sequence, making assertions and checks at many intermediate points. The idea is to exercise a complete application flow that delivers business value. This is especially helpful in a continuous deployment environment, where the engineering team may deploy many times per day; there, excellent automated tests are critical to ensure that code implementing one feature does not break a different one.

End-to-end tests do not replace manual testing, but they automate significant parts of it.

Some frameworks provide a kind of testing client or driver you can use to implement end-to-end tests. Django, for example, provides a test web-browser client in its django.test.TestCase class. If you are building on a framework that doesn't provide a suitable test client, or have implemented your own framework, creating your own is often a good return on your effort. That test client can then be used as a foundation for implementing a wide range of end-to-end tests over time.
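For instance, here is a minimal sketch using Django's test client. The /signup/ URL and the form fields are hypothetical, standing in for whatever flow your application actually implements:

    from django.test import TestCase

    class TestSignupFlow(TestCase):

        def test_signup(self):
            # django.test.TestCase provides a test client on self.client;
            # it exercises the application's HTTP endpoints without
            # running a real web server.
            response = self.client.get('/signup/')
            self.assertEqual(200, response.status_code)
            response = self.client.post('/signup/', {
                'email': 'newuser@example.com',
                'password': 'correct horse battery staple',
            })
            # Assume a successful signup redirects to the next step.
            self.assertEqual(302, response.status_code)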

End-to-end tests generally do not test the UI. What we are focused on here are tests for Python code itself. Your application provides hooks to the UI layer (in the above example, it would be a set of HTTP endpoints); the end-to-end test exercises those hooks directly, rather than exercising the UI layer itself. That requires a different sort of test, covered by tools like Selenium.

Actually, there is one situation where an end-to-end test can easily test the UI: when you are developing a command-line tool. Your test can then run that tool in a subshell, with a full command line, and check its output and exit status.
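For example, imagining a hypothetical command-line front end to wordcount (including its output format), such a test might look like:

    import subprocess
    import unittest

    class TestCommandLine(unittest.TestCase):

        def test_wordcount_tool(self):
            # Run the tool in a subshell, just as a user would.
            result = subprocess.run(
                ['wordcount', 'foo bar foo'],
                capture_output=True, text=True)
            self.assertEqual(0, result.returncode)
            # The output format asserted here is hypothetical.
            self.assertIn('foo: 2', result.stdout)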

The Distinctions Are Fuzzy

In practice, people draw the line between a unit test and an integration test in different places, and many think of what I'm calling end-to-end tests as just a variant of the integration test. I advise you to be flexible about this with your coworkers, rather than dogmatic; it's not worth wasting one second arguing over semantics.

That said, I have broken these down the way I did because it tends to be a very useful delineation, especially over the long term, as an application evolves. That last bit is important, because the early stages of development can mislead you here. The vocabulary isn't so important, but the ideas are.

Something I have seen a lot is automated tests that are almost unit tests (as I define them), except they read from and write to a database like PostgreSQL, MySQL or MongoDB. This doesn't seem like a big deal at first, during the initial weeks or even months of a new code base's lifetime. The problem that eventually shows up has to do with how the different tests give different types of feedback.

The Real Value of Unit Tests

If your automated tests are segmented in terms of unit and integration tests, you probably have a build environment in which they can run independently. These will be different targets in Make or Pavement or whatever tool you use (even if there is also a target that runs all tests). The point of the restrictions on unit tests is so they run very fast. Ideally, all unit tests run in less than one minute total on your development machine - even better if it's under a dozen seconds. This allows you to run them early and often as you develop, quickly discovering whether your changes break some other part of the system. You can separately run the integration and end-to-end test suites - which may take several minutes or longer - with less frequency.
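One way to get that separation - a minimal sketch assuming pytest, which is just one tool choice among several - is to mark the slower tests and filter on the mark:

    import pytest

    from wordcount import wordcount

    def test_wordcount():
        # A unit test: no I/O, so it runs in microseconds.
        assert wordcount('foo bar foo ') == {'foo': 2, 'bar': 1}

    @pytest.mark.integration
    def test_database_roundtrip():
        # An integration test: talks to a real database (elided here).
        ...

Then "pytest -m 'not integration'" runs only the fast unit suite, while "pytest -m integration" runs the slower tests. (Register the custom marker in pytest.ini so pytest doesn't warn about it.)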

That one-minute threshold is important cognitively: it affects your concentration, your focus, and your productivity as a developer. When you are holding thoughts in your mind, waiting 15 seconds is unlikely to make you lose your context; waiting two minutes probably will.

Database calls are expensive in terms of wall-clock time. In the early stages of a project, what sometimes happens is that unit tests are written that trigger database operations. Because the application is young, the body of tests is small, and the full unit test suite completes very quickly. As you and your teammates add more code - more classes, more services, more components - the wall-clock time keeps creeping upward. Before long, you are waiting many minutes for the complete unit test suite to run.

Everything I just described for database calls applies equally to test code calling out to some external API, loading a resource over HTTP, and so on. If you find yourself in the habit of starting your unit test runs just before a coffee or bathroom break, you've lost your agility - regardless of the cause.

And this is the reason for the strictures on unit tests above. By following them, even very substantial test suites can run quickly. Being explicit on the three types of test from the start can save your project a lot of trouble later. And for an existing code base (which you are statistically more likely to be working on right now), you can incrementally improve your test suite over time, starting with the next test you write.

Testing Microservices

If your application is split into microservices, you will need automated tests that check the coupling between two or more of them. This is a kind of integration test, and quickly becomes very important for the reliability of such systems.

Depending on how modular and self-contained your different services can be - which depends both on the team's implementation style and on the requirements of the problem domain - this probably won't scale indefinitely. The graph of different services can become too tightly and fully connected. You'll know this has happened when you are trying to write a test that checks how services A and B interact, but to get them to bootstrap, you have to write mocks for services C, D, E, and maybe F, just so A and B can get to the testable point.
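To make that concrete, here is a minimal sketch using unittest.mock. Every name in it (myapp, service_a, service_c, do_work, and so on) is hypothetical:

    import unittest
    from unittest.mock import patch

    from myapp import service_a

    class TestServiceAWithServiceB(unittest.TestCase):

        def test_a_and_b_interact(self):
            # Stub out service C, which A needs just to bootstrap,
            # so the A-B interaction can be exercised on its own.
            with patch('myapp.service_a.service_c') as fake_c:
                fake_c.lookup.return_value = {'status': 'ok'}
                result = service_a.do_work('some-job')
            self.assertTrue(result.succeeded)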

If this happens, you have a couple of options. The first thing I would look at is whether it's practical to evolve the individual services to be more self-contained. Having to mock more than one or two services is a kind of code smell that warns you the microservices may be more interleaved than they need to be.

Still, in my experience, sometimes this is unavoidable; some key services just have to interoperate with many others, and that's sincerely the best way to implement the system. In that case, you will need to rely on end-to-end tests more.

In fact, even if you do not have this problem, I strongly recommend implementing at least one comprehensive end-to-end test for any microservices-based application, and creating it early on. The subtle couplings between N different services quickly transcend what the human mind can easily and fully understand. An end-to-end test will catch bugs that simpler integration tests never will; this is especially true for microservices.

Other Dimensions

This article describes one useful dimension for classifying automated tests for Python code. There are other classifications as well, such as functional tests, system tests, user acceptance tests, white-box and black-box tests, and many more. These are mostly orthogonal.

There are many more advanced Python testing topics - tests for nondeterministic code, testing multithreaded code, test design patterns, and more. Ping me if you'd like to see one in particular, or if you have a comment on this essay.
