Advanced Functions

In this chapter, we go beyond the basics of using functions. I’ll assume you can define and work with functions taking default arguments:

>>> def foo(a, b, x=3, y=2):
...     return (a+b)/(x+y)
...
>>> foo(5, 0)
1.0
>>> foo(10, 2, y=3)
2.0
>>> foo(b=4, x=8, a=1)
0.5

Notice the last way foo is called: with the arguments out of order, and everything specified by key-value pairs. Not everyone knows that you can call any function in Python this way. So long as the value of each argument is unambiguously specified, Python doesn’t care how you call the function (in this case, we specify b, x and a out of order, letting y be its default value). We’ll leverage this flexibility later.

This chapter’s topics are useful and valuable on their own. They are also important building blocks for some extremely powerful patterns, which you will learn in later chapters. Let’s get started!

Accepting & Passing Variable Arguments

The foo function above can be called with either 2, 3, or 4 arguments. Sometimes you want to define a function that can take any number of arguments - zero or more, in other words. In Python, it looks like this:

# Note the asterisk. That's the magic part
def takes_any_args(*args):
    print("Type of args: " + str(type(args)))
    print("Value of args: " + str(args))

Look carefully at the syntax here. takes_any_args() is just like a regular function, except you put an asterisk right before the argument args. Within the function, args is a tuple:

>>> takes_any_args("x", "y", "z")
Type of args: <class 'tuple'>
Value of args: ('x', 'y', 'z')
>>> takes_any_args(1)
Type of args: <class 'tuple'>
Value of args: (1,)
>>> takes_any_args()
Type of args: <class 'tuple'>
Value of args: ()
>>> takes_any_args(5, 4, 3, 2, 1)
Type of args: <class 'tuple'>
Value of args: (5, 4, 3, 2, 1)
>>> takes_any_args(["first", "list"], ["another","list"])
Type of args: <class 'tuple'>
Value of args: (['first', 'list'], ['another', 'list'])

If you call the function with no arguments, args is an empty tuple. Otherwise, it is a tuple composed of those arguments passed, in order. This is different from declaring a function that takes a single argument, which happens to be of type list or tuple:

>>> def takes_a_list(items):
...     print("Type of items: " + str(type(items)))
...     print("Value of items: " + str(items))
...
>>> takes_a_list(["x", "y", "z"])
Type of items: <class 'list'>
Value of items: ['x', 'y', 'z']
>>> takes_any_args(["x", "y", "z"])
Type of args: <class 'tuple'>
Value of args: (['x', 'y', 'z'],)

In these calls to takes_a_list and takes_any_args, we pass the same argument: a list of strings. We’re calling both functions the exact same way, but what happens in each function is different. Within takes_any_args, the tuple named args has one element - and that element is the list ["x", "y", "z"]. But in takes_a_list, items is the list itself.

This *args idiom gives you some very helpful programming patterns. You can work with arguments as an abstract sequence, while providing a potentially more natural interface for whoever calls the function.

Above, I’ve always named the argument args in the function signature. Writing *args is a well-followed convention, but you can choose a different name - the asterisk is what makes it a variable argument. For instance, this takes paths of several files as arguments:

def read_files(*paths):
    data = ""
    for path in paths:
        with open(path) as handle:
            data += handle.read()
    return data

Most Python programmers use *args unless there is a reason to name it something else.^[6] That reason is usually readability; read_files is a good example. If naming it something other than args makes the code more understandable, do it.
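To illustrate the more natural calling interface that variable arguments can provide, here is a minimal sketch. The average function is a hypothetical helper, not something defined elsewhere in this chapter; it accepts its numbers directly instead of requiring the caller to wrap them in a list:

```python
def average(*values):
    """Return the arithmetic mean of the arguments (hypothetical helper)."""
    # values is a tuple, so sum() and len() work on it directly.
    if not values:
        raise ValueError("average() needs at least one value")
    return sum(values) / len(values)

# The caller passes the numbers directly, with no list brackets.
print(average(3, 5, 10))   # 6.0
```

Naming the parameter values rather than args is a readability choice here, in the same spirit as read_files above.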

Argument Unpacking

The star modifier works in the other direction too. Intriguingly, you can use it with any function. For example, suppose a library provides this function:

def order_book(title, author, isbn):
    """
    Place an order for a book.
    """
    print("Ordering '{}' by {} ({})".format(
        title, author, isbn))
    # ...

Notice there’s no asterisk. Suppose in another, completely different library, you fetch the book info from this function:

def get_required_textbook(class_id):
    """
    Returns a tuple (title, author, ISBN)
    """
    # ...

Again, no asterisk. Now, one way you can bridge these two functions is to store the tuple result from get_required_textbook, then unpack it element by element:

>>> book_info = get_required_textbook(4242)
>>> order_book(book_info[0], book_info[1], book_info[2])
Ordering 'Writing Great Code' by Randall Hyde (1593270038)

Writing code this way is tedious and error-prone; not ideal.

Fortunately, Python provides a better way. Let’s look at a different function:

def normal_function(a, b, c):
    print("a: {} b: {} c: {}".format(a,b,c))

No trick here - it really is a normal, boring function, taking three arguments. If we have those three arguments as a list or tuple, Python can automatically "unpack" them for us. We just need to pass in that collection, prefixed with an asterisk:

>>> numbers = (7, 5, 3)
>>> normal_function(*numbers)
a: 7 b: 5 c: 3

Again, normal_function is just a regular function. We did not use an asterisk on the def line. But when we call it, we take a tuple called numbers, and pass it in with the asterisk in front. This is then unpacked within the function to the arguments a, b, and c.

There is a duality here. We can use the asterisk syntax both in defining a function, and in calling a function. The syntax looks very similar. But realize they are doing two different things. One is packing arguments into a tuple automatically - called "variable arguments"; the other is un-packing them - called "argument unpacking". Be clear on the distinction between the two in your mind.

Armed with this complete understanding, we can bridge the two book functions in a much better way:

>>> book_info = get_required_textbook(4242)
>>> order_book(*book_info)
Ordering 'Writing Great Code' by Randall Hyde (1593270038)

This is more concise (less tedious to type), and more maintainable. As you get used to the concepts, you’ll find it increasingly natural and easy to use in the code you write.
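Argument unpacking is not limited to tuples; any iterable with the right number of elements will do. The following sketch assumes a hypothetical describe_point function, and unpacks first a list, then the list produced by splitting a string:

```python
def describe_point(x, y, z):
    # A regular three-argument function; no asterisk on the def line.
    return "x={} y={} z={}".format(x, y, z)

coords_list = [1, 2, 3]
coords_text = "4,5,6"

# Unpack a list into the three arguments.
print(describe_point(*coords_list))             # x=1 y=2 z=3

# str.split returns a list, which unpacks the same way.
print(describe_point(*coords_text.split(",")))  # x=4 y=5 z=6
```

If the iterable has too few or too many elements for the function's signature, the call raises a TypeError, just as if you had typed out the wrong number of arguments by hand.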

Variable Keyword Arguments

So far we have just looked at functions with positional arguments - the kind where you declare a function like def foo(a, b):, and then invoke it like foo(7, 2). You know that a=7 and b=2 within the function, because of the order of the arguments. Of course, Python also has keyword arguments:

>>> def get_rental_cars(size, doors=4,
...         transmission='automatic'):
...     template = "Looking for a {}-door {} car with {} transmission...."
...     print(template.format(doors, size, transmission))
...
>>> get_rental_cars("economy", transmission='manual')
Looking for a 4-door economy car with manual transmission....

And remember, Python lets you call any function just using keyword arguments:

>>> def bar(x, y, z):
...     return x + y * z
...
>>> bar(z=2, y=3, x=4)
10

These keyword arguments won’t be captured by the *args idiom. Instead, Python provides a different syntax - using two asterisks instead of one:

def print_kwargs(**kwargs):
    for key, value in kwargs.items():
        print("{} -> {}".format(key, value))

The variable kwargs is a dictionary. (In contrast to args - remember, that was a tuple.) It’s just a regular dict, so we can iterate through its key-value pairs with .items():

>>> print_kwargs(hero="Homer", antihero="Bart",
...     genius="Lisa")
hero -> Homer
antihero -> Bart
genius -> Lisa

The arguments to print_kwargs are key-value pairs. This is regular Python syntax for calling functions; what’s interesting is happening inside the function. There, a variable called kwargs is defined. It’s a Python dictionary, consisting of the key-value pairs passed in when the function was called.

Here’s another example, which has a regular positional argument, followed by arbitrary key-value pairs:

def set_config_defaults(config, **kwargs):
    for key, value in kwargs.items():
        # Do not overwrite existing values.
        if key not in config:
            config[key] = value

This is perfectly valid. You can define a function that takes some normal arguments, followed by zero or more key-value pairs:

>>> config = {"verbosity": 3, "theme": "Blue Steel"}
>>> set_config_defaults(config, bass=11, verbosity=2)
>>> config
{'verbosity': 3, 'theme': 'Blue Steel', 'bass': 11}

Like with *args, naming this variable kwargs is just a strong convention; you can choose a different name if that improves readability.
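As one illustration of a more descriptive name, here is a sketch of a hypothetical make_tag function (invented for this example) that builds an HTML-style tag from one regular argument plus arbitrary key-value pairs. It assumes Python 3.6 or later, where the dictionary keeps the order in which the keyword arguments were passed:

```python
def make_tag(name, **attributes):
    # attributes is a dict holding whatever keyword arguments were supplied.
    parts = [name]
    for key, value in attributes.items():
        parts.append('{}="{}"'.format(key, value))
    return "<" + " ".join(parts) + ">"

print(make_tag("img", src="cat.png", alt="A cat"))
# <img src="cat.png" alt="A cat">
print(make_tag("br"))
# <br>
```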

Keyword Unpacking

Just like with *args, double-star works the other way too. We can take a regular function, and pass it a dictionary using two asterisks:

>>> def normal_function(a, b, c):
...     print("a: {} b: {} c: {}".format(a,b,c))
...
>>> numbers = {"a": 7, "b": 5, "c": 3}
>>> normal_function(**numbers)
a: 7 b: 5 c: 3

Note the keys of the dictionary must match up with how the function was declared. Otherwise you get an error:

>>> bad_numbers = {"a": 7, "b": 5, "z": 3}
>>> normal_function(**bad_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: normal_function() got an unexpected keyword argument 'z'

This is called keyword argument unpacking. It works regardless of whether that function has default values for some of its arguments or not. So long as the value of each argument is specified one way or another, you have valid code:

>>> def another_function(x, y, z=2):
...     print("x: {} y: {} z: {}".format(x,y,z))
...
>>> all_numbers = {"x": 2, "y": 7, "z": 10}
>>> some_numbers = {"x": 2, "y": 7}
>>> missing_numbers = {"x": 2}
>>> another_function(**all_numbers)
x: 2 y: 7 z: 10
>>> another_function(**some_numbers)
x: 2 y: 7 z: 2
>>> another_function(**missing_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: another_function() missing 1 required positional argument: 'y'
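A common use of keyword unpacking is to hold settings in a dictionary and apply them in one call. The sketch below assumes a hypothetical connect function and made-up settings; it also shows that an ordinary positional argument can be mixed with an unpacked dictionary:

```python
def connect(host, port=5432, timeout=10):
    # Hypothetical function; it only formats its arguments for display.
    return "{}:{} (timeout {}s)".format(host, port, timeout)

settings = {"host": "db.example.com", "timeout": 30}

# port is not in the dictionary, so it keeps its default value.
print(connect(**settings))                     # db.example.com:5432 (timeout 30s)

# A positional argument combined with an unpacked dictionary.
print(connect("localhost", **{"port": 6000}))  # localhost:6000 (timeout 10s)
```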

Combining Positional and Keyword Arguments

You can combine the syntax to use both positional and keyword arguments. In a function signature, just separate *args and **kwargs by a comma:

>>> def general_function(*args, **kwargs):
...     for arg in args:
...         print(arg)
...     for key, value in kwargs.items():
...         print("{} -> {}".format(key, value))
...
>>> general_function("foo", "bar", x=7, y=33)
foo
bar
x -> 7
y -> 33

This usage - declaring a function like def general_function(*args, **kwargs) - is the most general way to define a function in Python. A function so declared can be called in any way, with any valid combination of keyword and non-keyword arguments - including no arguments.

Similarly, you can call a function using both - and both will be unpacked:

>>> def addup(a, b, c=1, d=2, e=3):
...     return a + b + c + d + e
...
>>> nums = (3, 4)
>>> extras = {"d": 5, "e": 2}
>>> addup(*nums, **extras)
15
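Accepting and unpacking can be combined in a single function: collect whatever arguments arrive, then pass them through unchanged to another function. Here is a minimal sketch of that idea, using two hypothetical functions, call_and_report and power, invented for this example:

```python
def call_and_report(func, *args, **kwargs):
    """Call func with whatever arguments were supplied, reporting the call."""
    print("Calling {} with {} and {}".format(func.__name__, args, kwargs))
    # Unpack both collections to pass the arguments through unchanged.
    return func(*args, **kwargs)

def power(base, exponent=2):
    return base ** exponent

result = call_and_report(power, 3, exponent=4)
print(result)   # 81
```

Because call_and_report never needs to know the signature of func, the same wrapper works for any function you hand it.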

There’s one last point to understand, on argument ordering. When you def the function, you specify the arguments in this order:

Named, regular (non-keyword) arguments, then
the *args non-keyword variable arguments, then
keyword-only arguments, and finally
the **kwargs keyword variable arguments.

You can omit any of these when defining a function. But any that are present must be in this order.

# All these are valid function definitions.
def combined1(a, b, *args): pass
def combined2(x, y, z, **kwargs): pass
def combined3(*args, **kwargs): pass
def combined4(x, *args): pass
def combined5(u, v, w, *args, **kwargs): pass
def combined6(*args, x, y): pass

Violating this order will cause errors:

>>> def bad_combo(**kwargs, *args): pass
  File "<stdin>", line 1
    def bad_combo(**kwargs, *args): pass
                          ^
SyntaxError: invalid syntax

Sometimes you might want to define a function that takes 0 or more positional arguments, and 1 or more required keyword arguments. You can define a function like this with *args followed by regular arguments, forming a special category, called keyword-only arguments. If present, whenever that function is called, all must be specified as key-value pairs, after the non-keyword arguments:

>>> def read_data_from_files(*paths, format):
...     """Read and merge data from several files,
...     which are in XML, JSON, or YAML format."""
...     # ...
...
>>> housing_files = ["houses.json", "condos.json"]
>>> housing_data = read_data_from_files(
...     *housing_files, format="json")
>>> commodities_data = read_data_from_files(
...     "commodities.xml", format="xml")

See how format's value is specified with a key-value pair. If you try passing it without format= in front, you get an error:

>>> commodities_data = read_data_from_files(
...     "commodities.xml", "xml")
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: read_data_from_files() missing 1 required keyword-only argument: 'format'
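To see every kind of argument in a single signature, consider the following sketch. The make_report function is hypothetical, invented for this example. Python requires the order shown: regular arguments, then *args, then keyword-only arguments, then **kwargs. A keyword-only argument may have a default value, as separator does here:

```python
def make_report(title, *sections, separator="\n", **metadata):
    # title:     a regular positional argument
    # sections:  a tuple of any extra positional arguments
    # separator: keyword-only, because it follows *sections
    # metadata:  a dict of any remaining keyword arguments
    lines = [title]
    lines.extend(sections)
    for key, value in metadata.items():
        lines.append("{}: {}".format(key, value))
    return separator.join(lines)

print(make_report("Q1", "Sales up", "Costs flat",
                  separator=" | ", author="Ana"))
# Q1 | Sales up | Costs flat | author: Ana
```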

Functions As Objects

In Python, functions are ordinary objects - just like an integer, a list, or an instance of a class you create. The implications are profound, letting you do certain very useful things with functions. Leveraging this is one of those secrets separating average Python developers from great ones, because of the extremely powerful abstractions which follow.

Once you get this, it can change the way you write software forever. In fact, these advanced patterns for using functions in Python largely transfer to other languages you will use in the future.

To explain, let’s start by laying out a problematic situation, and how to solve it. Imagine you have a list of strings representing numbers:

nums = ["12", "7", "30", "14", "3"]

Suppose we want to find the biggest integer in this list. The max builtin does not help us:

>>> max(nums)
'7'

This isn’t a bug, of course; since the objects in nums are strings, max compares each element lexicographically.^[7] By that criterion, "7" is greater than "30", for the same reason "g" comes after "ca" alphabetically. Essentially, max is evaluating the elements by a different criterion than what we want.

Since max's algorithm is simple, let’s roll our own that compares based on the integer value of the string:

>>> def max_by_int_value(items):
...     # For simplicity, assume len(items) > 0
...     biggest = items[0]
...     for item in items[1:]:
...         if int(item) > int(biggest):
...             biggest = item
...     return biggest
...
>>> max_by_int_value(nums)
'30'

This gives us what we want: it returns the element in the original list which is maximal, as evaluated by our criteria. Now imagine working with different data, where you have different criteria. For example, a list of actual integers:

integers = [3, -2, 7, -1, -20]

Suppose we want to find the number with the greatest absolute value - i.e., distance from zero. That would be -20 here, but standard max won’t do that:

>>> max(integers)
7

Again, let’s roll our own, using the built-in abs function:

>>> def max_by_abs(items):
...     biggest = items[0]
...     for item in items[1:]:
...         if abs(item) > abs(biggest):
...             biggest = item
...     return biggest
...
>>> max_by_abs(integers)
-20

One more example - a list of dictionary objects:

student_joe = {'gpa': 3.7, 'major': 'physics',
               'name': 'Joe Smith'}
student_jane = {'gpa': 3.8, 'major': 'chemistry',
                'name': 'Jane Jones'}
student_zoe = {'gpa': 3.4, 'major': 'literature',
               'name': 'Zoe Fox'}
students = [student_joe, student_jane, student_zoe]

Now, what if we want the record of the student with the highest GPA? Here’s a suitable max function:

>>> def max_by_gpa(items):
...     biggest = items[0]
...     for item in items[1:]:
...         if item["gpa"] > biggest["gpa"]:
...             biggest = item
...     return biggest
...
>>> max_by_gpa(students)
{'gpa': 3.8, 'major': 'chemistry', 'name': 'Jane Jones'}

Just one line of code is different between max_by_int_value, max_by_abs, and max_by_gpa: the comparison line. max_by_int_value says if int(item) > int(biggest); max_by_abs says if abs(item) > abs(biggest); and max_by_gpa compares item["gpa"] to biggest["gpa"]. Other than that, these max functions are identical.

I don’t know about you, but having nearly-identical functions like this drives me nuts. The way out is to realize the comparison is based on a value derived from the element - not the value of the element itself. In other words: each cycle through the for loop, the two elements are not themselves compared. What is compared is some derived, calculated value: int(item), or abs(item), or item["gpa"].

It turns out we can abstract out that calculation, using what we’ll call a key function. A key function is a function that takes exactly one argument - an element in the list. It returns the derived value used in the comparison. In fact, int works like a function, even though it’s technically a type, because int("42") returns 42.^[8] So types and other callables work, as long as we can invoke them like a one-argument function.

This lets us define a very generic max function:

>>> def max_by_key(items, key):
...     biggest = items[0]
...     for item in items[1:]:
...         if key(item) > key(biggest):
...             biggest = item
...     return biggest
...
>>> # Old way:
... max_by_int_value(nums)
'30'
>>> # New way:
... max_by_key(nums, int)
'30'
>>> # Old way:
... max_by_abs(integers)
-20
>>> # New way:
... max_by_key(integers, abs)
-20

Pay attention: you are passing the function object itself - int and abs. You are not invoking the key function in any direct way. In other words, you write int, not int(). This function object is then called as needed by max_by_key, to calculate the derived value:

# key is actually int, abs, etc.
        if key(item) > key(biggest):

To find the student with the highest GPA, we need a function extracting the value of the "gpa" key from each student dictionary. There is no built-in function that does this, but we can define our own and pass it in:

>>> # Old way:
... max_by_gpa(students)
{'gpa': 3.8, 'major': 'chemistry', 'name': 'Jane Jones'}

>>> # New way:
... def get_gpa(who):
...     return who["gpa"]
...
>>> max_by_key(students, get_gpa)
{'gpa': 3.8, 'major': 'chemistry', 'name': 'Jane Jones'}

Again, notice get_gpa is a function object, and we are passing that function itself to max_by_key. We never invoke get_gpa directly; max_by_key does that automatically.

You may be realizing now just how powerful this can be. In Python, functions are simply objects - just as much as an integer, or a string, or an instance of a class is an object. You can store functions in variables; pass them as arguments to other functions; and even return them from other function and method calls. This all provides new ways for you to encapsulate and control the behavior of your code.
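Here is a small sketch of those three abilities: storing functions in a data structure, returning one from another function, and calling it later. All the names (shout, whisper, styles, get_style) are hypothetical, made up for this illustration:

```python
def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

# Store function objects in a dictionary, like any other value.
# Note there are no parentheses: we are not calling them here.
styles = {"loud": shout, "quiet": whisper}

def get_style(name):
    # Return a function object; the caller decides when to invoke it.
    return styles[name]

speak = get_style("loud")
print(speak("hello"))             # HELLO!
print(styles["quiet"]("HELLO"))   # hello...
```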

The Python standard library demonstrates some excellent ways to use such functional patterns. Let’s look at a key (ha!) example.

Key Functions in Python

Earlier, we saw the built-in max doesn’t magically do what we want when given a list of numbers-as-strings:

>>> nums = ["12", "7", "30", "14", "3"]
>>> max(nums)
'7'

Again, this isn’t a bug - max just compares elements according to the data type, and "7" > "12" evaluates to True. But it turns out max is customizable. You can pass it a key function!

>>> max(nums, key=int)
'30'

The value of key is a function taking one argument - an element in the list - and returning a value for comparison. But max isn’t the only built-in accepting a key function. min and sorted do as well:

>>> # Default behavior...
... min(nums)
'12'
>>> sorted(nums)
['12', '14', '3', '30', '7']
>>>
>>> # And with a key function:
... min(nums, key=int)
'3'
>>> sorted(nums, key=int)
['3', '7', '12', '14', '30']
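The key need not be a type like int; any callable taking one argument will work. As a brief sketch with a hypothetical list of words, the built-in len and the str.lower method both serve as key functions:

```python
words = ["pear", "Fig", "banana", "apple"]

# Longest and shortest word, judged by length rather than spelling.
print(max(words, key=len))            # banana
print(min(words, key=len))            # Fig

# By default, uppercase letters sort before lowercase ones...
print(sorted(words))                  # ['Fig', 'apple', 'banana', 'pear']

# ...but keying on the lowercased string gives a case-insensitive sort.
print(sorted(words, key=str.lower))   # ['apple', 'banana', 'Fig', 'pear']
```

In each case the original elements are returned unchanged; the key function only affects how they are compared.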

Many algorithms can be cleanly expressed using min, max, or sorted, along with an appropriate key function. Sometimes a built-in (like int or abs) will provide what you need, but often you’ll want to create a custom function. Since this is so commonly needed, the operator module provides some helpers. Let’s revisit the example of a list of student records.

>>> student_joe = {'gpa': 3.7, 'major': 'physics',
...         'name': 'Joe Smith'}
>>> student_jane = {'gpa': 3.8, 'major': 'chemistry',
...         'name': 'Jane Jones'}
>>> student_zoe = {'gpa': 3.4, 'major': 'literature',
...         'name': 'Zoe Fox'}
>>> students = [student_joe, student_jane, student_zoe]
>>>
>>> def get_gpa(who):
...     return who["gpa"]
...
>>> sorted(students, key=get_gpa)
[{'gpa': 3.4, 'major': 'literature', 'name': 'Zoe Fox'},
 {'gpa': 3.7, 'major': 'physics', 'name': 'Joe Smith'},
 {'gpa': 3.8, 'major': 'chemistry', 'name': 'Jane Jones'}]

This is effective, and a fine way to solve the problem. Alternatively, the operator module’s itemgetter creates and returns a key function that looks up a named dictionary field:

>>> from operator import itemgetter
>>>
>>> # Sort by GPA...
... sorted(students, key=itemgetter("gpa"))
[{'gpa': 3.4, 'major': 'literature', 'name': 'Zoe Fox'},
 {'gpa': 3.7, 'major': 'physics', 'name': 'Joe Smith'},
 {'gpa': 3.8, 'major': 'chemistry', 'name': 'Jane Jones'}]
>>>
>>> # Now sort by major:
... sorted(students, key=itemgetter("major"))
[{'gpa': 3.8, 'major': 'chemistry', 'name': 'Jane Jones'},
 {'gpa': 3.4, 'major': 'literature', 'name': 'Zoe Fox'},
 {'gpa': 3.7, 'major': 'physics', 'name': 'Joe Smith'}]

Notice itemgetter is a function that creates and returns a function - itself a good example of how to work with function objects. In other words, the following two key functions are completely equivalent:

# What we did above:
def get_gpa(who):
    return who["gpa"]

# Using itemgetter instead:
from operator import itemgetter
get_gpa = itemgetter("gpa")

This is how you use itemgetter when the sequence elements are dictionaries. It also works when the elements are tuples or lists - just pass a number index instead:

>>> # Same data, but as a list of tuples.
... student_rows = [
...      ("Joe Smith", "physics", 3.7),
...      ("Jane Jones", "chemistry", 3.8),
...      ("Zoe Fox", "literature", 3.4),
...      ]
>>>
>>> # GPA is the 3rd item in the tuple, i.e. index 2.
... # Highest GPA:
... max(student_rows, key=itemgetter(2))
('Jane Jones', 'chemistry', 3.8)
>>>
>>> # Sort by major:
... sorted(student_rows, key=itemgetter(1))
[('Jane Jones', 'chemistry', 3.8),
 ('Zoe Fox', 'literature', 3.4),
 ('Joe Smith', 'physics', 3.7)]
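itemgetter also accepts more than one index or key, in which case the function it returns produces a tuple of the looked-up values. Since tuples compare element by element, this gives a multi-level sort. The sketch below reuses the rows above and adds one hypothetical student, Ann Lee, so that two rows share a major:

```python
from operator import itemgetter

student_rows = [
    ("Joe Smith", "physics", 3.7),
    ("Jane Jones", "chemistry", 3.8),
    ("Zoe Fox", "literature", 3.4),
    ("Ann Lee", "physics", 3.9),   # hypothetical extra row
]

# With two indices, the key function returns a (major, gpa) tuple.
major_then_gpa = itemgetter(1, 2)
print(major_then_gpa(student_rows[0]))   # ('physics', 3.7)

# Sort by major first, then by GPA within each major.
for row in sorted(student_rows, key=major_then_gpa):
    print(row)
# ('Jane Jones', 'chemistry', 3.8)
# ('Zoe Fox', 'literature', 3.4)
# ('Joe Smith', 'physics', 3.7)
# ('Ann Lee', 'physics', 3.9)
```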

operator also provides attrgetter, for keying off an attribute of the element, and methodcaller for keying off a method’s return value - useful when the sequence elements are instances of your own class:

>>> class Student:
...     def __init__(self, name, major, gpa):
...         self.name = name
...         self.major = major
...         self.gpa = gpa
...     def __repr__(self):
...         return "{}: {}".format(self.name, self.gpa)
...
>>> student_objs = [
...      Student("Joe Smith", "physics", 3.7),
...      Student("Jane Jones", "chemistry", 3.8),
...      Student("Zoe Fox", "literature", 3.4),
...      ]
>>> from operator import attrgetter
>>> sorted(student_objs, key=attrgetter("gpa"))
[Zoe Fox: 3.4, Joe Smith: 3.7, Jane Jones: 3.8]
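methodcaller works along the same lines: it takes a method name, and returns a key function that calls that method on each element. Here is a minimal sketch, assuming a hypothetical CreditRecord class and made-up credit totals that are not part of the student data above:

```python
from operator import methodcaller

class CreditRecord:
    """Hypothetical class, invented for this illustration."""
    def __init__(self, name, credits_earned):
        self.name = name
        self.credits_earned = credits_earned
    def credits_remaining(self, required=120):
        return required - self.credits_earned
    def __repr__(self):
        return "{}: {}".format(self.name, self.credits_earned)

records = [
    CreditRecord("Joe Smith", 90),
    CreditRecord("Jane Jones", 110),
    CreditRecord("Zoe Fox", 45),
]

# Sort by the return value of each object's credits_remaining() method.
closest_first = sorted(records, key=methodcaller("credits_remaining"))
print(closest_first)
# [Jane Jones: 110, Joe Smith: 90, Zoe Fox: 45]
```

Any extra arguments given to methodcaller are passed along to the method, so methodcaller("credits_remaining", 100) would call credits_remaining(100) on each element.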

^[6]This seems to be deeply ingrained; once I abbreviated it *a, only to have my code reviewer demand I change it to *args. They wouldn’t approve it until I changed it, so I did.

^[7]Meaning, alphabetically, but generalizing beyond the letters of the alphabet.

^[8]Python uses the word callable to describe something that can be invoked like a function. This can be an actual function, a type or class name, or an object defining the __call__ magic method. Key functions are frequently actual functions, but can be any callable.
