Pandas' (and Python's) powerful abstractions

Here's a function I wrote:

    import pandas as pd

    def load_df(path, rename_map, extra=None):
        df = pd.read_csv(path)
        df.rename(columns = rename_map, inplace=True)
        df = filter_test_records(df) # common filters for all dataframes
        if extra:
            df = extra(df) # optional customization hook, supplied by the caller
        # keep only the renamed columns
        return df[ list(rename_map.values()) ]

This is pulled from real production code. "df" is a Pandas dataframe. And this function, load_df(), is a "building block" function. It's used to load data from a variety of different CSV files, and process them into dataframes that are useful in my program.

Notice the "extra" argument. From what you see in this fragment, what does "extra" do?

More precise questions:

1) What's the TYPE of extra?

Answer: It's a function.

2) What does it return?

Answer: It returns a dataframe.

3) What does it let you do?

Answer: Anything you want with the input dataframe. You can do any kind of modification, transformation, filtering, or even constructing a totally different dataframe.

In other words... extra() is a customization hook. When you're writing a function that will load a dataframe from a CSV file, and you want to use load_df()... but you also need to do some extra, specific customization... you can just write your own extra function to do that.
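To see the hook pattern in isolation, here is a minimal sketch that uses only the standard library instead of Pandas. It is an analog of load_df(), not the production code: load_rows(), only_completed(), and the sample CSV text are all made up for illustration, and rows are plain dicts rather than a dataframe.

```python
import csv
import io

# Hypothetical sample data, made up for this sketch.
SAMPLE_CSV = """Your Email,Name,Completion Status
a@example.com,Ann,1
b@example.com,Bob,0
"""

def load_rows(text, rename_map, extra=None):
    """Stdlib analog of load_df(): rows are plain dicts, not a dataframe."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # Rename the columns listed in rename_map; keep the others for now.
        rows.append({rename_map.get(key, key): value for key, value in raw.items()})
    if extra is not None:
        rows = extra(rows)  # the hook: the caller's function runs here
    # Keep only the renamed columns, as load_df() does on its last line.
    keep = list(rename_map.values())
    return [{key: row[key] for key in keep} for row in rows]

def only_completed(rows):
    # Hypothetical hook: keep rows whose 'Completion Status' is "1".
    # (csv.DictReader yields every value as a string.)
    return [row for row in rows if row["Completion Status"] == "1"]

rename_map = {"Your Email": "email", "Name": "survey_firstname"}
all_rows = load_rows(SAMPLE_CSV, rename_map)
completed = load_rows(SAMPLE_CSV, rename_map, extra=only_completed)
```

Notice that the hook can filter on 'Completion Status' even though that column is not in rename_map. That works because the hook runs before the final column selection, just as in load_df().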

Here's an example, pulled from the same codebase:

    def load_survey_df(path):
        rename_map = {
            'Your Email' : 'email',
            'Date Updated' : 'date_survey_completed',
            'Name' : 'survey_firstname',
            'Last' : 'survey_lastname',
        }
        def extra_filter(df):
            return df[df['Completion Status'] == 1]
        return load_df(path, rename_map, extra = extra_filter)

See how you're creating a function, extra_filter(), and passing it in as an argument to load_df(). You don't CALL extra_filter() yourself; load_df() does.
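The difference between passing a function and calling it is easy to miss, so here is a small sketch that records who runs what. The names apply_hook() and drop_negatives() are hypothetical stand-ins for load_df() and extra_filter(); nothing here comes from the production codebase.

```python
calls = []

def apply_hook(values, extra=None):
    # Hypothetical stand-in for load_df(): it decides when the hook runs.
    calls.append("apply_hook started")
    if extra is not None:
        values = extra(values)
    return values

def drop_negatives(values):
    # Hypothetical hook: records that it ran, then filters the list.
    calls.append("drop_negatives ran")
    return [v for v in values if v >= 0]

# Pass the function itself: no parentheses after drop_negatives.
result = apply_hook([3, -1, 2], extra=drop_negatives)
```

If you wrote extra=drop_negatives(...) with parentheses instead, you would be passing the function's return value (a list), and apply_hook() would fail when it tried to call that list.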

This is a good example of one of Python's most powerful and important abstractions:

Function objects.
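If "function object" sounds abstract, this tiny sketch may help. It assumes nothing beyond plain Python, and shout() is a made-up example function. The point is that a function is a value like any other: it has a type, and you can bind it to another name and call it later.

```python
def shout(text):
    return text.upper() + "!"

# A function is a value: it has a type and can be bound to another name.
alias = shout
kind = type(shout).__name__      # the type's name is 'function'
same_object = alias is shout     # both names refer to one object
greeting = alias("hello")        # calling through the alias works
```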

And it's realistic. Literally - the above is copied verbatim from production code I wrote, solving a real business problem.

Which makes now a good time to think about how you can use this idea... in your own code, today.

How can you make your own code more expressive, more powerful, more impressive, with customization hooks like this?

(That is the kind of thing you learn to do in Powerful Python Bootcamp.)

P.S. Another, more complex example:

    import re

    def load_scheduling_df(path):
        rename_map = {
            'Customer email' : 'email',
            'Customer name' : 'scheduled_fullname',
            "Meeting date and time in Owner's time zone" : 'date_scheduled',
            'Web conferencing link' : 'webinar_id',
        }
        def extra_filter(df):
            def extract_webinar_id(link):
                # convert "https://foo.com/webinar/12345" to "12345"
                m = re.search(r'/(\d+)$', link)
                assert m is not None, link
                return m.group(1)
            df = df[df['Status'] == 'Completed']
            df = df.assign(webinar_id = lambda x: x.webinar_id.apply(extract_webinar_id))
            return df
        return load_df(path, rename_map, extra = extra_filter)

Do you see how it fits the same general pattern? Define an "extra" function, and pass it to load_df().


