Emergent abstractions
Here's an interesting class, from a program I wrote:
    from datetime import date, timedelta

    class DateInterval:
        BEGINNING_OF_TIME = date(1, 1, 1)
        END_OF_TIME = date(5_000, 1, 1)

        def __init__(self, start, end):
            if start is None:
                start = self.BEGINNING_OF_TIME
            if end is None:
                end = self.END_OF_TIME
            assert start <= end, (start, end)
            self.start = start
            self.end = end

        @classmethod
        def all(cls):
            return cls(cls.BEGINNING_OF_TIME, cls.END_OF_TIME)

        @classmethod
        def from_args(cls, args):
            return cls(args.start_date, args.end_date)

        def __iter__(self):
            when = self.start
            while when <= self.end:
                yield when
                when += timedelta(days=1)

        def __contains__(self, when):
            return when >= self.start and when <= self.end
Use it like:
    >>> interval = DateInterval(date(2050, 1, 1), date(2059, 12, 31))
    >>> some_day = date(2050, 5, 3)
    >>> another_day = date(2060, 1, 1)
    >>>
    >>> some_day in interval
    True
    >>> another_day in interval
    False
I've copy-pasted this code, unedited, with all its perfect imperfections. If you think some part could be done better, you're probably right. Toy code can be perfect; realistically complex production code, in my experience, NEVER is. You just eventually get it "good enough", then move on to one of the other dozen tasks you were supposed to get done by yesterday.
Anyway:
When designing that collection of classes that will make up a software system...
(And you data scientists, pay close attention. Because I'm revealing a MASSIVE secret to how you become a "type B" data scientist...)
When you're designing all this, you typically come up with a list of classes representing certain abstractions in the problem space.
By "abstraction", I mean defining a concept precisely enough that it can be represented by (a) one or more chunks of data, and (b) a collection of functions (methods) that operate on that data.
Some of these abstractions are concrete nouns. If you wrote code for an online shopping website, you may have classes named
- Customer
- Product
- Coupon
- ShoppingCart
And so on. Or your Python role-playing game may have classes for Player, HealingPotion, Goblin (which inherits from Monster), Sword (which inherits from Weapon), et cetera.
Notice these classes are all something you can visualize. Each is something you can at least imagine to be real, that you could pick up, move around, put in a wheelbarrow.
But other abstractions are, well, abstract. Like DateInterval. Have you ever held a DateInterval in your hand? Could you put THAT in a wheelbarrow? No way. It's a pure abstraction, an idea, that only exists and only makes sense inside the ethereal context of a running program.
I find that in real software, many of my most useful classes are non-tangible in this same way.
And perhaps because of that, I sometimes don't imagine them at first. Instead, they EMERGE.
That's what happened with DateInterval. Originally I didn't have it in my code. But at some point, I had a more or less working program that did 80% of everything it was supposed to do. It wasn't done, but it was getting close.
And as I thought about how to add the next feature, I realized there were a lot of methods taking "start" and "end" date arguments, scattered around many different classes. And many of them needed to check whether a date was in a certain range, defaulting to certain behaviors if one or both of those boundaries were omitted.
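To make that concrete, here's a rough sketch of what that kind of repetition can look like. The function names, the sentinel constants, and the data shapes are all invented for illustration; they aren't from the actual program.

```python
from datetime import date

# Hypothetical sentinels, invented for this sketch.
EARLIEST = date(1, 1, 1)
LATEST = date(5_000, 1, 1)


def filter_orders(orders, start_date=None, end_date=None):
    # Default the missing boundaries, then range-check each order.
    if start_date is None:
        start_date = EARLIEST
    if end_date is None:
        end_date = LATEST
    return [o for o in orders if start_date <= o["placed"] <= end_date]


def count_logins(logins, start_date=None, end_date=None):
    # The same defaulting and the same comparison, written out a second time.
    if start_date is None:
        start_date = EARLIEST
    if end_date is None:
        end_date = LATEST
    return sum(1 for login in logins if start_date <= login <= end_date)


orders = [
    {"id": 1, "placed": date(2049, 12, 31)},
    {"id": 2, "placed": date(2050, 5, 3)},
]
logins = [date(2050, 1, 1), date(2061, 7, 4)]

orders_since_2050 = filter_orders(orders, start_date=date(2050, 1, 1))
logins_in_2050s = count_logins(logins, date(2050, 1, 1), date(2059, 12, 31))
```

Neither function is wrong on its own. The smell only shows up when you notice the same five lines living in many places.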
So I asked myself: how could I simplify the code? The code that already exists, as well as the remaining code I know I'm going to write?
And in that moment, between my ears, DateInterval popped into being.
This is what I mean by "emergent abstraction". The abstraction, DateInterval, wasn't part of any top-down design of the system. It wasn't something I realized was needed early on in the process. The need for it emerged.
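Here's a rough sketch of the kind of simplification I mean. The DateInterval below is a cut-down stand-in for the real class (I've given the constructor default arguments to keep it short), and the two call sites are hypothetical, invented just to show the shape of the change.

```python
from datetime import date


class DateInterval:
    """Cut-down stand-in: only the boundary defaults and the range check."""
    BEGINNING_OF_TIME = date(1, 1, 1)
    END_OF_TIME = date(5_000, 1, 1)

    def __init__(self, start=None, end=None):
        self.start = self.BEGINNING_OF_TIME if start is None else start
        self.end = self.END_OF_TIME if end is None else end

    def __contains__(self, when):
        return self.start <= when <= self.end


# Hypothetical call sites. Each takes one interval instead of a start/end
# pair, and neither has to know how missing boundaries are handled.
def filter_orders(orders, interval):
    return [order for order in orders if order["placed"] in interval]


def count_logins(logins, interval):
    return sum(1 for login in logins if login in interval)


orders = [
    {"id": 1, "placed": date(2049, 12, 31)},
    {"id": 2, "placed": date(2050, 5, 3)},
]
open_ended = DateInterval(start=date(2050, 1, 1), end=None)
recent_orders = filter_orders(orders, open_ended)
```

The defaulting logic now lives in exactly one place, and every caller gets to say "in interval" instead of spelling out two comparisons.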
(In this case, it emerged as I was coding the application. But there's no reason it could not have emerged during the design phase, had I chosen to be detailed enough there. The point is that it emerged as the system became more completely specified.)
Now, another question: what made it POSSIBLE for me to come up with DateInterval?
To recognize the situation where it would help, then actually write it to behave the way it's supposed to?
DateInterval is not "hello world" level stuff. It uses class methods, the iterator protocol, generator functions, magic methods, and a couple of important subtle design tradeoffs that aren't obvious until you stare really hard.
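One example of how those features interlock, as a minimal sketch: if a class defines `__iter__` but not `__contains__`, Python's `in` operator falls back to walking the iterator until it finds a match. The two classes below are hypothetical variants I made up to show the difference, with a `steps` counter added purely so you can see how much work each one does.

```python
from datetime import date, timedelta


class IterOnlyInterval:
    """Hypothetical variant: defines __iter__ but not __contains__."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.steps = 0  # how many dates the generator has produced so far

    def __iter__(self):
        # A generator function: calling it returns an iterator, so this one
        # method is enough to support "for day in interval".
        when = self.start
        while when <= self.end:
            self.steps += 1
            yield when
            when += timedelta(days=1)


class DirectInterval(IterOnlyInterval):
    """Same iteration, plus a direct membership test."""

    def __contains__(self, when):
        return self.start <= when <= self.end


start, end = date(2050, 1, 1), date(2059, 12, 31)
target = end  # worst case for the fallback: the very last day

slow = IterOnlyInterval(start, end)
fast = DirectInterval(start, end)

# Without __contains__, "in" iterates day by day until it hits the target.
slow_hit = target in slow
# With __contains__, "in" is two comparisons and never calls __iter__.
fast_hit = target in fast

print(slow_hit, slow.steps)
print(fast_hit, fast.steps)
```

Both answers come out the same. Only the amount of work differs, and for an interval spanning all of time that difference stops being academic.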
In most courses, books, etc., you learn about features of Python in isolation.
But real code isn't like that. In real code, you're ALWAYS using MANY language features, interlocked together, all the time. Like DateInterval.
And there's a level of mastery of Python that lets you SEE a DateInterval-shaped hole in your code, and then suddenly, magically, know how to code the perfect piece to fill that hole...
And you do it again, and again, and again. Until you end up with a program that seems so beautiful and amazing, you can hardly believe it came out of you.
Do you want to be the kind of developer who can perceive emergent abstractions? To be in the top 1% of all Python developers in the world?
There are many paths to get there. Some are faster than others.
Your homework, if you choose:
Look over the code you wrote in the past week. Or the code that you're writing today.
And pay attention to anything that seems repetitive. Especially when you look just beneath the code, if you catch my meaning...
And ask yourself:
"What emergent abstractions can I see? What Something-shaped hole can I perceive, when I look at the code, in my mind's eye?"
Because when you truly SEE your code, you don't see it with your eyes. You see it with your mind.
In that space where you can gaze upon it inside of you, where the code REALLY lives.
Let me know what you find there.
P.S. As I said, DateInterval above is far from perfect. After writing this article, I went back, with the sharper vision of hindsight, to see how to improve it. Here's the more polished version:
    from datetime import (
        date,
        MINYEAR,
        MAXYEAR,
        timedelta,
    )

    class DateInterval:
        BEGINNING_OF_TIME = date(MINYEAR, 1, 1)
        END_OF_TIME = date(MAXYEAR, 12, 31)

        def __init__(self, start=None, end=None):
            if start is None:
                start = self.BEGINNING_OF_TIME
            if end is None:
                end = self.END_OF_TIME
            if start > end:
                raise ValueError(f"Start date {start} must not be after end date {end}")
            self.start = start
            self.end = end

        @classmethod
        def all(cls):
            return cls(cls.BEGINNING_OF_TIME, cls.END_OF_TIME)

        def __contains__(self, when):
            return self.start <= when <= self.end

        def __iter__(self):
            num_days = 1 + (self.end - self.start).days
            for offset in range(num_days):
                yield self.start + timedelta(days=offset)
Do you agree with all the changes? What else do you see to improve? Let me know.