The Wrong Abstraction#
Overview#
Sandi Metz is a well-known Ruby developer and author who has discussed the dangers of premature abstraction in software development.1 During her talk at RailsConf 2014, she recommended that developers tolerate duplication until they better understand the problem they are trying to solve, and only then refactor the code to remove it. This advice is very similar to "write code that is easy to delete, not easy to extend".
How did this idea come about?#
When Sandi initially gave the talk, she wasn't aiming to focus solely on the "wrong abstraction". Instead, it was a talk about all sorts of small things that you can do to make your code better. However, there was such a strong reaction to the "wrong abstraction" component that she decided to follow up with a blog post on the topic.
What causes the "wrong abstraction"?#
Sandi gives a few examples of what causes the "wrong abstraction": following the DRY principle too rigidly,2 getting anchored by existing code,3 or not wanting existing code to go to waste.4
DRY Principle#
Most developers are taught the DRY ("Don't Repeat Yourself") principle at some point (usually early) in their career. This principle is not bad in itself, but it can lead to the "wrong abstraction". It can be tempting to abstract something away as soon as you see duplication in your code. However, that premature abstraction can leave you in a local minimum that doesn't actually help you in the long run.
Anchoring Effect#
Existing code has a strong influence on any new code that you write. You may see a pattern or abstraction that is almost right for your new code. It is easy to take that existing code, add a parameter to fit your new use case, and conditionally branch based on that parameter. This process can repeat itself until the abstraction is so complex that it is no longer useful.
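To make the anchoring effect concrete, here is a minimal sketch of a helper that started as a simple shared formatter and gained one flag per new caller. The function, its parameters, and the sample data are all hypothetical and invented for illustration; they do not come from Sandi's talk or post.

```python
# Hypothetical example: every name here is invented for illustration.
# The helper began as "join name, street, city" and grew a flag per caller.

def format_address(name, street, city, *, for_label=False,
                   international=False, country=""):
    """Format an address; each flag was added to serve one more caller."""
    lines = [name.upper() if for_label else name, street]
    if international:
        # Added for the shipping caller: city and country share a line.
        lines.append(f"{city}, {country}")
    else:
        lines.append(city)
    # Labels are multi-line; everything else is a single comma-separated line.
    separator = "\n" if for_label else ", "
    return separator.join(lines)


# Three callers, three different paths through the same function.
invoice = format_address("Ada Lovelace", "12 Main St", "London")
label = format_address("Ada Lovelace", "12 Main St", "London", for_label=True)
shipping = format_address(
    "Ada Lovelace", "12 Main St", "London",
    for_label=True, international=True, country="UK",
)
```

Each flag looked like a small change when it was added, yet a reader now has to trace every combination of flags to know what a given caller actually gets.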
Sunk Cost Fallacy#
When you look at existing code, you can be tempted to think, "I can't throw this away, it has so much work put into it already." This can lead you to try to fit your new code into the existing abstraction, even if it doesn't quite fit. This is known as the sunk cost fallacy.4
How do you avoid the "wrong abstraction"?#
Tolerate Duplication#
If you haven't yet found the right abstraction, but you see some duplication in your code, be willing to tolerate it for a while. This gives you more time to understand the problem and find the right abstraction.
DUPE Tag#
If you see duplication in your code, you can add a DUPE tag, much as you would add a TODO tag. You can even include a short unique identifier to tie the duplicated pieces together. This way you will not lose track of the duplication and can come back to it at a better time.
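As a rough sketch of how such tags might look and be tracked, the snippet below marks two duplicated calculations with a shared identifier and then collects the tags by identifier. The comment format `# DUPE(<id>): note`, the `find_dupes` helper, and the sample functions are assumptions made for this illustration; the source does not prescribe a format.

```python
import re
from collections import defaultdict

# Assumed tag format (not prescribed by the source): "# DUPE(<id>): optional note".
DUPE_TAG = re.compile(r"#\s*DUPE\((?P<id>[\w-]+)\)")

# Hypothetical source text containing two tagged copies of the same logic.
SOURCE = '''
def invoice_total(items):
    # DUPE(tax-total): same calculation as in quote_total
    subtotal = sum(price * qty for price, qty in items)
    return round(subtotal * 1.2, 2)

def quote_total(items):
    # DUPE(tax-total): same calculation as in invoice_total
    subtotal = sum(price * qty for price, qty in items)
    return round(subtotal * 1.2, 2)
'''


def find_dupes(source):
    """Map each DUPE identifier to the 1-based line numbers where it appears."""
    found = defaultdict(list)
    for number, line in enumerate(source.splitlines(), start=1):
        match = DUPE_TAG.search(line)
        if match:
            found[match.group("id")].append(number)
    return dict(found)


dupes = find_dupes(SOURCE)
```

A plain text search for `DUPE(` would work just as well; the point of the identifier is that all copies of one piece of duplication can be found together when it is time to revisit them.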
The Fastest Way Forward is Back#
If you have already made the "wrong abstraction", the fastest way forward is to undo that abstraction. This means:
- Inline the abstracted code at every place it was called from
- For each inlined copy:
  - Determine the subset of code that is actually used by that caller, based on the parameters that are passed in
  - Remove the rest of the code that is not used
  - Add DUPE tags as necessary
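The steps above can be sketched on a small example. The `greeting` function, its flags, and the two callers are hypothetical, and the `# DUPE(<id>)` comment format is an assumption rather than something the source specifies.

```python
# Hypothetical "wrong abstraction": one function serving two callers via flags.
def greeting(name, *, formal=False, shout=False):
    text = f"Good day, {name}." if formal else f"Hi {name}!"
    return text.upper() if shout else text


# After inlining the body into each caller and keeping only the path that
# caller's arguments select, each caller is short and explicit.

def email_greeting(name):
    # Previously: greeting(name, formal=True)
    # DUPE(greeting): similar in shape to chat_greeting
    return f"Good day, {name}."


def chat_greeting(name):
    # Previously: greeting(name, shout=True)
    # DUPE(greeting): similar in shape to email_greeting
    return f"Hi {name}!".upper()
```

The original `greeting` is kept here only so the behavior can be compared; once no caller uses it, it would be deleted. Whether a better abstraction later emerges from the tagged copies is a separate decision.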
Criticisms#
I do not agree with all of the criticisms of this piece, but some have merit. I am including the most constructive criticisms below.
What does the "wrong abstraction" even mean?#
Many responses on Hacker News (HN) were very positive and agreed with Sandi's advice overall. Most complaints were that she didn't go far enough in explaining what the "wrong abstraction" is.
Rule of Three#
sbov on HN:
You should have at least 3 instances before creating an abstraction to reduce your code5
Many others repeated this advice,6,7,8,9,10 which is commonly referred to as the Rule of Three.11
nbardy goes further and advocates for a rule of 5 or 6.12 They argue that while you are focused on development, the context switch of stopping to think about an abstraction at 3 instances is costly. Instead, you should push through development, and then come back at the end to clean up the duplicated code.
While I don't want to set a strict rule at 5 or 6, the general idea of pushing through development and coming back to think about the abstraction later is a good one. In Sandi's original talk, she was in the midst of a refactor at the time when she noticed duplication, and she said not to get side-tracked by the abstraction. So like Sandi, mark the duplication with a tag, and come back to it later.
Rapidly Changing Code#
wellpast on HN:
When we compose systems, we are looking for...stable dependencies. What I think the writer means by "the wrong abstraction" is a "volatile dependency".13
This general idea is a good one, and lines up nicely with the greater context of the talk. Sandi specifically didn't want to abstract the code mid-refactor, and you could equate this to a volatile dependency. The code is changing rapidly, and you are not yet sure what the right abstraction is. wellpast was not the only one to bring up this point of holding off on abstracting volatile code.6
Abstractions are Dependencies#
wellpast on HN:
The real offense when we factor duplicated code is the new dependency that is added to the system... Every dependency you bring into your code architecture costs you and should be judiciously introduced. De-duplication of code alone is rarely a strong enough reason to add a dependency.13
While I agree with the general idea that abstractions are dependencies, and that dependencies should be judiciously introduced, framing an abstraction as a dependency feels like a distraction in this context. "De-duplication of code alone is rarely a strong enough reason to add an abstraction" would be a better way to phrase this, so that the focus is on the reasons for the abstraction rather than on understanding how abstractions are dependencies.
Balance of Abstraction#
pkolaczk on HN:
I'm afraid this thinking can be used as an excuse for avoiding any attempts at finding proper abstractions...Insufficient abstraction leads to increased complexity, not just duplication.14
This is a valid concern. There should be a balance between never duplicating code and being scared that you have the "wrong abstraction". Some heuristics to help with this balance can be found in the section What does the "wrong abstraction" even mean?
Not enough nuance#
Jason Swett has a blog post on the nuances of duplication, which critiques The Wrong Abstraction and other similar ideas. He argues that the "wrong abstraction" is too simplistic, and therefore is not useful advice.
He posits that duplication is not simply duplicate code, but rather duplicate behavior, and that duplication has levels of danger based on three factors: how discoverable the duplication is, the overhead required to maintain it, and how volatile or heavily trafficked the code is.
I am not sure if this is even really a disagreement with Sandi's post. They are just different lenses to view the same problem: don't abstract for the sake of abstraction, make sure the abstraction has purpose.
Note: Jason has a shorter post that directly responds to Sandi's post, but it's a bit click-baity, so I would stick with the longer post.15