I always keep in mind that in software development it is very important to keep each business concept in one place only. Code duplication is a problem that we should try to avoid or at least restrict and systematically remove when needed.
But sometimes we follow the DRY principle blindly, forgetting that every decision has a cost. In this post, I will present some points that can help us decide when we should remove duplicates and when we should live with them.
- We should differentiate between duplication of business concepts (rules, validations, flows, etc.) and duplication at the implementation level (for example, a few similar lines of source code).
- The duplication of business concepts and business definition should be avoided as much as possible.
- Sometimes duplication at the code level is a hint that there is an abstraction waiting to be discovered, an emergent behavior of the system, or a pattern that is common in our application. It is important to minimize it, but it is not a disaster if "some" duplication remains. Be very careful, though, not to create a premature abstraction. In my experience, premature abstractions are much worse than duplication.
- If you develop using TDD, it is better to duplicate code to reach green and then, once the tests are green and the code is well covered, refactor to eliminate the duplicates.
- Depending on the language (C++, Python, Java...), some kinds of duplication either are costly to eliminate or have no idiomatic solution. In these cases, we should eliminate duplication only when the result is easier to understand (not only for you, but for the whole team). When in doubt, readability must always prevail.
- It is important to know that when we eliminate duplication we usually create a common class, a library, a method, or some other artifact that we can reference and use from several parts of our code. This creates a dependency between the client code and the reused code. Dependency is one of the strongest relationships between pieces of code and always comes with a significant cost. We should keep this in mind when evaluating whether to eliminate duplication or not.
- We should always depend on artifacts that are more stable than ourselves. That is: if we extract common code to a library, but the API of this library changes all the time, it means that we have created the wrong abstraction and that the maintenance cost will increase a lot.
- It is important to not have duplication of business concepts (low-level duplication is less important).
- We should always evaluate the danger of creating a premature abstraction.
- We should weigh the cost of adding a new dependency against the reduction in maintenance cost gained by removing the duplication.
- If readability is harmed by eliminating duplication, we are doing it wrong.
- It is better to follow a process of use, use, reuse, and only later create the abstraction, instead of creating the abstraction directly.
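To make the distinction between the two kinds of duplication more concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the free-shipping rule, the threshold value, and all function names. It only shows the shape of the idea: the business rule lives in one place, while two validators that happen to look alike are left duplicated on purpose.

```python
from decimal import Decimal

# Hypothetical business rule, defined exactly once.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")


def qualifies_for_free_shipping(order_total: Decimal) -> bool:
    """Single home for the (assumed) free-shipping rule."""
    return order_total >= FREE_SHIPPING_THRESHOLD


def checkout_summary(order_total: Decimal) -> str:
    # Reuses the rule instead of restating the threshold.
    shipping = "free" if qualifies_for_free_shipping(order_total) else "standard"
    return f"total={order_total} shipping={shipping}"


def cart_banner(order_total: Decimal) -> str:
    # Also reuses the rule; only the presentation differs.
    if qualifies_for_free_shipping(order_total):
        return "You get free shipping!"
    missing = FREE_SHIPPING_THRESHOLD - order_total
    return f"Add {missing} more for free shipping"


# Implementation-level duplication, left alone on purpose: the bodies
# match today, but usernames and product codes are unrelated concepts
# whose limits may change independently.
def is_valid_username(value: str) -> bool:
    return 3 <= len(value) <= 20


def is_valid_product_code(value: str) -> bool:
    return 3 <= len(value) <= 20
```

If the threshold changes, only one line changes. Merging the two validators into a shared helper, on the other hand, would couple two concepts that merely look the same right now.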
If you can't afford to wait before creating an abstraction, or if you can't remove or modify it once you have detected that it is not the right one, you are generating more complexity than a small duplication would.
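As a rough illustration of that trade-off, the following Python sketch contrasts a premature abstraction with the small duplication it replaces. The report format and every function name here are hypothetical, made up only for this example.

```python
# Premature abstraction: one "reusable" function that grows a new flag
# every time a caller needs something slightly different.
def render_report(rows, as_csv=False, with_header=True, uppercase=False):
    sep = "," if as_csv else " | "
    lines = []
    if with_header:
        lines.append(f"name{sep}amount")
    for name, amount in rows:
        if uppercase:
            name = name.upper()
        lines.append(f"{name}{sep}{amount}")
    return "\n".join(lines)


# Small duplication: two functions that repeat a few lines, but each one
# can be read, changed, or deleted without touching the other.
def render_csv(rows):
    lines = ["name,amount"]
    lines += [f"{name},{amount}" for name, amount in rows]
    return "\n".join(lines)


def render_table(rows):
    lines = ["name | amount"]
    lines += [f"{name.upper()} | {amount}" for name, amount in rows]
    return "\n".join(lines)
```

Both versions produce the same output for the same input, but every caller of `render_report` depends on all of its flags. After a third or fourth real use, a better-shaped abstraction may become visible; until then, the duplicated versions are arguably the cheaper option.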