Monday, February 27, 2017

Applying the DRY principle

This post was previously published in Spanish as "Aplicación del principio DRY".

I always keep in mind that in software development it is very important to keep each business concept in one place only. Code duplication is a problem that we should try to avoid or at least restrict and systematically remove when needed.

But sometimes we follow the DRY principle blindly, without keeping in mind that each decision has a cost. In this post, I will lay out some points that can help us decide when we should remove duplicates and when we should live with them.
  • We should differentiate between duplication of business concepts, rules, validations, flow, etc., and duplication at the implementation level (for example, small amounts of duplicated source code).
  • The duplication of business concepts and business definition should be avoided as much as possible.
  • Sometimes duplication at the code level is a hint that there is an abstraction waiting to be discovered, an emergent behavior of the system, or a pattern that is common in our application. It is important to minimize this duplication, but it is not a drama if "some" amount of it remains. Be very careful, though, not to create a premature abstraction. In my experience, premature abstractions are much worse than duplication.
  • If you develop using TDD it is better to duplicate code to reach green, and, once in green and well covered by the tests, refactor to eliminate the duplicates.
  • Depending on the language (C++, Python, Java...) there are some kinds of duplication that either have a high cost of elimination or don't have an idiomatic solution. In these cases, we should eliminate duplication only when the result is easier to understand (not only for you, but for the whole team). When in doubt, readability must always prevail.
  • It is important to know that when we eliminate duplication we usually create a common class, a library, a method or some other artifact that we can reference or use from several parts of our code. This creates a dependency between the client code and the reused code. Dependency is one of the strongest relationships between pieces of code and always comes with a significant cost. We should always keep this in mind when evaluating whether to eliminate duplication or not.
  • We should always depend on artifacts that are more stable than ourselves. That is: if we extract common code to a library, but the API of this library changes all the time, it means that we have created the wrong abstraction and that the maintenance cost will increase a lot.
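
As a minimal sketch of the first two points, assume a hypothetical shop with a free-shipping rule (every name and value below is invented for illustration). The business rule lives in exactly one place, while two validators that merely look alike stay separate because they represent different concepts:

```python
from decimal import Decimal

# Hypothetical business rule, defined once. Every caller depends on this
# single definition, so a change to the rule happens in one place.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")
STANDARD_SHIPPING_COST = Decimal("4.99")


def shipping_cost(order_total: Decimal) -> Decimal:
    """Return the shipping cost for an order (illustrative rule)."""
    if order_total >= FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return STANDARD_SHIPPING_COST


def checkout_total(order_total: Decimal) -> Decimal:
    """Checkout reuses the rule instead of repeating the threshold."""
    return order_total + shipping_cost(order_total)


def cart_banner(order_total: Decimal) -> str:
    """The cart banner also reuses the rule, so both always agree."""
    if shipping_cost(order_total) == 0:
        return "Free shipping!"
    missing = FREE_SHIPPING_THRESHOLD - order_total
    return f"Add {missing} more for free shipping"


# Implementation-level duplication: these two checks look the same today,
# but they are different concepts that may change for different reasons,
# so merging them now would be a premature abstraction.
def is_valid_username(value: str) -> bool:
    return 3 <= len(value) <= 20


def is_valid_coupon_code(value: str) -> bool:
    return 3 <= len(value) <= 20
```

If the threshold were repeated as a literal in both the checkout and the banner, the two could silently drift apart. The two validators, on the other hand, can diverge freely without anyone having to untangle a shared helper.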

In summary:

  • It is important to not have duplication of business concepts (low-level duplication is less important).
  • We should always evaluate the danger of creating a premature abstraction.
  • We should evaluate the trade-off between the cost of adding a new dependency and the maintainability gained by removing the duplication.
  • If readability is harmed by eliminating duplication, we are doing it wrong.
  • It is better to follow a process of use, use, reuse, and only later create the abstraction, instead of creating the abstraction directly.

If you can't afford to wait before creating an abstraction, or if you can't eliminate or modify it once you have detected that it is not the correct one, you are generating more complexity than you would by leaving a small duplication in place.
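
The "use, use, reuse" process can be sketched as follows. This is only an illustration with hypothetical report functions: the first two uses keep their duplicated formatting, and the helper `format_money` (an invented name) is extracted only when a third use confirms the pattern.

```python
# Steps 1 and 2 ("use, use"): two reports format amounts inline.
# The duplication is small and tolerated while the pattern is unclear.
def sales_line_v1(name: str, amount: float) -> str:
    return f"{name}: {amount:,.2f} EUR"


def refund_line_v1(name: str, amount: float) -> str:
    return f"{name}: {amount:,.2f} EUR"


# Step 3 ("reuse"): a third caller needs the same formatting, so the
# abstraction is extracted now, shaped by three real uses.
def format_money(amount: float, currency: str = "EUR") -> str:
    """Hypothetical helper extracted after the third use."""
    return f"{amount:,.2f} {currency}"


def sales_line(name: str, amount: float) -> str:
    return f"{name}: {format_money(amount)}"


def refund_line(name: str, amount: float) -> str:
    return f"{name}: {format_money(amount)}"


def invoice_total_line(amount: float) -> str:
    return f"Total: {format_money(amount)}"
```

Waiting for the third use costs two duplicated lines for a while, but the extracted helper then reflects what the callers actually need rather than a guess.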

Although the DRY principle may seem simple to understand, applying it, as always happens in software development, is neither simple nor systematic.


Freya said...

Hello Eduardo,
I always love reading your blog, and your rules and principles for creating software always inspire me to do things in a better way. I really enjoyed reading this post, Applying the DRY principle. Keep it up.

Freya, UK

Fran Reyes said...

"Depending on the language (C++, Python, Java...) there are some kinds of duplication.." Do you have any example? Thanks :)

eferro said...

Hi Fran
I was talking about the different idioms used in some languages... For example, if you are using C++, iterating over a collection and doing several operations is expected to happen in a for loop. And usually you don't try to abstract this iteration, it is a common idiom... But for the same chunk of code implemented in Clojure, the usual implementation tries to avoid the explicit iteration (using filter, map, reduce, etc.) or to abstract it using higher-order functions.
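
To illustrate the contrast (sketched here in Python rather than C++ or Clojure, with a made-up computation), the same logic can be written as an explicit loop or with higher-order functions. Which one counts as idiomatic depends on the language and the team:

```python
from functools import reduce


def total_even_squares_loop(numbers: list[int]) -> int:
    """Explicit iteration, the style expected in a C++-like for loop."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def total_even_squares_functional(numbers: list[int]) -> int:
    """Iteration abstracted away with filter, map and reduce."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, n: acc + n, squares, 0)
```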

It is a similar thing with Go error handling... it is repetitive... but it is an accepted duplication, and trying to remove it will confuse most Go developers. The same goes for Go not having generics and the amount of boilerplate that requires...

So, look at C++, Java and Go idioms: a lot of them include boilerplate code. But it is "expected" boilerplate and, sometimes, trying to remove it is counterproductive.

I hope it is clearer now... :)