Thursday, January 07, 2021

Small batches for the win / Continuous Delivery

In software product development, batch size deeply affects flow efficiency and resource utilization. In this post, we will explain the underlying principles that make small batches a good strategy when developing software.

Let’s start with some basic concepts:

  • Batch: A group of items that move together to the next step in the process. In our case, a batch is a group of changes that are deployed to production at the same time.
  • Item/change: Each of the individual units of work that compose a batch. In our case, any kind of change that affects our system, including new features, code improvements, configuration changes, bug fixes, and experiments.
  • Holding cost: The sum of the costs associated with delaying the deployment of each of the items. In short, the cost of delaying the feedback or the value delivered by each item: for example, the cost of delay of a new feature, or the cost of not putting a bugfix in production.
  • Transaction cost: The cost of executing a deployment (cost of people, cost of infrastructure, etc.).


Batch size / Direct cost

If we only take into account the transaction cost, the optimal solution is to make huge batches, as we only pay the transaction cost when we deploy. For example, deploying once a year.

If we only take into account the holding cost, the optimal solution is to use batches of a single item, so that we never delay the value or feedback of any item.

The reality is that if we try to optimize both variables at the same time, we face a U-curve optimization problem.


[Figure: U-curve of total cost as a function of batch size]




Beyond the optimal batch size the total cost only grows, and below it we pay a penalty due to the transaction cost. So a good strategy is to keep reducing the batch size until the transaction cost makes smaller batches inefficient.
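To make the trade-off concrete, here is a minimal sketch of the U-curve under a hypothetical cost model: a fixed transaction cost per deployment, amortized over the batch, plus a holding cost that each item accrues while waiting for the batch to ship. The constants are illustrative assumptions, not measured data.

```python
# Hypothetical cost model for the batch size U-curve (illustrative numbers).
TRANSACTION_COST = 100.0  # assumed fixed cost of executing one deployment
HOLDING_COST = 2.0        # assumed holding cost per item per unit of wait

def cost_per_item(batch_size: int) -> float:
    """Average total cost per item for a given batch size."""
    transaction = TRANSACTION_COST / batch_size    # amortized over the batch
    holding = HOLDING_COST * (batch_size - 1) / 2  # average wait grows with N
    return transaction + holding

for n in (1, 2, 5, 10, 25, 50, 100):
    print(f"batch size {n:>3}: cost/item = {cost_per_item(n):7.2f}")
```

With these assumptions, the cost per item falls until a batch size of about 10 and then grows again: the U-curve.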

Batch size / Risk management


When we develop software, each component is coupled with other components of the system. That is, any part of the code has relations with other parts: static or runtime dependencies, shared messages, data model dependencies, etc. If we invest in good internal quality we will minimize unneeded coupling, but in the worst case, each part of the system can potentially be coupled with any other part.

When we create a batch of N changes to be deployed at the same time, the potential interactions are:

  • Each change with the current version of the system: N interactions.
  • Each change with the rest of the changes in the batch, that is, all the 1-to-1 relations between changes: (N*(N-1))/2 interactions.


I = N + (N*(N-1))/2

where I is the number of potential interactions and N is the batch size.

While this formula describes the number of potential interactions, in practice not all of those combinations are possible; it is an upper bound.

[Figure: potential interactions (I) as a function of batch size (N)]




The basic problem with batch size in software development is that this universe of potential interactions (I) grows very fast: it is a quadratic function of the number of changes.
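A quick sketch of the formula above makes this growth visible:

```python
def interactions(n: int) -> int:
    """Potential interactions for a batch of n changes: I = N + (N*(N-1))/2."""
    return n + n * (n - 1) // 2

for n in (1, 5, 10, 25, 50, 100):
    print(f"batch size {n:>3}: {interactions(n):>5} potential interactions")
```

Going from 10 changes (55 interactions) to 100 changes (5,050 interactions) multiplies the universe by almost 100, not by 10.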

We can quickly conclude that the following problems grow with the size of this universe, that is, quadratically:

  • The probability of an error or a negative impact (functional, performance, cost, etc).
  • The cost of detecting/troubleshooting an error.


At the same time, we can see that the size of the batch (the number of changes) affects the following linearly:

  • The number of teams/people required to coordinate/synchronize the deployment (code freeze, communication, testing, etc).
  • The likelihood of including a change that is difficult to revert (data model changes, high-volume migrations, etc.).

 
Let's illustrate how fast the problem grows with an example. If 1 in 100 interactions produces an error, we can see how quickly the probability of having at least one error grows:



[Figures: probability of at least one error for batch sizes 5, 25, and 50]

With 25 changes, the chance of having at least one error is already above 88%, and with 50 changes an error is nearly certain (over 99%). And we previously saw that these errors are harder to diagnose and more likely to be difficult to revert.
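Here is a minimal sketch of that calculation, assuming (as in the example) that each potential interaction independently has a 1-in-100 chance of introducing an error:

```python
# Probability of at least one error in a batch, under the assumed
# 1-in-100 error rate per potential interaction.
P_ERROR = 0.01  # assumed probability of an error per interaction

def interactions(n: int) -> int:
    """Potential interactions for a batch of n changes: I = N + (N*(N-1))/2."""
    return n + n * (n - 1) // 2

def p_at_least_one_error(n: int) -> float:
    """P(at least one error) = 1 - (1 - p)^I."""
    return 1 - (1 - P_ERROR) ** interactions(n)

for n in (1, 5, 25, 50):
    print(f"batch size {n:>2}: P(>=1 error) = {p_at_least_one_error(n):.1%}")
```

Under these assumptions, a batch of 5 changes already has about a 14% chance of containing an error, a batch of 25 is above 95%, and a batch of 50 is practically certain to contain one.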

So clearly, for software development, increasing the size of the deployments (the batch size) increases the associated risk much more than linearly: the risk of having an outage, losing availability, and frustrating our customers.

Batch size / Indirect cost and consequences

In the previous sections, we have seen the direct impact of the batch size on a product development process:
  • An increase in the total cost for a batch size greater than the optimal size.
  • A near quadratic growth of the risk associated with a deployment. 
  • An increasing number of production outages.

In addition to those direct costs, there are other indirect costs and consequences when the batch size is large:
  • Lots of multitasking, with the corresponding loss of productivity and focus. The normal flow of work is frequently interrupted by problems coming from previous deployments.
  • A tendency to disconnect from the operation and the impact of our changes in production (e.g., when deploying something you developed several weeks ago).
  • Low psychological safety, because of the amount of risk and the probability of outages associated with the way of working.
  • Worse product decisions because there are fewer options to get fast feedback or to design new experiments to get more feedback.
  • Lack of ownership, a consequence that derives from the previous points.

Conclusions

As we have shown, the size of the batch has many effects:
  • Risk and outage probability grow with the number of changes included in a deployment, and worse than linearly.
  • Our batch size should be as small as our transaction cost (deployment cost) allows.
  • Large batches generate important indirect costs and consequences (lack of ownership, multitasking, low psychological safety, etc).

Consequently, we should invest as much as possible in working in small batches. If we have already reached the optimal batch size for our current transaction cost but still suffer the same problems as before, the next step is to invest heavily in lowering the transaction cost (deployment automation, independent deployments per team, deployment time, etc.).

In short:

Small batches -> Faster feedback
Small batches -> Customer value delivered sooner
Small batches -> Lower direct cost
Small batches -> Lower deployment risk
Small batches -> Fewer errors
Small batches -> Lower mean time to recover
Small batches -> Better psychological safety


Small batches for the win!

The good news is that there is already an engineering capability focused on working in small batches: Continuous Delivery!

"The goal of continuous delivery is to make it safe and economic to work in small batches. This in turn leads to shorter lead times, higher quality, and lower costs." 

The importance of working in small batches has been validated statistically in the studies conducted by DevOps Research and Assessment (DORA) since 2014. You can see the details of these studies in the book Accelerate (Nicole Forsgren, Jez Humble, Gene Kim).



Saturday, January 02, 2021

Books I've read lately 2020




These are the books I've read lately:





Thursday, December 24, 2020

Good talks/podcasts (Dec 2020 II)

 


These are the best podcasts/talks I've seen or listened to recently:

  • AgileByExample 2017: Donald Reinertsen - Making Money with Variability (Donald Reinertsen) [Lean Product Management, Product, Product Strategy] Interesting talk where Donald explains why, unlike in Lean Manufacturing, we can exploit the variability to innovate and achieve better outcomes in product development.
  • Beyond Features: rethinking agile software delivery (Dan North) [Agile, Inspirational, Lean, Lean Software Development] (⭐⭐⭐⭐⭐) Maybe we've been thinking about delivery all wrong. Maybe features aren’t the point after all. Maybe there are other kinds of work that we should recognise, schedule and track as first class citizens. Maybe this could take some of the uncertainty out of the delivery process, and give us back our sanity. Maybe.
  • GOTO 2020 • When To Use Microservices (And When Not To!) (Sam Newman, Martin Fowler) [Architecture, Architecture patterns, Evolutionary Design, Microservices] Interesting conversation about tradeoffs to be considered for using a microservices architecture.
  • SLO Theory: Why the Business Needs SLOs (Danyel Fisher, Nathen Harvey) [Observability, Operations, Technical Practices] (⭐⭐⭐⭐⭐) Great explanation of SLIs, SLOs, and error budgets, and how to introduce them to improve our production operations.
Reminder: all these talks are interesting even if you just listen to them without watching.


Saturday, December 19, 2020

Good talks/podcasts (Dec 2020 I)


 

These are the best podcasts/talks I've seen or listened to recently:

  • Cadence: Uber’s Workflow Engine with Maxim Fateev (Maxim Fateev) [Architecture, Scalability] Interesting podcast about Cadence, a scalable workflow engine.
  • O11ycast - Ep. #12, Speed of Deployment with Rich Archbold of Intercom (Rich Archbold) [Continuous Delivery, Devops, Engineering Culture, Platform, Technical Practices] In episode 12 of O11ycast, Charity Majors and Liz Fong-Jones speak with Rich Archbold of Intercom. They discuss the crucial importance of timely shipping, high-cardinality metrics, and the engineering value of running less software.
  • The Art of Modern Ops · Camille Fournier on Building Internal Kubernetes Platforms (Camille Fournier) [Engineering Culture, Platform, Platform as a product] In this latest episode of the “Art of Modern Ops” Camille Fournier (@skamille), Managing Director at Two Sigma and Cornelia Davis (@cdavisafc), CTO at Weaveworks discuss what it takes to build an internal platform within your organization.
  • Continuous Delivery and Data Management (Dave Farley) [Continuous Delivery, Technical Practices, XP] In this episode, Dave Farley describes the basics of a Continuous Delivery/DevOps approach for data. How do we apply the software engineering disciplines of CD to data? How do we configuration-manage, deploy, migrate, and test the data-related aspects of our systems?
  • Continuous Delivery simply explained (Dave Farley) [Continuous Delivery, Technical Practices, XP] (⭐⭐⭐⭐⭐) In this episode, Dave Farley explains Continuous Delivery, exploring the fundamentals in a way that helps us understand the real value of this advanced approach to software development, this engineering-for-software.
  • Continuous Deployment or Continuous Delivery? | When To Release (Dave Farley) [Continuous Delivery, Technical Practices, XP] In this episode, Dave Farley explores the differences and helps you to decide which approach is best for you - continuous delivery vs continuous deployment. 

 

Reminder: all these talks are interesting even if you just listen to them without watching.


Monday, November 09, 2020

Good talks/podcasts (Nov 2020 I)


 

These are the best podcasts/talks I've seen or listened to recently:

  • Enterprise Architecture = Architecting the Enterprise? (Gregor Hohpe) [Architecture, Architecture patterns, Engineering Culture] (⭐⭐⭐⭐⭐) This session takes a serious but light-hearted look at the role of enterprise architects in modern IT organizations.
  • You Must Be CRAZY To Do Pair Programming (Dave Farley) [Agile, Technical Practices, XP] (⭐⭐⭐⭐⭐) One of the best descriptions I have heard of the usefulness of this practice. Dave provides pair programming examples, describes some pair programming best practices, and challenges some thinking about pair programming patterns and anti-patterns.
  • The GIST Framework (Itamar Gilad) [Lean Product Management, Product, Product Discovery, Product Strategy] In this talk from #MTP Engage Manchester consultant Itamar Gilad takes us through his GIST (goals, ideas, steps, tasks) framework.
  • Talking Serverless #27 - Gojko Adzic Partner at Neuri Consulting (Gojko Adzic) [Architecture, Microservices, Serverless]
  • Reboot Your Team (Christina Wodtke) [Engineering Culture, Product, Product Team, Teams] (⭐⭐⭐⭐⭐) Christina told us how to reboot the team you have, or build a healthy one from the ground up. 

  • The Product-Led Journey (John Cutler) [Lean Product Management, Product, Product Discovery, Product Strategy] Interesting insights into the changes needed to become a product led company.
  • Scale, Microservices and Flow (James Lewis) [Agile, Architecture, Engineering Culture, Microservices, Teams] Interesting presentation explaining the relationship between high-performing teams, flow, and complex adaptive systems, and how the organization of teams affects organizational scalability.

 

 

Reminder: all these talks are interesting even if you just listen to them without watching.


Sunday, November 08, 2020

Small Safe Steps workshop

 


 

If your team wants better strategies to manage large and risky changes, improve their slicing skills, or practice techniques such as parallel changes and branch by abstraction, take a look at this Small Safe Steps workshop.

I have prepared information so that anyone can easily facilitate the workshop.

Please use the material, run the workshop, and give me feedback for improvement. And if you need advice on running or adapting it, contact me; I will be happy to help.

 

Small Safe Steps workshop

 
