Sunday, May 15, 2022

Good talks/podcasts (May 2022 I)


These are the best podcasts/talks I've seen/listened to recently:
  • Full Cycle Developers at Netflix (Greg Burrell) [Architecture, Devops, Operations] [Duration: 0:48:00] (⭐⭐⭐⭐⭐) This talk presents Netflix's journey from siloed teams to its Full Cycle Developer model for building and operating services. Greg discusses the various approaches they've tried, the motivations that pushed them to keep evolving, and the lessons learned along the way.
  • Building DevX Teams, my story (Cirpo Cinelli) [Developer Productivity, Devex, Teams] [Duration: 0:42:00] (⭐⭐⭐⭐⭐) In this presentation, Cirpo talks about his past 4 years of experience setting up a DevX team from scratch, the main challenges, the pain, the gain, and the lessons learned.
  • How To Avoid Big Upfront Design (Dave Farley) [Architecture, Continuous Delivery, Evolutionary Design] [Duration: 0:18:00] In this episode, Dave Farley describes how to avoid big up-front design, but also how to do enough design to make progress safely, giving us the freedom to learn and adapt our designs as we go.
  • The Root Causes of Product Failure (Marty Cagan) [Product, Product Strategy, Product Team] [Duration: 0:49:00] Interesting talk on how the best companies and product teams work.
  • TDD Discipline - Thinking Ahead Without Coding Ahead (Jason Gorman) [Technical Practices, XP, tdd] [Duration: 0:11:00] Jason Gorman demonstrates a very common mistake TDD newbies make: writing the code you *believe* is required instead of just the code needed to pass your tests.
Reminder: all these talks are interesting even just listening to them.


Monday, April 18, 2022

System: Control its evolution or be its slave

When we develop software (digital products), we usually deal with complex systems. By definition, we are not prepared for the emergent behaviors of these types of systems. As a result, it is hard to predict their future evolution and very easy to lose control over them over time.

If we don't control complexity, it will become increasingly difficult to adapt the system to what we need, and we will move from developing and evolving the system to spending more time reacting to it as it evolves. We will become slaves to the system rather than controlling its evolution. We will be spending our time working on production incidents, scalability improvements under pressure, and being in a firefighting mode instead of adding value to the product in a sustainable and continuous manner. Needless to say, the business impact is terrible (poor efficiency, unhappy customers, difficulty in defining strategy, etc).

In my experience, we can keep the system under control by using appropriate tactics.

Tactics to control system evolution

I have used the following tactics to control the evolution of systems I have worked on:

  • Good basic hygiene: automated tests, minimum observability, product instrumentation, etc.
  • Simple design based on current needs: Using incremental design and lean product development we can develop simple solutions that fulfill our current needs but allow us to evolve the system in the future.
  • Evolutionary architectures: Invest in an architecture that can evolve and that allows us to postpone decisions (reversible decisions, fitness functions, observability, etc.).
  • Acceptance criteria: Make the acceptance criteria explicit, so that we all know what we consider sufficient hygiene (testing, observability, performance, etc.).
  • Define product limits: Define product limits that allow us to convert problems commonly considered "technical" into business/product problems. For example, instead of talking about scalability as an abstract concept, make it concrete at the product level with clear limits (number of concurrent users/requests, maximum response times, etc.).
  • Clear boundaries and ownership: Look for boundaries and use them to assign clear ownership of each identified part. DDD bounded contexts and value streams can help identify these boundaries within the system. Clear ownership of each part of the system is fundamental, since we must remember that we work on complex socio-technical systems.
  • Work in small safe steps: Controlling the evolution of the system is complex and requires us to constantly rethink and adapt. We can only accomplish this if we work in small safe increments of functionality that allow us to change direction without leaving features half-integrated.

The use of these tactics is usually adequate, but as with everything in our profession, it depends: there are times when a little upfront planning and attention to future needs would be beneficial. See "It depends" / The development Mix (Product, Engineering, Hygiene).

Below I include several examples of how I have utilized some of these tactics to control the evolution of the system and not be at its mercy.


System Load / Scalability evolution

Having performance or stability/availability problems when a system is under high load is very common. The most common approaches for this evolution dimension are:

  • Doing nothing and reacting when we discover the first problems (potentially damaging the relationship with our users due to production incidents).
  • Solving “imaginary” scalability problems by preparing/designing our systems for a load that is not real, generating a lot of waste. Sometimes engineering chooses this option by default simply for the pleasure of solving an interesting problem.

I think there is a more sensible approach that allows us to control evolution without generating waste (see Lean software development). In this case, we need some metrics about the current load and a rough idea of the maximum load our system can support without degrading.

With this information, we can define a soft limit (Product Limit) for the load, as the maximum supported load minus a threshold that gives us time to evolve the system to support more load.

As we can see in the example diagram, we receive a notification at t1 informing us that we have reached the point where we need to improve the scalability of the system. Between t1 and t2, we improve the scalability to support a new maximum load, redefine the soft limit, and repeat the process when the new limit is reached (t3).
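The soft limit idea can be sketched in a few lines. This is a minimal illustration, assuming load is measured in requests per second and using made-up numbers; the names LoadLimits and should_improve_scalability are hypothetical, and in practice the check would live in an alerting rule of your observability stack rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class LoadLimits:
    """Hypothetical container for the load figures described above."""
    max_supported_rps: float  # measured maximum load without degradation
    safety_margin_rps: float  # headroom reserved to evolve the system in time

    @property
    def soft_limit_rps(self) -> float:
        # Soft limit (Product Limit) = maximum supported load minus a threshold.
        return self.max_supported_rps - self.safety_margin_rps


def should_improve_scalability(current_rps: float, limits: LoadLimits) -> bool:
    """True when the current load reaches the soft limit (t1 in the diagram)."""
    return current_rps >= limits.soft_limit_rps


limits = LoadLimits(max_supported_rps=1000, safety_margin_rps=300)
print(limits.soft_limit_rps)                    # 700
print(should_improve_scalability(650, limits))  # False
print(should_improve_scalability(720, limits))  # True

# After improving the system (between t1 and t2), measure the new maximum
# and redefine the soft limit before repeating the cycle (t3).
limits = LoadLimits(max_supported_rps=2500, safety_margin_rps=600)
print(limits.soft_limit_rps)                    # 1900
```

The size of the safety margin is the real design decision: it should cover the time the team needs to deliver the next scalability improvement at the current growth rate.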

Storage volume growth

When we store information, it is very common for the volume of data to grow until the selected technology is no longer valid or we start to have problems (high cost, backup problems, etc.).

As in the previous example, we can do nothing and react after we start to have problems or use a tactic to control the evolution/growth.

In this case, if we need to keep all the historical information, we can follow a similar approach to the previous scalability example. But if we don’t need to keep historical information, we can create a soft limit to detect when we should start developing an automated data purge process that periodically removes obsolete information. This approach is very lean because it allows us to postpone implementing the purge process until the last responsible moment.
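A minimal sketch of both pieces, assuming storage measured in GB and a hypothetical 90-day retention policy: storage_alert only signals that it is time to build the purge process, and purge_obsolete shows what that process could do once it exists. The Record type and all names are illustrative; a real purge would typically be a scheduled job deleting rows in the database rather than filtering a list in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Record:
    id: int
    created_at: datetime


def storage_alert(used_gb: float, capacity_gb: float, margin_gb: float) -> bool:
    """True once used storage reaches the soft limit (capacity minus margin).

    This is the signal to start building the purge process, not to run it.
    """
    return used_gb >= capacity_gb - margin_gb


def purge_obsolete(records: List[Record], now: datetime,
                   retention: timedelta) -> List[Record]:
    """Keep only the records inside the retention window (hypothetical policy)."""
    cutoff = now - retention
    return [r for r in records if r.created_at >= cutoff]


now = datetime(2022, 4, 18)
records = [
    Record(1, datetime(2021, 1, 10)),
    Record(2, datetime(2022, 3, 1)),
    Record(3, datetime(2022, 4, 17)),
]
print(storage_alert(used_gb=820, capacity_gb=1000, margin_gb=200))  # True
kept = purge_obsolete(records, now, retention=timedelta(days=90))
print([r.id for r in kept])  # [2, 3]
```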

Controlled errors and quotas

One of the easiest ways to lose control of a system is to have no controlled way of handling errors. When we don’t control errors, we risk cascading errors that affect more users. This is one of the most common ways of losing control of the system.

One of the first tactics that helps us maintain control of a system is to detect unexpected runtime errors, avoid crashing, at least send a message to the user, and prevent the error from impacting other users as much as possible.
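As a minimal sketch of this tactic, assume each user request runs through a single entry point (handle_request is a hypothetical name). Unexpected exceptions are logged for observability, the user receives a controlled message, and the process keeps serving the other users instead of crashing.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("request_handler")


def handle_request(user_id: str, action: Callable[[], Any]) -> Dict[str, Any]:
    """Run one user's request so that an unexpected error stays contained."""
    try:
        return {"status": "ok", "result": action()}
    except Exception:
        # Record the full traceback so we detect the error, then answer the
        # user with a controlled message instead of letting the error spread.
        logger.exception("Unexpected error for user %s", user_id)
        return {"status": "error",
                "message": "Something went wrong. We have been notified."}


def failing_action() -> Any:
    raise ValueError("corrupted input")


responses = [
    handle_request("alice", lambda: 42),
    handle_request("bob", failing_action),
    handle_request("carol", lambda: "done"),
]
print([r["status"] for r in responses])  # ['ok', 'error', 'ok']
```

Only bob's request fails; alice and carol are unaffected. Isolating failures between users across services usually needs more than this (timeouts, bulkheads, circuit breakers), which this sketch does not cover.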

Another good tactic is to define quotas per user (Product Limits) and use these quotas to throttle requests and show a warning message to the user. In the message, we can inform the user to contact support and use this opportunity to get more information about how the user is using the system or even offer a better SLA to the user.
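A per-user quota can be sketched as a fixed-window counter. This is one possible technique under stated assumptions (a hypothetical limit of N requests per window, timestamps in seconds, in-memory state); a production version would usually keep the counters in shared storage and expire old windows, which this sketch does not do.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Tuple


class UserQuota:
    """Fixed-window request quota per user (a hypothetical Product Limit)."""

    WARNING = ("You have reached your request quota. "
               "Please contact support if you need a higher limit.")

    def __init__(self, max_requests_per_window: int, window_seconds: int):
        self.max_requests = max_requests_per_window
        self.window_seconds = window_seconds
        # (user_id, window index) -> requests seen; old windows are never
        # cleaned up in this sketch.
        self.counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def check(self, user_id: str, now_seconds: float) -> Tuple[bool, Optional[str]]:
        """Return (allowed, warning). Throttled requests get the warning."""
        key = (user_id, int(now_seconds // self.window_seconds))
        if self.counts[key] >= self.max_requests:
            return False, self.WARNING
        self.counts[key] += 1
        return True, None


quota = UserQuota(max_requests_per_window=3, window_seconds=60)
results = [quota.check("alice", now_seconds=t)[0] for t in (0, 10, 20, 30)]
print(results)                                  # [True, True, True, False]
print(quota.check("bob", now_seconds=30)[0])    # True (quotas are per user)
print(quota.check("alice", now_seconds=61)[0])  # True (a new window starts)
```

Pointing the warning at support is what turns the throttle into a feedback channel: every request for a higher limit tells us something about how the system is being used.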

Committed capacity

When we work in a multitenant SaaS environment, it is very common for the resources used by all of our customers combined to increase very fast. One interesting tactic to maintain control of the system's evolution in this context is to define how much capacity (of a resource) we commit to each type of user (Product Limits). Even knowing that customers will not use their full capacity, these definitions allow us to identify the worst-case load scenarios we could face.

In the example diagram, we define the different quotas for each type of user (basic, professional, enterprise). With these definitions and the number of users of each type, we can calculate the maximum committed load at any moment and use this information to make decisions.
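The calculation is simple arithmetic: the sum over plans of the committed quota times the number of users on that plan. A minimal sketch, with hypothetical quotas (for example, requests per minute) and made-up user counts:

```python
from typing import Dict

# Hypothetical committed quota per plan (e.g. requests per minute).
COMMITTED_QUOTA: Dict[str, int] = {
    "basic": 100,
    "professional": 1_000,
    "enterprise": 10_000,
}


def max_committed_load(users_per_plan: Dict[str, int]) -> int:
    """Worst-case load if every customer used their full committed capacity."""
    return sum(COMMITTED_QUOTA[plan] * count
               for plan, count in users_per_plan.items())


users = {"basic": 5_000, "professional": 300, "enterprise": 12}
committed = max_committed_load(users)
print(committed)  # 920000  (500000 + 300000 + 120000)

# Compare against what the system can actually serve today (assumed figure).
current_capacity = 600_000
print(round(committed / current_capacity, 2))  # 1.53 -> overcommitted by ~53%
```

Some overcommitment is normal because customers rarely use all their capacity at once; the point is to make the ratio visible and decide consciously how much of it we accept.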

This tactic is used frequently by AWS. For each type of resource, they define default limits, and to extend these limits you need to contact support. AWS can use the information from support requests to make very accurate capacity plans and make better product decisions.

Modular monolith

Building a system from scratch is not an easy task, and it is very easy to lose control of the system as it grows (more developers / more functionality). The two most common failures are:

  • Growing a monolith organically without committing effort to modularization and ending up with a big ball of mud.
  • Adopting a microservices architecture before it is time to scale, which greatly increases complexity without obtaining any of the benefits of this architecture.

In general, better results can be achieved by developing a monolith, but organizing the code and data into internal modules (modular monolith). As we become familiar with the system and domain, we organize it internally into modules (in alignment with bounded contexts), so we can maintain control of the architecture throughout its evolution. 

When we see the need, we can extract some of these modules into independent services.

It is essential to create mechanisms that help us maintain this modularity during evolution: architecture tests that prevent disallowed dependencies between modules, separate DB schemas for each module, and periodic dependency analysis.
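An architecture test can be as simple as parsing each module's imports and comparing them with a list of allowed dependencies. A minimal sketch, assuming a Python modular monolith with hypothetical top-level packages (billing, catalog, orders, shared); dedicated tools such as ArchUnit on the JVM do this more thoroughly, and a real test would read the files of each package instead of an inline string.

```python
import ast
from typing import Dict, Set

# Hypothetical modules of a modular monolith and the dependencies we allow.
ALLOWED_DEPENDENCIES: Dict[str, Set[str]] = {
    "billing": {"shared"},
    "catalog": {"shared"},
    "orders": {"catalog", "billing", "shared"},
    "shared": set(),
}


def imported_modules(source: str) -> Set[str]:
    """Top-level package names imported (absolutely) by a piece of source code."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def dependency_violations(module: str, source: str) -> Set[str]:
    """Imports of internal modules that the rules above do not allow."""
    internal = imported_modules(source) & set(ALLOWED_DEPENDENCIES)
    return internal - ALLOWED_DEPENDENCIES[module] - {module}


catalog_source = """
import json
from shared.money import Money
from orders.service import place_order  # catalog must not depend on orders
"""
print(dependency_violations("catalog", catalog_source))  # {'orders'}
```

Running a check like this in CI turns the module boundaries into an executable rule, so a disallowed dependency fails the build instead of silently eroding the modularity.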

In addition, we can use these modules to assign ownership of each to a team, allowing each team to work independently.


When we lose control over the evolution of the system:

  • We become inefficient and fail to create value.
  • We lose customer trust (incidents, errors, etc.).
  • We react to problems instead of being able to follow our product strategy.

The business impact is huge. Therefore, it is essential to manage complexity and keep evolution under our control.

The tactics I discuss in this article are quite simple and focus on detecting needs early enough that we can react in a planned way (rather than under pressure). Moreover, they do not require large investments; in fact, they allow us to take a lean approach and avoid over-engineering. But they do require good hygiene practices (testing, observability, reasonable quality, etc.).



The post has been improved based on feedback from:

Thank you very much to all of you

Saturday, April 09, 2022

Good talks/podcasts (April 2022 I)

These are the best podcasts/talks I've seen/listened to recently:
  • Autonomy, mastery, purpose (Drive summary) (Dan Pink) [Inspirational, Management] [Duration: 0:03:00] (⭐⭐⭐⭐⭐) Daniel Pink shares a study about what truly motivates employees. Excerpted from his talk on "motivation."
  • The puzzle of motivation (Dan Pink) [Inspirational, Management] [Duration: 0:18:00] (⭐⭐⭐⭐⭐) Dan Pink examines the puzzle of motivation, starting with a fact that social scientists know but most managers don't: Traditional rewards aren't always as effective as we think.
  • The secret to giving great feedback (LeeAnn Renninger) [Management] [Duration: 0:05:00] Cognitive psychologist LeeAnn Renninger shares a scientifically proven method for giving effective feedback.
  • Making Badass Developers (Kathy Sierra) [Developer Productivity, Devex, Inspirational] [Duration: 0:23:00] Interesting talk for understanding how cognitive load works, how we learn, and how we should take these concepts into account to improve our experience as developers.
  • Responsible Engineers and Outcomes with Mary and Tom Poppendieck (Mary Poppendieck, Tom Poppendieck) [Agile, Engineering Culture, Lean Product Management, Lean Software Development] [Duration: 0:49:00] Mary and Tom Poppendieck discuss "Responsible Engineers and Outcomes". They walk through concrete examples of what it looks like for lean teams to maximize outcomes and minimize outputs. While exploring these examples, they touch on customer interaction, feedback, the responsible engineer, and the single-threaded leader.
  • Production - Designing for Testability (Michael Bryzek) [Continuous Delivery, Devops, Engineering Culture, Testing in production, testing] [Duration: 0:50:00] (⭐⭐⭐⭐⭐) Michael Bryzek explores what it’s like to build quality software with no development, QA, or staging environments. He includes a deep dive into “verification in production” and what it really takes to build software that can safely be tested continuously in production.
  • Keynote: Creating a Holistic Developer Experience (Jasmine James) [Developer Productivity, Devex] [Duration: 0:15:00] (⭐⭐⭐⭐⭐) Great talk for understanding what developer experience is.
  • OOP is Dead! Long Live OODD! (David West) [OOP, Software Design] [Duration: 1:07:00] Interesting talk about object orientation, the history and how in many cases we don't understand it correctly. It's a bit ranty, but very interesting and inspiring in any case.
Reminder: all these talks are interesting even just listening to them.


Thursday, March 31, 2022

Round Table: Collaboration between product and development profiles (Code Sherpas)

On March 24, I had the pleasure of taking part in a round table on "Collaboration between product and development profiles", where I was able to learn and share with Marta Manso, María Granadino, Isabel Garrido, and Cristina Verdi.

Thanks to Code Sherpas for organizing and facilitating the event.

Monday, March 28, 2022

Our DevOps Journey @ ClarityAI [2022 03 DevOps Lisbon]

A few days ago (2022-03-14), I had the pleasure of speaking at DevOps Lisbon about the evolution of the engineering organization at ClarityAI over the last two years. In my presentation, I examined the evolving DevOps culture as well as strategies for structuring teams according to Team Topologies ideas.
Thanks to DevOps Lisbon for the invitation and to all the participants for their attention.


Slide deck

Original Doc (with notes)


Sunday, March 13, 2022

Good talks/podcasts (March 2022 II)


These are the best podcasts/talks I've seen/listened to recently:
  • Small Batches. The four types of problems (Adam Hawkins) [Lean, Mental models] [Duration: 0:07:00] A short, useful episode for understanding the classification of problems in Lean (troubleshooting, gap from standard, target condition, and open-ended problems).
  • Ten (Hard-Won) Lessons of the DevOps Transition (Randy Shoup) [Devops, Engineering Culture, Inspirational] [Duration: 0:26:00] (⭐⭐⭐⭐⭐) This talk discusses the cultural change required to adopt a devops mentality. Excellent advice and warnings derived from Randy's experience leading teams at eBay, Google, and KIXEYE.
  • Don't Get Down-Leveled or How to Tell a Good Story (From a Principal at Amazon) (Meta) [Engineering Career] [Duration: 0:15:00] An excellent guide on how to tell a story during a behavioral interview.
  • CTO Craft 2021: Cross the River by Feeling the Stones (Simon Wardley) [Product Strategy, Technology Strategy] [Duration: 0:30:00] In this session, Simon examines the issue of situational awareness. Using examples from government and the commercial world, Simon explores how we can map our environment, identify opportunities to exploit and learn to play the game.
  • Frozen DevOps? Team Topologies Comes to the Rescue! - DevOpsDays Poznań 2021 (Manuel Pais) [Devops, Engineering Culture, Teams, team topologies] [Duration: 0:36:00] In this talk Manuel covers the self-imposed limitations of blindly following some “myths” around DevOps. Most organizations are stuck in the "frozen middle" of DevOps evolution due to a lack of organizational sensemaking. They must think beyond technical capabilities to unleash the potential of their teams to deliver with greater autonomy and a sense of purpose.
  • Expert Talk: DevOps & Software Architecture GOTO 2021 (Dave Farley, Simon Brown) [Architecture, Continuous Delivery, Inspirational] [Duration: 0:40:00] (⭐⭐⭐⭐⭐) In a world where software architecture is evolving rapidly, we are confronted with new challenges. Simon Brown, Dave Farley, and Hannes Lowette cover some of the recent trends in software architecture touching on hot topics such as DevOps and how to deal with complexity.
Reminder: all these talks are interesting even just listening to them.