Sunday, November 14, 2021

Improve or Die

A software development/product development organization should always be learning and improving.

When an organization is not learning or improving, it is going backward. Software development is a complex socio-technical system formed by several interrelated reinforcing loops. Some of the loops are positive (virtuous cycles) and some negative (vicious cycles), but in such a complex system it is difficult to find any balance, so in general, we are always moving. 

So the question is, in which direction? Are we learning and improving as a team, or are we dying and falling behind?

Even if we managed to maintain a continuous flow of development with stable quality and speed (which is impossible), the whole ecosystem around us continues to improve and advance, so even in that case, we would be losing ground.

In general, the reinforcing loops are generated by things that compound with time or volume. 

For example, these are some things with negative compound effects: 

  • Complexity and basal cost of the product (cumulative features)
  • Quality problems.
  • Technical debt (if not managed).

Virtuous cycles examples:

  • Continuous delivery requires quality and small batches, which removes silos, improves ownership, etc.
  • Product team ownership requires autonomy and product instrumentation, which leads to learning from customers and generating more value, etc. 
  • etc

Vicious cycles examples:

  • Unmanaged technical debt removes capacity from the team, generating more pressure, which generates more technical debt, etc.
  • Accidental complexity makes the code difficult to understand, so it generates more bugs, more pressure on the team, and poor solutions with even more accidental complexity, etc.
  • A bad deployment process generates frustration, so we tend toward larger batches, which are riskier, so we have more problems, generating even more frustration, etc.
  • etc.
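As a toy illustration of how these loops compound, here is a minimal sketch (with entirely hypothetical numbers and a deliberately simplistic model) of the unmanaged-technical-debt cycle: debt taxes the team's capacity, the resulting shortfall creates pressure, and pressure produces shortcuts that add more debt.

```python
# Toy simulation (hypothetical numbers) of the unmanaged-technical-debt loop:
# debt consumes capacity, the lost capacity creates pressure, and pressure
# adds more debt, compounding sprint after sprint.

def simulate(sprints=8, capacity=10.0, debt=1.0, drag=0.1, pressure_debt=0.5):
    history = []
    for _ in range(sprints):
        effective = capacity - drag * debt  # debt taxes the team's capacity
        shortfall = capacity - effective    # pressure from undelivered work
        debt += pressure_debt * shortfall   # shortcuts add even more debt
        history.append(round(effective, 2))
    return history

print(simulate())  # effective capacity shrinks every sprint
```

The exact parameters are invented; the point is the shape of the curve: with no intervention, the loss of capacity accelerates instead of stabilizing.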

When we have several of these vicious cycles, it is easier than it seems to fall into a downward spiral from which we cannot get out.

So, are you investing in breaking the vicious cycles of poor quality, high resource usage, and unmanaged technical debt? Or are you investing in improving your virtuous cycles of working in small batches, with ownership and high quality?

Are you improving, or are you dying and falling behind?


And if the problem is that you don't know what a high-performance technology organization should look like, you are in luck; we now have information on what it should look like (the Accelerate book).


Related:


Sunday, November 07, 2021

Good talks/podcasts (November 2021 I)



These are the best podcasts/talks I've seen/listened to recently:

  • Debt Metaphor (Ward Cunningham) [Inspirational, Technical Practices, Technology Strategy, XP] [Duration: 0:05:00] (⭐⭐⭐⭐⭐) Ward Cunningham reflects on the history, motivation and common misunderstanding of the "debt metaphor" as motivation for refactoring.
  • EP 47: How to scale engineering processes w/ Twitter's VP of Engineering (Maria Gutierrez) [Engineering Career, Engineering Culture, leadership] A very interesting interview with Maria Gutierrez. Great lessons about team management, building a company culture, hiring, and mentorship.
  • Getting Started With Microservices (Dave Farley) [Architecture, Architecture patterns, Continuous Delivery] In this episode, a microservices tutorial, Dave Farley describes the microservices basics that help you to do a better job. He describes three different levels that we need to think about when designing a service and offers his advice on how to focus on the right parts of the problem to allow you to create better, more independent, services, based on Dave’s software engineering approach.
  • Industry Keynote: The DevOps Transformation (Jez Humble) [Agile, Continuous Delivery, Devops, Engineering Culture, leadership] (⭐⭐⭐⭐⭐) In this talk Jez will describe how to implement devops principles and practices, how to overcome typical obstacles, and the outcomes DevOps enables. A must-see talk.
  • Lunch & Learn How to Misuse DORA DevOps Metrics (Bryan Finster) [Devops, Engineering Culture, leadership] Interesting presentation in which Bryan describes an agile/devops transformation, telling us about mistakes and successes. Interesting learnings, tips, and ideas.
Reminder: all these talks are interesting even if you just listen to them.

Related: 

    Wednesday, October 27, 2021

    "It depends" / The development Mix (Product, Engineering, Hygiene)

    We already know that everything is about context. I read a lot of blog posts talking about how much time a team should spend on decoupling components, introducing a new cache system, or improving the scalability of their systems. When reading this type of content, I always think that they are completely right and completely wrong. Everything in our profession depends a lot on the context (the moment of the company, the business strategy, the market traction, etc.).

    I use a mental model that helps me classify the work we do, which allows me to communicate and make decisions. I call this mental model "The Mix".

    In "The Mix", I classify the work we do as product engineers in:
    • Normal product development.
    • Implementing the Engineering Roadmap.
    • Basic hygiene work.

    Normal product development

    Normal product development should be the most common type of work for a Stream Aligned team. It should help to fulfill the mission of the team. It can be composed of new feature development, discovery experiments, feature evolution, etc. I prefer a very lean approach for this work, following agile development methods such as XP or Lean Software Development. It is essential to generate the expected outcomes with the minimal amount of code possible and a good internal quality that minimizes the maintenance cost. Following YAGNI, KISS, and Simple Design is the perfect approach for this kind of work. We don't know the future. The most efficient way to work is to have the simplest solution that covers our customers' needs without any "speculative" design, which generates tons of accidental complexity in 99% of cases.

    Summary:
    • Focus on outcomes for the customer within the business constraints.
    • Evolutionary design: use Simple Design and avoid creating anything for future "expected/invented" needs.
    • Use a Lean approach (working in small safe steps).
    • Avoid solving problems that we don't have.
    • High-speed feedback loop.
    • Aligned with the Product Roadmap.

    Implementing the Engineering Roadmap

    In parallel to the product work, it is very common to identify engineering needs derived from the company's engineering strategy. This strategy should prepare and maintain the current and future engineering capability. Examples of this type of work are:
    • Designing the system for fast expected growth (the number of customers, engineering team size, etc.).
    • A technology stack change.
    • A change in the delivery strategy (from On-Prem to SaaS, from Web to mobile, etc.).
    • Preparing the architecture to enable work in autonomous teams.

    This kind of work usually affects several Stream Aligned teams simultaneously and requires coordination at the engineering organization level. These initiatives require a lot of investment and should be coordinated with the product roadmap and aligned with the company's general strategy.

    Summary:
    • Focus on outcomes for the internal architecture and engineering processes.
    • Require more upfront effort to design the solution.
    • It can be implemented with an agile approach but based on the initial design.
    • Low-speed feedback loop.
    • By definition, try to solve problems that we don't have (yet).
    • It is aligned with the Engineering Roadmap (coordinated with the Product Roadmap).

    Basic hygiene work

    To develop any nontrivial product, we need to have some practices and development infrastructure that I consider basic hygiene. I'm talking about having a reasonable test strategy, zero-downtime releases, good internal code quality, basic security practices, etc.
    In the middle of 2021, not considering these points above seems simply a lack of professionalism. 
    So the Basic hygiene work includes any effort we make to implement or improve these minimal practices.

    Of course, I am a big fan of product discovery with prototypes, and these, for example, do not have to have the same test strategy. But remember, a prototype that ends up in production, staying in front of our customers for months, is not a prototype. It is a trap.
     


    Using The Mix

    Thinking about these three types of work and separating them helps me be more explicit about the context and the appropriate trade-offs in each situation. For example, suppose we are in a Normal Product Development initiative. In that case, we cannot expect big architecture change decisions to emerge, and it is better to focus on small safe steps that add value. At the same time, we take notes to consider some initiatives to introduce in the engineering roadmap.

    A mature product organization will introduce performance, scalability, and availability initiatives into the product roadmap. In a less mature organization, those needs are likely to be missing from the product roadmap, and it is up to engineering to fight to get them into the engineering roadmap.

    We can summarize the different dimensions in this table:
     
    For each dimension, the values are listed as Product Development / Engineering Roadmap / Hygiene:

    • Source: Product / Engineering / Team (+Engineering)
    • Development: Small Safe Steps / Upfront Planning + Small Safe Steps / Small Safe Steps
    • Practices: YAGNI, KISS, TDD... / Evolutionary Architecture, Load Testing, Testing in Production, migration planning... / Clean Code, CI, CD, Zero downtime, Observability...
    • Type of needs: Current needs / Future needs / Prerequisite
    • Value Delivery: Very Fast / Slow / Very Fast
    • Coordination Needs: The team should be autonomous / Coordination with other teams / Coordination with other teams


    If we analyze the current trends in technology using this mental model, some questions arise:
    • How do technologies like PaaS or Serverless influence this Mix?
    • How does working in Cloud vs. working on-prem affect the engineering roadmap?
    • Does it make sense to consider ourselves good professionals if we don't have strong knowledge about hygiene factors?
    • How does the mix change in the different phases of the company (startup pre-product-market fit, scale-up, big tech)? And with the life cycle of the product?

    The interesting thing about mental models is that they help us think. I hope this model is as valuable for you as it is to me.

    Related / Other mental models

    Sunday, October 17, 2021

    Good talks/podcasts (October 2021 II)



    These are the best podcasts/talks I've seen/listened to recently:

    • How conscious investors can turn up the heat and make companies change (Vinay Shandal) [Inspirational] [Duration: 0:13:00] In a talk that's equal parts funny and urgent, consultant Vinay Shandal shares stories of the world's top activist investors, showing how individuals and institutions can take a page from their playbook and put pressure on companies to drive positive change. "It's your right to have your money managed in line with your values," Shandal says. "Use your voice, and trust that it matters."
    • Software at Scale 13 - Emma Tang: ex Data Infrastructure Lead, Stripe (Emma Tang) [Big Data, Data Engineering, Operations, Platform, Technical Practices] [Duration: 0:41:00] (⭐⭐⭐⭐⭐) Effective Management of Big Data Platforms. Very interesting discussion about the technological and organizational challenges of maintaining big data platforms.
    • Improving Software Flow (Randy Shoup) [Agile, Continuous Delivery, Engineering Culture, Inspirational, Technical leadership] [Duration: 0:46:00] (⭐⭐⭐⭐⭐) Great presentation, in which Randy, starts from the 5 ideals of the Unicorn project (Locality and Simplicity, Focus, Flow, and Joy, Improvement of Daily Work, Psychological Safety, Customer Focus) to describe what we can do as technical leaders and as engineers to improve our ability to build and deliver software.
    • Developer Productivity with Utsav Shah (Utsav Shah) [Devex, Devops, Platform, Platform as a product] [Duration: 0:41:00] In this episode of Software Engineering Daily podcast, Utsav Shah talk about developer productivity in the context of the monolith, CI/CD, and best practices for growing teams.
    • Simplifying The Inventory Management Systems at the World’s Largest Retailer Using Functional Programming Principles (Scott Havens, Gene Kim) [Architecture, Architecture patterns, Functional, Technical leadership, Technology Strategy] [Duration: 2:02:00] (⭐⭐⭐⭐⭐) Havens shares his views on what makes great architecture great. He details what happened when an API call required 23 other synchronous procedures calls to return a correct answer. He discusses the challenges of managing inventory at Walmart, how one implements event sourcing patterns on that scale, and the functional programming principles that it depends upon. Lastly, he talks about how much category theory you need to know to do functional programming and considerations when creating code in complex systems. It is recommended to first watch the talk https://www.youtube.com/watch?v=n5S3hScE6dU or listen to the podcast https://itrevolution.com/the-idealcast-episode-22/
    Reminder: all these talks are interesting even if you just listen to them.

    Related: 

    Sunday, October 03, 2021

    Good talks/podcasts (October 2021 I)

     


    These are the best podcasts/talks I've seen/listened to recently:

    • You're Testing WHAT? (Gojko Adzic) [Technical Practices, Testing in production, testing] [Duration: 0:38:00] Gojko presents five universal rules for test automation, that will help you bring continuous integration and testing to the darkest corners of your system. Learn how to wrestle large test suites into something easy to understand, maintain and evolve, at the same time increasing the value from your automated tests.
    • Using Observability to improve the Developer Experience (Borja Burgos) [Devex, Devops, Platform, Platform as a product] [Duration: 0:15:00] Observability is often associated with production and live environments, but it shouldn't be! In this talk we'll explore innovative ways in which modern observability tools and best practices can be leveraged during development to: improve developer productivity, identify regressions earlier in the SDLC, and increase the performance and reliability of our CI/CD workflows.
    • Continuous Delivery (Jez Humble) [Agile, Continuous Delivery, Engineering Culture, Lean Software Development] [Duration: 0:47:00] (⭐⭐⭐⭐⭐) Great 2012 presentation on Continuous Delivery. Jez discusses the value of CD to the business. He presents the principles and related practices, including value stream mapping, deployment pipelines, acceptance test-driven development, zero-downtime releases, etc. This talk is a while old, but still as relevant as the first day.
    • Test Driven Development with Geepaw Hill (Clare Sudbery, GeePaw Hill) [Agile, Technical Practices, tdd, testing] [Duration: 0:50:00] Clare talks to Geepaw Hill about why he loves TDD so much and how he spreads that love to software teams all over the world.
    • Software Delivery Glossary (Adam Hawkins) [Continuous Delivery, Lean, Lean Software Development] [Duration: 0:09:00] This podcast describes a few software delivery concepts (Lead time, Deployment frequency, MTTR, Change Failure Rate, Jidoka, Kaizen).
    • Churn FM. EP131 How the best leaders empower their product teams and set them up for success (Marty Cagan) [Inspirational, Lean Product Management, Product, Product Team] [Duration: 0:36:00] An interesting conversation about the tremendous gap between how the best companies operate and how the rest work, what they do differently, and why.  They also discussed how the best leaders empower their teams, how real product discovery and product work happens and then talked about how alignment is a consequence of a good product strategy.
    Reminder: all these talks are interesting even if you just listen to them.

    Related: 

    Sunday, September 26, 2021

    Code reviews (Synchronous and Asynchronous)

    There are different types of Code Reviews with different objectives. This article only refers to those Code Reviews that another team member does before incorporating the change in the shared code repository.

    The objective of these reviews is to improve the quality, avoid errors introduced in the shared code repository, and prevent issues from going out to production.

    Therefore the code reviews we are talking about:

    • Avoid errors/bugs by acting as a safeguard.
    • Occur within the flow of making changes to production.

    In this blog post, we will not discuss other types of Code Reviews that are done outside of the normal development flow.

    Most common: Asynchronous code reviews.

    In this context, a very common (and for some, recommended) way of performing code reviews is that a developer (A) works individually on a task (d) and upon completion of the general change, creates a Pull Request (PR)/ Merge Request (MR) that another person in the team must review asynchronously.

    We can see an example of this way of working in the following diagram. In it, we can see how developer A generates PR/MRs that developer B reviews, generating proposals for changes that developer A should incorporate.

    In the example, the feature/change is composed of increments d1, d2, d3, and we see that we deploy it to production at the end of the code review of increment d3.



    We can see that A's development flow is constantly interrupted to wait for the corresponding feedback for each code review. This way of working delays the release to production, increasing the time it takes to receive real feedback from customers.

    Because each individual code review takes so long, there is an unconscious and natural tendency for developers to work in increasingly large increments. At least, this is what I have found in many teams I have worked with.

    It is a curious situation because, on the one hand, the industry says that working in small safe steps is a recommended practice. On the other hand, many companies use async reviews that tend to discourage them.

    So the most common way of working looks like the following diagram:


    Moreover, in this case, by generating large PRs/MRs, the code review loses meaning and, in many cases, becomes a pantomime (LGTM). (See: how we write/review code in big tech companies)
    With this last way of working, the lead time to production improves a little, but at the cost of taking bigger steps and losing the benefits of detailed code reviews.

    When working with asynchronous code reviews, the reality is usually more complex than what we have seen in the previous diagrams. The reality usually includes many developers, simultaneous code reviews and cross flows in which there is a strong tendency for individuals to work alone and do continuous context switching.




    In fact, there is no incentive for the team to focus and collaborate on a single user story at the same time since everyone would be busy with their tasks and doing asynchronous code reviews for each other.

    Another problem that often occurs is that code reviews do not have sufficient priority and generate long queues of pending reviews. These queues cause the lead time to grow and create deployments with larger batches of changes (more risk).
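    The effect of these queues can be approximated with Little's Law (average lead time = items in the system / throughput). The numbers below are purely illustrative, but they show how review lead time grows linearly with the size of the pending-review queue:

```python
# Little's Law sketch (illustrative numbers): the average time a change
# spends waiting for review grows linearly with the number of reviews
# sitting in the queue, for a fixed review throughput.

def avg_lead_time(wip, throughput_per_day):
    """Average lead time in days for `wip` items at `throughput_per_day`."""
    return wip / throughput_per_day

print(avg_lead_time(2, 4))   # 0.5 days when reviews are prioritized
print(avg_lead_time(12, 4))  # 3.0 days once a queue has built up
```

A 6x longer queue means a 6x longer wait, and every extra day of waiting means more changes piling up into the next deployment.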

    Is there a better way?

    Fortunately, there are synchronous code reviews and continuous code reviews.

    Synchronous, high priority code reviews

    The first step is to give the highest priority to code reviews and to do them immediately and synchronously between the person who developed the increment and the person in charge of doing the review.

    This way of working reduces the total lead time of each increment and encourages working in small increments.
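    A back-of-the-envelope comparison makes the difference concrete. The durations below are hypothetical (a feature split into three increments, four hours of development and one hour of review each), with the only variable being how long each increment waits for a reviewer:

```python
# Hypothetical lead-time comparison for a feature built in three increments:
# the same development and review effort, with and without waiting for an
# asynchronous reviewer to pick up each PR/MR.

def lead_time(increments, dev_hours, review_wait, review_hours):
    # each increment: develop it, wait for a reviewer, then review it
    return increments * (dev_hours + review_wait + review_hours)

print(lead_time(3, 4, 8, 1))  # async reviews (8h avg wait): 39 hours
print(lead_time(3, 4, 0, 1))  # synchronous, high-priority reviews: 15 hours
```

The review effort is identical in both cases; only the waiting differs. Removing the wait also removes the incentive to batch increments together.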

    As a side effect, the code reviews should be more straightforward because the developers themselves explain the changes and can express in detail why they made each decision.


    This way of working requires a lot of coordination between team members, and it is not easy to make them always synchronous, but of course, it is a simple step if we are already doing asynchronous code reviews.

    It is as simple as making PRs/MRs top priority, and as soon as you have one ready, coordinate with someone else to review it together on a single computer.

    Pairing/Ensemble programming: Continuous Code Review

    Pair programming: "All code to be sent into production is created by two people working together at a single computer. Pair programming increases software quality without impacting time to deliver. It is counter intuitive, but 2 people working at a single computer will add as much functionality as two working separately except that it will be much higher in quality. With increased quality comes big savings later in the project.". http://www.extremeprogramming.org/rules/pair.html
     
    When we work using pair programming, as recommended in extreme programming, two people design and develop the code. This way of working generates a continuous review of the code, so it no longer requires a final review phase. 

    Of course, from the point of view of coordination, it is the simplest system, since once the pairs have been formed, each pair organizes itself to make increments, which are implicitly reviewed continuously.

    In this case, the diagram representing this way of working would be: 



    A new way of working known as mob/ensemble programming has appeared as an evolution of the pair programming practice. In this modality, the whole team works simultaneously on a single task and using a single computer. 

    The whole team collaborates in the design and development, limiting the context switching and minimizing the lead time for the task. As in pair programming, mob/ensemble programming performs a continuous and synchronous review of the code.

    Of the ways of working that we have seen, pair or group programming has the shortest lead time for each change, fewer context switching interruptions, and requires less coordination.

    Even so, pair and group programming is not a simple practice and requires effort and a lot of practice. IMHO it is clear that this way of working takes the quality benefits usually associated with code reviews to the extreme.

    The industry vision, my personal vision

    The industry seems to agree that code reviews are a good practice and should be implemented by all development teams.
    On the other hand, it is also recommended in most cases to work in small increments so that the complexity and risk of each change can be better controlled.

    With these two points in mind, it seems a good idea to use Code Reviews and make them as synchronous and continuous as possible to work on those small increments of low complexity and low risk.

    In my case, I have been successfully pushing pair or mob programming for a few years now, creating a stream of small increments that are reviewed continuously.

    In both Alea Soluciones and TheMotion, we used pair programming by default and mob programming occasionally. Both companies' deployment frequency was daily, and we could make considerable changes in small and safe steps. In addition to this constant and sustainable development flow, we got attractive additional benefits such as the ease of spreading domain and technical knowledge or the fast onboarding of new members. In the case of Alea Soluciones, we hired people who already had experience in pairing and XP practices, so it was easier. In the case of TheMotion, however, we had to introduce some of the agile development practices (pairing, TDD, etc.), so we hired technical coaching help.
    In Nextail there was a mix of teams doing pairing and continuous reviews and others using asynchronous code reviews but prioritizing to avoid queues.
    At Clarity AI, some teams use asynchronous Code Reviews but prioritize them to minimize the lead time. There are already two teams doing continuous code reviews (using pairing or mob programming) and we are gradually expanding this way of working to other teams.

    Of course, the statistical value of my experience is zero, and it is much more interesting to analyze the results of more rigorous analyses such as DORA (https://www.devops-research.com/research.html). In them, we can see how the following practices are associated with high-performance teams:

    Related: