Sunday, June 26, 2022

Good talks/podcasts (Jun 2022 II)


 

These are the best podcasts/talks I've seen/listened to recently:
  • LOTE #7: Kelsey Hightower on Developer Experience, PaaS, and Testing in Production (Kelsey Hightower) [Cloud, Developer Productivity, Devex, Platform as a product] [Duration: 0:41:00] In the seventh episode of the Ambassador Livin’ on the Edge podcast, Kelsey Hightower, technologist at Google, discusses his thoughts on cloud developer experience, modern Platform-as-a-Service (PaaS), and explores the reality that every organisation is testing in production.
  • Should Computers Run the World? (Hannah Fry) [AI, Data Science, General, Inspirational] [Duration: 0:36:00] Hannah Fry takes us on a tour of the good, the bad and the downright ugly of the algorithms that surround us. She lifts the lid on their inner workings, to demonstrate their power, expose their limitations, and examine whether they really are an improvement on the humans they are replacing.
  • Small Batches - PDCA (Plan-Do-Check-Act) (Adam Hawkins) [Lean] [Duration: 0:06:00] Adam presents Dr. Deming's PDCA cycle and how it applies to the daily work of delivering software.
  • Steve Jobs on programmer productivity (Steve Jobs) [Lean Product Management, Lean Software Development] [Duration: 0:01:00] Excerpt (1m) of Steve Jobs presenting the software as a liability to minimize. You know, make impact, not software. "The way you get programmer productivity is not by increasing the lines of code per programmer per day. That doesn’t work. The way you get programmer productivity is by eliminating lines of code you have to write."
  • From initial request to software in production in 3 weeks (Christin Gorman) [Inspirational, Lean Software Development] [Duration: 0:22:00] (⭐⭐⭐⭐⭐) Simplicity--the art of maximizing the amount of work not done--is essential. Great talk on how to focus on the essentials and make simple solutions.
  • The Future of Mars Exploration (Anita Sengupta) [Inspirational] [Duration: 0:54:00] In this talk, you will learn about the motivation for Mars exploration and how computational modeling, high-tech solutions, and out-of-the-box thinking can be used to overcome engineering challenges.
Reminder: all these talks are interesting even if you just listen to them.

Related:

Sunday, June 12, 2022

Good talks/podcasts (Jun 2022 I)


 


These are the best podcasts/talks I've seen/listened to recently:
  • Scaling Organizations and Design with James Shore (James Shore) [Agile, Evolutionary Design, Software Design, XP] [Duration: 0:48:00] (⭐⭐⭐⭐⭐) Great interview with great ideas (mocking-free testing, scalability of horizontal organization, FAST framework, code evolution with evolutionary design...). Very worthwhile.
  • Feature Branches and Toggles in a Post-GitHub World (Sam Newman) [Continuous Delivery, Technical Practices, Trunk Based Development] [Duration: 0:50:00] In this presentation Sam explains how working in feature branches creates friction when our goal is to achieve continuous delivery. Sam explains how to use Feature Toggles to improve delivery frequency.
  • From Kubernetes to PaaS to ... Err, What's Next? (Daniel Bryant) [Developer Productivity, Devex, Platform, Platform as a product] [Duration: 0:31:00] (⭐⭐⭐⭐⭐) In this talk Daniel reviews his experience in building platforms, both as an end-user and now as part of an organization that helps its clients do the same. He discusses topics such as DevEx, UX, workflows, available tools, etc.
  • Make Impacts Not Software (Gojko Adzic) [Lean Product Management, Lean Software Development, Product, Product Strategy] [Duration: 0:51:00] (⭐⭐⭐⭐⭐) An essential talk to understand how to get the most impact with the least amount of software (and thereby reduce basal cost and time to market). Highly recommended.
  • The Difference between Software Engineering and Manufacturing (Donald Reinertsen) [Lean Product Management, Product] [Duration: 0:09:00] In this interview, Don explains the basic difference between applying lean to a manufacturing process and to a digital/software product creation process.
  • the deep synergy between testability and good design (Michael Feathers) [Software Design, Technical Practices, XP, testing] [Duration: 0:50:00] Interesting talk about how to do good software design and the relationship between this design and the ease of testing.
  • Less - The Path to Better Design (Sandi Metz) [OOP, Software Design] [Duration: 0:50:00] (⭐⭐⭐⭐⭐) This talk strips away the well-known design principles and exposes the hidden, underlying goals of design. It reveals programming techniques that allow you to write less code while creating beautiful, flexible applications.
Reminder: all these talks are interesting even if you just listen to them.

Related:

Sunday, May 15, 2022

Good talks/podcasts (May 2022 I)

 


These are the best podcasts/talks I've seen/listened to recently:
  • Full Cycle Developers at Netflix (Greg Burrell) [Architecture, Devops, Operations] [Duration: 0:48:00] (⭐⭐⭐⭐⭐) This talk presents Netflix's journey from siloed teams to their Full Cycle Developer model for building and operating their services. Greg discusses the various approaches they’ve tried, the motivations that pushed them to keep evolving, and the lessons learned along the way.
  • Building DevX Teams, my story (Cirpo Cinelli) [Developer Productivity, Devex, Teams] [Duration: 0:42:00] (⭐⭐⭐⭐⭐) In this presentation, Cirpo talks about his past 4 years of experience setting up a DevX team from scratch, the main challenges, the pain, the gain, and the lessons learned.
  • How To Avoid Big Upfront Design (Dave Farley) [Architecture, Continuous Delivery, Evolutionary Design] [Duration: 0:18:00] In this episode, Dave Farley describes how to avoid big up-front design, but also how to do enough design to make progress safely while keeping the freedom to adapt our designs as we learn.
  • The Root Causes of Product Failure (Marty Cagan) [Product, Product Strategy, Product Team] [Duration: 0:49:00] Interesting talk on how the best companies and product teams work.
  • TDD Discipline - Thinking Ahead Without Coding Ahead (Jason Gorman) [Technical Practices, XP, tdd] [Duration: 0:11:00] Jason Gorman demonstrates a very common mistake TDD newbies make - writing the code you *believe* is required, instead of just the code needed to pass your tests.
Reminder: all these talks are interesting even if you just listen to them.

Related:

Monday, April 18, 2022

System: Control its evolution / or be its slave

When we develop software (digital products), we usually deal with complex systems. We are not prepared (by definition) for the emergent behaviors of these types of systems. As a result, it is hard to predict their future evolution and very easy to lose control of them over time.


If we don't control complexity, it will become increasingly difficult to adapt the system to what we need, and we will move from developing and evolving the system to spending more time reacting to it as it evolves. We will become slaves to the system rather than controlling its evolution. We will be spending our time working on production incidents, scalability improvements under pressure, and being in a firefighting mode instead of adding value to the product in a sustainable and continuous manner. Needless to say, the business impact is terrible (poor efficiency, unhappy customers, difficulty in defining strategy, etc).


In my experience, we can keep the system under control by using appropriate tactics.

Tactics to control system evolution

I have used the following tactics to control the evolution of systems I have worked on:

  • Good basic hygiene: automated tests, minimum observability, product instrumentation, etc.
  • Simple design based on current needs: Using incremental design and lean product development we can develop simple solutions that fulfill our current needs but allow us to evolve the system in the future.
  • Evolutionary architectures: Invest in an architecture that can evolve and that allows us to postpone decisions (reversible decisions, fitness functions, observability, etc.).
  • Acceptance criteria: Make the acceptance criteria explicit, so that we all know what we consider sufficient hygiene (testing, observability, performance, etc.).
  • Define product limits: Define product limits that allow us to convert problems commonly considered "technical" into business/product problems. For example, instead of talking about scalability as an abstract concept, make it concrete at the product level with clear limits (number of concurrent users/requests, maximum response times, etc).
  • Look for boundaries and use them to assign clear ownership for each identified part. The concepts of bounded contexts of DDD and value stream can assist in identifying such limits within the system. Having clear ownership of each part of the system is fundamental since we must remember that we work on complex socio-technological systems.
  • Work in small safe steps: Controlling the evolution of the system is complex and requires us to constantly rethink and adapt. We can only accomplish this if we work in small safe increments of functionality that allow us to change direction without leaving features half-integrated.

These tactics are usually adequate but, as with everything in our profession, it depends: there are times when a little upfront planning and attention to future needs would be beneficial. See "It depends" / The development Mix (Product, Engineering, Hygiene).


Below I include several examples of how I have utilized some of these tactics to control the evolution of the system and not be at its mercy.

Examples

System Load / Scalability evolution



Having performance or stability/availability problems when a system is under high load is very common. The most common approaches for this evolution dimension are:

  • Doing nothing and reacting when we discover the first problems (potentially damaging the relationship with our users due to production incidents).
  • Solving “imaginary” scalability problems by preparing/designing our systems for a load that is not real and generating a high waste. Sometimes this is the default option chosen in engineering simply for the pleasure of solving an interesting problem.

I think that there is a more sensible approach that allows us to control evolution without generating waste (See Lean software development). In this case, we need to have some metrics about the current load and a rough idea of how much maximum load our system can support (without degrading).

With this information, we can define a soft limit (Product Limit) for the load, as the maximum supported load minus a threshold that gives us time to evolve the system to support more load.

As we can see in the example diagram, we receive a notification at t1 informing us that it is time to improve the scalability of the system. Between t1 and t2, we improve the scalability to support a new maximum load and redefine the soft limit. We then repeat the process when the new limit is reached (t3).
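
The soft-limit calculation can be sketched roughly as follows. This is only an illustration: the `LoadLimit` class, its field names, the requests-per-second unit, and all the numbers are assumptions made for the example, not part of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadLimit:
    """Product limit for load (hypothetical names and units)."""

    max_supported_load: float  # highest load handled without degradation
    threshold: float           # margin that buys time to evolve the system

    @property
    def soft_limit(self) -> float:
        # Soft limit = maximum supported load minus the threshold.
        return self.max_supported_load - self.threshold

    def should_notify(self, current_load: float) -> bool:
        # True once the measured load reaches the soft limit (t1 in the text).
        return current_load >= self.soft_limit


# Assumed numbers: the system supports 1000 req/s and we keep a 200 req/s margin.
limit = LoadLimit(max_supported_load=1000, threshold=200)
assert limit.soft_limit == 800
assert not limit.should_notify(current_load=650)
assert limit.should_notify(current_load=800)

# After improving scalability (t1 to t2) we redefine the limit and repeat.
limit = LoadLimit(max_supported_load=3000, threshold=500)
```

In practice the check would probably run as an alert on a load metric that the monitoring system already collects, rather than as application code.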

Storage volume growth



When we store information, it is very common for its volume to grow to the point where the selected technology is no longer valid or where we start to have problems (high cost, backup problems, etc).

As in the previous example, we can do nothing and react after we start to have problems or use a tactic to control the evolution/growth.

In this case, if we need to maintain all the historical information, we can follow a similar approach as in the previous example about scalability. But if we don’t need to store the historical information we can create a soft limit to detect when we should start to develop an automated data purge process that periodically removes obsolete information. This approach is very lean because it allows us to postpone the implementation of the data purge process until the last responsible moment.


Controlled errors and quotas


One of the easiest ways to lose control of a system is to have no controlled way of handling errors. When we don’t control errors, we risk cascading failures that affect more users. This is one of the most common examples of losing control of the system.

One of the first tactics that helps us maintain control of a system is to detect unexpected runtime errors, avoid crashing, at least send a message to the user, and prevent the error, as much as possible, from impacting other users.
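
As a rough sketch of this idea, a request handler can be wrapped so that an unexpected error is logged and turned into a message for that user only. The `run_safely` function, the logger name, and the message text are hypothetical, introduced just for this example:

```python
import logging
from typing import Callable

logger = logging.getLogger("requests")  # hypothetical logger name


def run_safely(handler: Callable[[str], str], user_id: str) -> str:
    """Run one user's request without letting an unexpected error escape."""
    try:
        return handler(user_id)
    except Exception:
        # Record the full traceback so the error is detected, not hidden.
        logger.exception("Unexpected error while serving user %s", user_id)
        # The failure stays contained to this request and this user.
        return "Something went wrong on our side. Please try again later."


def failing_handler(user_id: str) -> str:
    raise ValueError("unexpected state")  # simulated runtime error


assert run_safely(lambda user_id: "ok", "user-1") == "ok"
assert run_safely(failing_handler, "user-2").startswith("Something went wrong")
```

Most web frameworks offer an equivalent hook (an error handler or middleware), which would usually be a better place for this logic than a hand-written wrapper.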

Another good tactic is to define quotas per user (Product Limits) and use these quotas to throttle requests and show a warning message to the user. In the message, we can inform the user to contact support and use this opportunity to get more information about how the user is using the system or even offer a better SLA to the user.
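
A minimal sketch of a per-user quota, assuming a simple fixed time window. The `QuotaThrottler` class, the quota values, and the warning text are illustrative assumptions; a production system would more likely rely on an existing rate limiter in the API gateway or a shared store:

```python
import time
from typing import Callable, Dict, Tuple


class QuotaThrottler:
    """Fixed-window request counter per user (illustrative only)."""

    def __init__(self, quota: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.quota = quota                    # product limit per user and window
        self.window_seconds = window_seconds
        self._clock = clock                   # injectable so it can be tested
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        start, count = self._windows.get(user_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0             # a new window begins
        if count >= self.quota:
            self._windows[user_id] = (start, count)
            return False                      # quota exhausted: throttle
        self._windows[user_id] = (start, count + 1)
        return True


def handle_request(throttler: QuotaThrottler, user_id: str) -> str:
    if not throttler.allow(user_id):
        # The warning doubles as an invitation to talk to support.
        return "Quota exceeded. Please contact support to review your plan."
    return "ok"


throttler = QuotaThrottler(quota=2, window_seconds=60)
print([handle_request(throttler, "user-1") for _ in range(3)])
```

With a quota of 2, the third request from the same user inside the window receives the warning message, while other users are unaffected.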


Committed capacity


When we work in a multitenant SaaS environment, it is very common for the total resources used by all of our customers to increase very fast. One interesting tactic to maintain control of the evolution of the system in this context is to define how much capacity (of a resource) we commit to each type of user (Product limits). Even knowing that customers will not use their full capacity, having these definitions allows us to identify the worst-case load scenarios we could have.

In the example diagram, we define the different quotas for each type of user (basic, professional, enterprise). With these definitions and the number of users of each type, we can calculate the maximum committed load at any moment and use this information to make decisions.
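
The calculation can be sketched as below. The quota values, the unit (requests per minute), and the customer counts are made-up numbers for illustration; only the three plan names come from the example above:

```python
from typing import Mapping

# Hypothetical committed quota per plan, in requests per minute.
QUOTA_PER_PLAN = {"basic": 10, "professional": 100, "enterprise": 1000}


def max_committed_load(users_per_plan: Mapping[str, int]) -> int:
    """Worst-case load if every customer used their full committed quota."""
    return sum(QUOTA_PER_PLAN[plan] * users
               for plan, users in users_per_plan.items())


# Assumed customer base: 500 basic, 40 professional, 3 enterprise.
current_users = {"basic": 500, "professional": 40, "enterprise": 3}
assert max_committed_load(current_users) == 12000  # 5000 + 4000 + 3000
```

Comparing this worst-case figure with the maximum load the system supports shows how much headroom remains before a new customer of each type is onboarded.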

This tactic is used frequently by AWS. For each type of resource, they define default limits, and to extend these limits you need to contact support. AWS can use the information from support requests to make very accurate capacity plans and make better product decisions.


Modular monolith


Building a system from scratch is not an easy task, and it is very easy to lose control of the system as it grows (more developers / more functionality). The two most common failures are:

  • Growing a monolith organically without committing effort to modularization and ending up with a big ball of mud (http://www.laputan.org/mud/).
  • Adopting a microservices architecture before there is a real need to scale, which greatly increases complexity without obtaining any of the benefits of this architecture.

In general, better results can be achieved by developing a monolith, but organizing the code and data into internal modules (modular monolith). As we become familiar with the system and domain, we organize it internally into modules (in alignment with bounded contexts), so we can maintain control of the architecture throughout its evolution. 

When we see the need, we can extract some of these modules into independent services.

It is essential to create mechanisms that help us maintain this modularity during evolution (running architecture tests that prevent disallowed dependencies between modules, using different schemas in the DB, analyzing dependencies periodically).
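
One possible shape for such an architecture test, sketched with Python's standard `ast` module. The module names (`billing`, `catalog`), the `FORBIDDEN` rule set, and the assumption that each module is a top-level package are all hypothetical; dedicated tools exist for this and would normally be preferable:

```python
import ast
from typing import Dict, Set

# Hypothetical rules: module name -> modules it must not import.
FORBIDDEN: Dict[str, Set[str]] = {
    "billing": {"catalog"},
    "catalog": {"billing"},
}


def imported_packages(source: str) -> Set[str]:
    """Return the top-level package names imported by the given source code."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def violations(module: str, source: str) -> Set[str]:
    """Forbidden packages that a file of `module` imports."""
    return imported_packages(source) & FORBIDDEN.get(module, set())


# A file inside `billing` that reaches into `catalog` breaks the rule.
bad_source = "import os\nfrom catalog.models import Product\n"
assert violations("billing", bad_source) == {"catalog"}
assert violations("billing", "import os\n") == set()
```

A real test would read every source file of each module and fail the build when `violations` returns a non-empty set.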

In addition, we can use these modules to assign ownership of each to a team, allowing each team to work independently.

Conclusions

When we lose control over the evolution of the system:

  • we become inefficient and fail to create value.
  • we lose customer trust (incidents, errors, etc).
  • we react to problems instead of being able to follow our product strategy.

The business impact is huge. Therefore, it is essential to manage complexity and keep evolution under our control.


The tactics I discuss in this article are quite simple and focus on detecting needs early enough so that we can respond in a planned way (rather than reacting under pressure). Moreover, they do not require large investments. In fact, they allow us to have a lean approach and avoid over-engineering. But they do require good hygiene practices (testing, observability, reasonable quality, etc).


References and related content:

Thanks

The post has been improved based on feedback from:

Thank you very much to all of you





Saturday, April 09, 2022

Good talks/podcasts (April 2022 I)




These are the best podcasts/talks I've seen/listened to recently:
  • Autonomy, mastery, purpose (Drive summary) (Dan Pink) [Inspirational, Management] [Duration: 0:03:00] (⭐⭐⭐⭐⭐) Daniel Pink shares a study about what truly motivates employees. Excerpted from his talk on "motivation."
  • The puzzle of motivation (Dan Pink) [Inspirational, Management] [Duration: 0:18:00] (⭐⭐⭐⭐⭐) Dan Pink examines the puzzle of motivation, starting with a fact that social scientists know but most managers don't: Traditional rewards aren't always as effective as we think.
  • The secret to giving great feedback (LeeAnn Renninger) [Management] [Duration: 0:05:00] Cognitive psychologist LeeAnn Renninger shares a scientifically proven method for giving effective feedback.
  • Making Badass Developers (Kathy Sierra) [Developer Productivity, Devex, Inspirational] [Duration: 0:23:00] Interesting talk to understand how cognitive load works, how we learn and how we should take into account these concepts to improve our experience as developers.
  • Responsible Engineers and Outcomes with Mary and Tom Poppendieck (Mary Poppendieck, Tom Poppendieck) [Agile, Engineering Culture, Lean Product Management, Lean Software Development] [Duration: 0:49:00] Mary and Tom Poppendieck discuss "Responsible Engineers and Outcomes". They walk through concrete examples of what it looks like for lean teams to maximize outcomes and minimize outputs. While exploring these examples, they touch on customer interaction, feedback, the responsible engineer, and the single threaded leader.
  • Production - Designing for Testability (Michael Bryzek) [Continuous Delivery, Devops, Engineering Culture, Testing in production, testing] [Duration: 0:50:00] (⭐⭐⭐⭐⭐) Michael Bryzek explores what it’s like to build quality software with no development, QA, or staging environments. He includes a deep dive into “verification in production” and what it really takes to build software that can safely be tested continuously in production.
  • Keynote: Creating a Holistic Developer Experience (Jasmine James) [Developer Productivity, Devex] [Duration: 0:15:00] (⭐⭐⭐⭐⭐) Great talk to understand what developer experience is.
  • OOP is Dead! Long Live OODD! (David West) [OOP, Software Design] [Duration: 1:07:00] Interesting talk about object orientation, the history and how in many cases we don't understand it correctly. It's a bit ranty, but very interesting and inspiring in any case.
Reminder: all these talks are interesting even if you just listen to them.

Related:

Thursday, March 31, 2022

Round table: Collaboration between product and development profiles (Code Sherpas)



On March 24, I had the pleasure of taking part in a round table on "Collaboration between product and development profiles", where I was able to learn from and share with Marta Manso, María Granadino, Isabel Garrido and Cristina Verdi.

Thanks to Code Sherpas for organizing and facilitating the event.

Monday, March 28, 2022

Our DevOps Journey @ ClarityAI [2022 03 DevOps Lisbon]

A few days ago (2022-03-14), I had the pleasure of speaking at DevOps Lisbon about the evolution of the engineering organization at ClarityAI over the last two years. In my presentation, I examined the evolving DevOps culture as well as strategies for structuring teams according to Team Topologies' ideas.
Thanks to DevOps Lisbon for the invitation and to all the participants for their attention.

Video



Slide deck

Original Doc (with notes)



References

Sunday, March 13, 2022

Good talks/podcasts (March 2022 II)

 


These are the best podcasts/talks I've seen/listened to recently:
  • Small Batches. The four types of problems (Adam Hawkins) [Lean, Mental models] [Duration: 0:07:00] Good pill to understand the classification of problems in Lean (troubleshooting, gap from standard, target condition, and open-end problems).
  • Ten (Hard-Won) Lessons of the DevOps Transition (Randy Shoup) [Devops, Engineering Culture, Inspirational] [Duration: 0:26:00] (⭐⭐⭐⭐⭐) This talk discusses the cultural change required to adopt a devops mentality. Excellent advice and warnings derived from Randy's experience leading teams at eBay, Google, and KIXEYE.
  • Don't Get Down-Leveled or How to Tell a Good Story (From a Principal at Amazon) (Meta) [Engineering Career] [Duration: 0:15:00] An excellent guide on how to tell a story during a behavioral interview.
  • CTO Craft 2021: Cross the River by Feeling the Stones (Simon Wardley) [Product Strategy, Technology Strategy] [Duration: 0:30:00] In this session, Simon examines the issue of situational awareness. Using examples from government and the commercial world, Simon explores how we can map our environment, identify opportunities to exploit and learn to play the game.
  • Frozen DevOps? Team Topologies Comes to the Rescue! - DevOpsDays Poznań 2021 (Manuel Pais) [Devops, Engineering Culture, Teams, team topologies] [Duration: 0:36:00] In this talk Manuel covers the self-imposed limitations of blindly following some “myths” around DevOps. Most organizations are stuck in the "frozen middle" of DevOps evolution due to a lack of organizational sensemaking. They must think beyond technical capabilities to unleash the potential of their teams to deliver with greater autonomy and a sense of purpose.
  • Expert Talk: DevOps & Software Architecture GOTO 2021 (Dave Farley, Simon Brown) [Architecture, Continuous Delivery, Inspirational] [Duration: 0:40:00] (⭐⭐⭐⭐⭐) In a world where software architecture is evolving rapidly, we are confronted with new challenges. Simon Brown, Dave Farley, and Hannes Lowette cover some of the recent trends in software architecture touching on hot topics such as DevOps and how to deal with complexity.
Reminder: all these talks are interesting even if you just listen to them.

Related:

Sunday, March 06, 2022

Good talks/podcasts (March 2022 I)

 


These are the best podcasts/talks I've seen/listened to recently:
  • Mik + One: Manuel Pais (Episode 42) (Manuel Pais, Mik Kersten) [Devops, Engineering Culture, Teams, Technical leadership, team topologies] [Duration: 0:50:00] (⭐⭐⭐⭐⭐) During this episode, Mik and Manuel discuss some of the key issues in Team Topologies with great insights into different types of collaboration, treating platforms as products, and how to improve team flow by aligning teams with value streams. This was a very interesting episode.
  • Nordstrom Innovation Lab (nordstrominnovationlab) [Inspirational, Lean Product Management, Product, Product Discovery] [Duration: 0:06:00] (⭐⭐⭐⭐⭐)
  • The Role Of QA in Agile Software (Dave Farley) [Continuous Delivery, Engineering Culture, testing] [Duration: 0:17:00] In this episode, Dave Farley explores the role of QA in modern agile teams and explores the move in some detail from gatekeepers to quality experts and ideas like continuous testing and QA as trusted advisors.
  • From Kubernetes to PaaS to Developer Control Planes (Daniel Bryant) [Developer Productivity, Devex, Platform, Platform as a product] [Duration: 0:24:00] Interesting talk on trends to remove cognitive load from developers in the era of cloud native services. Interesting ideas on how to develop a platform as a product (UX, focus on developer workflows and tool interoperability).
Reminder: all these talks are interesting even if you just listen to them.

Related:

Sunday, February 27, 2022

Talk: Developer Experience & Modern Platform Teams

You may have heard the term Developer Experience and how important it is for teams to be effective.
Or you may have heard about Platform teams but have doubts about how they work.
In this talk I break down what Developer Experience consists of and how a platform team can improve it substantially. I explain how we approached the problem, how we decided what to implement, and the impact we achieved. I talk about optimizing the onboarding process, removing friction from the deployment process, instrumenting CI pipelines, and improving team autonomy.

All of this is told from the point of view of a product development team that uses Extreme Programming (small and continuous releases, TDD, mob programming, collective code ownership, trunk-based development, etc.).

This talk is based on the experience of the Platform/DevEx team at ClarityAI.

I prepared the talk for BilboStack 2022, but since it was not recorded and people were asking me whether I would repeat it, I coordinated with the Agile Delivery, Pamplona/Iruña Software Crafters, Agile Sur, and Agile Norte communities to repeat it (with some minor changes) and record it. It generated an interesting round of questions at the end.


 

Video




Presentation

Original Doc (with notes)


References

Saturday, February 12, 2022

Good talks/podcasts (February 2022 II)

These are the best podcasts/talks I've seen/listened to recently:
  • Platforms at Twilio: Unlocking Developer Effectiveness (Justin Kitigawa) [Developer Productivity, Devex, Devops, Platform, Platform as a product] [Duration: 0:50:00] (⭐⭐⭐⭐⭐) Learn how Twilio’s internal Platform has evolved to reduce their engineers' cognitive load by providing a unified self-service, declarative platform to build, deliver, and run the thousands of global microservices that make up Twilio.
  • DOES15 Its All About Feedback (Elisabeth Hendrickson) [Devops, Quality, XP, testing] [Duration: 0:34:00] In this talk you’ll hear about Elisabeth's journey from the traditional silos with inherently long feedback latency to her current reality of increasingly tight feedback loops, and the lessons she has learned along the way. The talk includes how Extreme Programming generates short feedback loops.
  • DOES14 On the Care and Feeding of Feedback Cycles (Elisabeth Hendrickson) [Continuous Delivery, Devops, Feedback cycles, Inspirational, Quality] [Duration: 0:31:00] (⭐⭐⭐⭐⭐) This talk examines the many forms of feedback, the questions each can answer, and the risks each can mitigate. Agile practices involve testing early and often. However feedback comes in many forms, only some of which are traditionally considered testing. Continuous integration, acceptance testing with users, even cohort analysis to validate business hypotheses are all examples of feedback cycles.
  • Complex Adaptive Systems (Dave Snowden) [DDD, Inspirational] [Duration: 0:57:00] Inspiring talk about Complex Adaptive Systems and what are the most effective approaches to deal with them.
  • Honeycomb & OpenTelemetry: Instrumentation Should Be Boring (Paul Osman) [Monitoring, Operations] [Duration: 0:28:00] In this session, Paul gives an overview of how Honeycomb has embraced OpenTelemetry as a key part of its instrumentation strategy, how this can help make instrumenting your code easier, and what you can expect from Honeycomb and OpenTelemetry in the near future.
Reminder: all these talks are interesting even if you just listen to them.

Related:

Sunday, February 06, 2022

Good talks/podcasts (February 2022 I)

 


These are the best podcasts/talks I've seen/listened to recently:
  • TDD, where did it all go wrong. Summary (Ian Cooper) [tdd, testing, Technical Practices] [Duration: 0:04:00] Short summary of the "TDD, where did it all go wrong" talk.
  • Developer Productivity Engineering – The Next Big Thing in Software Development (US 2021) (Justin Reock) [Developer Productivity, Devex, Platform, Platform as a product] [Duration: 0:30:00] DPE is a new software development practice that uses acceleration technologies to speed up the software build and test process and data analytics to improve developer efficiencies by as much as 10x. The ultimate aim is to achieve faster feedback cycles, more reliable and actionable data, and a highly satisfying developer experience.
  • SATURN 2018 Keynote: Uncoupling (Michael T. Nygard) [Architecture, Architecture patterns, Resilience] [Duration: 0:56:00] Interesting talk about coupling at the architecture level and its consequences.
  • Surviving Continuous Deployment in Distributed Systems (Valentina Servile) [Small Safe Steps (3s), Technical Practices, Trunk Based Development] [Duration: 0:40:00] A good description of the techniques required to deploy incremental changes and practice Trunk Based Development in a distributed system (expand-and-contract, outside-in with feature toggles).
  • Accelerating large engineering organisations with Internal Platforms (João Alves) [Developer Productivity, Devex, Platform, Platform as a product] [Duration: 0:45:00] Interesting talk on how to accelerate an engineering organization by creating an internal platform team with product focus. The talk is full of interesting examples.
  • Testing and Refactoring Legacy Code (Sandro Mancuso) [Evolutionary Design, Refactoring, Technical Practices, XP] [Duration: 1:29:00] (⭐⭐⭐⭐⭐) In this live coding session, Sandro will present many techniques that will help you to efficiently retrofit tests to legacy code and then refactor it to show the business logic more clearly.
  • The 3 Types of Unit Test in TDD (Dave Farley) [Technical Practices, XP, testing, tdd] [Duration: 0:17:00] This TDD tutorial explores the three types of test with some simple examples to demonstrate each one. It then goes on to explore the difference between stubs, fakes, spies and mocks in testing and describe some common difficulties that people sometimes face.
Reminder: all these talks are interesting even if you just listen to them.

Related: 

Saturday, January 22, 2022

Good talks/podcasts (January 2022 II)

 


These are the best podcasts/talks I've seen/listened to recently:
  • Engineering Your Organization: Services, Platforms, and Communities (Randy Shoup) [Company Culture, Engineering Culture, Inspirational, Management, Platform, Platform as a product, Technology Strategy] [Duration: 0:38:00] (⭐⭐⭐⭐⭐) Great summary about the different ways high-performing engineering organizations gain leverage by specialization and sharing.
  • TDD, where did it all go wrong (Ian Cooper) [Technical Practices, tdd, testing] [Duration: 1:01:00] (⭐⭐⭐⭐⭐) Essential talk about how to do TDD in an efficient way and getting a battery of tests that support continuous refactoring. It fundamentally changed my approach to TDD. I highly recommend it.
  • Driving a Tech-led Reimagination of eBay Through DevOps (US 2021) (Randy Shoup, Mark Weinberg) [Devops, Technical leadership] [Duration: 0:33:00] (⭐⭐⭐⭐⭐) A very interesting session about eBay's strategy to improve delivery performance. A great example of engineering leadership.
  • How Honeycomb Manages Incident Response (Fred Hebert) [Incident respond, Operations] [Duration: 0:30:00] In this talk, Fred covers the full incident lifecycle at Honeycomb: all the way from first detecting issues to resolving them. But most of the effective practices we implement come from work that happens before and after those incidents. You'll also learn about systems we use at Honeycomb that can also help you implement better incident response with your teams.
  • Common Mistakes Data Scientists Make With BIG DATA (Dave Farley) [Big Data, Continuous Delivery, Data Engineering, Data Science] [Duration: 0:14:00] (⭐⭐⭐⭐⭐) In this episode, Dave Farley explores how we could do a better job of dealing with data. Ideas like data pipelines and data mesh are becoming more common and more applicable as the scale of the data that we are dealing with grows. Managing the complexity in these activities, as in any other aspect of software engineering, is critical to success with data.
  • Ship It! #31 Is Kubernetes a platform? (Tammer Saleh, Gerhard Lazu) [Devops, Platform, Platform as a product, k8s] [Duration: 1:01:00] Interesting conversation about how to use k8s as a base for a platform.
Reminder: all these talks are interesting even if you just listen to them.

Related: 

Thursday, January 13, 2022

Fighting complexity: let's celebrate removals & simplifications

If you don’t actively fight for simplicity in software, complexity will win. …and it will suck. - @HenrikJoreteg

It is ubiquitous in our profession to celebrate adding new features or capabilities, but it is less common to celebrate removing components or simplifying the system. The problem is that we are using the wrong metaphor. We usually talk about “building” or “making” new features. But by using this “building” metaphor, we neglect all the work associated with the new element we created (feature, capability, component, etc.) (See Basal Cost of software). 

With my teams, I prefer to talk about the new capabilities we enable, the changes in our users' behavior, and the amount of complexity we manage and maintain. I try to convey that controlling and reducing complexity is as essential as developing new capabilities and features.

When I joined the Clarity AI Platform team, there was a problem of too much toil. This toil was generated by the lack of self-service capabilities for the stream-aligned teams, by the considerable number of components the team managed (k8s clusters, MongoDB clusters, in-house monitoring platform, etc.), and by the amount of accidental complexity in the infrastructure.

It came as no surprise to me. Clarity AI was a fast-growing startup that, during its first years, was in a hurry to achieve product-market fit. This meant that controlling complexity was not the priority in the early days (since striking that balance would have been much more difficult).

My first step was to quantify and classify the work/tasks and the source of each task. With this information and our context (a startup funded by VCs), we, the Platform team, determined that it did not make sense (in our case) to manage all that infrastructure ourselves. Therefore, we decided to use managed services whenever possible and simplify the infrastructure.
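
As an illustration of that first step, here is a minimal sketch of how the work could be tallied by source and by type. The `Task` structure, the category names, and the hours are all hypothetical examples, not the real data or tooling we used:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    source: str   # where the work came from (hypothetical labels)
    kind: str     # "toil" or "planned" (hypothetical classification)
    hours: float  # time spent on the task


def hours_by(tasks, key):
    """Sum the hours per value of `key` ("source" or "kind"), largest first."""
    totals = Counter()
    for task in tasks:
        totals[getattr(task, key)] += task.hours
    return totals.most_common()


# Made-up sample data, only to show the shape of the analysis.
tasks = [
    Task("self-managed k8s clusters", "toil", 6.0),
    Task("self-managed k8s clusters", "toil", 3.5),
    Task("mongodb clusters", "toil", 4.0),
    Task("stream-aligned team request", "toil", 2.0),
    Task("new self-service capability", "planned", 8.0),
]

for source, hours in hours_by(tasks, "source"):
    print(f"{hours:5.1f}h  {source}")
```

Sorting by total hours makes the biggest sources of toil stand out, which is the information needed to decide what to simplify or move to a managed service first.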

Let's celebrate simplification and removal

During this last year and a half, we developed several new capabilities and reduced the system's complexity by simplifying solutions, removing non-essential components, and migrating solutions to managed services.

Migrate to managed services

  • We removed all code and toil related to self-managed kubernetes clusters. We migrated all of our kops-managed k8s clusters to EKS. This change allowed us to upgrade our clusters easily and remove tons of obsolete code and tooling we used to maintain them. It also enabled other simplifications, such as using EKS managed node groups. This change drastically reduced the toil in the team (less security patching and upgrading, less code to maintain, etc.) and allowed us to make faster changes in our kubernetes infrastructure.
  • Use of EKS managed node groups. We migrated (practically) all k8s node groups to managed groups, which has allowed us to remove all the trouble of patching and maintaining the OS for the nodes. This change improved our security posture and allowed us to remove these machines' configuration code.
  • We replaced Prometheus, Alertmanager, and Metrics Server. To provide a more complete monitoring solution, we integrated our systems with Datadog. This managed service gave us a more effective and easy-to-use monitoring solution and allowed us to remove our ad-hoc internal monitoring setup: Prometheus, Alertmanager, Metrics Server, and the corresponding services and code. In this case, the most significant benefit is that we no longer have to maintain, update, patch, and manage all those components, since Datadog already provides that functionality.
  • We migrated the self-managed MongoDB cluster to MongoDB Atlas. This change allowed us to remove several EC2 instances and the code associated with managing the clusters, and to minimize the toil associated with database operations. We also saved a lot of development costs with this change since we were in the process of improving security (enabling encryption at rest) and developing all the necessary tools to scale vertically and horizontally without losing service. With the managed service, all of these features are available without any development.

Remove anything unused

  • We removed several S3 buckets and EC2 machines. After two months of talking with many people from the company, we identified several S3 buckets and EC2 machines without clear usage. By removing the residual use, we could delete the buckets and some machines. This change reduced our monthly AWS costs by $400-$500 and improved our security, as the EC2 machines did not follow the same security rules as the rest of our infrastructure.
  • We deleted an abandoned tool. I consider this change a personal victory, and it only cost us one year of talking to and convincing a lot of people :) It was a small application, mainly used in demos, that did not have a clear owner and was not maintained properly. Removing this application also allowed us to remove the repository code, the database, the deployment artifacts, and some ad-hoc AWS resources. This change reduced our monthly AWS bill by $150 and removed a security attack vector. Furthermore, we saved a lot of development costs because the framework and the database used by the application had become obsolete, so if we hadn't removed the application, we would have had to update the framework, the database, and the DB driver. As its original developers no longer worked for the company, this would not have been easy.
  • We removed commands from our platform slack bot. Last year, we created a slack bot that allows our users to self-service some platform-related operations. We try to follow a modern product development flow, including product discovery. Still, some commands seem interesting during the discovery phase but, in the end, are rarely used. In these cases, we removed the commands from the code base, knowing that we can recover the code as a starting point to recreate the command or a similar one in the future. With this simplification, we reduced application maintenance and evolution costs, accelerating future developments.
  • We removed the platform CLI PoC. Right now, our platform bot is used via slack commands. Some months ago, we made a proof of concept of a local command-line interface to interact with the bot. We saw that it wasn’t useful enough yet (due to the type of commands available), so we removed the corresponding code to reduce complexity until we detect a clearer opportunity to release a command-line interface for our users. Once the PoC had taught us what we needed to learn, eliminating the code, as in the other cases, reduced the cost of maintenance and future evolution.
  • We removed Atlantis for Terraform changes. Atlantis automates the process of generating and approving Pull Requests for the terraform code. In the past, we used this approach to allow the stream-aligned teams to create ECR repositories in a self-service manner. In reality, this process was not working well: there were conflicts between different PRs, issues with the changes due to the lack of knowledge about terraform and infrastructure, etc. In the end, we developed a slack command to create ECR repositories and deleted the support for Atlantis. The elimination of this component has reduced the toil of our team, eliminating some necessary but low value-added maintenance tasks.

Simplify existing services

  • We removed some k8s node groups. We reduced the complexity of our k8s clusters by simplifying the number and type of node groups. This complexity came from premature optimization mixed with some “potential” requirements that never came true. We recognized the error and reduced the number of different node groups we required. In addition to reducing the infrastructure code to maintain, this simplification improved the utilization of the machines, with the consequent cost savings. This simplification and some other additional changes have saved us about $10K/month.
  • We removed layers of complexity in our Terraform code. Our terraform code was structured to allow maximum flexibility through several layers of abstraction. In the day-to-day, this meant that any change required making several changes in different repositories. At the same time, we were not using the flexibility that this structure was supposed to provide, and it generated conflicts with an AWS account migration that we were doing. We recognized this problem, and over several months we simplified our code to remove tons of unneeded complexity. This change has improved the development speed of the team. It has also made it easier for us to onboard new members.
  • We simplified our Monolith pipeline and release system. As part of the Platform/DevEx mission, we started a process to optimize the monolith release system. The first step was to take ownership, understand the release system, and mercilessly simplify the related pipelines and release mechanism. This initiative has reduced some of the friction in the development process, making our developers more efficient. Although this improvement can be quantified in terms of money, for me, the essential factor is that it has improved the teams' confidence in the release system.
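
As a rough tally, the monthly savings quoted in the lists above can be added up as follows. This is only a back-of-the-envelope sketch: the labels are mine, the $10K figure is approximate and includes some additional changes, and the annualized number simply assumes the savings stay constant for twelve months:

```python
# (low, high) monthly savings in USD, taken from the figures quoted in this post.
monthly_savings = {
    "unused S3 buckets and EC2 machines": (400, 500),
    "abandoned demo tool": (150, 150),
    "k8s node groups and related changes (approximate)": (10_000, 10_000),
}

low = sum(lo for lo, _ in monthly_savings.values())
high = sum(hi for _, hi in monthly_savings.values())

print(f"Monthly: ${low:,} to ${high:,}")
# Assumption: savings remain constant over a full year.
print(f"Yearly:  ${low * 12:,} to ${high * 12:,}")
```

The tally leaves out the harder-to-measure gains (less toil, avoided development work, fewer attack vectors), which for us matter at least as much as the bill.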

Always fighting against complexity


In addition to all these simplifications, we have also identified other opportunities to reduce the complexity that we will address in the coming year.

For example:

  • Replace one of our mongo database backup systems.
  • Move RabbitMQ and Grafana to managed services.
  • Eliminate the current VPN and replace it with a zero-trust network solution.

Of course, if we find other opportunities to simplify the system, have no doubt that we will take advantage of them :)

Among other things, the Agile Manifesto says:

  • Continuous attention to technical excellence and good design enhances agility.
  • Simplicity--the art of maximizing the amount of work not done--is essential.
I would like to add to the manifesto:
  • Please, eliminate and simplify mercilessly.


Our profession is about managing and controlling complexity, so let's celebrate and prioritize simplification.

"Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it." - Alan Perlis


References and related content


Thanks

The post has been improved based on feedback from:

Tuesday, January 04, 2022

Good talks/podcasts (January 2022 I)



These are the best podcasts/talks I've seen/listened to recently:

  • Avoid These Common Mistakes Junior Developers Make (Dave Farley) [Engineering Career, Inspirational, Software Design] [Duration: 0:18:00] (⭐⭐⭐⭐⭐) A must-see talk. Dave Farley describes 8 common mistakes that junior developers often make and offers his advice on how to avoid them. Whatever your approach to software engineering and software development, whether you are practicing Continuous Delivery, DevOps, or something else, we think that you may find some helpful ideas in this video.
  • Martin Fowler On The Fundamentals Of Software Development | The Engineering Room Ep. 1 (Dave Farley, Martin Fowler) [Agile, Architecture, Architecture patterns] [Duration: 1:13:00] Dave and Martin discuss a wide range of ideas, from new work in patterns in distributed systems and Data Mesh, to the fundamental principles of software development that matter, whatever the technology or problem that you are solving.
  • Engineering Productivity @Google (Michael Bachman) [Devex, Engineering productivity] [Duration: 0:32:00] Interesting talk on how engineering productivity is organized at Google.
  • Gojko Adzic On How Agile Failed at the BBC and the FBI | The Engineering Room Ep. 3 (Gojko Adzic, Dave Farley) [Engineering Career, Engineering Culture, Product, Product Discovery] [Duration: 1:15:00] Dave and Gojko chat about a wide-ranging series of topics on product development, steering development organisations to success, Palchinsky principles and how agile development failed for the FBI and the BBC.
  • The Principles of Product Development Flow / Small batches podcast (Adam Hawkins) [Flow, Lean Product Management, Lean Software Development, Product Team] [Duration: 0:07:00] (⭐⭐⭐⭐⭐) Super dense and interesting summary of the book "The Principles of Product Development Flow".
  • Time Thieves / Small batches podcast (Adam Hawkins) [Agile, Lean, Lean Product Management, Lean Software Development] [Duration: 0:08:00] A summary of the time thieves as described in Dominica DeGrandis's book "Making Work Visible". Adam explains the five thieves: too much WIP, unknown dependencies, conflicting priorities, neglected work, and interruptions.
Reminder: all these talks are interesting even if you just listen to them.

Related: