Saturday, November 22, 2025

Scaling Systems and Teams: Five Mental Models for Engineering Leaders

Introduction: It's Not Performance, It's Scalability

In engineering, we often chase performance, but the true challenge of growth lies in scalability. Wikipedia defines scalability as the capability of a system to handle a growing amount of work by adding resources to the system. This is fundamentally different from performance.

Performance measures the speed or latency of a single request—how fast one thing happens. Scalability measures the system's ability to handle an increasing volume of work—how well it maintains its effectiveness as the load grows.

Consider two algorithms. Algorithm 1 has excellent initial performance, processing requests quickly under a light load, but as the load increases its throughput hits a hard ceiling. Algorithm 2 starts with lower performance but scales linearly: as resources or load increase, its throughput continues to rise steadily.

While Algorithm 1 is faster out of the gate, Algorithm 2 is far more scalable. It is the system you want for the long term. This article explores five mental models to help you understand and design for scalability in both your technical systems and your human teams.

The Ideal World: Linear Scalability

Linear scalability is the theoretical ideal. In this perfect world, throughput increases in direct, linear proportion to the resources you add.

  • In Systems: If one database node handles 100 operations per second, adding three more nodes would result in a system that perfectly handles 400 operations per second.
  • In Teams: If a two-person team has a certain capacity, adding two more people would instantly double the team's output.

However, true linear scalability is a myth: it's the stuff of bedtime stories. It assumes 100% efficiency and zero overhead from coordination or shared resources, a condition that never exists in the real world. This fiction provides a useful baseline, but to build effective systems, we must understand why it fails.

The First Bottleneck: Amdahl's Law and the Contention Factor

Amdahl's Law provides the first dose of reality. It introduces the contention factor (α), which represents the portion of a system or process that is inherently serial and cannot be parallelized. This is the part of the workload that creates a queue for a shared resource—the bottleneck.

As you add more resources (like CPUs or team members), the work gets done faster, but only up to a point. The serial, non-parallelizable portion eventually dominates, and the system's throughput levels off, approaching a hard limit or asymptote.

The key takeaway from Amdahl's Law is that the maximum theoretical speedup is capped by this serial portion at 1/α.

  • If just 1% of a process is serial (α = 0.01), you can never make it more than 100x faster, no matter how many resources you throw at it.
  • If 5% is serial (α = 0.05), your maximum speedup is capped at 20x.
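To make the ceiling concrete, here is a minimal Python sketch of Amdahl's speedup formula, speedup(N) = 1 / (α + (1 − α)/N). The α value and worker counts are illustrative only:

def amdahl_speedup(n_workers: int, alpha: float) -> float:
    """Speedup with n_workers when a fraction alpha of the work is serial."""
    return 1.0 / (alpha + (1.0 - alpha) / n_workers)

# With 5% serial work, extra workers help less and less:
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(n, alpha=0.05), 1))
# Roughly 1.0, 6.9, 16.8, 19.6: creeping toward the 1/0.05 = 20x ceiling.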

Examples of contention are everywhere:

  • In Teams:
    • If you have a specialized team for deployments and operations, you create a bottleneck for all the other teams.
    • Critical tasks like database migrations or specific pull request approvals that can only be done by one or two people create queues and immense pressure on those individuals. These knowledge silos are classic examples of contention.
  • In Systems:
    • A monolithic infrastructure where all processes must compete for the same limited pool of computing resources.
    • Heavy optimization processes where certain calculation steps are inherently sequential, limiting the benefits of adding more parallel workers.

The Hidden Tax: The Universal Scalability Law (USL) and the Coherence Factor

The Universal Scalability Law (USL) builds on Amdahl's Law by introducing a second, more insidious factor: the coherence factor (β). This represents the cost of coordination: the overhead required for parallel processes to communicate and maintain a consistent, shared view of the system. It's the time spent "getting on the same page."

The critical insight of USL is that after a certain point, adding more resources can actually make the system slower. The graph of throughput no longer just flattens out; it peaks and then begins to decline.

This happens because the coordination overhead grows quadratically. The number of potential communication pathways between N workers is N*(N-1). As you add more nodes or people, the cost of keeping everyone in sync explodes, eventually outweighing the benefit of the extra workers.
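For the curious, Gunther's USL formula is C(N) = N / (1 + α(N − 1) + βN(N − 1)). A minimal Python sketch, with purely illustrative α and β values, shows the peak-then-decline shape:

def usl_throughput(n: int, alpha: float, beta: float) -> float:
    """Relative throughput of n workers under the Universal Scalability Law.
    alpha = contention (serial fraction), beta = coherence (coordination cost)."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# Illustrative values: 5% contention, 0.2% coherence cost.
for n in (1, 8, 16, 32, 64, 128):
    print(n, round(usl_throughput(n, alpha=0.05, beta=0.002), 1))
# Throughput rises, peaks near N* = sqrt((1 - alpha) / beta) ≈ 22 workers,
# then declines: past the peak, every extra node makes the whole system slower.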

Examples of coherence costs include:

  • In Teams:
    • Very large teams where decision-making requires consensus from everyone, leading to endless meetings and slowing down progress.
    • High levels of dependencies between teams that force constant coordination and block work from being completed independently.
    • It's often said that to scale, we need to communicate better. This is true, but counter-intuitively, it often means communicating less. The goal isn't more meetings, but rather to establish shared context, clear principles, and a strong culture so that less ad-hoc communication is needed. This reduces the coherence penalty and allows teams to operate more autonomously.
  • In Systems:
    • The Nextail BI Subsystem provided a powerful lesson in avoiding coherence costs. To calculate a specific metric, two independent, parallel processes each needed the result of a shared computation. The surprising lesson was that it was more scalable to have each process perform the exact same calculation independently—duplicating work—than to incur the quadratic communication penalty required to coordinate and share the result.

The Peril of 100% Busy: Insights from Queueing Theory

Queueing Theory provides a model for understanding wait times and the impact of system utilization. Its core lesson is stark: as a system's utilization pushes past roughly 80%, the wait time for new tasks grows explosively, roughly in proportion to 1/(1 − utilization).

This behavior creates three distinct regimes of system health:

  1. Everything is okay: At low utilization, the system is responsive.
  2. Oh wait...: As utilization approaches the "knee" of the curve, delays become noticeable.
  3. F**k: At high utilization, the system collapses, and wait times approach infinity.

This degradation is made drastically worse by variability. In high-variability systems, wait times begin to explode at a much lower utilization threshold (e.g., 40-50%) than in low-variability systems. A queue that handles a mix of very short tasks (2 minutes) and very long tasks (2 hours) will collapse much sooner: a 2-minute job stuck behind a 2-hour job creates an unacceptable experience.
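A back-of-the-envelope Python sketch, using Kingman's approximation for a single-server queue, makes both effects visible. The service times and variability values are made up for illustration:

def expected_wait(utilization: float, service_time: float,
                  cv_arrivals: float = 1.0, cv_service: float = 1.0) -> float:
    """Approximate time a job waits in queue (Kingman's formula).
    cv_* are coefficients of variation: higher means more variable."""
    rho = utilization
    return (rho / (1 - rho)) * ((cv_arrivals ** 2 + cv_service ** 2) / 2) * service_time

# Average job takes 10 minutes; compare a uniform queue with a highly variable one
# (e.g. mostly short jobs with the occasional multi-hour job).
for utilization in (0.5, 0.8, 0.9, 0.95):
    uniform = expected_wait(utilization, service_time=10, cv_service=0.5)
    variable = expected_wait(utilization, service_time=10, cv_service=3.0)
    print(f"{utilization:.0%} busy: ~{uniform:.0f} min wait vs ~{variable:.0f} min with high variability")
# Waits explode as utilization approaches 100%, and far sooner when variability is high.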

Practical applications of this theory include:

  • In Teams: The anti-pattern of a centralized Operations Team that becomes a single, high-variability queue for all other development teams is a recipe for bottlenecks. A better model is to embed operations capabilities within each team, making them self-sufficient. Similarly, organizing teams end-to-end (e.g., by product feature) instead of by technology (front-end vs. back-end) creates self-sufficient units that don't need to queue up for another team to finish their work.
  • In Systems: Moving from a single job queue (monoqueue) to multiple, specialized queues is a common strategy. By separating long-running jobs from short, interactive ones, you reduce the variability within any single queue, ensuring that quick tasks aren't starved by resource-intensive ones.

To Go Faster, Slow Down: Little's Law

The final mental model, Little's Law, offers a simple but profound relationship between throughput, work-in-progress, and completion time. The formula is:

Lead Time = Work in Progress (WIP) / Throughput

  • Lead Time: The average time it takes for a task to be completed.
  • Work in Progress (WIP): The number of tasks being worked on simultaneously.
  • Throughput: The average rate at which tasks are completed.

The counter-intuitive implication is powerful: for a given team or system throughput, the only way to reduce the average time it takes to complete a task (Lead Time) is to reduce the number of tasks being worked on at the same time (WIP). To go faster, you must start less and finish more.
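The arithmetic is almost trivial, which is part of its power. A tiny Python sketch with made-up numbers:

def average_lead_time(wip: int, throughput_per_week: float) -> float:
    """Little's Law: average lead time = work in progress / throughput."""
    return wip / throughput_per_week

# A team finishing 5 tasks per week:
print(average_lead_time(wip=20, throughput_per_week=5))  # 4.0 weeks on average
print(average_lead_time(wip=5, throughput_per_week=5))   # 1.0 week: same throughput, less WIP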

Practical applications of Little's Law include:

  • Teams/Processes:
    • Set explicit and low WIP limits to force teams to focus on finishing tasks before starting new ones.
    • Prioritize flow optimization (getting single items done quickly) over resource optimization (keeping everyone 100% busy).
    • Embrace practices like pair programming, which focuses the energy of two people on a single task. This is a direct application of flow optimization, designed to finish one piece of work much faster, thereby reducing the total WIP and shortening the overall lead time for features.
    • Build a self-service platform that empowers all teams to perform tasks like deployments or database migrations. This increases the entire organization's throughput without creating a centralized bottleneck team.

Conclusion: From Theory to Practice

These five mental models (Linear Scalability, Amdahl's Law, USL, Queueing Theory, and Little's Law) provide a powerful vocabulary for reasoning about growth. The goal isn't to memorize formulas, but to use these concepts to facilitate better conversations and design decisions.

A practical framework I find very useful for thinking about scalability is:

  • Design for 2x the current size or client load. This keeps the immediate solution robust.
  • Consider what 20x would require. Would the current architecture or technology still hold?
  • Brainstorm what 100x would mean. This exercise helps uncover fundamental limitations that may require a completely different approach in the future.

Ultimately, a core strategy for managing scale is to break down a large problem into smaller, independent subsystems. By doing so, you can keep each component operating in the "happy," efficient part of its scalability curve. This is a strategic trade-off: solving a scaling problem at one level intentionally creates a new, higher-level problem of coherence between those subsystems. But this is the fundamental and proven pattern for building systems and organizations that can gracefully handle growth.


Sunday, November 09, 2025

Pseudo TDD with AI

Exploring Test-Driven Development with AI Agents

Over the past few months, I've been experimenting with a way to apply Test-Driven Development (TDD) by leveraging artificial intelligence agents. The goal has been to maintain the essence of the TDD process (test, code, refactor) while taking advantage of the speed and code generation capabilities that AI offers. I call this approach Pseudo TDD with AI.

How the Process Works

The AI agent follows a set of simple rules:

  1. Write a test first.
  2. Run the test and verify that it fails.
  3. Write the production code.
  4. Run the tests again to verify that everything passes.
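As an illustration, a single iteration of that loop might look like this in Python with pytest; the slugify example is hypothetical, not taken from a real session:

# slugger.py: production code, written only AFTER the test below has been run and has failed.
def slugify(text: str) -> str:
    return "-".join(text.lower().split())


# test_slugger.py: written first (step 1), run to confirm it fails (step 2),
# then run again after the implementation to confirm it passes (step 4).
def test_slugify_replaces_spaces_with_dashes():
    assert slugify("Pseudo TDD with AI") == "pseudo-tdd-with-ai"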

I use the rules I defined in my base setup for augmented coding with AI. With these base rules, I can get both the Cursor agent and Claude Code to perform the TDD loop almost completely autonomously.

The refactoring part is not included automatically. Instead, I request it periodically as I observe how the design evolves. This manual control allows me to adjust the design without slowing down the overall pace of work.

Confidence Level and Limitations

The level of confidence I have in the code generated through this process is somewhat lower than that of TDD done manually by an experienced developer. There are several reasons for this:

  • Sometimes the agent doesn't follow all the instructions exactly and skips a step.
  • It occasionally generates fewer tests than I would consider necessary to ensure good confidence in the code.
  • It tends to generalize too early, creating production code solutions that cover more cases than have actually been tested.

Despite these issues, the process is very efficient and the results are usually satisfactory. However, it still doesn't match the confidence level of fully human-driven TDD.

Supporting Tools

To compensate for these differences and increase confidence in the code, I rely on tools like Mutation Testing. This technique has proven very useful for detecting parts of the code that weren't adequately covered by tests, helping me strengthen the reliability of the process.

Alternative Approaches Explored

In the early phases of experimentation, I tried a different approach: directing the TDD process myself within the chat with the AI, step by step. It was a very controlled flow:

"Now I want a test for this."
"Now make it pass."
"Now refactor."

This method made the process practically equivalent to traditional human TDD, as I had complete control over every detail. However, it turned out to be slower and didn't really leverage the AI's capabilities. In practice, it worked more as occasional help than as an autonomous process.

Next Steps

From the current state of this Pseudo TDD with AI, I see two possible paths forward:

  1. Adjust the rules and processes so the flow comes closer to human TDD while maintaining AI speed.
  2. Keep the current approach while observing and measuring how closely it actually approximates a traditional TDD process.

In any case, I'll continue exploring and sharing any progress or learnings that emerge from this experiment. The goal is to keep searching for that balance point between efficiency and confidence that collaboration between humans and AI agents can offer.

Related Content

My Base Setup for Augmented Coding with AI

Repository: eferro/augmentedcode-configuration

Over the last months I've been experimenting a lot with AI-augmented coding — using AI tools not as replacements for developers, but as collaborators that help us code faster, safer, and with more intention.

Most of the time I use Cursor IDE, and I complement it with command-line agents such as Claude Code, Codex CLI, or Gemini CLI.

To make all these environments consistent, I maintain a small open repository that serves as my base configuration for augmented coding setups:

👉 eferro/augmentedcode-configuration

Purpose

This repository contains the initial configuration I usually apply whenever I start a new project where AI will assist me in writing or refactoring code.

It ensures that both Cursor and CLI agents share the same base rules and principles — how to write code, how to take small steps, how to structure the workflow, etc.

In short: it's a simple but powerful way to keep my augmented coding workflow coherent across tools and projects.

Repository structure

augmentedcode-configuration/
├── .agents/
│   └── rules/
│       ├── base.md
│       └── ai-feedback-learning-loop.md
├── .cursor/
│   └── rules/
│       └── use-base-rules.mdc
├── AGENTS.md
├── CLAUDE.md
├── GEMINI.md
├── codex.md
└── LICENSE


.agents/rules/base.md

This is the core file — it defines the base rules I use when coding with AI.

These rules describe how I want the agent to behave:

  • Always work in small, safe steps
  • Follow a pseudo-TDD style (generate a test, make it fail, then implement)
  • Keep code clean and focused
  • Prefer clarity and maintainability over cleverness
  • Avoid generating huge chunks of code in one go

At the moment, these rules are slightly tuned for Python, since that's the language I use most often. When I start a new project in another language, I simply review and adapt this file.

🔗 View .agents/rules/base.md


.agents/rules/ai-feedback-learning-loop.md

This file defines a small feedback and learning loop that helps me improve the rule system over time.

It contains guidance for the AI on how to analyze the latest session, extract insights, and propose updates to the base rules.

In practice, I often tell the agent to "apply the ai-feedback-learning-loop.md" to distill the learnings from the working session, so it can generate suggestions or even draft changes to the rules based on what we learned together.

🔗 View .agents/rules/ai-feedback-learning-loop.md


.cursor/rules/use-base-rules.mdc

This small file tells Cursor IDE to use the same base rules defined above.

That way, Cursor doesn't have a separate or divergent configuration — it just inherits from .agents/rules/base.md.

🔗 View .cursor/rules/use-base-rules.mdc


AGENTS.md, CLAUDE.md, GEMINI.md, codex.md

Each of these files is simply a link (or reference) to the same base rules file.

This trick allows all my CLI agents (Claude Code, Codex, Gemini CLI, etc.) to automatically use the exact same configuration.

So regardless of whether I'm coding inside Cursor or launching commands in the terminal, all my AI tools follow the same guiding principles.

🔗 AGENTS.md
🔗 CLAUDE.md
🔗 GEMINI.md
🔗 codex.md


How I use it

Whenever I start a new project that will involve AI assistance:

  1. Clone or copy this configuration repository.
  2. Ensure that .agents/rules/base.md fits the project's language (I tweak it if I'm not working in Python).
  3. Connect Cursor IDE — it will automatically load the rules from .cursor/rules/use-base-rules.mdc.
  4. When using Claude Code, Codex, or Gemini CLI, they all read the same base rules through their respective .md links.
  5. During or after a session, I often run the AI Feedback Learning Loop by asking the agent to apply the ai-feedback-learning-loop.md so it can suggest improvements to the rules based on what we've learned.
  6. Start coding interactively: I ask the AI to propose small, incremental changes, tests first when possible, and to verify correctness step by step.

This results in a workflow that feels very close to TDD, but much faster. I like to call it pseudo-TDD.

It's not about strict process purity; it's about keeping fast feedback loops, learning continuously, and making intentional progress.

Why this matters

When working with multiple AI agents, it's surprisingly easy to drift into inconsistency — different styles, different assumptions, different "personalities."

By having one shared configuration:

  • All tools follow the same Lean/XP-style principles.
  • The workflow remains consistent across environments.
  • I can evolve the base rules once and have every agent benefit from it.
  • It encourages me (and the agents) to think in small steps, test early, and refactor often.
  • The feedback learning loop helps evolve the rule system organically through practice.

It's a small setup, but it supports a big idea:

"Augmented coding works best when both human and AI share the same working agreements — and continuously improve them together."

Adapting it

If you want to use this configuration yourself:

  1. Fork or clone eferro/augmentedcode-configuration.
  2. Adjust .agents/rules/base.md for your preferred language or conventions.
  3. Point your IDE or CLI agents to those files.
  4. Use .agents/rules/ai-feedback-learning-loop.md to help your agents reflect on sessions and evolve the rules.
  5. Experiment — see how it feels to work with a single, unified, and self-improving set of rules across AI tools.

Next steps

In an upcoming post, I'll share more details about the pseudo-TDD workflow I've been refining with these agents — how it works, what kinds of tests are generated, and how it compares to traditional TDD.

For now, this repository is just a small foundation — but it's been incredibly useful for keeping all my AI coding environments consistent, adaptive, and fast.

Related Content

Mutation Testing: When "Good Enough" Tests Weren't

For weeks, I had been carrying this nagging doubt. The kind of doubt that's easy to ignore when everything is working. My inventory application had 93% test coverage, all tests green, type checking passing. The code had been built with TDD from day one, using AI-assisted development with Claude and Cursor (with Sonnet 4.5, GPT-4o, and Claude Composer), an approach I like to call "vibecoding". Everything looked solid.

It's not a big application. About 650 lines of production code. 203 tests. A small internal tool for tracking teams and employees. The kind of project where you might think "good enough" is actually good enough.

But something was bothering me.

I had heard about mutation testing years ago. I even tried it once or twice. But let's be honest: it always felt like overkill. The setup was annoying, the output was overwhelming, and the juice rarely seemed worth the squeeze. You had to be really committed to quality (or really paranoid) to go through with it.

This time, though, with AI doing the heavy lifting, I decided to give it another shot.

The First Run: 726 Mutants

I added mutmut to the project and configured it with AI's help. Literally minutes of work. Then I ran it:

$ make test-mutation
Running mutation testing
726/726  🎉 711  ⏰ 0  🤔 0  🙁 0  🔇 15  🔴 0
33.50 mutations/second

Not bad. 711 mutants killed out of 726. That's 97.9% mutation score. I felt pretty good about it.

Until I looked at those 15 survivors.

The 15 Survivors

I ran the summary command to see what had survived:

$ make test-mutation-summary
Total mutants checked: 15
Killed (tests caught them): 0
Survived (gaps in coverage): 15

=== Files with most coverage gaps ===
    5 inventory.services.role_config_service
    4 inventory.services.orgportal_sync_service
    2 inventory.infrastructure.repositories.initiative
    1 main.x create_application__mutmut_6: survived
    1 inventory.services.orgportal_sync_service.x poll_for_updates__mutmut_6: survived
    1 inventory.db.gateway
    1 inventory.app_setup.x include_application_routes__mutmut_33: survived

There they were. Fifteen little gaps in my test coverage. Fifteen cases where my tests weren't as good as I thought.

And remember: this is a 650-line application with 203 tests. If I found 15 significant gaps here, what would I find in a 10,000-line system? Or 100,000?

The thing is, a few months ago, this would have been the end of the story. I would have looked at those 15 surviving mutants, felt slightly guilty, and moved on. The effort to manually analyze each mutation, understand what it meant, and write the specific tests to kill it would have taken days. Maybe a week.

Not worth it for a small internal tool.

But this time was different.

What the Mutants Revealed

Before jumping into fixes, I wanted to understand what these surviving mutants were actually telling me. With AI's help, I analyzed them systematically.

Here's what we found:

In role_config_service (5 survivors):
The service loaded YAML configuration for styling team roles. My tests verified that the service loaded the config and returned the right structure. But they never checked what happened when:

  • The YAML file was missing
  • The YAML was malformed
  • Required fields were absent

The code had error handling for all these cases. My tests didn't verify any of it.
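To give a flavor of what was missing, the tests we added look roughly like the sketch below. The RoleConfigService shown here is a simplified, hypothetical stand-in (assuming PyYAML), not the actual production code:

import pytest
import yaml  # assumes PyYAML is installed

# Hypothetical stand-in: loads role styling from a YAML file and is expected
# to fail loudly when the file is missing or incomplete.
class RoleConfigService:
    def __init__(self, path):
        self.path = path

    def load(self) -> dict:
        try:
            with open(self.path) as f:
                config = yaml.safe_load(f)
        except FileNotFoundError as exc:
            raise ValueError(f"role config not found: {self.path}") from exc
        if not config or "roles" not in config:
            raise ValueError("role config is missing the 'roles' section")
        return config

# The error-path tests the surviving mutants pointed at:
def test_missing_config_file_raises_a_clear_error(tmp_path):
    service = RoleConfigService(tmp_path / "does-not-exist.yml")
    with pytest.raises(ValueError, match="not found"):
        service.load()

def test_config_without_roles_section_is_rejected(tmp_path):
    config_file = tmp_path / "roles.yml"
    config_file.write_text("colors: {}\n")
    with pytest.raises(ValueError, match="missing the 'roles' section"):
        RoleConfigService(config_file).load()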

In orgportal_sync_service (4 survivors):
This service synced data from S3. Tests covered the happy path: download file, process it, done. But mutants survived when we:

  • Changed log messages (I wasn't verifying logs)
  • Skipped metadata checks (last_modified, content_length)
  • Removed directory existence checks

The code was defensive. My tests assumed everything would go right.

In database and infrastructure layers (6 survivors):
Similar story. Error paths that existed in production but were never exercised in tests:

  • SQLite connection failures
  • Invalid data in from_db_row factories
  • 404 responses in API endpoints

Classic case of "it works, so I'm not testing the error cases."

The pattern was clear: I had good coverage of normal flows, but my tests were optimistic. They assumed the happy path and left the defensive code untested.

This is what deferred quality looks like at the micro level. Like Deming's red bead experiment (where defects came from the system, not the workers), these weren't random failures. They were systematic gaps in how I verified the system. Every surviving mutant is a potential bug waiting in production, interrupting flow when it surfaces weeks later. The resource efficiency trap: "we already have 93% coverage" feels cheaper than spending 2-3 hours... until you spend days debugging a production issue that a proper test would have caught.

The AI-Powered Cleanup

But this time I had AI. So I did something different.

I asked Claude to analyze the surviving mutants one by one, understand what edge cases they represented, and create or modify tests to cover them. I just provided some guidance on priorities and made sure the new tests followed the existing style.

(The app itself had been built using a mix of tools: Claude for planning and architecture, Cursor with different models for implementation. But for this systematic mutation analysis, Claude's reasoning capabilities were particularly useful.)

In about two or three hours, we had addressed all the key gaps:

  • SQLite error handling: I thought I was testing error paths, but I was only testing the happy path. Added proper error injection tests.
  • Factory method validation: My from_db_row factories had validation that was never triggered in tests. Added tests with invalid data.
  • Edge cases in services: Empty results, missing metadata, nonexistent directories. All cases my code handled but my tests never verified.
  • 404 handling in APIs: The code worked, but no test actually verified the 404 response.

The result after several iterations:

$ make test-mutation
Running mutation testing
726/726  🎉 724  ⏰ 0  🤔 0  🙁 2  🔇 0
30.02 mutations/second
$ make test-mutation-summary
Total mutants checked: 2
Killed (tests caught them): 0
Survived (gaps in coverage): 2

=== Files with most coverage gaps ===
    1 inventory.services.role_config_service
    1 inventory.db.gateway

From 15 surviving mutants down to 2. From 97.9% to 99.7% mutation score.

The coverage numbers told a similar story:

Coverage improvements:
- database_gateway.py: 92% → 100%
- teams_api.py: 85% → 100%
- role_config_service.py: 86% → 100%
- employees_api.py: 95% → 100%
- Overall: 93% → 99%
- Total tests: 203 passing

The Shift in Economics

Here's what struck me about this experience: the effort-to-value ratio had completely flipped.

Before AI, mutation testing was something you did if:

  • You had a critical system where bugs were expensive
  • You had a mature team with time to invest
  • You were willing to spend days or weeks on it
  • The application was large enough to justify the investment

For a 650-line internal tool? Forget about it. The math never worked out.

Now? The math is different. The AI did all the analysis work. I just had to review and approve. What used to take days took hours. And most of that time was me deciding priorities, not grinding through mutations.

The barrier to rigorous testing has dropped dramatically. And it doesn't matter if your codebase is 650 lines or 650,000. The cost per mutant is the same.

The Question That Remains

I've worked in teams that maintained sustainable codebases for years. I know what that forest looks like (to use Kent Beck's metaphor). I also know how much discipline, effort, and investment it took to stay there.

Now I'm seeing that same level of quality becoming accessible at a fraction of the cost. Tests that used to require days of manual work can be generated in hours. Mutation testing that was prohibitively expensive is now just another quick pass.

The technical barrier is gone.

So here's the question I'm left with: now that mutation testing costs almost nothing, will we actually use it? Will teams that never had the resources to invest in this level of testing quality start doing it?

Or will we find new excuses?

Because the old excuse ("we don't have time for that level of rigor") doesn't really work anymore. The time cost has collapsed. The tooling is there. The AI can do the heavy lifting.

What's left is just deciding to do it. And knowing that it's worth it.

What I Learned

Three concrete takeaways from this experience:

1. Line coverage lies, even in small codebases: 93% coverage looked great until mutation testing showed me the gaps. Those 15 surviving mutants were in critical error handling paths. After fixing them, I still had 99% line coverage. But now the tests actually verified what they claimed to test. If a 650-line application had 15 significant gaps, imagine larger systems.

2. AI makes rigor accessible for any project size: What used to be prohibitively expensive (manual mutation analysis) is now quick and almost frictionless. The economics have changed. From 15 survivors to 2 in just a few hours of work, most of it done by AI. This level of rigor is no longer reserved for critical systems. It's accessible for small internal tools too.

3. 99.7% is good enough: After the cleanup, I'm left with 2 surviving mutants out of 726. Could I hunt them down? Sure. Is it worth it? Probably not. They're edge cases in utility code that's already well-tested. The point isn't perfection. It's knowing where your gaps are and making informed decisions about them.

The real win isn't the numbers. It's the confidence. I now know exactly which 2 mutants survive and why. That's very different from having 93% coverage and hoping it's good enough.

This was a small project. If it had been bigger, I probably would have skipped mutation testing entirely (too expensive, too time-consuming). But now? Now I can't think of a good reason not to do it. Not when it costs almost nothing and reveals so much.

I used to think mutation testing was for perfectionists and critical systems only. Now I think it should be standard practice for any codebase you plan to maintain for more than a few months.

Not because it's perfect. But because it's no longer expensive.

And when the cost drops to almost zero, the excuses should too.

The AI Prompt That Worked

When facing surviving mutants, this single prompt did most of the heavy lifting:

"Run mutation testing with make test-mutation. For each surviving mutant, use make test-mutation-show MUTANT=name to see the details. Analyze what test case is missing and create tests to kill these mutants, following the existing test style. After adding tests, run make test-mutation again to verify they're killed. Focus on the top 5-10 most critical gaps first: business logic, error handling, and edge cases in services and repositories."

The key: let the AI drive the mutation analysis loop while you focus on reviewing and prioritizing.

Getting Started

If you want to try this:

  1. Add mutmut to your project (5 minutes with AI help)
  2. Create simple Makefile targets to make it accessible for everyone:
    • make test-mutation - Run the full suite
    • make test-mutation-summary - Get the overview
    • make test-mutation-report - See which mutants survived
    • make test-mutation-show MUTANT=name - Investigate specific cases
    • make test-mutation-clean - Reset when needed
  3. Run it weekly, not on every commit (mutation testing is slow)
  4. Use AI to triage survivors (ask it to analyze and prioritize)
  5. Review the top 5-10 gaps as a pair, decide which matter
  6. Start with one critical module, not the whole codebase
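To make the loop concrete, here is a hedged sketch of what a typical surviving mutant looks like and the test that kills it. The function is hypothetical; mutmut generates mutations like this automatically:

# Production code:
def can_access(user_age: int) -> bool:
    return user_age >= 18

# A typical mutant mutmut might generate flips the boundary:
#     return user_age > 18
# If no test exercises exactly age 18, that mutant survives.

# Tests that pin the boundary down and kill it:
def test_eighteen_year_olds_can_access():
    assert can_access(18) is True

def test_seventeen_year_olds_cannot_access():
    assert can_access(17) is False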

Making it easy to run is as important as setting it up. The barrier is gone. What's stopping you?

When NOT to chase 100%: Those final 2 surviving mutants? They're in logging and configuration defaults that are battle-tested in production. Perfect mutation score isn't the goal. Knowing your gaps is. Focus on business logic and error handling first. Skip trivial code.


About This Project

This application was developed using TDD and AI-assisted development with Claude Code and Cursor (using Sonnet 4.5, GPT-5 codex, and Composer1). The mutation testing setup and gap analysis were done with Claude's help using mutmut.

Timeline: The entire mutation testing setup and gap analysis took about 2-3 hours with AI assistance.

Final stats: 649 statements, 208 tests, 99% line coverage, 726 mutants tested, 724 killed (99.7% mutation score).

Related Reading

Monday, November 03, 2025

When AI Makes Good Practices Almost Free

Since I started working with AI agents, I've had a feeling that was hard to explain. It wasn't so much that AI made work faster or easier, but something harder to pin down: the impression that good practices were much easier to apply and that most of the friction to introduce them had disappeared. That many things that used to require effort, planning, and discipline now happened almost frictionlessly.

That intuition had been haunting me for weeks, until this week, in just three or four days, two very concrete examples put it right in front of me.

The Small Go Application

This week, a colleague reached out to tell me that one of the applications I had implemented in Go didn't follow the team's architecture and testing conventions. They were absolutely right: I hadn't touched Go in years and, honestly, I didn't know the libraries we were using. So I did what I could, leaning heavily on AI to get a quick first version as a proof of concept to validate an idea.
The thing is, my colleague sent me a link to a Confluence page with documentation about architecture and testing, and also a link to another Go application I could use as a reference.

A few months ago, changing the entire architecture and testing libraries would have been at least a week of work. Probably more. But in this case, with AI, I had it completely solved in just two or three hours. Almost without realizing it.

I downloaded the reference application and asked the AI to read the Confluence documentation, analyze the reference application, and generate a transformation plan for my application. Then I just asked it to apply the plan, no adjustments needed, just small interactions to decide when to make commits or approve some operations. In just over two hours, and barely paying attention, I had the entire architecture changed to hexagonal and all the tests updated to use other libraries. It felt almost effortless.

It was a small app, maybe 2000 to 3000 lines of code and around 50 tests, but still, without AI, laziness would have won and I would have only done it if it had been absolutely essential.

The cost of keeping technical coherence across applications has dropped dramatically. What used to take serious effort now happens almost by itself.

The Testing That Stopped Hurting

A few days later, I encountered another similar situation, this time in Python. Something was nagging at me: some edge cases weren't well covered by the tests. I decided to use mutmut, a mutation testing library I'd tried years ago but usually skipped because the juice rarely seemed worth the squeeze.

This time I dropped the library in, got it configured in minutes with AI's help, and then I basically went on autopilot: I generated the mutations and told the AI to go through them one by one, analyzing each mutation and creating or modifying the tests needed to cover those cases. This process required almost no effort from me. The AI was doing all the heavy lifting. I just prioritized a few cases and gave the tests a quick once-over, simply to check that they followed the style of the others.

In a couple of hours, the change in feeling was complete. Night and day. My confidence in the project's tests had shot up and the effort? Practically nothing.

The Intuition That Became Visible

These two examples, almost back-to-back, confirmed the intuition I had been carrying since I started working with AI agents: the economy of effort is changing. Radically.

Refactoring, keeping things coherent, writing solid tests, documenting decisions... None of that matters less now. What has changed is its cost. And when the cost drops to nearly zero, the excuses should vanish too.

If time and effort aren't the issue anymore, why do we keep falling into the same traps? Why do we keep piling on debt and complexity we don't need?

Perhaps the problem isn't technical. Perhaps the problem is that many teams have never really seen what sustainable code looks like, have never experienced it firsthand. They've lived in the desert so long they've forgotten what a forest looks like. Or maybe they never knew in the first place.

Beth Andres-Beck and Kent Beck use the forest and desert metaphor to talk about development practice. The forest has life, diversity, balance. The desert? Just survival and scarcity.

For years I've worked in the forest. I've lived it. I know it's possible, I know it works, and I know it's the right way to develop software. But I also know that building and maintaining that forest was an expensive discipline. Very expensive. It took mature teams, time, constant investment, and a company culture that actually supported it.

Now, with AI and modern agents, building that forest costs almost the same as staying in the desert. The barrier has dropped dramatically: it's no longer effort or time. It's just deciding to do it and knowing how.

The question I'm left with is no longer whether it's possible to build sustainable software. I've known that for years. The question is: now that the cost has disappeared, will we actually seize this opportunity? Will we see more teams moving into that forest that used to be out of reach?

Related Content

Sunday, October 26, 2025

Keynote: Desapego radical en la era de la IA

Yesterday, Saturday, October 25, I had the honor of giving the opening keynote at Barcelona Software Crafters 2025 with the talk "Desapego radical en la era de la IA" (Radical Detachment in the AI Era).

For me, it was the most important talk I've ever given. Barcelona Software Crafters is the software crafters community I respect most among those I know, the one that has done the most for the profession and the one that has taught me the most every time I've been able to take part in its conferences and other activities. So giving the opening keynote was both an enormous honor and an enormous responsibility, because we're at a moment of brutal change in our profession, and I believe the software crafters community and the (real) agile community have a great opportunity to reinvent this profession, adapting, learning as a community, and achieving even more impact.

🎥 The video

Here you can watch the full talk. Thanks to Sirvidiendo Codigo for the recording and for the incredible work they do sharing quality content with the community.



🔗 You can also watch it directly on YouTube: Desapego radical en la era de la IA

And don't miss the Sirvidiendo Codigo channel, where you'll find many more talks and valuable content about software development.

📊 The slides


🔗 If you prefer to view them directly in Google Slides, here's the link: Desapego radical en la era de la IA



What is the talk about?


We're at a moment of brutal change in our profession. AI offers us superhuman speed, but it can also generate more complexity if we don't change the way we work.

The central idea is the need to adopt a mindset of "radical detachment" in order to explore, learn, and adapt. Some key points:

- AI demands reinventing software development with a product mindset at its core.
- We must detach ourselves from what we know in order to explore new possibilities, blurring roles.
- AI amplifies the impact of good engineering practices.
- The Agile community is essential for this reinvention and for collaborative learning.

The slides include plenty of notes with additional examples and reflections. I recommend taking a look at them while we wait for the video 😉

Audience feedback


The feedback was quite good, and I talked with a lot of people about the topic for the rest of the day. A couple of people told me it had made them reflect and that they were going to take action. Overall, very positive. I'm still waiting for the feedback collected by the organizers through the official form.

A huge hug to the Barcelona Software Crafters team for inviting me and for creating such a special community. The atmosphere, the conversations, and the energy were incredible.

If you were at the talk, write to me! I'd love to keep talking, answer questions, or share ideas.

If you missed it, take a look at the slides and share them with anyone you think might find them useful 😉

See you at the next one. Let's keep learning as a community!

Saturday, October 11, 2025

Good talks/podcasts (Oct I) / AI & AI - Augmented Coding Edition!

These are the best podcasts and talks I've seen or listened to recently. Reminder: all of these talks are interesting, even if you only listen to them.

You can now explore all recommended talks and podcasts interactively on our new site. The new site allows you to:
  • 🏷️ Browse talks by topic
  • 👤 Filter by speaker
  • 🎤 Search by conference
  • 📅 Navigate by year
Feedback Welcome!
Your feedback and suggestions are highly appreciated to help improve the site and content. Feel free to contribute or share your thoughts!
Related:

Friday, October 10, 2025

AI and Lean Software Development: Reflections from Experimentation

Exploring how artificial intelligence might be changing the rules of the game in software development - preliminary insights from the trenches

An exploration into uncharted territory

I want to be transparent from the start: what I'm about to share are not definitive conclusions or proven principles. These are open reflections that emerge from a few months of intense personal experimentation with AI applied to software development, exploring its possibilities and trying to understand how this affects the Lean practices we typically use.

Rather than conclusions drawn from long experience, these are reflections I'd like to keep experimenting with and discussing with others interested in this topic. I'm not speaking as someone who already has the answers, but as someone exploring fascinating questions and suspecting that we're facing a paradigm shift we're only beginning to understand.

The fundamental paradox: speed versus validation

A central idea I'm observing is that, although artificial intelligence allows us to work faster, this doesn't mean we should automatically expand the initial scope of our functionalities. My intuition tells me that we should continue delivering value in small increments, validate quickly, and decide based on real feedback rather than simply on the speed at which we can now execute tasks.

But there's an interesting nuance I've started to consider: in low-uncertainty contexts, where both the value and the implementation are clear and the team is very confident, it might make sense to advance a bit further before validating. However, my sense is that maintaining the discipline to avoid speculative design is fundamental, because although AI makes it easy, it can jeopardize the simplicity and future flexibility of the system.

The cognitive crisis we don't see coming

Chart: While development speed with AI grows exponentially, our human cognitive capacity remains constant, creating a "danger zone" where we can create complexity faster than we can manage it.

Here I do have a conviction that becomes clearer every day: we should now be much more radical when it comes to deleting and eliminating code and functionalities that aren't generating the expected impact.

What this visualization shows me is something I feel viscerally: we have to be relentless to prevent complexity from devouring us, because no matter how much AI we have, human cognitive capacity hasn't changed - both for managing technical complexity and for users to manage the growing number of applications and functionalities.

We're at that critical point where the blue line (AI speed) crosses the red line (our capacity), and my intuition tells me that either we develop radical disciplines now, or we enter that red zone where we create more complexity than we can handle.

The paradox of amplified Lean

But here's the crux of the matter, and I think this table visualizes it perfectly:

Table: AI eliminates the natural constraints that kept us disciplined (at least some of us), creating the paradox that we need to artificially recreate those constraints through radical discipline.

This visualization seems to capture something fundamental that I'm observing: AI eliminates the natural constraints that kept us applying Lean principles. Before, the high cost of implementation naturally forced us to work in small batches. Now we have to recreate that discipline artificially.

For example, look at the "Small Batches" row: traditionally, development speed was the natural constraint that forced us to validate early. Now, with AI, that brake disappears and we risk unconscious scope growth. The countermeasure isn't technical, it's cultural: explicitly redefining what "small" means in terms of cognitive load, not time.

The same happens with YAGNI: before, the high cost of implementation was a natural barrier against speculative design. Now AI "suggests improvements" and makes overengineering tempting and easy. The answer is to make YAGNI even more explicit.

This is the paradox that fascinates me most: we have to become more disciplined precisely when technology makes it easier for us.

From this general intuition, I've identified several specific patterns that concern me and some opportunities that excite me. These are observations that arise from my daily experimentation, some clearer than others, but all seem relevant enough to share and continue exploring.

About scope and complexity

Change in the "default size" of work

AI facilitates the immediate development of functionalities or refactors, which can unconsciously lead us to increase their size. The risk I perceive is losing the discipline of small batch size crucial for early validation.

Ongoing exploration: My intuition suggests explicitly redefining what "small" means in an AI context, focused on cognitive size and not just implementation time. One way to achieve this is by relying on practices like BDD/ATDD/TDD to limit each cycle to a single test or externally verifiable behavior.

Amplified speculative design

On several occasions I've had to undo work done by AI because it tried to do more than necessary. I've observed that AI lacks sensitivity to object-oriented design and has no awareness of the complexity it generates: it piles complexity up very quickly until it reaches a point it can't escape and enters a loop, fixing one thing and breaking another.

Reflection: This suggests reinforcing deliberate practices like TDD, walking skeletons, or strict feature toggles.

New type of "overengineering"

My initial experience suggests that the ease AI offers can lead to adding unnecessary functionalities. It's not the classic overengineering of the architect who designs a cathedral when you need a cabin. It's more subtle: it's adding "just one more feature" because it's easy, it's creating "just one additional abstraction" because AI can generate it quickly.

Key feeling: Reinforcing the YAGNI principle even more explicitly seems necessary.

About workflow and validations

Differentiating visible work vs. released work

My experience indicates that rapid development shouldn't confuse "ready to deploy" with "ready to release." My feeling is that keeping the separation between deployment and release clear remains fundamental.

I've also built small functionalities several times that then went unused. Although, to be honest, since I have deeply internalized the ideas of eliminating waste and minimizing baseline cost, I simply deleted the code afterwards.

Opportunity I see: AI can accelerate development while we validate with controlled tests like A/B testing.

More work in progress, but with limits

Although AI can allow more parallel work, my intuition tells me this can fragment the team's attention and complicate integration. It's tempting to have three or four features "in development" simultaneously because AI makes them progress quickly.

My current preference: Use AI to reduce cycle time per story, prioritizing fast feedback, instead of parallelizing more work.

Change in the type of mistakes we make

My observations suggest that with AI, errors can propagate quickly, generating unnecessary complexity or superficial decisions. A superficial decision or a misunderstanding of the problem can materialize into functional code before I've had time to reflect on whether it's the right direction.

Exploration: My intuition points toward reinforcing cultural and technical guardrails (tests, decision review, minimum viable solution principle).

About culture and learning

Impact on culture and learning

I feel there's a risk of over-relying on AI, which could reduce collective reflection. Human cognitive capacity hasn't changed, and we're still better at focusing on few things at a time.

Working intuition: AI-assisted pair programming, ownership rotations, and explicit reviews of product decisions could counteract this effect.

Ideas I'm exploring to manage these risks

After identifying these patterns, the natural question is: what can we do about it? The following are ideas I'm exploring, some I've already tried with mixed results, others are working hypotheses I'd like to test. To be honest, we're in a very embryonic phase of understanding all this.

Discipline in Radical Elimination
My intuition suggests introducing periodic "Deletion Reviews" to actively eliminate code without real impact. Specific sessions where the main objective is to identify and delete what isn't generating value.

"Sunset by Default" for experiments
The feeling is that we might need an explicit automatic expiration policy for unvalidated experiments. If they don't demonstrate value in X time, they're automatically eliminated, no exceptions.

More rigorous Impact Tracking
My experience leads me to think about defining explicit impact criteria before writing code and ruthlessly eliminating what doesn't meet expectations in the established time.

Fostering a "Disposable Software" Mentality
My feeling is that explicitly labeling functionalities as "disposable" from the start could psychologically facilitate elimination if they don't meet expectations.

Continuous reduction of "AI-generated Legacy"
I feel that regular sessions to review automatically generated code and eliminate unnecessary complexities that AI introduced without us noticing could be valuable.

Radically Reinforcing the "YAGNI" Principle
My intuition tells me we should explicitly integrate critical questions in reviews to avoid speculative design: "Do we really need this now? What evidence do we have that it will be useful?"

Greater rigor in AI-Assisted Pair Programming
My initial experience suggests promoting "hybrid Pair Programming" to ensure sufficient reflection and structural quality. Never let AI make architectural decisions alone.

A fascinating opportunity: Cross Cutting Concerns and reinforced YAGNI

Beyond managing risks, I've started to notice something promising: AI also seems to open new possibilities for architectural and functional decisions that traditionally had to be anticipated from the beginning.

I'm referring specifically to elements like:

  • Internationalization (i18n): Do we really need to design for multiple languages from day one?
  • Observability and monitoring: Can we start simple and add instrumentation later?
  • Compliance: Is it possible to build first and adapt regulations later?
  • Horizontal scalability and adaptation to distributed architectures: Can we defer these decisions until we have real evidence of need?

My feeling is that these decisions can be deliberately postponed and introduced later thanks to the automatic refactoring capabilities that AI seems to provide. This could further strengthen our ability to apply YAGNI and defer commitment.

The guardrails I believe are necessary

For this to work, I feel we need to maintain certain technical guardrails:

  • Clear separation of responsibilities: So that later changes don't break everything
  • Solid automated tests: To refactor with confidence
  • Explicit documentation of deferred decisions: So we don't forget what we deferred
  • Use of specialized AI for architectural spikes: To explore options when the time comes

But I insist: these are just intuitions I'd like to validate collectively.

Working hypotheses I'd love to test

After these months of experimentation, these are the hypotheses that have emerged and that I'd love to discuss and test collectively:

1. Speed ≠ Amplitude

Hypothesis: We should use AI's speed to validate faster, not to build bigger.

2. Radical YAGNI

Hypothesis: If YAGNI was important before, now it could be critical. Ease of implementation shouldn't justify additional complexity.

3. Elimination as a central discipline

Hypothesis: Treat code elimination as a first-class development practice, not as a maintenance activity.

4. Hybrid Pair Programming

Hypothesis: Combining AI's speed with human reflection could be key. Never let AI make architectural decisions alone.

5. Reinforced deployment/release separation

Hypothesis: Keep this separation clearer than ever. Ease of implementation could create mirages of "finished product."

6. Deferred cross-cutting concerns

Hypothesis: We can postpone more architectural decisions than before, leveraging AI's refactoring capabilities.

An honest invitation to collective learning

Ultimately, these are initial ideas and reflections, open to discussion, experimentation, and learning. My intuition tells me that artificial intelligence is radically changing the way we develop products and software, enhancing our capabilities, but suggesting the need for even greater discipline in validation, elimination, and radical code simplification.

My strongest hypothesis is this: AI amplifies both our good and bad practices. If we have the discipline to maintain small batches, validate quickly, and eliminate waste, AI could make us extraordinarily effective. If we don't have it, it could help us create disasters faster than ever.

But this is just a hunch that needs validation.

What experiences have you had? Have you noticed these same patterns, or completely different ones? What practices are you trying? Have you noticed these same effects in your teams? What feelings does the integration of AI in your Lean processes generate for you?

We're in the early stages of understanding all this. We need perspectives from the entire community to navigate this change that I sense could be paradigmatic, but that I still don't fully understand.

Let's continue the conversation. The only way forward is exploring together.

Do these reflections resonate with you? Have you noticed similar or completely different patterns? I'd love to hear about your experience and continue learning together in this fascinating and still unexplored territory.

Sunday, October 05, 2025

Lean + XP + Product Thinking – Three Pillars for Sustainable Software Development

When we talk about developing software sustainably, we're not just talking about taking care of the code or going at a reasonable pace. We're referring to the ability to build useful products, with technical quality, in a continuous flow, without burning out the team and without making the system collapse with every change. It's a difficult balance. However, in recent years I've been seeing how certain approaches have helped us time and again to maintain it.

These aren't new or particularly exotic ideas. But when combined well, they can make all the difference. I'm referring to three concrete pillars: Extreme Programming (XP) practices, Lean thinking applied to software development, and a product mindset that drives us to build with purpose and understand the "why" behind every decision.

Curiously, although sometimes presented as different approaches, XP and Lean Software Development have very similar roots and objectives. In fact, many of Lean's principles—such as eliminating waste, optimizing flow, or fostering continuous learning—are deeply present in XP's way of working. This is no coincidence: Kent Beck, creator of XP, was one of the first to apply Lean thinking to software development, even before it became popular under that name. As he himself wrote:

"If you eliminate enough waste, soon you go faster than the people who are just trying to go fast." — Kent Beck, Extreme Programming Explained (2nd ed.), Chapter 19: Toyota Production System

This quote from Kent Beck encapsulates the essence of efficiency through the elimination of the superfluous.

I don't intend to say this is the only valid way to work, nor that every team has to function this way. But I do want to share that, in my experience, when these three pillars are present and balanced, it's much easier to maintain a sustainable pace, adapt to change, and create something that truly adds value. And when one is missing, it usually shows.

This article isn't a recipe, but rather a reflection on what we've been learning as teams while building real products, with long lifecycles, under business pressure and with the need to maintain technical control—a concrete way that has worked for us to do "the right thing, the right way… and without waste."

Doing the right thing, the right way… and without waste

There's a phrase I really like that neatly summarizes the kind of balance we seek: doing the right thing, the right way. It has been attributed to Kent Beck and has also been used by Martin Fowler in some contexts. In our experience, though, it falls short unless we add a third dimension: doing it without waste, efficiently and smoothly. Because you can be doing the right thing, doing it well, and still doing it at a cost or speed that makes it unsustainable.



Over the years, we've seen how working this way—doing the right thing, the right way and without waste—requires three pillars:

  • Doing the right thing implies understanding what problem needs to be solved, for whom, and why. And this cannot be delegated outside the technical team. It requires that those who design and develop software also think about product, impact, and business. This is what in many contexts has been called Product Mindset: seeing ourselves as a product team, where each person acts from their discipline, but always with a product perspective.
  • Doing it the right way means building maintainable, testable solutions that give us the confidence to evolve without fear, at a sustainable pace that respects people. This is where Extreme Programming practices come into full play.
  • And doing it without waste leads us to optimize workflow, eliminate everything that doesn't add value, postpone decisions that aren't urgent, and reduce the baseline cost of the system. Again, much of Lean thinking helps us here.

These three dimensions aren't independent. They reinforce each other. When one fails, the others usually suffer. And when we manage to have all three present, even at a basic level, that's when the team starts to function smoothly and with real impact.

The three pillars

Over time, we've been seeing that when a team has these three pillars present—XP, Lean Thinking, and Product Engineering—and keeps them balanced, the result is a working system that not only functions, but endures. It endures the passage of time, strategy changes, pressure peaks, and difficult decisions.

1. XP: evolving without breaking

Extreme Programming practices are what allow us to build software that can be changed. Automated tests, continuous integration, TDD, simple design, frequent refactoring… all of this serves a very simple idea: if we want to evolve, we need very short feedback cycles that allow us to gain confidence quickly.
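
As a minimal illustration of the short feedback cycle this relies on, here is a hedged sketch of a test-first step in Python; the price_with_discount function and its rules are invented for the example and aren't taken from any of the systems mentioned here.

```python
# A tiny TDD-style cycle in one file: the tests below were written first, then
# price_with_discount was implemented just enough to make them pass, and the
# code can now be refactored while the tests keep it honest.

def price_with_discount(amount: float, threshold: float, rate: float) -> float:
    """Apply a discount only when the amount reaches the threshold."""
    if amount >= threshold:
        return amount * (1 - rate)
    return amount

def test_no_discount_below_threshold():
    assert price_with_discount(amount=50, threshold=100, rate=0.25) == 50

def test_discount_applied_at_or_above_threshold():
    assert price_with_discount(amount=200, threshold=100, rate=0.25) == 150

if __name__ == "__main__":
    test_no_discount_below_threshold()
    test_discount_applied_at_or_above_threshold()
    print("all tests pass")
```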

With XP, quality isn't a separate goal. It's the foundation upon which everything else rests. Being able to deploy every day, run experiments, try new things, reduce the cost of making mistakes… all of that depends on the system not falling apart every time we touch something.

"The whole organization is a quality organization." — Kent Beck, Extreme Programming Explained (2nd ed.), Chapter 19: Toyota Production System

I remember, at Alea, changing the core of a product (a fiber router provisioning system) in less than a week, moving from working synchronously to asynchronously. We relied on the main business-logic tests and gradually migrated all the component's entry points, test by test. Or at The Motion, where we changed in parallel the entire architecture of the component that calculated the state and result of the video batches we generated, so it could scale to what the business needed.
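
The migration above leaned on existing business-logic tests while entry points were switched one at a time. A hedged sketch of that style of parallel change (expand, migrate, contract) might look like the following; SyncProvisioner, AsyncProvisioner, and the entry-point names are invented for illustration and are not the actual Alea code.

```python
import asyncio

# Expand: keep the existing synchronous behaviour behind a small class.
class SyncProvisioner:
    def provision_router(self, serial: str) -> str:
        # Old blocking implementation.
        return f"provisioned {serial} (sync)"

# Add the new asynchronous implementation alongside the old one, kept green by
# the same business-logic tests.
class AsyncProvisioner:
    async def provision_router(self, serial: str) -> str:
        await asyncio.sleep(0)  # stands in for non-blocking I/O
        return f"provisioned {serial} (async)"

# Migrate entry points one by one; contract by deleting SyncProvisioner once
# every caller has moved over.
def legacy_entry_point(serial: str) -> str:
    return SyncProvisioner().provision_router(serial)

async def migrated_entry_point(serial: str) -> str:
    return await AsyncProvisioner().provision_router(serial)

if __name__ == "__main__":
    print(legacy_entry_point("router-001"))
    print(asyncio.run(migrated_entry_point("router-002")))
```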

Making these kinds of changes in a system built without good modern engineering practices (XP/CD) would have been a nightmare, or would have been discarded outright, leaving us to pile patch upon patch until declaring technical bankruptcy and rebuilding the system from scratch.

However, for us, thanks to XP, it was simply normal work: achieving scalability improvements or adapting a component to manufacturer changes. Nothing exceptional.

None of this would be possible without teams that can maintain a constant pace over time, because XP doesn't just seek to build flexible and robust systems, but also to care for the people who develop them.

XP not only drives the product's technical sustainability, but also a sustainable work pace, which includes productive slack to be creative, learn, and innovate. It avoids "death marches" and heroic efforts that exhaust and reduce quality. Kent Beck's 40-hour work week rule reflects a key idea: quality isn't sustained with exhausted teams; excessive hours reduce productivity and increase errors.

2. Lean Thinking: focus on value and efficiency

Lean thinking gives us tools to prioritize, simplify, and eliminate the unnecessary. It reminds us that doing more isn't always better, and that every line of code we write has a maintenance cost. Often, the most valuable thing we can do is build nothing at all.

We apply principles like eliminating waste, postponing decisions until the last responsible moment (defer commitment), measuring flow instead of utilization, or systematically applying YAGNI. This has allowed us to avoid premature complexities and reduce unnecessary work.
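
"Measuring flow instead of utilization" can be made concrete with the flow-efficiency ratio: the share of lead time actually spent adding value. The sketch below and its sample numbers are purely illustrative assumptions, not data from any of the teams mentioned.

```python
from datetime import timedelta

def flow_efficiency(value_adding: timedelta, lead_time: timedelta) -> float:
    """Fraction of the lead time spent actively adding value (0.0 to 1.0)."""
    return value_adding / lead_time

if __name__ == "__main__":
    # Illustrative numbers: 2 days of active work inside a 10-day lead time.
    print(f"flow efficiency: {flow_efficiency(timedelta(days=2), timedelta(days=10)):.0%}")
```

A low ratio usually points to queues and hand-offs rather than to people not working hard enough, which is why optimizing utilization alone tends to backfire.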

In all the teams I've worked with, we've simplified processes: eliminating ceremonies, working in small and solid steps, dispensing with estimates and orienting ourselves to continuous flow. Likewise, we've reused "boring" technology before introducing new tools, and always sought to minimize the baseline cost of each solution, eliminating unused functionalities when possible.

I remember that at Alea, during the first months of the fiber router provisioning system, we stored everything in a couple of text files, without a database. This let us launch quickly and migrate to something more complex only when it became necessary. At Clarity AI, our operations bot avoided maintaining its own state by leveraging what the systems it operates (such as AWS) already store, and it dispensed with its own authentication and authorization by relying on what Slack, its main interface, already provides.
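
A hedged sketch of the "text files before a database" idea: keep persistence behind a tiny repository so the storage choice can be swapped later without touching callers. The RouterRepository name and the JSON-lines format are assumptions for illustration, not the actual Alea implementation.

```python
import json
from pathlib import Path

class RouterRepository:
    """Minimal file-backed store: one JSON object per line in a text file.
    If the product later needs a real database, only this class changes."""

    def __init__(self, path: str = "routers.jsonl"):
        self._path = Path(path)

    def save(self, record: dict) -> None:
        with self._path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def all(self) -> list[dict]:
        if not self._path.exists():
            return []
        with self._path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

if __name__ == "__main__":
    repo = RouterRepository()
    repo.save({"serial": "router-001", "status": "provisioned"})
    print(repo.all())
```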

These approaches have helped us focus on the essential, reduce costs, and maintain the flexibility to adapt when needs really require it.

3. Product Mindset: understanding the problem, not just building the solution

And finally, the pillar that's most often forgotten or mentally outsourced: understanding the problem.
As a product team, we can't limit ourselves to executing tasks from our disciplines; we need to get involved in the impact of what we build, in the user experience, and in the why of each decision.

When the team assumes this mindset, the way of working changes completely. The dividing line between "business" and "technology" disappears, and we start thinking as a whole. It doesn't mean everyone does everything, but we do share responsibility for the final result.

In practice, this means prioritizing problems before solutions, discarding functionality that doesn't add value even if it's already planned, and keeping technical options open until we have real data and feedback. Work is organized in small, functional vertical increments, delivering improvements almost daily to validate hypotheses with users and avoid large deliveries full of uncertainty. This approach has let us adapt critical processes to changes in requirements or context in just a few hours, without compromising stability or user experience.

Not all pillars appear at once

One of the things I've learned over time is that teams don't start from balance. Sometimes you inherit a team with a very solid technical level, but with no connection to the product. Other times you arrive at a team that has good judgment about what to build, but lives in a trench of impossible-to-maintain code. Or the team is so overwhelmed by processes and dependencies that it can't even get to production smoothly.

The first thing, in those cases, isn't to introduce a methodology or specific practice. It's to understand: see which of the pillars is weakest and work on improving it until, at least, it lets you move forward. If the team can't deploy without suffering, it matters little that they understand the product perfectly. If the team builds fast but nobody uses what they build, the problem is elsewhere.

Our approach has always been to seek a certain baseline balance, even at a very initial level, and from there improve on all three pillars at once. In small steps. Without major revolutions.

The goal isn't to achieve perfection in any of the three, but to prevent any one from failing so badly that the team gets blocked or frustrated. When we manage to have all three reasonably present, improvement feeds back on itself. Increasing quality allows testing more things. Better understanding the product allows reducing unnecessary code. Improving flow means we can learn faster.

When a pillar is missing…

Over time, we've also seen the opposite: what happens when one of the pillars isn't there. Sometimes it seems the team is functioning, but there's something that doesn't quite fit, and eventually the bill always comes due.

  • Teams without autonomy become mere executors, without impact or motivation.
  • Teams without technical practices end up trapped in their own complexity, unable to evolve without breaking things.
  • Teams without focus on value are capable of building fast… fast garbage.

And many times, the problem isn't technical but structural. As Kent Beck aptly points out:

"The problem for software development is that Taylorism implies a social structure of work... and it is bizarrely unsuited to software development." — Kent Beck, Extreme Programming Explained (2nd ed.), Chapter 18: Taylorism and Software

In some contexts you can work to recover balance. But there are also times when the environment itself doesn't allow it. When there's no room for the team to make decisions, not even to improve their own dynamics or tools, the situation becomes very difficult to sustain. In my case, when it hasn't been possible to change that from within, I've preferred to directly change contexts.


Just one way among many

Everything I'm telling here comes from my experience in product companies. Teams that build systems that have to evolve, that have long lives, that are under business pressure and that can't afford to throw everything in the trash every six months.
It's not the only possible context. In environments more oriented to services or consulting, the dynamics can be different. You work with different rhythms, different responsibilities, and different priorities. I don't have direct experience in those contexts, so I won't opine on what would work best there.

I just want to make clear that what I'm proposing is one way, not the way. But I've also seen many other ways of working that, without some minimum of technical discipline, focus on real value, and a constant search for efficiency, simply don't hold up in the medium or long term in product environments that need to evolve. My experience tells me that, while you don't have to follow this to the letter, you also can't expect great results if you spend your time "messing up" the code, building without understanding the problem, or generating waste everywhere.

This combination, on the other hand, is the one that has most often withstood the passage of time, changes in direction, pressure, and uncertainty. And it's the one that has made many teams not only function well, but enjoy what they do.

Final reflection

Building sustainable software isn't just a technical matter. It's a balance between doing the right thing, doing it well, and doing it without waste. And for that, we need more than practices or processes. We need a way of working that allows us to think, decide, and build with purpose.

In our case, that has meant relying on three legs: XP, Lean, and Product Engineering. We haven't always had all three at once. Sometimes we've had to strengthen one to be able to advance with the others. But when they're present, when they reinforce each other, the result is a team that can deliver value continuously, adapt, and grow without burning out.

I hope this article helps you reflect on how you work, which legs you have strongest, and which ones you could start to balance.

Friday, September 19, 2025

Talk: Incentivos perversos, resultados previsibles. Cuando el sistema sabotea a los equipos

Today I had the opportunity to take part in FredCon 25 with the talk "Incentivos perversos, resultados previsibles: Cuando el sistema sabotea a los equipos" (Perverse Incentives, Predictable Results: When the System Sabotages Teams). I sincerely want to thank the organizers for the invitation and for creating the space to discuss a topic I care deeply about.

About the talk

In it I explore how the continued application of Taylorism to software and product development generates systemic problems such as:

  • The resource-efficiency trap.
  • Disconnection from purpose.
  • The deferred-quality trap.

When the system is poorly designed, predictable but undesirable results appear: chronic bottlenecks, constant rework, irrelevant products, talent drain, and technical debt. This happens because local incentives and individual utilization are rewarded over the global performance of the system.

Key idea: these problems are not isolated failures of the teams, but inevitable consequences of a poorly designed system. The solution is to change the system, promoting collaboration, shared responsibility, global optimization, autonomy with purpose, and continuous learning to achieve sustainable impact and speed.

Video of the talk

Video link

Slides

You can also check out the slides. They include plenty of notes with examples and additional explanations that help you follow the thread beyond what appears on screen.

Original document with notes (Google Slides)

Open the slides in a new tab

Thanks again to the FredCon 25 organizers and to everyone who attended. I hope this material helps open conversations about how to design systems that empower teams instead of sabotaging them.


References for the main concepts

Resource Efficiency vs Flow Efficiency:
Systems Thinking and Quality:
  • The Red Bead Experiment (14m) by W. Edwards Deming — A powerful demonstration that performance and quality depend on the system, not individual effort. A reminder that systemic issues require systemic fixes.