Sunday, April 12, 2026

My second brain: markdown, Dropbox, and an AI agent

1,222 notes, 24 categories, zero vendor lock-in. My knowledge management system fits on a thumb drive.

The starting point

The oldest notes in this vault are from June 2013. Talk notes, loose ideas, book quotes, technical decisions, reflections on teams and processes. They started out in Google Keep, in scattered files, in tools I no longer remember the name of. Over time I migrated them into markdown files by hand. Slowly, in bursts, never quite finishing.

The second brain always had valuable stuff in it. Interesting ideas, connections between concepts, reference material that had taken me hours to compile. I found notes when I needed them, connected ideas across sessions. But I had this persistent feeling that there was more in there than I was getting out. Hundreds of notes that didn't talk to each other. Knowledge that accumulated but didn't compound.

And then there was the Google Keep backlog. Over the years I'd piled up around 1,500 notes and links in Keep. Quick captures, things I meant to process "later." The problem was that Google Keep has no public API for personal accounts, so getting stuff out was painful enough that I just... didn't. The backlog grew. Every time I opened Keep I felt the weight of it.

Last Christmas I decided to just rip the band-aid off. Exported everything via Google Takeout, deleted it all from Keep, and sat down with the raw files. Using AI I built a throwaway classification pipeline: a combination of heuristics and a human-in-the-loop process where the system proposed a category and I made the final call. In a couple of days, 1,500 notes were classified and integrated into the vault. Bye bye, Google Keep.
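That throwaway pipeline is long gone, but its shape is easy to sketch. Here's a minimal Python version of the propose-then-confirm loop; the keyword heuristics and category names are invented for illustration, not what the real pipeline used:

```python
import re

# Hypothetical keyword heuristics: category -> trigger words.
# The real pipeline's rules were throwaway code; these are illustrative.
HEURISTICS = {
    "books": ["book", "read", "author"],
    "talks": ["talk", "conference", "speaker"],
    "quotes": ["quote", "said"],
}

def propose_category(text, heuristics=HEURISTICS):
    """Score each category by keyword hits and return the best guess."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        cat: sum(words.count(kw) for kw in kws)
        for cat, kws in heuristics.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "inbox"

def classify(note, ask=input):
    """Human-in-the-loop: the system proposes, the human makes the final call."""
    proposal = propose_category(note)
    answer = ask(f"Category '{proposal}'? [Enter to accept, or type another] ")
    return answer.strip() or proposal
```

The heuristics don't matter much. What matters is the division of labor: the machine does the proposing at scale, the human stays as the final call.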

That was the moment it clicked. The friction of processing notes had been the bottleneck for years, not the lack of a system. Once I could generate tooling on demand to solve a specific problem, the backlog that had haunted me for ages just... dissolved.

Separately, I'd been curating recommended technical talks in eferro-picks for years. 855 talks, 465 speakers, gathered over 6 years. Another project with valuable information sitting in its own repo, its own structure, disconnected from the vault.

Both projects are now one. What I have is an Obsidian vault of markdown notes on Dropbox, versioned with Git, maintained by an AI agent that understands the structure. The talks from eferro-picks live inside the vault now. 402 of the 855 have full notes so far: all recent talks get them automatically, and I'm gradually backfilling the older ones that only had a title and a link. The vault is the source of truth for the talks, not the other way around. I even have automated pipelines to process new talks I watch and want to recommend (but that's a story for another post).

It's not an app. It's a folder of text files with a system on top. And for the first time in over ten years of note-taking, I feel like I'm getting out of it the value I always sensed was there.

Why plain text files

The most important decision in the system is the most boring one: everything is markdown. .md files I can open with any text editor, on any operating system, with no special tooling.

It sounds like a non-decision. It isn't. There's no database. No proprietary format. No server. If Obsidian disappears tomorrow, or Dropbox, or Claude, the notes are still readable. I can search with grep, edit with emacs, sync with rsync. The format will outlive whatever tool I'm using to read it this year.

Dropbox syncs across devices with zero configuration. Git versions everything, so every change is recorded, I can see diffs, I can roll back. Between the two I get sync and version control almost by accident. Dropbox keeps files up to date across machines, Git keeps the history.

There's a side effect I didn't appreciate until I was living it: I can edit a note on my laptop, close the lid, pick up my phone and keep writing where I left off. Or switch to another machine at home. No export, no sync button, no waiting. Dropbox just does it. Obsidian on mobile opens the same files, same links, same structure. It sounds trivial, but it removes the last excuse for not capturing an idea when it shows up.

Not sophisticated tools. Tools that work, and have been working for decades.

Obsidian is the reading interface. Wiki-links ([[Note]]) create a navigable knowledge graph, backlinks connect ideas both ways, the graph view shows you clusters you didn't know were there. But Obsidian is a view on files, not a platform. If something better comes along, I switch. The files don't care.

The vault's knowledge graph in Obsidian.
The clusters form naturally from wiki-links and shared tags.

And it turns out that "just text files" goes further than notes. Diagrams in the vault are Mermaid, a text format that renders into flowcharts and architecture visuals. Obsidian stores Excalidraw sketches as JSON inside regular .md files. The presentations for the master's program I teach live in Marp: markdown with a few annotations that compiles into slide decks, PDFs and PowerPoints. I can git diff a diagram the same way I diff a note, and an AI agent can generate a Mermaid flowchart as easily as it writes prose. To the agent it's all text.

The structure: PARA, loosely

Notes are organized following Tiago Forte's PARA methodology, without being religious about it.

Projects for things with a deadline or deliverable. Right now: a master's program I'm teaching, this blog, a house renovation. Areas for ongoing stuff: writing, professional network. Resources is where most of the vault lives, reference material organized by topic across about 24 categories. Archive for completed projects that should stop getting in the way.

If you've read my blog before, the topics won't surprise you. The most tagged subjects are engineering culture, AI, agile, continuous delivery, software design, architecture, product and XP. Throw in DevOps, lean, teams and testing and you have a pretty accurate map of what I spend my time thinking about. The vault just makes it explicit.

Then there's The Forest (the vault borrows the digital garden metaphor), a directory with Maps of Content: thematic indexes connecting scattered notes. And Sources, where the 402 talk notes live.

Beyond folders, notes carry two kinds of tags. Topic tags (on/software-design, on/lean, on/ai) say what a note is about. Maturity tags say how developed it is: state/seedling for a raw capture, state/budding for something I've worked on but isn't finished, state/evergreen for a note I consider solid. Most of the vault is still seedlings and budding notes. The evergreen ones are the minority, which is honest: note count and idea quality are different things.
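A health check for that tagging scheme is simple to script. This is a hedged sketch, not the vault's actual validator; it assumes each note's frontmatter carries a single `tags: [...]` line:

```python
import re

# Tag prefixes from the scheme above; the exact frontmatter layout is assumed.
VALID_STATES = {"state/seedling", "state/budding", "state/evergreen"}
TAG_RE = re.compile(r"^(on/[\w-]+|state/[\w-]+)$")

def check_tags(note_text):
    """Validate the tags line of a note's frontmatter.

    Expects something like:
    ---
    tags: [on/lean, state/seedling]
    ---
    Returns a list of problems; an empty list means the note is healthy.
    """
    problems = []
    m = re.search(r"^tags:\s*\[(.*)\]\s*$", note_text, re.MULTILINE)
    if not m:
        return ["no tags line found in frontmatter"]
    tags = [t.strip() for t in m.group(1).split(",") if t.strip()]
    for t in tags:
        if not TAG_RE.match(t):
            problems.append(f"malformed tag: {t}")
    states = [t for t in tags if t.startswith("state/")]
    if len(states) != 1:
        problems.append("a note needs exactly one maturity tag")
    elif states[0] not in VALID_STATES:
        problems.append(f"unknown maturity state: {states[0]}")
    return problems
```

Run over the whole vault, a check like this is what turns a tagging convention into a tagging guarantee.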

The structure isn't perfect. What matters is that it's predictable. If I'm looking for a quote, I know it's in 3_Resources/Quotes/. Finished projects go to 4_Archive/. This predictability is what lets an AI agent work on the vault without asking me where things go.

The leap: an AI agent that gets the vault

Here's where it gets interesting. Claude Code is not a chatbot I ask things about my notes. It's an agent that operates directly on the files. Creates notes, edits them, runs scripts, checks the vault's health. All while following rules I've written over time.

The rules live in .claude/rules/, configuration files the agent reads at the start of every session. Where to put each type of note. What YAML frontmatter to include. Naming conventions. How to verify links aren't broken. Not suggestions. Constraints.

Here's what that looks like in practice. I say: "create notes for these 7 mental models." The agent creates 7 files in 3_Resources/system_thinking/, each with the right structure (core idea, software applications, limitations, connections), tags them on/mental-model, updates the central Map of Content in The Forest/Mental Models.md placing each in the right domain category, and before writing any wiki-link, checks that the target note actually exists. I don't touch a file. But the rules that made this work? I wrote those myself, one session at a time, encoding what I'd learned about how my vault should behave.

Three layers, one system

What makes this a system and not just "a chatbot writing files" is three layers working together. I keep coming back to this because it's the part people misunderstand.

The first layer is the agent with domain context. Claude Code doesn't see loose files. It knows a new talk note needs topics drawn from a taxonomy of 80+ topics and a link to the speaker's page. A quote goes in Quotes/ with the author's name as a tag. A project has a deadline, an area doesn't. Rules give it semantics, not just file paths.

The second layer is a set of 15 Python scripts behind 30 Makefile targets.

One group keeps the vault healthy: a script that parses every wiki-link and checks it against existing files, another that validates YAML frontmatter, another that finds stray images and moves them next to the notes that reference them.

A second group powers the talk pipeline: sync from eferro-picks, pull metadata from YouTube via yt-dlp, run content through NotebookLM, generate blog post HTML.
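As a sketch of the metadata step, here's roughly how a script could pull what a talk note needs via yt-dlp's Python API. The field selection is mine for illustration; the actual pipeline's schema may differ:

```python
# Sketch of the metadata-pull step, assuming yt-dlp's Python API.
# The keys below are standard fields yt-dlp returns for YouTube videos.

def pick_fields(info):
    """Reduce a yt-dlp info dict to what a talk note needs."""
    return {
        "title": info.get("title"),
        "channel": info.get("uploader"),
        "duration_sec": info.get("duration"),
        "url": info.get("webpage_url"),
    }

def fetch_talk_metadata(url):
    """Network step; requires `pip install yt-dlp`. Imported lazily on purpose."""
    from yt_dlp import YoutubeDL
    with YoutubeDL({"quiet": True}) as ydl:
        return pick_fields(ydl.extract_info(url, download=False))
```

Keeping `pick_fields` pure means the interesting part (what a note needs) stays testable without touching the network.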

A third handles the master's presentations: Mermaid to PNG, Marp to PDF and slide decks.
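Both conversions are thin wrappers over standard CLIs. Here's what such a wrapper might build, assuming `mmdc` (mermaid-cli) and the Marp CLI are installed; the exact flags my scripts use may differ:

```python
import subprocess

def mermaid_to_png_cmd(src, out):
    """Command line for mermaid-cli (mmdc); assumes it is on PATH."""
    return ["mmdc", "-i", src, "-o", out]

def marp_to_pdf_cmd(src, out):
    """Command line for the Marp CLI; swap --pdf for --pptx to get slides."""
    return ["marp", src, "--pdf", "-o", out]

def run(cmd):
    """Execute one conversion; raises if the tool fails."""
    subprocess.run(cmd, check=True)
```

Usage would be something like `run(marp_to_pdf_cmd("session4.md", "session4.pdf"))`, which is exactly the kind of call a Makefile target wraps.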

The agent runs all of this through make targets. When it checks for broken links, it's not guessing. It's running a real script with real output.

The third layer is 6 skills. These aren't prompts. They're protocols: which tools to use, in what order, what to verify, what to ask me before proceeding. When I say "process this talk," the agent doesn't figure it out from scratch. It follows a written workflow that I've refined over multiple iterations. Same when I say "check vault health" or "organize the inbox." Each skill encodes a complete task, not a vague instruction.

Take away any one layer and the thing falls apart. Without scripts the agent would improvise something different every time. Without the agent the scripts are just CLI tools I'd have to remember to run. Without rules the skills wouldn't know what decisions to make.

How it actually evolved

The current state of the system is less interesting than how it got here. Because none of this was planned.

The talk pipeline started as a single script to sync a JSON file with the vault. I was doing it by hand and it was tedious, so I automated the sync. Then I wanted the agent to be able to run it, so I wrote a skill. Then I wanted automatic topic extraction, so I plugged in NotebookLM. Then I wanted to publish talks as blog posts, so I wrote another script. Over a weekend, iterating with the agent. From copy-paste to a pipeline processing 402 talks and spitting out HTML.

The pattern repeats. I use the vault, hit a friction. Write a script for the mechanical part. Wrap it in a skill so the agent can orchestrate it with judgment. Add rules so the agent remembers the lesson next time. Now the agent is better at that task, which frees me to notice new frictions.

The master's program presentation rules followed the same arc. After session 3, I jotted down what worked and what hadn't. Slides had too much text. Extra material should be in separate files. Speaker notes needed a timeline. I turned those lessons into rules. From session 4 on, Claude Code applied them without me having to say anything. The rules became the shared memory between me and the agent.

Zero broken links

Of all the rules in the vault, one appears in three separate files. I consider it the most important: zero broken links.

In Obsidian, a wiki-link [[Something]] is a promise that "Something" exists as a note. If it doesn't, the link is noise. It suggests content that isn't there, pollutes the knowledge graph, blurs the line between what's real and what's aspirational. And broken links breed. One becomes five, five become thirty, and suddenly your graph is full of ghosts.

The rule is simple: if you're not sure a note exists, use plain text. The agent checks before writing any [[Concept]]. If the file doesn't exist, it writes "Concept" without brackets. After every edit, make find-broken-links. If it created broken links, it fixes them before moving on.

Prevention, not correction. Same principle as tests in code. Cheaper to not introduce the bug.
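The core of such a checker fits in a few lines. This is an illustrative sketch, not the actual script behind `make find-broken-links`:

```python
import re
from pathlib import Path

# Matches [[Note]], [[Note|alias]] and [[Note#section]], capturing the target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")

def find_broken_links(note_text, existing_notes):
    """Return link targets that don't exist as notes in the vault."""
    targets = {m.strip() for m in WIKILINK_RE.findall(note_text)}
    return sorted(targets - existing_notes)

def vault_broken_links(vault_dir):
    """Scan every note in a vault directory and map file -> broken targets."""
    notes = list(Path(vault_dir).rglob("*.md"))
    existing = {p.stem for p in notes}
    return {
        str(p): broken
        for p in notes
        if (broken := find_broken_links(p.read_text(encoding="utf-8"), existing))
    }
```

Nothing clever: a regex, a set difference, and a walk over the files. That's the whole reason the agent can trust the output.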

What this isn't

I don't want to oversell this.

It's not effortless. Rules need writing, scripts need maintaining, structural decisions need making. There are 15 Python scripts and 30 Makefile targets for a personal vault. Is that over-engineering? Probably, partly. Some of it is hobby. Some is me poking at what's possible when you point an AI agent at a folder of text files.

It's not a system where AI does the thinking. Claude Code doesn't decide what's worth keeping, how to categorize a concept, or which connections matter. I'm the one who decided mental models live in system_thinking/ regardless of their domain. I'm the one who decided that domain categorization happens in the MOC, not in the filesystem. The agent executes those choices at scale. But the design is mine.

And it's not for everyone. You need to be comfortable with a terminal, with Git, with text files. If you want a polished app with automatic sync and zero setup, use Notion. Seriously. It's fine.

The question I'm left with

Every time I codify a rule or a skill, the agent gets more capable. But also more opinionated. The rules reflect my decisions today: my taxonomy, my structure, my conventions. What happens when my thinking changes? Do the rules become inertia, or are they an explicit record of decisions I can consciously revisit?

Git has the full history. I can see when I added each rule and why. I can change them. But there's a gap between being able to change them and actually doing it when the system works well as-is.

After years trying tools, what works for me turns out to be the most boring stack imaginable: text files, a synced folder, an agent that knows the garden's rules. 1,222 notes and counting.

Whether the system helps me think better or just organize faster what I was already thinking, I genuinely don't know. Probably both. Probably the distinction doesn't hold up under scrutiny.

Related reading

The methodology and ideas behind this:

  • The PARA Method — Tiago Forte's original post on the organizational system this vault uses
  • Evergreen notes — Andy Matuschak's thinking on notes that evolve and compound over time, which inspired the seedling/budding/evergreen maturity model
  • Building a Second Brain — Tiago Forte's broader framework for personal knowledge management

The tools:

  • Obsidian — The editor I use as a view on the vault's markdown files
  • Claude Code — The AI agent that operates on the vault
  • Marp — Markdown to slide decks
  • Mermaid — Diagrams as text

Sunday, April 05, 2026

Good talks/podcasts (April I)

These are the best podcasts/talks I've seen/listened to recently:
  • No Vibes Allowed: Solving Hard Problems in Complex Codebases – Dex Horthy, HumanLayer 🔗 talk notes (Dex Horthy) [AI Assisted Engineering] [Duration: 00:20] (⭐⭐⭐⭐⭐) Dex Horthy explores advanced context engineering and the "Research, Plan, Implement" (RPI) workflow to effectively solve complex problems in brownfield codebases while minimizing AI-generated "slop" and maintaining team alignment.
  • Kent L Beck: You’re Ignoring Optionality… and Paying for It 🔗 talk notes (Kent Beck) [Agile, Engineering Culture, Software Design] [Duration: 00:49] (⭐⭐⭐⭐⭐) Kent Beck discusses the tension between delivering features and maintaining software "optionality," advocating for a "tidy first" approach to make hard changes easy by improving code structure as both an economic and moral necessity.
  • #156 How to deploy lean projects and more with author Michael Balle 🔗 talk notes (Michael Ballé) [Engineering Culture, Lean, Management] [Duration: 01:02] (⭐⭐⭐⭐⭐) Michael Ballé redefines Lean as a humanistic engineering philosophy centered on "making people before making parts" by prioritizing technical competence, Gemba-based collaboration, and strengthening workplace conditions to bridge the gap between top-down management and true frontline engagement.
  • Platform Engineering in 2025: Still Stuck in Ticket Hell? 🔗 talk notes (Steve Smith) [Devex, Platform engineering] [Duration: 00:07] Escaping "ticketing hell" by evolving platform engineering from a manual service desk into an automated self-service model that reduces queue times and empowers delivery teams to accelerate.
  • Forget Velocity, Let's Talk Acceleration • Jessica Kerr • GOTO 2017 🔗 talk notes (Jessica Kerr) [Engineering Culture, Mental models, Systems Thinking] [Duration: 00:54] (⭐⭐⭐⭐⭐) Jessica Kerr redefines software development as software parenting and system moving, arguing that teams should prioritize acceleration—the ability to change direction and improve the system—over mere velocity by fostering generativity and mutual learning through strategic automation.
  • The Best Product Engineering Org in the World 🔗 talk notes (James Shore) [Engineering Culture, Product Strategy, Technology Strategy, agile-XP] [Duration: 01:40] (⭐⭐⭐⭐⭐) James Shore outlines a holistic framework for building a world-class engineering culture by focusing on six core pillars—People, Internal Quality, Lovability, Visibility, Agility, and Profitability—while leveraging Extreme Programming (XP) and Fluid Scaling Technology (FaST) to drive sustainable business impact.
  • What Skills Do Developers NEED To Have In An AI Future? 🔗 talk notes (Trisha Gee, Kent Beck) [AI Assisted Engineering, Engineering Culture, Technical leadership] [Duration: 00:24] (⭐⭐⭐⭐⭐) This videopodcast examines how AI-augmented development shifts the developer's role from writing syntax to exercising high-leverage skills like curiosity, design taste, strategic testing, and effective communication to navigate rapid feedback loops and maintain optionality.
  • o11ycast - Ep. #87, Augmented Coding Patterns with Lada Kesseler 🔗 talk notes (Lada Kesseler, Jessica Kerr, Ken Rimple) [AI Assisted Engineering, Generative AI, tdd] [Duration: 00:48] (⭐⭐⭐⭐⭐) Lada Kesseler introduces Augmented Coding Patterns to navigate the "black box" of AI-assisted development by employing specialized single-purpose agents, high-level test specifications, and emoji-based context markers to monitor an agent's focus and internal knowledge.
  • The state of VC within software and AI startups – with Peter Walker 🔗 talk notes (Peter Walker, Gergely Orosz) [AI, Engineering Culture, startup] [Duration: 01:19] A data-driven exploration of how shifting venture capital dynamics and AI are reshaping startup hiring, team structures, and the engineering landscape.
  • Should Test-Driven Development (TDD) Be Used MORE In Software Engineering? 🔗 talk notes (Emily Bache, Dave Farley) [Agile, Software Design, tdd] [Duration: 00:26] (⭐⭐⭐⭐⭐) This expert discussion highlights how Test-Driven Development (TDD) acts as a fundamental software design tool that facilitates Agile development by providing constant feedback, enforcing separation of concerns, and enabling developers to proceed with confidence through small, iterative steps.
  • An AI state of the union: We’ve passed the inflection point & dark factories are coming 🔗 talk notes (Simon Willison) [AI Assisted Engineering, Security, tdd] [Duration: 01:39] (⭐⭐⭐⭐⭐) Simon Willison explores the "inflection point" of AI in software development, detailing agentic engineering patterns, the rise of "dark factories," and the critical security challenges posed by prompt injection.
  • Data vs Hype: How Orgs Actually Win with AI - The Pragmatic Summit 🔗 talk notes (Laura Tacho) [AI, Developer Productivity, Devex] [Duration: 00:29] A data-driven exploration of how organizations can move beyond AI hype to achieve real impact by focusing on developer experience, organizational transformation, and clear measurement frameworks.
  • Making Codebases Agent Ready – Eno Reyes, Factory AI 🔗 talk notes (Eno Reyes) [AI Assisted Engineering, Developer Productivity, Testing] [Duration: 00:15] This talk explores how rigorous automated validation and specification-driven development serve as the essential foundation for scaling autonomous AI agents and unlocking exponential engineering velocity.
  • The Most Polarizing Practice In Modern Software Engineering? 🔗 talk notes (Dave Farley, Dan North) [CI, Trunk Based Development, tdd] [Duration: 00:33] Dave Farley and Daniel Terhorst-North explore the industry's most polarizing practices, such as trunk-based development and estimation, advocating for a pragmatic and outcome-focused approach to software engineering.
  • The Forest & The Desert Are Parallel Universes • Kent Beck • GOTO 2025 🔗 talk notes (Kent Beck) [Compliance, Engineering Culture, XP] [Duration: 00:39] (⭐⭐⭐⭐⭐) Kent Beck contrasts trust-based "Forest" and control-driven "Desert" development cultures, revealing how these parallel universes fundamentally redefine the meaning of metrics, accountability, and engineering practices.
Reminder: all of these talks are worth your time, even if you only listen to the audio.

You can explore all my recommended talks and podcasts on the interactive picks site, where you can filter by topic, speaker, and rating.

Sunday, March 29, 2026

The book I've spent 25 years trying not to have to write

"Today, when the latest fad is AI-assisted coding, I smile thinking that the principles I learned ten years ago are still the same." — from the prologue, by an engineer who worked on the team

For more than two decades I've explained the same ideas to the same kinds of teams. Different teams, different companies, different contexts. And yet the pattern repeated with a regularity I can no longer ignore.

Talented teams. Capable teams. Reasonable technology. And still: the feeling of running ever harder to advance ever less.

Lewis Carroll described it better than I can in Through the Looking-Glass: "It takes all the running you can do, to keep in the same place." For years I thought that metaphor was an exaggeration. Not anymore.

Part of the problem is how we think about software. We say we "build" it, as if it were something you finish and that then just sits there. But software isn't built. It's grown. It's a living system that grows, changes and degrades. And living systems need continuous care, not just construction.

I came to the conclusion that the most honest thing I could do was write it down.

What the book is about

"Menos software, más impacto" (Less software, more impact) has a subtitle that leaves little to the imagination: How to keep your team from collapsing under the weight of its own code.

The central thesis is uncomfortable: the biggest problem for most teams isn't that they write bad code. It's that they write too much code.

Existing software consumes resources continuously, whether you use it or not. Every added feature, every integration, every design decision that accumulates unreviewed has a cost that shows up on no roadmap but shows up every day. I call it software's basal cost, by analogy with an organism's basal metabolism: the minimum spend just to keep functioning. And like metabolism, if left unmanaged it grows until it consumes all the available energy.

The book covers four major parts:

  • Foundations: what Lean Software Development is and why basal cost is the central concept that ties it all together
  • The five principles: eliminate waste, amplify learning, decide at the last responsible moment, deliver as early as possible, empower the team
  • Sustainable quality: why quality is not the enemy of speed but its only durable foundation
  • Systems thinking: optimize the whole, integrate Lean with XP and a product mindset, and what happens if you do nothing

It's 192 pages, based on more than 25 years of experience with real teams: Alea Soluciones, The Motion, Nextail, Clarity AI. With concrete cases, real conflicts and my own mistakes acknowledged. The book also includes the perspectives of eight professionals who have lived these transformations from the inside, in different roles and contexts.

Who it's for (and who it isn't for)

This is not a book for someone who wants to improve their code individually. There are excellent books for that, and this isn't one of them.

It's for whoever makes decisions about what gets built, what doesn't get built and what gets removed. Engineering Managers, Tech Leads, Product Managers, CTOs. Anyone with direct responsibility for a team's capacity six months, a year, three years out.

If your day-to-day is deciding priorities, managing capacity and negotiating scope, what's in the book will feel familiar. And probably uncomfortable. That's the intent.

Why now

There's plenty of literature on Lean, XP and Agile in English. In Spanish, less than there should be. And almost none that combines the three approaches in an integrated way, with real cases from teams I know first-hand.

The current context also makes it more urgent. The acceleration AI brings makes decisions about what to build and what not to build more important, not less. Amplifying the capacity of a team that already builds too much doesn't solve the problem. It accelerates it.

The draft is complete. Now come revision, editing and preparation for publication. If you'd like to be among the first to read it, write to me: eferro@eferro.net

Sunday, March 01, 2026

Encoding Experience into AI Skills

I'd been tweaking my augmented coding setup for months - adjusting CLAUDE.md rules, adding instructions for testing discipline, complexity management, incremental delivery. Things I've repeated to every team I've worked with, now repeated to AI agents. It worked, but it felt like writing the same email over and over.

Then I found Lada Kesseler's skill-factory.


What Skills Are (And Why They Matter)

If you use Claude Code, you already know about CLAUDE.md - a file where you put instructions that the agent reads at the start of every conversation. It works. But it has a problem: everything is always loaded. Your TDD guidelines, your Docker best practices, your refactoring workflow - all of it competing for the agent's limited context window, whether it's relevant or not.

Skills solve this differently. They're packaged knowledge that activates only when relevant. You type /mutation-testing and the agent gains deep expertise about finding weak tests through mutation analysis. You type /complexity-review and it becomes a technical reviewer that challenges your proposals against 30 dimensions of complexity. The rest of the time, that knowledge stays out of the way.

Think of it as progressive disclosure for AI context. The agent gets what it needs, when it needs it.

The Discovery: Lada Kesseler's Skill Factory

Lada Kesseler built the skill-factory - a repository with 315 commits of carefully crafted skills covering serious engineering ground: TDD, Nullables (James Shore's pattern for testing without mocks), approval tests, refactoring (using Llewellyn Falco's approach), hexagonal architecture, event modeling, collaborative design, and more.

These aren't toy prompts. The Nullables skill alone includes reference material for infrastructure wrappers, embedded stubs, output tracking, and three different architectural patterns. The approval-tests skill covers Java, Python, and Node.js with scrubbers, reporters, and inline patterns. This is deep, carefully structured knowledge.

Lada also co-created augmented-coding-patterns - a catalog of 43 patterns, 14 obstacles, and 9 anti-patterns for working effectively with AI coding tools. It's a collaboration between Lada Kesseler, Ivett Ordog, and Nitsan Avni. If you're doing augmented coding and haven't seen it, stop reading this and go look.

What I found wasn't just a collection of skills. It was an approach to sharing engineering knowledge with AI agents that I hadn't seen anywhere else.

The Fork as Extension

The natural next step wasn't to start from scratch - it was to fork and extend. Lada's skills already covered testing fundamentals, design patterns, and AI-specific workflows. What I noticed missing were the practices I kept explaining repeatedly: how to manage complexity, how to deliver incrementally, how to make sure tests actually catch bugs.

So I added 11 skills. Not because 16 wasn't enough, but because my particular set of problems needed particular solutions.

You can find my extended fork at github.com/eferro/skill-factory with all 27 skills ready to use.

Testing rigor

test-desiderata - Kent Beck's 12 properties that make tests valuable. Not "does this test pass?" but "is this test isolated? composable? predictive? inspiring?" I was tired of AI generating tests that had coverage but no diagnostic power. This skill makes the agent evaluate tests against each property and suggest concrete improvements.

mutation-testing - The question code coverage can't answer: "Would my tests catch this bug?" Coverage tells you what your tests execute. Mutation testing tells you what they'd detect. I'd already written a blog post about this - now it's a reusable skill. The examples are in Python and JavaScript, but I'm also using it successfully with Go.

Delivering incrementally and managing complexity

This is where the skills chain together, and where things get interesting.

story-splitting - Detects linguistic red flags in requirements ("and", "or", "manage", "handle", "including") and applies splitting heuristics. It's the first pass: is this story actually three stories wearing a trenchcoat?

hamburger-method - When a story doesn't have obvious split points but still feels too big, this skill applies Gojko Adzic's Hamburger Method: slice the feature into layers, generate 4-5 implementation options per layer, then compose the thinnest possible vertical slices.

small-safe-steps - The implementation planner. Takes any piece of work and breaks it into 1-3 hour increments using the expand-contract pattern for migrations, schema changes, API changes. Core belief: risk grows faster than the size of the change.

complexity-review - My inner skeptic, encoded. Reviews technical proposals against 30 dimensions of complexity across 6 categories (data volume, interaction frequency, consistency requirements, resilience, team topology, operational burden). Pushes for the simplest viable approach. Use it when someone says "Kafka" and you want to ask "why not a queue?"

code-simplifier - Reduces complexity in existing code without changing behavior. The cleanup crew after a feature is done.

These five skills work as a pipeline: story-splitting -> hamburger-method -> small-safe-steps for delivery planning, with complexity-review as a gate before implementation and code-simplifier as a sweep after.
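To make the first pass concrete, here's a toy version of the red-flag detection. The word list comes from the story-splitting description above; the two-flag threshold is an arbitrary choice of mine:

```python
import re

# Linguistic red flags from the story-splitting skill; the threshold is mine.
RED_FLAGS = {"and", "or", "manage", "handle", "including"}

def splitting_red_flags(story):
    """Return the red-flag words a story contains, in order of first appearance."""
    words = re.findall(r"[a-z]+", story.lower())
    seen = []
    for w in words:
        if w in RED_FLAGS and w not in seen:
            seen.append(w)
    return seen

def looks_too_big(story):
    """Heuristic first pass: two or more red flags suggests multiple stories."""
    return len(splitting_red_flags(story)) >= 2
```

The real skill does far more than match words, but even this toy catches the "three stories wearing a trenchcoat" pattern surprisingly often.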

Practical tools and team workflows

thinkies - Kent Beck's creative thinking habits, turned into a skill. When you're stuck, it applies patterns like "What would I do if I had infinite resources?", "What's the opposite of my current approach?", "What would make this problem trivial?" It's less about code and more about unsticking your thinking.

traductor-bilingue - Technical translation between English and Spanish that keeps terms like "deploy", "pull request", "pipeline", and "staging" in English (because that's how Spanish-speaking dev teams actually talk). Small thing, but it saves constant corrections.

dockerfile-review - Reviews Dockerfiles for build performance, image size, and security issues.

modern-cli-design - Principles for building scalable CLIs: object-command architecture (noun-verb), LLM-optimized help text, JSON output, concurrency patterns.

A Skill in Action

To make this concrete, here's what the delivery planning pipeline looks like in practice.

Say you have a story: "As a user, I want to manage my notification preferences including email, SMS, and push notifications with scheduling and quiet hours."

Step 1 - You invoke /story-splitting. The agent immediately flags "manage", "including", and the conjunction "and" joining three notification types plus scheduling. It suggests splitting into at least 4 stories: one per notification channel plus quiet hours as a separate slice.

Step 2 - You take the first slice ("email notification preferences") and invoke /hamburger-method. It breaks the feature into layers (UI, API, business logic, persistence) and generates options for each. For the UI layer: (a) full settings page, (b) single toggle, (c) link to email with confirmation, (d) inline in profile. It composes the thinnest vertical slice: a single toggle with an API endpoint and a database flag.

Step 3 - You invoke /small-safe-steps on that thin slice. It produces a sequence of 1-3 hour steps: add the database column with a migration, add the API endpoint with tests, add the UI toggle, wire it together. Each step deployable independently.

No single skill does everything. They compose. That's the point.

How to Get Started

If you want to try these:

  1. Fork the repo: github.com/eferro/skill-factory (my extended fork with 11 additional skills for complexity management and incremental delivery) or the original by Lada Kesseler
  2. Install skills: The repo includes a skills CLI tool. Run ./skills toggle to browse and select which skills to install into your Claude Code setup.
  3. Use them: Type /skill-name in Claude Code. /mutation-testing to check your tests. /complexity-review to challenge a design. /small-safe-steps to plan your next implementation.
  4. Make your own: The repo includes documentation and tooling for creating new skills. Fork it, add what you need, share it back.

Standing on Shoulders

The total is 329 commits, 27 skills across 6 categories. But the number that matters most is that Lada built 315 of those commits. I added 14. The original structure, the skill manager, the testing and design skills that form the foundation - that's all her work. What I did was extend it with the practices I personally find myself repeating.

This is how open source has always worked: someone builds something good, others extend it, and the whole thing becomes more useful than any individual could make it. With AI skills, the effect compounds differently - every skill that gets shared becomes available to every person using it, making good practices almost free.

Lada's augmented-coding-patterns site (with Ivett Ordog and Nitsan Avni) takes this even further - it's not just tooling but a shared vocabulary for how we work with AI. Skills, patterns, obstacles, anti-patterns: a growing body of community knowledge.

What knowledge do you find yourself repeating to your AI agents? What practices would you encode as skills?

The barrier to sharing isn't technical anymore. It's deciding to do it.

Sunday, February 22, 2026

Podcast: AI as an Amplifier. Why Engineering Practices Matter More Than Ever

Vasco Duarte invited me to be part of the Scrum Master Toolbox Podcast's AI Assisted Coding series, and I couldn't pass up the chance to talk about something I've been living and thinking about intensely for the past several months.

The conversation builds directly on the experiment I documented in Fast Feedback, Fast Features: My AI Development Experiment (424 commits over 11 weeks), where for every unit of effort I put into new features I invested four times more in refactoring, cleanup, tests, and simplification. And yet, overall, I roughly doubled my pace of work.

In the episode, we dig into several things I've been exploring:

Vibe coding vs production AI development. Both are valid—but they require different mindsets. Vibe coding is flow-driven, exploration-focused, great for prototypes and discovery. Production AI coding demands architectural thinking, security analysis, and sustainability practices. Even vibe coding benefits from engineering discipline as soon as experiments grow beyond a weekend hack.

The positive spiral of code removal. One of the most powerful patterns I've discovered is using AI to accelerate deletion. Connect product analytics to identify unused features, use AI to remove them efficiently, and you trigger a cycle: simpler code makes architecture changes cheaper, cheaper architecture changes enable faster feature delivery, which creates more opportunities for simplification. Humans historically avoided this because removal was as expensive as creation. That excuse is gone.

Preparing the system before introducing change. Rather than asking "implement this feature," I've been asking "how should I change my system to make this feature trivial to introduce?" AI makes that preparation cheap enough to do routinely. The result: systems that evolve cleanly rather than accumulating debt with each addition.

AI as an amplifier—the double-edged sword. This is the central idea. AI doesn't replace engineering judgment; it magnifies its presence or absence. Strong teams will see accelerated improvement. Teams without good practices will generate technical debt faster than ever. The path to excellence in modern software development lies in the seamless integration of a high-performance engineering culture, lean-agile product strategies, and an evolutionary approach to architecture. AI makes that path wider—but you still have to choose to walk it.

🎙️ Listen to the episode: AI as an Amplifier—Why Engineering Practices Matter More Than Ever

Sunday, January 18, 2026

Fast Feedback, Fast Features: My AI Development Experiment

What happens when you use AI not to ship faster, but to build better? I tracked 424 commits over 11 weeks to find out.

The Experiment

Context first: I'm an engineering manager, not a full-time developer. These 424 commits happened in the time I could carve out between meetings, planning, and leadership work. The applications are production internal systems (monitoring dashboards, inventory management, CLI tools, chatbot backends) used by real teams, but not high-criticality systems where a bug directly impacts external customers or revenue.

Important nuance: I also act as Product Manager for the Platform team that owns these applications. This means I'm defining the problems and implementing the solutions. There's no friction or information loss between problem definition and implementation that typically exists in stream-aligned teams where PM and developers are separate roles. This setup favors faster iteration and tighter feedback loops (though it's worth noting this isn't representative of how most teams operate).

From November 2025 to January 2026, I wrote 424 commits across 6 repositories, spanning 44 active days (with Christmas holidays in the middle). Every single line of code was written with AI assistance: Cursor, Claude Code, the works. These weren't toy projects or weekend experiments. These were real systems evolving under active use.

The repositories varied wildly in maturity: from a 13-day-old Go service to a 5.6-year-old Python system with over 12,000 commits in its history. Half were greenfield projects under 6 months old; half were mature codebases years into their lifecycle. Combined, they represent ~107,000 lines of production code. These are small-to-medium projects. That's how our platform team works: we prefer composable systems over monoliths.

The period was intense: 9.6 commits per day average, almost double my historical pace. But AI didn't just make me faster at writing code. It fundamentally changed what kind of code I wrote.

I tracked everything. Every commit was categorized using a combination of commit message analysis, file change patterns, and manual review. Claude Sonnet 4.5 helped automate the initial categorization, which I then validated. And when I analyzed the data, I found something I wasn't expecting.
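
The keyword side of that categorization can be sketched roughly like this. The patterns are hypothetical stand-ins; the real pipeline also leaned on file-change patterns, Claude Sonnet 4.5, and manual review:

```python
import re

# Hypothetical keyword heuristics for a first-pass categorization.
CATEGORIES = {
    "tests": re.compile(r"\b(tests?|spec|coverage)\b", re.IGNORECASE),
    "documentation": re.compile(r"\b(docs?|readme|comments?)\b", re.IGNORECASE),
    "cleanup": re.compile(r"\b(remove|delete|unused|dead code)\b", re.IGNORECASE),
    "refactoring": re.compile(r"\b(refactor|extract|rename|restructure)\b", re.IGNORECASE),
    "functionality": re.compile(r"\b(add|feature|implement|support)\b", re.IGNORECASE),
}

def categorize(message):
    """Return every matching category: commits are multidimensional."""
    return {name for name, pattern in CATEGORIES.items() if pattern.search(message)}

print(sorted(categorize("Add feature flag endpoint with tests")))
# ['functionality', 'tests']
```

Note that a single commit can land in several categories at once, which matters for the numbers below.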

The Balance

For every hour I spent on new features, I spent over four hours on tests, documentation, refactoring, security improvements, and cleanup.

22.7% functionality. 98.3% sustainability.

Yes, that adds up to more than 100%. That's not an error: it's the reality of how development actually works. When I develop a feature, the same commit often includes tests, documentation updates, and code cleanup. The numbers reflect that commits are multidimensional, not mutually exclusive categories.

The ratio: 0.23:1 (Functionality:Sustainability)

This wasn't accidental. This was a deliberate experiment in sustainable velocity. And AI made it possible.

Breaking Down the 98.3%

8-Dimensional Commit Categorization

When I say "sustainability," I mean 7 specific, measurable categories (functionality itself is the 8th dimension):

  • Tests (30.7%): the largest single category
  • Documentation (19.0%): READMEs, API docs, inline comments
  • Cleanup (13.8%): removing dead code, unused features, simplification
  • Infrastructure (12.0%): CI/CD, scripts, tooling improvements
  • Refactoring (11.5%): structural improvements, better abstractions
  • Configuration (8.1%): environment variables, settings, build configs
  • Security (3.2%): vulnerability fixes, security audits, input validation
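
As a sanity check, the seven shares above do sum to the 98.3% figure, and dividing the functionality share by that sum recovers the 0.23 ratio:

```python
# The seven sustainability shares from the post; functionality (22.7%)
# is tracked separately as the eighth dimension.
sustainability = {
    "tests": 30.7, "documentation": 19.0, "cleanup": 13.8,
    "infrastructure": 12.0, "refactoring": 11.5,
    "configuration": 8.1, "security": 3.2,
}
total = round(sum(sustainability.values()), 1)
ratio = round(22.7 / total, 2)
print(total, ratio)
# 98.3 0.23
```
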

These aren't "nice-to-haves." They're the foundation that makes the 22.7% of new functionality actually sustainable.

What Changed (And What Didn't)

Here's what I learned: tests and feedback loops were always important. Good engineers always knew this. The barrier wasn't understanding, it was economics and time.

What was true before AI:

  • Fast feedback loops were critical for velocity
  • Comprehensive tests enabled confident iteration
  • Documentation reduced knowledge silos
  • Some teams invested in this; many never grasped that sustainable software requires sustained investment in technical practices

What changed with AI:

  • The barrier to entry dropped dramatically
  • Building that feedback infrastructure became fast
  • Maintaining quality became economically viable for small teams
  • The excuse of "not enough time" largely disappeared

What didn't change:

  • Discipline is still our responsibility
  • The choice to balance features vs sustainability is still ours
  • AI doesn't automatically make us write tests: we have to choose to
  • The default behavior is still "ship more features faster" until technical debt forces a halt

The insight: AI removed the last excuse. Now it's about discipline, not capability.

For me, as a manager who codes in limited time, this changed everything. I can afford to build the feedback infrastructure that lets me iterate fast. The 0.23 ratio isn't a constraint, it's what enables the velocity I'm experiencing.

Negative Code: Simplification as a Feature

Here's another data point: 55,407 lines deleted out of 135,485 total lines changed.

That's 40.9% deletions. For every 3 lines I wrote, I deleted 2.
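
Those totals come straight from git. Here's a sketch of how to compute the same numbers for your own repo with `git log --numstat` (the revision range is just an example):

```python
import subprocess

def parse_numstat(text):
    """Sum insertions/deletions from `git log --numstat --format=` output.

    Binary files report '-' for both counts and are skipped."""
    added = deleted = 0
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])
            deleted += int(parts[1])
    return added, deleted

def repo_stats(rev_range="HEAD~50..HEAD"):
    out = subprocess.run(
        ["git", "log", "--numstat", "--format=", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)

# The post's totals: 80,078 insertions, 55,407 deletions.
insertions, deletions = 80_078, 55_407
print(f"{deletions / (insertions + deletions):.1%} deletions")
# 40.9% deletions
```
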

Some deletions were refactoring: replacing 100 lines of messy code with 20 clean ones. But many were something else: removing features that didn't provide enough value.

One repository, chatcommands, has net negative growth: the codebase got smaller despite active development. It's not alone. ctool also shrank during this period.

This connects to two concepts I've written about before:

Basal Cost of Software: Every line of code has an inherent maintenance cost. It needs to be understood, tested, debugged, and updated. The best way to reduce basal cost is to have less code.

Radical Detachment: Software is a liability to minimize, not an asset to maximize. The goal isn't more code, it's the right amount of code to solve the problem.

Before AI, deleting features was expensive:

  • Understanding old code took hours (documentation was usually outdated)
  • Tracing dependencies was manual and error-prone
  • Verifying nothing broke was unreliable with incomplete test suites
  • Updating docs and configs was tedious

Features became immortal. Once added, they never left, even at zero usage.

With AI, deletion becomes viable:

  • Trace dependencies in minutes, not hours
  • Comprehensive tests catch breaking changes immediately
  • Documentation updates happen alongside code changes
  • The entire deletion commit includes proper cleanup

The 13.8% cleanup category isn't just removing dead imports. It's removing dead features. Entire endpoints. Unused UI components. Configuration options nobody sets.

I call this Negative Velocity: making the codebase smaller, simpler, and faster, not just adding more.

This aligns with lean thinking about waste elimination. Every unused feature is waste: it increases build times, slows down tests, complicates mental models, and raises the basal cost of the system. Each line of code creates drag on everything else. By deleting features, we're not just cleaning up: we're reducing the ongoing cost of ownership. Fewer features means faster comprehension, simpler debugging, easier onboarding, and less surface area for bugs.

I'd deleted code before, but AI reduced the friction enough to make it routine instead of occasional. Deletion went from expensive to viable. We can finally afford to minimize the liability at the pace it deserves.

The best code is no code. Now we can actually afford to delete it.

The Metrics at a Glance

The key numbers:

  • 424 total commits across 44 active days (November 2025 - January 2026)
  • 9.6 commits per day average: nearly double my historical pace
  • Ratio Func:Sust = 0.23:1 (1 hour features, >4 hours sustainability)
  • Average Functionality: 22.7% per commit
  • Average Sustainability: 98.3% per commit (multidimensional, not mutually exclusive)
  • 135,485 total lines changed (80,078 insertions, 55,407 deletions)
  • 40.9% deletion ratio: for every 3 lines written, 2 deleted

These aren't aspirational numbers. These are the actual patterns from an intensive 11-week period of AI-assisted development in production repositories.

Different Projects, Different Profiles

Not every project should have the same ratio. Context matters.

  • inventory (0.42:1): more feature-focused; greenfield project in active development
  • plt-mon (0.25:1): test-heavy; mature monitoring system needing reliability
  • ctool-cli (0.16:1): CLI tool with emphasis on tests and robustness
  • chatcommands (0.15:1): maintenance-focused; net negative code growth (-1,809 lines)
  • cagent (0.13:1): new project with emphasis on quality from day one
  • ctool (0.09:1): minimal feature work; heavy focus on infrastructure and cleanup

The chatcommands profile is particularly interesting: 31.5% of effort went to cleanup, and the repository actually shrank by 1,809 lines over this period. This isn't a dying project, it's a maturing one. Features were removed intentionally because they weren't providing value. The codebase got simpler, faster, and more maintainable.

The plt-mon repository maintains a 1.15:1 test-to-feature ratio: tests slightly outpace features. This is a production monitoring system where reliability matters, and the balance reflects steady feature growth with corresponding test coverage.

The ratio should reflect the project's phase and needs. AI makes all of these profiles viable without sacrificing quality or velocity.

What I Learned

After 11 weeks and 424 commits, here's what I've discovered:

Real velocity comes from fast feedback loops. Not from writing code faster, but from being able to iterate confidently and quickly. The 98.3% investment in sustainability isn't overhead, it's what enables speed.

AI changed what became economically viable. Before, building comprehensive test coverage as a manager with limited coding time would have been impossible. Now I can afford to build both the features and the safety net at sustainable pace. The barrier dropped; the discipline remains my responsibility.

Speed ≠ Velocity. Speed is how fast you move. Velocity is speed in the right direction. A team shipping 10 features per week with zero tests is moving fast toward a rewrite. A team shipping 3 features per week with comprehensive test coverage is moving fast toward sustainability.

What you optimize for gets amplified. My hypothesis: AI amplifies our choices. If you optimize for feature velocity, you'll accumulate technical debt faster. If you optimize for sustainable velocity (balancing features with quality infrastructure) you'll build healthier systems faster. I've seen this play out in my own work, though I don't claim this is universal.

Deletion is a feature. With lower barriers to understanding and changing code, we can finally afford to make codebases smaller. Net negative growth isn't stagnation, it's maturity.

The right ratio depends on context. My 0.23:1 ratio works for internal systems with moderate criticality, developed by a manager in limited time. Your context is different. The point isn't to copy my numbers, it's to be intentional about the balance.

This is still an experiment. I don't know if this approach scales to all teams or all types of systems. What I do know: for my context, over these 11 weeks, this balance produced the fastest sustainable velocity I've experienced in my career.

The shift wasn't learning new practices—I'd practiced TDD and built for sustainability for years. But as a manager coding in limited time, I always had to compromise. I wrote tests, but not as many as I wanted. I refactored, but not as thoroughly. I documented, but not as completely. AI didn't change what I valued—it changed what I could afford to do. The discipline I'd always practiced could finally match the standard I'd always wanted.

Your Turn

I don't have universal answers. But I do have a suggestion:

Measure your balance. Be intentional about it.

Track your next month of commits. Categorize them honestly. Calculate your Functionality:Sustainability ratio.
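
A rough way to start, assuming conventional-commit-style subjects. The keyword buckets here are hypothetical; tune them to your own conventions, and remember a commit can count toward both:

```python
import re
from collections import Counter

# Hypothetical keyword buckets for a quick Functionality:Sustainability count.
BUCKETS = {
    "functionality": re.compile(r"\b(feat|add|implement)", re.IGNORECASE),
    "sustainability": re.compile(r"\b(test|refactor|docs?|clean|chore|fix)", re.IGNORECASE),
}

def func_sust_ratio(subjects):
    counts = Counter()
    for subject in subjects:
        for bucket, pattern in BUCKETS.items():
            if pattern.search(subject):
                counts[bucket] += 1
    return counts["functionality"] / max(counts["sustainability"], 1)

# In practice, feed it: git log --since="1 month ago" --format=%s
subjects = [
    "feat: add quiet-hours toggle",
    "test: cover preferences endpoint",
    "refactor: extract notification service",
    "docs: update README",
]
print(round(func_sust_ratio(subjects), 2))
# 0.33
```

It's a blunt instrument, but it's enough to surface where your effort actually goes.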

The number itself matters less than the awareness. Are you making conscious choices about where AI velocity goes? Are you building the feedback infrastructure that enables sustainable speed? Are you just shipping faster, or are you building better systems faster?

For me, the answer has been clear: investing heavily in tests, documentation, and simplification has made me faster, not slower. The 98.3% isn't overhead, it's the engine.

Your mileage may vary. Your context is different. But the question is worth asking:

What kind of engineering does AI make viable for you that wasn't before?

Related Posts