Sunday, May 03, 2026

Good talks/podcasts (May I)

These are the best podcasts/talks I've seen/listened to recently:

Reminder: All of these talks are interesting, even just listening to them.

You can explore all my recommended talks and podcasts on the interactive picks site, where you can filter by topic, speaker, and rating.

Sunday, April 26, 2026

How I use Claude Code to maintain an Obsidian vault

Seven mental models had been sitting in my inbox for weeks. Rough captures I'd jotted down while reading and watching things online: a stub here, a link there, a sentence I wanted to come back to. I asked the agent to flesh them out into proper notes. It dropped each one in the right folder, got the tags right, updated the central index, and committed the batch with a sensible message. No broken links. All in one session, and I didn't touch a single file.

That sentence is easy to write and easy to misread. It sounds like magic, or like marketing copy, or like the kind of thing people say about AI before you try it yourself and it puts notes in the wrong folder and breaks half the links in your vault. So let me explain what actually made it work, because it wasn't the model. It was the scaffolding.

A previous post described the general system: markdown files on Dropbox, versioned with Git, an AI agent that operates on the vault. This one is about a narrower question the first post barely touched. How do you give an agent durable, specific knowledge of a vault it didn't create, so that it stays consistent across sessions, not just within one?

Rules as the agent's memory

Claude Code reads a set of rule files at the start of every session. They live in .claude/rules/ and load automatically. This is the part most write-ups about "AI plus Obsidian" skip. The rules are not a prompt I rewrite each time, and they're not a long system prompt I crafted once and forgot about. They are operational memory: stuff that persists between sessions and grows the more I use the agent.

The main file, obsidian.md, runs to a few hundred lines. It describes the PARA structure and where each type of content belongs, how wiki-links work, the format conventions for literature notes, evergreen notes, Maps of Content (MOCs), mental models. It also tells the agent which Python script to run in which situation and how to interpret the output. A second file, vault-management.md, is more operational: when to use find_broken_links.py versus fix_broken_links.py, what to do with a fuzzy match at 73% versus 91% confidence. A third, master-viu-presentaciones.md, covers the conventions for the master's program materials I teach.
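For concreteness, a rule file of this kind might contain entries like the following. This is an invented excerpt in the spirit of obsidian.md, not a quote from the real file:

```markdown
## Note placement
- Mental model notes go in `3_Resources/system_thinking/`, tagged `on/mental-model`.
- Quotes go in `3_Resources/Quotes/`, tagged with the author's name.

## Links
- Never write `[[Target]]` unless the target note exists; use plain text otherwise.
- After any session that edits notes, run `make find-broken-links` and fix what it reports.
```

The point is not the specific entries but the form: short, imperative, checkable statements the agent can apply without asking.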

None of these files is static. After every session where I learn something, I update the relevant one. Sometimes it's a convention I'd been applying inconsistently. Sometimes it's a pattern that worked better than what I had documented. Sometimes it's just a rule I'd been meaning to write down for weeks. The next session starts with that lesson already loaded, and the agent doesn't need to rediscover it.

This is what separates "I told the agent the rules" from "the rules live in the repo and improve with use." Most setups stop at the first. Getting an agent to write a note is trivial. Getting it to write a note that's consistent with 774 other notes it didn't write, and to keep it consistent over months, is what needs this kind of explicit, maintained context.

The zero broken links rule

The single most important rule in the vault is this one: if you're not certain a note exists, don't create the wiki-link. Use plain text instead.

A broken link in Obsidian isn't a 404. It's a ghost. The link renders, it shows up in the graph view, it looks like a connection. But click it and you're in an empty note. Broken links are promises the vault never kept. They make the graph misleading. They accumulate silently, each one suggesting knowledge that isn't there.

The rule is enforced in two ways. In the agent's behavior: before writing [[SomeConcept]], Claude Code runs a Glob search to confirm the file exists. If it doesn't, it uses plain text. No exceptions. And in a Python script:

make find-broken-links

This runs after any session that involved creating or editing notes. It parses every .md file in the vault, extracts every [[wiki-link]], and checks that a file with that name exists. The output is a table. The acceptable result is zero.
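The core of that check is simple enough to sketch. This is a minimal version of the idea, not the vault's actual find_broken_links.py; it matches links to notes by file name only, mirroring how Obsidian resolves wiki-links:

```python
# Minimal sketch of a broken-wiki-link checker: collect every note name
# in the vault, then flag [[links]] whose target has no matching .md file.
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]], and [[Target#heading]] forms.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_broken_links(vault: Path) -> list[tuple[str, str]]:
    notes = {p.stem for p in vault.rglob("*.md")}  # note names, folder-agnostic
    broken = []
    for note in vault.rglob("*.md"):
        for match in WIKILINK.finditer(note.read_text(encoding="utf-8")):
            target = match.group(1).strip()
            if target not in notes:
                broken.append((note.name, target))
    return broken
```

A real version also has to handle aliases and heading links more carefully, but the acceptable result is the same: an empty list.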

There's a principle I keep coming back to here: you can't trust an agent's self-reporting on something like link integrity. The agent thinks it checked. The script actually checked. Both have to be in the loop.

The mechanical layer

Behind the agent there are 15 Python scripts wired together through a Makefile. They are the part of the system that doesn't improvise.

find_broken_links.py I've already described. Its companion, fix_broken_links.py, handles the cases where a broken link is a typo or a slight variation on an existing note's name. It uses difflib for fuzzy matching, with a configurable threshold and interactive confirmation by default. --auto-apply --threshold 0.90 handles the obvious cases without asking; --dry-run shows what would change without touching anything. validate_frontmatter.py scans every note and checks that the YAML is valid, that title is present, and that tags is a list rather than a bare string. These details drift surprisingly fast during rapid note creation, and catching them at the script level keeps them from accumulating into a future cleanup job.
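The fuzzy-repair idea is worth a sketch. This is a hypothetical reduction of what a fix_broken_links.py-style tool does with difflib, not the script itself: score a broken target against existing note names and only propose matches above a confidence threshold.

```python
# Sketch of threshold-gated fuzzy matching for broken link targets.
import difflib

def best_match(target: str, notes: list[str], threshold: float = 0.85):
    """Return (note, score) for the closest note name, or None below threshold."""
    scored = [(difflib.SequenceMatcher(None, target.lower(), name.lower()).ratio(), name)
              for name in notes]
    score, name = max(scored)
    return (name, score) if score >= threshold else None

notes = ["Systems Thinking", "Second-Order Effects", "Inversion"]
print(best_match("Sistems Thinking", notes))   # a typo: high-confidence match
print(best_match("Opportunity Cost", notes))   # nothing close: returns None
```

The threshold is what makes interactive confirmation optional: above it, the fix is safe to auto-apply; below it, a human should decide.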

The principle behind the whole layer is the one above: the agent decides, the scripts verify and execute the repeatable mechanics. The agent says "I created seven notes." The script confirms whether the wiki-links between them resolve. Two different jobs, run by two different actors, with no overlap of trust.

The middle layer

Between rules (context for the agent) and scripts (mechanical operations), there is a third layer: skills. These are workflow protocols that combine agent judgment with script execution.

The vault has six skills; three give the flavor. create-note creates a note with the right frontmatter, in the right PARA location, with the right tags. If something's missing it asks before guessing. fix-broken-links runs the detection script, applies the obvious fixes automatically, and surfaces the edge cases for me to decide. vault-health-check runs the full check-health suite and tells me what to fix and in what order.

The distinction I care about is this: a skill is not a prompt. A prompt runs once and the agent improvises the rest. A skill is a protocol. It's explicit about which tools to use, in which order, what to verify, and when to ask the human. The same skill runs the same way every time. That's what you want for maintenance work. Boring, repeatable reliability. Not novelty.
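Written down, a protocol of this kind is unglamorous. An invented sketch, not the actual skill file:

```markdown
## Skill: fix-broken-links (illustrative sketch)
1. Run `make find-broken-links` and capture the table of unresolved links.
2. For each entry with a high-confidence fuzzy match, apply the fix automatically.
3. Collect the remaining ambiguous cases and ask the human to decide each one.
4. Re-run `make find-broken-links` and confirm the result is zero before committing.
```

Every step names a tool or a decision point. There is nothing left for the agent to improvise.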



The pattern: friction → script → skill → rule

None of this was designed upfront. It evolved from use, and the evolution is visible in the commit history.

Each piece started as a friction. I'd notice I was making the same judgment call session after session, or hitting the same dull task by hand, or repeating a correction I'd already made. The first response was a script for the mechanical part. Then, when the script was being called the same way each time, a skill that wrapped the orchestration. Then, when the lesson was something the agent should always know, a line in the rules so the next session would start with it loaded.

The presentation materials for the master's program followed exactly this arc. After the third session I noticed I'd made the same mistakes twice: too many bullets per slide, explanatory content in visible slides instead of speaker notes, material that didn't fit the session dumped at the end of the main presentation. I spent an hour updating the rule file with what I'd learned. From session four onward, Claude Code applied those lessons automatically. I didn't have to remember them. The rules remembered them for me.

Every problem I solve ends up encoded in a rule, a script, or a skill. The next time the same problem turns up, the agent already knows the answer. The documentation I write once keeps working session after session, which is why this thing keeps getting more useful without me planning for it.

More tools, same scaffolding

The tools I lean on most share a shape: text in, text out, callable from a make target. That's what lets them attach to the loop without friction. The agent operates them like it edits a note. Each one adds a capability without changing how the system works.

  • Mermaid: diagrams written as .mmd files. The agent generates them, renders to PNG via mmdc, embeds the result in the note, and commits, all from a single make target. I use it for architecture sketches, process flows, the kind of thing I used to draw on a whiteboard and lose.

  • Marp: slide decks written in markdown with annotations, compiled to PDF and PowerPoint. I ask the agent to change a slide, it edits the .md, regenerates the PDF, and commits. The thing I value most is mundane: I can git diff between session 3 and session 7 of the same master's class.

  • Excalidraw: Obsidian stores its sketches as JSON inside regular .md files, so the agent edits them like any other text. Whiteboard-style diagrams that live next to the note that needs them, refined by prompt instead of by hand.

  • notebooklm-py: a client for the NotebookLM API. The agent passes a URL, a transcript, or a PDF and gets back topics, key points, and a structured summary. That's how external material lands in the vault with a consistent shape, without me writing every summary by hand.

  • yt-dlp: YouTube extraction. The vault uses it for talk-note metadata (title, channel, duration, captions), and also for downloading the video or the audio when I want a local copy of something I might lose access to.

  • markitdown: PDF and Word to markdown. It's how a paper or a slide deck someone shares ends up inside the vault as text the agent can read, summarise, and link to whatever else is already there.

The pattern repeats every time. If a tool speaks text and a Makefile target can wrap it, the agent inherits the capability. Adding a new one is a few lines of glue, not a rewrite.
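The glue itself has a generic shape. A hypothetical wrapper, not the vault's actual code: once a text-in/text-out tool can be shelled out to and its output returned as text, the agent can operate it like any other file.

```python
# Generic text-in/text-out tool wrapper: run a CLI tool, feed it text on
# stdin if needed, return its stdout as a string. Raises on non-zero exit.
import subprocess

def run_tool(cmd: list[str], stdin_text: str = "") -> str:
    result = subprocess.run(cmd, input=stdin_text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Wrapping yt-dlp, markitdown, or a Mermaid renderer in a make target is this function plus the right arguments.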

What the agent doesn't do

The agent doesn't decide what's worth capturing. That judgment is still entirely mine. It doesn't surface connections I haven't thought of, at least not reliably enough to trust. It doesn't write the parts of notes that actually matter: the "applications in software" section of a mental model note, the personal reflection on why a talk changed how I think about something, the synthesis between two ideas I read six months apart.

What it does is remove the friction between having an idea and that idea being properly integrated into the vault. Before: I'd capture something in the inbox, know vaguely where it should go, not quite have the energy to do all the steps correctly, leave it in limbo. Now: the skill handles the scaffolding, I make the judgment calls, the note ends up in the right place with the right tags and no broken links.

The honest version is that the system requires active maintenance. If I'm lazy about updating the rules after learning something new, the next session starts with slightly worse context. The skills are only as good as the last time I refined them. The scripts catch what they're designed to catch. They are not magic.

But the direction is right. Each round makes the agent a bit more useful in this specific vault, and each round is cheap: a few lines of rules, a Makefile target, a refined skill. Unlike a one-off prompt, none of it evaporates when the session ends.

Where this leaves me

774 notes, 15 scripts, 33 Makefile targets, 6 skills, 3 rule files. Around 100 commits in the last two months, most of them from sessions where I was building something else and the vault maintenance happened as a side effect.

The test I use: after a week off, how long does it take me to pick up where I left off? With the previous system (no scripts, no rules, ad-hoc organization), reorienting took twenty or thirty minutes of reading through recent notes to figure out what was where. Now it's opening the vault, running make check-health, skimming recent commits. Five minutes.

Back to the question this post opened with: what makes Claude Code reliable in this specific vault? The short answer is that reliability doesn't come from the model. It comes from making conventions explicit. Rules the agent reads at the start of every session. Scripts that verify what the agent thinks it did. Skills that turn a vague instruction into a written protocol. The model fills in the parts that weren't worth automating. Everything else is documented, checked, and reused.

I've built this kind of thing in software teams: explicit conventions, automated checks, tools that hold on to what the team has figured out so new members don't start from zero. The dynamics are the same here. The agent is the new team member who read the onboarding docs, runs the checks before committing, and asks when something isn't covered. The documentation just happens to live in .claude/rules/.

Related reading

Sunday, April 12, 2026

My second brain: markdown, Dropbox, and an AI agent

1,222 notes, 24 categories, zero vendor lock-in. My knowledge management system fits on a thumb drive.

The starting point

The oldest notes in this vault are from June 2013. Talk notes, loose ideas, book quotes, technical decisions, reflections on teams and processes. They started out in Google Keep, in scattered files, in tools I no longer remember the name of. Over time I migrated them into markdown files by hand. Slowly, in bursts, never quite finishing.

The second brain always had valuable stuff in it. Interesting ideas, connections between concepts, reference material that had taken me hours to compile. I found notes when I needed them, connected ideas across sessions. But I had this persistent feeling that there was more in there than I was getting out. Hundreds of notes that didn't talk to each other. Knowledge that accumulated but didn't compound.

And then there was the Google Keep backlog. Over the years I'd piled up around 1,500 notes and links in Keep. Quick captures, things I meant to process "later." The problem was that Google Keep has no API for personal accounts, so getting stuff out was painful enough that I just... didn't. The backlog grew. Every time I opened Keep I felt the weight of it.

Last Christmas I decided to just rip the band-aid off. Exported everything via Google Takeout, deleted it all from Keep, and sat down with the raw files. Using AI I built a throwaway classification pipeline: a combination of heuristics and a human-in-the-loop process where the system proposed a category and I made the final call. In a couple of days, 1,500 notes were classified and integrated into the vault. Bye bye, Google Keep.
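The shape of that human-in-the-loop pipeline is easy to sketch. The heuristics and category names here are invented for illustration; the real throwaway script is long gone:

```python
# Sketch of a propose-then-confirm classifier: heuristics suggest a
# category, a human accepts with Enter or types an override.
def propose_category(text: str) -> str:
    rules = {"talk": "Sources", "quote": "Quotes", "book": "Books"}  # invented heuristics
    lowered = text.lower()
    for keyword, category in rules.items():
        if keyword in lowered:
            return category
    return "Inbox"  # no match: park it for manual triage

def classify(text: str, confirm=input) -> str:
    guess = propose_category(text)
    answer = confirm(f"Category for note? [{guess}] ").strip()
    return answer or guess  # empty answer accepts the proposal
```

The system does the tedious guessing; the human keeps the final call. That division is what made 1,500 notes tractable in a couple of days.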

That was the moment it clicked. The friction of processing notes had been the bottleneck for years, not the lack of a system. Once I could generate tooling on demand to solve a specific problem, the backlog that had haunted me for ages just... dissolved.

Separately, I'd been curating recommended technical talks in eferro-picks for years. 855 talks, 465 speakers, gathered over 6 years. Another project with valuable information sitting in its own repo, its own structure, disconnected from the vault.

Both projects are now one. What I have is an Obsidian vault of markdown notes on Dropbox, versioned with Git, maintained by an AI agent that understands the structure. The talks from eferro-picks live inside the vault now. 402 of the 855 have full notes so far: all recent talks get them automatically, and I'm gradually backfilling the older ones that only had a title and a link. The vault is the source of truth for the talks, not the other way around. I even have automated pipelines to process new talks I watch and want to recommend (but that's a story for another post).

It's not an app. It's a folder of text files with a system on top. And for the first time in over ten years of note-taking, I feel like I'm getting out of it the value I always sensed was there.

Why plain text files

The most important decision in the system is the most boring one: everything is markdown. .md files I can open with any text editor, on any operating system, with no special tooling.

It sounds like a non-decision. It isn't. There's no database. No proprietary format. No server. If Obsidian disappears tomorrow, or Dropbox, or Claude, the notes are still readable. I can search with grep, edit with emacs, sync with rsync. The format will outlive whatever tool I'm using to read it this year.

Dropbox syncs across devices with zero configuration. Git versions everything, so every change is recorded, I can see diffs, I can roll back. Between the two I get sync and version control almost by accident. Dropbox keeps files up to date across machines, Git keeps the history.

There's a side effect I didn't appreciate until I was living it: I can edit a note on my laptop, close the lid, pick up my phone and keep writing where I left off. Or switch to another machine at home. No export, no sync button, no waiting. Dropbox just does it. Obsidian on mobile opens the same files, same links, same structure. It sounds trivial, but it removes the last excuse for not capturing an idea when it shows up.

Not sophisticated tools. Tools that work, and have been working for decades.

Obsidian is the reading interface. Wiki-links ([[Note]]) create a navigable knowledge graph, backlinks connect ideas both ways, the graph view shows you clusters you didn't know were there. But Obsidian is a view on files, not a platform. If something better comes along, I switch. The files don't care.

The vault's knowledge graph in Obsidian.
The clusters form naturally from wiki-links and shared tags.

And it turns out that "just text files" goes further than notes. Diagrams in the vault are Mermaid, a text format that renders into flowcharts and architecture visuals. Obsidian stores Excalidraw sketches as JSON inside regular .md files. The presentations for the master's program I teach live in Marp: markdown with a few annotations that compiles into slide decks, PDFs and PowerPoints. I can git diff a diagram the same way I diff a note, and an AI agent can generate a Mermaid flowchart as easily as it writes prose. To the agent it's all text.

The structure: PARA, loosely

Notes are organized following Tiago Forte's PARA methodology, without being religious about it.

Projects for things with a deadline or deliverable. Right now: a master's program I'm teaching, this blog, a house renovation. Areas for ongoing stuff: writing, professional network. Resources is where most of the vault lives, reference material organized by topic across about 24 categories. Archive for completed projects that should stop getting in the way.

If you've read my blog before, the topics won't surprise you. The most tagged subjects are engineering culture, AI, agile, continuous delivery, software design, architecture, product and XP. Throw in DevOps, lean, teams and testing and you have a pretty accurate map of what I spend my time thinking about. The vault just makes it explicit.

Then there's The Forest (the vault borrows the digital garden metaphor), a directory with Maps of Content: thematic indexes connecting scattered notes. And Sources, where the 402 talk notes live.

Beyond folders, notes carry two kinds of tags. Topic tags (on/software-design, on/lean, on/ai) say what a note is about. Maturity tags say how developed it is: state/seedling for a raw capture, state/budding for something I've worked on but isn't finished, state/evergreen for a note I consider solid. Most of the vault is still seedlings and budding notes. The evergreen ones are the minority, which is honest: note count and idea quality are different things.

The structure isn't perfect. What matters is that it's predictable. If I'm looking for a quote, I know it's in 3_Resources/Quotes/. Finished projects go to 4_Archive/. This predictability is what lets an AI agent work on the vault without asking me where things go.

The leap: an AI agent that gets the vault

Here's where it gets interesting. Claude Code is not a chatbot I ask things about my notes. It's an agent that operates directly on the files. Creates notes, edits them, runs scripts, checks the vault's health. All while following rules I've written over time.

The rules live in .claude/rules/, configuration files the agent reads at the start of every session. Where to put each type of note. What YAML frontmatter to include. Naming conventions. How to verify links aren't broken. Not suggestions. Constraints.

Here's what that looks like in practice. I say: "create notes for these 7 mental models." The agent creates 7 files in 3_Resources/system_thinking/, each with the right structure (core idea, software applications, limitations, connections), tags them on/mental-model, updates the central Map of Content in The Forest/Mental Models.md placing each in the right domain category, and before writing any wiki-link, checks that the target note actually exists. I don't touch a file. But the rules that made this work? I wrote those myself, one session at a time, encoding what I'd learned about how my vault should behave.

Three layers, one system

What makes this a system and not just "a chatbot writing files" is three layers working together. I keep coming back to this because it's the part people misunderstand.

The first layer is the agent with domain context. Claude Code doesn't see loose files. It knows a new talk note needs topics drawn from a taxonomy of 80+ entries and a link to the speaker's page. A quote goes in Quotes/ with the author's name as a tag. A project has a deadline; an area doesn't. Rules give it semantics, not just file paths.

The second layer is 15 Python scripts behind 30 Makefile targets.

One group keeps the vault healthy: a script that parses every wiki-link and checks it against existing files, another that validates YAML frontmatter, another that finds stray images and moves them next to the notes that reference them.

A second group powers the talk pipeline: sync from eferro-picks, pull metadata from YouTube via yt-dlp, run content through NotebookLM, generate blog post HTML.

A third handles the master's presentations: Mermaid to PNG, Marp to PDF and slide decks.

The agent runs all of this through make targets. When it checks for broken links, it's not guessing. It's running a real script with real output.
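One of those health checks, the frontmatter validation, reduces to a short sketch. This is not the real validate_frontmatter.py; a production version would use a YAML parser, while this hand-rolled check keeps the sketch dependency-free:

```python
# Sketch of frontmatter validation: the note must open with a --- block
# that declares a title, and tags (if present) must be a list, not a
# bare string like `tags: on/ai`.
def frontmatter_problems(text: str) -> list[str]:
    problems = []
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---" or "---" not in [l.strip() for l in lines[1:]]:
        return ["missing frontmatter block"]
    end = next(i for i, l in enumerate(lines[1:], start=1) if l.strip() == "---")
    # Top-level keys only: skip indented continuations and "- item" lines.
    keys = {l.split(":", 1)[0].strip(): l.split(":", 1)[1].strip()
            for l in lines[1:end] if ":" in l and not l.startswith((" ", "-"))}
    if "title" not in keys:
        problems.append("title missing")
    tags = keys.get("tags")
    if tags is not None and tags and not tags.startswith("["):
        problems.append("tags is a bare string, not a list")
    return problems
```

The output feeds the same loop as the link checker: the acceptable result is an empty list.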

The third layer is 6 skills. These aren't prompts. They're protocols: which tools to use, in what order, what to verify, what to ask me before proceeding. When I say "process this talk," the agent doesn't figure it out from scratch. It follows a written workflow that I've refined over multiple iterations. Same when I say "check vault health" or "organize the inbox." Each skill encodes a complete task, not a vague instruction.

Take away any one layer and the thing falls apart. Without scripts the agent would improvise something different every time. Without the agent the scripts are just CLI tools I'd have to remember to run. Without rules the skills wouldn't know what decisions to make.

How it actually evolved

The current state of the system is less interesting than how it got here. Because none of this was planned.

The talk pipeline started as a single script to sync a JSON with the vault. I was doing it by hand and it was tedious, so I automated the sync. Then I wanted the agent to be able to run it, so I wrote a skill. Then I wanted automatic topic extraction, so I plugged in NotebookLM. Then I wanted to publish talks as blog posts, so I wrote another script. Over a weekend, iterating with the agent. From copy-paste to a pipeline processing 402 talks and spitting out HTML.

The pattern repeats. I use the vault, hit a friction. Write a script for the mechanical part. Wrap it in a skill so the agent can orchestrate it with judgment. Add rules so the agent remembers the lesson next time. Now the agent is better at that task, which frees me to notice new frictions.

The master's program presentation rules followed the same arc. After session 3, I jotted down what worked and what hadn't. Slides had too much text. Extra material should be in separate files. Speaker notes needed a timeline. I turned those lessons into rules. From session 4 on, Claude Code applied them without me having to say anything. The rules became the shared memory between me and the agent.

Zero broken links

Of all the rules in the vault, one appears in three separate files. I consider it the most important: zero broken links.

In Obsidian, a wiki-link [[Something]] is a promise that "Something" exists as a note. If it doesn't, the link is noise. It suggests content that isn't there, pollutes the knowledge graph, blurs the line between what's real and what's aspirational. And broken links breed. One becomes five, five become thirty, and suddenly your graph is full of ghosts.

The rule is simple: if you're not sure a note exists, use plain text. The agent checks before writing any [[Concept]]. If the file doesn't exist, it writes "Concept" without brackets. After every edit, make find-broken-links. If it created broken links, it fixes them before moving on.

Prevention, not correction. Same principle as tests in code. Cheaper to not introduce the bug.
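The prevention rule itself is small enough to state as code. A hypothetical helper, not part of the vault's actual tooling, but exactly the decision the agent makes before every link:

```python
# Emit a [[wiki-link]] only when a note with that exact name exists
# anywhere in the vault; otherwise fall back to plain text.
from pathlib import Path

def link_or_text(target: str, vault: Path) -> str:
    exists = any(p.stem == target for p in vault.rglob("*.md"))
    return f"[[{target}]]" if exists else target
```

One existence check per link is the entire cost of never introducing a ghost.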

What this isn't

I don't want to oversell this.

It's not effortless. Rules need writing, scripts need maintaining, structural decisions need making. There are 15 Python scripts and 30 Makefile targets for a personal vault. Is that over-engineering? Probably, partly. Some of it is hobby. Some is me poking at what's possible when you point an AI agent at a folder of text files.

It's not a system where AI does the thinking. Claude Code doesn't decide what's worth keeping, how to categorize a concept, or which connections matter. I'm the one who decided mental models live in system_thinking/ regardless of their domain. I'm the one who decided that domain categorization happens in the MOC, not in the filesystem. The agent executes those choices at scale. But the design is mine.

And it's not for everyone. You need to be comfortable with a terminal, with Git, with text files. If you want a polished app with automatic sync and zero setup, use Notion. Seriously. It's fine.

The question I'm left with

Every time I codify a rule or a skill, the agent gets more capable. But also more opinionated. The rules reflect my decisions today: my taxonomy, my structure, my conventions. What happens when my thinking changes? Do the rules become inertia, or are they an explicit record of decisions I can consciously revisit?

Git has the full history. I can see when I added each rule and why. I can change them. But there's a gap between being able to change them and actually doing it when the system works well as-is.

After years trying tools, what works for me turns out to be the most boring stack imaginable: text files, a synced folder, an agent that knows the garden's rules. 1,222 notes and counting.

Whether the system helps me think better or just organize faster what I was already thinking, I genuinely don't know. Probably both. Probably the distinction doesn't hold up under scrutiny.

Related reading

The methodology and ideas behind this:

  • The PARA Method — Tiago Forte's original post on the organizational system this vault uses
  • Evergreen notes — Andy Matuschak's thinking on notes that evolve and compound over time, which inspired the seedling/budding/evergreen maturity model
  • Building a Second Brain — Tiago Forte's broader framework for personal knowledge management

The tools:

  • Obsidian — The editor I use as a view on the vault's markdown files
  • Claude Code — The AI agent that operates on the vault
  • Marp — Markdown to slide decks
  • Mermaid — Diagrams as text

Sunday, April 05, 2026

Good talks/podcasts (April I)

These are the best podcasts/talks I've seen/listened to recently:
  • No Vibes Allowed: Solving Hard Problems in Complex Codebases – Dex Horthy, HumanLayer 🔗 talk notes (Dex Horthy) [AI Assisted Engineering] [Duration: 00:20] (⭐⭐⭐⭐⭐) Dex Horthy explores advanced context engineering and the "Research, Plan, Implement" (RPI) workflow to effectively solve complex problems in brownfield codebases while minimizing AI-generated "slop" and maintaining team alignment.
  • Kent L Beck: You’re Ignoring Optionality… and Paying for It 🔗 talk notes (Kent Beck) [Agile, Engineering Culture, Software Design] [Duration: 00:49] (⭐⭐⭐⭐⭐) Kent Beck discusses the tension between delivering features and maintaining software "optionality," advocating for a "tidy first" approach to make hard changes easy by improving code structure as both an economic and moral necessity.
  • #156 How to deploy lean projects and more with author Michael Balle 🔗 talk notes (Michael Balle) [Engineering Culture, Lean, Management] [Duration: 01:02] (⭐⭐⭐⭐⭐) Michael Ballé redefines Lean as a humanistic engineering philosophy centered on "making people before making parts" by prioritizing technical competence, Gemba-based collaboration, and strengthening workplace conditions to bridge the gap between top-down management and true frontline engagement.
  • Platform Engineering in 2025: Still Stuck in Ticket Hell? 🔗 talk notes (Steve Smith) [Devex, Platform engineering] [Duration: 00:07] Escaping "ticketing hell" by evolving platform engineering from a manual service desk into an automated self-service model that reduces queue times and empowers delivery teams to accelerate.
  • Forget Velocity, Let's Talk Acceleration • Jessica Kerr • GOTO 2017 🔗 talk notes (Jessica Kerr) [Engineering Culture, Mental models, Systems Thinking] [Duration: 00:54] (⭐⭐⭐⭐⭐) Jessica Kerr redefines software development as software parenting and system moving, arguing that teams should prioritize acceleration—the ability to change direction and improve the system—over mere velocity by fostering generativity and mutual learning through strategic automation.
  • The Best Product Engineering Org in the World 🔗 talk notes (James Shore) [Engineering Culture, Product Strategy, Technology Strategy, agile-XP] [Duration: 01:40] (⭐⭐⭐⭐⭐) James Shore outlines a holistic framework for building a world-class engineering culture by focusing on six core pillars—People, Internal Quality, Lovability, Visibility, Agility, and Profitability—while leveraging Extreme Programming (XP) and Fluid Scaling Technology (FaST) to drive sustainable business impact.
  • What Skills Do Developers NEED To Have In An AI Future? 🔗 talk notes (Trisha Gee, Kent Beck) [AI Assisted Engineering, Engineering Culture, Technical leadership] [Duration: 00:24] (⭐⭐⭐⭐⭐) This videopodcast examines how AI-augmented development shifts the developer's role from writing syntax to exercising high-leverage skills like curiosity, design taste, strategic testing, and effective communication to navigate rapid feedback loops and maintain optionality.
  • o11ycast - Ep. #87, Augmented Coding Patterns with Lada Kesseler 🔗 talk notes (Lada Kesseler, Jessica Kerr, Ken Rimple) [AI Assisted Engineering, Generative AI, tdd] [Duration: 00:48] (⭐⭐⭐⭐⭐) Lada Kessler introduces Augmented Coding Patterns to navigate the "black box" of AI-assisted development by employing specialized single-purpose agents, high-level test specifications, and emoji-based context markers to monitor an agent's focus and internal knowledge.
  • The state of VC within software and AI startups – with Peter Walker 🔗 talk notes (Peter Walker, Gergely Orosz) [AI, Engineering Culture, startup] [Duration: 01:19] A data-driven exploration of how shifting venture capital dynamics and AI are reshaping startup hiring, team structures, and the engineering landscape.
  • Should Test-Driven Development (TDD) Be Used MORE In Software Engineering? 🔗 talk notes (Emily Bache, Dave Farley) [Agile, Software Design, tdd] [Duration: 00:26] (⭐⭐⭐⭐⭐) This expert discussion highlights how Test-Driven Development (TDD) acts as a fundamental software design tool that facilitates Agile development by providing constant feedback, enforcing separation of concerns, and enabling developers to proceed with confidence through small, iterative steps.
  • An AI state of the union: We’ve passed the inflection point & dark factories are coming 🔗 talk notes (Simon Willison) [AI Assisted Engineering, Security, tdd] [Duration: 01:39] (⭐⭐⭐⭐⭐) Simon Willison explores the "inflection point" of AI in software development, detailing agentic engineering patterns, the rise of "dark factories," and the critical security challenges posed by prompt injection.
  • Data vs Hype: How Orgs Actually Win with AI - The Pragmatic Summit 🔗 talk notes (Laura Tacho) [AI, Developer Productivity, Devex] [Duration: 00:29] A data-driven exploration of how organizations can move beyond AI hype to achieve real impact by focusing on developer experience, organizational transformation, and clear measurement frameworks.
  • Making Codebases Agent Ready – Eno Reyes, Factory AI 🔗 talk notes (Eno Reyes) [AI Assisted Engineering, Developer Productivity, Testing] [Duration: 00:15] This talk explores how rigorous automated validation and specification-driven development serve as the essential foundation for scaling autonomous AI agents and unlocking exponential engineering velocity.
  • The Most Polarizing Practice In Modern Software Engineering? 🔗 talk notes (Dave Farley, Dan North) [CI, Trunk Based Development, tdd] [Duration: 00:33] Dave Farley and Daniel Terhorst-North explore the industry's most polarizing practices, such as trunk-based development and estimation, advocating for a pragmatic and outcome-focused approach to software engineering.
  • The Forest & The Desert Are Parallel Universes • Kent Beck • GOTO 2025 🔗 talk notes (Kent Beck) [Compliance, Engineering Culture, XP] [Duration: 00:39] (⭐⭐⭐⭐⭐) Kent Beck contrasts trust-based "Forest" and control-driven "Desert" development cultures, revealing how these parallel universes fundamentally redefine the meaning of metrics, accountability, and engineering practices.
Reminder: All of these talks are interesting, even just listening to them.

You can explore all my recommended talks and podcasts on the interactive picks site, where you can filter by topic, speaker, and rating.

Sunday, March 29, 2026

The book I've spent 25 years trying not to have to write

"Today, when the latest fad is AI-assisted coding, I smile thinking that the principles I learned ten years ago are still the same." — from the prologue, by an engineer who worked on the team

For more than two decades I have explained the same ideas to the same kinds of teams. Different teams, different companies, different contexts. And yet the pattern repeated with a regularity I can no longer ignore.

Talented teams. Capable teams. Reasonable technology. And still: the feeling of running ever harder to advance ever less.

Lewis Carroll described it better than I can in Through the Looking-Glass: "It takes all the running you can do, to keep in the same place." For years I thought that metaphor was an exaggeration. Not anymore.

Part of the problem is how we think about software. We say we "build" it, as if it were something you finish and that then just sits there. But software isn't built. It's grown. It's a living system that grows, changes, and decays. And living systems need continuous care, not just construction.

I came to the conclusion that the most honest thing I could do was write it down.

What the book is about

"Menos software, más impacto" has a subtitle that leaves little to the imagination: Cómo evitar que tu equipo colapse bajo el peso de su propio código (how to keep your team from collapsing under the weight of its own code).

The central thesis is uncomfortable: the biggest problem for most teams isn't that they write bad code. It's that they write too much code.

Existing software consumes resources continuously, whether you use it or not. Every added feature, every integration, every design decision that piles up unreviewed has a cost that never shows up on any roadmap but shows up every day. I call it the basal cost of software, by analogy with an organism's basal metabolism: the minimum expenditure needed just to keep functioning. And like metabolism, if left unmanaged it grows until it consumes all the available energy.

The book covers four major blocks:

  • Fundamentals: what Lean Software Development is and why basal cost is the central concept that ties it all together
  • The five principles: eliminate waste, amplify learning, decide at the last responsible moment, deliver as early as possible, empower the team
  • Sustainable quality: why quality is not the enemy of speed but its only durable foundation
  • Systems thinking: optimize the whole, integrate Lean with XP and a product mindset, and what happens if you do nothing

It runs 192 pages, based on more than 25 years of experience with real teams: Alea Soluciones, The Motion, Nextail, Clarity AI. With concrete cases, real conflicts, and my own mistakes acknowledged. The book also includes the perspectives of eight professionals who have lived these transformations from the inside, in different roles and contexts.

Who it's for (and who it's not for)

This is not a book for someone who wants to improve their code individually. There are excellent books for that, and this isn't one of them.

It's for people who decide what gets built, what doesn't get built, and what gets removed. Engineering Managers, Tech Leads, Product Managers, CTOs. Anyone with direct responsibility for a team's capacity six months, a year, three years out.

If your day-to-day is deciding priorities, managing capacity, and negotiating scope, what's in the book will feel familiar. And probably uncomfortable. That's the intention.

Why now

There is plenty of literature on Lean, XP, and Agile in English. In Spanish, less than there should be. And almost none that combines the three approaches in an integrated way, with real cases from teams I know first-hand.

The current context also makes it more urgent. The acceleration AI brings makes decisions about what to build and what not to build more important, not less. Amplifying the capacity of a team that already builds too much doesn't solve the problem. It speeds it up.

The draft is complete. Now comes revision, editing, and preparation for publication. If you'd like to be among the first to read it, write to me: eferro@eferro.net

Sunday, March 01, 2026

Encoding Experience into AI Skills

I'd been tweaking my augmented coding setup for months - adjusting CLAUDE.md rules, adding instructions for testing discipline, complexity management, incremental delivery. Things I've repeated to every team I've worked with, now repeated to AI agents. It worked, but it felt like writing the same email over and over.

Then I found Lada Kesseler's skill-factory.


What Skills Are (And Why They Matter)

If you use Claude Code, you already know about CLAUDE.md - a file where you put instructions that the agent reads at the start of every conversation. It works. But it has a problem: everything is always loaded. Your TDD guidelines, your Docker best practices, your refactoring workflow - all of it competing for the agent's limited context window, whether it's relevant or not.

Skills solve this differently. They're packaged knowledge that activates only when relevant. You type /mutation-testing and the agent gains deep expertise about finding weak tests through mutation analysis. You type /complexity-review and it becomes a technical reviewer that challenges your proposals against 30 dimensions of complexity. The rest of the time, that knowledge stays out of the way.

Think of it as progressive disclosure for AI context. The agent gets what it needs, when it needs it.
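For orientation, a skill is essentially a directory containing a SKILL.md file whose frontmatter tells the agent what the skill is for, with the body loaded only when the skill is invoked. The file below is a simplified, invented illustration of that shape, not an actual skill from the repository:

```markdown
---
name: mutation-testing
description: Assess whether the test suite would detect small injected bugs
---

# Mutation Testing

When this skill is invoked:
1. Identify the code under test and its test suite.
2. Propose small mutations (flip a comparison, remove a call, negate a condition).
3. For each mutation, check whether any existing test would fail.
4. Report surviving mutants and suggest the boundary tests that would kill them.
```

Because only the frontmatter is cheap to keep in view, the deep instructions in the body stay out of the context window until you actually need them.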

The Discovery: Lada Kesseler's Skill Factory

Lada Kesseler built the skill-factory - a repository with 315 commits of carefully crafted skills covering serious engineering ground: TDD, Nullables (James Shore's pattern for testing without mocks), approval tests, refactoring (using Llewellyn Falco's approach), hexagonal architecture, event modeling, collaborative design, and more.

These aren't toy prompts. The Nullables skill alone includes reference material for infrastructure wrappers, embedded stubs, output tracking, and three different architectural patterns. The approval-tests skill covers Java, Python, and Node.js with scrubbers, reporters, and inline patterns. This is deep, carefully structured knowledge.

Lada also co-created augmented-coding-patterns - a catalog of 43 patterns, 14 obstacles, and 9 anti-patterns for working effectively with AI coding tools. It's a collaboration between Lada Kesseler, Ivett Ordog, and Nitsan Avni. If you're doing augmented coding and haven't seen it, stop reading this and go look.

What I found wasn't just a collection of skills. It was an approach to sharing engineering knowledge with AI agents that I hadn't seen anywhere else.

The Fork as Extension

The natural next step wasn't to start from scratch - it was to fork and extend. Lada's skills already covered testing fundamentals, design patterns, and AI-specific workflows. What I noticed missing were the practices I kept explaining repeatedly: how to manage complexity, how to deliver incrementally, how to make sure tests actually catch bugs.

So I added 11 skills. Not because 16 wasn't enough, but because my particular set of problems needed particular solutions.

You can find my extended fork at github.com/eferro/skill-factory with all 27 skills ready to use.

Testing rigor

test-desiderata - Kent Beck's 12 properties that make tests valuable. Not "does this test pass?" but "is this test isolated? composable? predictive? inspiring?" I was tired of AI generating tests that had coverage but no diagnostic power. This skill makes the agent evaluate tests against each property and suggest concrete improvements.

mutation-testing - The question code coverage can't answer: "Would my tests catch this bug?" Coverage tells you what your tests execute. Mutation testing tells you what they'd detect. I'd already written a blog post about this - now it's a reusable skill. The examples are in Python and JavaScript, but I'm also using it successfully with Go.
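To see concretely what coverage misses, here is a minimal hand-rolled mutant in Python (tools like mutmut for Python or Stryker for JavaScript automate this); the `make_is_adult` helper is invented purely for illustration:

```python
import operator

def make_is_adult(cmp):
    """Build an is_adult variant with an injectable comparison, to simulate a mutant."""
    def is_adult(age):
        return cmp(age, 18)
    return is_adult

original = make_is_adult(operator.ge)  # age >= 18
mutant = make_is_adult(operator.gt)    # mutated to age > 18

# Weak test: 100% line coverage, yet the mutant behaves identically, so it survives.
assert original(30) is True and mutant(30) is True

# Boundary test: behavior differs at age == 18, so this test kills the mutant.
assert original(18) is True and mutant(18) is False
```

If the suite contained only the first assertion, a mutation tool would report the `>`-for-`>=` mutant as surviving, which is exactly the signal that the tests have coverage but no diagnostic power.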

Delivering incrementally and managing complexity

This is where the skills chain together, and where things get interesting.

story-splitting - Detects linguistic red flags in requirements ("and", "or", "manage", "handle", "including") and applies splitting heuristics. It's the first pass: is this story actually three stories wearing a trenchcoat?

hamburger-method - When a story doesn't have obvious split points but still feels too big, this skill applies Gojko Adzic's Hamburger Method: slice the feature into layers, generate 4-5 implementation options per layer, then compose the thinnest possible vertical slices.

small-safe-steps - The implementation planner. Takes any piece of work and breaks it into 1-3 hour increments using the expand-contract pattern for migrations, schema changes, API changes. Core belief: risk grows faster than the size of the change.

complexity-review - My inner skeptic, encoded. Reviews technical proposals against 30 dimensions of complexity across 6 categories (data volume, interaction frequency, consistency requirements, resilience, team topology, operational burden). Pushes for the simplest viable approach. Use it when someone says "Kafka" and you want to ask "why not a queue?"

code-simplifier - Reduces complexity in existing code without changing behavior. The cleanup crew after a feature is done.

These five skills work as a pipeline: story-splitting -> hamburger-method -> small-safe-steps for delivery planning, with complexity-review as a gate before implementation and code-simplifier as a sweep after.

Practical tools and team workflows

thinkies - Kent Beck's creative thinking habits, turned into a skill. When you're stuck, it applies patterns like "What would I do if I had infinite resources?", "What's the opposite of my current approach?", "What would make this problem trivial?" It's less about code and more about unsticking your thinking.

traductor-bilingue - Technical translation between English and Spanish that keeps terms like "deploy", "pull request", "pipeline", and "staging" in English (because that's how Spanish-speaking dev teams actually talk). Small thing, but it saves constant corrections.

dockerfile-review - Reviews Dockerfiles for build performance, image size, and security issues.

modern-cli-design - Principles for building scalable CLIs: object-command architecture (noun-verb), LLM-optimized help text, JSON output, concurrency patterns.

A Skill in Action

To make this concrete, here's what the delivery planning pipeline looks like in practice.

Say you have a story: "As a user, I want to manage my notification preferences including email, SMS, and push notifications with scheduling and quiet hours."

Step 1 - You invoke /story-splitting. The agent immediately flags "manage", "including", and the conjunction "and" joining three notification types plus scheduling. It suggests splitting into at least 4 stories: one per notification channel plus quiet hours as a separate slice.

Step 2 - You take the first slice ("email notification preferences") and invoke /hamburger-method. It breaks the feature into layers (UI, API, business logic, persistence) and generates options for each. For the UI layer: (a) full settings page, (b) single toggle, (c) link to email with confirmation, (d) inline in profile. It composes the thinnest vertical slice: a single toggle with an API endpoint and a database flag.

Step 3 - You invoke /small-safe-steps on that thin slice. It produces a sequence of 1-3 hour steps: add the database column with a migration, add the API endpoint with tests, add the UI toggle, wire it together. Each step deployable independently.

No single skill does everything. They compose. That's the point.
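Step 1's red-flag pass is easy to picture as a mechanical filter. This regex sketch is my own illustration of the idea, not the skill's actual implementation:

```python
import re

# Conjunctions and catch-all verbs that suggest a story hides multiple stories.
RED_FLAGS = r"\b(and|or|manage|handle|including)\b"

def red_flags(story: str) -> list[str]:
    """Return the distinct red-flag words found in a story, alphabetically."""
    return sorted(set(re.findall(RED_FLAGS, story.lower())))

story = ("As a user, I want to manage my notification preferences "
         "including email, SMS, and push notifications")
print(red_flags(story))  # ['and', 'including', 'manage']
```

The real skill goes well beyond word matching, but the point stands: the split signal is often sitting right there in the story's own language.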

How to Get Started

If you want to try these:

  1. Fork the repo: github.com/eferro/skill-factory (my extended fork with 11 additional skills for complexity management and incremental delivery) or the original by Lada Kesseler
  2. Install skills: The repo includes a skills CLI tool. Run ./skills toggle to browse and select which skills to install into your Claude Code setup.
  3. Use them: Type /skill-name in Claude Code. /mutation-testing to check your tests. /complexity-review to challenge a design. /small-safe-steps to plan your next implementation.
  4. Make your own: The repo includes documentation and tooling for creating new skills. Fork it, add what you need, share it back.

Standing on Shoulders

The total is 329 commits, 27 skills across 6 categories. But the number that matters most is that Lada built 315 of those commits. I added 14. The original structure, the skill manager, the testing and design skills that form the foundation - that's all her work. What I did was extend it with the practices I personally find myself repeating.

This is how open source has always worked: someone builds something good, others extend it, and the whole thing becomes more useful than any individual could make it. With AI skills, the effect compounds differently - every skill that gets shared becomes available to every person using it, making good practices almost free.

Lada's augmented-coding-patterns site (with Ivett Ordog and Nitsan Avni) takes this even further - it's not just tooling but a shared vocabulary for how we work with AI. Skills, patterns, obstacles, anti-patterns: a growing body of community knowledge.

What knowledge do you find yourself repeating to your AI agents? What practices would you encode as skills?

The barrier to sharing isn't technical anymore. It's deciding to do it.

References