Sunday, April 26, 2026

How I use Claude Code to maintain an Obsidian vault

Seven mental models had been sitting in my inbox for weeks. Rough captures I'd jotted down while reading and watching things online: a stub here, a link there, a sentence I wanted to come back to. I asked the agent to flesh them out into proper notes. It dropped each one in the right folder, got the tags right, updated the central index, and committed the batch with a sensible message. No broken links. All in one session, and I didn't touch a single file.

That sentence is easy to write and easy to misread. It sounds like magic, or like marketing copy, or like the kind of thing people say about AI before you try it yourself and it puts notes in the wrong folder and breaks half the links in your vault. So let me explain what actually made it work, because it wasn't the model. It was the scaffolding.

A previous post described the general system: markdown files on Dropbox, versioned with Git, an AI agent that operates on the vault. This one is about a narrower question the first post barely touched. How do you give an agent durable, specific knowledge of a vault it didn't create, so that it stays consistent across sessions, not just within one?

Rules as the agent's memory

Claude Code reads a set of rule files at the start of every session. They live in .claude/rules/ and load automatically. This is the part most write-ups about "AI plus Obsidian" skip. The rules are not a prompt I rewrite each time, and they're not a long system prompt I crafted once and forgot about. They are operational memory: stuff that persists between sessions and grows the more I use the agent.

The main file, obsidian.md, runs to a few hundred lines. It describes the PARA structure and where each type of content belongs, how wiki-links work, the format conventions for literature notes, evergreen notes, Maps of Content (MOCs), mental models. It also tells the agent which Python script to run in which situation and how to interpret the output. A second file, vault-management.md, is more operational: when to use find_broken_links.py versus fix_broken_links.py, what to do with a fuzzy match at 73% versus 91% confidence. A third, master-viu-presentaciones.md, covers the conventions for the master's program materials I teach.

None of these files is static. After every session where I learn something, I update the relevant one. Sometimes it's a convention I'd been applying inconsistently. Sometimes it's a pattern that worked better than what I had documented. Sometimes it's just a rule I'd been meaning to write down for weeks. The next session starts with that lesson already loaded, and the agent doesn't need to rediscover it.

This is what separates "I told the agent the rules" from "the rules live in the repo and improve with use." Most setups stop at the first. Getting an agent to write a note is trivial. Getting it to write a note that's consistent with 774 other notes it didn't write, and to keep it consistent over months, is what needs this kind of explicit, maintained context.

The zero broken links rule

The single most important rule in the vault is this one: if you're not certain a note exists, don't create the wiki-link. Use plain text instead.

A broken link in Obsidian isn't a 404. It's a ghost. The link renders, it shows up in the graph view, it looks like a connection. But click it and you're in an empty note. Broken links are promises the vault never kept. They make the graph misleading. They accumulate silently, each one suggesting knowledge that isn't there.

The rule is enforced in two ways. In the agent's behavior: before writing [[SomeConcept]], Claude Code runs a Glob search to confirm the file exists. If it doesn't, it uses plain text. No exceptions. And in a Python script:

make find-broken-links

This runs after any session that involved creating or editing notes. It parses every .md file in the vault, extracts every [[wiki-link]], and checks that a file with that name exists. The output is a table. The acceptable result is zero.
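In sketch form, the core of that check looks something like this. It's a minimal illustration, not the real find_broken_links.py: it assumes it runs from the vault root, matches notes by filename only, and simply strips alias and heading syntax rather than resolving it.

import re
from pathlib import Path

# Minimal illustration of the detection pass: collect every note name,
# then flag any [[wiki-link]] whose target doesn't exist as a file.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # target text before any |alias or #heading

def find_broken_links(vault: Path) -> list[tuple[Path, str]]:
    note_names = {p.stem for p in vault.rglob("*.md")}
    broken = []
    for md in vault.rglob("*.md"):
        for target in WIKILINK.findall(md.read_text(encoding="utf-8")):
            if target.strip() not in note_names:
                broken.append((md, target.strip()))
    return broken

if __name__ == "__main__":
    for note, target in find_broken_links(Path(".")):
        print(f"{note}: [[{target}]]")  # the acceptable output is nothing at all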

There's a principle I keep coming back to here: you can't trust an agent's self-reporting on something like link integrity. The agent thinks it checked. The script actually checked. Both have to be in the loop.

The mechanical layer

Behind the agent there are 15 Python scripts wired together through a Makefile. They are the part of the system that doesn't improvise.

find_broken_links.py I've already described. Its companion, fix_broken_links.py, handles the cases where a broken link is a typo or a slight variation on the name of an existing note. It uses difflib for fuzzy matching, with a configurable threshold and interactive confirmation by default; --auto-apply --threshold 0.90 handles the obvious cases without asking, and --dry-run shows what would change without touching anything. validate_frontmatter.py scans every note and checks that the YAML is valid, that title is present, and that tags is a list rather than a bare string. These things drift surprisingly fast during rapid note creation, and catching them at the script level keeps them from accumulating into a future cleanup job.
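For flavor, here is roughly what those two checks reduce to. This is a hedged sketch, not the actual scripts: the function names are mine, and the frontmatter part assumes PyYAML is available. Only the difflib call and the threshold semantics come straight from the description above.

import difflib
import yaml  # assumption: PyYAML is installed

# Fuzzy-match step: propose the closest existing note name above a cutoff.
def suggest_fix(broken_target: str, note_names: set[str], threshold: float = 0.90):
    matches = difflib.get_close_matches(broken_target, note_names, n=1, cutoff=threshold)
    return matches[0] if matches else None  # None -> leave it for interactive review

# Frontmatter check: YAML parses, title exists, tags is a list rather than a string.
def frontmatter_problems(raw_note: str) -> list[str]:
    if not raw_note.startswith("---"):
        return ["missing frontmatter"]
    try:
        meta = yaml.safe_load(raw_note.split("---", 2)[1]) or {}
    except yaml.YAMLError:
        return ["invalid YAML"]
    problems = []
    if "title" not in meta:
        problems.append("missing title")
    if not isinstance(meta.get("tags"), list):
        problems.append("tags is not a list")
    return problems

A match at 0.91 clears --auto-apply --threshold 0.90; the 0.73 case doesn't, and falls back to asking.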

The principle behind the whole layer is the one above: the agent decides, the scripts verify and execute the repeatable mechanics. The agent says "I created seven notes." The script confirms whether the wiki-links between them resolve. Two different jobs, run by two different actors, with no overlap of trust.

The middle layer

Between rules (context for the agent) and scripts (mechanical operations), there is a third layer: skills. These are workflow protocols that combine agent judgment with script execution.

The vault has six. create-note creates a note with the right frontmatter, in the right PARA location, with the right tags. If something's missing it asks before guessing. fix-broken-links runs the detection script, applies the obvious fixes automatically, and surfaces the edge cases for me to decide. vault-health-check runs the full check-health suite and tells me what to fix and in what order.

The distinction I care about is this: a skill is not a prompt. A prompt runs once and the agent improvises the rest. A skill is a protocol. It's explicit about which tools to use, in which order, what to verify, and when to ask the human. The same skill runs the same way every time. That's what you want for maintenance work. Boring, repeatable reliability. Not novelty.
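If it helps to see the difference, here is the shape of the fix-broken-links protocol written out as code. The real skill is a written protocol the agent follows with its own tools, not a script, and the script paths here are assumptions; what's faithful is the ordering and the flags.

import subprocess

# Hypothetical rendering of the fix-broken-links protocol as explicit steps.
def fix_broken_links_protocol(auto_threshold: float = 0.90) -> None:
    # 1. Detect: the script, not the agent's memory, decides what's broken.
    subprocess.run(["make", "find-broken-links"], check=True)
    # 2. Obvious fixes: high-confidence fuzzy matches go through without asking.
    subprocess.run(["python", "scripts/fix_broken_links.py",
                    "--auto-apply", "--threshold", str(auto_threshold)], check=True)
    # 3. Edge cases: the remainder runs interactively, so the human decides.
    subprocess.run(["python", "scripts/fix_broken_links.py"], check=True)
    # 4. Verify: the acceptable result, again, is zero broken links.
    subprocess.run(["make", "find-broken-links"], check=True)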

The pattern: friction → script → skill → rule

None of this was designed upfront. It evolved from use, and the evolution is visible in the commit history.

Each piece started as a point of friction. I'd notice I was making the same judgment call session after session, or doing the same dull task by hand, or repeating a correction I'd already made. The first response was a script for the mechanical part. Then, when the script was being called the same way each time, a skill that wrapped the orchestration. Then, when the lesson was something the agent should always know, a line in the rules so the next session would start with it loaded.

The presentation materials for the master's program followed exactly this arc. After the third session I noticed I'd made the same mistakes twice: too many bullets per slide, explanatory content in visible slides instead of speaker notes, material that didn't fit the session dumped at the end of the main presentation. I spent an hour updating the rule file with what I'd learned. From session four onward, Claude Code applied those lessons automatically. I didn't have to remember them. The rules remembered them for me.

Every problem I solve ends up encoded in a rule, a script, or a skill. The next time the same problem turns up, the agent already knows the answer. The documentation I write once keeps working session after session, which is why this thing keeps getting more useful without me planning for it.

More tools, same scaffolding

The tools I lean on most share a shape: text in, text out, callable from a make target. That's what lets them attach to the loop without friction. The agent operates them like it edits a note. Each one adds a capability without changing how the system works.

  • Mermaid: diagrams written as .mmd files. The agent generates them, renders to PNG via mmdc, embeds the result in the note, and commits, all from a single make target. I use it for architecture sketches, process flows, the kind of thing I used to draw on a whiteboard and lose.

  • Marp: slide decks written in markdown with annotations, compiled to PDF and PowerPoint. I ask the agent to change a slide, it edits the .md, regenerates the PDF, and commits. The thing I value most is mundane: I can git diff between session 3 and session 7 of the same master's class.

  • Excalidraw: Obsidian stores its sketches as JSON inside regular .md files, so the agent edits them like any other text. Whiteboard-style diagrams that live next to the note that needs them, refined by prompt instead of by hand.

  • notebooklm-py: a client for the NotebookLM API. The agent passes a URL, a transcript, or a PDF and gets back topics, key points, and a structured summary. That's how external material lands in the vault with a consistent shape, without me writing every summary by hand.

  • yt-dlp: YouTube extraction. The vault uses it for talk-note metadata (title, channel, duration, captions), and also for downloading the video or the audio when I want a local copy of something I might lose access to.

  • markitdown: PDF and Word to markdown. It's how a paper or a slide deck someone shares ends up inside the vault as text the agent can read, summarise, and link to whatever else is already there.

The pattern repeats every time. If a tool speaks text and a Makefile target can wrap it, the agent inherits the capability. Adding a new one is a few lines of glue, not a rewrite.
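To make "a few lines of glue" concrete: a sketch of wrapping mmdc, with the function and vault layout as assumptions (the -i/-o invocation is mermaid-cli's standard one). A Makefile target would just call this.

import subprocess
from pathlib import Path

# Glue sketch: render every .mmd diagram in the vault to a PNG next to it.
def render_diagrams(vault: Path = Path(".")) -> None:
    for mmd in vault.rglob("*.mmd"):
        png = mmd.with_suffix(".png")
        subprocess.run(["mmdc", "-i", str(mmd), "-o", str(png)], check=True)
        print(f"rendered {mmd} -> {png}")

if __name__ == "__main__":
    render_diagrams()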

What the agent doesn't do

The agent doesn't decide what's worth capturing. That judgment is still entirely mine. It doesn't surface connections I haven't thought of, at least not reliably enough to trust. It doesn't write the parts of notes that actually matter: the "applications in software" section of a mental model note, the personal reflection on why a talk changed how I think about something, the synthesis between two ideas I read six months apart.

What it does is remove the friction between having an idea and that idea being properly integrated into the vault. Before: I'd capture something in the inbox, know vaguely where it should go, not quite have the energy to do all the steps correctly, leave it in limbo. Now: the skill handles the scaffolding, I make the judgment calls, the note ends up in the right place with the right tags and no broken links.

The honest version is that the system requires active maintenance. If I'm lazy about updating the rules after learning something new, the next session starts with slightly worse context. The skills are only as good as the last time I refined them. The scripts catch what they're designed to catch. They are not magic.

But the direction is right. Each round makes the agent a bit more useful in this specific vault, and each round is cheap: a few lines of rules, a Makefile target, a refined skill. Unlike a one-off prompt, none of it evaporates when the session ends.

Where this leaves me

774 notes, 15 scripts, 33 Makefile targets, 6 skills, 3 rule files. Around 100 commits in the last two months, most of them from sessions where I was building something else and the vault maintenance happened as a side effect.

The test I use: after a week off, how long does it take me to pick up where I left off? With the previous system (no scripts, no rules, ad-hoc organization), reorienting took twenty or thirty minutes of reading through recent notes to figure out what was where. Now it's opening the vault, running make check-health, skimming recent commits. Five minutes.

Back to the question this post opened with: what makes Claude Code reliable in this specific vault? The short answer is that reliability doesn't come from the model. It comes from making conventions explicit. Rules the agent reads at the start of every session. Scripts that verify what the agent thinks it did. Skills that turn a vague instruction into a written protocol. The model fills in the parts that weren't worth automating. Everything else is documented, checked, and reused.

I've built this kind of thing in software teams: explicit conventions, automated checks, tools that hold on to what the team has figured out so new members don't start from zero. The dynamics are the same here. The agent is the new team member who read the onboarding docs, runs the checks before committing, and asks when something isn't covered. The documentation just happens to live in .claude/rules/.
