Claude Rules & Skills

Mechanics of CLAUDE.md and Agent Skills

I have spent rather a lot of time recently formalising the boundaries of my local development environment. The corporate narrative surrounding AI coding assistants tends to frame them as tireless junior developers. The reality, as is often the case, is somewhat more complicated. You are integrating a highly capable but fundamentally unaccountable system into your workflow. Without rigorous mechanical constraints, the complexity of the output will simply overwhelm you.

As Simon Willison correctly observes, your fundamental job as an engineer remains unchanged: you must deliver code you have proven to work. The human provides the accountability; the machine merely scales the output. To manage this friction, I have built a strict feedback loop using a custom CLAUDE.md file and a targeted SKILL.md.

Here is a breakdown of how these mechanics actually function in practice:

  • The TDD Constraint: Willison notes that instructing an agent to "Use red/green TDD" is a highly effective way to constrain its behaviour. My CLAUDE.md explicitly forces the model into Kent Beck's Red-Green-Refactor cycle. It must write the simplest failing test first, and only then write the minimum code to pass it. Skipping this "red" phase risks the model hallucinating a test that already passes, rendering the exercise pointless.
  • Structural vs. Behavioural: I mandate strict adherence to "Tidy First" principles. The agent is explicitly forbidden from mixing structural and behavioural changes in a single commit. This ensures the system remains readable and the intent clear. See Kent Beck's system prompt example.
  • Accountability in Testing: Code must be proven. For my Java backend, this means relying heavily on JUnit 5 and Mockito. When automated tests fail, providing the agent with detailed error messages is an inexpensive and vital way to give it necessary context, a practice Willison also champions. Crucially, my standing orders dictate that the agent is never allowed to commit automatically; it must wait for my explicit confirmation.
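These standing orders translate quite directly into the rules file itself. The excerpt below is an illustrative sketch of what such a CLAUDE.md might contain, not my verbatim file:

```markdown
# CLAUDE.md (illustrative excerpt)

## Workflow rules
- Follow strict red/green TDD: write the simplest failing test first,
  watch it fail, and only then write the minimum code to make it pass.
- Never mix structural and behavioural changes in a single commit
  ("Tidy First"). Structural commits must leave all tests green.
- When a test fails, read the full error output before changing code.
- Never commit automatically. Stop and wait for explicit confirmation.
```

The point is not the exact wording but that each rule is a mechanical constraint the model can be held to, rather than a vague aspiration.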

The Context Trap and Agent Skills

There is a mechanical limit to all this: the context window. Mert Köseoğlu has demonstrated that dumping raw outputs—like full access logs or GitHub issue lists—into an LLM can burn through 40% of a 200K context window in roughly 30 minutes. The system simply becomes too heavy to function.

Anthropic’s "Agent Skills" offer a rather elegant solution to this via progressive disclosure. A skill is essentially a markdown directory with a tiny bit of YAML frontmatter. The agent pre-loads the metadata, but only reads the full instructions into context if it deems the skill relevant to the specific task.

I wrote a specific SKILL.md to handle the mechanics of fetching and merging the develop branch into feature branches. Rather than hoping the LLM intuits the correct Git workflow for my specific stack, the skill explicitly defines the boundaries:

  • Safety Checks: It checks for uncommitted changes and explicitly prevents execution on shared branches like main or develop.
  • Merge and Abort: If it detects a merge conflict, it is instructed to immediately run git merge --abort and halt. It leaves the messy friction of conflict resolution to the human.
  • Targeted Testing: It dynamically targets tests based on the layers affected, running Angular suites when UI code has changed and Java unit tests when server-side code has changed, and so on.
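The guard rails in the first two bullets can be sketched as plain shell. This is a hypothetical reconstruction of what the skill instructs the agent to run, assuming the trunk branch is called `develop` and the remote is `origin`:

```shell
# Sketch of the merge skill's safety checks (branch and remote names assumed).
merge_develop() {
  current=$(git rev-parse --abbrev-ref HEAD) || return 1

  # Safety check: never operate on shared branches.
  case "$current" in
    main|develop)
      echo "refusing to run on shared branch: $current" >&2
      return 1 ;;
  esac

  # Safety check: require a clean working tree before touching anything.
  if ! git diff --quiet || ! git diff --cached --quiet; then
    echo "uncommitted changes present; halting" >&2
    return 1
  fi

  # Fetch and merge; on conflict, abort immediately and leave
  # resolution to the human.
  git fetch origin develop || return 1
  if ! git merge origin/develop; then
    git merge --abort
    echo "merge conflict detected: merge aborted, resolve by hand" >&2
    return 1
  fi
}
```

The deliberate design choice is the hard stop on conflict: the agent never attempts resolution, it only reports.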

The Generation Trap

There is a subtle trap when creating these skills. As Sean Goedecke points out, asking an LLM to author a skill document before it has solved a problem is a mistake; it will simply bake in its own incorrect assumptions. The correct approach is to force the model to solve the problem the hard way, iterating through failures, and then ask it to distil that hard-won knowledge into a reusable skill.

Ultimately, AI assistance is a bit of a mixed bag. It is a powerful mechanism for scaling, but it requires explicit rules of engagement, carefully managed context, and a healthy degree of skepticism regarding its autonomy.
