AI coding agents are incredible. They can scaffold entire services, write tests, refactor code, and do it all in minutes. But if you’ve used them on anything beyond a single throwaway script, you’ve probably noticed something: they have opinions, and those opinions change with every conversation. What was a clean repository pattern yesterday becomes an inline SQL query today. The logger you’ve been passing through context? The agent just decided to instantiate a new one in every function.

This is the story of how I went from constantly wrestling AI agents into submission, to building a system where they consistently produce code that matches my standards, across multiple repositories, at lower cost, and with enough confidence to run several agents in parallel.

The Backdrop

I run a hobby project called Ultimate Setup Hub, a platform for sim racing enthusiasts. The backend is a collection of Go microservices: an auth API, a content API, a setups API, a strategy API, and a few more. There’s a Next.js frontend sitting on top of it all.

The project serves a dual purpose. Beyond the product itself, it’s my “don’t get rusty” playground. At the time I started the project, my day job had temporarily drifted away from hands-on development, and I needed something to keep me sharp. I’ve always taken the approach of product first, polish later: ship the feature, worry about perfection when there’s time.

That philosophy, combined with the natural pace of a hobby project, introduced what I’ll call organic drift. One API would get input validation while another wouldn’t. Logging started as explicit logger passing in function calls, then evolved to context-based logging in newer services, but the older ones never got updated. Each repository was slightly different, and I was fine with that. The product was moving forward.

Then I started using AI agents.

The Drift Amplifier

At first, AI was a productivity booster. I’d spin up a new API for some functionality and let the agent scaffold the whole thing. Need to add caching to an existing service? Let the agent handle it. It was fast, it was convenient, and the output was… mostly fine.

Mostly.

The problem crept in gradually. The agent would look at one repository and adopt its patterns. Then it would work on another repository and adopt different patterns, or worse, invent entirely new ones. My organic drift wasn’t just persisting, it was being actively amplified. The AI would see that Repository A didn’t have validation and conclude that validation wasn’t part of the project’s conventions. It would encounter the older logger-passing pattern and dutifully replicate it, even though I’d moved on from that approach months ago.

I found myself spending more and more time in each prompt explaining how things should be done. Use this library, not that one. Follow this architecture. Put DTOs here, not there. Don’t use an ORM, we use raw SQL with pgx. And of course, I’d inevitably forget something, and the agent would fill the gap with whatever it thought was best, which was almost never what I wanted.

The Documentation-First Detour

A friend introduced me to spec-kit around this time — a tool for defining changes through structured documentation before writing any code. The idea resonated with me: if I could catch drift at the specification stage, I wouldn’t have to fix it in code.

I started using spec-kit, or mimicking its approach with prompts, to have the agent write specifications and task breakdowns first. I’d review the documentation, catch potential issues, and only then let the agent implement. This helped. I was catching more problems earlier.

But I quickly realised I’d just relocated the problem. Instead of going back and forth with the agent during coding, I was going back and forth during specification. The agent would still propose approaches that didn’t match my conventions, I’d correct them in the docs, miss a few, and those would slip through into the implementation. The drift wasn’t gone, it had just moved upstream.

The Constitution

This is when I had the key insight: the agent doesn’t know what it doesn’t know, and neither do I, at prompt time. No matter how detailed my prompts were, I’d always forget to mention something. The agent needed access to a comprehensive, authoritative source of truth about how code should be written across all my repositories.

So I documented everything.

I created what I call a code constitution, a version-controlled set of documents that define, in detail, how my Go backend services are built. It covers:

  • Core values: domain independence, explicit contracts, infrastructure interchangeability, fail-fast configuration, observable behavior
  • Architecture: three-layer clean architecture with strict dependency flow rules
  • Technology decisions: which libraries to use, which are forbidden, and why
  • Patterns: repository pattern, DTO contracts, dependency injection with Wire, context-aware logging, error handling conventions
  • Anti-patterns: an explicit list of things that must never appear in the codebase

I organized the repositories into categories (starting with “Go backend services”) and created an implementation guide with concrete code examples for every pattern: twelve numbered examples covering everything from layered architecture to testing conventions.

The constitution lives in its own repository and is structured like this:

bitval-constitution/
├── go-backend-services/
│   ├── constitution.md          # Core values and principles
│   ├── IMPLEMENTATION-GUIDE.md  # Concrete patterns with code
│   ├── libraries.md             # Approved/forbidden dependencies
│   └── examples/
│       ├── 01-layered-architecture/
│       ├── 02-repository-pattern/
│       ├── 03-dtos/
│       ├── ...
│       └── 12-configuration/
├── mcp-server/                  # MCP server to serve rules to agents
└── commands/                    # Flow command definitions

Making It Available: The MCP Server

Having a constitution is great. Getting it into the agent’s context is the real challenge.

My first approach was to integrate the full constitution through spec-kit commands. I built an MCP (Model Context Protocol) server that could serve the constitution documents to AI agents on demand. I added custom spec-kit commands that would automatically load the relevant rules, implementation guides, and examples whenever work was carried out on a repository.

This worked remarkably well. The quality of generated code improved significantly, and consistency across repositories jumped. The agent wasn’t guessing anymore, it had a reference.

For a brief, glorious moment, I was happy.

Then I looked at my token consumption.

The Token Problem

Loading the full constitution, the implementation guide, the library documentation, and relevant examples into context for every interaction was expensive. Even small changes would burn through vast amounts of tokens and eat into subscription limits. And despite the quality improvement, the process was still time-consuming, the agent had to process all that context before doing anything useful.

I needed a way to give the agent the essential rules without the encyclopedic overhead.

So I created a compacted rule file, roughly 90 lines of pure, machine-optimised do’s and don’ts. No explanations, no rationale, no prose. Just the rules:

## Architecture
- Three layers: `application/`, `domain/`, `infra/`
- Dependencies flow inward only: infra -> domain <- application
- Domain NEVER imports infra or application

## Data Contracts
- HTTP APIs MUST use DTOs, never expose domain entities directly
- DTO files: `{entity}_request.go`, `{entity}_response.go`, `mappers.go`

## Error Handling
- Wrap at every layer with context: `return fmt.Errorf("operation: %w", err)`
- Log errors ONLY in controllers, never in domain/repository layers
- Controllers: map domain errors to response sentinels via `c.Error()`

## Anti-Patterns
- NEVER expose domain entities in HTTP responses, use DTOs
- NEVER log in domain/repository layers
- NEVER use manual dependency wiring, use Wire
- NEVER call `c.JSON` for error responses, use `c.Error(response.ErrXxx)`

## Core Stack
- HTTP: gin | Logging: zerolog | DB: pgx/v5 | DI: Wire
- FORBIDDEN: ORMs (gorm, ent), reflection loggers (logrus), runtime DI (dig)

This is a trimmed excerpt, the real file covers architecture, data contracts, repository patterns, dependency injection, error handling, logging, configuration, migrations, testing, naming conventions, anti-patterns, and the full approved tech stack. All in about 90 lines.

The full constitution and implementation guide still exist, they’re the authoritative source. But for day-to-day AI-assisted development, the compacted rules are what gets loaded. The MCP server serves them as a resource, and each repository’s CLAUDE.md tells the agent to fetch them:

## Coding Standards

This project follows the **go-backend-services** organisation constitution.
Before implementing code changes, read the compact rules:
`ReadMcpResourceTool(server: "constitution",
    uri: "constitution://go-backend-services/compact-rules")`

After implementing any code changes, you MUST run
`mcp__constitution__validate_code` on every new or modified Go file.

The MCP server also provides tools for automated validation (checking for forbidden imports, logging in the wrong layers, domain entities in HTTP responses) and an amendment system for proposing rule updates when new patterns emerge.

The Flow: Modular, Independent, Optional

With the constitution serving as the agent’s guardrails, I needed a workflow that gave me control over each step without coupling them together. I built four independent commands:

flowchart LR
    A["/flow.issue"] --> B["/flow.plan"]
    B --> C["/flow.implement"]
    C --> D["Manual Review"]
    D --> E["/flow.review"]
    E -->|"feedback loop"| F["Update Rules"]
    F -.->|"future agents"| C

    style A fill:#4a9eff,color:#fff
    style B fill:#4a9eff,color:#fff
    style C fill:#4a9eff,color:#fff
    style D fill:#f5a623,color:#fff
    style E fill:#4a9eff,color:#fff
    style F fill:#50c878,color:#fff

/flow.issue: I describe a problem or feature, and the agent analyses the codebase, asks clarifying questions, and creates a well-structured GitLab issue with acceptance criteria and technical scope. No code, no assumptions, just a clear definition of work.

/flow.plan: takes a GitLab issue URL, reads the codebase, and produces a detailed implementation plan: files to change, implementation steps, migration SQL, testing strategy. The compact rules are already in context via CLAUDE.md, so the plan respects my architecture and conventions from the start.

/flow.implement: executes the plan step by step. Creates a branch, implements changes, runs tests, validates every modified file against the constitution. If anything is unclear, it stops and asks rather than guessing. Leaves changes uncommitted for my review.

/flow.review: the most critical piece. This runs a comprehensive code review that goes beyond style and correctness:

  • Checks every acceptance criterion from the GitLab issue
  • Validates against the constitution rules
  • Evaluates test quality (not just “do tests pass” but “are these tests meaningful”)
  • Detects security issues, performance problems, and scope creep
  • Detects constitution gaps, if it finds a pattern violation that the rules should have prevented but didn’t, it proposes an amendment

After the review, it commits, pushes, and creates a merge request on GitLab with the full review report attached as a comment.

The critical thing is that every step is independent and optional. I can use /flow.plan without /flow.issue. I can implement manually and still use /flow.review. I can skip the review entirely for trivial changes. Each command stands on its own.

The Feedback Loop

The review step deserves special attention because it’s what keeps the whole system alive.

flowchart TD
    A["Agent produces code"] --> B["Review detects pattern"]
    B --> C{"Pattern in rules?"}
    C -->|"Yes, violated"| D["Flag as violation"]
    C -->|"No, new pattern"| E["Propose amendment"]
    E --> F["Consult developer"]
    F -->|"Adopt"| G["Update constitution +\ncompact rules"]
    F -->|"Reject"| H["Add to anti-patterns"]
    G --> I["Future agents use\nupdated rules"]
    H --> I

    style A fill:#4a9eff,color:#fff
    style E fill:#f5a623,color:#fff
    style F fill:#f5a623,color:#fff
    style G fill:#50c878,color:#fff
    style H fill:#e74c3c,color:#fff
    style I fill:#50c878,color:#fff

When the review agent encounters something new, a pattern that isn’t covered by the existing rules, it doesn’t just flag it and move on. It proposes an amendment to the constitution, but only after consulting me. I get to decide: is this a good pattern that should be adopted into the guidelines, or is this something that should be explicitly forbidden?

Either way, the rules get updated. The compacted rule file gets regenerated. And from that moment on, every agent working on every repository has the updated knowledge. The next time an agent encounters a similar situation, it already knows what to do.

This is the part that made everything click. The constitution isn’t a static document, it’s a living system that evolves with the codebase. Every review is an opportunity to improve the rules, and every improvement immediately benefits all future work across all repositories.

The Evolution at a Glance

flowchart TD
    A["Naive AI Usage\nManual prompts, constant corrections"] --> B["Prompt Engineering\nLonger prompts, still missing things"]
    B --> C["Documentation-First\nspec-kit: catch drift early"]
    C --> D["Full Constitution\nComprehensive rules + MCP server"]
    D --> E["Compacted Rules\n~90 lines, token-efficient"]
    E --> F["Modular Flow Commands\nIndependent steps + feedback loop"]

    A -.->|"😤 High drift, low confidence"| G[" "]
    F -.->|"✅ Low drift, high confidence"| H[" "]

    style A fill:#e74c3c,color:#fff
    style B fill:#e67e22,color:#fff
    style C fill:#f5a623,color:#fff
    style D fill:#3498db,color:#fff
    style E fill:#2ecc71,color:#fff
    style F fill:#27ae60,color:#fff
    style G fill:transparent,stroke:none,color:#e74c3c
    style H fill:transparent,stroke:none,color:#27ae60

Each stage solved a real problem but introduced a new one. Prompt engineering reduced drift but wasn’t scalable. Documentation-first caught issues earlier but moved the back-and-forth upstream. The full constitution was comprehensive but token-hungry. Compacted rules were efficient but static. The flow commands with a feedback loop finally closed the cycle.

Running Agents in Parallel

Here’s an unexpected benefit of this whole setup: confidence scales.

When I was doing everything through manual prompts, I couldn’t realistically run multiple agents at once. Each one needed babysitting, constant course corrections, catching drift, reviewing every few minutes. The cognitive overhead was enormous.

With the constitution and flow commands in place, I can spin up agents on different repositories, working on different features, and check in when they notify me they’re done. The compact rules ensure they all follow the same conventions. The validation step catches violations before I even look at the code. The review step gives me a structured report rather than a wall of diffs to parse.

It’s not perfect, far from it. I still do manual reviews. I still catch things the system misses. But the baseline quality is high enough that I can actually trust the output and focus my attention on the parts that matter: the business logic, the edge cases, the architectural decisions that require human judgment.

What I Learned

AI agents are force multipliers, including for your bad habits. If your repositories have inconsistencies, AI will find them and replicate them. If your conventions aren’t documented, AI will invent its own. The agent isn’t the problem, the lack of explicit, accessible standards is.

Documentation for AI is different from documentation for humans. The full constitution with rationale and examples is valuable for understanding why things are done a certain way. But the agent doesn’t need the why, it needs the what. The compacted rule file exists specifically because AI agents work better with concise, unambiguous instructions than with narrative documentation.

The feedback loop is everything. A static set of rules will always drift out of date. The review-and-amend cycle is what transforms the constitution from a document into a living system. Every new pattern, every edge case, every “hmm, the rules don’t cover this” moment becomes an opportunity to make every future interaction better.

Modular beats monolithic. Coupling issue creation, planning, implementation, and review into a single flow is tempting but fragile. Making each step independent means I can use what I need, skip what I don’t, and maintain control over the process without being locked into a rigid pipeline.

Token efficiency matters more than you think. The difference between loading a full constitution into context and loading a 90-line compacted version isn’t just cost, it’s the difference between being able to iterate quickly on small changes and having to ration your AI usage to only the big tasks.

Where It Stands

The system isn’t done, it probably never will be. The constitution currently covers Go backend services, and I need to extend it to the frontend and infrastructure layers. The amendment process could be more automated. Some of the flow commands could be smarter about context loading.

But the drift problem? That’s essentially solved. When I look at code produced by an agent today versus six months ago, the difference is stark. Every service follows the same architecture. Every error is handled the same way. Every repository uses the same libraries, the same patterns, the same conventions. And when something new comes along, the system adapts.

If you’re using AI agents across multiple repositories and finding yourself constantly correcting the same issues, consider this: the agent will do exactly what you tell it to do. The trick is making sure it has access to everything it needs to know, not in every prompt, but in a system that persists, evolves, and scales across every conversation.

Give your agent a constitution. You might be surprised how well it follows the rules.