Guardrails for AI Coding Assistants

I use AI coding assistants for most of my development work. They write code, run commands, manage git, and interact with files across my projects. That’s a lot of access. And with that access comes the potential to do real damage.

These are the guardrails I’ve added over time, each one prompted by something that almost went wrong.

Block .env files

This was the first guardrail I added. During an early session, the assistant tried to read a .env file to understand the database configuration. It didn’t do anything malicious, but the file contents, including API keys and database credentials, ended up in the conversation context.

The fix is a pre-tool hook that blocks any attempt to read .env files, whether through the read tool or shell commands like cat, head, or less:

{
  "PreToolUse": [
    {
      "matcher": "Read",
      "hooks": [
        {
          "type": "command",
          "command": "if echo \"$TOOL_INPUT\" | grep -qiE '\\.env'; then echo 'BLOCKED: Reading .env files is forbidden.' >&2; exit 2; fi"
        }
      ]
    }
  ]
}

There’s a matching hook for Bash commands that try to access .env files through shell tools. The assistant gets a clear error message and moves on without the secrets.
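The Bash-side check is the same idea applied to the command string instead of the file path. A minimal sketch of that guard as a standalone function (the function name and wiring are illustrative, not the real hook):

```shell
#!/usr/bin/env bash
# Sketch: block any shell command that mentions a .env file, whether it goes
# through cat, head, less, grep, or a redirection.
block_env_access() {
  local cmd="$1"
  # '\.env' matches a literal dot, so it also catches .env.local, .env.production, etc.
  if echo "$cmd" | grep -qiE '\.env'; then
    echo 'BLOCKED: Accessing .env files is forbidden.' >&2
    return 2
  fi
  return 0
}
```

Exit code 2 mirrors the Read hook above: a non-zero exit tells the assistant the action was denied, and the stderr message tells it why.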

Worktree guards

This one came from a session where I had a git worktree set up for a feature branch, but the assistant edited files in the main repo instead. The changes were small and easy to undo, but it could have been worse.

The worktree guard is a shell script that fires on every write, edit, and destructive git command. It checks whether the current session owns a worktree, and if it does, blocks any file mutations outside of it:

  • Writing or editing a file in the main repo? Blocked. The error message shows the equivalent path inside the worktree.
  • Running git commit or git push without targeting the worktree? Blocked. The error suggests the correct command with cd or git -C.
  • Reading files in the main repo? Allowed. Read-only git commands? Allowed. The guard only blocks mutations.

The script walks up from the current directory to find the repo root, checks for an active worktree owned by the current session, and compares paths. It’s about 130 lines of bash, but the logic is straightforward.
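The heart of that script is a plain path-containment check. A reduced sketch, with the worktree path passed in as an argument (the real script resolves it from session state):

```shell
#!/usr/bin/env bash
# Sketch of the containment check at the core of the worktree guard.
path_inside() {
  local target="$1" root="$2"
  # Requiring a slash after $root avoids false matches like /repo/wt2
  # when the worktree is /repo/wt.
  case "$target" in
    "$root"/*|"$root") return 0 ;;
    *) return 1 ;;
  esac
}

guard_write() {
  local file="$1" worktree="$2"
  if ! path_inside "$file" "$worktree"; then
    echo "BLOCKED: write outside worktree. Did you mean $worktree/${file##*/}?" >&2
    return 2
  fi
}
```

The full script adds the repo-root walk, the session-ownership lookup, and the read-only exemptions around this core.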

Scoped permissions per project

Not every project needs the same level of access. My main work project allows database migrations, specific CLI tools, and git operations on certain branches. A side project might only need basic file operations and npm commands.

This is handled through project-level settings.local.json files that define which tools and commands are pre-approved:

{
  "permissions": {
    "allow": [
      "Bash(git:*)",
      "Bash(npm:*)",
      "Bash(npx:*)"
    ]
  }
}

Anything not on the list requires explicit approval. This means the assistant can run tests and commit code without asking, but if it tries to run a database migration or deploy, I get prompted first.

The key is matching the permissions to what the project actually needs. A blog doesn’t need database access. A Laravel app doesn’t need npm. Keeping the allow list tight means fewer surprises.

Execution plan enforcement

Before the assistant starts implementing anything non-trivial, I want a written plan. Not a conversation summary, but a structured markdown file with a clear approach, a list of the files that will change, and the expected outcome.

A pre-tool hook validates that execution plans follow a specific structure and land in the right directory. If the assistant tries to write a plan that doesn’t follow the format, it gets blocked with a message explaining the expected structure.
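A stripped-down version of that validation might look like this. The directory name and section headings here are illustrative stand-ins, not my actual template:

```shell
#!/usr/bin/env bash
# Sketch: check that an execution plan lands in the plans directory and
# contains the expected sections. Headings are hypothetical examples.
validate_plan() {
  local path="$1" content="$2"
  case "$path" in
    plans/*.md) ;;  # plans must live under plans/
    *) echo "BLOCKED: execution plans belong in plans/" >&2; return 2 ;;
  esac
  local heading
  for heading in '## Approach' '## Files' '## Expected outcome'; do
    if ! printf '%s\n' "$content" | grep -qF "$heading"; then
      echo "BLOCKED: plan is missing the '$heading' section" >&2
      return 2
    fi
  done
}
```

Because the error message names the missing section, the assistant's next attempt usually conforms without further prompting.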

This sounds bureaucratic, but it’s saved me multiple times. When the assistant has a plan, it stays focused. Without one, it tends to wander, making changes that seem reasonable in isolation but don’t add up to a coherent solution.

Blind PR feedback loops

I have a skill that pulls new feedback on a pull request and applies the changes. It works well when I review every change before pushing. The problem is when I get lazy.

If I stop reviewing and just let the assistant address feedback automatically, it starts a cycle. A reviewer leaves a comment, the assistant applies it, pushes, the reviewer responds to the change, the assistant addresses that too. Some of that feedback should have been pushed back on. Not every comment warrants a code change. But the assistant doesn’t know that. It treats every piece of feedback as something to fix.

The fix wasn’t technical. It was discipline. The skill now always generates an execution plan that I review before any code gets written. I read each piece of feedback and decide whether it’s worth addressing or whether I need to reply to the reviewer instead. The assistant never addresses PR feedback without me in the loop.

CI pipeline debugging

I have a similar skill for CI failures. It pulls the build log from Buildkite, analyzes the failure, and attempts to fix it. The problem is that CI artifacts can be huge. When the assistant tries to process a large log, it sometimes latches onto the wrong failure, fixes something unrelated, pushes, and the build fails again for the same reason. Then it tries again. And again.

The guardrail here is the same: never auto-accept. These debugging sessions always require me to approve every change. I read the assistant’s analysis, confirm it’s looking at the right failure, and only then let it write a fix. It’s slower, but it stops the loops.

The human in the loop

The .env hook and the worktree guard are technical guardrails. Scripts that block bad actions automatically. But the PR feedback and CI debugging problems taught me something different: the most important guardrail is staying involved.

The assistant is good at doing what you tell it to do. The risk isn’t that it goes rogue. The risk is that you stop paying attention and let it do things that seem reasonable but aren’t. Every time I’ve had a problem, it was because I took myself out of the decision loop.

What I’ve learned

Guardrails should be invisible when things are going well. The best hooks are the ones I forget about. They only show up when something is about to go wrong. If a guardrail is firing constantly and slowing down the workflow, it’s either too aggressive or solving the wrong problem.

Start with the things that can’t be undone. Secrets in a conversation context can’t be taken back. A force push to main can’t be easily recovered. Focus guardrails on irreversible actions first. Everything else can be fixed with a git checkout.

The assistant adapts. After hitting a guardrail a few times, the assistant learns the pattern. It starts using worktree paths by default. It stops trying to read .env files. The guardrails are training it within the session, not just blocking it.

Keep your guardrails in version control. My hooks and rules live in a dotfiles repo that’s symlinked to ~/.claude/. When I set up a new machine or onboard a new project, the same guardrails are in place from the first session.


If you’re giving an AI assistant access to your codebase, think about what could go wrong and add guardrails before it does. A few shell scripts and config files are a small investment for peace of mind.

If you’re interested in the broader workflow, I wrote about using structured notes to maintain context across AI sessions and using a Q&A decision tree before writing code.