I Gave My AI Assistant a Security Clearance System (And Audit Logs)
Most people who set up an AI assistant think about what it can do. I thought about what it shouldn’t do without asking first.
I run OpenClaw — a self-hosted AI agent framework that lets you connect Claude (or other models) to your real tools: email, calendar, shell commands, GitHub, Discord, cloud infrastructure. It’s powerful in the way that a loaded firearm is powerful — genuinely useful, and worth treating with some respect.
So I built a tiered authorization system into it. Three levels of trust, time-limited tokens, dual-channel confirmation, and an audit log that catches violations. It’s the kind of thing you build when you’re a security professional and you’ve just handed an AI access to your actual life.
Here’s how it works.
## The Problem With “Just Trust It”
AI assistants are useful precisely because they can take action on your behalf. That’s also what makes them dangerous. An assistant that can send emails, delete files, and run shell commands is one successful prompt injection away from doing something catastrophic.
The naive solution is to just be careful. The better solution is to build a security model.
## Three Tiers of Trust
Every action the assistant can take falls into one of three tiers:
Tier 1 — Just Do It. Low-risk, fully reversible, or read-only. Reading files, searching the web, creating calendar events, sending messages to known channels. No confirmation needed. Fast feedback loop preserved.
Tier 2 — Tell Me First. Consequential but recoverable. Sending emails, modifying cron jobs, restarting services, pushing to GitHub, writing to behavioral config files. The assistant states the exact action it’s about to take and waits for explicit confirmation from a verified sender ID. The message that triggered the check can’t count as confirmation — it has to be a new message.
Tier 3 — Token Required. Destructive or irreversible: `terraform apply`, AWS write operations, deleting Discord channels, anything that runs `rm` without a trash fallback. These actions require a time-limited, out-of-band confirmation token.
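A minimal sketch of the tier lookup, assuming a static action-to-tier map (the action names here are hypothetical; the key design choice is that unknown actions default to the most restrictive tier, not the least):

```python
from enum import IntEnum

class Tier(IntEnum):
    JUST_DO_IT = 1      # low-risk, reversible, or read-only
    TELL_ME_FIRST = 2   # consequential but recoverable
    TOKEN_REQUIRED = 3  # destructive or irreversible

# Hypothetical action names; a real deployment would enumerate its own tools.
ACTION_TIERS = {
    "read_file": Tier.JUST_DO_IT,
    "web_search": Tier.JUST_DO_IT,
    "send_email": Tier.TELL_ME_FIRST,
    "git_push": Tier.TELL_ME_FIRST,
    "terraform_apply": Tier.TOKEN_REQUIRED,
    "delete_discord_channel": Tier.TOKEN_REQUIRED,
}

def tier_for(action: str) -> Tier:
    # Fail closed: an action nobody classified is treated as Tier 3.
    return ACTION_TIERS.get(action, Tier.TOKEN_REQUIRED)
```

Failing closed matters more than the exact mapping: a new tool added without a classification should trip the heaviest gate until someone decides otherwise.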
## The Token Flow
When a Tier 3 action is requested, the assistant:
- Runs `auth-token.sh`, which generates a random token stored securely on disk with a timestamp
- Posts the token to #general in Discord with the action description and the instruction: “Reply with [TOKEN] in [originating channel]”
- Waits for a reply containing the token in the originating channel, verified against three conditions: the token matches, the sender ID is the authorized user, and the reply arrives within a 2-minute window
If any condition fails — wrong token, wrong sender, expired — the action is aborted and the attempt is logged.
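The three checks can be sketched in a few lines. This is a Python analogue of the flow described above, not the actual `auth-token.sh` implementation; the function and field names are illustrative:

```python
import hmac
import secrets
import time

TOKEN_TTL_SECONDS = 120  # the 2-minute window

def issue_token() -> dict:
    """Generate a random token and record when it was issued."""
    return {"token": secrets.token_hex(16), "issued_at": time.time()}

def verify_reply(pending: dict, reply_token: str, sender_id: str,
                 authorized_id: str, now=None) -> bool:
    """All three conditions must hold, or the action is aborted."""
    now = time.time() if now is None else now
    return (
        hmac.compare_digest(pending["token"], reply_token)   # 1. token matches
        and sender_id == authorized_id                       # 2. verified sender
        and now - pending["issued_at"] <= TOKEN_TTL_SECONDS  # 3. within the window
    )
```

`hmac.compare_digest` is a constant-time comparison; probably unnecessary against a human typo, but it costs nothing and avoids ever leaking token prefixes through timing.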
The dual-channel requirement matters. Posting the token to one channel and requiring confirmation in another means that a compromised channel (say, a fetched web page that injects instructions into the assistant’s context) can’t self-authorize. The human has to actively move between channels, which is a strong signal of intentionality.
## Prompt Injection Defense
The token system protects against the worst cases, but there’s a whole class of lower-stakes injection attacks worth defending against too.
Any time the assistant fetches external content — web pages, emails, API responses — that content is treated as untrusted. Instructions embedded in fetched content are flagged and quarantined. Attempts to override behavior, claim elevated permissions, or reference internal file paths trigger alerts to a dedicated #guardrails channel and get logged.
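A toy version of the quarantine check, assuming a small set of hypothetical regex patterns (a real scanner would be broader and tuned over time, but the shape is the same: scan untrusted text, collect hits, alert on anything found):

```python
import re

# Illustrative patterns only; these three cover the cases named above:
# behavior overrides, claimed elevated permissions, internal file paths.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you (are|have) now .*(admin|developer|elevated)", re.I),
    re.compile(r"AGENTS\.md|SOUL\.md|MEMORY\.md", re.I),
]

def scan_untrusted(text: str) -> list:
    """Return suspicious fragments found in fetched external content."""
    return [m.group(0) for p in SUSPICIOUS_PATTERNS for m in p.finditer(text)]
```

Anything returned by the scan would be stripped from the working context and forwarded to the #guardrails channel instead of being acted on.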
The assistant also maintains an outbound audit log. Every web_fetch, web_search, and external API call gets logged with domain and context. Not because I expect to review it constantly — but because the existence of the log changes the threat model.
## Behavioral Files and the Compaction Problem
The assistant’s “soul” lives in a set of files: AGENTS.md, SOUL.md, MEMORY.md, and daily logs. Writes to these files are Tier 2 actions when triggered by external content, but Tier 1 for self-initiated memory updates.
There’s a subtlety here: AI context windows get compacted. When that happens, the full AGENTS.md security rules need to survive. The solution is to note explicitly at the top of AGENTS.md that it’s loaded fresh from disk at session start — not reconstructed from compaction summaries. The security rules persist regardless of context length.
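One way to make that rule survive in the file itself is a header comment at the top of AGENTS.md. This is a hypothetical fragment illustrating the idea, not quoted from the actual file:

```markdown
<!-- SECURITY: This file is loaded fresh from disk at every session start.
     Never reconstruct its rules from compaction summaries. If the file
     cannot be read directly, halt and ask the operator before acting. -->
```

Putting the instruction in the artifact it protects means any agent that can read the rules at all also reads the rule about how the rules are loaded.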
## Is This Overkill?
Depends on your threat model.
If your assistant has access to your email, can execute shell commands, and you use it for work involving client data — no, it’s not overkill. It’s the minimum viable security posture for an autonomous agent with real-world consequences.
The nice thing about tiered authorization is that it doesn’t get in the way of the 95% of tasks that are low-risk. Tier 1 is frictionless by design. The friction only appears when the stakes are actually high.
Which is exactly how a good security system should work.
If you want to implement something similar, I’ve documented the full approach in my OpenClaw Security Guide on GitHub — covering everything from the tier system to prompt injection defense to the token flow implementation.
Cyberforks builds security workflows that don’t suck. cyberforks.com