Let it act — with your approval
Let it do something — but hold the gate
So far the agent has only talked. Now it acts — but on a leash. This is level 3 of the ladder: 'tell me what you intend to do, but don't do it unless I say yes.' The agent prepares; you approve before anything lands.
- 1 Turn on approval gating (e.g. write_approval: true, and /skills approval on in chat) and restart.
- 2 Ask Hermes to take a real but reversible action: draft a message, create a file, or prepare an edit.
- 3 When it proposes the action, review exactly what it wants to do before approving.
- 4 Approve one action you are happy with — watch it execute only after your yes.
- 5 Now ask for a second action and reject it. Confirm nothing happened.
One action ran only after you approved it, and one you rejected did not run at all. You held the gate.
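The behaviour you just observed can be sketched in a few lines of Python. This is a minimal illustration of the gating idea, not how Hermes implements it: `ProposedAction`, `run_with_gate`, and `ask_in_terminal` are hypothetical names invented for this sketch. The point it shows is that the side effect is wrapped and deferred, so a "no" means the action is never called at all.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """Hypothetical container: a description plus a deferred side effect."""
    description: str             # what the agent says it intends to do
    execute: Callable[[], None]  # the actual change, not run until approved


def run_with_gate(action: ProposedAction, approve: Callable[[str], bool]) -> bool:
    """Run the action only if the approver says yes; return whether it ran."""
    if approve(action.description):
        action.execute()
        return True
    # Rejected: the side effect is never invoked.
    return False


def ask_in_terminal(description: str) -> bool:
    """Example approver: anything other than an explicit 'y' counts as no."""
    answer = input(f"Agent proposes: {description}. Approve? [y/N] ")
    return answer.strip().lower() == "y"
```

Calling `run_with_gate(action, ask_in_terminal)` would prompt before acting. Defaulting to "no" on any unclear answer mirrors the rule of this level: nothing lands without an explicit yes.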
The gate is where trust is built
Level 3 feels slower, and that is exactly the point — it is the rung where you learn whether this agent's judgement is good enough to trust further.
Level 3 — conditional action — is the hinge of the whole ladder. Below it the agent only informs; at and above it the agent changes things in the world. The approval gate (write_approval, skills approval) is what makes that safe: nothing irreversible happens without an explicit yes. Two questions decide whether a given action is even allowed to be gated rather than forbidden: is it reversible, and can you verify it was done right? Watching what the agent *proposes* over many approvals is how you calibrate trust — good proposals earn it a path to level 4; bad ones tell you to keep the gate shut.
- ? Why is level 3 the dividing line of the ladder? What changes between level 2 and level 3?
- ? For a given action, how do 'reversible?' and 'verifiable?' decide whether you gate it or simply forbid it?
- ? After ten good proposals, what would actually justify moving an action up to level 4 (act-unless-I-say-no)?
- ? What kind of action should never be merely gated, no matter how good the agent's track record?
Define your gate
A gate you keep in your head is a gate you will forget. You will write down which actions need approval and which do not.
Produce a short, explicit policy: which categories of action Hermes may do freely, which require your approval, and which it must never do — and configure the agent to match.
- 1 List the kinds of actions you would actually want Hermes to take in your work.
- 2 Sort each into: free (do it), gated (ask me first), forbidden (never).
- 3 Justify each placement with the two tests: reversible? verifiable?
- 4 Configure approval so the 'gated' category really does prompt you.
- 5 Test one action from each of the three buckets and confirm the behaviour matches your policy.
Your free / gated / forbidden policy (with the reversible+verifiable reasoning) and a test showing the gated category actually prompts for approval.
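One way to keep the policy out of your head is to write it down as data. The sketch below is only an illustration under stated assumptions: the category names (`read_file`, `draft_message`, `send_payment`) and their placements are made-up examples, `Bucket`, `Rule`, `POLICY`, and `bucket_for` are hypothetical names, and none of this is Hermes configuration. Each rule records the bucket together with your answers to the two tests, so the reasoning travels with the decision.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    FREE = "free"            # do it
    GATED = "gated"          # ask me first
    FORBIDDEN = "forbidden"  # never


@dataclass(frozen=True)
class Rule:
    bucket: Bucket
    reversible: bool  # can the action be undone?
    verifiable: bool  # can you check it was done right?


# Illustrative placements only; replace with the categories from your own work.
POLICY = {
    "read_file": Rule(Bucket.FREE, reversible=True, verifiable=True),
    "draft_message": Rule(Bucket.GATED, reversible=True, verifiable=True),
    "send_payment": Rule(Bucket.FORBIDDEN, reversible=False, verifiable=False),
}


def bucket_for(category: str) -> Bucket:
    """Look up a category; anything not written down is gated (an assumption)."""
    rule = POLICY.get(category)
    return rule.bucket if rule else Bucket.GATED
```

Treating unlisted categories as gated is a design choice made for this sketch: an action you forgot to classify then prompts you instead of running freely.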
✎ Approving every action is safe but exhausting; approving nothing means the agent is useless. Where, for your work, is the gate worth keeping closed, and where is holding it actually costing you more than the risk?