Let it act — with your approval

USE 0 - 20 min

Let it do something — but hold the gate

So far the agent has only talked. Now it acts — but on a leash. This is level 3 of the ladder: 'tell me what you intend to do, but don't do it unless I say yes.' The agent prepares; you approve before anything lands.

1 Turn on approval gating (e.g. the setting write_approval: true and the chat command /skills approval on), then restart the agent.
2 Ask Hermes to take a real but reversible action: draft a message, create a file, or prepare an edit.
3 When it proposes the action, review exactly what it wants to do before approving.
4 Approve one action you are happy with — watch it execute only after your yes.
5 Now ask for a second action and reject it. Confirm nothing happened.

✓ One action ran only after you approved it, and one you rejected did not run at all. You held the gate.
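
The behaviour you just checked can be sketched in a few lines. This is a minimal illustration of an approval gate, not how Hermes implements it: ProposedAction, run_gated and the approve callback are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """Hypothetical proposal: a description plus the code that would perform it."""
    description: str
    execute: Callable[[], None]


def run_gated(action: ProposedAction, approve: Callable[[str], bool]) -> bool:
    """Run the action only if the approver says yes. Returns True if it ran."""
    if approve(action.description):
        action.execute()
        return True
    # Rejected: execute() is never called, so nothing changes.
    return False


# Demo: one approved action, one rejected action.
created: list[str] = []

draft = ProposedAction("create notes.txt", lambda: created.append("notes.txt"))
send = ProposedAction("send email to team", lambda: created.append("email"))

ran_draft = run_gated(draft, approve=lambda description: True)   # you say yes
ran_send = run_gated(send, approve=lambda description: False)    # you say no

print(ran_draft, ran_send, created)  # True False ['notes.txt']
```

The point of the shape is that the agent only builds the proposal; the side effect sits behind execute(), which is reached on one path only, the one that passes through your yes.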

UNDERSTAND 20 - 45 min

The gate is where trust is built

Level 3 feels slower, and that is exactly the point — it is the rung where you learn whether this agent's judgement is good enough to trust further.

Key concept

Level 3, conditional action, is the hinge of the whole ladder. Below it the agent only informs; at and above it the agent changes things in the world. The approval gate (write_approval, skills approval) is what makes that safe: nothing irreversible happens without an explicit yes. Two questions decide whether an action may be gated at all or must be forbidden outright: is it reversible, and can you verify it was done right? Watching what the agent *proposes* over many approvals is how you calibrate trust: good proposals earn it a path to level 4; bad ones tell you to keep the gate shut.
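
Calibrating trust over many approvals is easier with a record than from memory. The sketch below is one possible way to tally your decisions per action category; ProposalLog and the category names are assumptions made up for illustration, not a Hermes feature.

```python
from collections import defaultdict


class ProposalLog:
    """Hypothetical tally of your decisions, grouped by action category."""

    def __init__(self) -> None:
        # category -> [approved_count, rejected_count]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, category: str, approved: bool) -> None:
        self._counts[category][0 if approved else 1] += 1

    def summary(self, category: str) -> tuple[int, int]:
        """Return (approved, rejected) for a category; (0, 0) if never seen."""
        approved, rejected = self._counts.get(category, [0, 0])
        return approved, rejected


log = ProposalLog()
for decision in [True, True, True, False]:
    log.record("draft_message", decision)
log.record("edit_file", True)

print(log.summary("draft_message"))  # (3, 1)
print(log.summary("edit_file"))      # (1, 0)
```

The counts are evidence, not a verdict: a clean record says the agent's proposals have been good so far, but it does not change whether the action itself is reversible or verifiable.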

? Why is level 3 the dividing line of the ladder: what changes between level 2 and level 3?
? For a given action, how do 'reversible?' and 'verifiable?' decide whether you gate it or simply forbid it?
? After ten good proposals, what would actually justify moving an action up to level 4 (act-unless-I-say-no)?
? What kind of action should never be merely gated, no matter how good the agent's track record?

BUILD 45 - 60 min

Define your gate

A gate you keep in your head is a gate you will forget. Here you write down which actions run freely, which need approval, and which are off limits.

Your task

Produce a short, explicit policy: which categories of action Hermes may do freely, which require your approval, and which it must never do — and configure the agent to match.

1 List the kinds of actions you would actually want Hermes to take in your work.
2 Sort each into: free (do it), gated (ask me first), forbidden (never).
3 Justify each placement with the two tests: reversible? verifiable?
4 Configure approval so the 'gated' category really does prompt you.
5 Test one action from each of the three buckets and confirm the behaviour matches your policy.
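
Before configuring anything, it can help to write the policy as data so that each placement carries its reasoning. The sketch below is one way to do that. It is not the Hermes configuration format: the action names, the Rule fields and the deny-by-default choice for unlisted actions are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Bucket(Enum):
    FREE = "free"            # do it
    GATED = "gated"          # ask me first
    FORBIDDEN = "forbidden"  # never


@dataclass(frozen=True)
class Rule:
    bucket: Bucket
    reversible: bool   # can the action be undone?
    verifiable: bool   # can you check it was done right?


# Example entries only; replace them with the actions from your own list.
POLICY = {
    "summarise_document": Rule(Bucket.FREE, reversible=True, verifiable=True),
    "create_file": Rule(Bucket.GATED, reversible=True, verifiable=True),
    "delete_backups": Rule(Bucket.FORBIDDEN, reversible=False, verifiable=False),
}


def decide(action: str) -> Bucket:
    """Look up an action; anything not listed is treated as forbidden (an assumption)."""
    rule: Optional[Rule] = POLICY.get(action)
    return rule.bucket if rule is not None else Bucket.FORBIDDEN


# One action from each bucket, plus one the policy never mentions.
for name in ["summarise_document", "create_file", "delete_backups", "wire_money"]:
    print(f"{name}: {decide(name).value}")
```

Treating unlisted actions as forbidden is a deliberate choice here: an action you never thought about is one you have not yet tested for reversibility or verifiability. Defaulting to gated instead would be a reasonable alternative if you would rather be asked than blocked.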

Deliverable

Your free / gated / forbidden policy (with the reversible+verifiable reasoning) and a test showing the gated category actually prompts for approval.

Self-check · tick before you mark done

I enabled approval and saw an action run only after I said yes.
I rejected an action and confirmed it did not run.
I can explain why level 3 is the hinge of the delegation ladder.
I wrote a free / gated / forbidden policy justified by reversibility and verifiability.

✎ Approving every action is safe but exhausting; approving nothing means the agent is useless. Where, for your work, is the gate worth keeping closed, and where is holding it costing you more than the risk it guards against?

Helpful links

Skills & approval flow ↗