Part 1: I Built an Always-On AI Coding Agent That Plans, Codes, and Reviews Its Own Work

Always-On AI Coding Agent

How to turn a spare Mac into an autonomous coding coordinator using Hermes Agent and Claude Code

Most AI coding tools are reactive. You ask, they answer. You paste code, they suggest edits. But they don’t manage work. They don’t break a feature request into tasks, write the code, review it for security issues, check that it matches the spec, and report back when it’s done.

I wanted something that does all of that, autonomously, 24/7, accessible from my phone.

So I took a 2019 MacBook Pro that was collecting dust, installed some open-source tools, and built a coding agent that operates like a senior engineering lead: it plans before it codes, delegates implementation to a separate AI, and reviews every line before calling the work done. I can message it from Telegram while I’m out walking the dog and come back to find tested, reviewed code waiting for me.

The architecture: separation of concerns

The system has two layers, and the boundary between them is the most important design decision I made.

You
↓ via Telegram / Discord / WhatsApp / CLI / Web UI
Hermes Coordinator (plans, reviews, communicates)
↓ via claude -p
Claude Code (writes code, runs tests, commits)

Layer 1: The Coordinator runs on Hermes Agent, an open-source autonomous AI agent framework from Nous Research, similar to OpenClaw. It handles planning, task decomposition, review, and communication. Its model is Gemini 3.5 Flash, which is fast, cheap, and good at orchestration. What I like about Hermes is that it’s always on and learns over time: the more I use it, the better it adapts to my coding approach.

Layer 2: The Implementer is Claude Code running in print mode (claude -p). It receives a standalone prompt, writes code, runs tests, and returns results. I chose Claude Code because I’m learning it, but you could substitute any coding agent with a non-interactive mode, such as Gemini CLI, Goose, or OpenCode.
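A minimal sketch of what one dispatch could look like from the coordinator’s side, assuming Python’s subprocess module and the `-p`, `--output-format`, `--max-turns`, and `--allowedTools` flags from Claude Code’s CLI reference (check `claude --help` for your version). The function names are hypothetical, not Hermes APIs.

```python
import subprocess
from typing import Sequence


def build_claude_command(prompt: str, allowed_tools: Sequence[str], max_turns: int) -> list[str]:
    """Assemble one non-interactive (print mode) Claude Code invocation."""
    cmd = ["claude", "-p", prompt, "--output-format", "json", "--max-turns", str(max_turns)]
    if allowed_tools:
        # --allowedTools takes a list of values, so keep it last on the command line.
        cmd += ["--allowedTools", *allowed_tools]
    return cmd


def dispatch(prompt: str, allowed_tools: Sequence[str], max_turns: int,
             timeout_s: int, cwd: str) -> subprocess.CompletedProcess:
    """Run a single stateless dispatch in the project directory.

    Raises subprocess.TimeoutExpired if Claude Code overruns the task's timeout.
    """
    return subprocess.run(
        build_claude_command(prompt, allowed_tools, max_turns),
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

Because each call is a fresh process with its own timeout, a stuck task fails loudly instead of hanging the coordinator, and `result.stdout` can be parsed with `json.loads` for the review stages.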

The coordinator never writes code. Claude Code never plans. This separation prevents the failure mode I’ve seen in single-agent setups where the AI gets lost halfway through a complex task because it’s trying to do everything at once.

The coordinator: a senior engineering lead in a config file

The coordinator’s behavior is defined in a single file called SOUL.md, the agent’s personality and operating manual combined. The core philosophy:

Plan first: before any coding begins, break the work into small, independently testable tasks
Delegate all coding: use Claude Code for every implementation task, never write code directly
Review everything: after each task, run a two-stage review: spec compliance first, then code quality
Communicate clearly: keep the user informed of what was planned, what was done, and what passed review

When you send the coordinator a message like “Add user authentication to the API”, it doesn’t start writing code. It asks clarifying questions if anything is ambiguous, then creates an implementation plan where each task maps to a single Claude Code dispatch. It executes each task via claude -p with the exact prompt and allowed tools, checks each result against the spec and then against quality standards, and reports what got done and what needs attention.

One thing worth noting: Claude Code has no memory between dispatches. Each task prompt must be completely self-contained, including file paths, function signatures, project conventions, everything. The coordinator’s planning skill enforces this, producing prompts that any Claude Code instance could execute cold.
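One way to enforce “self-contained” mechanically is to make the prompt a rendered template that refuses to render when key context is missing. A minimal sketch; the `TaskSpec` fields and section names are illustrative assumptions, not the planning skill’s actual format.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Everything a cold Claude Code instance needs (field names are illustrative)."""
    goal: str
    files: list[str]
    signatures: list[str]
    conventions: list[str]
    done_when: str


def _bullets(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def render_prompt(task: TaskSpec) -> str:
    """Render a prompt that restates all context, since nothing carries over between dispatches."""
    missing = [name for name in ("goal", "files", "done_when") if not getattr(task, name)]
    if missing:
        # Refuse to dispatch a prompt that would force Claude Code to guess.
        raise ValueError(f"task is not self-contained; missing: {', '.join(missing)}")
    return (
        f"Goal:\n{task.goal}\n\n"
        f"Files to create or modify:\n{_bullets(task.files)}\n\n"
        f"Required signatures:\n{_bullets(task.signatures)}\n\n"
        f"Project conventions:\n{_bullets(task.conventions)}\n\n"
        f"Done when:\n{task.done_when}\n"
    )
```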

Seven roles, one agent

Rather than running seven separate AI agents (expensive, slow, hard to coordinate), the coordinator applies seven “role lenses” at different stages of the workflow. Each role is a Hermes skill, a markdown file with a charter, review checklist, and prompt templates.

Role	When applied	What it checks
Architect	During planning	System design, dependencies, interfaces, scalability
Implementer	During dispatch	Task decomposition, prompt construction, tool selection
Quality	During spec review	Spec compliance, test coverage, TDD enforcement
Security	Before merging	OWASP Top 10, hardcoded secrets, dependency CVEs
Docs	After implementation	README updates, API docs, changelogs
DevOps	For infrastructure changes	CI/CD, Docker configs, environment management
Reviewer	During code quality review	Correctness, readability, maintainability, integration
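Since the roles are checklists rather than agents, the table translates directly into data the coordinator can look up by stage. A minimal sketch; the `Stage` names and `lenses_for` helper are hypothetical, and layering DevOps onto any stage when infrastructure files change is one interpretation of “For infrastructure changes”.

```python
from enum import Enum


class Stage(Enum):
    PLANNING = "planning"
    DISPATCH = "dispatch"
    SPEC_REVIEW = "spec review"
    QUALITY_REVIEW = "code quality review"
    AFTER_IMPLEMENTATION = "after implementation"
    BEFORE_MERGE = "before merging"


# Role -> (stage, checklist), transcribed from the table above.
ROLE_LENSES: dict[str, tuple[Stage, list[str]]] = {
    "Architect": (Stage.PLANNING, ["system design", "dependencies", "interfaces", "scalability"]),
    "Implementer": (Stage.DISPATCH, ["task decomposition", "prompt construction", "tool selection"]),
    "Quality": (Stage.SPEC_REVIEW, ["spec compliance", "test coverage", "TDD enforcement"]),
    "Security": (Stage.BEFORE_MERGE, ["OWASP Top 10", "hardcoded secrets", "dependency CVEs"]),
    "Docs": (Stage.AFTER_IMPLEMENTATION, ["README updates", "API docs", "changelogs"]),
    "Reviewer": (Stage.QUALITY_REVIEW, ["correctness", "readability", "maintainability", "integration"]),
}
DEVOPS_CHECKLIST = ["CI/CD", "Docker configs", "environment management"]


def lenses_for(stage: Stage, infra_change: bool = False) -> dict[str, list[str]]:
    """Return the role checklists the coordinator applies at a given stage."""
    active = {role: items for role, (when, items) in ROLE_LENSES.items() if when is stage}
    if infra_change:
        # Assumption: DevOps is layered on at any stage when infrastructure files change.
        active["DevOps"] = DEVOPS_CHECKLIST
    return active
```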

These seven roles are inspired by Squad, a framework for scaffolding teams of specialist AI agents. Squad assigns each agent a character identity and a functional role (frontend, backend, tester, etc.). I adapted the concept, consolidating the roles into seven that fit my coordinator’s review workflow.

The roles aren’t separate agents or separate API calls. They’re checklists and perspective shifts that the coordinator applies at different points in the workflow: the Architect lens during planning, the Quality lens during spec review, and the Security lens before anything gets committed.

The workflow engine

Four workflow skills chain together to form the development pipeline:

1. Writing plans

Every non-trivial task starts with a plan. The writing-plans skill produces structured implementation plans where each task is small enough for a single Claude Code dispatch (5-15 turns). Each task includes the exact claude -p prompt, allowed tools, timeout, and verification command.
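A plan task can be modeled as a record holding exactly those four items, plus a check that rejects tasks outside the 5-15 turn budget before anything is dispatched. A minimal sketch; `PlanTask`, `est_turns`, and `verify` are hypothetical names, and treating a zero exit status as “done” is an assumption about how the verification command is used.

```python
import shlex
import subprocess
from dataclasses import dataclass

TURN_BUDGET = range(5, 16)  # 5-15 turns: small enough for one dispatch


@dataclass
class PlanTask:
    title: str
    prompt: str               # the exact text passed to `claude -p`
    allowed_tools: list[str]
    timeout_s: int
    verify_cmd: str           # e.g. "pytest tests/test_auth.py -q"
    est_turns: int            # planner's estimate (hypothetical field)

    def problems(self) -> list[str]:
        """Reasons this task is not ready to dispatch; empty means it is."""
        issues = []
        if self.est_turns not in TURN_BUDGET:
            issues.append(f"estimated {self.est_turns} turns; resize to 5-15")
        if not self.prompt.strip():
            issues.append("missing prompt")
        if not self.verify_cmd.strip():
            issues.append("missing verification command")
        if self.timeout_s <= 0:
            issues.append("timeout must be positive")
        return issues


def verify(task: PlanTask, cwd: str) -> bool:
    """Run the task's verification command; exit status 0 means the task is done."""
    result = subprocess.run(shlex.split(task.verify_cmd), cwd=cwd,
                            capture_output=True, text=True, timeout=task.timeout_s)
    return result.returncode == 0
```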

A good plan makes implementation obvious. If Claude Code has to guess, the plan is incomplete.

2. Claude Code-driven development

This is the execution engine. For each task in the plan, the coordinator dispatches Claude Code with the complete prompt, then runs a spec review (did it implement what was asked, nothing more, nothing less?), then a quality review (is the code clean and tested?). If either review fails, it re-dispatches with specific fix instructions. After all tasks complete, an integration review checks whether everything works together.
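The control flow is easier to see with the model calls abstracted away. A minimal sketch in which `dispatch`, the reviewers, and the integration check are caller-supplied functions (hypothetical; in practice they would wrap `claude -p` and the coordinator’s own model). The `max_attempts=3` default is an arbitrary placeholder, since no retry limit is given for this loop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    passed: bool
    feedback: str = ""


Dispatch = Callable[[str], str]            # prompt -> Claude Code output
Reviewer = Callable[[str, str], Review]    # (original spec, output) -> verdict


def run_task(prompt: str, dispatch: Dispatch, spec_review: Reviewer,
             quality_review: Reviewer, max_attempts: int = 3) -> str:
    """Dispatch, review spec compliance before quality, and re-dispatch with fix notes on failure."""
    current = prompt
    for _ in range(max_attempts):
        output = dispatch(current)
        # Order matters: there is no point polishing code that implements the wrong thing.
        for review in (spec_review, quality_review):
            verdict = review(prompt, output)
            if not verdict.passed:
                current = f"{prompt}\n\nA previous attempt failed review. Fix the following:\n{verdict.feedback}"
                break
        else:
            return output  # both reviews passed
    raise RuntimeError(f"task still failing review after {max_attempts} attempts; escalate to the user")


def run_plan(prompts: list[str], dispatch: Dispatch, spec_review: Reviewer,
             quality_review: Reviewer, integration_review: Callable[[list[str]], Review]) -> list[str]:
    """Run every task, then check that the results work together."""
    outputs = [run_task(p, dispatch, spec_review, quality_review) for p in prompts]
    verdict = integration_review(outputs)
    if not verdict.passed:
        raise RuntimeError(f"integration review failed: {verdict.feedback}")
    return outputs
```

Reviews always compare against the original prompt, not the amended one, so fix instructions can’t quietly redefine the spec.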

The two-stage review matters because spec compliance and code quality are different problems. Code can be beautiful but wrong (doesn’t match the spec). Code can be correct but terrible (works but unmaintainable). Checking both, in order, catches both failure modes.

3. Test-driven development

Every Claude Code prompt includes TDD instructions: write a failing test first, verify the failure, write minimal code to pass, verify the pass, run the full suite for regressions. The coordinator verifies TDD was actually followed. If Claude Code skipped writing tests first, it gets re-dispatched with explicit enforcement.
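Checking that TDD was actually followed comes down to checking the order of test results. A minimal sketch, assuming the coordinator can extract an ordered list of test runs from Claude Code’s output or logs (that parsing step is hypothetical and not shown), with `"ALL"` standing for a full-suite run.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    target: str   # test file that was run, or "ALL" for the full suite
    passed: bool


def tdd_violations(runs: list[RunRecord], new_test: str) -> list[str]:
    """Check the red, then green, then full-suite sequence; empty means TDD was followed."""
    issues = []
    targeted = [r for r in runs if r.target == new_test]
    if not targeted or targeted[0].passed:
        issues.append("new test was never observed failing first")
    if not any(r.passed for r in targeted):
        issues.append("new test never passed")
    full = [r for r in runs if r.target == "ALL"]
    if not full or not full[-1].passed:
        issues.append("full suite was not green at the end")
    return issues
```

A non-empty result is what would trigger the re-dispatch with explicit enforcement.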

4. Pre-commit verification

Before anything gets committed, a verification pipeline runs: a static security scan on the diff, the test suite, a coordinator self-review, and then an independent review by a fresh Claude Code instance with no context about how the code was written. If issues are found, the coordinator runs up to two fix rounds before escalating to the user.
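The pipeline plus its bounded fix loop can be sketched as ordered checks that each return findings. A minimal sketch, assuming each check is a caller-supplied function (for example, the independent review might be a new `claude -p` dispatch that sees only the diff); `precommit` and its signature are hypothetical.

```python
from typing import Callable

Check = Callable[[], list[str]]   # returns findings; an empty list means the check passed


def precommit(checks: dict[str, Check], fix: Callable[[list[str]], None],
              max_fix_rounds: int = 2) -> tuple[bool, list[str]]:
    """Run checks in insertion order; attempt up to `max_fix_rounds` fixes, then report what remains."""
    for round_no in range(max_fix_rounds + 1):
        findings = [f"{name}: {issue}" for name, check in checks.items() for issue in check()]
        if not findings:
            return True, []
        if round_no < max_fix_rounds:
            fix(findings)  # e.g. re-dispatch Claude Code with the findings as instructions
    return False, findings  # caller escalates these to the user
```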

No agent should verify its own work. Fresh context finds what familiarity misses.

Access anywhere

The always-on server means the coordinator is reachable from anywhere. Hermes supports multiple messaging platforms out of the box:

Telegram: Message from your phone. Ask for a feature, get a progress report, approve a plan. This is what I use when I’m away from my desk.
Discord: Connect a bot to your server for team access.
WhatsApp: Another option for mobile access.
CLI: Work from the terminal. A coder alias sets the right HERMES_HOME environment variable.
hermes-webui: A three-panel layout with sessions, chat, and agent details.

Keeping it running

Four macOS LaunchAgents auto-start everything on login and restart on crash:

Primary Hermes gateway
Coding coordinator gateway
Primary web UI (port 8787)
Coding coordinator web UI (port 8788)

Each uses KeepAlive with SuccessfulExit: false, so if the process crashes (non-zero exit), macOS restarts it automatically. Clean shutdowns stay down.
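The relevant launchd keys are `RunAtLoad` (start at login) and `KeepAlive` with a `SuccessfulExit` sub-key. A minimal sketch that generates such a plist with Python’s standard plistlib; the label, program path, and log directory are hypothetical placeholders for each of the four services.

```python
import plistlib
from typing import Optional


def launch_agent_plist(label: str, program_args: list[str], log_dir: str,
                       env: Optional[dict[str, str]] = None) -> bytes:
    """Serialize a LaunchAgent that starts at login and restarts only after unclean exits."""
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        "RunAtLoad": True,                       # start as soon as the agent is loaded at login
        "KeepAlive": {"SuccessfulExit": False},  # relaunch unless the last exit was clean
        "StandardOutPath": f"{log_dir}/{label}.out.log",
        "StandardErrorPath": f"{log_dir}/{label}.err.log",
    }
    if env:
        # launchd jobs don't inherit your shell environment, so set PATH/HERMES_HOME here.
        job["EnvironmentVariables"] = env
    return plistlib.dumps(job)
```

Write the output to `~/Library/LaunchAgents/<label>.plist` and load it with `launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/<label>.plist`.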

A nightly cron job backs up the entire coordinator configuration to a private GitHub repo (skills, SOUL.md, config files, but not secrets or databases). If the machine dies, I can recreate the setup from the backup in under an hour.
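The important part of the backup is what it leaves out. A minimal sketch of the file filter such a job might apply before committing; the exclusion patterns are assumptions and should be matched to wherever your instance actually keeps secrets and state.

```python
import fnmatch
from pathlib import Path, PurePosixPath

# Hypothetical exclusion patterns; match them to where your instance keeps secrets and state.
EXCLUDE = [".env", "*.env", "*.key", "*.pem", "auth.json",
           "*.db", "*.db-wal", "*.db-shm", "*.sqlite", "*.sqlite3"]


def should_back_up(rel_path: str) -> bool:
    """True for config worth versioning; False for secrets, databases, and git internals."""
    parts = PurePosixPath(rel_path).parts
    if ".git" in parts:
        return False
    return not any(fnmatch.fnmatch(parts[-1], pat) for pat in EXCLUDE)


def backup_manifest(root: Path) -> list[str]:
    """Relative paths under root that the nightly job would copy into the backup repo."""
    rels = (p.relative_to(root).as_posix() for p in sorted(root.rglob("*")) if p.is_file())
    return [rel for rel in rels if should_back_up(rel)]
```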

What I’m investigating next

Dynamic skill injection. Right now the seven roles are static. In practice, reviewing a React frontend requires different expertise than reviewing a Python API. I’m working on having the coordinator detect the tech stack (check package.json, requirements.txt, and whatever else the project has) and inject technology-specific skills into the Claude Code prompts via --append-system-prompt-file. Think curated best-practice guides for React, FastAPI, Go, loaded dynamically based on what the project actually uses.
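Stack detection can start as simple marker-file inspection. A minimal sketch that reads `package.json`, `requirements.txt`, and `go.mod` and maps detected stacks to guide files; the `skills/stacks/*.md` paths are hypothetical, and the resulting files would then be handed to Claude Code through `--append-system-prompt-file` as described above.

```python
import json
import re
from pathlib import Path

MARKERS = ("package.json", "requirements.txt", "go.mod")
# Hypothetical locations of the curated best-practice guides.
SKILL_FILES = {
    "react": "skills/stacks/react.md",
    "fastapi": "skills/stacks/fastapi.md",
    "go": "skills/stacks/go.md",
}


def read_markers(repo: Path) -> dict[str, str]:
    """Load whichever marker files exist at the repository root."""
    return {name: (repo / name).read_text() for name in MARKERS if (repo / name).is_file()}


def detect_stack(markers: dict[str, str]) -> list[str]:
    stacks = []
    if "package.json" in markers:
        manifest = json.loads(markers["package.json"])
        deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
        if "react" in deps:
            stacks.append("react")
    if "requirements.txt" in markers:
        # Keep only the package name: drop version specifiers, extras, and markers.
        names = {re.split(r"[\s<>=!~\[;]", line.strip(), maxsplit=1)[0].lower()
                 for line in markers["requirements.txt"].splitlines()
                 if line.strip() and not line.lstrip().startswith("#")}
        if "fastapi" in names:
            stacks.append("fastapi")
    if "go.mod" in markers:
        stacks.append("go")
    return stacks


def skill_files(stacks: list[str]) -> list[str]:
    return [SKILL_FILES[s] for s in stacks if s in SKILL_FILES]
```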

Smarter task sizing. The coordinator sometimes creates tasks that are too small (trivial one-line changes) or too big (Claude Code runs out of turns). I want to add feedback from execution results back into the planning skill: if a task consistently finishes in 2 turns, combine it with the next one. If it hits the turn limit, break it down further.
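That feedback rule fits in a few lines once turn counts are recorded per task type. A minimal sketch; `TaskStats` and the thresholds (2 turns, 3 samples) are hypothetical, with “consistently” read as “in each of the last three runs” and a single hit of the turn limit treated as enough to split.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    name: str
    turns_used: list[int]   # turns consumed by past dispatches of this kind of task, oldest first
    turn_limit: int


def sizing_advice(stats: TaskStats, small: int = 2, min_samples: int = 3) -> str:
    """Suggest merging consistently tiny tasks and splitting ones that hit the turn limit."""
    if stats.turns_used and stats.turns_used[-1] >= stats.turn_limit:
        return "split further"
    recent = stats.turns_used[-min_samples:]
    if len(recent) == min_samples and all(t <= small for t in recent):
        return "merge with next task"
    return "keep"
```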

CI integration. Right now the coordinator runs tests locally. Wiring it into GitHub Actions so it can create PRs, wait for CI, and respond to review comments would close the loop between “code written” and “code shipped.”

Getting started

If you want to build this yourself, I’ve written a detailed technical reference with every command, config file, and skill definition you need. It covers:

Preparing a Mac for always-on operation
Installing Hermes Agent and Claude Code
Creating the coordinator instance with all seven role skills and four workflow skills
Setting up Telegram, auto-start, backup, and web UIs
Supporting both Gemini and Claude as the coordinator model

These instructions should also work on WSL2, in a Docker container, or on Linux, though I haven’t tested those environments. I’m using an old Mac that has nothing important on it beyond what gets backed up nightly to GitHub, so I’m not worried about Hermes or Claude breaking anything I can’t fix. I’d be more careful on my main computer.

The full implementation details are in the companion repo: HermesCoderAgent. You could follow it step by step, or you could point a tool like Claude Code or Gemini CLI at it and have it build this interactively with you.

This setup uses Hermes Agent by Nous Research, Claude Code by Anthropic, and hermes-webui by nesquena. The seven-role model is inspired by Squad. The workflow skills are inspired by Superpowers by obra. Both Squad and Superpowers are worth checking out.

The Hermes Agent series

Part 1: I Built an Always-On AI Coding Agent That Plans, Codes, and Reviews Its Own Work (this post)
Part 2: One Coordinator, Swappable Coding Engines
Part 3: Dynamic Tool Discovery and Injection
Part 4: Running Untrusted Tools Safely
Part 5: GitHub Issues as the Agent’s Backlog
Part 6: The Autonomy Ladder
Part 7: How the Agent Learns From Its Mistakes

Implementation details and source