To make production errors file their own issues, I gave the system an Agent inbox — I'd actually wanted it to fix the bugs
The trigger is mundane: a 500 fires in production at 3am, nobody's watching, and by the time a user screenshots it into a group chat and I go digging through logs, the moment's long gone. There's off-the-shelf monitoring for this — I've run Rollbar and dashboards like it. But honestly: the UI is ugly, the free tier is hemmed in with limits, and lifting them means paying. This is a side project. I don't want to pay for it, and I want even less to babysit yet another heavy console.
And it's the AI era now. I'd rather not stand up one more complex UI for a human to stare at. What I wanted to try is the other path — Agent-first: no dashboard, just an agent watching the project for me, in real time.
Concretely, that means using it for project monitoring: have production errors email a robot inbox, and have an agent — a resident process on my machine — check that inbox in real time and act on a hit. The recipient isn't me; it's a program.
Building it, I realized I'd wandered into a new species that only showed up this year — the "inbox built for an Agent." I found two representatives, AgentMail and Tencent's Agently Mail, compared them, and built this on one of them. This post is that build log, and it pulls the two apart along the way.
First, what I actually built
In one line: the moment production Rails throws an unhandled exception, it scrubs, dedupes, and emails an Agent inbox; a resident process on my Mac mini polls that inbox every 10 seconds and, on a hit, opens a GitHub issue with gh.
Pipeline: unhandled 500 in production → (Rails.error subscriber) scrub + fingerprint, cross-process dedupe → (SolidQueue async) SendCloud SMTP → Agent inbox (AgentMail) → (local daemon, every 10s) launchd resident process on the Mac mini → (gh issue create) GitHub Issue, auto-reported
There's a point here that looks roundabout but is actually the whole design's keystone: why email?
Stuck for a second: my service is local and unreachable from the public internet, so a webhook can't push to me. How does it even learn that production broke?

Because my handler runs locally. The public internet can't reach in, so the "you broke, I'll push it to you" webhook model just can't land here. Email, on the other hand, is cross-network, async, and stored by nature: the production side only has to drop a letter in the box, and I fetch it whenever I'm free. Neither side needs to know the other's IP. Here the inbox isn't "sending a notification"; it's being used as a message bus. And that's exactly the future the "Agent inbox" products are betting on.
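A minimal sketch of that pull model, to make the "I fetch it whenever I'm free" part concrete. It is not the real listener.py: `fetch_messages` and `handle` are hypothetical callables standing in for whatever inbox client and handler you use, and each message is assumed to be a dict with an `id` key.

```python
import time
from typing import Callable, Iterable, Set


def poll_once(
    fetch_messages: Callable[[], Iterable[dict]],
    handle: Callable[[dict], None],
    seen_ids: Set[str],
) -> int:
    """Fetch the inbox once and handle every message not seen before.

    Returns the number of newly handled messages.
    """
    handled = 0
    for message in fetch_messages():
        message_id = message["id"]
        if message_id in seen_ids:
            continue
        handle(message)
        seen_ids.add(message_id)  # marked only after handle() returns normally
        handled += 1
    return handled


def listen(
    fetch_messages: Callable[[], Iterable[dict]],
    handle: Callable[[dict], None],
    interval_seconds: float = 10.0,
) -> None:
    """Poll forever. The consumer pulls, so no inbound port is needed."""
    seen_ids: Set[str] = set()
    while True:
        try:
            poll_once(fetch_messages, handle, seen_ids)
        except Exception as exc:  # keep the daemon alive across transient failures
            print(f"poll failed: {exc!r}")
        time.sleep(interval_seconds)
```

Only outbound requests leave the Mac mini, which is why the production side never needs to know where the handler lives.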
It really runs. I manually triggered a test exception in the production console, and a few dozen seconds later the daemon picked it up and an issue showed up on its own:
$ python listener.py listen
👂 Polling inbox=no8@…agentmail.to, every 10s. Ctrl-C to quit.
======================================================================
📩 New error email from: autofix@leonclass…
subject: [auto] NoMethodError #f89050d759aa
======================================================================
→ gh issue create --label auto-reported …
✓ created https://github.com/leonx-ai/leonclass/issues/290
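The step from subject line to `gh` call could look roughly like this. It is a sketch, not the actual listener: the subject format is inferred from the log above, the function names are made up for illustration, and the repository is passed in by the caller rather than hard-coded.

```python
import re
from typing import List, Optional, Tuple

# Subject shape inferred from the log above: "[auto] <ErrorClass> #<12 hex chars>".
SUBJECT_RE = re.compile(
    r"^\[auto\]\s+(?P<error_class>\S+)\s+#(?P<fingerprint>[0-9a-f]{12})$"
)


def parse_subject(subject: str) -> Optional[Tuple[str, str]]:
    """Return (error_class, fingerprint), or None for mail that is not ours."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    return match.group("error_class"), match.group("fingerprint")


def build_issue_command(subject: str, body: str, repo: str) -> Optional[List[str]]:
    """Build the argv for `gh issue create`; None means "ignore this mail"."""
    parsed = parse_subject(subject)
    if parsed is None:
        return None
    error_class, fingerprint = parsed
    return [
        "gh", "issue", "create",
        "--repo", repo,
        "--title", f"[auto] {error_class} #{fingerprint}",
        "--body", body,
        "--label", "auto-reported",
    ]
```

Passing the result to `subprocess.run(cmd, check=True)` as a list, rather than a shell string, keeps the email body from ever being interpreted by a shell.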
That's the nice thing about deterministic stuff: it won't be polite on your behalf. It reports what should be reported, and doesn't nag you about repeats — the same error is deduped by fingerprint, with a cooldown window so one high-frequency 500 can't flood the inbox. The dedupe isn't kept in memory; it's persisted to a file, so a restart won't re-report:
"f89050d759aa": { "action": "issue", "status": "created",
"issue": "…/leonclass/issues/290" }
"05bddfdfed03": { "action": "issue", "status": "created",
"issue": "…/leonclass/issues/292" }
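One way to keep that state on disk, as a sketch. The entry shape mirrors the excerpt above; the class name, method names, and the write-then-rename strategy are my assumptions, not necessarily what the real listener does.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


class ReportState:
    """Fingerprint -> outcome map that survives restarts."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: Dict[str, dict] = {}
        if path.exists():
            self.entries = json.loads(path.read_text(encoding="utf-8"))

    def already_reported(self, fingerprint: str) -> bool:
        return fingerprint in self.entries

    def record(self, fingerprint: str, issue_url: str) -> None:
        self.entries[fingerprint] = {
            "action": "issue",
            "status": "created",
            "issue": issue_url,
        }
        self._save()

    def _save(self) -> None:
        # Write to a temp file in the same directory, then rename over the
        # old file, so a crash mid-write leaves the previous state intact.
        fd, tmp_name = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(self.entries, handle, indent=2)
        os.replace(tmp_name, self.path)
```

The listener would check `already_reported(fingerprint)` before calling `gh`, and call `record(...)` only after the issue URL comes back.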
I'd actually wanted it to fix the bugs
Honest confession. My first version wasn't "file an issue" at all. What I wrote was: on an error, spin up a headless Claude Code, in an isolated git worktree, to read the code, change it, run the tests, and open a draft PR directly. The whole thing worked end to end.
Then I switched it off and fell back to "just file an issue" (the fix mode code is still there — one env var away).
It's not that it didn't work. It did. It's that another question had to be answered first: this loop that fixes code and opens PRs on its own, how much should I trust it?

Auto-fixing code driven by a production signal sounds sexy, but to leave it on for real I first have to answer: are its fixes correct? Who reviews them? Could an error quietly grow itself a PR and get merged in the middle of the night? Until I can answer those cleanly, "collect first, put it in front of me first" is the honest posture. Don't trust, verify. And this time the thing being verified is the automation I built myself.
The guardrails are tightened in the same spirit: fix mode only ever opens a draft PR, never auto-merges, never touches main, with concurrency capped at 1 by default. Keep the capability around — but the default has to be restraint.
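Those guardrails can be expressed as a few small functions. This is a sketch under assumptions: the environment variable names `AUTOFIX_MODE` and `AUTOFIX_MAX_CONCURRENCY` are hypothetical (the real switch is just "one env var"), and the branch check is one possible reading of "never touches main."

```python
import os
import threading
from typing import List, Mapping

# Hypothetical variable names, chosen for this example only.
MODE_ENV = "AUTOFIX_MODE"
CONCURRENCY_ENV = "AUTOFIX_MAX_CONCURRENCY"


def select_mode(env: Mapping[str, str]) -> str:
    """Return "fix" only on explicit opt-in; anything else means "issue"."""
    return "fix" if env.get(MODE_ENV, "").strip().lower() == "fix" else "issue"


def max_concurrency(env: Mapping[str, str]) -> int:
    """Default to 1; invalid or non-positive values also fall back to 1."""
    try:
        value = int(env.get(CONCURRENCY_ENV, "1"))
    except ValueError:
        return 1
    return value if value >= 1 else 1


def build_draft_pr_command(branch: str, title: str, body: str) -> List[str]:
    """Build the argv for a draft PR; there is no code path that merges."""
    if branch in ("main", "master"):
        raise ValueError("fix mode must work on its own branch, never main")
    return [
        "gh", "pr", "create", "--draft",
        "--head", branch,
        "--title", title,
        "--body", body,
    ]


# A fix worker would wrap each run in `with fix_slots:` to respect the cap.
fix_slots = threading.BoundedSemaphore(max_concurrency(os.environ))
```

The point of the shape is that restraint is the fallthrough: a typo in the variable, a missing value, or a bad number all land on "file an issue, one at a time."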
And the data. Error info leaves the country (it goes to a US-hosted inbox), so I only send the fingerprint + backtrace + a scrubbed message, with emails, IPs, long tokens, and long digit strings all replaced, and request params and user_id never sent at all. Weaker diagnostics, in exchange for sleeping at night.
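To make the scrubbing concrete, here is a Python sketch of the idea. The production side is Rails, so this is an illustration of the logic rather than the code that runs there; the regexes, the placeholder strings, the length thresholds, and the fingerprint recipe (error class plus the top frames with line numbers stripped, hashed and cut to 12 hex characters) are all assumptions.

```python
import hashlib
import re
from typing import List

# Order matters: emails and IPs go first, then the generic token and digit rules.
SCRUB_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[email]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[ip]"),
    (re.compile(r"\b[A-Za-z0-9_-]{32,}\b"), "[token]"),
    (re.compile(r"\b\d{6,}\b"), "[digits]"),
]


def scrub(message: str) -> str:
    """Replace emails, IPv4 addresses, long tokens, and long digit runs."""
    for pattern, placeholder in SCRUB_RULES:
        message = pattern.sub(placeholder, message)
    return message


def fingerprint(error_class: str, backtrace: List[str], frames: int = 5) -> str:
    """Stable 12-hex-char id for "the same error"."""
    # Line numbers are stripped so an unrelated edit higher up in the file
    # does not turn a known error into a "new" one.
    normalized = [re.sub(r":\d+", "", frame) for frame in backtrace[:frames]]
    digest = hashlib.sha256("\n".join([error_class, *normalized]).encode("utf-8"))
    return digest.hexdigest()[:12]
```

For example, `scrub("User bob@example.com from 10.0.0.12 failed with id 1234567890")` returns `"User [email] from [ip] failed with id [digits]"`. Request params and user_id need no rule at all, because they are simply never put into the email.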
So what kind of new species is an "Agent inbox"
A normal inbox is for people: you log in, you click in, you reply by hand. An Agent inbox swaps the subject — the subject is a program. What it wants: create mailboxes over an API, get triggered in real time on arrival, hold a verifiable identity to register with other services, and stay isolated from your "human inbox" so the two don't bleed into each other.
Two very different representatives showed up this year.
AgentMail: born for machine-to-machine
AgentMail introduces itself plainly — "API-first email built for AI agents." YC S25, raised $6M led by General Catalyst in early 2026, betting that "every agent should have its own inbox."
What it hands you is a full set of machine interfaces: REST API, Python / TypeScript SDKs, webhooks, WebSocket live events, IMAP/SMTP, and an MCP server. Create mailboxes programmatically, get an event pushed the moment mail arrives, fetch a body or attachment by message id: every action is designed for code, none for "a person clicking a mouse." The free tier gives you 3 mailboxes and 3,000 messages a month, enough to get the loop running.
This is what my setup uses. For an unattended, automated error-collection pipeline, it's smooth enough that you don't have to think about it.
Tencent Agently Mail: an inbox with identity, with a human watching
Agently Mail (agent.qq.com), from Tencent's QQ Mail team, starts from a different place. It stresses giving an agent its own identity, an inbox isolated from your personal one — but at heart it's human-agent collaboration:
- Every write operation (send/reply/forward/delete) goes through a two-stage confirmation — the agent generates a summary of the action first, and only does it for real once you nod;
- Prompt-injection protection when reading mail;
- Onboarding is two steps: install a CLI, then scan a WeChat QR code to authorize. It's used inside an agent's chat window and is already adapted for Claude Code, Cursor, Codex, Kimi, Doubao, and a bunch of other mainstream agents;
- China-native, the CLI is open source on GitHub under Apache-2.0, and it's on Tencent's SkillHub too.
Note: whether it supports webhook push, and its protocol details, aren't well documented yet. Its shape is more "let an agent tend an inbox with an identity for you" than "a background daemon fires the moment mail lands."
So where do the two fundamentally differ
It's not a difference in feature count — it's a difference in worldview. One bets on "an unattended machine bus," the other on "an agent-identity inbox with a human watching."
| Dimension | AgentMail | Tencent Agently Mail |
|---|---|---|
| Design philosophy | unattended machine bus | agent-identity inbox with a human watching |
| Onboarding | REST / SDK / Webhook / WebSocket / MCP | install CLI + WeChat QR, lives in a chat client |
| Autonomy | fully automated send/receive | two-stage human confirm on writes + injection defense |
| Typical shape | you build a backend service to listen | rides inside an agent client to tend mail for you |
| Region / compliance | US-hosted | China-native |
| Openness | commercial SaaS (with SDK / MCP) | CLI open-sourced Apache-2.0 + SkillHub |
| Best for | error collection, crawler signups, agent-to-agent comms — pure pipelines | human-agent handling of daily mail, where it must be controllable and auditable |
For my case the choice is obvious: I want "it does the work when nobody's around," and AgentMail's webhook/polling model fits naturally. But if what I wanted were "have an agent watch my inbox and draft replies for me to send," then Agently Mail's two-stage confirmation is a feature, not a hassle.
A few potholes, while I'm at it (deterministic stuff won't lie to you)
- Occasional dropped mail. Between SendCloud and the inbox, I've watched one letter arrive in a second and another vanish without a trace. For error collection it's tolerable — the same error sends again on the next cooldown window. But it's a reminder: the email pipe can't be fully trusted either; critical paths need retries or reconciliation.
- No git inside the container. The git_sha in production issues started out as unknown: the image has no .git. You never learn this kind of thing until you run it by hand.
- Cross-process dedupe needs a shared cache. Production is multi-container, multi-process; in-process dedupe is no dedupe at all. It only counts once it lands in a shared cache.
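The shared-cache point comes down to one atomic operation: "claim this fingerprint for the cooldown window, and tell me whether I was first." A sketch of that claim follows, with SQLite swapped in as the shared store so the example is self-contained. That is my substitution, not the production setup: a SQLite file only works across processes on one host (or containers sharing a volume), and the table and function names are hypothetical.

```python
import sqlite3
import time
from typing import Optional


def open_cooldown_store(path: str) -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN IMMEDIATE / COMMIT below control the transaction.
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cooldown ("
        "fingerprint TEXT PRIMARY KEY, expires_at REAL NOT NULL)"
    )
    return conn


def claim(
    conn: sqlite3.Connection,
    fingerprint: str,
    cooldown_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Return True for exactly one caller per fingerprint per cooldown window."""
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching rows
    try:
        # Drop this fingerprint's row if its window has already expired.
        conn.execute(
            "DELETE FROM cooldown WHERE fingerprint = ? AND expires_at <= ?",
            (fingerprint, now),
        )
        cursor = conn.execute(
            "INSERT OR IGNORE INTO cooldown (fingerprint, expires_at) VALUES (?, ?)",
            (fingerprint, now + cooldown_seconds),
        )
        claimed = cursor.rowcount == 1  # 0 means another process got there first
        conn.execute("COMMIT")
        return claimed
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Only the caller that gets `True` sends the email. An in-process set gives every worker its own "first," which is exactly the no-dedupe behavior described in the list above.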
Looking ahead: the inbox might be the most underrated infrastructure of the Agent era
Pull the view back a bit. I'm betting a few things happen.
The inbox becomes an agent's "system-level inbox": not just sending and receiving email, but a unified async channel for human↔agent, system↔agent, agent↔agent. My "error→email→handle" setup is just its smallest use case.
From "collecting" toward "self-healing": issue → draft PR → auto-verify → feed back. I've already framed out this chain, and only deliberately haven't turned it all on. The real barrier was never the tech; it's calibrating trust: how much autonomy are you willing to hand a loop driven by a production signal?
Identity and trust become the main battleground: once an agent has a verifiable inbox identity, it can register for services, run OAuth, be audited — and as the capability unlocks, prompt injection, data leaving the country, and the limits of autonomy all become governance questions you have to answer head-on. Tencent's two-stage confirm + injection defense is a bet on exactly this line.
Last
Full circle, back to the question I couldn't answer at the start: this loop that emails on its own and files issues on its own — how much should I trust it? My answer: let it only talk, not act, first; pile up a stretch of real issues, learn its temperament, then hand over autonomy one notch at a time. A 50-year-old protocol paired with a loop that knows restraint may beat the sexy "fully automated" slogan by a mile.
One plug: the thing we're seriously building, leonclass.com, an AI assistant that helps science teachers vet exam questions, is at heart a validator too — judging whether a question is self-consistent and whether it actually tests the intended skill. It's the same thing as this self-healing loop: the capability is never the hard part — the hard part is "do you dare trust it," and the only way to make it trustworthy is to let it talk-but-not-act under your eye first, then hand over power once it's earned.
If you're chewing on verification / education / agents too, come say hi.