leonx.ai Don't trust, verify

To make production errors file their own issues, I gave the system an Agent inbox — I'd actually wanted it to fix the bugs

2026-06-27 · an Agent-first take on project monitoring

The trigger is mundane: a 500 fires in production at 3am, nobody's watching, and by the time a user screenshots it into a group chat and I go digging through logs, the moment's long gone. There's off-the-shelf monitoring for this — I've run Rollbar and dashboards like it. But honestly: the UI is ugly, the free tier is hemmed in with limits, and lifting them means paying. This is a side project. I don't want to pay for it, and I want even less to babysit yet another heavy console.

And it's the AI era now. I'd rather not stand up one more complex UI for a human to stare at. What I wanted to try is the other path — Agent-first: no dashboard, just an agent watching the project for me, in real time.

Concretely, that means using it for project monitoring: have production errors email a robot inbox, and have an agent — a resident process on my machine — check that inbox in real time and act on a hit. The recipient isn't me; it's a program.

Building it, I realized I'd wandered into a new species that only showed up this year — the "inbox built for an Agent." I found two representatives, AgentMail and Tencent's Agently Mail, compared them, and built this on one of them. This post is that build log, and it pulls the two apart along the way.

First, what I actually built

In one line: the moment production Rails throws an unhandled exception, it scrubs, dedupes, and emails an Agent inbox; a resident process on my Mac mini polls that inbox every 10 seconds and, on a hit, opens a GitHub issue with gh.

flowchart LR
  A["Production Rails<br/>unhandled 500"] -->|"Rails.error sub"| B["scrub + fingerprint<br/>cross-process dedupe"]
  B -->|"SolidQueue async"| C["SendCloud SMTP"]
  C --> D[("Agent inbox<br/>AgentMail")]
  D -->|"local daemon, every 10s"| E["launchd resident<br/>Mac mini"]
  E -->|"gh issue create"| F["GitHub Issue<br/>auto-reported"]
  style D fill:#dbeafe,stroke:#0b62f6
  style F fill:#dcfce7,stroke:#15803d

There's a point here that looks roundabout but is actually the whole design's keystone: why email?

Stuck for a second: my service is local and unreachable from the public internet, so a webhook can't push to me. How does it even learn that production broke?

Because my handler runs locally, the public internet can't reach in, so the "you broke, I'll push it to you" webhook model just can't land here. Email, on the other hand, is cross-network, async, and stored by nature: the production side only has to drop a letter in the box, and I fetch it whenever I'm free; neither side needs to know the other's IP. Here the inbox isn't "sending a notification," it's being used as a message bus. And that's exactly the future the "Agent inbox" products are betting on.
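
A minimal sketch of that pull model, assuming the mailbox is reachable through some "list new messages" call. `fetch_new` here is a hypothetical stand-in for whichever client does the listing (vendor SDK, REST, or IMAP); it is not a real AgentMail function, and the 10-second default simply mirrors the interval mentioned above.

```python
import time
from typing import Callable, Iterable, Optional


def poll_inbox(
    fetch_new: Callable[[], Iterable[dict]],
    handle: Callable[[dict], None],
    interval_s: float = 10.0,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Pull-based loop: ask the inbox for new mail, handle it, sleep, repeat.

    fetch_new is a hypothetical callable returning the messages that arrived
    since the last call. max_rounds=None means "run until interrupted".
    Returns the number of messages handled.
    """
    handled = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for message in fetch_new():
            handle(message)
            handled += 1
        rounds += 1
        # Skip the final sleep when a round limit is set (useful in tests).
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_s)
    return handled
```

Only the local side ever opens a connection, which is why nothing here needs a public IP or an open port.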

It really runs. I manually triggered a test exception in the production console, and a few dozen seconds later the daemon picked it up and an issue showed up on its own:

$ python listener.py listen
👂 Polling inbox=no8@…agentmail.to, every 10s. Ctrl-C to quit.
======================================================================
📩 New error email  from: autofix@leonclass…
   subject: [auto] NoMethodError #f89050d759aa
======================================================================
→ gh issue create --label auto-reported …
✓ created  https://github.com/leonx-ai/leonclass/issues/290
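
The step from "new mail" to "`gh issue create`" can be sketched roughly as below. This is not the actual `listener.py`: the subject pattern is inferred from the single subject line in the log above (a 12-character hex fingerprint is an assumption), and the function names are illustrative. The `--title`, `--body`, and `--label` flags are standard `gh issue create` options.

```python
import re
from typing import List, Optional, Tuple

# Assumed subject shape: "[auto] <ErrorClass> #<12 hex chars>".
SUBJECT_RE = re.compile(r"^\[auto\]\s+(?P<error>.+?)\s+#(?P<fp>[0-9a-f]{12})$")


def parse_subject(subject: str) -> Optional[Tuple[str, str]]:
    """Return (error_class, fingerprint), or None for mail that is not ours."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    return match.group("error"), match.group("fp")


def issue_command(subject: str, body: str) -> Optional[List[str]]:
    """Build the argv for `gh issue create`; the caller decides whether to run it."""
    parsed = parse_subject(subject)
    if parsed is None:
        return None  # ignore anything that does not look like an error report
    error_class, fingerprint = parsed
    return [
        "gh", "issue", "create",
        "--title", f"[auto] {error_class} #{fingerprint}",
        "--body", body,
        "--label", "auto-reported",
    ]
```

Passing the result to `subprocess.run(argv, check=True)` as a list, rather than as a shell string, keeps email content from being interpreted by a shell.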

That's the nice thing about deterministic stuff: it won't be polite on your behalf. It reports what should be reported, and doesn't nag you about repeats — the same error is deduped by fingerprint, with a cooldown window so one high-frequency 500 can't flood the inbox. The dedupe isn't kept in memory; it's persisted to a file, so a restart won't re-report:

"f89050d759aa": { "action": "issue", "status": "created",
                  "issue": "…/leonclass/issues/290" }
"05bddfdfed03": { "action": "issue", "status": "created",
                  "issue": "…/leonclass/issues/292" }

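A file-backed dedupe with a cooldown could look roughly like this. The `action`, `status`, and `issue` fields mirror the state excerpt above; the `last_seen` timestamp, the one-hour default, and the class name are assumptions added for illustration. It also folds the sender-side cooldown and the listener-side state file into one store to keep the sketch short.

```python
import json
import os
import tempfile
import time
from typing import Optional


class FingerprintStore:
    """Remembers which fingerprints were reported, across restarts."""

    def __init__(self, path: str, cooldown_s: float = 3600.0):
        self.path = path
        self.cooldown_s = cooldown_s
        self.state = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.state = json.load(fh)

    def should_report(self, fingerprint: str, now: Optional[float] = None) -> bool:
        """True if the fingerprint is new or its cooldown window has passed."""
        now = time.time() if now is None else now
        entry = self.state.get(fingerprint)
        if entry is None:
            return True
        return now - entry.get("last_seen", 0.0) >= self.cooldown_s

    def record(self, fingerprint: str, issue_url: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.state[fingerprint] = {
            "action": "issue",
            "status": "created",
            "issue": issue_url,
            "last_seen": now,  # hypothetical field, not in the original excerpt
        }
        self._save()

    def _save(self) -> None:
        # Write to a temp file, then rename, so a crash cannot leave half a file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.state, fh, indent=2)
        os.replace(tmp_path, self.path)
```
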
I'd actually wanted it to fix the bugs

Honest confession. My first version wasn't "file an issue" at all. What I wrote was: on an error, spin up a headless Claude Code, in an isolated git worktree, to read the code, change it, run the tests, and open a draft PR directly. The whole thing worked end to end.

Then I switched it off and fell back to "just file an issue" (the fix mode code is still there — one env var away).

It's not that it didn't work — it did. It's that another question had to be answered first: this loop that fixes code and opens PRs on its own — how much should I trust it?

Auto-fixing code driven by a production signal sounds sexy, but to leave it on for real I first have to answer: are its fixes correct? Who reviews them? Could an error quietly grow itself a PR and get merged in the middle of the night? Until I can answer those cleanly, "collect first, put it in front of me first" is the honest posture. Don't trust, verify — and this time the thing being verified is the automation I built myself.

The guardrails are tightened in the same spirit: fix mode only ever opens a draft PR, never auto-merges, never touches main, with concurrency capped at 1 by default. Keep the capability around — but the default has to be restraint.
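
Those guardrails can be sketched as a few small checks. The names here are hypothetical: `AUTOFIX_MODE` stands in for the unnamed env var, and the branch naming is up to the caller. `--draft`, `--base`, `--head`, `--title`, and `--body` are standard `gh pr create` flags.

```python
import os
import threading
from typing import List, Mapping

PROTECTED_BRANCHES = {"main", "master"}

# Concurrency capped at 1: at most one fix attempt in flight at a time.
FIX_SLOTS = threading.BoundedSemaphore(1)


def pick_mode(env: Mapping[str, str] = os.environ) -> str:
    """Default to the restrained mode; 'fix' must be opted into explicitly."""
    # AUTOFIX_MODE is a hypothetical variable name.
    return "fix" if env.get("AUTOFIX_MODE", "").lower() == "fix" else "issue"


def draft_pr_command(branch: str, title: str, body: str, base: str = "main") -> List[str]:
    """Build argv for a draft PR from a work branch; never from a protected one."""
    if branch in PROTECTED_BRANCHES:
        raise ValueError(f"refusing to open a PR from protected branch {branch!r}")
    return [
        "gh", "pr", "create", "--draft",
        "--base", base, "--head", branch,
        "--title", title, "--body", body,
    ]


def try_start_fix() -> bool:
    """Claim the single fix slot without blocking; False means one is running."""
    return FIX_SLOTS.acquire(blocking=False)


def finish_fix() -> None:
    FIX_SLOTS.release()
```

Nothing in this sketch can merge: there is simply no code path that builds a merge command, which is the cheapest way to guarantee "never auto-merges."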

And the data. Error info leaves the country (it goes to a US-hosted inbox), so I only send the fingerprint + backtrace + a scrubbed message, with emails, IPs, long tokens, and long digit strings all replaced, and request params and user_id never sent at all. Weaker diagnostics, in exchange for sleeping at night.
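
A rough sketch of that scrubbing, written in Python for illustration even though the real step runs inside the Rails app. The regexes, the placeholder strings, the length thresholds, and the choice to hash the error class plus the top backtrace frames are all assumptions, not the production rules.

```python
import hashlib
import re
from typing import Dict, List

# Order matters: emails first, so their parts are not caught by later rules.
_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[email]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[ip]"),
    (re.compile(r"\b[A-Za-z0-9_-]{24,}\b"), "[token]"),  # assumed: 24+ chars
    (re.compile(r"\b\d{6,}\b"), "[digits]"),             # assumed: 6+ digits
]


def scrub(message: str) -> str:
    """Replace emails, IPv4 addresses, long tokens, and long digit runs."""
    for pattern, placeholder in _RULES:
        message = pattern.sub(placeholder, message)
    return message


def fingerprint(error_class: str, backtrace: List[str], frames: int = 5) -> str:
    """Stable 12-hex-char id from the error class and the top frames."""
    material = "\n".join([error_class, *backtrace[:frames]])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:12]


def build_payload(error_class: str, message: str, backtrace: List[str], **_dropped) -> Dict:
    """Keep only fingerprint, backtrace, and scrubbed message.

    Anything else passed in (request params, user_id, ...) lands in
    _dropped and is never sent.
    """
    return {
        "fingerprint": fingerprint(error_class, backtrace),
        "error_class": error_class,
        "message": scrub(message),
        "backtrace": backtrace,
    }
```

The exception message stays out of the fingerprint on purpose in this sketch: messages often embed ids or values, so hashing them would split one bug into many fingerprints.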

So what kind of new species is an "Agent inbox"

A normal inbox is for people: you log in, you click in, you reply by hand. An Agent inbox swaps the subject — the subject is a program. What it wants: create mailboxes over an API, get triggered in real time on arrival, hold a verifiable identity to register with other services, and stay isolated from your "human inbox" so the two don't bleed into each other.

Two very different representatives showed up this year.

AgentMail: born for machine-to-machine

AgentMail introduces itself plainly — "API-first email built for AI agents." YC S25, raised $6M led by General Catalyst in early 2026, betting that "every agent should have its own inbox."

What it hands you is a full set of machine interfaces: REST API, Python / TypeScript SDKs, webhooks, WebSocket live events, IMAP/SMTP, and an MCP server. Create mailboxes programmatically, get an event pushed the moment mail arrives, fetch a body or attachment by message id: every action is built for code, and none is designed for "a person clicking a mouse." The free tier gives you 3 mailboxes and 3,000 messages a month, enough to get the loop running.

This is what my setup uses. For an unattended, automated error-collection pipeline, it's smooth enough that you don't have to think about it.

Tencent Agently Mail: an inbox with identity, with a human watching

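Agently Mail (agent.qq.com), from Tencent's QQ Mail team, starts from a different place. It stresses giving an agent its own identity, an inbox isolated from your personal one, but at heart it's human-agent collaboration: you install a CLI and scan a WeChat QR code, it lives inside a chat client, and writes go through a two-stage human confirmation with injection defense.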

Note: whether it has webhook push, and the protocol details, haven't been disclosed in much detail yet. Its shape is more "let an agent tend an inbox with an identity for you" than "a background daemon fires the moment mail lands."

So where do the two fundamentally differ

It's not a difference in feature count — it's a difference in worldview. One bets on "an unattended machine bus," the other on "an agent-identity inbox with a human watching."

quadrantChart
  title Two ways to live as an Agent inbox
  x-axis "human-in-loop / needs confirm" --> "unattended / fully automated"
  y-axis "general mail assistant" --> "machine-to-machine pipeline"
  quadrant-1 "automation pipeline"
  quadrant-2 "auto send/receive assistant"
  quadrant-3 "human-agent assistant"
  quadrant-4 "controlled automation"
  "AgentMail": [0.82, 0.8]
  "Agently Mail": [0.28, 0.32]

| Dimension | AgentMail | Tencent Agently Mail |
| --- | --- | --- |
| Design philosophy | unattended machine bus | agent-identity inbox with a human watching |
| Onboarding | REST / SDK / Webhook / WebSocket / MCP | install CLI + WeChat QR, lives in a chat client |
| Autonomy | fully automated send/receive | two-stage human confirm on writes + injection defense |
| Typical shape | you build a backend service to listen | rides inside an agent client to tend mail for you |
| Region / compliance | US-hosted | China-native |
| Openness | commercial SaaS (with SDK / MCP) | CLI open-sourced Apache-2.0 + SkillHub |
| Best for | error collection, crawler signups, agent-to-agent comms: pure pipelines | human-agent handling of daily mail, where it must be controllable and auditable |

For my case the choice is obvious: I want "it does the work when nobody's around," and AgentMail's webhook/polling model fits naturally. But if what I wanted were "have an agent watch my inbox and draft replies for me to send," then Agently Mail's two-stage confirmation is a feature, not a hassle.

A few potholes, while I'm at it (deterministic stuff won't lie to you)

Looking ahead: the inbox might be the most underrated infrastructure of the Agent era

Pull the view back a bit. I'm betting a few things happen.

The inbox becomes an agent's "system-level inbox": not just sending and receiving email, but a unified async channel for human↔agent, system↔agent, agent↔agent. My "error→email→handle" setup is just its smallest use case.

From "collecting" toward "self-healing": issue → draft PR → auto-verify → feed back — I've already framed out this chain, and only deliberately haven't turned it all on. The real barrier was never the tech; it's calibrating trust: how much autonomy are you willing to hand a loop driven by a production signal.

Identity and trust become the main battleground: once an agent has a verifiable inbox identity, it can register for services, run OAuth, be audited — and as the capability unlocks, prompt injection, data leaving the country, and the limits of autonomy all become governance questions you have to answer head-on. Tencent's two-stage confirm + injection defense is a bet on exactly this line.

Last

Full circle, back to the question I couldn't answer at the start: this loop that emails on its own and files issues on its own — how much should I trust it? My answer: let it only talk, not act, first; pile up a stretch of real issues, learn its temperament, then hand over autonomy one notch at a time. A 50-year-old protocol paired with a loop that knows restraint may beat the sexy "fully automated" slogan by a mile.

One plug: the thing we're seriously building, leonclass.com, an AI assistant that helps science teachers vet exam questions, is at heart a validator too — judging whether a question is self-consistent and whether it actually tests the intended skill. It's the same thing as this self-healing loop: the capability is never the hard part — the hard part is "do you dare trust it," and the only way to make it trustworthy is to let it talk-but-not-act under your eye first, then hand over power once it's earned.

If you're chewing on verification / education / agents too, come say hi.

AgentMail, Agently Mail, and other names and logos herein are trademarks of their respective owners. This is an independent technical commentary, unaffiliated with and unendorsed by any of the above. Product logos are taken from their public websites for illustration only; copyright remains with the rights holders.