So I run a homelab. Nothing crazy. A couple Proxmox hypervisors, a Synology NAS, pfSense firewall, a managed Brocade switch, UniFi APs, three Pi-holes, an Emby server… okay, maybe it’s a little crazy. Point is, there’s a lot of stuff generating logs, and I was not reading any of them.
Every morning I’d tell myself “I should check the syslogs today” and every morning I’d get distracted by literally anything else. Months would go by. Something would break, and I’d dig through the logs after the fact like an idiot going “oh, it’s been warning me about this for 6 weeks.”
So I built a bot to do it for me.
What It Actually Does
Every morning at 8 AM, breadAI wakes up and goes to work. It connects to about 30 different data sources: 10 syslog databases, the Pi-holes, both Proxmox nodes, pfSense (via SNMP), the Brocade switch, UniFi controller, Emby, Proxmox Backup Server, and a few others. It pulls everything from the last 24 hours.
The problem is that’s a LOT of data. We’re talking ~11 million characters on a typical day. Most of it is noise. Routine backups completing, DHCP renewals, services phoning home. Nobody needs to read that.
So before any AI touches it, there’s a whole compression pipeline:
- Regex noise filter: 40+ patterns that strip out known-noise lines. If I’ve seen it a thousand times and it never matters, it gets filtered.
- Severity tagging: everything gets tagged [ERR], [WARN], or [INFO]
- Smart dedup: identical messages get collapsed, scored by frequency + severity + how unusual they are compared to the 30-day baseline
- Local LLM summarization: Ollama running llama3.1 on my RTX 3060 does a first-pass summary
By the end of that, ~11 million characters becomes about 2,800. That’s a 99.97% reduction. Only then does it hit the Claude API for actual analysis.
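A minimal sketch of what that pre-filter stage can look like. The patterns and severity keywords below are invented stand-ins, not the bot's actual 40+ rules:

```python
import re

# Hypothetical subset of the noise patterns; the real bot has 40+.
NOISE_PATTERNS = [re.compile(p) for p in (
    r"DHCPACK",                            # routine DHCP renewals
    r"session (opened|closed) for user",   # cron login chatter
    r"backup job .* finished successfully",
)]

def tag_severity(line: str) -> str:
    """Crude keyword-based severity tag."""
    lowered = line.lower()
    if any(w in lowered for w in ("error", "fail", "crit", "panic")):
        return "[ERR]"
    if any(w in lowered for w in ("warn", "degraded", "retry")):
        return "[WARN]"
    return "[INFO]"

def prefilter(lines):
    """Drop known-noise lines, tag the survivors."""
    for line in lines:
        if any(p.search(line) for p in NOISE_PATTERNS):
            continue
        yield f"{tag_severity(line)} {line}"
```

The point of doing this with dumb regexes rather than a model: it's free, it's fast, and it runs before anything with a GPU or an API bill gets involved.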
The Idea
I have an RTX 3060 in one of my servers running Ollama. The idea: use the local model to pre-filter everything first, then send only what matters to Claude (Anthropic’s API). Sending 475k log rows to an AI every morning is expensive and slow. Sending a compressed summary of the interesting stuff is neither. Simple idea in theory. Several months of tinkering in practice.
The Pipeline

Every morning at 8AM the bot collects from ~30 sources — ten syslog databases, UniFi, three Pi-holes, two Proxmox nodes, SabNZBd, Emby, Uptime Kuma, Nginx Proxy Manager, pfSense SNMP, SmokePing.
Then:
Noise gets killed first. Logs get severity tags, timestamps and PIDs get stripped for deduplication, and burst detection flags any device firing more than 100 events per minute — before deduplication, because you need to capture that frequency signal before collapsing it.
Survivors get scored. Severity + frequency + novelty vs. 30-day baseline. High-frequency routine stuff scores low. Rare or severe events score high. Results bucket into three tiers: critical (no cap), volume (150), recent (75). If a tier is still over budget, it fills a third of the cap from the front of the list and the remaining two thirds from the end, so the newest stuff always makes it through.
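A toy version of that scoring and capping logic. The severity weights and the novelty formula are my stand-ins, not the bot's real ones:

```python
SEVERITY_WEIGHT = {"[ERR]": 3.0, "[WARN]": 2.0, "[INFO]": 1.0}

def score(tagged_line, today_count, baseline_count):
    """Severity times novelty: routine chatter scores near its severity
    weight, rare or bursting messages score far above it."""
    sev = SEVERITY_WEIGHT.get(tagged_line.split(" ", 1)[0], 1.0)
    novelty = (today_count + 1) / (baseline_count + 1)
    return sev * novelty

def cap(entries, limit):
    """Over-budget tiers keep a slice from the front plus the tail,
    so the newest entries always survive the cut."""
    if limit is None or len(entries) <= limit:
        return entries
    head = limit // 3
    return entries[:head] + entries[-(limit - head):]
```

A deduped error that fired 12 times against a baseline of 2 outranks a routine info line that fires 400 times a day every day, which is exactly the behavior described above.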
Ollama compresses what’s left. llama3.1 on the local GPU takes the pre-filtered output and compresses it to a structured summary. A few hundredths of a percent of the original data makes it out the other side.
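The local summarization call is just Ollama's standard `/api/generate` endpoint; the prompt wording here is invented, not the bot's actual prompt:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_prompt(filtered_logs: str) -> str:
    return (
        "You are a log triage assistant. Compress the following pre-filtered "
        "syslog excerpt into a structured summary: one line per distinct issue, "
        "keep hostnames, counts, and first/last timestamps, drop routine noise.\n\n"
        + filtered_logs
    )

def local_summarize(filtered_logs: str, model: str = "llama3.1") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(filtered_logs),
        "stream": False,                 # one JSON object instead of a token stream
        "options": {"temperature": 0.2}, # summarization, not creativity
    }).encode()
    req = request.Request(OLLAMA_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream: false`, Ollama returns a single JSON object whose `response` field is the full completion, which keeps the calling code trivial.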
Claude does the actual thinking. The compressed output goes to the API. It acts as a NOC analyst, correlating events across sources, identifying root causes, writing a human-readable summary. Not just “here are the errors” but “NAS3 NFS errors and PM1 disk alerts both started at 02:14, this is probably NAS3, not PM1.”
Discord gets the report. Color-coded embeds, downtime classification, anomalies, 30-day trends, resource forecasts. If something’s actually on fire, I get a DM.
The Brain
The bot maintains persistent memory in a /brain/ directory: seven rolling 30-day datasets covering latency, Pi-hole stats, PBS backup history, Proxmox metrics, Emby watches, UniFi clients, and alert history. It also maintains per-device statistical baselines that update every audit, so novelty scoring gets better over time. There’s a full device registry too: bidirectional IP/MAC/hostname resolution for ~100 devices. When you ask about an event, it already knows that 10.1.11.42 is the Synology.
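A registry like that can be as simple as one dict keyed three ways. The class shape and the single entry below are hypothetical, reusing the Synology example:

```python
class DeviceRegistry:
    """Hypothetical shape: every IP, MAC, and hostname keys the same record."""
    def __init__(self, devices):
        self._by_key = {}
        for dev in devices:
            for key in (dev["ip"], dev["mac"].lower(), dev["hostname"].lower()):
                self._by_key[key] = dev

    def resolve(self, key):
        """Look up a device by IP, MAC, or hostname (case-insensitive)."""
        return self._by_key.get(key.lower())

# One invented entry for illustration:
registry = DeviceRegistry([
    {"ip": "10.1.11.42", "mac": "00:11:32:AA:BB:CC", "hostname": "NAS3"},
])
```

`registry.resolve("nas3")` and `registry.resolve("10.1.11.42")` land on the same record, which is all "bidirectional" needs to mean here.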
Noise Filtering & Alerts
If the bot keeps surfacing something you’ve already triaged, you just tell it to stop:
@breadAI stop reporting UFW blocks from the IoT VLAN
It understands the intent and adds the right suppression filter. No regex, no redeploys.
Alerts have a full state machine: active/resolved status, severity tiers, first/last seen timestamps, occurrence counts, and per-category cooldowns to prevent spam. It tracks crash frequency over 30 days, so instead of “device crashed” it says “5th crash this month.” Alerts auto-resolve at the end of each audit; if a problem clears itself, the bot knows.
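The alert lifecycle can be sketched roughly like this. Field names and cooldown values are mine, not the bot's:

```python
from dataclasses import dataclass, field
import time

@dataclass
class Alert:
    category: str
    message: str
    severity: str = "warn"
    first_seen: float = field(default_factory=time.time)
    last_seen: float = field(default_factory=time.time)
    occurrences: int = 1
    active: bool = True

# Per-category cooldowns, consulted before pinging Discord (values invented):
COOLDOWN_SECONDS = {"crash": 3600, "backup": 86400}

def record(alerts: dict, category: str, message: str) -> Alert:
    """Create or update an alert; repeats bump the count instead of re-notifying."""
    key = (category, message)
    if key in alerts:
        a = alerts[key]
        a.occurrences += 1
        a.last_seen = time.time()
        a.active = True
        return a
    alerts[key] = Alert(category, message)
    return alerts[key]

def auto_resolve(alerts: dict, still_firing: set) -> None:
    """At the end of an audit, anything no longer observed flips to resolved."""
    for key, a in alerts.items():
        if key not in still_firing:
            a.active = False
```

Keeping resolved alerts around (rather than deleting them) is what makes "5th crash this month" possible: the occurrence count and first_seen survive across audits.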
Model Cascade
Not everything needs the heavy model. The bot routes by cost:
- Haiku — forecast review, cheap classification. Fraction of a cent. Skipped entirely if everything looks healthy.
- Sonnet — daily audit, most chat queries, CLIde verification runs
- Opus — complex multi-source correlation when it actually matters
If Sonnet fails, it falls back to Opus, then to Haiku with no tools. Prompt caching reduces costs on repeated context. Temperature: 0.3. Not a creative writing project.
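Stripped of the SDK specifics, the fallback chain is just "try each tier in order." In the real thing each callable would wrap an `anthropic` `client.messages.create` call and the `except` would catch `anthropic.APIError`; this generic sketch keeps it self-contained:

```python
def run_cascade(tiers, prompt):
    """Try each (name, callable) tier in order; return the first success.
    tiers is a list like [("sonnet", ...), ("opus", ...), ("haiku-no-tools", ...)].
    """
    errors = []
    for name, call in tiers:
        try:
            return name, call(prompt)
        except Exception as err:   # real code: anthropic.APIError
            errors.append((name, err))
    raise RuntimeError(f"all models failed: {errors}")
```

The last tier being "Haiku with no tools" matters: a model that can't call tools can still produce a degraded text-only report, which beats producing nothing.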
What You Can Ask It
17 chat tools via @mention. Before answering, it builds a context window from the last audit, active alerts, device registry, infrastructure topology, and operational notes. Then it calls tools live for current data. It also does temporal reasoning — “what was happening around 2am?” gets translated to the right SQL window automatically, and it knows which errors during maintenance windows are expected.
- “What was happening on pfSense around 2am last night?”
- “Is NAS3 healthy?”
- “Are my PBS backups running?”
- “What’s been crashing this month?”
- “Stop alerting me on this firewall rule”
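The "around 2am" translation can be sketched like this. The regex and the 30-minute padding are invented; the real parser presumably handles far more phrasings:

```python
import re
from datetime import datetime, timedelta

def parse_window(query: str, now: datetime, pad_minutes: int = 30):
    """Turn 'around 2am' into a (start, end) window for the SQL query."""
    m = re.search(r"around (\d{1,2})\s*(am|pm)", query.lower())
    if not m:
        return now - timedelta(hours=24), now   # default: last 24 hours
    hour = int(m.group(1)) % 12 + (12 if m.group(2) == "pm" else 0)
    anchor = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if anchor > now:                  # "2am" asked in the morning = earlier today;
        anchor -= timedelta(days=1)   # asked before 2am = yesterday's 2am
    pad = timedelta(minutes=pad_minutes)
    return anchor - pad, anchor + pad
```

The window then parameterizes a plain `WHERE timestamp BETWEEN %s AND %s` against the syslog databases.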

The Audit Audits Itself
Every audit runs a 22-point self-verification pass. Not just “did the script finish” — it checks whether the data it collected is actually trustworthy: database counts, API reachability, RRD file freshness, backup job counts, blocklist sizes, write acceptance. A monitoring system that can fail silently is worse than no monitoring at all. If breadAI can’t reach the PBS API it says so, rather than just not mentioning backups.
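The verification pass boils down to "run named checks, record pass/fail with a reason." The two example checks below use hard-coded stand-in values where the real ones would query a database or hit an API:

```python
def run_verification(checks):
    """Run every named check; record pass/fail plus a reason, so the report
    can say *why* a source is untrustworthy instead of silently omitting it."""
    results = {}
    for name, fn in checks.items():
        try:
            ok, detail = fn()
        except Exception as err:        # a crashing check is itself a failure
            ok, detail = False, f"check crashed: {err}"
        results[name] = (ok, detail)
    return results

# Two of the 22 checks, values invented for illustration:
def check_db_counts():
    rows = 481000                       # would come from a count(*) on today's data
    return rows > 10000, f"{rows} rows collected"

def check_pbs_api():
    reachable = True                    # would be a real probe of the PBS endpoint
    return reachable, "PBS API reachable"
```

The crash handler is the important part: a verification check that dies must register as a failure, or the self-audit inherits the same silent-failure problem it exists to catch.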
Meet CLIde
This is the part I think matters most. LLMs are great at summarizing data. They’re also great at sounding confident about things that aren’t quite right — overstating a spike, misattributing a metric, reporting something that was a transient blip in a 24-hour average. When your audit system is the thing you rely on to catch problems, you can’t just trust it blindly.
So I added a second AI to check the first one’s work.
After the primary report generates, I extract every flagged finding into a structured list with specific verification hints — commands to run, API endpoints to hit, files to check. That list goes via SSH to a separate LXC container where Claude Code CLI is installed. This instance — CLIde — gets no project context, no memory files. Just a list of things to verify and tools to verify them with.
It then actually runs the checks. Curls Pi-hole APIs. SSHes into NAS3. Reads real logs. Checks real database state. Each finding comes back CONFIRMED, CONTRADICTED, or UNVERIFIABLE.
Two AI instances. Neither can see what the other saw. One summarizes, one verifies against ground truth. On a quiet night when nothing is wrong, CLIde just says “Tutto bene, boss.” Yeah, I gave it a mafia persona for all-clear responses. Life’s too short for “No significant issues or alerts to report.”

Anomaly Detection
Multidimensional — different thresholds for different signal types. Traffic spikes vs. 30-day baseline, latency degradation including “last 3 days is 2x the prior 7 days,” storage growth with a sawtooth filter so a manual cleanup doesn’t register as a false positive, and per-host CPU/RAM swing detection. Requires 7 days of history before flagging traffic anomalies — no spam while it’s still learning what normal looks like.
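The latency rule can be written directly from that description. The 10-day warm-up here is my stand-in for the bot's actual minimum-history rule:

```python
from statistics import mean

def latency_degraded(daily_ms, factor=2.0):
    """Flag when the last 3 days' mean latency is >= factor x the prior 7 days.
    Needs at least 10 days of history before it will flag anything."""
    if len(daily_ms) < 10:
        return False        # still learning what normal looks like
    recent = mean(daily_ms[-3:])
    baseline = mean(daily_ms[-10:-3])
    return recent >= factor * baseline
```

Comparing window means rather than single samples is what keeps a one-off bad ping from tripping the detector.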
Forecasting

Linear regression across 30-day datasets for PBS datastore growth, Proxmox resources, SabNZBd disk usage, and WAN latency. Each forecast gets a confidence score and plain-English output: “PBS datastore at current fill rate: full in ~73 days.” Every prediction gets recorded and automatically verified when the date arrives — accuracy tracked by forecast type and confidence level.
WAN health uses a majority-vote check across 3 external targets per link. Two of three showing elevated latency = WAN issue. One of three = that host’s problem. It doesn’t page me because Cloudflare had a blip.
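The majority vote is a few lines; the 80 ms threshold below is an invented placeholder:

```python
def classify_latency(samples_ms, threshold_ms=80.0):
    """3 external targets per WAN link: 2+ elevated = link problem,
    exactly 1 = that target's problem, 0 = healthy."""
    elevated = sum(1 for s in samples_ms if s > threshold_ms)
    if elevated >= 2:
        return "wan-degraded"
    if elevated == 1:
        return "single-target"
    return "healthy"
```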
Resilience
Circuit breakers on all 10 external APIs — if one’s down, the audit continues without it rather than hanging.
Dual-write to both JSON and PostgreSQL. If the database is down, the bot keeps running on JSON. Divergence detection flags it if the two stores drift more than 10 records apart.
Graceful degradation throughout — Ollama offline falls back to raw bucketing, database offline falls back to JSON only. The audit always runs and always notes what’s missing.
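A minimal circuit breaker of the kind described; the failure count and cooldown defaults are invented:

```python
import time

class CircuitBreaker:
    """After max_failures consecutive errors the source is skipped for
    cooldown seconds, so one dead API can't hang or spam the audit."""
    def __init__(self, max_failures=3, cooldown=600):
        self.max_failures, self.cooldown = max_failures, cooldown
        self.failures, self.opened_at = 0, None

    def call(self, fn, default=None):
        if self.opened_at and time.time() - self.opened_at < self.cooldown:
            return default            # breaker open: skip, audit continues
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            return default            # this failure: degrade, don't crash
        self.failures, self.opened_at = 0, None   # success resets the breaker
        return result
```

Returning a `default` instead of raising is what "the audit continues without it" means in practice: the report just notes the gap.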
Lessons Learned
The GPU is shared. The audit runs at 8AM. @mention the bot at exactly 8AM and you’re waiting in line.
Portainer can’t read .env files from NFS. All variables have to go inline in the compose file.
The Anthropic SDK throws a cryptic tuple error if you pass system prompts the wrong way when prompt caching is configured. Plain string, not a list.
Discord embed fields have a 1024 character limit. Claude doesn’t know this. Handle truncation yourself or alerts end mid-sentence.
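A sketch of the truncation helper I mean, cutting on a line boundary so the embed at least admits it was cut:

```python
DISCORD_FIELD_LIMIT = 1024   # hard cap on a Discord embed field value

def fit_field(text: str, limit: int = DISCORD_FIELD_LIMIT) -> str:
    """Truncate on a line boundary and say so, instead of ending mid-sentence."""
    if len(text) <= limit:
        return text
    marker = "\n… (truncated)"
    cut = text.rfind("\n", 0, limit - len(marker))
    if cut == -1:                      # no newline to cut on: hard cut
        cut = limit - len(marker)
    return text[:cut] + marker
```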
Use Haiku for forecast review. No reason to pay for the heavy model when you’re just asking “anything concerning here?”
Commit before switching machines. Learned this the hard way. Then learned it again.
Current State
v0.5.3c, running 24/7 in Docker on Synology NAS since December 2025. ~7,200 lines across 14 Python modules. Ten circuit breakers. Twenty-two verification checks per audit. Seven rolling historical datasets. Two AI instances keeping each other honest.
Overengineered? Almost certainly. Does it work? Yeah. I haven’t been paged for something stupid in months.