Deployment Hell: How My AI CTO Fixed a Broken ctrlman.dev Deployment
- Ctrl Man
- DevOps, AI Agents, Postmortem, Automation
- 28 Mar, 2026
The 2 AM Crisis
It was late evening when everything went dark. Both ctrlman.dev and app.ctrlman.dev were returning 502 Bad Gateway errors. The VPS was reachable, but nothing on it was working: SSH kept prompting for a passphrase, the disk was full, MongoDB was missing, and nginx was misconfigured.
In the past, I would have spent hours debugging alone, jumping between terminals, Googling error messages, and slowly piecing together what went wrong.
But this time, I did something different. I opened Telegram and sent a message to my AI CTO.
Enter Hermes: My AI CTO
Hermes Agent is an autonomous AI assistant that lives on my machines. Think of it as having a 24/7 CTO who:
- Never sleeps
- Can execute commands directly
- Remembers every deployment war story
- Coordinates multiple AI models for complex tasks
- Reports progress via Telegram
Our Workflow:
CEO (Mario) AI CTO (Hermes)
│ │
├─ "Server is down, fix it" ───►│
│ ├─ Diagnoses via SSH
│ ├─ Identifies 6 failures
│ ├─ Executes fixes
│ └─ Reports via Telegram
│◄─ "Fixed. Here's what happened" ─┤
- Communication: Telegram (voice messages, screenshots, logs)
- Execution: Direct shell access with my SSH keys
- Models: Qwen 3.5 Plus (reasoning) + Qwen 4B (execution on RTX 3060)
The Cascade of Failures
Hermes connected to the Hetzner VPS and immediately identified six cascading failures. Here’s how we fixed them, together:
1. The SSH Connection Block
Symptom: SSH commands were asking for a passphrase, even though I’d removed it.
Hermes Diagnosed: The SSH agent had an old cached key with a passphrase.
The Fix:
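# Remove the passphrase from the key file (prompts for the old passphrase)
ssh-keygen -p -f ~/.ssh/id_ed25519 -N ""
# Start a fresh agent (it holds no cached keys) and load the updated key
eval "$(ssh-agent -s)" && ssh-add ~/.ssh/id_ed25519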
Telegram Update: “SSH agent had stale key. Cleared cache and reloaded. Now connected.”
Lesson: Always verify the SSH agent has the correct key loaded before deployment scripts run.
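The check from this lesson could be scripted roughly as follows. This is a minimal sketch, not the script Hermes actually runs: the function name `agent_has_key` is hypothetical, and it assumes `ssh-keygen` and `ssh-add` are on the PATH. It compares the fingerprint of the key file with the fingerprints the agent reports.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the running ssh-agent holds the given key.
# Returns 0 if loaded, 1 if not loaded (or no agent is reachable),
# 2 if no fingerprint could be read from the key file.
agent_has_key() {
  local key_file="$1" fingerprint
  # ssh-keygen -lf prints "<bits> <fingerprint> <comment> (<type>)"; keep field 2.
  fingerprint=$(ssh-keygen -lf "$key_file" 2>/dev/null | awk '{print $2}')
  [ -n "$fingerprint" ] || return 2
  # ssh-add -l lists the loaded keys in the same format.
  ssh-add -l 2>/dev/null | grep -qF "$fingerprint" || return 1
}
```

A deploy script could then run `agent_has_key ~/.ssh/id_ed25519 || ssh-add ~/.ssh/id_ed25519` before its first SSH command.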
2. Disk Space Exhaustion (100% Full)
Symptom: apt-get install failed with “no free space in /var/cache/apt/archives/”
Hermes Diagnosed: Old project directories were consuming 11GB:
- /home/webadmin/astroplate-combo2 (11GB)
- /home/webadmin/proto-parsec (404MB)
The Fix:
df -h /
du -sh /home/webadmin/* | sort -rh | head -10
rm -rf /home/webadmin/astroplate-combo2
rm -rf /home/webadmin/proto-parsec
Result: Freed 11GB, disk usage dropped from 100% to 69%.
Telegram Update: “Disk was full. Old deployments consumed 11GB. Cleaned up. Now at 69%.”
Lesson: Add disk space checks to deployment pre-flight. Automate cleanup of old deployments.
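A pre-flight disk check and a cautious first step toward automated cleanup might look like the sketch below. The function names, the 90% default threshold, and the 30-day default are assumptions for illustration. The cleanup helper only lists candidates and never deletes anything.

```shell
#!/bin/bash
# Print the usage percentage (digits only) of the filesystem holding the given path.
disk_usage_percent() {
  # -P forces POSIX single-line output, so the capacity is always field 5.
  df -P "$1" | awk 'NR==2 {sub(/%/, "", $5); print $5}'
}

# Succeed only if usage is at or below the given maximum (default 90).
disk_ok() {
  local usage
  usage=$(disk_usage_percent "$1")
  [ -n "$usage" ] && [ "$usage" -le "${2:-90}" ]
}

# Dry run: list top-level directories under a base path that were last
# modified more than the given number of days ago (default 30).
# Nothing is deleted; review the list before removing anything.
list_stale_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"${2:-30}"
}
```

For example, `disk_ok / 90 || exit 1` aborts a deploy early, and `list_stale_dirs /home/webadmin 30` shows cleanup candidates. A directory’s modification time only changes when entries are added or removed inside it, so treat the list as a hint rather than proof that a deployment is unused.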
3. The Missing Database
Symptom: App crashed immediately with MongoDB connection error: connect ECONNREFUSED ::1:27017
Hermes Diagnosed: MongoDB was never installed during initial server setup. The app kept restarting because it couldn’t connect.
The Fix:
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | gpg --dearmor -o /usr/share/keyrings/mongodb-server-7.0.gpg
echo 'deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse' > /etc/apt/sources.list.d/mongodb-org-7.0.list
apt-get update && apt-get install -y mongodb-org
systemctl start mongod
systemctl enable mongod
Telegram Update: “MongoDB was never installed. Installing now. Service started and enabled.”
Lesson: Database installation should be part of server bootstrap, not app deployment. Add health checks.
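One way to add the health check this lesson mentions is to wait for the database port before starting the app. The sketch below is an illustration under assumptions: the function names are hypothetical, it relies on bash’s `/dev/tcp` redirection and the coreutils `timeout` command, and it only proves that something accepts connections on the port, not that MongoDB is answering queries.

```shell
#!/bin/bash
# Succeed if something accepts TCP connections on host:port within 2 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra client is needed.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Poll until the port opens, trying up to the given number of times
# (default 10) with a one-second pause between tries.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    port_open "$host" "$port" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A start script could call `wait_for_port 127.0.0.1 27017 || exit 1` so the app fails fast with a clear cause instead of restarting in a loop.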
4. Port Mismatch & Configuration Nightmares
Symptom: app.ctrlman.dev still wouldn’t respond.
Hermes Diagnosed: Three different ports in use:
- .env file: PORT=8081
- Nginx upstream: localhost:4321
- PM2: Running on who-knows-what
The Fix:
sed -i 's/PORT=8081/PORT=4321/' /home/webadmin/proto-parsec-v2/.env
cd /home/webadmin/proto-parsec-v2
pm2 restart proto-parsec-app-v2 --update-env
Telegram Update: “Port mismatch found. .env had 8081, nginx expected 4321. Fixed and restarted.”
Lesson: Standardize PORT across .env, nginx, and PM2. Add port verification to deploy scripts.
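The port verification could be as simple as comparing the two files before restarting anything. This is a sketch with hypothetical function names. It assumes the nginx site forwards traffic with a `proxy_pass http://host:port` line; the real nginx config is not shown in this post, so adjust the pattern if yours uses an `upstream` block.

```shell
#!/bin/bash
# Print the PORT value from a dotenv-style file (first match only).
env_port() {
  sed -n 's/^PORT=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print the port of the first "proxy_pass http://host:port" line in an nginx config.
# Assumption: the site uses proxy_pass with an explicit port.
nginx_proxy_port() {
  sed -n 's|.*proxy_pass[[:space:]]*http://[^:;]*:\([0-9][0-9]*\).*|\1|p' "$1" | head -n 1
}

# Succeed only if both files name the same port.
ports_match() {
  local app_port proxy_port
  app_port=$(env_port "$1")
  proxy_port=$(nginx_proxy_port "$2")
  [ -n "$app_port" ] && [ "$app_port" = "$proxy_port" ]
}
```

Called as `ports_match /home/webadmin/proto-parsec-v2/.env <nginx site config> || exit 1`, this would have caught the 8081 vs 4321 mismatch before the restart.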
5. Wrong Entry Point & PM2 Persistence
Symptom: PM2 processes wouldn’t survive reboots.
Hermes Diagnosed: Two issues:
- PM2 was started with dist/server/entry.mjs (wrong - that’s Astro build output)
- PM2 wasn’t linked to system startup
The Fix:
# Correct entry point for Astro SSR
pm2 delete astroplate-landing-v2
cd /home/webadmin/astroplate-combo2-v2
pm2 start server.mjs --name astroplate-landing-v2
# Enable persistence across reboots
# (run as a non-root user, pm2 startup prints a sudo command that must be run once)
pm2 startup
pm2 save
Telegram Update: “PM2 was using wrong entry point. Fixed to server.mjs. Enabled startup persistence.”
Lesson: For this project, start PM2 from server.mjs, not from the dist/server/entry.mjs build output. Run pm2 startup and pm2 save so the processes survive reboots.
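Whether persistence is really in place can be checked after the fact. The sketch below uses hypothetical function names and rests on two assumptions: PM2 uses its default home directory, so `pm2 save` writes `~/.pm2/dump.pm2`, and the server uses systemd, where `pm2 startup` creates a unit named `pm2-<user>`.

```shell
#!/bin/bash
# Succeed if a non-empty PM2 dump file exists, i.e. "pm2 save" has been run.
# Assumption: default PM2 home, so the dump lives at ~/.pm2/dump.pm2.
pm2_dump_saved() {
  [ -s "${1:-$HOME/.pm2/dump.pm2}" ]
}

# Succeed if the systemd unit created by "pm2 startup" is enabled.
# Assumption: systemd init and the default unit name pm2-<user>.
pm2_unit_enabled() {
  systemctl is-enabled --quiet "pm2-${1:-$(id -un)}"
}
```

Running `pm2_dump_saved && pm2_unit_enabled` as the deploy user reports whether both halves of the persistence setup are present.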
6. Nginx Configuration Gaps
Symptom: Still getting 502 on app.ctrlman.dev.
Hermes Diagnosed: Nginx config was correct, but needed a reload after all the changes.
The Fix:
# Reload only if the config test passes
nginx -t && systemctl reload nginx
Telegram Update: “Nginx config valid. Reloaded. All services healthy. Deployment complete.”
Lesson: Always test nginx config and reload after changes.
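A reload that succeeds does not prove the 502 is gone, so it helps to request the site afterwards. This is a minimal sketch with hypothetical function names, assuming `curl` is installed. It treats any 2xx or 3xx response as healthy.

```shell
#!/bin/bash
# Print the HTTP status code for a URL; curl reports 000 when no response arrives.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Succeed only on a 2xx or 3xx status; a 502 from nginx fails here.
http_ok() {
  case "$(http_status "$1")" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}
```

Chained as `nginx -t && systemctl reload nginx && http_ok https://app.ctrlman.dev`, the command only exits cleanly when the site actually answers.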
The Final Checklist
After this war story, Hermes and I created an automated pre-flight check:
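#!/bin/bash
# Deployment Pre-Flight (auto-run by Hermes)
# Fail if the root filesystem is more than 90% full (-P keeps the output on one line)
df -P / | awk 'NR==2 {if ($5+0 > 90) exit 1}' || exit 1
# Fail if MongoDB is not running
systemctl is-active --quiet mongod || exit 1
# Fail if the app health endpoint errors out (-f makes curl fail on HTTP 4xx/5xx)
curl -fsS -o /dev/null http://localhost:4321/health || exit 1
# Fail if the nginx config is invalid
nginx -t || exit 1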
Key Takeaways
| # | Issue | Solution | Automated? |
|---|---|---|---|
| 1 | SSH agent cache | Clear and reload key | ✅ Yes |
| 2 | Full disk (100%) | Clean old deployments | ✅ Yes (cron) |
| 3 | Missing MongoDB | Install during bootstrap | ✅ Yes |
| 4 | Port mismatch | Standardize to 4321 | ✅ Yes |
| 5 | PM2 wrong entry | Use server.mjs | ✅ Yes |
| 6 | Nginx not reloaded | Test + reload | ✅ Yes |
The New Workflow: CEO + AI CTO
This deployment disaster became the catalyst for a completely new way of working:
Before (Solo Founder Struggle)
Problem → Google → Trial & Error → 4 hours later → Maybe fixed
After (CEO + AI CTO Partnership)
Problem → Telegram message → AI CTO diagnoses → Executes fixes → Reports back → 20 minutes → Done
Hermes Now Handles:
- ✅ Deployment automation (no more manual SSH)
- ✅ Health monitoring (disk, services, ports)
- ✅ Article writing (session files → blog posts via Qwen 4B on RTX 3060)
- ✅ Multi-agent coordination (Qwen 3.5 Plus for reasoning, Qwen 4B for execution)
- ✅ Cron-based publishing (2 articles/day at 09:00 & 18:00)
The Bigger Vision
This isn’t just about fixing deployments. It’s about building a scalable AI-first company:
- CEO (Me): Strategy, vision, product decisions, user relationships
- AI CTO (Hermes): Execution, automation, monitoring, documentation, content
- Communication: Telegram (async, voice-friendly, mobile-first)
- Infrastructure: Multi-machine (local + RTX 3060 remote for heavy tasks)
- Content Pipeline: Session files → Qwen 4B → Blog articles → Published automatically
Result: I can focus on building the product while Hermes handles the operational complexity.
What’s Next
We’re now building:
- Automated health checks - Hermes monitors all services and alerts via Telegram
- Self-healing deployments - Auto-rollback on failure detection
- Content automation - 2 blog posts/day generated from session files
- Multi-agent workflows - Qwen 3.5 Plus for reasoning, Qwen 4B for execution, Kimi for review
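To make the auto-rollback idea concrete, here is one common pattern: each release lives in its own directory and a symlink points at the live one. This is a hedged sketch, not what Hermes does today. The symlink layout and the function names are assumptions, and a real deployment would also restart the PM2 process after each switch.

```shell
#!/bin/bash
# Point the "current" symlink (first argument) at a release directory (second argument).
switch_release() {
  ln -sfn "$2" "$1"
}

# Usage: deploy_with_rollback LINK NEW_RELEASE HEALTH_COMMAND [ARGS...]
# Switch to the new release, run the health command, and switch back
# to the previous release if the health command fails.
deploy_with_rollback() {
  local link="$1" new_release="$2" previous
  shift 2
  # Remember where the symlink pointed; empty if there was no previous release.
  previous=$(readlink "$link") || previous=""
  switch_release "$link" "$new_release"
  if "$@"; then
    return 0
  fi
  if [ -n "$previous" ]; then
    switch_release "$link" "$previous"
  fi
  return 1
}
```

With hypothetical paths, `deploy_with_rollback /srv/app/current /srv/app/releases/v2 curl -fsS -o /dev/null http://localhost:4321/health` would leave `current` on the old release whenever the health request fails.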
The deployment gods weren’t on our side that night. But having an AI CTO who never sleeps? That’s better than luck. 🛠️
Have you tried working with AI agents for DevOps? I’d love to hear your experience. Find me on Telegram @ctrlman_dev or drop a comment below.
About the Author: Mario is the CEO of ctrlman.dev, building productivity tools and AI agent workflows. He coordinates daily with his AI CTO Hermes via Telegram to ship features, fix deployments, and publish content - all while focusing on product vision and user needs.