Deployment Hell: How My AI CTO Fixed a Broken ctrlman.dev Deployment

The 2 AM Crisis

Late one night, everything went dark. Both ctrlman.dev and app.ctrlman.dev were returning 502 Bad Gateway errors. The VPS itself was reachable, but nothing on it worked. SSH kept prompting for a passphrase, the disk was full, and MongoDB was missing. On top of that, the app, PM2, and nginx no longer agreed on how to reach each other.

In the past, I would have spent hours debugging alone, jumping between terminals, Googling error messages, and slowly piecing together what went wrong.

But this time, I did something different. I opened Telegram and sent a message to my AI CTO.


Enter Hermes: My AI CTO

Hermes Agent is an autonomous AI assistant that lives on my machines. Think of it as having a 24/7 CTO who:

  • Never sleeps
  • Can execute commands directly
  • Remembers every deployment war story
  • Coordinates multiple AI models for complex tasks
  • Reports progress via Telegram

Our Workflow:

CEO (Mario)                    AI CTO (Hermes)
    │                               │
    ├─ "Server is down, fix it" ───►│
    │                               ├─ Diagnoses via SSH
    │                               ├─ Identifies 6 failures
    │                               ├─ Executes fixes
    │                               └─ Reports via Telegram
    │◄─ "Fixed. Here's what happened" ─┤

Communication: Telegram (voice messages, screenshots, logs)
Execution: Direct shell access with my SSH keys
Models: Qwen 3.5 Plus (reasoning) + Qwen 4B (execution on RTX 3060)


The Cascade of Failures

Hermes connected to the Hetzner VPS and immediately identified six cascading failures. Here's how we worked through them together:

1. The SSH Connection Block

Symptom: SSH commands were asking for a passphrase, even though I’d removed it.

Hermes Diagnosed: The SSH agent had an old cached key with a passphrase.

The Fix:

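ssh-keygen -p -f ~/.ssh/id_ed25519 -N ""              # strip the passphrase (asks for the old one if still set)
eval "$(ssh-agent -s)" && ssh-add ~/.ssh/id_ed25519   # start a fresh agent with no stale keys, then load the key
ssh-add -l                                            # verify the expected fingerprint is now loaded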

Telegram Update: “SSH agent had stale key. Cleared cache and reloaded. Now connected.”

Lesson: Always verify the SSH agent has the correct key loaded before deployment scripts run.
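One way to make that check explicit is to compare the fingerprint of the key the deploy should use with what ssh-add -l reports. The sketch below is a minimal Bash example under stated assumptions: the function names fingerprint_listed and agent_has_key are hypothetical, and it assumes the public key sits next to the private key as ~/.ssh/id_ed25519.pub.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy guard: is the key we expect actually loaded in ssh-agent?

# fingerprint_listed FP: read "ssh-add -l" style lines on stdin and succeed
# if FP (for example "SHA256:...") is the second field of any line.
fingerprint_listed() {
  awk -v fp="$1" '$2 == fp { found = 1 } END { exit !found }'
}

# agent_has_key PUBKEY: compare the key file's fingerprint with the agent's list.
agent_has_key() {
  local fp
  fp=$(ssh-keygen -lf "$1" 2>/dev/null | awk '{ print $2 }')
  [ -n "$fp" ] || return 1   # missing or unreadable key file
  ssh-add -l 2>/dev/null | fingerprint_listed "$fp"
}

# Usage at the top of a deploy script (hypothetical):
#   agent_has_key ~/.ssh/id_ed25519.pub || { echo "expected SSH key not in agent" >&2; exit 1; }
```

Both ssh-keygen -l and ssh-add -l print the fingerprint as the second field, so comparing that field is enough. This avoids matching on key comments, which are free text.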


2. Disk Space Exhaustion (100% Full)

Symptom: apt-get install failed with “no free space in /var/cache/apt/archives/”

Hermes Diagnosed: Old project directories were consuming 11GB:

  • /home/webadmin/astroplate-combo2 (11GB)
  • /home/webadmin/proto-parsec (404MB)

The Fix:

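df -h /                                         # confirm the root filesystem is full
du -sh /home/webadmin/* | sort -rh | head -10   # list the ten largest directories
# Remove the superseded deployments (the live apps run from the -v2 directories)
rm -rf /home/webadmin/astroplate-combo2
rm -rf /home/webadmin/proto-parsec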

Result: About 11GB freed; disk usage dropped from 100% to 69%.

Telegram Update: “Disk was full. Old deployments consumed 11GB. Cleaned up. Now at 69%.”

Lesson: Add disk space checks to deployment pre-flight. Automate cleanup of old deployments.
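A reusable version of that pre-flight disk check might look like the Bash sketch below. The function names and passing the limit as an argument are assumptions, not part of the original setup. It uses df -P so a long device name cannot wrap a filesystem's line in two and shift the columns.

```bash
#!/usr/bin/env bash
# disk_usage_pct PATH: print the used-space percentage (digits only) of the
# filesystem holding PATH. -P forces POSIX output: one line per filesystem.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH MAX: fail with a message when usage is above MAX percent.
check_disk() {
  local pct
  pct=$(disk_usage_pct "$1")
  if [ -z "$pct" ] || [ "$pct" -gt "$2" ]; then
    echo "disk check failed: $1 is at ${pct:-unknown}% (limit $2%)" >&2
    return 1
  fi
}

# Usage in a pre-flight script:
#   check_disk / 90 || exit 1
```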


3. The Missing Database

Symptom: App crashed immediately with MongoDB connection error: connect ECONNREFUSED ::1:27017

Hermes Diagnosed: MongoDB was never installed during initial server setup. The app kept restarting because it couldn’t connect.

The Fix:

# Run as root: add MongoDB's official 7.0 apt repository for Ubuntu 22.04 ("jammy")
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | gpg --dearmor -o /usr/share/keyrings/mongodb-server-7.0.gpg
echo 'deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse' > /etc/apt/sources.list.d/mongodb-org-7.0.list
apt-get update && apt-get install -y mongodb-org

# Start MongoDB now and on every boot
systemctl start mongod
systemctl enable mongod

Telegram Update: “MongoDB was never installed. Installing now. Service started and enabled.”

Lesson: Database installation should be part of server bootstrap, not app deployment. Add health checks.
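systemctl is-active only reports that the unit is running. A stronger health check confirms that something actually accepts connections on MongoDB's port (27017, as in the error above). Here is a minimal Bash sketch. wait_for_tcp is a hypothetical helper name, and it assumes GNU coreutils timeout and a Bash built with /dev/tcp support (the default on Ubuntu).

```bash
#!/usr/bin/env bash
# wait_for_tcp HOST PORT [TRIES]: succeed once HOST:PORT accepts a TCP
# connection, retrying about once per second; fail after TRIES attempts (default 30).
wait_for_tcp() {
  local host="$1" port="$2" tries="${3:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    # Redirecting /dev/tcp/HOST/PORT makes bash open a TCP connection;
    # timeout stops a single attempt from hanging on an unresponsive host.
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then sleep 1; fi
    i=$((i + 1))
  done
  echo "nothing accepted connections on ${host}:${port} after ${tries} attempts" >&2
  return 1
}

# Usage in a bootstrap script, after "systemctl start mongod":
#   wait_for_tcp 127.0.0.1 27017 || exit 1
```

Probing the socket rather than trusting the unit state catches two cases: a service that is still starting up, and one that is active but listening somewhere the app does not expect.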


4. Port Mismatch & Configuration Nightmares

Symptom: app.ctrlman.dev still wouldn’t respond.

Hermes Diagnosed: Three places defined the port, and they disagreed:

  • .env file: PORT=8081
  • Nginx upstream: localhost:4321
  • PM2: unclear which port the running process had actually picked up

The Fix:

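# Rewrite the PORT line (anchored, so variables such as DB_PORT are left alone)
sed -i 's/^PORT=.*/PORT=4321/' /home/webadmin/proto-parsec-v2/.env
cd /home/webadmin/proto-parsec-v2
pm2 restart proto-parsec-app-v2 --update-env   # restart; --update-env also refreshes env vars from this shell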

Telegram Update: “Port mismatch found. .env had 8081, nginx expected 4321. Fixed and restarted.”

Lesson: Standardize PORT across .env, nginx, and PM2. Add port verification to deploy scripts.
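The port verification this lesson calls for could compare the PORT line in .env against the local ports nginx proxies to. The Bash sketch below is one minimal way to do it. The function names are hypothetical, and it makes two assumptions: an unquoted PORT=NNNN line, and one nginx site file per app (a file that proxies to several local ports is reported as a mismatch). The PM2 side is best confirmed by actually requesting the port, as the pre-flight health check does.

```bash
#!/usr/bin/env bash
# env_port FILE: print the value of the last unquoted PORT= line in a dotenv file.
env_port() {
  sed -n 's/^PORT=\([0-9]*\).*/\1/p' "$1" | tail -n 1
}

# nginx_upstream_ports FILE: print each localhost/127.0.0.1 port referenced in an
# nginx config (proxy_pass URLs or upstream "server" lines), one per line.
nginx_upstream_ports() {
  grep -oE '(localhost|127\.0\.0\.1):[0-9]+' "$1" | cut -d: -f2 | sort -u
}

# ports_agree ENV_FILE NGINX_CONF: fail unless nginx proxies only to the .env PORT.
ports_agree() {
  local want got
  want=$(env_port "$1")
  got=$(nginx_upstream_ports "$2" | xargs)   # join multiple ports with spaces
  if [ -z "$want" ] || [ "$got" != "$want" ]; then
    echo "port mismatch: .env PORT=${want:-unset}, nginx upstream(s): ${got:-none}" >&2
    return 1
  fi
}

# Usage (paths are illustrative):
#   ports_agree /home/webadmin/proto-parsec-v2/.env /etc/nginx/sites-available/app.ctrlman.dev || exit 1
```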


5. Wrong Entry Point & PM2 Persistence

Symptom: PM2 processes wouldn’t survive reboots.

Hermes Diagnosed: Two issues:

  1. PM2 was started with dist/server/entry.mjs (the Astro build output) instead of the project's own server.mjs
  2. PM2 wasn’t linked to system startup

The Fix:

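# Re-register the landing site with the project's server.mjs as its entry point
pm2 delete astroplate-landing-v2
cd /home/webadmin/astroplate-combo2-v2
pm2 start server.mjs --name astroplate-landing-v2

# Enable persistence across reboots
pm2 startup   # as root, installs the boot service; otherwise prints a sudo command to run
pm2 save      # snapshot the current process list so it is restored at boot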

Telegram Update: “PM2 was using wrong entry point. Fixed to server.mjs. Enabled startup persistence.”

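Lesson: Point PM2 at the entry point your deployment actually uses. Here that was the project's own server.mjs. With the @astrojs/node adapter in standalone mode, however, dist/server/entry.mjs is itself the documented entry point, so check which mode your project uses. Then run pm2 startup once, and run pm2 save whenever the process list changes.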


6. Nginx Configuration Gaps

Symptom: Still getting 502 on app.ctrlman.dev.

Hermes Diagnosed: Nginx config was correct, but needed a reload after all the changes.

The Fix:

nginx -t
systemctl reload nginx

Telegram Update: “Nginx config valid. Reloaded. All services healthy. Deployment complete.”

Lesson: Always test nginx config and reload after changes.
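The symptom here was a 502, so the reload can be followed by a request that confirms the site has actually recovered. Below is a minimal Bash sketch. The function names and the one-second pause are assumptions, and nginx -t and systemctl reload need root.

```bash
#!/usr/bin/env bash
# http_ok URL: succeed only if URL answers with a 2xx or 3xx status.
# curl prints 000 when it cannot connect at all, which also counts as a failure.
http_ok() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1") || true
  case "$code" in
    2??|3??) return 0 ;;
    *) echo "$1 returned HTTP ${code:-none}" >&2; return 1 ;;
  esac
}

# reload_nginx_checked URL: validate the config, reload, then confirm the site responds.
reload_nginx_checked() {
  nginx -t || return 1
  systemctl reload nginx || return 1
  sleep 1   # give the reloaded workers a moment before probing
  http_ok "$1"
}

# Usage:
#   reload_nginx_checked https://app.ctrlman.dev/ || echo "still failing after reload" >&2
```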


The Final Checklist

After this war story, Hermes and I created an automated pre-flight check:

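#!/bin/bash
# Deployment Pre-Flight (auto-run by Hermes): stops at the first failed check
fail() { echo "pre-flight failed: $1" >&2; exit 1; }
# Disk: fail above 90% usage on / (-P keeps each filesystem on a single line)
df -P / | awk 'NR==2 {if ($5+0 > 90) exit 1}' || fail "disk usage above 90%"
# MongoDB must be running
systemctl is-active --quiet mongod || fail "mongod is not active"
# App health endpoint; -f makes curl fail on HTTP errors such as 502
curl -fsS -o /dev/null --max-time 5 http://localhost:4321/health || fail "app health check failed"
# Nginx configuration must be valid
nginx -t || fail "nginx config test failed"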

Key Takeaways

| # | Issue | Solution | Automated? |
|---|-------|----------|------------|
| 1 | SSH agent cache | Clear and reload key | ✅ Yes |
| 2 | Full disk (100%) | Clean old deployments | ✅ Yes (cron) |
| 3 | Missing MongoDB | Install during bootstrap | ✅ Yes |
| 4 | Port mismatch | Standardize to 4321 | ✅ Yes |
| 5 | PM2 wrong entry | Use server.mjs | ✅ Yes |
| 6 | Nginx not reloaded | Test + reload | ✅ Yes |

The New Workflow: CEO + AI CTO

This deployment disaster became the catalyst for a completely new way of working:

Before (Solo Founder Struggle)

Problem → Google → Trial & Error → 4 hours later → Maybe fixed

After (CEO + AI CTO Partnership)

Problem → Telegram message → AI CTO diagnoses → Executes fixes → Reports back → 20 minutes → Done

Hermes Now Handles:

  • ✅ Deployment automation (no more manual SSH)
  • ✅ Health monitoring (disk, services, ports)
  • ✅ Article writing (session files → blog posts via Qwen 4B on RTX 3060)
  • ✅ Multi-agent coordination (Qwen Plus for reasoning, Qwen 4B for execution)
  • ✅ Cron-based publishing (2 articles/day at 09:00 & 18:00)

The Bigger Vision

This isn’t just about fixing deployments. It’s about building a scalable AI-first company:

CEO (Me): Strategy, vision, product decisions, user relationships
AI CTO (Hermes): Execution, automation, monitoring, documentation, content

Communication: Telegram (async, voice-friendly, mobile-first)
Infrastructure: Multi-machine (local + RTX 3060 remote for heavy tasks)
Content Pipeline: Session files → Qwen 4B → Blog articles → Published automatically

Result: I can focus on building the product while Hermes handles the operational complexity.


What’s Next

We’re now building:

  1. Automated health checks - Hermes monitors all services and alerts via Telegram
  2. Self-healing deployments - Auto-rollback on failure detection
  3. Content automation - 2 blog posts/day generated from session files
  4. Multi-agent workflows - Qwen Plus for reasoning, Qwen 4B for execution, Kimi for review
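The Telegram alerting in item 1 can be built on the Telegram Bot API's sendMessage method. The Bash sketch below uses curl and is a minimal example only. TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are hypothetical variable names, and this is not a description of how Hermes itself sends messages.

```bash
#!/usr/bin/env bash
# notify_telegram TEXT: send TEXT through the Bot API sendMessage method.
# TELEGRAM_BOT_TOKEN (from @BotFather) and TELEGRAM_CHAT_ID are hypothetical names.
notify_telegram() {
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "notify_telegram: TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID must be set" >&2
    return 1
  fi
  # --data-urlencode sends a form-encoded POST; -f turns API errors (e.g. a bad chat id) into a failure
  curl -fsS --max-time 10 -o /dev/null \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=$1"
}

# Usage from a cron job (script path is hypothetical):
#   /usr/local/bin/preflight.sh || notify_telegram "Pre-flight failed on $(hostname)"
```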

The deployment gods weren’t on our side that night. But having an AI CTO who never sleeps? That’s better than luck. 🛠️


Have you tried working with AI agents for DevOps? I’d love to hear your experience. Find me on Telegram @ctrlman_dev or drop a comment below.


About the Author: Mario is the CEO of ctrlman.dev, building productivity tools and AI agent workflows. He coordinates daily with his AI CTO Hermes via Telegram to ship features, fix deployments, and publish content, all while focusing on product vision and user needs.


Related Posts

Automated Error Monitoring for Your NGINX Service with Telegram Alerts

Mastering MySQL: Setting Up Your Database for Success
