Monitoring and Alerting for Solo DevOps: A Practical Stack That Works

Frequently Asked Questions

Q: Why use both Prometheus + Grafana and Uptime Kuma instead of just one tool?

They cover opposite perspectives. Prometheus + Grafana monitors what is happening inside your server: CPU, memory, disk, and application-level metrics. Uptime Kuma monitors what users actually experience from the outside: whether your site loads, APIs respond, and TLS certificates are valid. Inside metrics can look healthy while the site is down because of a firewall block or DNS issue, so you need both.

Q: How do you add application-level metrics beyond basic host metrics?

Node Exporter only covers host-level metrics like CPU, memory, and disk. For application observability, add the prom-client npm package to your NestJS application and expose a /metrics endpoint. Key metrics to track include HTTP request duration (as a histogram), request count by route and status code, database query duration, and cache hit/miss ratio. These application metrics catch problems that host metrics miss entirely, such as a memory leak in the NestJS heap, slow database queries, or a broken integration.

Q: How do you prevent alert fatigue when monitoring solo?

Design alerts around three principles: actionability (every alert must have a clear response action), urgency separation (P1 alerts for site-down or disk >95% trigger immediate Telegram notifications, P3 informational trends appear in Grafana only), and suppression (avoid duplicate alerts for the same root cause). The author also follows a strict rule: if you cannot describe the exact action you would take when an alert fires, do not create that alert. Starting with 5-6 critical alerts and learning from them for a month before adding more is the recommended approach.

Q: What is the recommended Telegram alert setup for production?

Set up a dedicated Telegram bot and channel exclusively for production alerts, separate from the main work chat. Production alerts mixed into the main chat tend to get ignored. A dedicated alerts channel with a distinct notification sound trains you to respond immediately. Route critical alerts (disk >90%, site down) to the alerts channel and informational alerts to a separate noise channel that can be muted at night.

Q: How long does it take to get useful monitoring running from scratch?

The fastest path takes about 30 minutes. Deploy Uptime Kuma first (10 minutes), add HTTP monitors for all production endpoints, and configure Telegram notifications. Then deploy Prometheus + Node Exporter via Docker Compose on your largest server (15 minutes), send the metrics to the Grafana Cloud free tier (for example with Prometheus remote_write), import dashboard ID 1860 (Node Exporter Full), and set one alert for disk usage >85%. This gives you both external availability monitoring and basic host metrics with zero ongoing cost if you stay within Grafana Cloud's free tier.

As a solo DevOps engineer managing production workloads for Commsult Indonesia, I cannot afford a dedicated SRE team or enterprise APM tooling. What I can afford is a self-hosted monitoring stack that costs under /month to run and tells me about problems before clients notice. After trying several combinations, I settled on Prometheus + Grafana for infrastructure metrics, Uptime Kuma for external availability monitoring, and Telegram for alerts. This combination covers inside (server health) and outside (is the site reachable?) with a setup time of under two hours.

The Inside vs Outside Monitoring Split

Effective monitoring requires two perspectives. Inside monitoring (Prometheus + Grafana) tells you what is happening on your server: CPU spikes, memory pressure, disk saturation, query latency, and application-level metrics. Outside monitoring (Uptime Kuma) tells you what users experience: is the site loading, does the API respond, is the TLS certificate valid? You need both because inside metrics can look healthy while the site is down (firewall block, DNS issue) and the site can be up while inside metrics show a slow memory leak building toward a crash.
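The reasoning above boils down to a two-by-two table: one outside signal and one inside signal. The TypeScript sketch below spells that table out as a tiny triage function. The Signals shape, the triage name and the diagnosis labels are hypothetical, chosen only to illustrate where to look first when the two views disagree.

```typescript
// Hypothetical triage helper for the inside/outside split.
// outsideUp: e.g. the Uptime Kuma HTTP check for the site is passing.
// insideHealthy: e.g. no Prometheus host alert is currently firing.
interface Signals {
  outsideUp: boolean;
  insideHealthy: boolean;
}

type Diagnosis = "healthy" | "check-network" | "latent-issue" | "check-server";

function triage(s: Signals): Diagnosis {
  if (s.outsideUp && s.insideHealthy) return "healthy";
  // Users cannot reach the site but the host looks fine: firewall block or DNS issue.
  if (!s.outsideUp && s.insideHealthy) return "check-network";
  // Site is up but host metrics are degrading, e.g. a slow memory leak building toward a crash.
  if (s.outsideUp && !s.insideHealthy) return "latent-issue";
  // Both views are red: the server itself is the most likely cause.
  return "check-server";
}
```

Only the combination of both monitors can produce all four answers; either tool alone collapses two of the cases into one.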

Prometheus + Grafana for Infrastructure Metrics

Run Prometheus and Grafana as Docker containers on a dedicated lightweight VPS (a /month DigitalOcean Basic Droplet handles monitoring for 5-10 hosts), and run Node Exporter on every host you want to monitor. Prometheus scrapes each Node Exporter with a 15-second interval, collecting 1000+ host metrics. Import Grafana dashboard ID 1860 (Node Exporter Full) immediately after connecting Prometheus: it gives you 30+ panels covering CPU, memory, disk, and network without building anything. Set up alert rules for four critical thresholds: CPU >85% for 5 minutes, memory available <10%, disk usage >90%, and load average >4 (scale the load threshold to the host's core count).
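The "for 5 minutes" part of the CPU rule is what keeps a short spike from paging you. The TypeScript sketch below models how a Prometheus alerting rule with a for: duration moves from inactive to pending to firing. The PromQL in the comments uses real Node Exporter metric names, but the exact expressions, the ThresholdRule type and the evaluate helper are illustrative assumptions rather than the rules used in this stack.

```typescript
// Illustrative model of Prometheus alert-rule semantics: a rule with
// `for: 5m` only fires once its condition has held on every evaluation
// for at least five minutes. Types and helper names are hypothetical.
type AlertState = "inactive" | "pending" | "firing";

interface ThresholdRule {
  name: string;
  breached: (value: number) => boolean; // true when one sample crosses the threshold
  forSeconds: number;                   // equivalent of the rule's `for:` clause
}

const rules: ThresholdRule[] = [
  // 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 85
  { name: "HighCpu", breached: (pct) => pct > 85, forSeconds: 300 },
  // node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  { name: "LowMemory", breached: (pct) => pct < 10, forSeconds: 0 },
  // (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 90
  { name: "DiskFull", breached: (pct) => pct > 90, forSeconds: 0 },
  // node_load5 > 4  (tune to the host's core count)
  { name: "HighLoad", breached: (load) => load > 4, forSeconds: 0 },
];

// samples: [unixSeconds, value] pairs in time order, one per rule evaluation.
function evaluate(rule: ThresholdRule, samples: Array<[number, number]>): AlertState {
  let breachStart: number | null = null;
  let state: AlertState = "inactive";
  for (const [ts, value] of samples) {
    if (!rule.breached(value)) {
      // A single healthy evaluation resets the pending timer.
      breachStart = null;
      state = "inactive";
      continue;
    }
    if (breachStart === null) breachStart = ts;
    state = ts - breachStart >= rule.forSeconds ? "firing" : "pending";
  }
  return state;
}
```

A four-minute CPU spike therefore ends as "pending" and never reaches Telegram, while a disk rule with no for: window fires on its first breaching sample.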

Uptime Kuma for External Checks

Uptime Kuma is a self-hosted monitoring tool with a clean UI and built-in Telegram/Slack/Discord notifications. Deploy it via Docker (docker run -d --restart always -p 3001:3001 -v uptime-kuma:/app/data louislam/uptime-kuma:1) and set up monitors for every public endpoint: HTTP(S) checks for your web apps and APIs, DNS checks for your domains, certificate expiry checks for TLS certs (alert at 14 days remaining), and TCP port checks for databases on private networks. Uptime Kuma’s status pages let you share a public availability dashboard with clients without exposing your Grafana internals.
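Uptime Kuma handles certificate checks for you, but the decision behind the "alert at 14 days remaining" setting is simple enough to write down. The TypeScript sketch below computes whole days left before a certificate's expiry date and classifies it. The checkCertificate name and its return shape are hypothetical; fetching the certificate is left out, and in Node.js the expiry date would typically come from new Date(socket.getPeerCertificate().valid_to) on a TLS socket.

```typescript
// Hypothetical certificate-expiry decision, mirroring a 14-day warning window.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

type CertStatus = "ok" | "expiring" | "expired";

interface CertCheck {
  daysRemaining: number;
  status: CertStatus;
}

function checkCertificate(notAfter: Date, now: Date, warnDays: number = 14): CertCheck {
  const msLeft = notAfter.getTime() - now.getTime();
  // Floor, so 14 days and 6 hours counts as 14: errs toward warning early.
  const daysRemaining = Math.floor(msLeft / MS_PER_DAY);
  let status: CertStatus = "ok";
  if (msLeft <= 0) status = "expired";
  else if (daysRemaining <= warnDays) status = "expiring";
  return { daysRemaining, status };
}
```

Fourteen days is long enough to fix a broken auto-renewal by hand, which is the point of alerting well before the actual expiry.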

┌─────────────────────────────────────────────────────┐
│           SOLO DEVOPS MONITORING STACK              │
└─────────────────────────────────────────────────────┘

  INSIDE VIEW                    OUTSIDE VIEW
  ───────────────                ─────────────────
  Node Exporter                  Uptime Kuma
       │                              │
       ▼                              ▼
  Prometheus                     HTTP checks
       │                         TLS expiry
       ▼                         DNS checks
  Grafana                             │
  (Dashboard 1860)                    │
       │                              │
       └──────────────────────────────┘
                         │
                         ▼
                   Telegram Bot
                 (Alert Channel)

From my experience managing solo DevOps for Commsult Indonesia: set up a dedicated Telegram bot and channel for production alerts, separate from the main work chat. Production alerts in the main chat get ignored; a dedicated alerts channel with a distinct notification sound trains your brain to respond immediately. In Grafana, create one contact point per Telegram channel and use notification policies to route critical alerts (disk >90%, site down) to the alerts channel and informational alerts (high CPU for <5 minutes) to a separate noise channel you can mute at night.
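In Grafana this routing is configuration rather than code, but the logic can be made explicit. The TypeScript sketch below builds the request you would POST to the Telegram Bot API's sendMessage method, choosing the channel by severity. The chat IDs, the Alert shape and buildTelegramRequest are placeholders for illustration; disable_notification is a real sendMessage parameter that delivers a message silently.

```typescript
// Hypothetical two-channel routing expressed as Telegram sendMessage payloads.
type Severity = "critical" | "info";

interface Alert {
  name: string;
  severity: Severity;
  summary: string;
}

interface TelegramRequest {
  url: string;
  body: { chat_id: string; text: string; disable_notification: boolean };
}

// Placeholder chat IDs: substitute your own alerts and noise channel IDs.
const ALERTS_CHAT_ID = "-1001111111111";
const NOISE_CHAT_ID = "-1002222222222";

function buildTelegramRequest(botToken: string, alert: Alert): TelegramRequest {
  const critical = alert.severity === "critical";
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    body: {
      chat_id: critical ? ALERTS_CHAT_ID : NOISE_CHAT_ID,
      text: `${critical ? "[CRITICAL]" : "[INFO]"} ${alert.name}: ${alert.summary}`,
      // Informational alerts arrive silently even if the noise channel is not muted.
      disable_notification: !critical,
    },
  };
}

// Sending it (Node 18+ provides a global fetch):
//   const req = buildTelegramRequest(token, alert);
//   await fetch(req.url, {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(req.body),
//   });
```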

Alert Fatigue: The Real Enemy

Alert fatigue kills monitoring. If your alerts fire constantly for non-urgent conditions, you start ignoring them, including the ones that matter. Design your alerts around three principles: actionability (every alert should have a clear response action), urgency separation (P1 alerts wake you up at 3am, P3 alerts are checked in the morning), and suppression (avoid duplicate alerts for the same root cause). For a solo operator, I recommend: P1 (site down, disk >95%, OOM imminent) triggers an immediate Telegram notification, P2 (disk >80%, high error rate) triggers a Telegram notification during business hours, and P3 (informational trends) appears in Grafana only.
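The three tiers and the suppression principle fit in a few lines of TypeScript. The sketch below is a hypothetical encoding: the alert names, the 09:00-17:00 business-hours window and the INHIBITS map are assumptions for illustration. The suppression step works in the same spirit as Alertmanager inhibition rules, where a firing root-cause alert mutes the alerts it explains.

```typescript
// Hypothetical encoding of the P1/P2/P3 policy plus root-cause suppression.
type Priority = "P1" | "P2" | "P3";
type Delivery = "telegram-now" | "telegram-business-hours" | "grafana-only";

const PRIORITY: Partial<Record<string, Priority>> = {
  SiteDown: "P1",
  DiskAbove95: "P1",
  OomImminent: "P1",
  DiskAbove80: "P2",
  HighErrorRate: "P2",
};

// When the key alert is firing, the listed alerts share its root cause and are dropped.
const INHIBITS: Partial<Record<string, string[]>> = {
  SiteDown: ["HighErrorRate"],
  DiskAbove95: ["DiskAbove80"],
};

function deliveryFor(alertName: string): Delivery {
  // Anything not explicitly classified is treated as an informational P3 trend.
  const priority = PRIORITY[alertName] ?? "P3";
  if (priority === "P1") return "telegram-now";
  if (priority === "P2") return "telegram-business-hours";
  return "grafana-only";
}

function shouldSendNow(delivery: Delivery, localHour: number): boolean {
  if (delivery === "telegram-now") return true;
  // Outside these hours a P2 would be held until morning (queueing not shown).
  if (delivery === "telegram-business-hours") return localHour >= 9 && localHour < 17;
  return false; // P3 never pages; it stays on a Grafana dashboard
}

function suppress(firing: string[]): string[] {
  // Collapse exact duplicates, keeping first-seen order.
  const unique = Array.from(new Set(firing));
  const muted = new Set<string>();
  for (const name of unique) {
    for (const target of INHIBITS[name] ?? []) muted.add(target);
  }
  return unique.filter((name) => !muted.has(name));
}
```

With these rules, a disk filling past 95% produces one P1 message instead of a P1 and a P2 for the same disk.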

Application-Level Metrics with Prometheus Client

Node Exporter covers host metrics. For application-level observability, add the prom-client npm package to your NestJS app and expose a /metrics endpoint. Track: HTTP request duration (histogram), request count by route and status code, database query duration, cache hit/miss ratio, and any business-critical metrics (e.g., orders processed per minute for an ERP). These application metrics catch problems that host metrics miss: a memory leak in your NestJS heap, slow database queries, or a broken payment integration.
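Request duration is tracked as a histogram because Prometheus stores it as cumulative buckets plus a sum and a count, which is what lets Grafana compute percentiles later. The TypeScript sketch below hand-rolls a tiny histogram to show that model and the text the /metrics endpoint serves. MiniHistogram is a hypothetical teaching class, not part of prom-client; in the NestJS app you would use prom-client's Histogram, shown in the comment for comparison.

```typescript
// Teaching sketch of the Prometheus histogram model (not prom-client).
// prom-client equivalent, not executed here:
//   const httpDuration = new Histogram({
//     name: "http_request_duration_seconds",
//     help: "HTTP request duration in seconds",
//     labelNames: ["method", "route", "status_code"],
//     buckets: [0.05, 0.1, 0.3, 0.5, 1, 2, 5],
//   });
//   const end = httpDuration.startTimer({ method: "GET", route: "/orders" });
//   // ...handle the request...
//   end({ status_code: 200 });
class MiniHistogram {
  private readonly name: string;
  private readonly buckets: number[];
  private readonly counts: number[];
  private sum = 0;
  private count = 0;

  constructor(name: string, buckets: number[]) {
    this.name = name;
    // Upper bounds must be ascending, as in Prometheus.
    this.buckets = buckets.slice().sort((a, b) => a - b);
    this.counts = this.buckets.map(() => 0);
  }

  observe(seconds: number): void {
    this.sum += seconds;
    this.count += 1;
    // Buckets are cumulative: a value counts in every bucket whose upper bound (le) it fits under.
    this.buckets.forEach((le, i) => {
      if (seconds <= le) this.counts[i] += 1;
    });
  }

  render(): string {
    const lines: string[] = [`# TYPE ${this.name} histogram`];
    this.buckets.forEach((le, i) => lines.push(`${this.name}_bucket{le="${le}"} ${this.counts[i]}`));
    // The +Inf bucket always equals the total number of observations.
    lines.push(`${this.name}_bucket{le="+Inf"} ${this.count}`);
    lines.push(`${this.name}_sum ${this.sum}`);
    lines.push(`${this.name}_count ${this.count}`);
    return lines.join("\n");
  }
}
```

Because buckets are cumulative, a query such as histogram_quantile(0.95, ...) can estimate the 95th percentile from these counts without storing every individual request.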

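# docker-compose.yml: Prometheus, Node Exporter, Grafana and Uptime Kuma
services:
  prometheus:
    image: prom/prometheus:latest
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    pid: host
    # Point the collectors at the host's mounts instead of the container's own /proc and /sys
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro

  grafana:
    image: grafana/grafana:latest
    restart: unless-stopped
    # In Grafana, add Prometheus as a data source at http://prometheus:9090
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana

  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - uptime-kuma:/app/data

volumes:
  prometheus_data:
  grafana_data:
  uptime-kuma: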

Log Aggregation on a Budget

Full log aggregation adds resource and maintenance overhead, the ELK stack in particular. For a solo operator, a pragmatic approach is to use journalctl -u <service> -f for real-time log tailing, ship critical application errors to Sentry's free tier (generous limits, excellent error grouping and stack traces), and add Loki only if you need to search historical logs across multiple servers. Loki + Promtail + Grafana is the lightweight log stack of the Grafana ecosystem: much simpler than ELK, and it plugs into your existing Grafana setup.

Early in my monitoring journey at Commsult Indonesia, I added 40+ Grafana panels and 20+ alert rules because it felt comprehensive. The result was dashboard paralysis and alert fatigue; I stopped checking Grafana because there was too much noise to find the signal. I now follow a rule: if I cannot describe the exact action I would take when an alert fires, I do not create the alert. Start with 5-6 critical alerts, learn from them for a month, then add more. Less is more in solo monitoring.

My Production Monitoring Stack

My current stack for Commsult Indonesia: Prometheus scraping Node Exporter on 4 servers (2 DigitalOcean Droplets, 2 GCP instances), Grafana with 2 dashboards (infrastructure overview and per-service application metrics), Uptime Kuma checking 12 HTTP endpoints and 3 TLS certificates, Sentry for application error tracking, and Telegram for all alert delivery. Total infrastructure cost: ~/month for a dedicated monitoring Droplet. Total setup time for a new project: ~90 minutes. This stack has caught 3 disk space issues, 2 memory leaks, and 1 expired certificate before they became incidents.

Getting Started in 30 Minutes

Fastest path to useful monitoring: deploy Uptime Kuma first (10 minutes), add HTTP monitors for all your production endpoints, and configure Telegram notifications. Then deploy Prometheus + Node Exporter via Docker Compose on your largest server (15 minutes), send the metrics to the Grafana Cloud free tier (for example with Prometheus remote_write, so there is no Grafana to self-host), import dashboard 1860, and set one alert for disk usage >85%. This gives you external availability monitoring and basic host metrics in about 30 minutes, with zero ongoing cost on the Grafana Cloud free tier.

Sources & Further Reading

Uptime Kuma — Official Site and Docker Deployment — https://uptimekuma.org/
DCHost — VPS Monitoring Without Tears with Prometheus Grafana and Uptime Kuma — https://www.dchost.com/blog/en/vps-monitoring-and-alerts-without-tears-getting-started-with-prometheus-grafana-and-uptime-kuma/
AWS Builder — Building a Self-Hosted Monitoring Stack with Uptime Kuma Grafana and Prometheus — https://builder.aws.com/content/37UYQpI9EINmQYcV0EYWgHYC0W0/building-a-self-hosted-monitoring-stack-with-uptime-kuma-grafana-and-prometheus
