Monitoring

Comprehensive monitoring runbook for the Freeze Design webshop. This covers error tracking, uptime monitoring, analytics, and alert configuration for a solo-operator setup.

Monitoring Stack

| Tool | Purpose | Tier | Dashboard |
|---|---|---|---|
| Sentry | Error tracking and performance | Free (5K errors/mo) | sentry.io |
| UptimeRobot | Uptime and SSL monitoring | Free (50 monitors) | uptimerobot.com |
| PostHog | Product analytics and session replay | Free (1M events/mo) | eu.posthog.com |
| Discord | Alert notifications | Free | Webhook + Sentry bot |

All alerts route to Discord channels. No paid monitoring services are required.


Sentry Setup

Sentry captures application errors from both the Django backend and the Next.js frontend.

Backend (Django)

The sentry-sdk package auto-captures Django errors. No per-view instrumentation is needed -- the SDK hooks into Django's middleware and exception handling automatically.

# backend/config/settings.py (already configured)
import os

import sentry_sdk

SENTRY_DSN = os.getenv("SENTRY_DSN")
if SENTRY_DSN:
    sentry_sdk.init(
        dsn=SENTRY_DSN,
        environment=os.getenv("SENTRY_ENVIRONMENT", "development"),
        traces_sample_rate=float(os.getenv("SENTRY_TRACES_SAMPLE_RATE", "0.1")),
        release=os.getenv("SENTRY_RELEASE"),
    )

Frontend (Next.js)

The @sentry/nextjs package captures client-side and server-side errors:

// sentry.client.config.ts
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: 0.1,
});

Environment Variables

Set these in .env.production (and .env.staging with adjusted values):

| Variable | Example | Purpose |
|---|---|---|
| SENTRY_DSN | https://key@sentry.io/id | Project DSN from Sentry |
| SENTRY_ENVIRONMENT | production | Must be production for alert rules to fire |
| SENTRY_RELEASE | a1b2c3d (git SHA) | Enables regression detection |
| SENTRY_TRACES_SAMPLE_RATE | 0.1 | Performance sampling (10%) |

Setting SENTRY_RELEASE

SENTRY_RELEASE enables Sentry to track which deploy introduced or reintroduced a bug. Without it, regression alerts (Rule 2) will not fire.

Set it at build time via CI/CD:

docker build --build-arg GIT_SHA=$(git rev-parse HEAD) -t freeze-backend .

In docker-compose.prod.yml, pass the same SHA through to the container (export GIT_SHA in the deploy shell so Compose can interpolate it):

backend:
  environment:
    - SENTRY_RELEASE=${GIT_SHA:-unknown}
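If the SHA is resolved in a Python deploy helper rather than directly in the shell, the fallback chain can be sketched like this (the function name `resolve_release` is illustrative, not part of the project):

```python
import os
import subprocess


def resolve_release() -> str:
    """Return the current git SHA for SENTRY_RELEASE, falling back to
    the GIT_SHA env var and then to "unknown" (mirroring the Compose
    default ${GIT_SHA:-unknown})."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        return os.getenv("GIT_SHA", "unknown")
```

Whatever mechanism is used, the important part is that the same value reaches both the Docker build and the running container.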

Sentry Alert Rules

Six alert rules route to two Discord channels: #errors and #payments.

Overview

| # | Rule Name | Type | Channel | Trigger | Interval |
|---|---|---|---|---|---|
| 1 | New Issue Alert | Issue | #errors | New issue created | 30 min |
| 2 | Regression Alert | Issue | #errors | Resolved issue reappears | 30 min |
| 3 | Payment Error Alert | Issue | #payments | New issue with domain:payment tag | 5 min |
| 4 | Error Spike (Warning) | Metric | #errors | >10 errors/hr | 1 hr |
| 5 | Error Spike (Critical) | Metric | #errors | >50 errors/hr | 1 hr |
| 6 | Payment Error Spike | Metric | #payments | >3 errors/hr with domain:payment | 1 hr |

Issue Alerts

Issue alerts fire based on individual issue lifecycle events.

Rule 1: New Issue Alert

Fires when Sentry encounters an error it has never seen before.

| Field | Value |
|---|---|
| Name | New Issue Alert |
| Environment | production |
| When | "A new issue is created" |
| If | (no additional filters) |
| Then | "Send a Discord notification" to #errors |
| Action interval | 30 minutes |

Steps: Alerts > Create Alert > Issues > Set Conditions.

Rule 2: Regression Alert

Fires when a previously resolved issue reappears in a new release.

| Field | Value |
|---|---|
| Name | Regression Alert |
| Environment | production |
| When | "An issue changes state from resolved to regressed" |
| If | (no additional filters) |
| Then | "Send a Discord notification" to #errors |
| Action interval | 30 minutes |

Requires SENTRY_RELEASE to be set. Without release tracking, Sentry cannot detect regressions.

Rule 3: Payment Error Alert

Fires when a new payment-related error is created. Uses the domain:payment tag set by payment views via sentry_sdk.set_tag("domain", "payment").

| Field | Value |
|---|---|
| Name | Payment Error Alert |
| Environment | production |
| When | "A new issue is created" |
| If | "The issue's tags match domain equals payment" |
| Then | "Send a Discord notification" to #payments |
| Action interval | 5 minutes |

The 5-minute interval ensures rapid awareness -- payment errors are business-critical.
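The tag-before-capture pattern this rule depends on can be sketched framework-free. The decorator below is illustrative, not project code; in the actual Django views the setter would be sentry_sdk.set_tag:

```python
import functools


def with_domain_tag(domain, set_tag):
    """Decorator: set a monitoring tag before the wrapped view runs,
    so any error captured inside it carries the tag (e.g. domain:payment)."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            set_tag("domain", domain)  # sentry_sdk.set_tag in production
            return view(*args, **kwargs)
        return wrapper
    return decorator


# Stand-in setter for illustration (sentry_sdk.set_tag in real code):
tags = {}

@with_domain_tag("payment", tags.__setitem__)
def checkout_view():
    return "ok"
```

The tag must be set before anything in the view can raise; otherwise the error reaches Sentry untagged and Rule 3's filter never matches.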

Metric Alerts

Metric alerts fire when aggregate error counts cross a threshold over a time window.

Rule 4: Error Spike (Warning)

| Field | Value |
|---|---|
| Name | Error Spike (Warning) |
| Environment | production |
| Metric | Number of errors |
| Threshold | Above 10 |
| Time window | 1 hour |
| Resolve threshold | Below 5 |
| Action | "Send a Discord notification" to #errors |

Rule 5: Error Spike (Critical)

| Field | Value |
|---|---|
| Name | Error Spike (Critical) |
| Environment | production |
| Metric | Number of errors |
| Threshold | Above 50 |
| Time window | 1 hour |
| Resolve threshold | Below 20 |
| Action | "Send a Discord notification" to #errors |

For a low-traffic solo-operator e-commerce site, 50 errors/hour indicates a systemic failure (bad deploy, database down, external service outage).
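The separate trigger and resolve thresholds implement hysteresis: the alert fires above the trigger and only clears well below it, so an error count hovering near the threshold does not flap between firing and resolved. A minimal sketch of the logic (illustrative, not Sentry internals; defaults match Rule 4):

```python
def update_alert(active: bool, errors_per_hour: int,
                 trigger: int = 10, resolve: int = 5) -> bool:
    """Return the new firing state for a metric alert with hysteresis.

    Fires when the count rises above `trigger`; once firing, stays
    firing until the count falls below `resolve`.
    """
    if not active and errors_per_hour > trigger:
        return True
    if active and errors_per_hour < resolve:
        return False
    return active
```

With trigger=10 and resolve=5, a count of 8 neither fires a resolved alert nor clears an active one.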

Rule 6: Payment Error Spike

| Field | Value |
|---|---|
| Name | Payment Error Spike |
| Environment | production |
| Metric | Number of errors |
| Filter | domain:payment tag |
| Threshold | Above 3 |
| Time window | 1 hour |
| Resolve threshold | Below 1 |
| Action | "Send a Discord notification" to #payments |

Even 3 payment errors in an hour is concerning for a low-traffic site. Payment failures directly impact revenue and customer trust.


Sentry Discord Integration

Sentry uses its native Discord bot (not custom webhooks) to post alerts to #errors and #payments. This is separate from the custom Discord webhooks used by UptimeRobot, backup notifications, and admin audit logs.

Step 1: Install the Integration

  1. Log in to sentry.io
  2. Go to Settings > your organization > Integrations
  3. Search for Discord and click Install
  4. Authorize the Sentry bot on your Discord server
  5. Confirm the integration shows as "Installed"

Step 2: Enable Developer Mode in Discord

You need numeric channel IDs for Sentry alert routing.

  1. Open Discord > User Settings (gear icon)
  2. Navigate to Advanced (under "App Settings")
  3. Toggle Developer Mode to ON
  4. You can now right-click any channel and select Copy Channel ID

Step 3: Copy Channel IDs

Right-click each channel and copy the numeric ID:

| Channel | Used By |
|---|---|
| #errors | Rules 1, 2, 4, 5 |
| #payments | Rules 3, 6 |

When creating alert rules, paste the channel ID into the "Send a Discord notification" action. Use the numeric ID, not the channel name.

Bot Permissions

The Sentry Discord bot needs these permissions in the target channels:

  • Send Messages
  • Embed Links (for rich alert formatting)

If the bot cannot post, check Discord > Server Settings > Roles > Sentry bot role.


UptimeRobot Setup

UptimeRobot provides external uptime monitoring from outside your infrastructure. It catches issues that Sentry cannot: DNS failures, SSL expiry, network routing problems, complete VPS outages.

Free Tier Limits

| Capability | Limit |
|---|---|
| Monitors | 50 (you need 2-3) |
| Check interval | 5 minutes |
| Email alerts | Unlimited |
| Webhook alerts | Unlimited |

No ongoing cost for the use case described here.

Account Setup

  1. Sign up at uptimerobot.com (free)
  2. Verify your email address
  3. Log in to the dashboard

Monitor 1: API Health Check

Checks that the backend API is responding correctly.

  1. Click Add New Monitor
  2. Configure:
     • Monitor Type: HTTP(s)
     • Friendly Name: API Health Check
     • URL: https://freezedesign.eu/api/health/
     • Monitoring Interval: 5 minutes
  3. Advanced settings:
     • Request Method: GET
     • Expected HTTP Status: 200
     • Request Timeout: 30 seconds
     • Trigger Alert After: 2 consecutive failures
  4. Enable alert contacts (email + Discord webhook)
  5. Click Create Monitor
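The "Trigger Alert After: 2 consecutive failures" setting exists to suppress one-off blips. The equivalent logic, as an illustrative sketch (booleans record whether each check succeeded):

```python
def should_alert(check_history: list, after: int = 2) -> bool:
    """True once the most recent `after` checks have all failed.

    A single failed check followed by a success never alerts, which is
    exactly the flap suppression the UptimeRobot setting provides.
    """
    if len(check_history) < after:
        return False
    return not any(check_history[-after:])
```

Raising `after` to 3-4 trades alert latency for fewer false positives (see Troubleshooting below).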

Monitor 2: SSL Certificate Expiry

Alerts 7 days before the SSL certificate expires.

  1. Click Add New Monitor
  2. Configure:
     • Monitor Type: HTTP(s)
     • Friendly Name: SSL Certificate Expiry
     • URL: https://freezedesign.eu
     • Monitoring Interval: 1440 minutes (24 hours)
  3. Advanced settings:
     • Request Method: HEAD
     • SSL Certificate Expiry: Enable
     • Alert When Certificate Expires In: 7 days
  4. Enable alert contacts (email + Discord webhook)
  5. Click Create Monitor
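The same 7-day check can be reproduced locally with Python's standard library, which is useful for cross-checking what UptimeRobot will report (a sketch; function names are illustrative):

```python
import socket
import ssl
from datetime import datetime, timezone


def days_left(not_after, now=None):
    """Days between `now` and an X.509 notAfter string,
    e.g. "Jun 01 12:00:00 2026 GMT" as returned by getpeercert()."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400


def days_until_expiry(hostname: str, port: int = 443) -> float:
    """Fetch the live certificate and report days until it expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return days_left(tls.getpeercert()["notAfter"])
```

For example, `days_until_expiry("freezedesign.eu") < 7` corresponds to the condition that fires this monitor.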

Monitor 3: Homepage (Optional)

Verifies the frontend is accessible. Recommended because the API health check only covers the backend.

  1. Click Add New Monitor
  2. Configure:
     • Monitor Type: HTTP(s)
     • Friendly Name: Homepage
     • URL: https://freezedesign.eu/
     • Monitoring Interval: 5 minutes
  3. Advanced settings:
     • Expected HTTP Status: 200
     • Trigger Alert After: 2 consecutive failures
  4. Enable alert contacts
  5. Click Create Monitor

UptimeRobot Discord Webhook

UptimeRobot uses a custom Discord webhook (not the Sentry bot) to post alerts to a #monitoring channel.

Create the Webhook

  1. In Discord, right-click the #monitoring channel > Edit Channel
  2. Go to Integrations > Webhooks > New Webhook
  3. Name it "Production Monitoring"
  4. Copy the webhook URL

Configure in UptimeRobot

  1. Go to My Settings > Alert Contacts > Add Alert Contact
  2. Select Webhook as the contact type
  3. Configure:
     • Friendly Name: Discord Monitoring Channel
     • URL: Paste the Discord webhook URL
     • POST Value: "Send as JSON (application/json)"
     • JSON Payload:

{
  "content": "**ALERT: *monitorFriendlyName***",
  "embeds": [{
    "title": "*alertTypeFriendlyName*",
    "description": "*monitorURL* is *alertDetails*",
    "color": 15158332,
    "timestamp": "*alertDateTime*"
  }]
}

  4. Click Create Alert Contact
  5. UptimeRobot sends a test notification to verify the webhook

UptimeRobot replaces the *variable* placeholders automatically:

| Placeholder | Replaced With |
|---|---|
| *monitorFriendlyName* | Monitor name (e.g., "API Health Check") |
| *alertTypeFriendlyName* | Alert type (e.g., "Down", "Up") |
| *monitorURL* | The monitored URL |
| *alertDetails* | Details (e.g., "is DOWN since 2026-02-01 12:00:00") |
| *alertDateTime* | ISO 8601 timestamp |
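The substitution can be previewed locally before pointing UptimeRobot at a real channel. A small illustrative sketch of what UptimeRobot does with the template before POSTing it:

```python
import json


def render_payload(template: str, values: dict) -> dict:
    """Fill UptimeRobot-style *placeholder* variables into a JSON
    payload template, then parse the result as Discord will see it."""
    for name, value in values.items():
        template = template.replace(f"*{name}*", value)
    return json.loads(template)


payload = render_payload(
    '{"content": "**ALERT: *monitorFriendlyName***"}',
    {"monitorFriendlyName": "API Health Check"},
)
```

Note the substituted values are plain strings, so Discord markdown such as the surrounding `**` bold markers survives intact.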

Health Check Endpoint

The backend exposes GET /api/health/ to verify critical services. UptimeRobot monitors this endpoint externally; Docker uses it for container health checks internally.

Request

curl https://freezedesign.eu/api/health/

Healthy Response (HTTP 200)

{
  "status": "healthy",
  "checks": {
    "database": {
      "status": "healthy",
      "message": "Database connection successful"
    },
    "redis_cache": {
      "status": "healthy",
      "message": "Redis cache connection successful"
    },
    "celery": {
      "status": "healthy",
      "message": "Celery workers active: 2",
      "workers": ["celery@worker1", "celery@worker2"]
    }
  }
}

Unhealthy Response (HTTP 503)

{
  "status": "unhealthy",
  "checks": {
    "database": {
      "status": "unhealthy",
      "message": "Database error: connection refused"
    },
    "redis_cache": {
      "status": "healthy",
      "message": "Redis cache connection successful"
    },
    "celery": {
      "status": "warning",
      "message": "No Celery workers responding"
    }
  }
}

What Each Check Verifies

| Check | Verifies | Failure Impact |
|---|---|---|
| database | PostgreSQL connectivity (simple SELECT) | Site non-functional |
| redis_cache | Cache read/write operations | Degraded performance |
| celery | Worker availability via Redis | Background tasks stalled (warning only) |

The health check is stricter than a simple page load -- it validates all backend services, not just whether the application process is running.
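The aggregation rule implied by the responses above -- a warning (Celery) keeps HTTP 200, while any unhealthy check (database, Redis) returns 503 -- can be sketched framework-free (illustrative, not the project's actual view code):

```python
def aggregate(checks: dict) -> tuple:
    """Map per-service statuses to the overall health response.

    Any "unhealthy" check makes the endpoint report ("unhealthy", 503);
    "warning" statuses alone (e.g. Celery workers down) keep it at 200.
    """
    if any(status == "unhealthy" for status in checks.values()):
        return "unhealthy", 503
    return "healthy", 200
```

This is why UptimeRobot's "Expected HTTP Status: 200" check fires on hard backend failures but not on a stalled task queue.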


PostHog Analytics

PostHog provides product analytics, session recording, feature flags, and A/B testing. It runs on the EU-hosted instance (eu.posthog.com) for GDPR compliance.

For full setup instructions, custom events, and dashboard configuration, see the PostHog Analytics Guide.

Quick Reference

| Variable | Value |
|---|---|
| NEXT_PUBLIC_POSTHOG_KEY | Your project API key |
| NEXT_PUBLIC_POSTHOG_HOST | https://eu.i.posthog.com |

PostHog is a client-side analytics tool and does not generate operational alerts. Use Sentry for error alerting and UptimeRobot for availability alerting.


Verification

After setting up the monitoring stack, verify each component works end-to-end.

Test Sentry Alerts

Open a Django management shell on production (or staging):

docker compose exec backend python manage.py shell

Test 1: New Issue Alert (Rule 1)

import sentry_sdk
sentry_sdk.capture_message("Test alert: new issue verification", level="error")

Wait up to 30 minutes. Check #errors for a Discord notification.

Test 2: Payment Error Alert (Rule 3)

import sentry_sdk
sentry_sdk.set_tag("domain", "payment")
sentry_sdk.capture_message("Test alert: payment error verification", level="error")

Wait up to 5 minutes. Check #payments for a Discord notification.

Test 3: Regression Detection (Rule 2)

  1. Find the test issues from Test 1 and Test 2 in Sentry
  2. Mark them as Resolved
  3. Re-run the same commands from Test 1 and Test 2
  4. If SENTRY_RELEASE is configured correctly, Sentry creates a regression event
  5. Check #errors for a regression notification

Test 4: Metric Alerts (Rules 4-6)

Metric alerts require crossing the threshold. For testing:

  1. Edit "Error Spike (Warning)" and temporarily set threshold to 1
  2. Trigger a test error (see Test 1)
  3. Wait for the metric alert notification in #errors
  4. Restore the original threshold (10) after verification

Test UptimeRobot Alerts

Test Webhook

  1. In UptimeRobot, go to My Settings > Alert Contacts
  2. Find "Discord Monitoring Channel" and click Test
  3. Check #monitoring for a test message

Test Monitor (Staging Only)

  1. Stop the backend on staging:
    ssh user@staging-host 'docker compose stop backend'
    
  2. Wait 5-10 minutes for UptimeRobot to detect the outage
  3. Check #monitoring for a "Down" alert
  4. Restart the backend:
    ssh user@staging-host 'docker compose start backend'
    
  5. Wait for an "Up" alert in #monitoring

Only test on staging to avoid customer impact.

Verification Checklist

| Component | Test Method | Expected Channel | Expected Timing |
|---|---|---|---|
| New Issue Alert | capture_message | #errors | Within 30 min |
| Regression Alert | Resolve + re-trigger | #errors | Within 30 min |
| Payment Error Alert | capture_message with tag | #payments | Within 5 min |
| Error Spike (Warning) | Lower threshold temporarily | #errors | Within 1 hour |
| Error Spike (Critical) | Lower threshold temporarily | #errors | Within 1 hour |
| Payment Error Spike | Lower threshold temporarily | #payments | Within 30 min |
| UptimeRobot webhook | Test button in dashboard | #monitoring | Immediate |
| UptimeRobot monitor | Stop backend on staging | #monitoring | Within 10 min |

Troubleshooting

Sentry Alerts Not Firing

  1. Check environment variable: Verify SENTRY_ENVIRONMENT=production is set. Alert rules filtered to "production" will not fire in "development" or "staging".
     docker compose exec backend python -c "import os; print(os.getenv('SENTRY_ENVIRONMENT'))"

  2. Check Discord integration: Go to Sentry > Settings > Integrations > Discord. Ensure it shows "Installed" and the bot has permissions in your Discord server.

  3. Check channel ID: Ensure you used the numeric channel ID, not the channel name. Right-click the channel in Discord with Developer Mode enabled.

  4. Check alert status: Go to Sentry > Alerts. Each rule should show "Active", not "Disabled".

Regressions Not Detected

  1. Verify SENTRY_RELEASE is set:
docker compose exec backend python -c "import sentry_sdk; print(sentry_sdk.Hub.current.client.options.get('release'))"

If it returns None, the release is not configured. Add SENTRY_RELEASE=$(git rev-parse HEAD) to your deployment script or .env.production.

Payment Alerts Going to #errors Instead of #payments

  1. Verify payment views set the tag: sentry_sdk.set_tag("domain", "payment") must be called before any error can be captured in payment code paths.
  2. Verify the Payment Error Alert rule has the tag filter domain equals payment.

UptimeRobot Webhook Not Working

  1. Verify the webhook URL is correct in UptimeRobot
  2. Check that the Discord webhook has not been deleted
  3. Test the webhook manually:
     curl -H "Content-Type: application/json" \
       -d '{"content": "Test message from curl"}' \
       https://discord.com/api/webhooks/YOUR_WEBHOOK_URL
  4. Check the alert contact is enabled for the monitor

False Positive UptimeRobot Alerts

If you receive alerts but the service is actually up:

  • Increase "Trigger Alert After" from 2 to 3-4 consecutive failures
  • Increase request timeout from 30s to 60s
  • Verify the health endpoint responds in under 30 seconds
  • Check that your server does not block UptimeRobot IPs (unlikely, but check firewall rules)

Health Check Returns 503 But Site Loads

The health check verifies all backend services, not just the application process. If /api/health/ returns 503 but the frontend loads:

  1. Check backend logs: docker logs backend --tail 100
  2. Check database: docker compose exec backend python manage.py dbshell
  3. Check Redis: docker compose exec redis redis-cli ping
  4. Check Celery: docker compose exec backend celery -A config inspect stats

A Celery outage alone is reported as a warning and keeps the endpoint at 200; a 503 means the database or Redis check failed outright, even if the frontend still renders cached pages.

Alert Response Procedures

API Down Alert

  1. Check server status: ssh user@production-host 'docker ps'
  2. Check backend logs: ssh user@production-host 'docker logs backend --tail 100'
  3. Check health endpoint: curl https://freezedesign.eu/api/health/
  4. Common causes:
     • Database connection lost -- check PostgreSQL container
     • Redis connection lost -- check Redis container
     • Backend container crashed -- check Docker logs
     • Out of memory -- check VPS memory: free -h
  5. If a quick fix is not obvious, consider rollback: cd /opt/webshop && ./scripts/rollback.sh

SSL Certificate Expiry Alert

  1. SSH to the production server
  2. Renew the certificate:
    cd /opt/webshop
    docker compose -f docker-compose.prod.yml run --rm certbot renew
    docker compose -f docker-compose.prod.yml restart nginx
    
  3. Verify: curl -vI https://freezedesign.eu 2>&1 | grep "expire date"
  4. Check the auto-renewal cron job: systemctl status cron && crontab -l

Discord Channel Summary

| Channel | Source | Alert Types |
|---|---|---|
| #errors | Sentry bot | New issues, regressions, error spikes |
| #payments | Sentry bot | Payment errors, payment error spikes |
| #monitoring | UptimeRobot webhook | Uptime alerts, SSL expiry |
| #backups | Custom webhook | Backup success/failure (see deployment docs) |
| #admin | Custom webhook | Admin audit events (see deployment docs) |