# Monitoring

Comprehensive monitoring runbook for the Freeze Design webshop. This covers error tracking, uptime monitoring, analytics, and alert configuration for a solo-operator setup.
## Monitoring Stack
| Tool | Purpose | Tier | Dashboard |
|---|---|---|---|
| Sentry | Error tracking and performance | Free (5K errors/mo) | sentry.io |
| UptimeRobot | Uptime and SSL monitoring | Free (50 monitors) | uptimerobot.com |
| PostHog | Product analytics and session replay | Free (1M events/mo) | eu.posthog.com |
| Discord | Alert notifications | Free | Webhook + Sentry bot |
All alerts route to Discord channels. No paid monitoring services are required.
## Sentry Setup
Sentry captures application errors from both the Django backend and the Next.js frontend.
### Backend (Django)
The `sentry-sdk` package auto-captures Django errors. No per-view instrumentation is needed -- the SDK hooks into Django's middleware and exception handling automatically.
```python
# backend/config/settings.py (already configured)
import os
import sentry_sdk

SENTRY_DSN = os.getenv("SENTRY_DSN")
if SENTRY_DSN:
    sentry_sdk.init(
        dsn=SENTRY_DSN,
        environment=os.getenv("SENTRY_ENVIRONMENT", "development"),
        traces_sample_rate=float(os.getenv("SENTRY_TRACES_SAMPLE_RATE", "0.1")),
        release=os.getenv("SENTRY_RELEASE"),
    )
```
### Frontend (Next.js)

The `@sentry/nextjs` package captures client-side and server-side errors:

```typescript
// sentry.client.config.ts
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  tracesSampleRate: 0.1,
});
```
### Environment Variables

Set these in `.env.production` (and `.env.staging` with adjusted values):

| Variable | Example | Purpose |
|---|---|---|
| `SENTRY_DSN` | `https://key@sentry.io/id` | Project DSN from Sentry |
| `SENTRY_ENVIRONMENT` | `production` | Must be `production` for alert rules to fire |
| `SENTRY_RELEASE` | `a1b2c3d` (git SHA) | Enables regression detection |
| `SENTRY_TRACES_SAMPLE_RATE` | `0.1` | Performance sampling (10%) |
### Setting SENTRY_RELEASE

`SENTRY_RELEASE` enables Sentry to track which deploy introduced or reintroduced a bug. Without it, regression alerts (Rule 2) will not fire.

Set it at build time in your CI/CD pipeline and pass it through to the backend container via `docker-compose.prod.yml`.
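One way to wire this up is the following sketch (service and variable names are assumptions -- adapt them to your actual compose file), with the deploy script exporting `SENTRY_RELEASE=$(git rev-parse --short HEAD)` before running `docker compose up`:

```yaml
# docker-compose.prod.yml (excerpt, sketch): pass the release from the
# deploy environment into the backend container
services:
  backend:
    environment:
      - SENTRY_RELEASE=${SENTRY_RELEASE}
```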
## Sentry Alert Rules

Six alert rules route to two Discord channels: `#errors` and `#payments`.

### Overview

| # | Rule Name | Type | Channel | Trigger | Interval |
|---|---|---|---|---|---|
| 1 | New Issue Alert | Issue | #errors | New issue created | 30 min |
| 2 | Regression Alert | Issue | #errors | Resolved issue reappears | 30 min |
| 3 | Payment Error Alert | Issue | #payments | New issue with `domain:payment` tag | 5 min |
| 4 | Error Spike (Warning) | Metric | #errors | >10 errors/hr | 1 hr |
| 5 | Error Spike (Critical) | Metric | #errors | >50 errors/hr | 1 hr |
| 6 | Payment Error Spike | Metric | #payments | >3 errors/hr with `domain:payment` | 30 min |
### Issue Alerts

Issue alerts fire based on individual issue lifecycle events.

#### Rule 1: New Issue Alert
Fires when Sentry encounters an error it has never seen before.
| Field | Value |
|---|---|
| Name | New Issue Alert |
| Environment | production |
| When | "A new issue is created" |
| If | (no additional filters) |
| Then | "Send a Discord notification" to #errors |
| Action interval | 30 minutes |
Steps: Alerts > Create Alert > Issues > Set Conditions.
#### Rule 2: Regression Alert
Fires when a previously resolved issue reappears in a new release.
| Field | Value |
|---|---|
| Name | Regression Alert |
| Environment | production |
| When | "An issue changes state from resolved to regressed" |
| If | (no additional filters) |
| Then | "Send a Discord notification" to #errors |
| Action interval | 30 minutes |
Requires `SENTRY_RELEASE` to be set. Without release tracking, Sentry cannot detect regressions.
#### Rule 3: Payment Error Alert

Fires when a new payment-related error is created. Uses the `domain:payment` tag, which payment views set via `sentry_sdk.set_tag("domain", "payment")`.
| Field | Value |
|---|---|
| Name | Payment Error Alert |
| Environment | production |
| When | "A new issue is created" |
| If | "The issue's tags match domain equals payment" |
| Then | "Send a Discord notification" to #payments |
| Action interval | 5 minutes |
The 5-minute interval ensures rapid awareness -- payment errors are business-critical.
### Metric Alerts

Metric alerts fire when aggregate error counts cross a threshold over a time window.

#### Rule 4: Error Spike (Warning)
| Field | Value |
|---|---|
| Name | Error Spike (Warning) |
| Environment | production |
| Metric | Number of errors |
| Threshold | Above 10 |
| Time window | 1 hour |
| Resolve threshold | Below 5 |
| Action | "Send a Discord notification" to #errors |
#### Rule 5: Error Spike (Critical)
| Field | Value |
|---|---|
| Name | Error Spike (Critical) |
| Environment | production |
| Metric | Number of errors |
| Threshold | Above 50 |
| Time window | 1 hour |
| Resolve threshold | Below 20 |
| Action | "Send a Discord notification" to #errors |
For a low-traffic solo-operator e-commerce site, 50 errors/hour indicates a systemic failure (bad deploy, database down, external service outage).
#### Rule 6: Payment Error Spike
| Field | Value |
|---|---|
| Name | Payment Error Spike |
| Environment | production |
| Metric | Number of errors |
| Filter | domain:payment tag |
| Threshold | Above 3 |
| Time window | 1 hour |
| Resolve threshold | Below 1 |
| Action | "Send a Discord notification" to #payments |
Even 3 payment errors in an hour is concerning for a low-traffic site. Payment failures directly impact revenue and customer trust.
## Sentry Discord Integration
Sentry uses its native Discord bot (not custom webhooks) to post alerts to #errors and #payments. This is separate from the custom Discord webhooks used by UptimeRobot, backup notifications, and admin audit logs.
### Step 1: Install the Integration
- Log in to sentry.io
- Go to Settings > your organization > Integrations
- Search for Discord and click Install
- Authorize the Sentry bot on your Discord server
- Confirm the integration shows as "Installed"
### Step 2: Enable Developer Mode in Discord
You need numeric channel IDs for Sentry alert routing.
- Open Discord > User Settings (gear icon)
- Navigate to Advanced (under "App Settings")
- Toggle Developer Mode to ON
- You can now right-click any channel and select Copy Channel ID
### Step 3: Copy Channel IDs

Right-click each channel and copy the numeric ID:

| Channel | Used By |
|---|---|
| `#errors` | Rules 1, 2, 4, 5 |
| `#payments` | Rules 3, 6 |
When creating alert rules, paste the channel ID into the "Send a Discord notification" action. Use the numeric ID, not the channel name.
### Bot Permissions
The Sentry Discord bot needs these permissions in the target channels:
- Send Messages
- Embed Links (for rich alert formatting)
If the bot cannot post, check Discord > Server Settings > Roles > Sentry bot role.
## UptimeRobot Setup

UptimeRobot monitors uptime from outside your infrastructure. It catches issues that Sentry cannot: DNS failures, SSL expiry, network routing problems, complete VPS outages.
### Free Tier Limits
| Capability | Limit |
|---|---|
| Monitors | 50 (you need 2-3) |
| Check interval | 5 minutes |
| Email alerts | Unlimited |
| Webhook alerts | Unlimited |
No ongoing cost for the use case described here.
### Account Setup
- Sign up at uptimerobot.com (free)
- Verify your email address
- Log in to the dashboard
### Monitor 1: API Health Check
Checks that the backend API is responding correctly.
- Click Add New Monitor
- Configure:
    - Monitor Type: HTTP(s)
    - Friendly Name: API Health Check
    - URL: `https://freezedesign.eu/api/health/`
    - Monitoring Interval: 5 minutes
- Advanced settings:
    - Request Method: GET
    - Expected HTTP Status: 200
    - Request Timeout: 30 seconds
    - Trigger Alert After: 2 consecutive failures
- Enable alert contacts (email + Discord webhook)
- Click Create Monitor
### Monitor 2: SSL Certificate Expiry

Alerts 7 days before the SSL certificate expires.

- Click Add New Monitor
- Configure:
    - Monitor Type: HTTP(s)
    - Friendly Name: SSL Certificate Expiry
    - URL: `https://freezedesign.eu`
    - Monitoring Interval: 1440 minutes (24 hours)
- Advanced settings:
    - Request Method: HEAD
    - SSL Certificate Expiry: Enable
    - Alert When Certificate Expires In: 7 days
- Enable alert contacts (email + Discord webhook)
- Click Create Monitor
### Monitor 3: Homepage (Optional)

Verifies the frontend is accessible. Recommended because the API health check only covers the backend.

- Click Add New Monitor
- Configure:
    - Monitor Type: HTTP(s)
    - Friendly Name: Homepage
    - URL: `https://freezedesign.eu/`
    - Monitoring Interval: 5 minutes
- Advanced settings:
    - Expected HTTP Status: 200
    - Trigger Alert After: 2 consecutive failures
- Enable alert contacts
- Click Create Monitor
## UptimeRobot Discord Webhook

UptimeRobot uses a custom Discord webhook (not the Sentry bot) to post alerts to a `#monitoring` channel.

### Create the Webhook

- In Discord, right-click the `#monitoring` channel > Edit Channel
- Go to Integrations > Webhooks > New Webhook
- Name it "Production Monitoring"
- Copy the webhook URL
### Configure in UptimeRobot

- Go to My Settings > Alert Contacts > Add Alert Contact
- Select Webhook as the contact type
- Configure:
    - Friendly Name: Discord Monitoring Channel
    - URL: Paste the Discord webhook URL
    - POST Value: "Send as JSON (application/json)"
    - JSON Payload:

```json
{
  "content": "**ALERT: *monitorFriendlyName***",
  "embeds": [{
    "title": "*alertTypeFriendlyName*",
    "description": "*monitorURL* is *alertDetails*",
    "color": 15158332,
    "timestamp": "*alertDateTime*"
  }]
}
```

- Click Create Alert Contact
- UptimeRobot sends a test notification to verify the webhook
UptimeRobot replaces the `*variable*` placeholders automatically:

| Placeholder | Replaced With |
|---|---|
| `*monitorFriendlyName*` | Monitor name (e.g., "API Health Check") |
| `*alertTypeFriendlyName*` | Alert type (e.g., "Down", "Up") |
| `*monitorURL*` | The monitored URL |
| `*alertDetails*` | Details (e.g., "is DOWN since 2026-02-01 12:00:00") |
| `*alertDateTime*` | ISO 8601 timestamp |
## Health Check Endpoint

The backend exposes `GET /api/health/` to verify critical services. UptimeRobot monitors this endpoint externally; Docker uses it for container health checks internally.

### Request

GET `https://freezedesign.eu/api/health/`
### Healthy Response (HTTP 200)

```json
{
  "status": "healthy",
  "checks": {
    "database": {
      "status": "healthy",
      "message": "Database connection successful"
    },
    "redis_cache": {
      "status": "healthy",
      "message": "Redis cache connection successful"
    },
    "celery": {
      "status": "healthy",
      "message": "Celery workers active: 2",
      "workers": ["celery@worker1", "celery@worker2"]
    }
  }
}
```
### Unhealthy Response (HTTP 503)

```json
{
  "status": "unhealthy",
  "checks": {
    "database": {
      "status": "unhealthy",
      "message": "Database error: connection refused"
    },
    "redis_cache": {
      "status": "healthy",
      "message": "Redis cache connection successful"
    },
    "celery": {
      "status": "warning",
      "message": "No Celery workers responding"
    }
  }
}
```
### What Each Check Verifies

| Check | Verifies | Failure Impact |
|---|---|---|
| `database` | PostgreSQL connectivity (simple SELECT) | Site non-functional |
| `redis_cache` | Cache read/write operations | Degraded performance |
| `celery` | Worker availability via Redis | Background tasks stalled (warning only) |
The health check is stricter than the frontend -- it validates all backend services, not just whether the application process is running.
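The status-to-HTTP-code mapping described above can be sketched like this (a simplified model of the behavior, not the actual view code; `celery` is treated as advisory per the table):

```python
def aggregate_health(checks):
    """Map individual check results to an overall status and HTTP code.

    `celery` is advisory: a warning there alone does not make the endpoint
    unhealthy, but a failing database or cache does.
    """
    critical = {name: c for name, c in checks.items() if name != "celery"}
    if any(c["status"] == "unhealthy" for c in critical.values()):
        return "unhealthy", 503
    return "healthy", 200

# Example: database down -> 503, matching the unhealthy response above
status, code = aggregate_health({
    "database": {"status": "unhealthy"},
    "redis_cache": {"status": "healthy"},
    "celery": {"status": "warning"},
})
```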
## PostHog Analytics
PostHog provides product analytics, session recording, feature flags, and A/B testing. It runs on the EU-hosted instance (eu.posthog.com) for GDPR compliance.
For full setup instructions, custom events, and dashboard configuration, see the PostHog Analytics Guide.
### Quick Reference

| Variable | Value |
|---|---|
| `NEXT_PUBLIC_POSTHOG_KEY` | Your project API key |
| `NEXT_PUBLIC_POSTHOG_HOST` | `https://eu.i.posthog.com` |
PostHog is a client-side analytics tool and does not generate operational alerts. Use Sentry for error alerting and UptimeRobot for availability alerting.
## Verification
After setting up the monitoring stack, verify each component works end-to-end.
### Test Sentry Alerts

Open a Django management shell on production (or staging), e.g. `docker compose exec backend python manage.py shell`:
#### Test 1: New Issue Alert (Rule 1)
Wait up to 30 minutes. Check #errors for a Discord notification.
#### Test 2: Payment Error Alert (Rule 3)

```python
import sentry_sdk
sentry_sdk.set_tag("domain", "payment")
sentry_sdk.capture_message("Test alert: payment error verification", level="error")
```
Wait up to 5 minutes. Check #payments for a Discord notification.
#### Test 3: Regression Detection (Rule 2)

- Find the test issues from Test 1 and Test 2 in Sentry
- Mark them as Resolved
- Re-run the same commands from Test 1 and Test 2
- If `SENTRY_RELEASE` is configured correctly, Sentry creates a regression event
- Check `#errors` for a regression notification
#### Test 4: Metric Alerts (Rules 4-6)

Metric alerts require crossing the threshold. For testing:

- Edit "Error Spike (Warning)" and temporarily set the threshold to 1
- Trigger a test error (see Test 1)
- Wait for the metric alert notification in `#errors`
- Restore the original threshold (10) after verification
### Test UptimeRobot Alerts

#### Test Webhook

- In UptimeRobot, go to My Settings > Alert Contacts
- Find "Discord Monitoring Channel" and click Test
- Check `#monitoring` for a test message
#### Test Monitor (Staging Only)

- Stop the backend on staging (e.g. `docker compose stop backend`)
- Wait 5-10 minutes for UptimeRobot to detect the outage
- Check `#monitoring` for a "Down" alert
- Restart the backend (e.g. `docker compose start backend`)
- Wait for an "Up" alert in `#monitoring`

Only test on staging to avoid customer impact.
### Verification Checklist

| Component | Test Method | Expected Channel | Expected Timing |
|---|---|---|---|
| New Issue Alert | `capture_message` | #errors | Within 30 min |
| Regression Alert | Resolve + re-trigger | #errors | Within 30 min |
| Payment Error Alert | `capture_message` with tag | #payments | Within 5 min |
| Error Spike (Warning) | Lower threshold temporarily | #errors | Within 1 hour |
| Error Spike (Critical) | Lower threshold temporarily | #errors | Within 1 hour |
| Payment Error Spike | Lower threshold temporarily | #payments | Within 30 min |
| UptimeRobot webhook | Test button in dashboard | #monitoring | Immediate |
| UptimeRobot monitor | Stop backend on staging | #monitoring | Within 10 min |
## Troubleshooting

### Sentry Alerts Not Firing

- Check environment variable: verify `SENTRY_ENVIRONMENT=production` is set. Alert rules filtered to "production" will not fire in "development" or "staging".
- Check Discord integration: go to Sentry > Settings > Integrations > Discord. Ensure it shows "Installed" and the bot has permissions in your Discord server.
- Check channel ID: ensure you used the numeric channel ID, not the channel name. Right-click the channel in Discord with Developer Mode enabled.
- Check alert status: go to Sentry > Alerts. Each rule should show "Active", not "Disabled".
### Regressions Not Detected

Verify `SENTRY_RELEASE` is set:

```shell
docker compose exec backend python -c "import sentry_sdk; print(sentry_sdk.Hub.current.client.options.get('release'))"
```

If it returns `None`, the release is not configured. Add `SENTRY_RELEASE=$(git rev-parse HEAD)` to your deployment script or `.env.production`.
### Payment Alerts Going to #errors Instead of #payments

- Verify payment views set the tag: `sentry_sdk.set_tag("domain", "payment")` must be called before any error can be captured in payment code paths.
- Verify the Payment Error Alert rule has the tag filter `domain` equals `payment`.
### UptimeRobot Webhook Not Working

- Verify the webhook URL is correct in UptimeRobot
- Check that the Discord webhook has not been deleted
- Test the webhook manually:

```shell
curl -H "Content-Type: application/json" \
  -d '{"content": "Test message from curl"}' \
  https://discord.com/api/webhooks/YOUR_WEBHOOK_URL
```

- Check the alert contact is enabled for the monitor
### False Positive UptimeRobot Alerts
If you receive alerts but the service is actually up:
- Increase "Trigger Alert After" from 2 to 3-4 consecutive failures
- Increase request timeout from 30s to 60s
- Verify the health endpoint responds in under 30 seconds
- Check that your server does not block UptimeRobot IPs (unlikely, but check firewall rules)
### Health Check Returns 503 But Site Loads

The health check verifies all backend services, not just the application process. If `/api/health/` returns 503 but the frontend loads:

- Check backend logs: `docker logs backend --tail 100`
- Check database: `docker compose exec backend python manage.py dbshell`
- Check Redis: `docker compose exec redis redis-cli ping`
- Check Celery: `docker compose exec backend celery -A config inspect stats`
The most common cause is Celery workers being down -- on its own this is reported as a warning; the endpoint only returns 503 when a critical check such as the database is also failing.
## Alert Response Procedures

### API Down Alert

- Check server status: `ssh user@production-host 'docker ps'`
- Check backend logs: `ssh user@production-host 'docker logs backend --tail 100'`
- Check health endpoint: `curl https://freezedesign.eu/api/health/`
- Common causes:
    - Database connection lost -- check the PostgreSQL container
    - Redis connection lost -- check the Redis container
    - Backend container crashed -- check Docker logs
    - Out of memory -- check VPS memory: `free -h`
- If a quick fix is not obvious, consider rollback: `cd /opt/webshop && ./scripts/rollback.sh`
### SSL Certificate Expiry Alert

- SSH to the production server
- Renew the certificate (typically `sudo certbot renew` for a Let's Encrypt setup)
- Verify: `curl -vI https://freezedesign.eu 2>&1 | grep "expire date"`
- Check the auto-renewal cron job: `systemctl status cron && crontab -l`
## Discord Channel Summary

| Channel | Source | Alert Types |
|---|---|---|
| `#errors` | Sentry bot | New issues, regressions, error spikes |
| `#payments` | Sentry bot | Payment errors, payment error spikes |
| `#monitoring` | UptimeRobot webhook | Uptime alerts, SSL expiry |
| `#backups` | Custom webhook | Backup success/failure (see deployment docs) |
| `#admin` | Custom webhook | Admin audit events (see deployment docs) |