Alerting & Notifications
Shardlyn monitors your nodes and instances and can alert you when things go wrong. Define rules based on metrics and receive notifications via email or in-app alerts.
Alert Rules
Alert rules define the conditions that trigger an alert. Each rule monitors a specific metric and fires when the threshold is exceeded for a given duration.
Creating an Alert Rule
- Navigate to Settings > Alerts
- Click Create Rule
- Configure the rule:
  - Name: A descriptive name (e.g., "High CPU on production nodes")
  - Target: A specific node, instance, or all resources of a type
  - Metric: The metric to monitor (CPU usage, memory usage, disk usage, etc.)
  - Operator: Greater than, less than, or equal to
  - Threshold: The value that triggers the alert
  - Duration: How long the condition must persist before alerting (in seconds)
  - Severity: `info`, `warn`, `error`, or `critical`
- Configure notification channels:
  - In-app: Show the alert in the dashboard
  - Email: Send a notification via email
  - Webhook: POST to a custom URL
- Set a cooldown period to avoid alert fatigue (the time between repeated alerts)
- Click Create
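As a sketch, a rule configured through the steps above could be represented like this. The field names here are illustrative only, not Shardlyn's actual schema:

```python
# Hypothetical representation of the rule from the steps above.
# Field names are illustrative, not Shardlyn's actual schema.
rule = {
    "name": "High CPU on production nodes",
    "target": {"type": "node", "id": "all"},   # a specific node/instance, or all of a type
    "metric": "cpu_usage",
    "operator": ">",                            # greater than, less than, or equal to
    "threshold": 90,                            # value that triggers the alert
    "duration_s": 300,                          # condition must persist this long
    "severity": "warn",                         # info, warn, error, or critical
    "channels": ["in_app", "email"],            # where notifications go
    "cooldown_s": 600,                          # time between repeated alerts
}

# Severity must be one of the four documented levels:
assert rule["severity"] in ("info", "warn", "error", "critical")
```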
Example Rules
| Rule | Metric | Threshold | Duration | Severity |
|---|---|---|---|---|
| High CPU | CPU usage | > 90% | 300s | warn |
| Low disk space | Disk usage | > 85% | 60s | error |
| Node offline | Heartbeat missed | > 300s | 300s | critical |
| High memory | Memory usage | > 95% | 120s | error |
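The duration column above is what separates a sustained problem from a brief spike: the condition must hold continuously for the full duration before the rule fires. A minimal sketch of that evaluation logic (hypothetical names, not Shardlyn's internals):

```python
# Sketch of duration-based evaluation: a "greater than" rule fires only
# after the metric stays above the threshold for `duration_s` seconds.
# Hypothetical illustration, not Shardlyn's actual implementation.
class RuleEvaluator:
    def __init__(self, threshold, duration_s):
        self.threshold = threshold
        self.duration_s = duration_s
        self.breach_started = None  # when the condition first began holding

    def check(self, value, now):
        if value <= self.threshold:
            self.breach_started = None      # spike ended: reset the timer
            return False
        if self.breach_started is None:
            self.breach_started = now       # condition just started holding
        return now - self.breach_started >= self.duration_s

# A short CPU spike does not fire the "High CPU" rule (threshold 90%, 300s):
rule = RuleEvaluator(threshold=90, duration_s=300)
assert rule.check(95.0, now=0) is False     # breach begins, timer starts
assert rule.check(96.0, now=10) is False    # still under 300s
assert rule.check(50.0, now=20) is False    # back to normal, timer resets
assert rule.check(95.0, now=100) is False   # a new breach starts over
assert rule.check(95.0, now=400) is True    # held for 300s -> fire
```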
Managing Rules
- Enable/Disable: Toggle rules without deleting them
- Edit: Modify thresholds, targets, or notification channels
- Delete: Remove rules permanently
Alert Lifecycle
An alert moves through the following states:
- Fired: The alert is created with the current metric value
- Notified: Notifications are sent to configured channels
- Cooldown: No duplicate alerts until the cooldown period expires
- Resolved: When the condition clears (metric returns to normal)
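The cooldown step above is what suppresses duplicate notifications while an alert keeps firing. A sketch of that behavior (hypothetical names, not Shardlyn's internals):

```python
# Sketch of the cooldown stage of the alert lifecycle: a re-fire within
# the cooldown window is suppressed. Hypothetical illustration only.
class AlertState:
    def __init__(self, cooldown_s):
        self.cooldown_s = cooldown_s
        self.last_notified = None
        self.resolved = True

    def on_fired(self, now):
        """Return True if a notification should be sent for this firing."""
        self.resolved = False
        if self.last_notified is not None and now - self.last_notified < self.cooldown_s:
            return False                 # still in cooldown: suppress duplicate
        self.last_notified = now         # notify and restart the cooldown clock
        return True

    def on_cleared(self):
        self.resolved = True             # metric returned to normal

state = AlertState(cooldown_s=600)
assert state.on_fired(now=0) is True      # first firing notifies
assert state.on_fired(now=300) is False   # within the 600s cooldown: suppressed
assert state.on_fired(now=700) is True    # cooldown expired: notify again
```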
Notification Channels
In-App
Alerts appear in the dashboard notification center. A badge shows the count of unresolved alerts.
Email
Alerts are sent to the organization's admin email addresses via the Resend email service. Emails include:
- Alert severity and rule name
- Current metric value vs. threshold
- Affected resource (node or instance name)
- Timestamp
- Link to the dashboard
Webhook
Send alerts to external services (Slack, Discord, PagerDuty, etc.) via HTTP POST:
```json
{
  "alert_id": "uuid",
  "rule_name": "High CPU on prod",
  "severity": "warn",
  "target_type": "node",
  "target_id": "uuid",
  "metric": "cpu_usage",
  "metric_value": 95.2,
  "threshold": 90,
  "message": "CPU usage exceeded 90% for 5 minutes",
  "fired_at": "2026-01-31T12:00:00Z"
}
```
Viewing Alerts
Navigate to Settings > Alerts to see:
- Active (fired) alerts with severity indicators
- Alert history with timestamps and metric values
- Rule configurations
Best Practices
- Start with conservative thresholds and tighten over time
- Use duration to avoid false positives from short spikes
- Set cooldown periods to prevent notification fatigue
- Monitor disk space — running out of disk causes more outages than CPU or memory
- Use severity levels to prioritize response: `critical` for immediate action, `warn` for investigation
Next Steps
- Observability — Prometheus metrics, Grafana dashboards, and monitoring
- Billing & Subscriptions — Plan limits that affect alert retention and features
- Metrics Reference — Full list of available metrics for alert rules