Alerting & Notifications
Shardlyn monitors your nodes and instances and can alert you when things go wrong. Define rules based on metrics and receive notifications via email or in-app alerts.
Alert Rules
Alert rules define the conditions that trigger an alert. Each rule monitors a specific metric and fires when the threshold is exceeded for a given duration.
Creating an Alert Rule
- Navigate to Settings > Alerts
- Click Create Rule
- Configure the rule:
  - Name: A descriptive name (e.g., "High CPU on production nodes")
  - Target: A specific node, instance, or all resources of a type
  - Metric: The metric to monitor (CPU usage, memory usage, disk usage, etc.)
  - Operator: Greater than, less than, or equal to
  - Threshold: The value that triggers the alert
  - Duration: How long the condition must persist before alerting (in seconds)
  - Severity: `info`, `warn`, `error`, or `critical`
- Configure notification channels:
  - In-app: Show the alert in the dashboard
  - Email: Send a notification via email
  - Webhook: POST to a custom URL
- Set a cooldown period to avoid alert fatigue (the time between repeated alerts)
- Click Create
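As a sketch, a rule configured through the steps above could be represented like this. The field names here are illustrative only, not Shardlyn's actual schema:

```python
# Hypothetical representation of the rule from the steps above.
# Field names are illustrative, not Shardlyn's actual schema.
rule = {
    "name": "High CPU on production nodes",
    "target": {"type": "node", "id": "all"},   # a specific node/instance, or all of a type
    "metric": "cpu_usage",
    "operator": ">",                            # greater than, less than, or equal to
    "threshold": 90,                            # value that triggers the alert
    "duration_s": 300,                          # condition must persist this long
    "severity": "warn",                         # info, warn, error, or critical
    "channels": ["in_app", "email"],            # where notifications go
    "cooldown_s": 600,                          # time between repeated alerts
}

# Severity must be one of the four documented levels:
assert rule["severity"] in ("info", "warn", "error", "critical")
```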
Example Rules
| Rule | Metric | Threshold | Duration | Severity |
|---|---|---|---|---|
| High CPU | CPU usage | > 90% | 300s | warn |
| Low disk space | Disk usage | > 85% | 60s | error |
| Node offline | Heartbeat missed | > 300s | 300s | critical |
| High memory | Memory usage | > 95% | 120s | error |
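The duration column above is what separates a sustained problem from a brief spike: the condition must hold continuously for the full duration before the rule fires. A minimal sketch of that evaluation logic (hypothetical names, not Shardlyn's internals):

```python
# Sketch of duration-based evaluation: a "greater than" rule fires only
# after the metric stays above the threshold for `duration_s` seconds.
# Hypothetical illustration, not Shardlyn's actual implementation.
class RuleEvaluator:
    def __init__(self, threshold, duration_s):
        self.threshold = threshold
        self.duration_s = duration_s
        self.breach_started = None  # when the condition first began holding

    def check(self, value, now):
        if value <= self.threshold:
            self.breach_started = None      # spike ended: reset the timer
            return False
        if self.breach_started is None:
            self.breach_started = now       # condition just started holding
        return now - self.breach_started >= self.duration_s

# A short CPU spike does not fire the "High CPU" rule (threshold 90%, 300s):
rule = RuleEvaluator(threshold=90, duration_s=300)
assert rule.check(95.0, now=0) is False     # breach begins, timer starts
assert rule.check(96.0, now=10) is False    # still under 300s
assert rule.check(50.0, now=20) is False    # back to normal, timer resets
assert rule.check(95.0, now=100) is False   # a new breach starts over
assert rule.check(95.0, now=400) is True    # held for 300s -> fire
```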
Managing Rules
- Enable/Disable: Toggle rules without deleting them
- Edit: Modify thresholds, targets, or notification channels
- Delete: Remove rules permanently
Alert Lifecycle
An alert moves through the following states:
- Fired: The alert is created with the current metric value
- Notified: Notifications are sent to configured channels
- Cooldown: No duplicate alerts until the cooldown period expires
- Resolved: When the condition clears (metric returns to normal)
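The cooldown step above is what suppresses duplicate notifications while an alert keeps firing. A sketch of that behavior (hypothetical names, not Shardlyn's internals):

```python
# Sketch of the cooldown stage of the alert lifecycle: a re-fire within
# the cooldown window is suppressed. Hypothetical illustration only.
class AlertState:
    def __init__(self, cooldown_s):
        self.cooldown_s = cooldown_s
        self.last_notified = None
        self.resolved = True

    def on_fired(self, now):
        """Return True if a notification should be sent for this firing."""
        self.resolved = False
        if self.last_notified is not None and now - self.last_notified < self.cooldown_s:
            return False                 # still in cooldown: suppress duplicate
        self.last_notified = now         # notify and restart the cooldown clock
        return True

    def on_cleared(self):
        self.resolved = True             # metric returned to normal

state = AlertState(cooldown_s=600)
assert state.on_fired(now=0) is True      # first firing notifies
assert state.on_fired(now=300) is False   # within the 600s cooldown: suppressed
assert state.on_fired(now=700) is True    # cooldown expired: notify again
```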
Notification Channels
In-App
Alerts appear in the dashboard notification center. A badge shows the count of unresolved alerts.
Email
Alerts are sent to the organization's admin email addresses via the Resend email service. Emails include:
- Alert severity and rule name
- Current metric value vs. threshold
- Affected resource (node or instance name)
- Timestamp
- Link to the dashboard
Webhook
Send alerts to external services (Slack, Discord, PagerDuty, etc.) via HTTP POST:
```json
{
  "alert_id": "uuid",
  "rule_name": "High CPU on prod",
  "severity": "warn",
  "target_type": "node",
  "target_id": "uuid",
  "metric": "cpu_usage",
  "metric_value": 95.2,
  "threshold": 90,
  "message": "CPU usage exceeded 90% for 5 minutes",
  "fired_at": "2026-01-31T12:00:00Z"
}
```
Viewing Alerts
Navigate to Settings > Alerts to see:
- Active (fired) alerts with severity indicators
- Alert history with timestamps and metric values
- Rule configurations
Best Practices
- Start with conservative thresholds and tighten over time
- Use duration to avoid false positives from short spikes
- Set cooldown periods to prevent notification fatigue
- Monitor disk space — running out of disk causes more outages than CPU or memory
- Use severity levels to prioritize response: `critical` for immediate action, `warn` for investigation
Next Steps
- Observability — Prometheus metrics, Grafana dashboards, and monitoring
- Billing & Subscriptions — Plan limits that affect alert retention and features
- Metrics Reference — Full list of available metrics for alert rules