Alerting & Notifications

Shardlyn monitors your nodes and instances and can alert you when things go wrong. Define rules based on metrics and receive notifications via email or in-app alerts.

Alert Rules

Alert rules define the conditions that trigger an alert. Each rule monitors a specific metric and fires when its condition (the metric compared against the threshold) holds continuously for a given duration.

Creating an Alert Rule

  1. Navigate to Settings > Alerts
  2. Click Create Rule
  3. Configure the rule:
    • Name: A descriptive name (e.g., "High CPU on production nodes")
    • Target: A specific node, instance, or all resources of a type
    • Metric: The metric to monitor (CPU usage, memory usage, disk usage, etc.)
    • Operator: Greater than, less than, or equal to
    • Threshold: The value that triggers the alert
    • Duration: How long the condition must persist before alerting (in seconds)
    • Severity: info, warn, error, or critical
  4. Configure notification channels:
    • In-app: Show alert in the dashboard
    • Email: Send notification via email
    • Webhook: POST to a custom URL
  5. Set a cooldown period to avoid alert fatigue (time between repeated alerts)
  6. Click Create
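The threshold/duration pair means a single spike does not fire an alert: the condition must hold continuously for the whole duration. A minimal sketch of that evaluation (class and field names are illustrative, not Shardlyn's internals):

```python
import operator
import time

# Map the rule's operator setting to a comparison function.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

class RuleEvaluator:
    """Fires only when the condition holds continuously for `duration` seconds."""

    def __init__(self, op, threshold, duration):
        self.op = OPS[op]
        self.threshold = threshold
        self.duration = duration
        self.breach_started = None  # timestamp when the condition first held

    def observe(self, value, now=None):
        """Feed one metric sample; return True when the rule should fire."""
        now = time.time() if now is None else now
        if self.op(value, self.threshold):
            if self.breach_started is None:
                self.breach_started = now
            return now - self.breach_started >= self.duration
        self.breach_started = None  # condition cleared; reset the timer
        return False

# "High CPU on production nodes": CPU usage > 90% for 300 seconds
rule = RuleEvaluator(">", 90, 300)
assert rule.observe(95, now=0) is False    # breach begins
assert rule.observe(97, now=200) is False  # still within the duration window
assert rule.observe(96, now=300) is True   # held for 300s, so the rule fires
```

Note that a sample back under the threshold resets the timer, which is exactly what makes duration useful against short spikes.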

Example Rules

| Rule           | Metric           | Threshold | Duration | Severity |
| -------------- | ---------------- | --------- | -------- | -------- |
| High CPU       | CPU usage        | > 90%     | 300s     | warn     |
| Low disk space | Disk usage       | > 85%     | 60s      | error    |
| Node offline   | Heartbeat missed | > 300s    | 300s     | critical |
| High memory    | Memory usage     | > 95%     | 120s     | error    |
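The example rules above can be expressed as plain data. The sketch below checks a single metric sample against them (field names are illustrative; duration handling is omitted here for brevity):

```python
# The example rules from the table, as plain data (illustrative field names).
# All four use the greater-than operator, so a simple > comparison suffices.
RULES = [
    {"name": "High CPU",       "metric": "cpu_usage",        "threshold": 90,  "duration": 300, "severity": "warn"},
    {"name": "Low disk space", "metric": "disk_usage",       "threshold": 85,  "duration": 60,  "severity": "error"},
    {"name": "Node offline",   "metric": "heartbeat_missed", "threshold": 300, "duration": 300, "severity": "critical"},
    {"name": "High memory",    "metric": "memory_usage",     "threshold": 95,  "duration": 120, "severity": "error"},
]

def breached(rules, metric, value):
    """Names of rules whose threshold this sample exceeds (duration not applied)."""
    return [r["name"] for r in rules
            if r["metric"] == metric and value > r["threshold"]]

assert breached(RULES, "disk_usage", 92) == ["Low disk space"]
assert breached(RULES, "cpu_usage", 80) == []
```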

Managing Rules

  • Enable/Disable: Toggle rules without deleting them
  • Edit: Modify thresholds, targets, or notification channels
  • Delete: Remove rules permanently

Alert Lifecycle

When a rule's condition is met:

  1. Fired: The alert is created with the current metric value
  2. Notified: Notifications are sent to configured channels
  3. Cooldown: No duplicate alerts until the cooldown period expires
  4. Resolved: When the condition clears (metric returns to normal)
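The lifecycle above can be sketched as a small state machine. Cooldown here suppresses repeat notifications while the condition stays breached; names and return values are illustrative, not Shardlyn's API:

```python
class Alert:
    """Tracks one alert through the fired -> cooldown -> resolved lifecycle."""

    def __init__(self, cooldown):
        self.cooldown = cooldown      # seconds between repeated notifications
        self.last_notified = None
        self.state = "ok"

    def on_condition(self, breached, now):
        """Process one evaluation tick; return the action taken."""
        if breached:
            if self.state == "ok":
                self.state = "fired"
            # Notify on first breach, then again only after the cooldown expires.
            if self.last_notified is None or now - self.last_notified >= self.cooldown:
                self.last_notified = now
                return "notify"
            return "cooldown"
        if self.state == "fired":
            self.state = "ok"         # condition cleared: alert resolves
            return "resolved"
        return "ok"

alert = Alert(cooldown=600)
assert alert.on_condition(True, now=0) == "notify"      # fired + first notification
assert alert.on_condition(True, now=300) == "cooldown"  # duplicate suppressed
assert alert.on_condition(True, now=600) == "notify"    # cooldown expired, re-notify
assert alert.on_condition(False, now=700) == "resolved"
```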

Notification Channels

In-App

Alerts appear in the dashboard notification center. A badge shows the count of unresolved alerts.

Email

Alerts are sent to the organization's admin email addresses via the Resend email service. Emails include:

  • Alert severity and rule name
  • Current metric value vs. threshold
  • Affected resource (node or instance name)
  • Timestamp
  • Link to the dashboard

Webhook

Send alerts to external services (Slack, Discord, PagerDuty, etc.) via HTTP POST:

```json
{
  "alert_id": "uuid",
  "rule_name": "High CPU on prod",
  "severity": "warn",
  "target_type": "node",
  "target_id": "uuid",
  "metric": "cpu_usage",
  "metric_value": 95.2,
  "threshold": 90,
  "message": "CPU usage exceeded 90% for 5 minutes",
  "fired_at": "2026-01-31T12:00:00Z"
}
```

Viewing Alerts

Navigate to Settings > Alerts to see:

  • Active (fired) alerts with severity indicators
  • Alert history with timestamps and metric values
  • Rule configurations

Best Practices

  • Start with conservative thresholds and tighten over time
  • Use duration to avoid false positives from short spikes
  • Set cooldown periods to prevent notification fatigue
  • Monitor disk space — running out of disk causes more outages than CPU or memory
  • Use severity levels to prioritize response: critical for immediate action, warn for investigation
