Shardlyn Architecture
Overview
Shardlyn is a cloud-native, BYO-cloud control plane for deploying and managing containerized workloads across multiple cloud providers. It follows a declarative, pull-based architecture where lightweight agents on your servers report state and receive desired state from the Shardlyn control plane. Deploy game servers, web applications, databases, and more with a unified management experience.
System Architecture
Components
Control Plane (Managed)
The control plane is fully managed by Shardlyn and handles:
- API Server: REST API for the dashboard and external integrations
- Authentication: JWT-based auth with MFA support and GitHub OAuth
- Authorization: RBAC with organization-level isolation
- Reconciler: Computes desired state and distributes it to agents
- Provisioner: Provisions cloud infrastructure via Terraform (AWS, GCP, Hetzner, OCI)
- Planner: Resource sizing and bin-packing algorithms
- WebSocket Hub: Real-time log streaming and interactive console
- Billing: Subscription management via Stripe
- Alerting: Rule-based alerts with email notifications
- DNS Management: Cloudflare DNS integration for custom domains
- SSH CA: Certificate authority for secure server access
- Backup Manager: Scheduled backups to S3-compatible storage
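The Planner's bin-packing step can be pictured with a small sketch. This document does not say which algorithm Shardlyn uses, so the example below swaps in first-fit decreasing, a common heuristic, and packs on memory only. The `Workload` and `Node` types and the `PlaceFirstFitDecreasing` function are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Workload and Node are hypothetical types; the real planner's data model
// is not described in this document.
type Workload struct {
	Name     string
	MemoryMB int
}

type Node struct {
	Name   string
	FreeMB int
}

// PlaceFirstFitDecreasing considers the largest workloads first and assigns
// each one to the first node with enough free memory. It returns a map of
// workload name to node name, plus the names of workloads that did not fit.
func PlaceFirstFitDecreasing(workloads []Workload, nodes []Node) (map[string]string, []string) {
	// Work on copies so the caller's slices are not reordered or mutated.
	ws := append([]Workload(nil), workloads...)
	ns := append([]Node(nil), nodes...)
	sort.SliceStable(ws, func(i, j int) bool { return ws[i].MemoryMB > ws[j].MemoryMB })

	placement := make(map[string]string)
	var unplaced []string
	for _, w := range ws {
		placed := false
		for i := range ns {
			if ns[i].FreeMB >= w.MemoryMB {
				ns[i].FreeMB -= w.MemoryMB
				placement[w.Name] = ns[i].Name
				placed = true
				break
			}
		}
		if !placed {
			unplaced = append(unplaced, w.Name)
		}
	}
	return placement, unplaced
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeMB: 4096},
		{Name: "node-b", FreeMB: 2048},
	}
	workloads := []Workload{
		{Name: "web", MemoryMB: 1024},
		{Name: "game", MemoryMB: 3072},
		{Name: "db", MemoryMB: 2048},
		{Name: "cache", MemoryMB: 1024},
	}
	placement, unplaced := PlaceFirstFitDecreasing(workloads, nodes)
	fmt.Println("placement:", placement)
	fmt.Println("unplaced:", unplaced)
}
```

A production planner would likely weigh CPU, disk, and placement constraints as well; the sketch only shows the shape of the problem.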
Agent (Runs on Your Nodes)
The agent is a lightweight Go binary that runs on each of your servers:
- Registers with the control plane using a one-time bootstrap token
- Reports heartbeat with resource usage and container states
- Receives desired state from the control plane
- Applies changes via Docker API (create, start, stop, remove)
- Syncs Git repositories to container volumes (for Git Deploy)
- Exposes Prometheus metrics for observability
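To make the heartbeat exchange concrete, here is a minimal sketch of what the request and response bodies might look like. The wire format is not specified in this document, so every type and JSON field name below (`Heartbeat`, `DesiredState`, `node_id`, and so on) is an assumption for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContainerState is a hypothetical per-container report.
type ContainerState struct {
	ID     string `json:"id"`
	Status string `json:"status"` // for example "running" or "exited"
}

// Heartbeat is a hypothetical request body sent by the agent on each poll.
type Heartbeat struct {
	NodeID     string           `json:"node_id"`
	CPUPercent float64          `json:"cpu_percent"`
	MemoryMB   int              `json:"memory_mb"`
	Containers []ContainerState `json:"containers"`
}

// DesiredContainer and DesiredState are a hypothetical response body
// describing what the control plane wants to run on the node.
type DesiredContainer struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type DesiredState struct {
	Containers []DesiredContainer `json:"containers"`
}

// EncodeHeartbeat serializes a heartbeat to JSON.
func EncodeHeartbeat(hb Heartbeat) (string, error) {
	b, err := json.Marshal(hb)
	return string(b), err
}

// DecodeDesiredState parses the control plane's reply.
func DecodeDesiredState(body string) (DesiredState, error) {
	var ds DesiredState
	err := json.Unmarshal([]byte(body), &ds)
	return ds, err
}

func main() {
	out, err := EncodeHeartbeat(Heartbeat{
		NodeID:     "n1",
		CPUPercent: 12.5,
		MemoryMB:   512,
		Containers: []ContainerState{{ID: "c1", Status: "running"}},
	})
	fmt.Println(out, err)

	ds, err := DecodeDesiredState(`{"containers":[{"name":"web","image":"nginx:1.27"}]}`)
	fmt.Println(ds, err)
}
```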
Data Flow
State Machines
Instance States
Node States
Security Model
Authentication
Shardlyn supports multiple authentication methods:
- Email/Password: With bcrypt hashing and optional MFA (TOTP)
- GitHub OAuth: Sign in with your GitHub account
- API Tokens: For programmatic access and CI/CD integrations
- SSH Certificates: Signed by Shardlyn's CA for secure server access
Agent Authentication
1. You create a node registration token in the dashboard
2. The agent is installed on your server with the token
3. Agent calls the control plane with the token (one-time use)
4. Control plane returns a persistent auth token
5. Agent uses the auth token for all subsequent communication
RBAC Model
| Resource | Admin | User |
|---|---|---|
| Users | CRUD | R (self) |
| Nodes | CRUD | R |
| Workloads | CRUD | CRUD |
| Instances | CRUD | CRUD |
| Provisioning | CRUD | R |
| Organizations | CRUD | R |
| Billing | CRUD | R |
| Audit Logs | R | - |
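The table above can be read as a lookup from resource and role to a set of allowed actions. The following sketch encodes it that way; the type names, resource keys, and the `Allowed` function are hypothetical, and the `ownsTarget` flag is one possible way to model the "R (self)" entry for users.

```go
package main

import (
	"fmt"
	"strings"
)

type Role string
type Resource string
type Action rune

const (
	Admin Role = "admin"
	User  Role = "user"
)

const (
	Create Action = 'C'
	Read   Action = 'R'
	Update Action = 'U'
	Delete Action = 'D'
)

// Grant lists the allowed actions as a subset of "CRUD". SelfOnly marks a
// grant that applies only to the caller's own record.
type Grant struct {
	Actions  string
	SelfOnly bool
}

// permissions mirrors the RBAC table. A missing role entry means no access.
var permissions = map[Resource]map[Role]Grant{
	"users":         {Admin: {Actions: "CRUD"}, User: {Actions: "R", SelfOnly: true}},
	"nodes":         {Admin: {Actions: "CRUD"}, User: {Actions: "R"}},
	"workloads":     {Admin: {Actions: "CRUD"}, User: {Actions: "CRUD"}},
	"instances":     {Admin: {Actions: "CRUD"}, User: {Actions: "CRUD"}},
	"provisioning":  {Admin: {Actions: "CRUD"}, User: {Actions: "R"}},
	"organizations": {Admin: {Actions: "CRUD"}, User: {Actions: "R"}},
	"billing":       {Admin: {Actions: "CRUD"}, User: {Actions: "R"}},
	"audit_logs":    {Admin: {Actions: "R"}},
}

// Allowed reports whether role may perform action on resource. ownsTarget
// says whether the target record belongs to the caller.
func Allowed(role Role, res Resource, action Action, ownsTarget bool) bool {
	g, ok := permissions[res][role]
	if !ok || !strings.ContainsRune(g.Actions, rune(action)) {
		return false
	}
	return !g.SelfOnly || ownsTarget
}

func main() {
	fmt.Println(Allowed(User, "users", Read, true))        // true: own profile
	fmt.Println(Allowed(User, "users", Read, false))       // false: someone else's
	fmt.Println(Allowed(User, "audit_logs", Read, false))  // false: no access
	fmt.Println(Allowed(Admin, "audit_logs", Read, false)) // true: read-only
}
```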
Key Design Decisions
Pull-Based Communication
Agents poll the control plane (heartbeat) rather than the control plane pushing to agents.
Why this matters:
- Simpler networking — no inbound ports needed on your servers
- Works behind NAT and firewalls
- Agents can be offline without affecting other nodes
- Natural rate limiting: the heartbeat interval bounds how often each agent contacts the control plane
Stateless Control Plane
All state lives in PostgreSQL, so control plane instances keep no local state and can be scaled horizontally.
Idempotent Operations
All agent operations are idempotent. Creating an already-running container is a no-op. This enables safe retries, crash recovery, and multiple reconciliation loops.
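The sketch below illustrates what an idempotent "ensure running" operation could look like. The `Runtime` interface is a hypothetical abstraction, not the Docker API itself, and `memRuntime` is an in-memory stand-in so the example runs without a Docker daemon.

```go
package main

import "fmt"

// Runtime is a hypothetical, minimal abstraction over a container runtime.
// It only names the operations this sketch needs.
type Runtime interface {
	Status(name string) (status string, exists bool)
	Create(name, image string) error
	Start(name string) error
}

// EnsureRunning brings the named container to the running state and reports
// whether it had to change anything. Calling it again once the container is
// running is a no-op, which makes retries safe.
func EnsureRunning(rt Runtime, name, image string) (bool, error) {
	status, exists := rt.Status(name)
	if exists && status == "running" {
		return false, nil // already converged
	}
	if !exists {
		if err := rt.Create(name, image); err != nil {
			return false, fmt.Errorf("create %s: %w", name, err)
		}
	}
	if err := rt.Start(name); err != nil {
		return false, fmt.Errorf("start %s: %w", name, err)
	}
	return true, nil
}

// memRuntime is an in-memory runtime used to exercise EnsureRunning.
type memRuntime struct {
	containers map[string]string // name -> status
	creates    int
}

func (m *memRuntime) Status(name string) (string, bool) {
	s, ok := m.containers[name]
	return s, ok
}

func (m *memRuntime) Create(name, image string) error {
	m.containers[name] = "created"
	m.creates++
	return nil
}

func (m *memRuntime) Start(name string) error {
	m.containers[name] = "running"
	return nil
}

func main() {
	rt := &memRuntime{containers: map[string]string{}}
	for i := 0; i < 3; i++ {
		changed, err := EnsureRunning(rt, "web", "nginx:latest")
		fmt.Println(changed, err) // true <nil> on the first call, false <nil> after
	}
}
```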
Declarative Configuration
Users declare desired state (workload spec), and the system converges to it. No imperative "start this container" commands.
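Convergence toward a declared state usually comes down to diffing what is wanted against what exists. The sketch below assumes a deliberately simplified model in which both sides are maps of container name to image. The real workload spec is richer, and the `Diff` function is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff compares desired and actual container sets (name -> image) and
// returns the names to create, to recreate because the image changed, and
// to remove. Results are sorted so the output is deterministic.
func Diff(desired, actual map[string]string) (create, recreate, remove []string) {
	for name, image := range desired {
		current, ok := actual[name]
		switch {
		case !ok:
			create = append(create, name)
		case current != image:
			recreate = append(recreate, name)
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			remove = append(remove, name)
		}
	}
	sort.Strings(create)
	sort.Strings(recreate)
	sort.Strings(remove)
	return create, recreate, remove
}

func main() {
	desired := map[string]string{"web": "nginx:1.27", "db": "postgres:16"}
	actual := map[string]string{"web": "nginx:1.25", "old": "redis:7"}

	create, recreate, remove := Diff(desired, actual)
	fmt.Println("create:", create)
	fmt.Println("recreate:", recreate)
	fmt.Println("remove:", remove)
}
```

When desired and actual match, all three lists are empty and the reconciliation pass does nothing.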
Performance
| Setting | Default | Notes |
|---|---|---|
| Heartbeat interval | 10s | Lower = faster updates, more load |
| PostgreSQL pool | 10 connections | Configurable per deployment |
| Prometheus scrape | 15s | Configurable |
Related Documentation
- Getting Started — Deploy your first application
- Workload Specification — Declarative workload format reference
- API Reference — Full REST API and WebSocket endpoints
- Metrics Reference — Prometheus metrics reference
- Threat Model — Security architecture and mitigations
- Observability Guide — Monitoring and alerting setup