K3s has rapidly become the go-to lightweight Kubernetes distribution, packaging everything into a single binary under 70MB that runs on as little as 512MB of RAM. But lightweight does not mean unsuitable for production. With the right architecture and operational practices, K3s delivers enterprise-grade reliability for workloads of all sizes.
Kubo is a managed Kubernetes platform built on K3s, offering production-grade clusters from just ¥48,000/month (~$320/month). Many of the best practices outlined in this guide are automatically applied on Kubo, significantly reducing your infrastructure management burden.
Designing for High Availability
The most critical aspect of running K3s in production is high availability (HA). According to the K3s official documentation, an HA configuration with embedded etcd requires a minimum of three server nodes, and the cluster should comprise an odd number of servers so that etcd can maintain quorum.
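The odd-number rule follows from etcd's majority quorum. Here is a minimal sketch of the arithmetic; the helper names `quorum` and `tolerated_failures` are made up for illustration:

```shell
# Majority quorum for an etcd cluster of N members: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while the cluster still holds quorum: N - quorum(N).
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Four servers tolerate the same single failure as three, so an even server count adds cost without adding resilience.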
Choosing Your Datastore
K3s supports multiple datastore backends, each suited to different scenarios:
- Embedded etcd (recommended): Self-contained, easiest to manage. Suitable for most production deployments
- External PostgreSQL/MySQL: For large-scale clusters where you need to scale the datastore independently
- Embedded SQLite: Single-node only. Not recommended for production
When using embedded etcd, ensure server nodes can communicate on ports 2379-2380. Review the complete K3s system requirements to verify all networking prerequisites.
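As a rough sketch of what an embedded etcd setup can look like, the function below prints a K3s server `config.yaml` for either the first server or a joining server. The function name, token, and hostname are placeholders, and it assumes servers join through a fixed registration address such as a load balancer:

```shell
# Print a K3s server config.yaml for embedded etcd.
# $1 = "init" for the first server, "join" for additional servers
# $2 = shared cluster token (placeholder; generate your own secret)
# $3 = fixed registration address, e.g. a load balancer hostname (assumption)
k3s_server_config() {
  role=$1
  token=$2
  address=$3
  echo "token: \"$token\""
  echo "tls-san:"
  echo "  - \"$address\""
  if [ "$role" = "init" ]; then
    # The first server bootstraps a new embedded etcd cluster.
    echo "cluster-init: true"
  else
    # Additional servers join through the fixed registration address.
    echo "server: \"https://$address:6443\""
  fi
}

k3s_server_config init "example-token" "k3s.example.internal"
```

The output is meant for `/etc/rancher/k3s/config.yaml`, which K3s reads at startup. Treat the values as examples rather than recommended settings.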
Load Balancer Strategy
Place a load balancer in front of your server nodes, but remember that a single load balancer becomes a single point of failure. Deploy redundant load balancers using Keepalived, or leverage cloud load balancers with built-in high availability.
Minimum hardware requirements per the official documentation:
- Server nodes: 2 CPU cores, 2GB RAM
- Agent nodes: 1 CPU core, 512MB RAM
- Storage: SSD recommended (NVMe preferred for etcd workloads)
Security Hardening
K3s ships with many security mitigations enabled by default, passing a number of CIS Kubernetes Benchmark controls out of the box. However, production environments require additional hardening.
Pod Security Standards
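K3s v1.25+ supports Pod Security Admission (PSA), the built-in replacement for PodSecurityPolicy. To set cluster-wide defaults, pass an admission configuration file to the API server (in K3s, via --kube-apiserver-arg=admission-control-config-file=<path>), and enforce the restricted profile for production namespaces.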
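PSA also reads per-namespace labels, which is often the simplest way to enforce the restricted profile on a specific namespace. A minimal sketch, with a hypothetical function name and a hypothetical `payments` namespace:

```shell
# Print a Namespace manifest that enforces the "restricted" Pod Security Standard.
# $1 = namespace name (hypothetical; substitute your own)
psa_restricted_namespace() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
EOF
}

psa_restricted_namespace payments
```

Piping the output into `kubectl apply -f -` would create or update the namespace. On an existing namespace, starting with only the `warn` and `audit` labels can reveal which workloads would be rejected before you turn on `enforce`.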
RBAC and Secrets Management
- Design RBAC policies following the principle of least privilege
- Encrypt Kubernetes Secrets at rest using the --secrets-encryption flag
- Consider integrating external secret managers like HashiCorp Vault or cloud-native alternatives
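To make least privilege concrete, the sketch below prints a namespaced Role that can only read Pods, plus a RoleBinding for one ServiceAccount. The function name, the `pod-reader` names, and the arguments are illustrative assumptions, not a recommended policy:

```shell
# Print a namespaced Role and RoleBinding granting read-only access to Pods.
# $1 = namespace, $2 = ServiceAccount name (both hypothetical)
readonly_pod_role() {
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: $1
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: $1
subjects:
  - kind: ServiceAccount
    name: $2
    namespace: $1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
EOF
}

readonly_pod_role payments metrics-collector
```

After applying it, `kubectl auth can-i delete pods -n payments --as=system:serviceaccount:payments:metrics-collector` should answer `no`, which is a quick way to confirm the grant is as narrow as intended.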
Network Policies
K3s bundles a Network Policy controller by default. Implement Kubernetes Network Policies to restrict pod-to-pod communication to the minimum necessary.
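A common starting point is a default-deny policy per namespace, with explicit allow rules added on top. The following sketch prints such a policy for ingress traffic only; the function and namespace names are hypothetical:

```shell
# Print a NetworkPolicy that denies all ingress traffic to every pod in a namespace.
# $1 = namespace (hypothetical)
default_deny_ingress() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF
}

default_deny_ingress payments
```

The empty `podSelector` matches every pod in the namespace. Egress is deliberately left out of this sketch, since a default-deny egress rule also blocks DNS lookups unless you add a matching allow rule.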
Retrofitting security is always harder than building it in. Implement network policies, PSA, RBAC, and secrets management from Day 1.
With Kubo and Captain.AI, these security configurations are pre-applied at the platform level, letting you focus on your applications rather than infrastructure hardening.
Monitoring and Alerting
You cannot manage what you cannot see. Install comprehensive monitoring and alerting before issues become incidents.
Prometheus + Grafana Stack
The Prometheus and Grafana combination is the standard for K3s cluster monitoring:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace
Critical Metrics to Watch
| Metric | Threshold | Action |
|---|---|---|
| Server CPU utilization | > 90% | Consider adding nodes |
| Memory utilization | > 80% | Review resource limits |
| etcd latency | > 100ms | Optimize disk I/O |
| Pod restart count | Increasing trend | Investigate OOM/CrashLoop |
Refer to the K3s resource profiling documentation for guidance on appropriate resource allocation based on cluster size.
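Thresholds like these are most useful when encoded as alert rules. The sketch below prints a `PrometheusRule` for the etcd latency row, interpreting "etcd latency" as the p99 WAL fsync duration, which is an assumption on our part. It also assumes etcd metrics are exposed (in K3s, via `--etcd-expose-metrics`) and scraped, and that the rule selector matches a `release` label equal to the Helm release name:

```shell
# Print a PrometheusRule that fires when etcd WAL fsync p99 latency exceeds 100ms.
# $1 = Helm release name used as the rule selector label (assumption)
etcd_latency_rule() {
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd-latency
  namespace: monitoring
  labels:
    release: $1
spec:
  groups:
    - name: etcd
      rules:
        - alert: EtcdSlowWalFsync
          expr: histogram_quantile(0.99, sum by (le, instance) (rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: etcd WAL fsync p99 latency above 100ms
EOF
}

etcd_latency_rule kube-prometheus-stack
```

The alert name, the 10-minute `for` window, and the severity label are illustrative; tune them to your own paging policy.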
Log Aggregation
Use Fluentd or Fluent Bit to centralize logs into Elasticsearch or Grafana Loki. Note that K3s does not enable audit logging by default — enable it explicitly for production environments.
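Enabling audit logging comes down to giving the API server a policy file and a log path. A minimal sketch follows; the function names and file paths are examples, and the retention values are placeholders rather than recommendations:

```shell
# Print a minimal audit policy that records request metadata for all resources.
audit_policy() {
  cat <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
EOF
}

# Print the K3s config.yaml fragment that points the API server at the policy.
# $1 = policy file path, $2 = audit log path (both are example paths)
audit_config_fragment() {
  cat <<EOF
kube-apiserver-arg:
  - "audit-policy-file=$1"
  - "audit-log-path=$2"
  - "audit-log-maxage=30"
  - "audit-log-maxbackup=10"
  - "audit-log-maxsize=100"
EOF
}

audit_policy
audit_config_fragment /var/lib/rancher/k3s/server/audit.yaml /var/lib/rancher/k3s/server/logs/audit.log
```

If your `config.yaml` already contains a `kube-apiserver-arg` list, merge these entries into it rather than adding a second key.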
Backup and Disaster Recovery
Your ability to recover from failures defines the reliability of your production environment. Combine etcd snapshots with application-level backups for comprehensive protection.
etcd Snapshots
K3s provides built-in etcd snapshot capabilities:
# Manual snapshot
k3s etcd-snapshot save --name pre-upgrade-$(date +%Y%m%d)
# Automatic snapshot configuration (server startup options)
# --etcd-snapshot-schedule-cron "0 */4 * * *" # Every 4 hours
# --etcd-snapshot-retention 10 # Keep 10 snapshots
Configure automatic snapshots every 4-6 hours and store them externally in S3-compatible object storage.
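The same schedule and retention settings can live in the K3s config file alongside the S3 options. The sketch below prints such a fragment; the function name, endpoint, bucket, and folder are placeholders, and credentials are intentionally left out:

```shell
# Print a K3s config.yaml fragment for scheduled etcd snapshots stored in S3.
# $1 = S3 endpoint, $2 = bucket name, $3 = folder inside the bucket (all placeholders)
etcd_snapshot_config() {
  cat <<EOF
etcd-snapshot-schedule-cron: "0 */4 * * *"
etcd-snapshot-retention: 10
etcd-s3: true
etcd-s3-endpoint: "$1"
etcd-s3-bucket: "$2"
etcd-s3-folder: "$3"
EOF
}

etcd_snapshot_config s3.example.com k3s-snapshots prod-cluster
```

Credentials would typically go in `etcd-s3-access-key` and `etcd-s3-secret-key`. Keep the file readable by root only if you store them there.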
Application Backups with Velero
Use Velero to back up Kubernetes resources and persistent volumes. This is essential for protecting application data that etcd snapshots alone cannot cover.
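For illustration, here is one way to script a recurring Velero backup for a single namespace. The `run` helper, the function name, the schedule, and the 30-day TTL are our own assumptions; by default the script only prints the command so you can review it first:

```shell
# Echo the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Schedule a daily backup of one namespace at 02:00, kept for 30 days.
# $1 = namespace (hypothetical)
schedule_namespace_backup() {
  run velero schedule create "daily-$1" \
    --schedule "0 2 * * *" \
    --include-namespaces "$1" \
    --ttl 720h
}

schedule_namespace_backup payments
```

Setting `DRY_RUN=0` executes the command, which requires the `velero` CLI and a configured backup storage location.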
Test Your Restores
The value of a backup is determined by the success rate of your restores. Regularly test your restore procedures to verify that your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) meet requirements.
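One small check you can automate is whether the newest snapshot is recent enough to satisfy your RPO. This is a minimal sketch, assuming the default K3s snapshot directory and a hypothetical 240-minute RPO; the function name is ours:

```shell
# Return 0 if the directory holds at least one file modified within the RPO window.
# $1 = snapshot directory, $2 = RPO in minutes
snapshot_within_rpo() {
  [ -n "$(find "$1" -type f -mmin "-$2" 2>/dev/null | head -n 1)" ]
}

# Default K3s snapshot location; override with SNAPSHOT_DIR if yours differs.
snapshot_dir=${SNAPSHOT_DIR:-/var/lib/rancher/k3s/server/db/snapshots}
if snapshot_within_rpo "$snapshot_dir" 240; then
  echo "OK: snapshot newer than 240 minutes found"
else
  echo "WARN: no snapshot within the last 240 minutes"
fi
```

This only verifies freshness. Measuring RTO still means performing a real restore on a scratch cluster, for example with `k3s server --cluster-reset --cluster-reset-restore-path=<snapshot>`, and timing it end to end.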
Resource Management and Upgrade Strategy
Resource Requests and Limits
Set appropriate requests and limits for every workload. Over-provisioning wastes resources; under-provisioning causes instability.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
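Per-container settings like the ones above are easy to forget, so a namespace-level `LimitRange` can supply defaults for containers that omit them. A sketch reusing the same example values, with a hypothetical function name and namespace:

```shell
# Print a LimitRange that applies default requests and limits to containers
# that do not declare their own.
# $1 = namespace (hypothetical)
default_limit_range() {
  cat <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: $1
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "250m"
        memory: "256Mi"
      default:
        cpu: "500m"
        memory: "512Mi"
EOF
}

default_limit_range payments
```

Here `defaultRequest` fills in missing requests and `default` fills in missing limits. The numbers are examples, not sizing advice.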
Rolling Upgrades
Automate K3s upgrades using the system-upgrade-controller. For production environments, follow this process:
- Test the new version in a staging environment
- Take an etcd snapshot
- Upgrade server nodes sequentially
- Upgrade agent (worker) nodes sequentially
- Verify application functionality
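With the system-upgrade-controller installed, the "server nodes sequentially" step can be expressed as a Plan resource. The sketch below prints a server Plan; the function name is ours, the version argument is a placeholder, and it assumes the controller runs in the `system-upgrade` namespace with a `system-upgrade` service account:

```shell
# Print a system-upgrade-controller Plan that upgrades server nodes one at a time.
# $1 = target K3s version tag (placeholder; use a version you have tested in staging)
server_upgrade_plan() {
  cat <<EOF
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values: ["true"]
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: $1
EOF
}

server_upgrade_plan "vX.Y.Z+k3s1"
```

`concurrency: 1` is what makes the rollout sequential. A matching agent Plan would typically select nodes without the control-plane label and add a `prepare` step that waits for `server-plan` to finish.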
Storage Considerations
Use fast SSDs (preferably NVMe) for the K3s data directory at /var/lib/rancher/k3s. On ARM devices, avoid SD cards and eMMC storage; they cannot handle the I/O load required for stable etcd operation.
Production Readiness Checklist
K3s is lightweight yet fully capable of powering production workloads when properly configured. Use this checklist to verify your readiness:
- HA configuration (an odd number of server nodes, three or more, with embedded etcd)
- Security hardening following the CIS benchmark guide
- Comprehensive monitoring with Prometheus + Grafana
- Automated etcd snapshots + Velero application backups
- Resource requests/limits set for all workloads
- Rolling upgrade procedures established and tested
Want to skip the operational complexity? Kubo provides managed K3s clusters from ¥48,000/month with HA, security, monitoring, and backups pre-configured. For AI workload orchestration, explore Captain.AI integration.
To learn more, visit Kubo or contact us.