The Complete Guide to K3s Production Best Practices

K3s has rapidly become the go-to lightweight Kubernetes distribution, packaging everything into a single binary under 70MB that runs on as little as 512MB of RAM. But lightweight does not mean unsuitable for production. With the right architecture and operational practices, K3s delivers enterprise-grade reliability for workloads of all sizes.

Kubo is a managed Kubernetes platform built on K3s, offering production-grade clusters from just ¥48,000/month (~$320/month). Many of the best practices outlined in this guide are automatically applied on Kubo, significantly reducing your infrastructure management burden.

Designing for High Availability

The most critical aspect of running K3s in production is high availability (HA). According to the K3s official documentation, an HA configuration with embedded etcd requires three or more server nodes, and the server count should be odd so that etcd can maintain quorum. With an external datastore, two or more server nodes are sufficient.
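The odd-number rule follows from etcd's majority quorum: a cluster of n members needs n/2 + 1 (rounded down) to agree, so it tolerates the remainder failing. The sketch below only works through that arithmetic for small clusters; the function names are illustrative and not part of K3s.

```shell
#!/bin/sh
set -eu

# Majority quorum for an etcd cluster with $1 voting members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a quorum is still reachable.
fault_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}

# Print the figures for cluster sizes one through five.
for servers in 1 2 3 4 5; do
  echo "servers=$servers quorum=$(quorum "$servers") tolerated_failures=$(fault_tolerance "$servers")"
done
```

A fourth server raises the quorum without raising the number of tolerated failures, which is why even server counts bring no availability benefit.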

Choosing Your Datastore

K3s supports multiple datastore backends, each suited to different scenarios:

Embedded etcd (recommended): Self-contained, easiest to manage. Suitable for most production deployments
External PostgreSQL/MySQL: For large-scale clusters where you need to scale the datastore independently
Embedded SQLite: Single-node only. Not recommended for production

When using embedded etcd, ensure server nodes can reach one another on TCP ports 2379-2380 (etcd client and peer traffic), in addition to the API server port 6443. Review the complete K3s system requirements to verify all networking prerequisites.
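As a rough illustration of the embedded etcd option, the script below writes two example K3s configuration files: one for the first server, which initializes the cluster, and one for the servers that join it. The output directory, the endpoint name k3s-api.example.internal, and the token value are placeholders assumed for this sketch; the keys mirror the corresponding k3s server flags.

```shell
#!/bin/sh
set -eu

# All values below are placeholders for illustration only.
OUT_DIR="${OUT_DIR:-./k3s-examples}"
API_ENDPOINT="${API_ENDPOINT:-k3s-api.example.internal}"  # hypothetical fixed address
mkdir -p "$OUT_DIR"

# First server: initializes the embedded etcd cluster.
cat > "$OUT_DIR/server-1-config.yaml" <<EOF
cluster-init: true
token: "REPLACE_WITH_SHARED_SECRET"
tls-san:
  - "$API_ENDPOINT"
EOF

# Additional servers: join the existing cluster through the fixed address.
cat > "$OUT_DIR/server-join-config.yaml" <<EOF
server: "https://$API_ENDPOINT:6443"
token: "REPLACE_WITH_SHARED_SECRET"
tls-san:
  - "$API_ENDPOINT"
EOF

echo "Wrote example configs to $OUT_DIR"
```

On each node, the matching file would be placed at /etc/rancher/k3s/config.yaml before the k3s server service starts.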

Load Balancer Strategy

Place a load balancer in front of your server nodes, but remember that a single load balancer becomes a single point of failure. Deploy redundant load balancers using Keepalived, or leverage cloud load balancers with built-in high availability.
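For the Keepalived option, a minimal sketch of a VRRP configuration that floats one virtual IP between two load balancer hosts might look like the following. The interface name eth0, the virtual IP 192.0.2.10, the router ID, and the password are assumptions for illustration. Keepalived only moves the address; the proxy that forwards TCP 6443 to the server nodes is configured separately.

```shell
#!/bin/sh
set -eu

# Placeholders for illustration; adjust per load balancer host.
OUT_DIR="${OUT_DIR:-./k3s-examples}"
VRRP_PRIORITY="${VRRP_PRIORITY:-100}"  # give each host a different priority
mkdir -p "$OUT_DIR"

# The host with the highest priority holds the virtual IP.
cat > "$OUT_DIR/keepalived.conf" <<EOF
vrrp_instance K3S_API {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority $VRRP_PRIORITY
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass CHANGEME
    }
    virtual_ipaddress {
        192.0.2.10/24
    }
}
EOF

echo "Wrote $OUT_DIR/keepalived.conf"
```

Running the script once per host with a different VRRP_PRIORITY yields the per-host files; the virtual IP is then what agents and kubectl clients should target.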

Minimum hardware requirements per the official documentation:

Server nodes: 2 CPU cores, 2GB RAM
Agent nodes: 1 CPU core, 512MB RAM
Storage: SSD recommended (NVMe preferred for etcd workloads)

Security Hardening

K3s ships with many security mitigations enabled by default, passing a number of CIS Kubernetes Benchmark controls out of the box. However, production environments require additional hardening.

Pod Security Standards

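From v1.25 onward, K3s relies on Pod Security Admission (PSA), which replaces the removed PodSecurityPolicy. The admission plugin is part of the API server; to set cluster-wide defaults, pass an admission configuration file with --kube-apiserver-arg=admission-control-config-file=<path>, and enforce the restricted profile for production namespaces.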
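Independently of any cluster-wide admission configuration, the restricted profile can be enforced per namespace with the standard Pod Security labels. The sketch below writes such a Namespace manifest; the namespace name production and the output directory are assumptions.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
mkdir -p "$OUT_DIR"

# enforce rejects non-compliant pods; audit and warn surface violations
# in the audit log and in kubectl output respectively.
cat > "$OUT_DIR/production-namespace.yaml" <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
EOF

echo "Wrote $OUT_DIR/production-namespace.yaml"
```

The manifest can then be applied with kubectl apply -f. Starting with only the warn and audit labels on an existing namespace is a gentler way to find workloads that would be rejected.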

RBAC and Secrets Management

Design RBAC policies following the principle of least privilege
Encrypt Kubernetes Secrets at rest using the --secrets-encryption flag
Consider integrating external secret managers like HashiCorp Vault or cloud-native alternatives
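To make the first two points concrete, here is a small sketch that writes a namespaced, read-only Role with its RoleBinding, plus the config-file form of the secrets encryption flag. The namespace production, the role name pod-reader, and the service account log-viewer are hypothetical names chosen for the example.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
mkdir -p "$OUT_DIR"

# Least privilege: read-only access to pods and their logs in one namespace.
cat > "$OUT_DIR/pod-reader-rbac.yaml" <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: log-viewer
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
EOF

# Config-file equivalent of the --secrets-encryption server flag.
cat > "$OUT_DIR/secrets-encryption-fragment.yaml" <<'EOF'
secrets-encryption: true
EOF

echo "Wrote RBAC and secrets-encryption examples to $OUT_DIR"
```

After the servers restart with the fragment merged into their configuration, k3s secrets-encrypt status reports whether encryption is active.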

Network Policies

K3s bundles a Network Policy controller by default. Implement Kubernetes Network Policies to restrict pod-to-pod communication to the minimum necessary.
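A common starting point is a default-deny policy for the namespace followed by narrow allow rules. The sketch below writes both; the namespace production, the labels app: api and app: frontend, and port 8080 are assumptions for illustration.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/network-policies.yaml" <<'EOF'
# Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow only frontend pods to reach api pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
EOF

echo "Wrote $OUT_DIR/network-policies.yaml"
```

Because the first policy also denies egress, DNS lookups and the frontend's outbound traffic need their own allow rules before workloads will function normally.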

Retrofitting security is always harder than building it in. Implement network policies, PSA, RBAC, and secrets management from Day 1.

With Kubo and Captain.AI, these security configurations are pre-applied at the platform level, letting you focus on your applications rather than infrastructure hardening.

Monitoring and Alerting

You cannot manage what you cannot see. Install comprehensive monitoring and alerting before issues become incidents.

Prometheus + Grafana Stack

The Prometheus and Grafana combination is the standard for K3s cluster monitoring:

bash

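# Register the chart repository and refresh the local chart index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the stack into a dedicated monitoring namespace
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace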

Critical Metrics to Watch

Metric	Threshold	Action
Server CPU utilization	> 90%	Consider adding nodes
Memory utilization	> 80%	Review resource limits
etcd latency	> 100ms	Optimize disk I/O
Pod restart count	Increasing trend	Investigate OOM/CrashLoop
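One way to turn these thresholds into alerts is a PrometheusRule for the Prometheus Operator. The sketch below is an interpretation, not a definitive rule set: it assumes "etcd latency" means the 99th percentile WAL fsync duration, treats more than three restarts in an hour as an "increasing trend", and assumes the Helm release is named kube-prometheus-stack so the release label is picked up by the default rule selector.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/k3s-threshold-alerts.yaml" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: k3s-threshold-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: k3s.thresholds
      rules:
        - alert: NodeCpuHigh
          expr: |
            (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.90
          for: 10m
        - alert: NodeMemoryHigh
          expr: |
            (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.80
          for: 10m
        - alert: EtcdSlowWalFsync
          expr: |
            histogram_quantile(0.99, sum by (instance, le) (rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))) > 0.1
          for: 10m
        - alert: PodRestartingOften
          expr: |
            increase(kube_pod_container_status_restarts_total[1h]) > 3
          for: 5m
EOF

echo "Wrote $OUT_DIR/k3s-threshold-alerts.yaml"
```

The etcd rule only fires if the embedded etcd metrics are actually scraped, which on K3s typically means starting the servers with --etcd-expose-metrics and adding a scrape target for them.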

Refer to the K3s resource profiling documentation for guidance on appropriate resource allocation based on cluster size.

Log Aggregation

Use Fluentd or Fluent Bit to centralize logs into Elasticsearch or Grafana Loki. Note that K3s does not enable audit logging by default — enable it explicitly for production environments.
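Audit logging is switched on by passing audit flags through to the API server. The sketch below writes a minimal audit policy and a matching K3s config fragment; the file locations follow the layout used in the K3s hardening guide, while the Metadata-level policy and the retention values are assumptions to tune for your environment.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
mkdir -p "$OUT_DIR"

# Minimal policy: record request metadata for every API call.
cat > "$OUT_DIR/audit-policy.yaml" <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
EOF

# Fragment to merge into /etc/rancher/k3s/config.yaml on each server.
cat > "$OUT_DIR/audit-config-fragment.yaml" <<'EOF'
kube-apiserver-arg:
  - "audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log"
  - "audit-policy-file=/var/lib/rancher/k3s/server/audit.yaml"
  - "audit-log-maxage=30"
  - "audit-log-maxbackup=10"
  - "audit-log-maxsize=100"
EOF

echo "Wrote audit logging examples to $OUT_DIR"
```

The policy file would be copied to the audit-policy-file path and the log directory created before the servers are restarted; the resulting audit.log can then be tailed by the same log shipper as the rest of the cluster logs.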

Backup and Disaster Recovery

Your ability to recover from failures defines the reliability of your production environment. Combine etcd snapshots with application-level backups for comprehensive protection.

etcd Snapshots

K3s provides built-in etcd snapshot capabilities:

bash

# Manual snapshot
k3s etcd-snapshot save --name pre-upgrade-$(date +%Y%m%d)

# Automatic snapshot configuration (server startup options)
# --etcd-snapshot-schedule-cron "0 */4 * * *"  # Every 4 hours
# --etcd-snapshot-retention 10                   # Keep 10 snapshots

Configure automatic snapshots every 4-6 hours and store them externally in S3-compatible object storage.
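A sketch of the corresponding server configuration, written as a K3s config fragment, is shown below. The endpoint, bucket, folder, and credential values are placeholders; the keys are the config-file forms of the etcd snapshot and etcd-s3 server flags.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
mkdir -p "$OUT_DIR"

# Snapshot every 4 hours, keep 10, and upload each snapshot to S3-compatible storage.
cat > "$OUT_DIR/etcd-snapshot-fragment.yaml" <<'EOF'
etcd-snapshot-schedule-cron: "0 */4 * * *"
etcd-snapshot-retention: 10
etcd-s3: true
etcd-s3-endpoint: "s3.example.internal"
etcd-s3-bucket: "k3s-etcd-snapshots"
etcd-s3-folder: "prod-cluster"
etcd-s3-access-key: "REPLACE_WITH_ACCESS_KEY"
etcd-s3-secret-key: "REPLACE_WITH_SECRET_KEY"
EOF

echo "Wrote $OUT_DIR/etcd-snapshot-fragment.yaml"
```

Since the fragment contains credentials, the real file should be readable by root only, or the keys supplied through whatever secret distribution mechanism you already use for node configuration.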

Application Backups with Velero

Use Velero to back up Kubernetes resources and persistent volumes. This is essential for protecting application data that etcd snapshots alone cannot cover.
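Assuming Velero is already installed in its default velero namespace with a backup storage location configured, a recurring backup can be declared as a Schedule resource. The schedule name, the production namespace, the 02:00 daily cron, and the 30-day retention below are illustrative choices.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
mkdir -p "$OUT_DIR"

# Daily backup of one namespace, including volume snapshots, kept for 30 days.
cat > "$OUT_DIR/velero-schedule.yaml" <<'EOF'
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - production
    snapshotVolumes: true
    ttl: 720h0m0s
EOF

echo "Wrote $OUT_DIR/velero-schedule.yaml"
```

Whether persistent volume data is actually captured depends on the volume snapshot or file-system backup support configured for your storage class, so verify a restore before relying on it.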

Test Your Restores

The value of a backup is determined by the success rate of your restores. Regularly test your restore procedures to verify that your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) meet requirements.
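A restore drill is easier to repeat when the timing check is scripted. The helper functions below are a hypothetical sketch: one times an arbitrary restore command against an RTO budget, the other checks that the snapshot interval does not exceed the RPO, since the worst-case data loss is one full interval. The sleep command stands in for a real restore procedure.

```shell
#!/bin/sh
set -eu

# Run a restore command and succeed only if it finishes within the RTO.
# Usage: timed_restore_drill <rto_seconds> <command> [args...]
timed_restore_drill() {
  rto_seconds=$1
  shift
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  elapsed=$(( end - start ))
  echo "restore drill took ${elapsed}s (RTO budget ${rto_seconds}s)"
  [ "$elapsed" -le "$rto_seconds" ]
}

# Succeed if the snapshot interval fits within the RPO (both in hours).
# Usage: rpo_met <snapshot_interval_hours> <rpo_hours>
rpo_met() {
  [ "$1" -le "$2" ]
}

# Stand-in for a real restore run against a scratch cluster.
timed_restore_drill 900 sleep 1
rpo_met 4 6 && echo "4-hour snapshots satisfy a 6-hour RPO"
```

In a real drill, the stand-in would be replaced by your own restore script, for example one that runs k3s server --cluster-reset --cluster-reset-restore-path=<snapshot> on a scratch node and waits for the workloads to become ready.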

Resource Management and Upgrade Strategy

Resource Requests and Limits

Set appropriate requests and limits for every workload. Over-provisioning wastes resources; under-provisioning causes instability.

yaml

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Rolling Upgrades

Automate K3s upgrades using the system-upgrade-controller. For production environments, follow this process:

Test the new version in a staging environment
Take an etcd snapshot
Upgrade server nodes sequentially
Upgrade agent (worker) nodes sequentially
Verify application functionality
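The sequential server-then-agent order above can be expressed as two system-upgrade-controller Plans. This sketch assumes the controller is already installed in the system-upgrade namespace with a system-upgrade service account; the target version is deliberately left as a placeholder to be replaced with the release you validated in staging.

```shell
#!/bin/sh
set -eu

OUT_DIR="${OUT_DIR:-./k3s-examples}"
TARGET_VERSION="${TARGET_VERSION:-REPLACE_WITH_TESTED_VERSION}"  # placeholder
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/upgrade-plans.yaml" <<EOF
# Servers first, one node at a time.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values: ["true"]
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: "$TARGET_VERSION"
---
# Agents next; the prepare step waits for server-plan to complete.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  prepare:
    image: rancher/k3s-upgrade
    args: ["prepare", "server-plan"]
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: "$TARGET_VERSION"
EOF

echo "Wrote $OUT_DIR/upgrade-plans.yaml"
```

Keeping concurrency at 1 is what makes the rollout sequential; taking the etcd snapshot and running the application checks remain manual steps around applying these Plans.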

Storage Considerations

Use fast SSDs (preferably NVMe) for the K3s data directory at /var/lib/rancher/k3s. On ARM devices, avoid SD cards and eMMC storage: etcd is write-intensive, and these media cannot sustain the I/O load it needs to run stably.

Production Readiness Checklist

K3s is lightweight yet fully capable of powering production workloads when properly configured. Use this checklist to verify your readiness:

HA configuration (an odd number of server nodes, three or more, with embedded etcd)
Security hardening following the CIS benchmark guide
Comprehensive monitoring with Prometheus + Grafana
Automated etcd snapshots + Velero application backups
Resource requests/limits set for all workloads
Rolling upgrade procedures established and tested

Want to skip the operational complexity? Kubo provides managed K3s clusters from ¥48,000/month with HA, security, monitoring, and backups pre-configured. For AI workload orchestration, explore Captain.AI integration.

To learn more, visit Kubo or contact us.