[{"data":1,"prerenderedAt":770},["ShallowReactive",2],{"blog-en-k3s-production-best-practices":3,"blog-en-k3s-production-best-practices-alt":454},{"id":4,"title":5,"author":6,"body":7,"date":754,"description":755,"extension":756,"image":241,"locale":757,"meta":758,"navigation":454,"path":759,"seo":760,"stem":761,"tags":762,"__hash__":769},"blog\u002Fblog\u002Fen\u002Fk3s-production-best-practices.md","The Complete Guide to K3s Production Best Practices","Kubo Team",{"type":8,"value":9,"toc":727},"minimark",[10,21,29,34,43,48,51,74,83,87,90,101,112,116,125,129,147,151,172,176,191,197,210,214,217,221,235,297,301,368,377,381,396,400,403,407,410,474,477,481,490,494,500,504,508,518,605,609,618,636,640,647,651,654,697,711,723],[11,12,13,20],"p",{},[14,15,19],"a",{"href":16,"rel":17},"https:\u002F\u002Fk3s.io\u002F",[18],"nofollow","K3s"," has rapidly become the go-to lightweight Kubernetes distribution, packaging everything into a single binary under 70MB that runs on as little as 512MB of RAM. But lightweight does not mean unsuitable for production. With the right architecture and operational practices, K3s delivers enterprise-grade reliability for workloads of all sizes.",[11,22,23,28],{},[14,24,27],{"href":25,"rel":26},"https:\u002F\u002Fkubo.hexabase.io\u002F",[18],"Kubo"," is a managed Kubernetes platform built on K3s, offering production-grade clusters from just ¥48,000\u002Fmonth (~$320\u002Fmonth). Many of the best practices outlined in this guide are automatically applied on Kubo, significantly reducing your infrastructure management burden.",[30,31,33],"h2",{"id":32},"designing-for-high-availability","Designing for High Availability",[11,35,36,37,42],{},"The most critical aspect of running K3s in production is high availability (HA). 
According to the ",[14,38,41],{"href":39,"rel":40},"https:\u002F\u002Fdocs.k3s.io\u002Fdatastore\u002Fha-embedded",[18],"K3s official documentation",", an HA configuration requires a minimum of three server nodes, and the cluster must comprise an odd number of servers to maintain etcd quorum.",[44,45,47],"h3",{"id":46},"choosing-your-datastore","Choosing Your Datastore",[11,49,50],{},"K3s supports multiple datastore backends, each suited to different scenarios:",[52,53,54,62,68],"ul",{},[55,56,57,61],"li",{},[58,59,60],"strong",{},"Embedded etcd (recommended)",": Self-contained, easiest to manage. Suitable for most production deployments",[55,63,64,67],{},[58,65,66],{},"External PostgreSQL\u002FMySQL",": For large-scale clusters where you need to scale the datastore independently",[55,69,70,73],{},[58,71,72],{},"Embedded SQLite",": Single-node only. Not recommended for production",[11,75,76,77,82],{},"When using embedded etcd, ensure server nodes can communicate on ports 2379-2380. Review the complete ",[14,78,81],{"href":79,"rel":80},"https:\u002F\u002Fdocs.k3s.io\u002Finstallation\u002Frequirements",[18],"K3s system requirements"," to verify all networking prerequisites.",[44,84,86],{"id":85},"load-balancer-strategy","Load Balancer Strategy",[11,88,89],{},"Place a load balancer in front of your server nodes, but remember that a single load balancer becomes a single point of failure. 
Deploy redundant load balancers using Keepalived, or leverage cloud load balancers with built-in high availability.",[11,91,92,95,96,100],{},[58,93,94],{},"Minimum hardware requirements"," per the ",[14,97,99],{"href":79,"rel":98},[18],"official documentation",":",[52,102,103,106,109],{},[55,104,105],{},"Server nodes: 2 CPU cores, 2GB RAM",[55,107,108],{},"Agent nodes: 1 CPU core, 512MB RAM",[55,110,111],{},"Storage: SSD recommended (NVMe preferred for etcd workloads)",[30,113,115],{"id":114},"security-hardening","Security Hardening",[11,117,118,119,124],{},"K3s ships with many security mitigations enabled by default, passing a number of ",[14,120,123],{"href":121,"rel":122},"https:\u002F\u002Fdocs.k3s.io\u002Fsecurity\u002Fhardening-guide",[18],"CIS Kubernetes Benchmark"," controls out of the box. However, production environments require additional hardening.",[44,126,128],{"id":127},"pod-security-standards","Pod Security Standards",[11,130,131,132,137,138,142,143,146],{},"K3s v1.25+ supports ",[14,133,136],{"href":134,"rel":135},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Fconcepts\u002Fsecurity\u002Fpod-security-admission\u002F",[18],"Pod Security Admissions (PSA)",". 
Enable it with the ",[139,140,141],"code",{},"--admission-control-config-file"," API server option (in K3s, pass it as --kube-apiserver-arg=admission-control-config-file=<path>) and enforce the ",[139,144,145],{},"restricted"," profile for production namespaces.",[44,148,150],{"id":149},"rbac-and-secrets-management","RBAC and Secrets Management",[52,152,153,156,163],{},[55,154,155],{},"Design RBAC policies following the principle of least privilege",[55,157,158,159,162],{},"Encrypt Kubernetes Secrets at rest using the ",[139,160,161],{},"--secrets-encryption"," flag",[55,164,165,166,171],{},"Consider integrating external secret managers like ",[14,167,170],{"href":168,"rel":169},"https:\u002F\u002Fwww.vaultproject.io\u002F",[18],"HashiCorp Vault"," or cloud-native alternatives",[44,173,175],{"id":174},"network-policies","Network Policies",[11,177,178,179,184,185,190],{},"K3s bundles a ",[14,180,183],{"href":181,"rel":182},"https:\u002F\u002Fdocs.k3s.io\u002Fsecurity",[18],"Network Policy controller"," by default. Implement ",[14,186,189],{"href":187,"rel":188},"https:\u002F\u002Fkubernetes.io\u002Fdocs\u002Fconcepts\u002Fservices-networking\u002Fnetwork-policies\u002F",[18],"Kubernetes Network Policies"," to restrict pod-to-pod communication to the minimum necessary.",[192,193,194],"blockquote",{},[11,195,196],{},"Retrofitting security is always harder than building it in. Implement network policies, PSA, RBAC, and secrets management from Day 1.",[11,198,199,200,203,204,209],{},"With ",[14,201,27],{"href":25,"rel":202},[18]," and ",[14,205,208],{"href":206,"rel":207},"https:\u002F\u002Fwww.hexabase.com\u002Fproduct\u002Fcaptain-ai\u002F",[18],"Captain.AI",", these security configurations are pre-applied at the platform level, letting you focus on your applications rather than infrastructure hardening.",[30,211,213],{"id":212},"monitoring-and-alerting","Monitoring and Alerting",[11,215,216],{},"You cannot manage what you cannot see. 
Install comprehensive monitoring and alerting before issues become incidents.",[44,218,220],{"id":219},"prometheus-grafana-stack","Prometheus + Grafana Stack",[11,222,223,224,203,229,234],{},"The ",[14,225,228],{"href":226,"rel":227},"https:\u002F\u002Fprometheus.io\u002F",[18],"Prometheus",[14,230,233],{"href":231,"rel":232},"https:\u002F\u002Fgrafana.com\u002F",[18],"Grafana"," combination is the standard for K3s cluster monitoring:",[236,237,242],"pre",{"className":238,"code":239,"language":240,"meta":241,"style":241},"language-bash shiki shiki-themes tokyo-night","helm repo add prometheus-community https:\u002F\u002Fprometheus-community.github.io\u002Fhelm-charts\nhelm install kube-prometheus-stack prometheus-community\u002Fkube-prometheus-stack \\\n  --namespace monitoring --create-namespace\n","bash","",[139,243,244,266,284],{"__ignoreMap":241},[245,246,249,253,257,260,263],"span",{"class":247,"line":248},"line",1,[245,250,252],{"class":251},"sE3pS","helm",[245,254,256],{"class":255},"sPY7s"," repo",[245,258,259],{"class":255}," add",[245,261,262],{"class":255}," prometheus-community",[245,264,265],{"class":255}," https:\u002F\u002Fprometheus-community.github.io\u002Fhelm-charts\n",[245,267,269,271,274,277,280],{"class":247,"line":268},2,[245,270,252],{"class":251},[245,272,273],{"class":255}," install",[245,275,276],{"class":255}," kube-prometheus-stack",[245,278,279],{"class":255}," prometheus-community\u002Fkube-prometheus-stack",[245,281,283],{"class":282},"sAklC"," \\\n",[245,285,287,291,294],{"class":247,"line":286},3,[245,288,290],{"class":289},"sT800","  --namespace",[245,292,293],{"class":255}," monitoring",[245,295,296],{"class":289}," --create-namespace\n",[44,298,300],{"id":299},"critical-metrics-to-watch","Critical Metrics to 
Watch",[302,303,304,320],"table",{},[305,306,307],"thead",{},[308,309,310,314,317],"tr",{},[311,312,313],"th",{},"Metric",[311,315,316],{},"Threshold",[311,318,319],{},"Action",[321,322,323,335,346,357],"tbody",{},[308,324,325,329,332],{},[326,327,328],"td",{},"Server CPU utilization",[326,330,331],{},"> 90%",[326,333,334],{},"Consider adding nodes",[308,336,337,340,343],{},[326,338,339],{},"Memory utilization",[326,341,342],{},"> 80%",[326,344,345],{},"Review resource limits",[308,347,348,351,354],{},[326,349,350],{},"etcd latency",[326,352,353],{},"> 100ms",[326,355,356],{},"Optimize disk I\u002FO",[308,358,359,362,365],{},[326,360,361],{},"Pod restart count",[326,363,364],{},"Increasing trend",[326,366,367],{},"Investigate OOM\u002FCrashLoop",[11,369,370,371,376],{},"Refer to the ",[14,372,375],{"href":373,"rel":374},"https:\u002F\u002Fdocs.k3s.io\u002Freference\u002Fresource-profiling",[18],"K3s resource profiling"," documentation for guidance on appropriate resource allocation based on cluster size.",[44,378,380],{"id":379},"log-aggregation","Log Aggregation",[11,382,383,384,389,390,395],{},"Use Fluentd or Fluent Bit to centralize logs into ",[14,385,388],{"href":386,"rel":387},"https:\u002F\u002Fwww.elastic.co\u002F",[18],"Elasticsearch"," or ",[14,391,394],{"href":392,"rel":393},"https:\u002F\u002Fgrafana.com\u002Foss\u002Floki\u002F",[18],"Grafana Loki",". Note that K3s does not enable audit logging by default, so enable it explicitly for production environments.",[30,397,399],{"id":398},"backup-and-disaster-recovery","Backup and Disaster Recovery",[11,401,402],{},"Your ability to recover from failures defines the reliability of your production environment. 
Combine etcd snapshots with application-level backups for comprehensive protection.",[44,404,406],{"id":405},"etcd-snapshots","etcd Snapshots",[11,408,409],{},"K3s provides built-in etcd snapshot capabilities:",[236,411,413],{"className":238,"code":412,"language":240,"meta":241,"style":241},"# Manual snapshot\nk3s etcd-snapshot save --name pre-upgrade-$(date +%Y%m%d)\n\n# Automatic snapshot configuration (server startup options)\n# --etcd-snapshot-schedule-cron \"0 *\u002F4 * * *\"  # Every 4 hours\n# --etcd-snapshot-retention 10                   # Keep 10 snapshots\n",[139,414,415,421,450,456,462,468],{"__ignoreMap":241},[245,416,417],{"class":247,"line":248},[245,418,420],{"class":419},"sbD-w","# Manual snapshot\n",[245,422,423,426,429,432,435,438,441,444,447],{"class":247,"line":268},[245,424,425],{"class":251},"k3s",[245,427,428],{"class":255}," etcd-snapshot",[245,430,431],{"class":255}," save",[245,433,434],{"class":289}," --name",[245,436,437],{"class":255}," pre-upgrade-",[245,439,440],{"class":282},"$(",[245,442,443],{"class":251},"date",[245,445,446],{"class":255}," +%Y%m%d",[245,448,449],{"class":282},")\n",[245,451,452],{"class":247,"line":286},[245,453,455],{"emptyLinePlaceholder":454},true,"\n",[245,457,459],{"class":247,"line":458},4,[245,460,461],{"class":419},"# Automatic snapshot configuration (server startup options)\n",[245,463,465],{"class":247,"line":464},5,[245,466,467],{"class":419},"# --etcd-snapshot-schedule-cron \"0 *\u002F4 * * *\"  # Every 4 hours\n",[245,469,471],{"class":247,"line":470},6,[245,472,473],{"class":419},"# --etcd-snapshot-retention 10                   # Keep 10 snapshots\n",[11,475,476],{},"Configure automatic snapshots every 4-6 hours and store them externally in S3-compatible object storage.",[44,478,480],{"id":479},"application-backups-with-velero","Application Backups with Velero",[11,482,483,484,489],{},"Use ",[14,485,488],{"href":486,"rel":487},"https:\u002F\u002Fvelero.io\u002F",[18],"Velero"," to back up 
Kubernetes resources and persistent volumes. This is essential for protecting application data that etcd snapshots alone cannot cover.",[44,491,493],{"id":492},"test-your-restores","Test Your Restores",[11,495,496,499],{},[58,497,498],{},"The value of a backup is determined by the success rate of your restores."," Regularly test your restore procedures to verify that your RTO (Recovery Time Objective) and RPO (Recovery Point Objective) meet requirements.",[30,501,503],{"id":502},"resource-management-and-upgrade-strategy","Resource Management and Upgrade Strategy",[44,505,507],{"id":506},"resource-requests-and-limits","Resource Requests and Limits",[11,509,510,511,203,514,517],{},"Set appropriate ",[139,512,513],{},"requests",[139,515,516],{},"limits"," for every workload. Over-provisioning wastes resources; under-provisioning causes instability.",[236,519,523],{"className":520,"code":521,"language":522,"meta":241,"style":241},"language-yaml shiki shiki-themes tokyo-night","resources:\n  requests:\n    cpu: \"250m\"\n    memory: \"256Mi\"\n  limits:\n    cpu: \"500m\"\n    memory: \"512Mi\"\n","yaml",[139,524,525,534,541,557,571,578,591],{"__ignoreMap":241},[245,526,527,531],{"class":247,"line":248},[245,528,530],{"class":529},"s0U2E","resources",[245,532,533],{"class":282},":\n",[245,535,536,539],{"class":247,"line":268},[245,537,538],{"class":529},"  requests",[245,540,533],{"class":282},[245,542,543,546,548,551,554],{"class":247,"line":286},[245,544,545],{"class":529},"    cpu",[245,547,100],{"class":282},[245,549,550],{"class":282}," \"",[245,552,553],{"class":255},"250m",[245,555,556],{"class":282},"\"\n",[245,558,559,562,564,566,569],{"class":247,"line":458},[245,560,561],{"class":529},"    memory",[245,563,100],{"class":282},[245,565,550],{"class":282},[245,567,568],{"class":255},"256Mi",[245,570,556],{"class":282},[245,572,573,576],{"class":247,"line":464},[245,574,575],{"class":529},"  
limits",[245,577,533],{"class":282},[245,579,580,582,584,586,589],{"class":247,"line":470},[245,581,545],{"class":529},[245,583,100],{"class":282},[245,585,550],{"class":282},[245,587,588],{"class":255},"500m",[245,590,556],{"class":282},[245,592,594,596,598,600,603],{"class":247,"line":593},7,[245,595,561],{"class":529},[245,597,100],{"class":282},[245,599,550],{"class":282},[245,601,602],{"class":255},"512Mi",[245,604,556],{"class":282},[44,606,608],{"id":607},"rolling-upgrades","Rolling Upgrades",[11,610,611,612,617],{},"Automate K3s upgrades using the ",[14,613,616],{"href":614,"rel":615},"https:\u002F\u002Fgithub.com\u002Francher\u002Fsystem-upgrade-controller",[18],"system-upgrade-controller",". For production environments, follow this process:",[619,620,621,624,627,630,633],"ol",{},[55,622,623],{},"Test the new version in a staging environment",[55,625,626],{},"Take an etcd snapshot",[55,628,629],{},"Upgrade server nodes sequentially",[55,631,632],{},"Upgrade agent (worker) nodes sequentially",[55,634,635],{},"Verify application functionality",[44,637,639],{"id":638},"storage-considerations","Storage Considerations",[11,641,642,643,646],{},"Use fast SSDs (preferably NVMe) for the K3s data directory at ",[139,644,645],{},"\u002Fvar\u002Flib\u002Francher\u002Fk3s",". On ARM devices, avoid SD cards and eMMC storage, because they cannot sustain the I\u002FO load required for stable etcd operation.",[30,648,650],{"id":649},"production-readiness-checklist","Production Readiness Checklist",[11,652,653],{},"K3s is lightweight yet fully capable of powering production workloads when properly configured. 
Use this checklist to verify your readiness:",[52,655,658,667,673,679,685,691],{"className":656},[657],"contains-task-list",[55,659,662,666],{"className":660},[661],"task-list-item",[663,664],"input",{"disabled":454,"type":665},"checkbox"," HA configuration (3+ odd-number server nodes with embedded etcd)",[55,668,670,672],{"className":669},[661],[663,671],{"disabled":454,"type":665}," Security hardening following the CIS benchmark guide",[55,674,676,678],{"className":675},[661],[663,677],{"disabled":454,"type":665}," Comprehensive monitoring with Prometheus + Grafana",[55,680,682,684],{"className":681},[661],[663,683],{"disabled":454,"type":665}," Automated etcd snapshots + Velero application backups",[55,686,688,690],{"className":687},[661],[663,689],{"disabled":454,"type":665}," Resource requests\u002Flimits set for all workloads",[55,692,694,696],{"className":693},[661],[663,695],{"disabled":454,"type":665}," Rolling upgrade procedures established and tested",[11,698,699,702,703,706,707,710],{},[58,700,701],{},"Want to skip the operational complexity?"," ",[14,704,27],{"href":25,"rel":705},[18]," provides managed K3s clusters from ¥48,000\u002Fmonth with HA, security, monitoring, and backups pre-configured. 
For AI workload orchestration, explore ",[14,708,208],{"href":206,"rel":709},[18]," integration.",[11,712,713,714,389,717,722],{},"To learn more, visit ",[14,715,27],{"href":25,"rel":716},[18],[14,718,721],{"href":719,"rel":720},"https:\u002F\u002Fwww.hexabase.com\u002Fcontact-us\u002F",[18],"contact us",".",[724,725,726],"style",{},"html pre.shiki code .sE3pS, html code.shiki .sE3pS{--shiki-default:#C0CAF5}html pre.shiki code .sPY7s, html code.shiki .sPY7s{--shiki-default:#9ECE6A}html pre.shiki code .sAklC, html code.shiki .sAklC{--shiki-default:#89DDFF}html pre.shiki code .sT800, html code.shiki .sT800{--shiki-default:#E0AF68}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .sbD-w, html code.shiki .sbD-w{--shiki-default:#51597D;--shiki-default-font-style:italic}html pre.shiki code .s0U2E, html code.shiki 
.s0U2E{--shiki-default:#F7768E}",{"title":241,"searchDepth":268,"depth":268,"links":728},[729,733,738,743,748,753],{"id":32,"depth":268,"text":33,"children":730},[731,732],{"id":46,"depth":286,"text":47},{"id":85,"depth":286,"text":86},{"id":114,"depth":268,"text":115,"children":734},[735,736,737],{"id":127,"depth":286,"text":128},{"id":149,"depth":286,"text":150},{"id":174,"depth":286,"text":175},{"id":212,"depth":268,"text":213,"children":739},[740,741,742],{"id":219,"depth":286,"text":220},{"id":299,"depth":286,"text":300},{"id":379,"depth":286,"text":380},{"id":398,"depth":268,"text":399,"children":744},[745,746,747],{"id":405,"depth":286,"text":406},{"id":479,"depth":286,"text":480},{"id":492,"depth":286,"text":493},{"id":502,"depth":268,"text":503,"children":749},[750,751,752],{"id":506,"depth":286,"text":507},{"id":607,"depth":286,"text":608},{"id":638,"depth":286,"text":639},{"id":649,"depth":268,"text":650},"2026-05-27","Learn how to run K3s in production with confidence. Covers HA architecture, security hardening, monitoring, backup strategies, and resource management.","md","en",{},"\u002Fblog\u002Fen\u002Fk3s-production-best-practices",{"title":5,"description":755},"blog\u002Fen\u002Fk3s-production-best-practices",[19,763,764,765,766,767,768],"Kubernetes","Production","High Availability","Security","Operations","Best Practices","tdqVsTmVT3lxnchlHPFknFg80rX9ThHkHU04O6RicOI",1780391431960]