
Advanced Kubernetes lessons from the field

Kubernetes tips from the experienced team at Sela Cloud to keep your apps stable and up to date.

Shahar Blank, Shir Fogel, Natali Cutic, Noam Amrani (Sela)

Considerations

 

Plan your CIDRs

When setting up a Kubernetes cluster, CIDR (Classless Inter-Domain Routing) selection has lasting consequences, so it is crucial to understand the implications of the different network models available for pods. Among these is 'Native Cloud Provider Networking,' a term that refers to the approach of integrating pods directly with the cloud provider's networking infrastructure.

 

Native Cloud Provider Networking involves allocating a segment of the cloud provider’s subnet specifically for the use of Kubernetes pods. This method streamlines networking by minimizing additional layers of network abstraction, potentially leading to enhanced performance and efficiency. However, this requires precise planning to ensure the CIDR range chosen is ample enough to accommodate the desired number of pods, directly influencing the scalability and capacity of your Kubernetes cluster.

 

This approach contrasts with other networking models, such as overlay networks, which add a layer of network abstraction over the existing infrastructure. While overlay networks offer flexibility and ease of management, they might introduce additional complexity and performance overhead.

 

The choice of Native Cloud Provider Networking underscores the importance of CIDR planning in Kubernetes clusters. It’s not just about ensuring sufficient IP address allocation but also about determining the limits on the number of pods that can be effectively supported within the cluster.

 

In summary, opting for Native Cloud Provider Networking as your Kubernetes networking solution necessitates a strategic approach to CIDR selection. It impacts not only the network functionality but also the scalability and the operational capacity of your Kubernetes cluster in a cloud environment.
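As a concrete sketch, here is how the pod and service ranges are declared in a kubeadm ClusterConfiguration; managed services expose equivalent settings at cluster creation time. The ranges below are illustrative and must not overlap with your VPC or on-premises networks:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  # A /16 leaves room for roughly 65,000 pod IPs across the cluster.
  podSubnet: "10.244.0.0/16"
  # Service virtual IPs come from a separate, non-overlapping range.
  serviceSubnet: "10.96.0.0/12"

On most platforms these ranges cannot be resized after the cluster is created, which is why the planning has to happen up front.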



Manage secrets securely

Keep secrets out of source code. In Kubernetes deployments, whether managed or unmanaged, handling secrets securely is crucial for maintaining strong security. Instead of embedding secrets in your source code, use more secure methods. Cloud-based Kubernetes environments can leverage their provider's integrated secrets management tools, such as AWS Secrets Manager or Azure Key Vault, in combination with Kubernetes' own Secrets mechanism. For clusters running on-premises, you might opt for robust third-party solutions like HashiCorp Vault or integrate the clusters with your existing secrets management infrastructure. Additionally, implementing automated secrets rotation and strict access policies can significantly bolster your security posture.
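As a minimal sketch (the names and image are hypothetical), a Secret created outside of source control, for example by your CI pipeline or an external secrets operator, can be referenced from a pod instead of hardcoding the credential:

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials                  # hypothetical name
type: Opaque
stringData:
  password: "set-by-ci-not-committed"   # placeholder; never commit real values
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: example/app:1.0            # hypothetical image
      env:
        - name: DB_PASSWORD             # injected at runtime, absent from source code
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password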

 

Implement RBAC

Role-Based Access Control (RBAC) is a fundamental security practice in Kubernetes, regardless of the environment – be it cloud-based or on-premises. In managed Kubernetes services, cloud providers typically offer seamless integrations with their own identity and access management systems, which aids in the straightforward implementation of RBAC policies. For those operating Kubernetes in on-premises environments, integrating RBAC with existing LDAP (Lightweight Directory Access Protocol) or Active Directory systems is often the go-to approach. It's also important to regularly review and update your RBAC policies to keep them aligned with any changes in your team structure or the nature of the workloads you're managing.
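A minimal sketch of a namespaced, read-only Role bound to a group; the namespace and group names are hypothetical, and in practice the group would be mapped from your cloud IAM, LDAP, or Active Directory:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: staging                    # hypothetical namespace
  name: pod-reader
rules:
  - apiGroups: [""]                     # "" means the core API group
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: staging
subjects:
  - kind: Group
    name: dev-team                      # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io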

 

Choose your CNI and network policies

A Container Network Interface (CNI) is a set of standards and libraries that allows containerized applications, like those orchestrated by Kubernetes, to connect to the underlying network. It's responsible for setting up and managing network interfaces for containers, enabling them to communicate with each other and other network resources. Choosing the right CNI is crucial for Kubernetes deployments as it directly affects network performance, security, and scalability.

 

Types of CNI for Major Cloud Providers

Amazon Elastic Kubernetes Service (EKS)

Amazon VPC CNI

Description: This is the default CNI for EKS. It integrates seamlessly with AWS's networking infrastructure, allowing pods to have an IP address from the VPC.

Use Case: Ideal for seamless integration with AWS services and for those requiring each pod to have a distinct IP address within the VPC.

 

Calico

Description: An open-source CNI option that can be used alongside the Amazon VPC CNI for network policy enforcement.

Use Case: Best when additional network policies and security are needed, such as DNS-based rules.

 

Azure Kubernetes Service (AKS)

Kubenet

Description: A basic network plugin in which pods receive IP addresses from an address space logically separate from the VNet, with traffic NAT'd to the node's IP for communication outside the cluster.

Use Case: Suited for simpler, smaller deployments where ease of setup is a priority.

Azure CNI

Description: Assigns every pod an IP address directly from the VNet subnet, giving pods first-class connectivity to other Azure resources.

Use Case: Ideal for larger, complex deployments requiring advanced networking features and efficient IP management.

 

Google Kubernetes Engine (GKE)

Google Cloud CNI

Description: Tailored for GKE, allowing pods to have an IP address from the Google Cloud network.

Use Case: Best for native integration with Google Cloud services and advanced networking needs.

 

Calico

Use Case: When additional network policy enforcement is required beyond what is offered by Google Cloud CNI.

 

Cilium

Cilium is an open-source project that provides and secures network connectivity for container-based applications. What sets Cilium apart is its use of eBPF (extended Berkeley Packet Filter), a powerful technology deeply integrated into the Linux kernel.

Cilium is supported by all three major managed Kubernetes services (EKS, AKS, and GKE), allowing users to utilize its advanced networking and security features in various cloud environments.

eBPF Technology

eBPF is a revolutionary technology that allows for dynamic code execution in the Linux kernel without changing kernel source code or loading kernel modules. This leads to significant performance improvements and flexibility. eBPF enables Cilium to efficiently handle networking, security, and observability logic directly in the kernel, which is faster than traditional approaches.

When to Choose Which CNI

For Simplicity and Smaller Deployments: Choose Kubenet in AKS or the default CNIs in EKS and GKE if you need a straightforward setup with minimal configuration.

 

For Advanced Networking and Integration with Cloud Services: Opt for Azure CNI in AKS, the Amazon VPC CNI in EKS, or Google Cloud CNI in GKE. These provide better network performance, efficient IP management, and tighter integration with each cloud provider's infrastructure.

 

For Enhanced Security and Network Policy Enforcement: Calico is a good choice across EKS, AKS, and GKE. It’s particularly useful in scenarios where fine-grained network policies and additional security measures are necessary.

 

For Enhanced Performance, Scalability, and Visibility: Cilium's eBPF-based networking can be a significant gain.

 

Selecting a CNI should be based on your specific requirements, including network complexity, security needs, integration with cloud resources, and scalability demands.
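Whichever CNI you choose, the policies themselves are expressed through the standard Kubernetes NetworkPolicy API, which the CNI (Calico, Cilium, and so on) enforces. A minimal sketch with hypothetical names, allowing a backend to receive traffic only from frontend pods in the same namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend    # hypothetical
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend                # the pods this policy protects
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend       # only these pods may connect
      ports:
        - protocol: TCP
          port: 8080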

 

Keep it Stateless

While stateless applications offer simplicity and ease of maintenance, stateful applications are also viable and can be managed effectively, albeit with higher complexity. Stateful applications, which maintain persistent data and session information across transactions, are essential in certain scenarios but require more intricate management strategies, especially in cloud environments and on-premises clusters.

 

The complexity in maintaining stateful applications arises from the need to manage the state – the persistent information that must be maintained and accessible across various transactions. This contrasts with stateless architectures, where each transaction is independent of the others. In cloud settings, managing stateful applications demands careful planning to leverage the scalability and resilience of the cloud effectively. For on-premises clusters, the challenges include ensuring consistent performance and data integrity across more limited resources.

 

Kubernetes, a robust orchestration tool for containerized applications, provides advanced features to handle both stateless and stateful applications. Using Kubernetes constructs like StatefulSets, alongside Deployments and ReplicaSets, administrators can manage the complexities of stateful applications. These tools help in maintaining data consistency, managing deployments, and scaling applications, although they present a steeper learning curve compared to managing stateless applications. In conclusion, while stateful applications are integral to many systems and can be efficiently managed, especially with tools like Kubernetes, they inherently demand a more complex maintenance approach compared to stateless applications.
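A minimal StatefulSet sketch (names and image are illustrative) showing the two features that make state manageable: stable pod identity through a headless Service, and a dedicated PersistentVolumeClaim per replica:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                        # hypothetical
spec:
  serviceName: db-headless        # a headless Service gives each pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16      # illustrative
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:           # each replica gets its own PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi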




Maintenance

A Kubernetes cluster, especially for a production environment, has many requirements. For example, applications and services should communicate between nodes, should be accessible to outside users, and there should be a way to monitor the applications and handle crashes. These are just some of the requirements that Kubernetes in a production environment should meet, and the components that come built in with Kubernetes do not satisfy them on their own. For that we have cloud add-ons and third-party tools.

Cloud Add-Ons

Description: Cloud add-ons are extensions that inject missing capabilities into the clusters running at our cloud provider. Each cloud provider has its own catalog of add-ons.
The cloud providers guarantee updates, security, and seamless integration with other cluster components.

Types of Cloud Add-Ons

Amazon Elastic Kubernetes Service (EKS)

In the AWS ecosystem, managed add-ons are the magic touch that eases your Kubernetes journey. Take, for instance, the Amazon VPC CNI for interfacing with AWS networking, CoreDNS for cluster DNS, kube-proxy facilitating pod communication, and the EBS CSI driver ensuring Amazon EBS storage compatibility. A recent gem, the Mountpoint for Amazon S3 CSI driver, lets you seamlessly mount S3 objects into pods.

Beyond these AWS stalwarts, there's a universe of independent add-ons like Datadog, Grafana Labs, and more.

Azure Kubernetes Service (AKS)

Azure's AKS follows the same principles but dances with slightly different partners, such as KEDA, Open Service Mesh, and more.

 

Third-Party Applications

Description: Third-party tools are external applications that we install on our cluster to add capabilities that it lacks. In Kubernetes, it is common to install third-party tools using Helm charts.

Here are some examples of third-party tools that we usually install in our Kubernetes cluster:

Monitoring and Logging tools

When it comes to monitoring, Prometheus takes center stage. For logs, choices abound: Fluent Bit, Fluentd, Loki, and more. Visualize the data with Grafana (which easily interfaces with Prometheus) or Kibana (if you use Elasticsearch).

Gateway to the Outside World: Load Balancers and More

Install the AWS Load Balancer Controller for AWS environments or any flavor of ingress-nginx for universal appeal. Choices are yours, options are aplenty.

Tools for Scaling and Deployment

Scale your cluster effortlessly with KEDA and Karpenter (an open-source node autoscaler built by AWS). Argo CD steps in for smooth continuous delivery, utilizing Helm charts to deploy applications into the cluster.
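As a sketch of how these pieces fit together, here is a hypothetical Argo CD Application that installs a community Helm chart; the chart, version, and namespaces are illustrative:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring                # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 65.1.0        # pin chart versions and bump them on your maintenance cycle
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true                 # remove resources deleted from the chart
      selfHeal: true              # revert manual drift
    syncOptions:
      - CreateNamespace=true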

 

Here's the beauty – the selection of Third-Party Tools is yours to make. Your requirements, your use case, your cluster – the tools you choose define your Kubernetes journey. 

Monthly Maintenance 

In the dynamic world of Kubernetes and third-party tools, keeping your Helm Charts up-to-date is the key to a seamless journey. Helm Charts, the backbone of tool installations, release new versions periodically, bringing bug fixes and innovations to your applications. The task at hand? Regularly updating those versions.

 

In addition, if we use published Terraform modules, we have to remember to update their versions periodically, for the same reason: picking up bug fixes and module improvements.

Monthly Tasks for Stability

Here's the recommendation: make version updates a monthly ritual. Set aside time to update Terraform modules, Helm Chart versions, and keep your environment in sync. It's not just about staying current; it's about safeguarding against disruptions and unlocking the full potential of your Kubernetes experience.

 

Scheduling

Requests & Limits

Definition: Requests and limits in Kubernetes refer to the allocation of resources (CPU and memory) for containers running within pods. Requests specify the amount of resources a container needs to run, while limits define the maximum amount a container is allowed to use.

 

Advantage: By setting these parameters, Kubernetes can make intelligent decisions about resource allocation. It ensures that your application scales efficiently, maintaining performance without wasting resources.

 

Considerations: 

 

  1. Pods with high resource requests might be harder to schedule if nodes are running close to capacity; pods requesting more resources than any node can provide will remain unscheduled. Ensure that your cluster has nodes with varying resource capacities to handle different types of workloads.

  2. Overcommitting (having total limits across all pods exceed node capacity) can be strategic but risky: it can lead to resource contention and degraded performance.
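A minimal sketch of requests and limits on a container; the values are illustrative and should be derived from observed usage:

apiVersion: v1
kind: Pod
metadata:
  name: web                       # hypothetical
spec:
  containers:
    - name: web
      image: nginx:1.27           # illustrative
      resources:
        requests:
          cpu: "250m"             # what the scheduler reserves for this container
          memory: "256Mi"
        limits:
          cpu: "500m"             # the container is throttled above this
          memory: "512Mi"         # the container is OOM-killed above this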



Liveness & Readiness

Definition: Liveness and readiness probes are mechanisms used to determine the health and availability of a container within a pod. Liveness probes check if the container is running, while readiness probes verify if the container is ready to start accepting traffic. Probes are expressed as HTTP requests, TCP connections, or exec commands.

 

Advantage: Liveness probes ensure that Kubernetes can automatically restart containers that are not functioning correctly, enhancing application reliability. Readiness probes help in controlled traffic distribution by signaling when a container is ready to serve requests, preventing premature traffic exposure to potentially unstable services.

 

Considerations:

  1. Ensure that probes accurately reflect the state of your application to avoid unnecessary restarts (for liveness) or traffic routing issues (for readiness).
  2. Use readiness probes to check dependencies (like a database or another service) that are crucial for your application to function correctly.
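A minimal sketch of both probes on a hypothetical container; the paths, port, and timings are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: api                       # hypothetical
spec:
  containers:
    - name: api
      image: example/api:1.0      # hypothetical image
      ports:
        - containerPort: 8080
      livenessProbe:              # failure restarts the container
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:             # failure removes the pod from Service endpoints
        httpGet:
          path: /ready            # a good place to check critical dependencies
          port: 8080
        periodSeconds: 5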

 

HPA (Horizontal Pod Autoscaler) & KEDA (Kubernetes Event-Driven Autoscaler)

 

Definition: HPA is a Kubernetes feature that automatically adjusts the number of replica pods in a deployment based on CPU utilization or custom metrics. KEDA extends this capability by enabling autoscaling based on various event sources (e.g., SQS queues, Kafka topics).

 

Advantage: HPA and KEDA enable dynamic scaling of applications in response to varying workloads, ensuring optimal resource utilization and maintaining application performance and responsiveness without manual intervention. This scalability is crucial for handling traffic spikes or fluctuations efficiently.



Considerations: 

HPA:

  1. Set suitable thresholds for scaling up and down. Fine-tuning these values is critical to prevent frequent scale-in and scale-out, which could lead to instability.
  2. Define minimum and maximum pod counts to control the scaling range and prevent over-scaling, which could lead to resource exhaustion.

 

KEDA:

  1. One of KEDA’s key features is scaling down to zero pods when there are no events to process. Ensure your application can handle this gracefully.
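Two minimal sketches: an HPA scaling a Deployment on CPU, and a KEDA ScaledObject scaling a worker on SQS queue depth. All names, the queue URL, and the thresholds are illustrative, and the SQS authentication setup (e.g., IRSA or a TriggerAuthentication) is omitted for brevity:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                   # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2                  # the floor keeps baseline capacity
  maxReplicas: 10                 # the ceiling prevents resource exhaustion
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler             # hypothetical
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0              # KEDA can scale to zero between bursts of events
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs   # illustrative
        queueLength: "5"          # target messages per replica
        awsRegion: us-east-1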


DaemonSet

Definition: A DaemonSet ensures that a specific pod runs on each node in a Kubernetes cluster. It's used to deploy system-level daemons or agents that need to be present on every node, such as monitoring agents, log collectors, or networking components.

 

Advantage: DaemonSets ensure certain workloads are present on every node, guaranteeing that essential services or functionalities are available across the entire cluster. They are valuable for tasks that require node-specific operations or for ensuring specific configurations on each node uniformly.



Considerations: 

 

  1. Not everything needs to be a DaemonSet; every such pod eventually takes up resources on every machine. Consider the purpose of the application and whether it really needs to run on every node.
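A minimal DaemonSet sketch for a per-node log collector (names and image tag are illustrative); note the small resource requests, since this pod will occupy every node:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector             # hypothetical
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:3.1   # illustrative tag
          resources:
            requests:             # keep per-node agents small; they run everywhere
              cpu: "50m"
              memory: "64Mi"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log        # read node-level logs from the host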




Useful Tools for Kubernetes and YAML Management

K8s Tools

K9S

K9S is an interactive terminal-based UI tool for Kubernetes. It provides a powerful interface to interact with your Kubernetes clusters, making it easier to navigate, observe, and manage your deployments. K9S offers real-time views into your Kubernetes resources, making it a must-have tool for Kubernetes administrators.

 

OpenLens

OpenLens is a robust, open-source IDE for Kubernetes. It's designed to provide a user-friendly, graphical interface for cluster management. With OpenLens, users can inspect cluster states, view logs, manage resources, and troubleshoot issues more effectively.

 

kubectl

kubectl is the command-line tool for Kubernetes, allowing users to run commands against Kubernetes clusters. It's used for deploying applications, inspecting and managing cluster resources, and viewing logs. kubectl is an essential tool for anyone working with Kubernetes, offering a wide range of functionalities.

 

Kubectl Plugins

Kubectl Plugins extend the capabilities of kubectl, providing additional features and commands. These plugins are community-driven and help in customizing the kubectl experience to suit specific needs. They can be very useful for simplifying complex kubectl commands.

 

Examples

This GitHub repository, natalicot/kubectl-tips-and-tricks, is a treasure trove of tips, tricks, and scripts for kubectl. It's an invaluable resource for Kubernetes users looking to enhance their kubectl skills and streamline their workflows.

 

YAML Tools

Helm Chart

Helm is the package manager for Kubernetes, and a chart is its packaging format, which simplifies the deployment of applications. Helm charts help in defining, installing, and upgrading even the most complex Kubernetes applications, making them an essential tool for Kubernetes application management.

 

Kustomize

Kustomize is a standalone tool to customize Kubernetes objects through a kustomization.yaml file. It introduces a template-free way to customize application configuration, which simplifies the process of managing variations of Kubernetes applications.
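A minimal kustomization.yaml sketch (file names and image are illustrative), overlaying a namespace, common labels, and an image tag onto plain manifests without any templating:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml               # plain manifests, no template syntax
  - service.yaml
namespace: staging                # applied to every resource above
commonLabels:
  app.kubernetes.io/part-of: demo
images:
  - name: example/app             # hypothetical image name
    newTag: "1.2.3"               # override the tag per environment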

 

Manifests

In Kubernetes, Manifests are YAML files that describe how to deploy and manage applications in the cluster. They are crucial for defining the desired states of your Kubernetes resources.
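For completeness, a minimal Deployment manifest (name and image are illustrative) declaring a desired state that Kubernetes continuously reconciles:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello                     # hypothetical
spec:
  replicas: 2                     # the desired state; Kubernetes keeps two pods running
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginxdemos/hello:0.4   # illustrative image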

 

Examples

The argocd-demo/k8s_manifests repository on GitHub provides a collection of Kubernetes manifests for both a Kustomize-based project and a Helm chart structure, each installed by Argo CD, so you can compare the different approaches.