“Deep Dive” – Kubernetes the Hard Way on AWS

Konstantin Troshin | 21 October 2020

This blog post is only available in English.

During my preparation for the CKA exam, I decided to work through Kelsey Hightower’s Kubernetes the Hard Way (https://github.com/kelseyhightower/kubernetes-the-hard-way). To raise the educational value further, and because I like AWS, I used AWS instead of GCP and aws-cni instead of the bridge network used by Kelsey.

The adventure started with provisioning the infrastructure, for which I reused Terraform modules that I had developed for other projects. I provisioned a VPC with the same IP ranges that Kelsey uses, as well as six t3.medium instances (two autoscaling groups with three instances each) running Ubuntu 18.04, which seemed comparable to the ones Kelsey uses on GCP. I also opted for a network load balancer (NLB) to distribute the traffic across the control plane nodes. Instead of opening port 22 for SSH access, I used AWS Systems Manager, which does not require any inbound port to be opened on the instances. This required installing the Session Manager plugin and configuring OpenSSH to use it locally, installing the SSM agent on the cluster VMs (via user data), and adding the required permissions to a policy attached to the roles behind the instance profiles. After that, the instances can be accessed with

ssh -i rsa/k8s.pem -p 5223 -l ubuntu ${INSTANCE_ID}

where k8s.pem is the private key that matches the public key assigned to the instances by Terraform and ${INSTANCE_ID} is the AWS ID of the instance. To improve automation, I used tags placed on the instances to derive the instance IDs via the AWS CLI.
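The tag lookup can be sketched roughly as follows. This is a minimal sketch and not the code from the repository: the function names are made up for illustration, and it assumes that k8smaster and k8snode are tag keys on running instances and that the AWS CLI is already configured with credentials and a region.

```bash
# Hypothetical helpers, not taken from the original repository.
# Assumption: the instances carry a tag whose KEY is k8smaster or k8snode.

# Print the IDs of all running instances that carry the given tag key.
instance_ids_by_tag() {
  aws ec2 describe-instances \
    --filters "Name=tag-key,Values=$1" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Print the private IP addresses of the same set of instances.
private_ips_by_tag() {
  aws ec2 describe-instances \
    --filters "Name=tag-key,Values=$1" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text
}
```

With these helpers, a loop such as `for id in $(instance_ids_by_tag k8smaster); do ssh -i rsa/k8s.pem -p 5223 -l ubuntu "$id" hostname; done` would visit every master without any hard-coded instance ID.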

The next task was to adapt Kelsey’s code that generates the certificates and configs for the cluster. To improve automation, I used the tags k8smaster and k8snode that Terraform places on the instances, so that the script always gets the correct private and public IP addresses even though they are assigned randomly on every new Terraform deployment. To support enterprise networks (in which almost all non-standard ports are blocked), the NLB was configured to listen on port 443 and forward the requests to port 6443 on the master nodes. Hence the slight changes in the configurations compared to Kelsey’s original ones.

github: init.sh

After generating the certificates and configs, I used a bash script to distribute them to the corresponding instances, again using the tags.

github: dist.sh

The next step was to start the control plane components on the master nodes. To access the nodes, I used a modified version of a script I already had that executes payload scripts on the target machines. It can also substitute variables written in the ${VAR} format with the corresponding values from the local environment, which allowed me to pass values derived locally via the AWS CLI (such as the IP addresses of all three master VMs) to the target machines. This worked well and, after some quick googling for a fix in the config of the kube-apiserver (at the time of my experiments, Kelsey’s repo used k8s 1.15, which supported --runtime-config=api, whereas k8s 1.18 needs --runtime-config=api/all=true), the control plane VMs were ready.

github: executePayloadMasters.sh
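The ${VAR} substitution step could look roughly like the sketch below. It is not the script from the repository: render_payload is a hypothetical helper that replaces only the placeholders named on the command line, takes the values from exported environment variables, and leaves every other ${...} in the payload untouched so that it is still expanded on the target machine.

```bash
# Hypothetical helper, not taken from the original repository.
# Usage: render_payload FILE VAR_NAME...
# Replaces each literal ${VAR_NAME} in FILE with the value of the exported
# environment variable VAR_NAME and prints the result to stdout.
# A listed name that is not exported is replaced by an empty string.
render_payload() {
  local file="$1"
  shift
  awk -v names="$*" '
    BEGIN { n = split(names, name, " ") }
    {
      line = $0
      for (i = 1; i <= n; i++) {
        placeholder = "${" name[i] "}"
        value = ENVIRON[name[i]]
        out = ""
        # Literal (non-regex) replacement of every occurrence on this line.
        while ((p = index(line, placeholder)) > 0) {
          out = out substr(line, 1, p - 1) value
          line = substr(line, p + length(placeholder))
        }
        line = out line
      }
      print line
    }' "$file"
}
```

A rendered payload can then be piped to a node, for example with `render_payload payload.sh MASTER_0_IP | ssh -i rsa/k8s.pem -p 5223 -l ubuntu "${INSTANCE_ID}" 'bash -s'`, where payload.sh and MASTER_0_IP are placeholder names chosen for this example.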

After registering the instances with the target group of the NLB via the AWS CLI, the control plane was ready.

github: nlb.sh
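Registering the masters with the NLB target group can be sketched as below. The function name and its arguments are hypothetical; the sketch assumes that the target group uses instance targets and that the API server listens on port 6443, as described above.

```bash
# Hypothetical helper, not taken from the original repository.
# Usage: register_masters TARGET_GROUP_ARN INSTANCE_ID...
# Registers every given instance with the target group on port 6443,
# the port on which kube-apiserver listens on the master nodes.
register_masters() {
  local tg_arn="$1" id
  shift
  for id in "$@"; do
    aws elbv2 register-targets \
      --target-group-arn "$tg_arn" \
      --targets "Id=${id},Port=6443"
  done
}
```

The state of the registration can afterwards be checked with `aws elbv2 describe-target-health --target-group-arn "$TG_ARN"`, where TG_ARN stands for the ARN of the target group.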

Since I wanted to use aws-cni, which is deployed and configured via kubectl, I set up remote access via kubectl first instead of going for the worker nodes.
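For illustration, remote access through the NLB could be configured along the lines of the upstream tutorial, with the server port changed to 443. This is a sketch under assumptions: the file names ca.pem, admin.pem and admin-key.pem and the cluster name follow Kelsey’s tutorial, the function name is made up, and the host name passed in must be one of the names contained in the kube-apiserver certificate.

```bash
# Hypothetical helper modelled on the upstream tutorial, not on the
# repository. Assumes ca.pem, admin.pem and admin-key.pem are in the
# current directory.
# Usage: configure_remote_kubectl NLB_DNS_NAME
configure_remote_kubectl() {
  local api_host="$1"
  # The NLB listens on 443 and forwards to 6443 on the masters.
  kubectl config set-cluster kubernetes-the-hard-way \
    --certificate-authority=ca.pem \
    --embed-certs=true \
    --server="https://${api_host}:443"
  kubectl config set-credentials admin \
    --client-certificate=admin.pem \
    --client-key=admin-key.pem
  kubectl config set-context kubernetes-the-hard-way \
    --cluster=kubernetes-the-hard-way \
    --user=admin
  kubectl config use-context kubernetes-the-hard-way
}
```

Afterwards `kubectl get componentstatuses` or `kubectl version` should answer through the load balancer if the control plane is healthy.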


Configuring the worker nodes was a bit more finicky. I used the same approach to execute payloads on the worker nodes but found that the nodes initially stayed in the not-ready state. To debug this, I connected to one of the nodes and tried to get the logs from the pods (kubectl could not do that at that point). CRI-containerd was new to me, so I first tried to get the logs using ctr (the client for containerd), but it showed me 0 containers running. I then looked into Kelsey’s code more closely and found that he also downloads crictl (although he never uses it in his examples). After a quick Google search, I configured it properly and was able to get logs with

sudo crictl logs ${CONTAINER_ID}

github: crictl.sh
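A crictl configuration for containerd might be written as sketched below. The helper name is hypothetical, and the socket path is an assumption based on the containerd default that the upstream tutorial also passes to the kubelet; it has to match the actual setup.

```bash
# Hypothetical helper, not taken from the original repository.
# Usage: write_crictl_config [TARGET_FILE]   (default: /etc/crictl.yaml)
# Assumption: containerd listens on its default socket.
write_crictl_config() {
  local target="${1:-/etc/crictl.yaml}"
  printf '%s\n' \
    'runtime-endpoint: unix:///var/run/containerd/containerd.sock' \
    'image-endpoint: unix:///var/run/containerd/containerd.sock' \
    'timeout: 10' > "$target"
}

# Typical follow-up commands on the node (run as root):
#   crictl ps -a                           # list containers, including exited ones
#   crictl logs "${CONTAINER_ID}"          # show the logs of one container
#   ctr --namespace k8s.io containers list # ctr only sees them in this namespace
```

The last comment also hints at why plain ctr may show nothing: containerd keeps the containers created through CRI in the k8s.io namespace, while ctr looks at the default namespace unless told otherwise.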

The logs pointed me to my mistake: while replacing the hard-coded IP addresses in Kelsey’s code with the ones derived via the AWS CLI, I had also dropped an IP address from the hosts list of the kube-apiserver certificate, and the aws-node containers, which tried to reach the API server via this IP, rejected the certificate. With this, I also learned that the pods running on the worker nodes access the API server via the kubernetes service, which gets the first IP of the --service-cluster-ip-range CIDR. After a quick fix of the certificates, this issue was resolved and the nodes became ready.

It also seems that starting one of the daemons somehow kicks the script out to the shell, so that, if ssh -tt is used, the final exit no longer works. To avoid this issue, I started the script in the background, then reconnected to the instance and followed its progress. This fixed the hanging of my executePayload script and allowed for a smooth deployment across the worker nodes.
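The address of the kubernetes service that has to be among the certificate hosts can be computed from the service CIDR. The sketch below is a hypothetical helper in plain shell arithmetic; the range 10.32.0.0/24 used in the example is the value from the upstream tutorial and is only an assumption for this cluster.

```bash
# Hypothetical helper, not taken from the original repository.
# Usage: first_service_ip CIDR
# Prints the first address after the network address of an IPv4 CIDR,
# which is the IP that the "kubernetes" service receives.
first_service_ip() {
  local cidr="$1" ip prefix n mask old_ifs
  ip=${cidr%/*}
  prefix=${cidr#*/}
  # Split the dotted quad into the positional parameters $1..$4.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
  n=$(( (n & mask) + 1 ))
  printf '%d.%d.%d.%d\n' \
    "$(( (n >> 24) & 255 ))" "$(( (n >> 16) & 255 ))" \
    "$(( (n >> 8) & 255 ))" "$(( n & 255 ))"
}
```

For example, `first_service_ip 10.32.0.0/24` prints 10.32.0.1. Whether a given certificate contains that address can be checked with `openssl x509 -in kubernetes.pem -noout -text`, where kubernetes.pem is the certificate file name used upstream.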

github: executePayloadWorkers.sh
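The background strategy can be sketched as follows. This is an assumption about how such a detached start may look, not the content of executePayloadWorkers.sh; start_detached and the file names are hypothetical.

```bash
# Hypothetical helper, not taken from the original repository.
# Usage: start_detached SCRIPT LOG_FILE
# Starts SCRIPT with bash in the background, detached from the terminal,
# and sends all of its output to LOG_FILE so that a later session can
# follow the progress.
start_detached() {
  local script="$1" log="$2"
  nohup bash "$script" > "$log" 2>&1 < /dev/null &
}

# Sketch of the two-step use over SSH (names are placeholders):
#   ssh ... "${INSTANCE_ID}" 'nohup bash payload.sh > payload.log 2>&1 < /dev/null &'
#   ssh ... "${INSTANCE_ID}" 'tail -f payload.log'
```

Because stdin, stdout and stderr are all redirected, the first SSH session can return immediately, and the second one only reads the log.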

Next, I thought it would be nice to make type: LoadBalancer work. Since Kelsey does not provide any guidance on this part, I turned to Google for help. It turned out that the documentation of the official kubernetes/cloud-provider-aws project is still very sparse, but at least the README.md provided the necessary AWS policies and some initial guidelines to start with. Using --cloud-provider=external resulted in nodes tainted with NoSchedule, so I googled more and found the necessary information in one of the forums. It is necessary to set the hostnames of the workers to their internal DNS names (such as ip-10-200-0-1.eu-central-1.compute.internal), to tag the nodes with kubernetes.io/cluster/${CLUSTER_NAME} (kubernetes-the-hard-way in our case) set to owned, and to tag the subnets in which the nodes run with kubernetes.io/cluster/${CLUSTER_NAME}=shared. To avoid deployment of the load balancers in the private subnets (instead of the public ones), it is also necessary to tag the public subnets with kubernetes.io/role/elb=1. With this, and with --cloud-provider=aws set for the kube-apiserver, kube-controller-manager and kubelet, the core AWS provider was initialized and started working. Hopefully, the documentation of the external AWS cloud provider will be better by the time Kubernetes retires the core provider.
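The tagging and naming requirements can be expressed with the AWS CLI roughly as below. The function names are hypothetical and the sketch is not the author’s Terraform code (in the repository the tags may well be set by Terraform instead). The host name helper assumes a region other than us-east-1, where the private DNS suffix is ec2.internal.

```bash
# Hypothetical helpers, not taken from the original repository.
CLUSTER_NAME="${CLUSTER_NAME:-kubernetes-the-hard-way}"

# Mark instances as owned by the cluster. Usage: tag_nodes_owned INSTANCE_ID...
tag_nodes_owned() {
  aws ec2 create-tags --resources "$@" \
    --tags "Key=kubernetes.io/cluster/${CLUSTER_NAME},Value=owned"
}

# Mark the subnets of the nodes as shared. Usage: tag_subnets_shared SUBNET_ID...
tag_subnets_shared() {
  aws ec2 create-tags --resources "$@" \
    --tags "Key=kubernetes.io/cluster/${CLUSTER_NAME},Value=shared"
}

# Let load balancers be placed into public subnets. Usage: tag_public_subnets SUBNET_ID...
tag_public_subnets() {
  aws ec2 create-tags --resources "$@" \
    --tags "Key=kubernetes.io/role/elb,Value=1"
}

# Build the private DNS name for an IP. Usage: private_dns_name IP REGION
# Assumption: REGION is not us-east-1 (which uses ip-a-b-c-d.ec2.internal).
private_dns_name() {
  printf 'ip-%s.%s.compute.internal\n' "$(printf '%s' "$1" | tr . -)" "$2"
}
```

On a worker, the host name could then be set with something like `sudo hostnamectl set-hostname "$(private_dns_name 10.200.0.1 eu-central-1)"`, using the example address from the text.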

To test things, I created a test deployment and a service of type: LoadBalancer for it. The ELB was deployed successfully; however, it reported the nodes as unhealthy. It turned out that the node ports were not working. Furthermore, only the node on which the pod was running could access the pod via its cluster IP. Googling this issue brought me to a discussion that also provided a solution: changing the proxy mode of kube-proxy from iptables to ipvs. Once this was changed, the node ports started working and the load balancer could forward the traffic to them.
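Switching the proxy mode amounts to changing one line in the kube-proxy configuration file. The sketch below assumes a KubeProxyConfiguration file with a top-level mode: entry, as in the upstream tutorial (where it lives at /var/lib/kube-proxy/kube-proxy-config.yaml); set_proxy_mode is a hypothetical helper.

```bash
# Hypothetical helper, not taken from the original repository.
# Usage: set_proxy_mode CONFIG_FILE MODE
# Rewrites the top-level "mode:" line of a KubeProxyConfiguration file
# and leaves every other line unchanged.
set_proxy_mode() {
  local config="$1" mode="$2"
  sed "s/^mode:.*/mode: \"${mode}\"/" "$config" > "${config}.tmp" &&
    mv "${config}.tmp" "$config"
}

# Sketch of the use on a node (run as root, path as in the upstream tutorial):
#   set_proxy_mode /var/lib/kube-proxy/kube-proxy-config.yaml ipvs
#   systemctl restart kube-proxy
```

Note that the ipvs mode depends on the IPVS kernel modules being available on the node; if ipvsadm is installed, `ipvsadm -Ln` lists the virtual servers that kube-proxy has created.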

Like Kelsey, I also deployed CoreDNS to enable DNS-based service discovery within the cluster; this worked out of the box.

Next, I decided to play a bit with the Nginx ingress controller in the same environment. I successfully deployed the ingress controller as described here and tried to create an ingress object. However, this resulted in an error message with the following content:

Error from server (InternalError): error when creating "endpoint.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded

After googling a bit, I found out that the request to the admission service of the ingress controller is made by the kube-apiserver, which sits on the control plane (master) nodes in our scenario. After connecting to one of the master nodes, I was able to verify that this call does not work because the ClusterIP of the service, which lies in the range given by the --service-cluster-ip-range parameter, is not reachable from there. After reading about this range, I learned that the IP addresses in it are virtualized by kube-proxy and are not provided via AWS ENIs. This brought me to the idea that the control plane nodes also need to run kube-proxy to be able to reach k8s services within the cluster.

At first, I tried to deploy kube-proxy alone, but it could not sync any rules on its own. After more googling, I concluded that kubelet is also required for kube-proxy to run as needed. So I added both kubelet and kube-proxy to the master nodes as systemd services (I also had to create the corresponding certificates and configs for the master nodes, as can be seen here) and, after I had added the required permissions to the instance profile of the master nodes, kubelet and kube-proxy were working and the master nodes were registered as ready. To prevent scheduling of pods on the master nodes, I also added the --register-with-taints=node-role.kubernetes.io/master=true:NoSchedule option to the kubelets on them. Finally, I added a small script to also label the nodes as master.

github: labelMasters.sh
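Labelling the masters might look like the following sketch. It is not the content of labelMasters.sh: the function name is made up, and the label key node-role.kubernetes.io/master with an empty value is an assumption (it is the conventional key that makes kubectl get nodes show the master role).

```bash
# Hypothetical helper, not taken from the original repository.
# Usage: label_masters NODE_NAME...
# Assumption: the conventional role label key with an empty value is used.
label_masters() {
  local node
  for node in "$@"; do
    kubectl label node "$node" node-role.kubernetes.io/master= --overwrite
  done
}
```

The node names are the ones under which the kubelets registered, which in this setup would be the internal DNS names of the master instances.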

After this, kubectl reported six ready nodes (three masters and three workers) and the nginx ingress controller started working as expected.

The resulting set of Terraform and bash scripts was able to deploy and destroy the clusters in an easy and reproducible manner. To test the robustness of the setup, I changed the Ubuntu version from 18.04 to 20.04, and it worked fine there as well. When k8s 1.19 came out, I also tested the code with it and, after a small adaptation (changing the API version of the scheduler config from alpha to beta), it worked fine too.

In summary, adapting Kelsey’s example to AWS and extending it to include some important k8s features was fun and gave me more insight into the inner workings of Kubernetes, as well as into some bugs and ways to fix them. Of course, it is fairly easy to get a Kubernetes cluster that can do all the things described above by using EKS, but sometimes it is worth taking the “scenic route” to learn a thing or two.

The complete code can be found in https://github.com/konstl000/kubernetes-the-hard-way-aws. The terraform modules used are here.


I have now officially earned the right to be on this pic (I passed the CKA exam). Although the things I learned from adapting k8s the Hard Way to AWS were not directly asked for, doing this certainly helped me to understand k8s better, which, in turn, was very useful during the exam.
