There is no excerpt because this is a protected post.
After we started in-place updating Kubernetes 1.10 to 1.12, everything seemed fine. Eight hours of sleep, and a few pods OOMKills later – we noticed that some of the AWS ELBs that front our Kubernetes services were reporting a few unhealthy target nodes. While one of them was marking every node as unhealthy – technically taking the service down, most of the affected ELBs were only missing a few nodes, the same set across all services. Fortunately, troubleshooting Kubernetes services and ingresses is a well-known process as tenants commonly misconfigure them. The first step involves verifying that endpoints for the service considered are listed by Kubernetes, which reveals the most common configuration issues such as a missing port declaration on the containers specification, a missing network policy or a pod label/port name typo. Regarding the service that was completely unhealthy – it turned out that a network policy was missing. Unauthorized traffic is […]