December 23rd, 2021

Anthos GKE - December 23rd, 2021 [Issue]


## Issue

* When you deploy Anthos clusters on VMware version 1.9.0 or higher with the Seesaw bundled load balancer in an environment that uses NSX-T stateful distributed firewall rules, `stackdriver-operator` might fail to create the `gke-metrics-agent-conf` ConfigMap and cause `gke-connect-agent` Pods to enter a crash loop.

  The underlying issue is that stateful NSX-T distributed firewall rules terminate the connection from a client to the user cluster API server through the Seesaw load balancer, because Seesaw uses asymmetric connection flows. This integration issue with NSX-T distributed firewall rules affects all Anthos clusters on VMware releases that use Seesaw. You might see similar connection problems in your own applications when they create large Kubernetes objects whose sizes are larger than 32K.

  Follow [these instructions](https://cloud.google.com/anthos/clusters/docs/on-prem/1.10/how-to/bundled-load-balance#configuring%5Fstateless%5Fnsx-t%5Fdistributed%5Ffirewall%5Fpolicies%5Ffor%5Fuse%5Fwith%5Fseesaw%5Fload%5Fbalancer) to disable NSX-T distributed firewall rules, or to use stateless distributed firewall rules for Seesaw VMs.

* If your clusters use a manual load balancer, follow [these instructions](https://cloud.google.com/anthos/clusters/docs/on-prem/1.10/how-to/manual-load-balance#resetting%5Fconnections%5Fto%5Ffailed%5Fnodes%5Frecommended) to configure your load balancer to reset client connections when it detects a backend node failure. Without this configuration, clients of the Kubernetes API server might stop responding for several minutes when a server instance goes down.
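The note about large Kubernetes objects suggests a rough pre-check for your own workloads. The following is a minimal sketch, not an official tool: it estimates the size of an object from its compact JSON encoding and compares it to a threshold. It assumes that "32K" means 32 KiB (32 * 1024 bytes). The helper names (`serialized_size`, `exceeds_threshold`, `make_configmap`) and the sample ConfigMap contents are hypothetical, and the size actually sent to the API server may differ depending on the client and encoding used.

```python
import json

# Assumption: "32K" in the note above is read as 32 KiB.
SIZE_THRESHOLD_BYTES = 32 * 1024


def serialized_size(obj: dict) -> int:
    """Return the size in bytes of the object's compact JSON encoding."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def exceeds_threshold(obj: dict, threshold: int = SIZE_THRESHOLD_BYTES) -> bool:
    """Return True if the encoded object is larger than the threshold."""
    return serialized_size(obj) > threshold


def make_configmap(name: str, payload: str) -> dict:
    """Build a hypothetical ConfigMap manifest holding one data key."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"config.yaml": payload},
    }


if __name__ == "__main__":
    small = make_configmap("small-example", "x" * 1024)
    large = make_configmap("large-example", "x" * (40 * 1024))
    for cm in (small, large):
        # Print the name, estimated size, and whether it crosses the threshold.
        print(cm["metadata"]["name"], serialized_size(cm), exceeds_threshold(cm))
```

Objects flagged by a check like this are only candidates for the connection problem described above. Whether a given request is affected still depends on the firewall and load balancer configuration.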