
Istio Multi-Primary Multi-Cluster Blues


A recent initiative for my team at work has been to set up the potential for blue/green deployments at the cluster level. This requires bringing up two GKE clusters in parallel and connecting their networks via Istio. Once a multi-primary, multi-cluster Istio deployment is configured across the two networks, you can balance load between the clusters using DestinationRules and subsets. This did not prove to be an easy task. This post reviews the headaches and confusion I worked through to get this functionality running on my own GKE setup.

What does Multi-Primary Multi-Network mean?

A multi-cluster Istio deployment with multi-primary on multiple networks means that:

  • Two GKE clusters will be running Istio
  • Each will run its own istiod (making them both "primaries")
  • The two clusters will live in different VPCs, meaning they are not on the same network (hence "multi-network")
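
As a quick sanity check that both clusters really are primaries, you can confirm that each one runs its own istiod. A minimal sketch; the gke-west and gke-central context names are placeholders for your own kubeconfig contexts:

# Each primary cluster should report its own istiod deployment
for ctx in gke-west gke-central; do
  echo "--- ${ctx} ---"
  kubectl --context="${ctx}" get pods -n istio-system -l app=istiod
done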

Issue 1: Finding a tutorial that works

The first problem I ran into was figuring out how to get Istio multi-cluster running in the first place. I ran through a few different tutorials, including the GKE multi-cluster with shared control plane docs, but the only one that worked for me was the GKE multi-primary multi-network docs. I guess this feature isn't used by many people, as there aren't many tutorials or much reliable documentation for it on the internet.

Issue 2: IstioOperator blues

We take an infrastructure-as-code approach to our CI and to standing up environments, which made the IstioOperator installation the best fit for juggling multiple versions of Istio across the myriad project clusters we maintain. We use the method that installs the operator with Helm and skips istioctl, so that everything can live as code. Unfortunately, you may notice the cute note at the top of the IstioOperator installation docs: "Use of the operator for new Istio installations is discouraged in favor of the Istioctl and Helm". I discovered the hard way that the installation method we used in our project had been deprecated, forcing me to refactor our IstioOperator configuration.

Before, we kept the control plane and gateways defined in separate operator definitions, as the linked documentation explains. However, when I tried installing multi-cluster Istio with two separate operator definitions, I started getting authentication issues and SSL complaints from the gateways and istiod (the control plane). It seems that when the two are defined separately, the cacerts secret (which you create in the tutorial) isn't respected by both. So I simply aggregated my operators into one big installation, rather than the separate pieces described in the old 1.8 documentation, and that got me past the SSL issues. I was never able to find a setting in the IstioOperator config options that would let me override the cacerts secret and force both operator installations to use the same one.
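
For reference, here's a trimmed sketch of what that aggregated operator can look like, with the control plane values and both gateways in a single IstioOperator resource. The mesh1/cluster1/network1 names are placeholders, and a real east-west gateway needs the full port list that the gen-eastwest-gateway.sh script from the Istio docs produces:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-combined
  namespace: istio-system
spec:
  profile: default
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1
  components:
    ingressGateways:
    # Defining both gateways alongside the control plane keeps them all
    # on the same cacerts-based trust root
    - name: istio-ingressgateway
      enabled: true
    - name: istio-eastwestgateway
      enabled: true
      label:
        istio: eastwestgateway
        topology.istio.io/network: network1
      k8s:
        env:
        - name: ISTIO_META_REQUESTED_NETWORK_VIEW
          value: network1
        service:
          ports:
          - name: tls
            port: 15443
            targetPort: 15443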

Issue 3: Standing up two clusters and constantly wiping Istio

The third issue that plagued this setup was the sheer number of moving parts: you need to stand up two clusters, configure them, and install Istio on each, which makes for many opportunities to mess things up. On top of that, uninstalling Istio can be annoying, so after a while I came up with a quick utility script to wipe it for me. Here's an example that worked for me with one operator running:

# Delete istio-system in the background; it will hang on the IstioOperator finalizer
kubectl delete ns istio-system &

# Note: if you have multiple IstioOperators, then you'll have to modify this.
# Nulling the finalizers lets the namespace deletion above complete.
kubectl get istiooperator -n istio-system \
  "$(kubectl get istiooperator -n istio-system --no-headers -o custom-columns=":metadata.name")" \
  -o json | jq '.metadata.finalizers = null' | kubectl apply -f -

kubectl delete ns istio-operator

Cross Cluster Gateway

This is an important aspect of the deployment: the cross-cluster (east-west) gateway that allows access across networks. As part of the tutorial, you'll set up an east-west gateway in each cluster. It differs from a normal ingress gateway in that it only handles inter-cluster communication, and it's how the separate networks are bridged. Each cluster then learns about the other's services through the remote secret generated for each cluster, which gives istiod credentials to watch the other cluster's API server.
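
The tutorial has you expose services over that gateway with a TLS passthrough server on port 15443; it looks roughly like this (adapted from the Istio multi-network docs, applied in each cluster):

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
  - port:
      number: 15443
      name: tls
      protocol: TLS
    tls:
      # Passes mTLS traffic straight through to the target sidecar
      mode: AUTO_PASSTHROUGH
    hosts:
    - "*.local"

The remote secrets themselves come from istioctl; each one grants istiod in one cluster read access to the other cluster's API server, along these lines (the context variables are placeholders):

istioctl x create-remote-secret --context="${CTX_CLUSTER1}" --name=cluster1 | \
  kubectl apply -f - --context="${CTX_CLUSTER2}"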

Routing

To route to a specific cluster, you use subsets in a DestinationRule. For what it's worth, the topology.istio.io/cluster label used in the docs never worked for me, but the topology.istio.io/network label did.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: multi-cluster-destination
spec:
  host: si-sandbox-1-app
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
  - name: west
    labels:
      topology.istio.io/network: network1
  - name: central
    labels:
      topology.istio.io/network: network2
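
With those subsets defined, a VirtualService can split traffic between the two clusters, which is what enables the cluster-level blue/green rollouts mentioned at the start. A sketch assuming the same si-sandbox-1-app host; the weights are arbitrary:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: multi-cluster-route
spec:
  hosts:
  - si-sandbox-1-app
  http:
  - route:
    # Shift these weights to move traffic between clusters
    - destination:
        host: si-sandbox-1-app
        subset: west
      weight: 90
    - destination:
        host: si-sandbox-1-app
        subset: central
      weight: 10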

Misc Notes

  • You need to handle CAs on your own for this type of deployment; you can use a CA provider like Vault or generate your own certs
  • Sidecars must be injected in any namespace that needs to communicate across clusters
  • You can keep traffic in-cluster at the service, namespace, or global level for services that shouldn't cross networks (see the sketch below)
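
For that last point, Istio's meshConfig has per-service settings for keeping traffic cluster-local. A minimal sketch of the relevant stanza under the IstioOperator spec; the host is just an example:

meshConfig:
  serviceSettings:
  - settings:
      # Traffic for these hosts never leaves the local cluster
      clusterLocal: true
    hosts:
    - "si-sandbox-1-app.default.svc.cluster.local"
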
Date: May 20 2022
