Istio Multi-Primary Multi-Cluster Blues
A recent initiative for my team at work has been to set up the potential for blue/green deployments at the cluster level. This requires bringing up two GKE clusters in parallel and connecting their networks via Istio. Once a multi-primary multi-cluster Istio deployment is configured across different networks, one can balance load between the two clusters using subsets. This did not prove to be an easy task in the end. This blog post will review the headaches and confusion that led me to getting this functionality working on my own GKE setup.
What does Multi-Primary Multi-Network mean?
A multi-cluster Istio deployment with multi-primary on multiple networks means that:
- Two GKE clusters will be running Istio
- Each will have its own istiod running (making them both "primaries")
- The two clusters will live in different VPCs; they don't have to be on the same network
Issue 1: Finding a tutorial that works
The first problem I ran into was determining how to get Istio multi-cluster running in the first place. I ran through a few different tutorials, including the GKE multi-cluster with shared control plane docs, but the only one that worked for me was the GKE multi-primary multi-network docs. I guess this feature isn't used by too many people, as there aren't many tutorials or much reliable documentation on the internet for it.
Issue 2: IstioOperator blues
We have an infrastructure-as-code approach to our CI and standing up environments. This has made the IstioOperator installation the best fit for juggling multiple versions of Istio across the myriad of project clusters we must maintain. We use the method that installs the operator with Helm and skip istioctl, so that everything can live as code. Unfortunately, you may see that cute note at the top of the IstioOperator installation method: "Use of the operator for new Istio installations is discouraged in favor of the Istioctl and Helm". I discovered the hard way that the recommended installation method we used in our project had been deprecated, forcing me to refactor our IstioOperator configuration. Before, we kept the control plane and gateways defined in separate operator definitions, as the linked documentation explains. However, when I tried installing multi-cluster Istio with two separate operator definitions, I started getting authentication issues and SSL complaints from the gateways and istiod (the control plane). It seems that when the two are defined separately, the cacert secret (that you create in the tutorial) isn't respected by both. So, I simply aggregated my operators into one big installation rather than separate pieces as per the old 1.8 documentation. After that, I was able to get past the SSL issues. I was never able to find a setting in the IstioOperator config options that would allow me to override the cacert secret so that I could force both operators to use the same one.
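As a rough sketch of what "one big installation" can look like, here is a single IstioOperator definition that declares the control plane settings and the gateways together. The mesh/cluster/network names and the east-west gateway port are illustrative placeholders, not my exact production config:

```yaml
# Hypothetical aggregated IstioOperator: control plane values and gateways
# live in one definition, so everything shares the same cacerts trust root.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-combined
  namespace: istio-system
spec:
  profile: default
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1   # placeholder cluster name
      network: network1         # placeholder network name
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
      - name: istio-eastwestgateway
        enabled: true
        label:
          istio: eastwestgateway
          topology.istio.io/network: network1
        k8s:
          service:
            ports:
              - name: tls
                port: 15443      # Istio's conventional east-west mTLS port
                targetPort: 15443
```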
Issue 3: Standing up two clusters and constantly wiping Istio
The third issue that plagued this setup for me was that there are many moving parts in setting up Istio multi-cluster. You need to stand up two clusters, configure them, and install Istio. This makes for many opportunities to mess things up. In addition, it can sometimes be annoying to uninstall Istio, so after a while I came up with a quick utility script to wipe Istio for me. Here's an example script that worked for me with one operator running:
```sh
kubectl delete ns istio-system &

# Note: if you have multiple IstioOperators, then you'll have to modify this.
# Clear the finalizers on the IstioOperator so its namespace can terminate.
kubectl get istiooperator -n istio-system \
  $(kubectl get istiooperator -n istio-system --no-headers -o custom-columns=":metadata.name") \
  -o=json | jq '.metadata.finalizers = null' | kubectl apply -f -

kubectl delete ns istio-operator
```
Cross Cluster Gateway
This is an important aspect of this Istio deployment: the cross-cluster gateway that allows for access across networks. As part of the tutorial, you'll set up an east-west gateway. This differs from a normal ingress gateway in that it only handles inter-cluster communication. This is how we're able to bridge the separate networks. The two clusters grab their respective configurations with the remote secret that is generated for each cluster.
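For reference, the remote secret exchange from the tutorial boils down to a pair of istioctl commands run in each direction. The kubeconfig context names here are placeholders for your own:

```sh
# Give cluster2's istiod read access to cluster1's API server, and vice versa.
# ctx-cluster1 / ctx-cluster2 are placeholder kubeconfig context names.
istioctl create-remote-secret \
  --context=ctx-cluster1 \
  --name=cluster1 | \
  kubectl apply -f - --context=ctx-cluster2

istioctl create-remote-secret \
  --context=ctx-cluster2 \
  --name=cluster2 | \
  kubectl apply -f - --context=ctx-cluster1
```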
To route to a specific cluster, you use subsets with destination rules. For what it's worth, the topology.istio.io/cluster key used in the docs never worked for me, but the topology.istio.io/network key did.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: multi-cluster-destination
spec:
  host: si-sandbox-1-app
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
  subsets:
    - name: west
      labels:
        topology.istio.io/network: network1
    - name: central
      labels:
        topology.istio.io/network: network2
```
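With subsets in place per network, the actual cluster-level blue/green balancing comes from a VirtualService that weights traffic between them. This is an illustrative sketch pairing with the destination rule above; the weights are arbitrary:

```yaml
# Hypothetical VirtualService: shift load between the two clusters by
# adjusting the subset weights (e.g. 90/10 during a blue/green rollout).
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: multi-cluster-route
spec:
  hosts:
    - si-sandbox-1-app
  http:
    - route:
        - destination:
            host: si-sandbox-1-app
            subset: west
          weight: 90
        - destination:
            host: si-sandbox-1-app
            subset: central
          weight: 10
```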
- You need to handle CAs on your own for this type of deployment; you can use a CA provider like Vault or generate your own certs
- This documentation was useful for using my own CAs
- Sidecars must be enabled in namespaces that you wish to communicate across clusters
- You can keep traffic in-cluster at the service, namespace, or global level for necessary services
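As a sketch of that last point, Istio's meshConfig.serviceSettings can mark hosts as cluster-local so their traffic never leaves the local cluster. The hostname here is a placeholder:

```yaml
# Keep traffic for a specific service within the local cluster.
# The host is a placeholder; a wildcard like "*.mynamespace.svc.cluster.local"
# scopes the setting to a whole namespace instead.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    serviceSettings:
      - settings:
          clusterLocal: true
        hosts:
          - "myservice.mynamespace.svc.cluster.local"
```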