DNS on GKE: Everything you need to know

Abdellfetah SGHIOUAR
Google Cloud - Community
Jul 22, 2022


Wow, another article about DNS on Kubernetes/GKE, aren't there enough of these on the Internet already? Yes, there are, but I wrote this as a condensed version of what I learned when I attempted to understand everything DNS on GKE. So I mainly wrote this for myself, and if you find it useful, everyone wins :)

This article tries to answer one simple question: when deciding how to use DNS with GKE, what are the available native Kubernetes options, which options exist on Google Cloud, and how do the two play together?

Everything in this article is true as of its first release (July 2022) and was tested on GKE version 1.22.8.

Kube-DNS

Kube-DNS has been around for a while. It was the default DNS server in upstream Kubernetes (and GKE) up until v1.12, when CoreDNS was introduced as a second option. It was of course possible to swap kube-dns for CoreDNS before that, but now CoreDNS is available as an option when you deploy Kubernetes via kubeadm, kube-up, etc.

kube-dns is pretty straightforward. It handles DNS registration and resolution for Pods and Services.

Source: https://cloud.google.com/kubernetes-engine/docs/how-to/kube-dns

Kube-DNS itself is deployed into the kube-system namespace, alongside the kube-dns-autoscaler, which does what its name says: it horizontally auto-scales kube-dns. Autoscaling follows a simple logic that balances the total number of nodes against the cores per node. On GKE you can follow the documentation for configuring custom kube-dns autoscaling.
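For reference, the autoscaler reads its parameters from a ConfigMap in kube-system. Here is a minimal sketch of that ConfigMap, assuming the commonly documented default parameters (check the actual values in your cluster, they may differ):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  # replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica))
  linear: '{"coresPerReplica": 256, "nodesPerReplica": 16, "preventSinglePointFailure": true}'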

You can also configure custom domain resolvers, for example if you have your own DNS servers (on the VPC, or on-prem connected to Google Cloud via Cloud VPN or Interconnect).

In simple terms: if you have an API (api.company.com) that is resolvable by an external DNS server reachable on IP address a.b.c.d, and you want workloads in GKE to resolve this FQDN, you can configure kube-dns to forward the resolution request to the external DNS server using stub domains.
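With kube-dns, this forwarding is configured through the stubDomains key of the kube-dns ConfigMap in kube-system. A sketch using the example above (company.com and a.b.c.d are placeholders for your domain and DNS server IP):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # Queries for *.company.com are forwarded to the external DNS server
  stubDomains: |
    {"company.com": ["a.b.c.d"]}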

If you are using Google Cloud DNS to manage your domains with private zones, you don't have to configure anything extra. All you need to do is make sure your private zone is configured to allow the VPC on which the GKE cluster is created to query it.

Let's take a real example. Say my API is available on the VPC and exposed behind an Internal Load Balancer with a static IP address. Instead of using the IP, you will want to use an FQDN to reach the API. You can use Cloud DNS to register the load balancer IP so that, for example, api.mycompany.com resolves to 1.2.3.4.

  1. Create a GKE Cluster
export PROJECT_ID=`gcloud config get-value project --quiet`
export ZONE=insert_your_zone
gcloud container clusters create gke-dns \
--project $PROJECT_ID \
--zone $ZONE \
--cluster-version 1.22.8-gke.202 \
--release-channel regular

2. Create a Cloud DNS Private Zone and add a record

gcloud dns managed-zones create mycompany \
--description=mycompany_zone \
--dns-name=mycompany.com \
--visibility=private \
--networks=default
gcloud dns record-sets create api.mycompany.com. \
--rrdatas=1.2.3.4 \
--type=A \
--ttl=60 \
--zone=mycompany

3. Run a workload on the GKE cluster.

kubectl apply -f https://raw.githubusercontent.com/boredabdel/useful-k8s-stuff/main/netshoot-pod.yaml

4. Remote exec into the pod and try to resolve api.mycompany.com

kubectl exec -it netshoot -- bash
dig api.mycompany.com

The output should look like this:

bash-5.1# dig api.mycompany.com

; <<>> DiG 9.18.3 <<>> api.mycompany.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43315
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;api.mycompany.com. IN A
;; ANSWER SECTION:
api.mycompany.com. 60 IN A 1.2.3.4
;; Query time: 7 msec
;; SERVER: 10.112.0.10#53(10.112.0.10) (UDP)
;; WHEN: Wed Jul 20 09:05:47 UTC 2022
;; MSG SIZE rcvd: 62

We used Cloud DNS to resolve an API endpoint from a workload on GKE without having to do anything extra.

5. Cleanup

gcloud container clusters delete gke-dns --zone $ZONE \
--project $PROJECT_ID
gcloud dns record-sets delete api.mycompany.com. \
--type=A \
--zone mycompany
gcloud dns managed-zones delete mycompany

Cloud DNS

Source: https://cloud.google.com/kubernetes-engine/docs/how-to/cloud-dns

One option is to replace Kube-DNS with a more reliable and robust option: Cloud DNS.

Fun fact: did you know Cloud DNS has a 100% Service Level Objective (SLO)? Yes, yes, 100%.

Now, this is where things get a bit confusing. When using Cloud DNS with GKE, only Service records (ClusterIP, headless, and ExternalName Services) are registered with Cloud DNS. If you create a Service of type LoadBalancer or an Ingress, the load balancer IP (private or public) is NOT registered with Cloud DNS (at least for now; this might change in the future).

If you want to automatically register a GKE load balancer IP with a DNS server, read on for the next options. But for now, let's explore this one.

Cloud DNS Scopes

Before you use Cloud DNS with GKE you have to decide if you want to go with the VPC or Cluster Scope:

  • VPC Scope: DNS records are resolvable within the entire VPC. This option is GA.
  • Cluster Scope: DNS records are resolvable only within the Cluster. This option is in Preview for now.

There are a couple of caveats you have to keep in mind when making the decision:

  • With VPC scope, Service names can overlap across clusters on the same VPC (which is not an issue with kube-dns, since each cluster only resolves its own records), so you have to set a custom domain for each cluster. For example, with cluster1 and cluster2 using VPC-scope DNS, a Service FQDN becomes my-service.default.svc.cluster1 and my-service.default.svc.cluster2 respectively, instead of the usual my-service.default.svc.cluster.local.
  • Cluster scope allows you to seamlessly migrate from kube-dns to Cloud DNS without worrying about overlapping Services.
  • Enabling Cloud DNS is a one-way operation; there is no way to go back to kube-dns.
  • Enabling Cloud DNS on an existing cluster is not enough; you have to perform an upgrade on the node pools, which forces the nodes to be re-created (see the commands below).
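A sketch of that migration with gcloud, where CLUSTER_NAME, POOL_NAME, and VERSION are placeholders (re-applying the version the node pool is already running is enough to force re-creation):

gcloud container clusters update CLUSTER_NAME \
--cluster-dns=clouddns \
--cluster-dns-scope=cluster
gcloud container clusters upgrade CLUSTER_NAME \
--node-pool=POOL_NAME \
--cluster-version=VERSION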

Let’s create a Cloud DNS-enabled cluster with VPC scope and see what it looks like.

export PROJECT_ID=`gcloud config get-value project --quiet`
export ZONE=insert_your_zone
gcloud container clusters create gke-clouddns \
--project $PROJECT_ID \
--zone $ZONE \
--cluster-version 1.22.8-gke.202 \
--release-channel regular \
--cluster-dns=clouddns \
--cluster-dns-scope=vpc \
--cluster-dns-domain=gke-clouddns

If you check the Cloud DNS page, you will notice a new private zone was provisioned for you. This is where all the records for Services are going to be registered. The screenshot below shows my cluster.

Screenshot of private Cloud DNS zone

Let’s create a Kubernetes Service and see it being registered in Cloud DNS.

cat << EOF > service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
EOF
kubectl apply -f service.yaml
kubectl get service

You will notice the service has a ClusterIP.

my-service   ClusterIP   10.8.12.96   <none>        80/TCP    14s

Head to the Cloud DNS page and you should see a new record called my-service.default.svc.gke-clouddns. If you check the IP address, it should match the ClusterIP.
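If you prefer the CLI over the console, you can verify the record with gcloud. The zone name is auto-generated by GKE, so list the zones first and substitute it for ZONE_NAME:

gcloud dns managed-zones list
gcloud dns record-sets list --zone=ZONE_NAME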

You might say: great, now what? Cloud DNS with GKE handles records for Services; why should I care?

Well, it's pretty straightforward, but if the advantages of Cloud DNS don't convince you, here is something that could. Imagine you have a StatefulSet (maybe Kafka) you want to talk to directly. Because pods in VPC-native clusters are routable inside the VPC, you can technically call a pod using its IP without the need for a load balancer! Here is an example:

cat << EOF > headless-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 1 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
EOF
kubectl apply -f headless-service.yaml

What did we do? We deployed a dummy StatefulSet that runs Nginx and created a headless Service to expose it. A Kubernetes headless Service doesn't get a ClusterIP. So what happens now?

Head to the Cloud DNS page and look at the record that was created, called web-0.nginx.default.svc.gke-clouddns. If you check the record, it should have the IP address of the Pod.

This could be useful for situations where you need to talk to a pod by its IP without a load balancer, and the DNS record is updated if the pod's IP changes. Go ahead and try it: delete the pod called web-0 and refresh the Cloud DNS page.
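To see the VPC-wide resolution in action, resolve the pod record from any VM on the same VPC, not just from inside the cluster (this assumes the cluster was created with --cluster-dns-domain=gke-clouddns as above):

dig +short web-0.nginx.default.svc.gke-clouddns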

Cleanup

export PROJECT_ID=`gcloud config get-value project --quiet`
export ZONE=insert_your_zone
gcloud container clusters delete gke-clouddns \
--project $PROJECT_ID \
--zone $ZONE

NodeLocal DNSCache

NodeLocal DNSCache is a Kubernetes-native feature that is also available on GKE. It runs a caching pod on each node, making DNS lookups faster and reducing the number of DNS queries sent to kube-dns or Cloud DNS.

You can enable it on an existing cluster with version > 1.15, as shown below.
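Enabling it is a single addon flag; CLUSTER_NAME below is a placeholder:

# At cluster creation time
gcloud container clusters create CLUSTER_NAME \
--addons=NodeLocalDNS

# Or on an existing cluster (the nodes will be re-created)
gcloud container clusters update CLUSTER_NAME \
--update-addons=NodeLocalDNS=ENABLED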

Now let's get to automatic load balancer registration.

External DNS

Source: https://github.com/kubernetes-sigs/external-dns

external-dns is an open source Kubernetes controller that integrates with a lot of DNS providers. You can check the list of compatible providers in the project repository. The Google Cloud DNS support is stable.

Let’s see how it could be used to integrate GKE and Google Cloud DNS for Services of type LoadBalancer and Ingresses.

  1. Create a GKE cluster
export PROJECT_ID=`gcloud config get-value project --quiet`
export ZONE=insert_your_zone
gcloud container clusters create gke-clouddns \
--project $PROJECT_ID \
--zone $ZONE \
--cluster-version 1.22.8-gke.202 \
--release-channel regular \
--scopes "https://www.googleapis.com/auth/ndev.clouddns.readwrite"

2. Create a Cloud DNS Public Managed Zone (this works with private Zones as well)

gcloud dns managed-zones create mycompany-dot-com \
--dns-name "mycompany.com" \
--description "Automatically managed zone by external-dns"

3. Deploy external-dns and make sure it works

kubectl apply -f https://raw.githubusercontent.com/boredabdel/useful-k8s-stuff/main/external-dns-gcp.yaml
kubectl -n external-dns get pods -l app=external-dns

4. Deploy a test app and a Service

cat << EOF > nginx.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nginx.mycompany.com.
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
EOF
kubectl apply -f nginx.yaml

This will deploy nginx with a Service of type LoadBalancer. It takes a couple of minutes for the LB to be provisioned and to get an IP assigned.

You can check it with this command:

kubectl get services

Once the IP of the LoadBalancer is visible in the console, head to the Cloud DNS page; you should see an entry matching the annotation in the Service.

Now let’s try with an Ingress.

5. Deploy a test app and an Ingress

cat << EOF > ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
spec:
  rules:
  - host: nginx-ingress.mycompany.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF
kubectl apply -f ingress.yaml

The Ingress takes about 5 minutes to provision the load balancer and get an IP. Monitor it with:

watch kubectl get ingress

Check the Cloud DNS page; you should see a new DNS record for the nginx-ingress.mycompany.com FQDN.
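You can confirm from the CLI as well. Note that external-dns also creates TXT ownership records alongside the A record, which is why the cleanup below deletes those too:

gcloud dns record-sets list --zone mycompany-dot-com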

6. Cleanup

export PROJECT_ID=`gcloud config get-value project --quiet`
export ZONE=insert_your_zone
gcloud container clusters delete gke-clouddns \
--project $PROJECT_ID \
--zone $ZONE
gcloud dns record-sets delete \
--zone mycompany-dot-com \
--type A \
nginx-ingress.mycompany.com
gcloud dns record-sets delete \
--zone mycompany-dot-com \
--type TXT \
a-nginx-ingress.mycompany.com
gcloud dns record-sets delete \
--zone mycompany-dot-com \
--type TXT \
nginx-ingress.mycompany.com
gcloud dns managed-zones delete mycompany-dot-com

Service Directory

Source: https://cloud.google.com/icons

Service Directory is a service discovery tool in Google Cloud. It allows you to publish and discover services across multiple environments and query them using gRPC, HTTP, or DNS.

In Service Directory terminology, an endpoint is an individual IP:port pair with an optional URL. Endpoints are grouped into services, which in turn are grouped into namespaces.
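To make the hierarchy concrete, here is a sketch of creating the three levels by hand with gcloud; my-namespace, my-service, my-endpoint, and the address/port values are hypothetical placeholders:

export REGION=insert_your_region
gcloud service-directory namespaces create my-namespace \
--location=$REGION
gcloud service-directory services create my-service \
--namespace=my-namespace \
--location=$REGION
gcloud service-directory endpoints create my-endpoint \
--service=my-service \
--namespace=my-namespace \
--location=$REGION \
--address=10.0.0.5 \
--port=443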

Service Directory has many advantages:

  • Independent, globally available service that can be reached from any compute environment, including on-prem.
  • Integrated with Cloud DNS
  • Integrated with IAM, and Cloud Ops for monitoring and logging.
  • Built-in support in gcloud

In this section, we will see how we can leverage Service Directory, GKE, and Cloud DNS to automatically register an Internal LoadBalancer created by a Kubernetes Service and query its IP over DNS from Cloud DNS.

This is my favorite combination, everything in this setup is native to Google Cloud, fully integrated and managed, and doesn’t require installing extra controllers.

Limitations:

  • The GKE integration with Service Directory is in Preview as of this article's release date.
  • The GKE integration only supports Kubernetes Services. Ingresses and Gateways have to be registered manually for now.
  1. Enable needed APIs.
export PROJECT_ID=`gcloud config get-value project --quiet`
gcloud services enable \
--project=$PROJECT_ID \
container.googleapis.com \
gkeconnect.googleapis.com \
gkehub.googleapis.com \
cloudresourcemanager.googleapis.com

2. Create a GKE cluster with Workload Identity enabled.

export ZONE=insert_your_zone
gcloud container clusters create gke-sd \
--project $PROJECT_ID \
--zone $ZONE \
--cluster-version 1.22.8-gke.202 \
--release-channel regular \
--workload-pool=$PROJECT_ID.svc.id.goog \
--scopes "https://www.googleapis.com/auth/cloud-platform"

3. Register the cluster with the fleet.

gcloud container fleet memberships register gke-sd \
--gke-uri="https://container.googleapis.com/v1/projects/$PROJECT_ID/locations/$ZONE/clusters/gke-sd" \
--enable-workload-identity

4. Enable Service Directory

gcloud alpha container hub service-directory enable

5. Configure a Service Directory registration policy

cat << EOF > registration-policy.yaml
apiVersion: networking.gke.io/v1alpha1
kind: ServiceDirectoryRegistrationPolicy
metadata:
  name: default
  namespace: default
spec:
  resources:
  - kind: Service
EOF
kubectl apply -f registration-policy.yaml

6. Deploy an app with an Internal LoadBalancer

cat << EOF > nginx.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    sd-import: "true"
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
EOF
kubectl apply -f nginx.yaml

7. Check that the Service has been registered with Service Directory

export REGION=insert_your_region
gcloud beta service-directory services resolve nginx \
--location=$REGION \
--namespace=default

The output should look like the following, with X.Y.Z.W matching the IP of the LoadBalancer:

service:
  createTime: '2022-07-21T11:45:59.522716Z'
  endpoints:
  - address: X.Y.Z.W
    createTime: '2022-07-21T11:46:00.553537Z'
    name: projects/PROJECT_ID/locations/REGION/namespaces/default/services/nginx/endpoints/CLUSTER
    network: projects/532287715784/locations/global/networks/default
    port: 80
    updateTime: '2022-07-21T11:46:00.553537Z'
  name: projects/PROJECT_ID/locations/REGION/namespaces/default/services/nginx
  updateTime: '2022-07-21T11:45:59.522716Z'

So far, we have enabled the Service Directory integration with GKE and used it to register a Service of the type LoadBalancer. We used gcloud to query the API and check if the IP of the Service was registered.

Service Directory itself can be queried using Cloud DNS; let's see how this works in practice.

  1. Create a Cloud DNS managed zone for Service Directory.
gcloud dns managed-zones create mycompany-dot-com \
--dns-name mycompany.com. \
--description mycompanyzone \
--visibility private \
--networks default \
--service-directory-namespace https://servicedirectory.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/namespaces/default

2. Run a client workload on the cluster.

kubectl apply -f https://raw.githubusercontent.com/boredabdel/useful-k8s-stuff/main/netshoot-pod.yaml

3. Try to resolve the Internal LoadBalancer

Remote exec into the pod and try to resolve nginx.mycompany.com

kubectl exec -it POD_NAME -- bash
dig nginx.mycompany.com

The output should contain the IP address of the LoadBalancer.

Congrats! You used only native Google Cloud components (GKE, Cloud DNS, and Service Directory) to automatically register load balancers with Service Directory and resolve their IPs via DNS.

Cleanup

export PROJECT_ID=`gcloud config get-value project --quiet`
export ZONE=insert_your_zone
gcloud container clusters delete gke-sd \
--project $PROJECT_ID \
--zone $ZONE
gcloud dns managed-zones delete mycompany-dot-com

Let me know what you think about this article on Twitter or LinkedIn.

Use my discount code to get your tickets for KubeCon North America

  • $300 off Individual rates:
    Use code ABDEL_IN
    All-access pass: $779; KubeCon Only pass: $400
  • $700 off Individual rates:
    Use code ABDEL_C
    All-access pass: $1279; KubeCon Only pass: $900
