What we’ll cover in this post
This post is about how we can leverage Kubernetes controllers and the Kubernetes resource model to automate the heavy lifting of tricky deployment and management scenarios.
Controllers are often described, in layman’s terms, as software SREs. The good things about that are:
- They are always available and ready to step in when things go wrong.
- They carry all the operational logic, i.e. the steps to take when certain things don’t work, captured as code.
- They don’t need to go over runbooks, playbooks, or other documentation to fix issues.
- Since they are always watching resources and their states (as we’ll see later in this post), they don’t have to spend time finding where the error or misconfiguration happened.
This can give us a few positive outcomes:
- Humans don’t have to be on hectic on-call schedules.
- In case of errors, the software heals the system itself, so the MTTR (mean time to recovery) decreases considerably.
“With great power comes great responsibility.” For all their advantages, controllers also come with a high implementation cost: implementing a controller is generally more complex than building the same automation in a traditional, imperative way.
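At its heart, every controller runs the same control loop: observe the desired state, observe the actual state, and act to close the gap. Below is a deliberately tiny, purely illustrative sketch of that idea; the types and values are hypothetical and have nothing to do with the controller we build later.

```go
package main

import (
	"fmt"
	"time"
)

// A deliberately tiny model of the loop every controller runs:
// observe desired state, observe actual state, act to close the gap.
// The "state" here is hypothetical and only illustrates the idea.
type state struct{ replicas int }

func main() {
	desired := state{replicas: 3} // what the user asked for (the "spec")
	actual := state{replicas: 1}  // what currently exists in the cluster

	for i := 0; i < 3; i++ { // a real controller loops forever, driven by watch events
		if actual != desired {
			fmt.Printf("drift detected: have %d, want %d; reconciling\n",
				actual.replicas, desired.replicas)
			actual = desired // "act": create/update/delete until reality matches the spec
		} else {
			fmt.Println("in sync, nothing to do")
		}
		time.Sleep(10 * time.Millisecond)
	}
}
```

Real controllers don’t poll like this; controller-runtime drives the loop through watches on the API server, which is exactly what we’ll rely on later in this post.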
Now that we know a bit about controllers, we’ll take a use case and try to implement a controller to solve it.
TL;DR
If you want to go straight to the code and try it on your local machine, the entire implementation lives in the github.com/ishankhare07/launchpad repo. There is a detailed README which walks you through a Makefile that:
- Sets up a local kind cluster
- Installs Istio on it
- Deploys the demo workloads (Deployments)
- Applies the right Istio resources, a VirtualService and a DestinationRule, to achieve a traffic split
- Checks and verifies the traffic split
It then also walks you through the controller way of doing things, and the magic that these software SREs bring along.
Use case
The use case we are trying to tackle here is doing a traffic split using a service mesh. Since this blog post can only have a limited scope, here is what we will NOT talk about:
- Service meshes in general, or a comparison between service meshes. We’ll just go ahead and use Istio.
- The details of how Istio or any other service mesh works.
The specific scenario we want to cover is a Traffic Split / Canary deployment. More details can be found at this awesome example site – istiobyexample.dev/canary – by @askmeegs. The following screenshot, borrowed from that same article, demonstrates what we want to achieve:
The page also details how we can achieve this by creating Istio-specific CRDs. We’ll look at both approaches and see first-hand the advantages that controllers bring.
Below are the Istio resources we want to create in the cluster in order for Istio to do the traffic splitting for us:
DestinationRule
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: helloworld
spec:
  host: helloworld
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```
VirtualService
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld
  http:
  - route:
    - destination:
        host: helloworld
        subset: v1
      weight: 80
    - destination:
        host: helloworld
        subset: v2
      weight: 20
```
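With the two resources above applied, roughly 80% of requests should land on v1 and 20% on v2. As a rough sanity check (beyond what the repo’s Makefile already does), you could fire a batch of requests and tally which version answered. The snippet below assumes the helloworld service from Istio’s samples, i.e. that it is reachable at the URL shown and that each response body contains the version string; adjust both for your own setup.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// Rough sanity check for the 80/20 split: send 100 requests and count which
// version answered. The URL and the response format are assumptions based on
// Istio's helloworld sample; adjust both for your own setup.
func main() {
	const url = "http://helloworld.default.svc.cluster.local:5000/hello" // assumed in-mesh address
	counts := map[string]int{}

	for i := 0; i < 100; i++ {
		resp, err := http.Get(url)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		switch {
		case strings.Contains(string(body), "v1"):
			counts["v1"]++
		case strings.Contains(string(body), "v2"):
			counts["v2"]++
		}
	}

	fmt.Printf("v1: %d, v2: %d (expected roughly 80/20)\n", counts["v1"], counts["v2"])
}
```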
That’s enough context around the core of this post, let’s dive in now.
This post uses the awesome kubebuilder framework for implementing the controller. If you’re unfamiliar with it and want an intro, I’ve written a previous article about it on my blog which you can read here – Writing a kubernetes controller in Go with kubebuilder.
The Deep Dive
We’ll start by initializing our project for the controller using go mod and the kubebuilder framework:
```shell
$ go mod init github.com/ishankhare07/launchpad
$ kubebuilder init --domain ishankhare.dev

# create scaffolding for the project
$ kubebuilder create api --group labs --version v1alpha1 --kind Workload
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing scaffold for you to edit...
```
The CRD
The next step is to define the CRD through which we will describe our traffic split to the controller. We want roughly the following YAML spec for it:
```yaml
apiVersion: labs.ishankhare.dev/v1alpha1
kind: Workload
metadata:
  name: workload-sample
spec:
  host: helloworld
  targets:
  - name: helloworld-v1
    namespace: default
    trafficSplit:
      hosts:
      - helloworld
      subset:
        name: v1
        labels:
          version: v1
      destinationHost: helloworld
      weight: 80
  - name: helloworld-v2
    namespace: default
    trafficSplit:
      hosts:
      - helloworld
      subset:
        name: v2
        labels:
          version: v2
      destinationHost: helloworld
      weight: 20
```
I know there’s room for improvement and this can be refined further, but for the purpose of this post, this should do fine. Let’s go about defining this spec in our api/v1alpha1/workload_types.go:
```go
...
const (
	PhasePending = "PENDING"
	PhaseCreated = "CREATED"
)

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// Workload is the Schema for the workloads API
type Workload struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   WorkloadSpec   `json:"spec,omitempty"`
	Status WorkloadStatus `json:"status,omitempty"`
}

// WorkloadSpec defines the desired state of Workload
type WorkloadSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Host is the name of the service from the service registry
	Host    string             `json:"host,omitempty"`
	Targets []DeploymentTarget `json:"targets,omitempty"`
}

// WorkloadStatus defines the observed state of Workload
type WorkloadStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file
	VirtualServiceState  string `json:"virtualServiceState,omitempty"`
	DestinationRuleState string `json:"destinationRuleState,omitempty"`
}

// DeploymentTarget identifies one Deployment taking part in the traffic split
type DeploymentTarget struct {
	Name         string       `json:"name,omitempty"`
	Namespace    string       `json:"namespace,omitempty"`
	TrafficSplit TrafficSplit `json:"trafficSplit,omitempty"`
}

// TrafficSplit describes the share of traffic a target should receive
type TrafficSplit struct {
	Hosts           []string `json:"hosts,omitempty"`
	Subset          Subset   `json:"subset,omitempty"`
	DestinationHost string   `json:"destinationHost,omitempty"`
	Weight          int      `json:"weight,omitempty"`
}

// Subset maps a named subset to the pod labels that select it
type Subset struct {
	Name   string            `json:"name,omitempty"`
	Labels map[string]string `json:"labels,omitempty"`
}

// GetIstioResourceName returns the name used for the generated Istio resources
func (w *Workload) GetIstioResourceName() string {
	return w.Spec.Host // strings.Split(w.Spec.Targets[0].Name, "-")[0]
}

// GetHosts returns the de-duplicated list of hosts across all targets
func (w *Workload) GetHosts() []string {
	// need hosts as a set data structure
	hostSet := make(map[string]bool)
	hosts := []string{}

	for _, target := range w.Spec.Targets {
		for _, host := range target.TrafficSplit.Hosts {
			hostSet[host] = true
		}
	}

	for host := range hostSet {
		hosts = append(hosts, host)
	}

	return hosts
}
```
This should be enough to describe the YAML we presented before. If you’re following along with the code in the repo, the above is defined in api/v1alpha1/workload_types.go. We also add some additional helper methods on the objects that will be useful later.
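For instance, given the workload-sample YAML above, GetHosts collapses the duplicate helloworld entries from both targets into a single host, which is exactly what the generated VirtualService needs. A small illustrative snippet (assuming the api/v1alpha1 package from the repo):

```go
package main

import (
	"fmt"

	labsv1alpha1 "github.com/ishankhare07/launchpad/api/v1alpha1"
)

func main() {
	// two targets pointing at the same host, as in the workload-sample YAML
	w := &labsv1alpha1.Workload{
		Spec: labsv1alpha1.WorkloadSpec{
			Host: "helloworld",
			Targets: []labsv1alpha1.DeploymentTarget{
				{TrafficSplit: labsv1alpha1.TrafficSplit{Hosts: []string{"helloworld"}}},
				{TrafficSplit: labsv1alpha1.TrafficSplit{Hosts: []string{"helloworld"}}},
			},
		},
	}

	fmt.Println(w.GetIstioResourceName()) // "helloworld"
	fmt.Println(w.GetHosts())             // [helloworld], de-duplicated across targets
}
```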
The controller
Let’s jump to controllers/workload_controller.go. I’m only going to show the relevant sections of the code here for the sake of brevity; for the full code, see the linked file.
```go
...
func (r *WorkloadReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	reqLogger := r.Log.WithValues("workload", req.NamespacedName)

	// your logic here
	reqLogger.Info("====== Reconciling Workload =======")

	instance := &labsv1alpha1.Workload{}
	err := r.Get(context.TODO(), req.NamespacedName, instance)
	if err != nil {
		// object not found, could have been deleted after
		// the reconcile request, hence don't requeue
		if errors.IsNotFound(err) {
			return ctrl.Result{}, nil
		}

		// error reading the object, requeue the request
		return ctrl.Result{}, err
	}

	reqLogger.Info("Workload created for following targets.")
	for _, target := range instance.Spec.Targets {
		reqLogger.Info("target",
			"name", target.Name,
			"namespace", target.Namespace,
			"hosts", fmt.Sprintf("%v", target.TrafficSplit.Hosts))

		reqLogger.Info("subset",
			"subset name", target.TrafficSplit.Subset.Name,
			"subset labels", fmt.Sprintf("%v", target.TrafficSplit.Subset.Labels),
			"weight", target.TrafficSplit.Weight)
	}

	// check current status of owned resources
	if instance.Status.VirtualServiceState == "" {
		instance.Status.VirtualServiceState = labsv1alpha1.PhasePending
	}

	if instance.Status.DestinationRuleState == "" {
		instance.Status.DestinationRuleState = labsv1alpha1.PhasePending
	}

	// check virtual service status
	vsResult, err := r.checkVirtualServiceStatus(instance, req, reqLogger)
	if err != nil {
		return vsResult, err
	}

	// check destination rule status
	destRuleResult, err := r.checkDestinationRuleStatus(instance, req, reqLogger)
	if err != nil {
		return destRuleResult, err
	}

	// update status
	err = r.Status().Update(context.TODO(), instance)
	if err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
```
Here we simply log what the controller “sees”, i.e. the desired state that was requested through the CRD.
Next we call the functions checkVirtualServiceStatus and checkDestinationRuleStatus. Their implementation is explained below, and the intention is simple: if the VirtualService or DestinationRule doesn’t exist yet, try to create it with the correct spec.
The VirtualService check
In the same file, let’s implement the checkVirtualServiceStatus method; the inline comments explain each section:
```go
...
func (r *WorkloadReconciler) checkVirtualServiceStatus(instance *labsv1alpha1.Workload, req ctrl.Request, logger logr.Logger) (ctrl.Result, error) {
	switch instance.Status.VirtualServiceState {
	case labsv1alpha1.PhasePending:
		// still in the pending state (initial condition),
		// transition to the created phase
		logger.Info("Virtual Service", "PHASE:", instance.Status.VirtualServiceState)
		logger.Info("Transitioning state to create Virtual Service")
		instance.Status.VirtualServiceState = labsv1alpha1.PhaseCreated

	case labsv1alpha1.PhaseCreated:
		logger.Info("Virtual Service", "PHASE:", instance.Status.VirtualServiceState)

		// construct the lookup key and an empty query object,
		// then check if the virtual service already exists
		query := &istiov1alpha3.VirtualService{}
		lookupKey := types.NamespacedName{
			Name:      instance.GetIstioResourceName(),
			Namespace: instance.GetNamespace(),
		}

		err := r.Get(context.TODO(), lookupKey, query)
		if err != nil && errors.IsNotFound(err) {
			logger.Info("virtual service not found but should exist", "lookup key", lookupKey)
			logger.Info(err.Error())

			// virtual service got deleted or hasn't been created yet,
			// create one now
			vs := spawn.CreateVirtualService(instance) // build the actual VirtualService as defined by Istio

			// set owner references on the created object
			err = ctrl.SetControllerReference(instance, vs, r.Scheme)
			if err != nil {
				logger.Error(err, "Error setting controller reference")
				return ctrl.Result{}, err
			}

			// ask the API server to create the actual object in the
			// cluster, i.e. persist it in the etcd store
			err = r.Create(context.TODO(), vs)
			if err != nil {
				logger.Error(err, "Unable to create Virtual Service")
				return ctrl.Result{}, err
			}

			logger.Info("Successfully created virtual service")
		} else if err != nil {
			logger.Error(err, "Unable to create Virtual Service")
			return ctrl.Result{}, err
		} else {
			// don't requeue, it will happen automatically when
			// the virtual service status changes
			return ctrl.Result{}, nil
		}

		// more fields related to the virtual service status can be checked,
		// see https://pkg.go.dev/istio.io/api/meta/v1alpha1#IstioStatus
	}

	return ctrl.Result{}, nil
}
```
The line vs := spawn.CreateVirtualService(instance) calls a function that builds the actual Istio VirtualService for us; we will define it later in the spawn package.
The DestinationRule check
Similar to the virtual service, we’ll create the checkDestinationRuleStatus method in the same file:
```go
...
func (r *WorkloadReconciler) checkDestinationRuleStatus(instance *labsv1alpha1.Workload, req ctrl.Request, logger logr.Logger) (ctrl.Result, error) {
	switch instance.Status.DestinationRuleState {
	case labsv1alpha1.PhasePending:
		logger.Info("Destination Rule", "PHASE:", instance.Status.DestinationRuleState)
		logger.Info("Transitioning state to create Destination Rule")
		instance.Status.DestinationRuleState = labsv1alpha1.PhaseCreated

	case labsv1alpha1.PhaseCreated:
		logger.Info("Destination Rule", "PHASE:", instance.Status.DestinationRuleState)

		// check if the destination rule already exists
		query := &istiov1alpha3.DestinationRule{}
		lookupKey := types.NamespacedName{
			Name:      instance.GetIstioResourceName(),
			Namespace: instance.GetNamespace(),
		}

		err := r.Get(context.TODO(), lookupKey, query)
		if err != nil && errors.IsNotFound(err) {
			logger.Info("destination rule not found but should exist", "req key", lookupKey)
			logger.Info(err.Error())

			// destination rule got deleted or hasn't been created yet,
			// create one now
			dr := spawn.CreateDestinationRule(instance)

			err := ctrl.SetControllerReference(instance, dr, r.Scheme)
			if err != nil {
				logger.Error(err, "Error setting controller reference")
				return ctrl.Result{}, err
			}

			err = r.Create(context.TODO(), dr)
			if err != nil {
				logger.Error(err, "Unable to create Destination Rule")
				return ctrl.Result{}, err
			}

			logger.Info("Successfully created Destination Rule")
		} else if err != nil {
			logger.Error(err, "Unable to create destination rule")
			return ctrl.Result{}, err
		} else {
			// don't requeue, it will happen automatically when
			// the destination rule status changes
			return ctrl.Result{}, nil
		}
	}

	return ctrl.Result{}, nil
}
```
The line dr := spawn.CreateDestinationRule(instance) calls a function that builds the actual Istio DestinationRule for us; we will also define it in the spawn package.
Creating the Istio objects (defining the ‘spawn’ package)
Let’s create the following two files in the pkg directory:
- destination-rule.go
- virtual-service.go
The structure should be as follows:
```shell
$ tree ./pkg
./pkg
└── spawn
    ├── destination-rule.go
    └── virtual-service.go
```
In the file virtual-service.go:
```go
package spawn

import (
	labsv1alpha1 "github.com/ishankhare07/launchpad/api/v1alpha1"

	// import the required Istio structures:
	// the API types for the spec and the client-go types for the object itself
	istionetworkingv1alpha3 "istio.io/api/networking/v1alpha3"
	istiov1alpha3 "istio.io/client-go/pkg/apis/networking/v1alpha3"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func CreateVirtualService(workload *labsv1alpha1.Workload) *istiov1alpha3.VirtualService {
	vs := &istiov1alpha3.VirtualService{
		ObjectMeta: metav1.ObjectMeta{
			Name:      workload.GetIstioResourceName(),
			Namespace: "default",
		},
	}

	routeDestination := []*istionetworkingv1alpha3.HTTPRouteDestination{}
	for _, target := range workload.Spec.Targets {
		// extract the route from the CRD
		route := &istionetworkingv1alpha3.HTTPRouteDestination{
			Destination: &istionetworkingv1alpha3.Destination{
				Host:   target.TrafficSplit.DestinationHost,
				Subset: target.TrafficSplit.Subset.Name,
			},
			Weight: int32(target.TrafficSplit.Weight),
		}

		// append the route to the route destinations
		routeDestination = append(routeDestination, route)
	}

	// append the route destinations to VirtualService.Spec.Http
	vs.Spec.Http = append(vs.Spec.Http, &istionetworkingv1alpha3.HTTPRoute{
		Route: routeDestination,
	})

	// set hosts
	vs.Spec.Hosts = workload.GetHosts()

	return vs
}
```
We import the right structures from Istio’s upstream repos and construct the object based on the values provided in our original Workload CRD.
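As a quick sanity check, a small test sketch (hypothetical, not part of the repo) can assert that the weights and subsets from the Workload spec survive the translation into the VirtualService:

```go
package spawn

import (
	"testing"

	labsv1alpha1 "github.com/ishankhare07/launchpad/api/v1alpha1"
)

// Illustrative test sketch: the Workload below mirrors the workload-sample
// YAML, and we check that the weights and subsets survive the translation.
func TestCreateVirtualService(t *testing.T) {
	w := &labsv1alpha1.Workload{
		Spec: labsv1alpha1.WorkloadSpec{
			Host: "helloworld",
			Targets: []labsv1alpha1.DeploymentTarget{
				{
					Name: "helloworld-v1",
					TrafficSplit: labsv1alpha1.TrafficSplit{
						Hosts:           []string{"helloworld"},
						Subset:          labsv1alpha1.Subset{Name: "v1"},
						DestinationHost: "helloworld",
						Weight:          80,
					},
				},
				{
					Name: "helloworld-v2",
					TrafficSplit: labsv1alpha1.TrafficSplit{
						Hosts:           []string{"helloworld"},
						Subset:          labsv1alpha1.Subset{Name: "v2"},
						DestinationHost: "helloworld",
						Weight:          20,
					},
				},
			},
		},
	}

	vs := CreateVirtualService(w)

	routes := vs.Spec.Http[0].Route
	if len(routes) != 2 {
		t.Fatalf("expected 2 route destinations, got %d", len(routes))
	}
	if routes[0].Weight != 80 || routes[1].Weight != 20 {
		t.Errorf("unexpected weights: %d / %d", routes[0].Weight, routes[1].Weight)
	}
	if routes[1].Destination.Subset != "v2" {
		t.Errorf("unexpected subset: %s", routes[1].Destination.Subset)
	}
}
```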
In a similar way, we define the DestinationRule object generation in the file destination-rule.go:
```go
package spawn

import (
	labsv1alpha1 "github.com/ishankhare07/launchpad/api/v1alpha1"

	istionetworkingv1alpha3 "istio.io/api/networking/v1alpha3"
	istiov1alpha3 "istio.io/client-go/pkg/apis/networking/v1alpha3"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func CreateDestinationRule(workload *labsv1alpha1.Workload) *istiov1alpha3.DestinationRule {
	dr := &istiov1alpha3.DestinationRule{
		ObjectMeta: metav1.ObjectMeta{
			Name:      workload.GetIstioResourceName(),
			Namespace: workload.GetNamespace(),
		},
	}

	// one Istio subset per target, carrying the subset name and pod labels
	for _, target := range workload.Spec.Targets {
		dr.Spec.Subsets = append(dr.Spec.Subsets, &istionetworkingv1alpha3.Subset{
			Name:   target.TrafficSplit.Subset.Name,
			Labels: target.TrafficSplit.Subset.Labels,
		})
	}

	dr.Spec.Host = workload.Spec.Host

	return dr
}
```
Telling the controller to ‘watch’ the resources we own
One last trick is left in the implementation of our controller. We set the owner reference for each created VirtualService and DestinationRule in these lines:
```go
...
err = ctrl.SetControllerReference(instance, vs, r.Scheme)
...
err := ctrl.SetControllerReference(instance, dr, r.Scheme)
...
```
We are essentially telling Kubernetes that these objects are “owned” by this controller. We can then ask Kubernetes to send us events related to these owned resources whenever anything happens to them (delete/update).
To do this, we go back to the end of controllers/workload_controller.go and add the following:
```go
func (r *WorkloadReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&labsv1alpha1.Workload{}).
		Owns(&istiov1alpha3.VirtualService{}).  // we own both of these object types,
		Owns(&istiov1alpha3.DestinationRule{}). // so send us events related to them
		Complete(r)
}
```
Are we done?
One last thing. Istio’s VirtualService and DestinationRule are external Kubernetes objects (their CRDs were registered when we installed Istio), yet we are constructing them as raw Go structs and passing them to the controller-runtime client. The client needs some additional information to map these Go objects to the registered CRDs.
Note that the Go objects we created don’t carry information like apiVersion and kind. To provide this information, we need to register something called a scheme.
A Scheme basically maps a Go type to what we call a GVK (GroupVersionKind) in Kubernetes. This is usually auto-generated when CRDs are defined using a framework like kubebuilder or the Operator SDK. There’s one generated for our Workload kind, which you can take a look at in api/v1alpha1/groupversion_info.go:
```go
...
var (
	// GroupVersion is group version used to register these objects
	GroupVersion = schema.GroupVersion{Group: "labs.ishankhare.dev", Version: "v1alpha1"}

	// SchemeBuilder is used to add go types to the GroupVersionKind scheme
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)
```
A similar scheme is defined by the Istio client library, which we can register in our main.go:
```go
import (
	...
	istioscheme "istio.io/client-go/pkg/clientset/versioned/scheme"
	...
)

...

func init() {
	...
	_ = istioscheme.AddToScheme(scheme)
}
```
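For context, in a kubebuilder-scaffolded main.go the scheme wiring ends up looking roughly like the sketch below once the Istio scheme is added. The exact scaffolding varies between kubebuilder versions, so treat this as an approximation rather than the repo’s literal main.go.

```go
package main

import (
	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"

	istioscheme "istio.io/client-go/pkg/clientset/versioned/scheme"

	labsv1alpha1 "github.com/ishankhare07/launchpad/api/v1alpha1"
)

var scheme = runtime.NewScheme()

func init() {
	// core Kubernetes types (Deployments, Services, ...)
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
	// our own Workload type
	utilruntime.Must(labsv1alpha1.AddToScheme(scheme))
	// Istio's VirtualService and DestinationRule types
	utilruntime.Must(istioscheme.AddToScheme(scheme))
}

func main() {
	// the real main goes on to build the manager with this scheme, roughly:
	//   mgr, _ := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
}
```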
Now we are done! You can follow along with the Controller-demo to see the magic happen!
Some final highlights
As pointed out in the final sections of the demo, even if someone or something accidentally or deliberately deletes either the VirtualService or the DestinationRule, the controller immediately steps in and recreates it with the correct configuration, because it is watching over the resources it owns. This simulates a classic human error, the kind that has triggered alerts for teams in the past, only for them to realise that somewhere, somebody ran kubectl delete vs helloworld. Go ahead, try it. Then run kubectl get vs,dr again.
Lastly, I’d like to mention the awesome work being done by the folks at @crossplane and @hasheddan, who are building some really cool stuff leveraging controllers and the KRM (Kubernetes resource model). If you’re interested, check out the following resources:
- Kubernetes as a Framework for Control Planes.
- Crossplane team integrating GKE Autopilot before the night ends on live stream.
What’s next?
In a follow-up post I’ll talk about how we can bring Admission Webhooks, particularly Validating Admission Webhooks, into the picture to make sure nobody is able to create our Workload CRD with wrong values, like a traffic split that doesn’t add up to 100% (e.g. 80:30 or 40:30).
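The webhook wiring itself is a topic for that follow-up post, but the core check is tiny. Here is a hedged sketch of what the validation could look like, living next to the types in api/v1alpha1 (the helper name is mine, not from the repo):

```go
package v1alpha1

import "fmt"

// validateTrafficSplit is a hypothetical helper a validating webhook could
// call: it rejects a Workload whose per-target weights don't sum to 100.
func (w *Workload) validateTrafficSplit() error {
	total := 0
	for _, target := range w.Spec.Targets {
		total += target.TrafficSplit.Weight
	}
	if total != 100 {
		return fmt.Errorf("traffic split weights must add up to 100, got %d", total)
	}
	return nil
}
```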
This post was originally posted on my personal blog, ishankhare.dev.