In this post we'll implement a simple Kubernetes controller using kubebuilder.
The installation instructions can be found in the kubebuilder installation docs.
Since kubebuilder relies on kustomize internally, we also need to install that by following its own installation instructions.
Now we're all set up to start building our controller.
I have initialized an empty Go project:
$ go mod init cnat
Next we'll initialize the controller project:
$ kubebuilder init --domain ishankhare.dev
Now we'll ask kubebuilder to set up all the necessary scaffolding for our project:
$ kubebuilder create api --group cnat --version v1alpha1 --kind At
Kubebuilder will ask us whether to create the Resource and Controller directories. We answer y to each, since we want both to be created:
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing scaffold for you to edit...
If we now run make install, kubebuilder should generate the base CRDs under config/crd/bases, along with a few other files.
Running make run should now launch the operator locally. This is really helpful for local testing while we implement and debug the business logic of our operator: the live logs show how our code changes affect what the operator is doing.
In the file api/v1alpha1/at_types.go, add the // +kubebuilder:subresource:status marker comment above the At struct:
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// At is the Schema for the ats API
type At struct {
	. . .
This enables the status subresource, which gives us privilege separation between the spec and status fields.
A little background on what we're going to build here. Our controller is a 'Cloud Native At' (cnat for short), a cloud native version of the Unix at command. We provide a manifest containing 2 main things: a schedule (when to run) and a command (what to run).
Our controller will wait until the scheduled time, then spawn a Pod and run the given command in that Pod. All the while it will keep updating the Status of that resource, which we can view with kubectl.
So let's begin.
CRD outline
This section describes what we want our custom resource to look like. Given the two points we listed above, we can represent them as:
apiVersion: cnat.ishankhare.dev/v1alpha1
kind: At
metadata:
  name: at-sample
spec:
  schedule: "2020-11-16T10:12:00Z"
  command: "echo hello world!"
This is enough information for our controller to decide 'when' and 'what' to execute.
CRD
Now that we know what we expect out of the CRD, let's implement it. Go to the file api/v1alpha1/at_types.go and add the following constants to the code:
const (
	PhasePending = "PENDING"
	PhaseRunning = "RUNNING"
	PhaseDone    = "DONE"
)
We also edit the existing AtSpec and AtStatus structs as follows:
type AtSpec struct {
	Schedule string `json:"schedule,omitempty"`
	Command  string `json:"command,omitempty"`
}

type AtStatus struct {
	Phase string `json:"phase,omitempty"`
}
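To see how the struct tags relate to the manifest fields, here is a small standalone sketch that encodes the two structs with the standard library's encoding/json. The encode helper is hypothetical and only exists for this illustration; the real project relies on the generated scaffolding rather than calling the encoder directly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AtSpec and AtStatus mirror the structs from at_types.go.
type AtSpec struct {
	Schedule string `json:"schedule,omitempty"`
	Command  string `json:"command,omitempty"`
}

type AtStatus struct {
	Phase string `json:"phase,omitempty"`
}

// encode is a hypothetical helper that returns the JSON form of v.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	spec := AtSpec{Schedule: "2020-11-16T10:12:00Z", Command: "echo hello world!"}
	// The tag names become the keys used under spec: in the manifest.
	fmt.Println(encode(spec)) // {"schedule":"2020-11-16T10:12:00Z","command":"echo hello world!"}
	// With omitempty, an unset phase is left out entirely.
	fmt.Println(encode(AtStatus{})) // {}
}
```

The empty status encoding to {} is why a freshly created At shows no phase until the controller sets one.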
We save this and run:
$ make manifests
go: creating new go.mod: module tmp
go: found sigs.k8s.io/controller-tools/cmd/controller-gen in sigs.k8s.io/controller-tools v0.2.5
/Users/ishankhare/godev/bin/controller-gen "crd:trivialVersions=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
$ make install # this will install the generated CRD into k8s
We should now be able to see an exhaustive CustomResourceDefinition generated for us at config/crd/bases/cnat.ishankhare.dev_ats.yaml. For people new to CRDs: they help Kubernetes identify what our kind: At means. Unlike built-in resource types such as Deployments and Pods, custom types have to be defined for Kubernetes through a CustomResourceDefinition. Let's try to create an At resource:
$ cat >sample-at.yaml<<EOF
apiVersion: cnat.ishankhare.dev/v1alpha1
kind: At
metadata:
  name: at-sample
spec:
  schedule: "2020-11-16T10:12:00Z"
  command: "echo hello world!"
EOF
If we now run kubectl apply -f on this manifest:
$ kubectl apply -f sample-at.yaml
$ kubectl get at
NAME        AGE
at-sample   124m
So now kubernetes recognizes our custom type, but it still does NOT know what to do with it. The logic part of the controller is still missing, but we have the skeleton ready to start writing our logic around it.
Now we're coming to the fun part: implementing the logic for our operator. This will tell Kubernetes what to do with resources of our generated CRD. As some say, it's like "making kubernetes do tricks!".
Most of the scaffolding has already been generated for us by kubebuilder in main.go and controllers/. Inside the controllers/at_controller.go file we have the function Reconcile. Consider this the body of the reconcile loop.
A small refresher: a controller runs a loop over the objects it manages and watches for any changes to those objects. When a change happens, the loop body is triggered and the relevant logic is executed. The Reconcile function is that logical piece, the entry point for all our controller logic.
Let's start by adding a basic structure around it:
func (r *AtReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
	reqLogger := r.Log.WithValues("at", req.NamespacedName)
	reqLogger.Info("=== Reconciling At")

	instance := &cnatv1alpha1.At{}
	err := r.Get(context.TODO(), req.NamespacedName, instance)
	if err != nil {
		if errors.IsNotFound(err) {
			// object not found, could have been deleted after
			// reconcile request, hence don't requeue
			return ctrl.Result{}, nil
		}

		// error reading the object, requeue the request
		return ctrl.Result{}, err
	}

	// if no phase set, default to Pending
	if instance.Status.Phase == "" {
		instance.Status.Phase = cnatv1alpha1.PhasePending
	}

	// state transition PENDING -> RUNNING -> DONE

	return ctrl.Result{}, nil
}
Here we create a logger and an empty instance of the At object, which allows us to query Kubernetes for existing At objects and read or write their status. The function returns the result of the reconciliation logic and an error. The ctrl.Result object can tell the reconciler to either Requeue immediately (a bool) or RequeueAfter a given time.Duration; both fields are optional.
Now let's come to the important part, the state diagram. The image below explains what happens when we create a new At:
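The transitions can also be written down as a small pure function. This is only a sketch of the PENDING -> RUNNING -> DONE flow: nextPhase and its two boolean inputs are hypothetical names standing in for the schedule check and the Pod status lookup that the real controller performs.

```go
package main

import "fmt"

const (
	PhasePending = "PENDING"
	PhaseRunning = "RUNNING"
	PhaseDone    = "DONE"
)

// nextPhase returns the phase that follows current.
// scheduleReached: the scheduled time has arrived (hypothetical input).
// podFinished: the spawned Pod has succeeded or failed (hypothetical input).
func nextPhase(current string, scheduleReached, podFinished bool) string {
	switch current {
	case "", PhasePending:
		// A new object has no phase yet and is treated as PENDING.
		if scheduleReached {
			return PhaseRunning
		}
		return PhasePending
	case PhaseRunning:
		if podFinished {
			return PhaseDone
		}
		return PhaseRunning
	default:
		// DONE is terminal; unknown phases are left untouched.
		return current
	}
}

func main() {
	phase := nextPhase("", false, false)
	fmt.Println(phase) // PENDING
	phase = nextPhase(phase, true, false)
	fmt.Println(phase) // RUNNING
	phase = nextPhase(phase, true, true)
	fmt.Println(phase) // DONE
	phase = nextPhase(phase, true, true)
	fmt.Println(phase) // DONE
}
```

Keeping the transition rules free of any Kubernetes calls makes them easy to reason about, even though the actual Reconcile function interleaves them with API requests.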
Now, let's start implementing this logic in the function:
...
// state transition PENDING -> RUNNING -> DONE
switch instance.Status.Phase {
case cnatv1alpha1.PhasePending:
	reqLogger.Info("Phase: PENDING")

	diff, err := schedule.TimeUntilSchedule(instance.Spec.Schedule)
	if err != nil {
		reqLogger.Error(err, "Schedule parsing failure")
		return ctrl.Result{}, err
	}

	reqLogger.Info("Schedule parsing done", "Result", fmt.Sprintf("%v", diff))

	if diff > 0 {
		// not yet time to execute, wait until scheduled time
		// (diff is already a time.Duration, so it is passed as-is)
		return ctrl.Result{RequeueAfter: diff}, nil
	}

	reqLogger.Info("It's time!", "Ready to execute", instance.Spec.Command)
	// change state
	instance.Status.Phase = cnatv1alpha1.PhaseRunning
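A note on units for the requeue delay: a time.Duration already counts nanoseconds, so the value returned by TimeUntilSchedule can be handed to RequeueAfter directly. Multiplying it by time.Second again would scale it by a factor of a billion, and for realistic delays the int64 underneath silently overflows. The sketch below illustrates this with a hypothetical requeueDelay helper that is not part of the controller code.

```go
package main

import (
	"fmt"
	"time"
)

// requeueDelay is a hypothetical helper: it returns how long to wait before
// the next reconcile and whether any waiting is needed at all.
func requeueDelay(untilSchedule time.Duration) (time.Duration, bool) {
	if untilSchedule > 0 {
		return untilSchedule, true
	}
	return 0, false
}

func main() {
	diff := 12 * time.Minute

	delay, wait := requeueDelay(diff)
	fmt.Println(delay, wait) // 12m0s true

	// Scaling a Duration by time.Second multiplies nanoseconds by 1e9.
	// For 12 minutes the product no longer fits in an int64 and wraps.
	scaled := diff * time.Second
	fmt.Println(scaled == diff) // false

	// A schedule in the past needs no waiting.
	delay, wait = requeueDelay(-time.Hour)
	fmt.Println(delay, wait) // 0s false
}
```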
The schedule.TimeUntilSchedule function is simple to implement:
package schedule

import "time"

// TimeUntilSchedule returns the time remaining until the given schedule.
// A negative duration means the scheduled time has already passed.
func TimeUntilSchedule(schedule string) (time.Duration, error) {
	now := time.Now().UTC()
	layout := "2006-01-02T15:04:05Z"
	scheduledTime, err := time.Parse(layout, schedule)
	if err != nil {
		return time.Duration(0), err
	}

	return scheduledTime.Sub(now), nil
}
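Because the function above reads the clock itself, it is awkward to unit test. One possible variation, sketched here under the assumption that we are free to change the signature, takes the current time as a parameter. The names timeUntil and exampleNow are hypothetical, and the sketch swaps in time.RFC3339 as the layout, which accepts the same "Z" timestamps and also numeric offsets such as +05:30.

```go
package main

import (
	"fmt"
	"time"
)

// exampleNow is a fixed clock used only for this illustration.
var exampleNow = time.Date(2020, time.November, 16, 10, 0, 0, 0, time.UTC)

// timeUntil reports how long remains between now and the RFC 3339 timestamp
// in schedule. A negative result means the scheduled time has passed.
func timeUntil(schedule string, now time.Time) (time.Duration, error) {
	scheduled, err := time.Parse(time.RFC3339, schedule)
	if err != nil {
		return 0, fmt.Errorf("parsing schedule %q: %w", schedule, err)
	}
	return scheduled.Sub(now), nil
}

func main() {
	d, err := timeUntil("2020-11-16T10:12:00Z", exampleNow)
	fmt.Println(d, err) // 12m0s <nil>

	// Anything that is not a valid timestamp is reported as an error.
	_, err = timeUntil("tomorrow at noon", exampleNow)
	fmt.Println(err != nil) // true
}
```

In the controller, the caller would pass time.Now() while tests pass a fixed value.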
The next case in our switch body is for the RUNNING phase. This is the most complicated one, and I've added comments wherever I can to explain it better.
case cnatv1alpha1.PhaseRunning:
	reqLogger.Info("Phase: RUNNING")

	pod := spawn.NewPodForCR(instance)
	err := ctrl.SetControllerReference(instance, pod, r.Scheme)
	if err != nil {
		// requeue with error
		return ctrl.Result{}, err
	}

	query := &corev1.Pod{}
	// try to see if the pod already exists
	err = r.Get(context.TODO(), req.NamespacedName, query)
	if err != nil && errors.IsNotFound(err) {
		// does not exist, create a pod
		err = r.Create(context.TODO(), pod)
		if err != nil {
			return ctrl.Result{}, err
		}

		// Successfully created a Pod
		reqLogger.Info("Pod Created successfully", "name", pod.Name)
		return ctrl.Result{}, nil
	} else if err != nil {
		// the lookup itself failed, requeue with err
		reqLogger.Error(err, "cannot get pod")
		return ctrl.Result{}, err
	} else if query.Status.Phase == corev1.PodFailed ||
		query.Status.Phase == corev1.PodSucceeded {
		// pod already finished or errored out
		reqLogger.Info("Container terminated", "reason", query.Status.Reason,
			"message", query.Status.Message)
		instance.Status.Phase = cnatv1alpha1.PhaseDone
	} else {
		// don't requeue, it will happen automatically when
		// pod status changes
		return ctrl.Result{}, nil
	}
We call ctrl.SetControllerReference, which tells the Kubernetes runtime that the created Pod is "owned" by this At instance. This will come in handy later when watching resources created by our controller.
We also use another external function here, spawn.NewPodForCR, which is responsible for building the new Pod from our At custom resource spec. It is implemented as follows:
package spawn

import (
	cnatv1alpha1 "cnat/api/v1alpha1"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NewPodForCR builds a busybox Pod that runs the command from the At spec.
// The Pod shares its name and namespace with the custom resource.
func NewPodForCR(cr *cnatv1alpha1.At) *corev1.Pod {
	labels := map[string]string{
		"app": cr.Name,
	}

	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cr.Name,
			Namespace: cr.Namespace,
			Labels:    labels,
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{
					Name:    "busybox",
					Image:   "busybox",
					Command: strings.Split(cr.Spec.Command, " "),
				},
			},
			RestartPolicy: corev1.RestartPolicyOnFailure,
		},
	}
}
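One thing to keep in mind about strings.Split(cr.Spec.Command, " "): it splits on every single space, so repeated spaces produce empty arguments and quoted arguments are cut apart. For the sample command this makes no difference. As a sketch of one possible alternative, the whole command line could be handed to a shell inside the container. The helpers splitNaive and shellCommand below are hypothetical and not part of the project.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNaive mirrors what the Pod spec above does with the command string.
func splitNaive(commandLine string) []string {
	return strings.Split(commandLine, " ")
}

// shellCommand is a hypothetical alternative: it lets a shell inside the
// container do the word splitting and quote handling.
func shellCommand(commandLine string) []string {
	return []string{"sh", "-c", commandLine}
}

func main() {
	// Fine for the sample command from the post: three clean arguments.
	fmt.Println(len(splitNaive("echo hello world!"))) // 3

	// A double space yields an empty argument in the middle.
	fmt.Printf("%q\n", splitNaive("echo  hi"))

	// The shell variant always has exactly three elements, with the
	// original command line kept intact as the last one.
	fmt.Printf("%q\n", shellCommand(`echo "hello   world"`))
}
```

The shell variant assumes the image ships sh, which holds for busybox, and it changes how special characters in the command are interpreted, so it is a trade-off rather than a drop-in fix.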
The last bits of code deal with the DONE phase and the default case of the switch. We also update our status subresource outside the switch, as a common piece of code:
case cnatv1alpha1.PhaseDone:
	reqLogger.Info("Phase: DONE")
	// reconcile without requeuing
	return ctrl.Result{}, nil
default:
	reqLogger.Info("NOP")
	return ctrl.Result{}, nil
}

// update status
err = r.Status().Update(context.TODO(), instance)
if err != nil {
	return ctrl.Result{}, err
}

return ctrl.Result{}, nil
Lastly, we make use of the controller reference set by SetControllerReference by adding the following to the SetupWithManager function in the same file:
err := ctrl.NewControllerManagedBy(mgr).
	For(&cnatv1alpha1.At{}).
	Owns(&corev1.Pod{}).
	Complete(r)
if err != nil {
	return err
}

return nil
The part specifying Owns(&corev1.Pod{}) is important: it tells the controller manager that pods created by this controller also need to be watched for changes.
If you want to have a look at the entire final code implementation, head over to this GitHub repo: ishankhare07/kubebuilder-controller.
With this in place, we are done with the implementation of our controller. We can run and test it on the cluster in our current kube context using make run:
$ make run
go: creating new go.mod: module tmp
go: found sigs.k8s.io/controller-tools/cmd/controller-gen in sigs.k8s.io/controller-tools v0.2.5
/Users/ishankhare/godev/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
/Users/ishankhare/godev/bin/controller-gen "crd:trivialVersions=true" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go run ./main.go
2020-11-18T05:05:45.531+0530 INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"}
2020-11-18T05:05:45.531+0530 INFO setup starting manager
2020-11-18T05:05:45.531+0530 INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2020-11-18T05:05:45.531+0530 INFO controller-runtime.controller Starting EventSource {"controller": "at", "source": "kind source: /, Kind="}
2020-11-18T05:05:45.732+0530 INFO controller-runtime.controller Starting EventSource {"controller": "at", "source": "kind source: /, Kind="}
2020-11-18T05:05:46.232+0530 INFO controller-runtime.controller Starting Controller {"controller": "at"}
2020-11-18T05:05:46.232+0530 INFO controller-runtime.controller Starting workers {"controller": "at", "worker count": 1}
2020-11-18T05:05:46.232+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"}
2020-11-18T05:05:46.232+0530 INFO controllers.At Phase: PENDING {"at": "cnat/at-sample"}
2020-11-18T05:05:46.232+0530 INFO controllers.At Schedule parsing done {"at": "cnat/at-sample", "Result": "-14053h23m46.232658s"}
2020-11-18T05:05:46.232+0530 INFO controllers.At It's time! {"at": "cnat/at-sample", "Ready to execute": "echo YAY!"}
2020-11-18T05:05:46.436+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.443+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"}
2020-11-18T05:05:46.443+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"}
2020-11-18T05:05:46.718+0530 INFO controllers.At Pod Created successfully {"at": "cnat/at-sample", "name": "at-sample"}
2020-11-18T05:05:46.718+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.722+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"}
2020-11-18T05:05:46.722+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"}
2020-11-18T05:05:46.722+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.778+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"}
2020-11-18T05:05:46.778+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"}
2020-11-18T05:05:46.778+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:46.846+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"}
2020-11-18T05:05:46.847+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"}
2020-11-18T05:05:46.847+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:49.831+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"}
2020-11-18T05:05:49.831+0530 INFO controllers.At Phase: RUNNING {"at": "cnat/at-sample"}
2020-11-18T05:05:49.831+0530 INFO controllers.At Container terminated {"at": "cnat/at-sample", "reason": "", "message": ""}
2020-11-18T05:05:50.054+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
2020-11-18T05:05:50.056+0530 INFO controllers.At === Reconciling At {"at": "cnat/at-sample"}
2020-11-18T05:05:50.056+0530 INFO controllers.At Phase: DONE {"at": "cnat/at-sample"}
2020-11-18T05:05:50.056+0530 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "at", "request": "cnat/at-sample"}
If you're interested in seeing the output of the command, we can run:
$ kubectl logs -f at-sample
hello world!