Mastering the transition from development to production is a key milestone in your journey as an ML engineer.
That means knowing how to take your Python code from your local machine (your development environment) and deploy it to your company’s production infrastructure — whether that's AWS Lambda, ECS, Cloud Functions, Kubernetes, or another platform.
This skill is incredibly valuable because, in real-world machine learning and LLM development:
90% of the work is engineering — setting up reliable workflows for developing, testing, deploying, and monitoring your code (like fine-tuning scripts, inference services, or vLLM servers).
Only 10% is the actual science. Sure, Kaggle notebooks are great for experimentation and prototyping, but they represent a small slice of the full project lifecycle.
If you understand the science and master the engineering, you're unstoppable.
Today, I’ll walk you through the essential steps for deploying a Python application from your local environment into a Kubernetes cluster.
What is Kubernetes?
Kubernetes is the industry-standard platform for container orchestration. It helps you deploy, scale, and manage both:
Jobs – e.g., model training pipelines or fine-tuning scripts that produce specialized LLMs.
Services – e.g., real-time inference APIs using FastAPI, or scalable LLM servers like vLLM.
Think of Kubernetes as the autopilot for your applications.
You define what you need — the destination and a few rules — and Kubernetes takes care of navigating the complexity to get there and keep things running smoothly.
For example, you might say:
"I want three instances of my app running at all times."
"Allocate this amount of CPU and memory."
Kubernetes handles the rest.
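To make that concrete, here is an illustrative fragment of a Deployment spec expressing those two requests (the app name, image tag, and resource values are made up for the example):

```yaml
# Illustrative fragment of a Deployment spec
spec:
  replicas: 3                 # "three instances of my app at all times"
  template:
    spec:
      containers:
      - name: my-app
        image: my-app:v1.0.0
        resources:
          requests:           # "allocate this amount of CPU and memory"
            cpu: "500m"
            memory: "512Mi"
```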
Now, let me walk you through how to deploy your first application to Kubernetes — step by step.
Steps
Install the tools
Create a local Kubernetes cluster with `kind`
Write the business logic of your app
Containerise your app with Docker
Build the Docker image and run it locally (optional)
Push the Docker image to the local Kubernetes cluster
Deploy the app as a Kubernetes service
Test it works
Run the whole thing in one go
Step 1 > Install the tools
uv to create the project and manage Python dependencies.
Docker to build and run Docker images.
kind to create a local Kubernetes cluster.
kubectl to interact with the Kubernetes cluster.
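How you install these depends on your OS. On macOS with Homebrew, for example, it might look like this (assuming Homebrew is already installed):

```bash
brew install uv kind kubectl
brew install --cask docker   # Docker Desktop
```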
Step 2 > Create a local Kubernetes cluster with kind
We will use kind to create a local Kubernetes cluster. It will be a simple cluster that runs entirely on your machine, using plain Docker containers as its Kubernetes nodes.
A local cluster like the one we are creating here is useful for development and CI pipelines, where you need a minimal cluster to run integration tests for your applications.
We will create a cluster consisting of:
1 control plane node -> where the core Kubernetes components run
2 worker nodes -> where the apps we will deploy will run.
The configuration file for the cluster is the following:
```yaml
# kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "CriticalAddonsOnly=true,eks-k8s-version=1.29"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "CriticalAddonsOnly=true,eks-k8s-version=1.29"
- role: worker
  labels:
    "CriticalAddonsOnly": "true"
    "eks-k8s-version": "1.29"
```
Create the cluster with a name of your choice (e.g. `cluster-123`) using the above configuration:
```bash
kind create cluster --config kind.yaml --name cluster-123
```
Set the kubectl context to the cluster we just created, so you can interact with it using `kubectl`:
```bash
kubectl config use-context kind-cluster-123
```
Get the list of nodes in the cluster:
```bash
kubectl get nodes

NAME                        STATUS   ROLES           AGE   VERSION
cluster-123-control-plane   Ready    control-plane   15m   v1.32.2
cluster-123-worker          Ready    <none>          14m   v1.32.2
cluster-123-worker2         Ready    <none>          14m   v1.32.2
```
Step 3 > Write the business logic of your app
In this case, we will create a simple FastAPI app that returns the current time when you hit the /health endpoint.
We will use `uv` to create the project; it is the most ergonomic way to create and package your Python code.
Create the boilerplate code with:
```bash
uv init sample_api
```
Add FastAPI to the project:
```bash
uv add "fastapi[standard]"
```
Rename the `hello.py` file to `api.py` and copy this code:

```python
from fastapi import FastAPI
from datetime import datetime

app = FastAPI()


@app.get('/health')
async def health():
    return {
        'status': 'healthy',
        'timestamp': datetime.now().isoformat()
    }
```
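Before containerising anything, you can sanity-check the app locally. With `fastapi[standard]` installed, the FastAPI CLI should be available through `uv` (it serves on port 8000 by default):

```bash
uv run fastapi dev api.py
# then, in another terminal:
curl http://localhost:8000/health
```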
Step 4 > Containerise your app with Docker
We write a multi-stage Dockerfile to reduce the final image size.
It has 2 stages:
builder -> where we install the project dependencies with `uv` and copy the code
runner -> where we run the FastAPI app
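As a reference, a minimal sketch of such a multi-stage Dockerfile could look like this (the base images, file names, and port are assumptions; adapt them to your project):

```dockerfile
# ---- builder: install dependencies with uv and copy the code ----
FROM python:3.12-slim AS builder

# Grab the uv binary from the official uv image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app

# Install locked dependencies into /app/.venv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

COPY api.py ./

# ---- runner: run the FastAPI app ----
FROM python:3.12-slim AS runner

WORKDIR /app
COPY --from=builder /app /app

# Put the virtual environment on the PATH
ENV PATH="/app/.venv/bin:$PATH"

EXPOSE 5000
CMD ["fastapi", "run", "api.py", "--port", "5000"]
```

Keeping the `uv` tooling out of the final stage is what makes the runner image smaller than a single-stage build.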
Step 5 > Build the Docker image and run it locally (optional)
To build the image, run the following command:

```bash
docker build -t simple-api:v1.0.0 .
```

And to run it locally:

```bash
docker run -it -p 5005:5000 simple-api:v1.0.0
```
Observe how we forward the container's port 5000 to the host's port 5005.
At this point, you should be able to hit the /health endpoint at http://localhost:5005/health and get the current time:

```bash
curl http://localhost:5005/health
{"status":"healthy","timestamp":"2025-02-21T15:22:45.453524"}
```
Now let's move on to the real challenge: getting this app running inside the Kubernetes cluster.
Step 6 > Push the Docker image to the local Kubernetes cluster
Before we can deploy our app to the cluster, we need to make the Docker image available inside the cluster. A kind cluster cannot see the images in your local Docker daemon, so we use the `kind` CLI to load the image into the cluster's nodes:

```bash
kind load docker-image simple-api:v1.0.0 --name cluster-123
```
Step 7 > Deploy the app as a Kubernetes service
Now that we have the image in the cluster, we can deploy the app as a Kubernetes service.
We will need to create 2 resources:
a deployment.yaml -> which defines the pods that will run the app. In our case, we will have 3 replicas of the app.
a service.yaml -> which defines how the app is exposed to traffic.
Don't worry about the manifests for now. Kubernetes YAML files are notoriously verbose and hard to read. And if you are scared of them, you are not alone. I am scared of them too.
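That said, if you want to peek anyway, a minimal sketch of the two manifests might look like this (the names, labels, and ports are assumptions, chosen to match the rest of this post):

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-api
spec:
  replicas: 3                         # 3 replicas of the app
  selector:
    matchLabels:
      app: simple-api
  template:
    metadata:
      labels:
        app: simple-api
    spec:
      containers:
      - name: simple-api
        image: simple-api:v1.0.0
        imagePullPolicy: IfNotPresent # use the image loaded with `kind load`
        ports:
        - containerPort: 5000
```

```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: simple-api
spec:
  selector:
    app: simple-api                   # routes traffic to the pods above
  ports:
  - port: 5000
    targetPort: 5000
```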
To deploy the app, we use the `kubectl` CLI to apply the Kubernetes manifests:

```bash
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```
You can check the status of the deployment with `kubectl`. Assuming the Deployment is named `simple-api`, as in the sketch above:
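```bash
kubectl rollout status deployment/simple-api
kubectl get pods    # should show 3 simple-api pods in Running state
```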
Step 8 > Test it works
To test that the app is working, we can use the `kubectl` CLI to port-forward the service to our local machine:

```bash
kubectl port-forward svc/simple-api 5005:5000
```
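With the port-forward running, the same `curl` check from Step 5 should now reach the app through the cluster:

```bash
curl http://localhost:5005/health
```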
In a production Kubernetes cluster, you don't do port-forwarding to expose your services to outside traffic. Instead, you define an Ingress manifest like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: simple-api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.example.com  # Replace with your desired host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: simple-api
            port:
              number: 5000
```
Wanna learn to build and deploy real-time ML systems to production?
We will soon be running a course on how to engineer a project from scratch to production. It will cover how to design and build a system for monitoring live API sources.