Kubernetes Single-Container ML Inference

Minikube Deployment with Service Tunneling

This project deploys a single-container machine-learning inference service on Kubernetes using Minikube, with service tunneling for local access. It demonstrates production-like Kubernetes networking, service abstraction, and operational behavior for ML model serving.

Docker Image: rajesharigala/no-ui-placement-ml-model:v1-arm64
GitHub: Kube-config-minikube-local

Project Summary

Comprehensive Project Overview

Project Category

Kubernetes · MLOps · DevOps (Primary: Kubernetes Infrastructure & Deployment)

Industry/Domain

Information Technology (AI/ML Infrastructure & Platform Engineering)

Domain Focus

Machine Learning Inference Deployment on Kubernetes

Key Technologies & Concepts

Core Technologies Used

Kubernetes ML Inference Keywords

Kubernetes (Deployment, Service, Pod)
Minikube (Local Kubernetes, Tunneling)
Kubectl CLI (Cluster Interaction)
Containerized ML Inference
Docker Image (Docker Hub, Image Pull)
Kubernetes YAML (Deployment.yaml, Service.yaml)
Service Types (LoadBalancer, ClusterIP)
Minikube Service Tunnel (Local Access)
Namespace Management (Default, Custom)
Resource Limits (CPU, Memory)
ReplicaSet & Pod Lifecycle
API-based Model Access (REST Inference)
Local Development & Testing Environment
Kubernetes Networking (Service → Pod Routing)
Infrastructure as Code (Declarative Manifests)

Problem & Objective

What problem did this project solve?

Problems Solved

Running ML inference service reliably in local development with Kubernetes
Cloud-managed load balancers are unavailable in a local environment
Pods are ephemeral, so direct access to a container port is not stable
Need for production-like Kubernetes networking in development

Primary Objectives

Deploy containerized ML inference application as single-container Pod
Expose application using Kubernetes Service (LoadBalancer type)
Access externally through Minikube service tunneling
Ensure stable, production-like network access in local environment

Solution & Architecture

Architectural Overview

Solution Overview

The solution deploys a containerized ML inference application as a single-container Pod managed by a Kubernetes Deployment on a local Minikube cluster. The application is exposed using a Kubernetes Service (LoadBalancer type) and accessed externally through Minikube service tunneling.

The entire setup is defined using declarative Kubernetes YAML manifests, ensuring reproducibility, scalability readiness, and alignment with real-world Kubernetes infrastructure patterns.

This approach provides stable, production-like network access while running locally. The same manifests can later be applied to a cloud cluster, where the cloud provider provisions the external load balancer for the LoadBalancer Service.

1. Minikube Start
2. Apply Deployment
3. Create Service
4. Minikube Tunnel
5. Access API
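The first four workflow steps can be sketched as a small Python driver. This is an illustrative wrapper, not part of the project: the function name run_workflow and the dry_run flag are hypothetical, while the commands, file names, and service name are the ones listed on this page. Step 5 (calling the API) happens outside the script.

```python
import shlex
import subprocess

# Commands mirror workflow steps 1-4; file and service names come from this page.
WORKFLOW = [
    ["minikube", "start"],                            # 1. start the local cluster
    ["kubectl", "create", "-f", "deployment.yaml"],   # 2. create the Deployment
    ["kubectl", "create", "-f", "service.yaml"],      # 3. create the Service
    ["minikube", "service", "recruitment-rank-app"],  # 4. open the service tunnel
]


def run_workflow(commands=WORKFLOW, dry_run=True):
    """Return the commands as shell-quoted strings; execute them unless dry_run.

    Hypothetical helper: with dry_run=True nothing is executed, so the
    sequence can be reviewed before touching the cluster.
    """
    rendered = [shlex.join(cmd) for cmd in commands]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which stops the workflow at the first failing step.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    for line in run_workflow(dry_run=True):
        print(line)
```

With some drivers, minikube service stays in the foreground while the tunnel is open, so a real run may block on the last command until it is interrupted.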

Key Components

Kubernetes (Local Cluster via Minikube)
Minikube (Single-node Kubernetes runtime)
Docker (Container image runtime)
Kubernetes Deployment (Pod lifecycle management)
Kubernetes Service - LoadBalancer (Application exposure)
Minikube Service Tunneling (Local external access)
Kubectl CLI (Cluster and resource management)
Containerized ML Inference Application (REST API)

Scalability & Reliability: Although implemented as a single-replica deployment, the solution uses a Kubernetes Deployment, which allows horizontal scaling by increasing the replica count. Kubernetes provides self-healing by automatically restarting failed containers and recreating failed Pods. Resource limits (CPU and memory) cap what the container can consume on the node.
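To make the scaling point concrete, here is a minimal Python sketch that adds a replica count to a copy of the Deployment manifest. The helper name with_replicas is hypothetical, and the dictionary is a hand-written mirror of the manifest on this page, not output from the cluster.

```python
import copy
import json

# Hand-written mirror of the Deployment manifest shown on this page.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "recruitment-rank-app"},
    "spec": {
        "selector": {"matchLabels": {"app": "recruitment-rank-app"}},
        "template": {
            "metadata": {"labels": {"app": "recruitment-rank-app"}},
            "spec": {
                "containers": [
                    {
                        "name": "placement-app",
                        "image": "rajesharigala/no-ui-placement-ml-model:v1-arm64",
                        "resources": {"limits": {"memory": "128Mi", "cpu": "500m"}},
                        "ports": [{"containerPort": 9696}],
                    }
                ]
            },
        },
    },
}


def with_replicas(manifest, replicas):
    """Return a copy of the manifest with spec.replicas set (hypothetical helper)."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    scaled = copy.deepcopy(manifest)  # leave the original manifest untouched
    scaled["spec"]["replicas"] = replicas
    return scaled


scaled = with_replicas(deployment, 3)
print(json.dumps(scaled, indent=2))
```

kubectl accepts JSON manifests as well as YAML, so the printed output could be saved to a file and applied. The imperative equivalent is kubectl scale deployment recruitment-rank-app --replicas=3.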

AI/DevOps Details

MLOps Implementation & Automation

AI/ML Focus

DevOps & MLOps Infrastructure for Machine Learning Inference Deployment

Primary focus on Kubernetes-based deployment, networking, and operationalization
Emphasis on ML inference service rather than model training
Production-style model serving architecture

Automation & Orchestration

Pre-trained ML model containerized as REST-based API
Kubernetes-native operational workflows
Declarative Deployment and Service manifests
Automated Pod creation, lifecycle management, self-healing
Service discovery and load balancing

Tools & Technologies

Docker (Containerization of ML inference application)
Kubernetes (Container orchestration and workload management)
Minikube (Local Kubernetes cluster for development)
Kubectl (CLI for deploying and managing resources)
Docker Hub (Container image registry for image retrieval)

Monitoring & Optimization

Basic monitoring using kubectl commands (get pods, describe pod, logs)
Resource optimization via CPU and memory limits in Deployment
Foundation for production monitoring (Prometheus, Grafana)
Runtime visibility through container logs
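The kubectl-based checks above can also be scripted. The sketch below assumes the input is the JSON text produced by kubectl get pods -o json and reduces it to name, phase, and restart count per Pod. The function name summarize_pods and the sample Pod name are hypothetical.

```python
import json


def summarize_pods(pod_list_json):
    """Return one (name, phase, restarts) tuple per Pod in the JSON text."""
    pods = json.loads(pod_list_json).get("items", [])
    summary = []
    for pod in pods:
        status = pod.get("status", {})
        # Sum restarts over all containers in the Pod (one container here).
        restarts = sum(
            container.get("restartCount", 0)
            for container in status.get("containerStatuses", [])
        )
        summary.append(
            (pod["metadata"]["name"], status.get("phase", "Unknown"), restarts)
        )
    return summary


# Hypothetical sample shaped like `kubectl get pods -o json` output.
SAMPLE = json.dumps(
    {
        "items": [
            {
                "metadata": {"name": "recruitment-rank-app-example"},
                "status": {
                    "phase": "Running",
                    "containerStatuses": [
                        {"name": "placement-app", "restartCount": 2}
                    ],
                },
            }
        ]
    }
)

print(summarize_pods(SAMPLE))
```

A rising restart count is usually the cue to follow up with kubectl describe pod and kubectl logs for the affected Pod.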

Skills & Technologies Used

Technical Proficiency Demonstrated

Primary Skills

Kubernetes Deployment & Services — Intermediate
Container Orchestration (Pods, ReplicaSets, Networking) — Intermediate
Docker Containerization — Intermediate
Minikube (Local Kubernetes Operations) — Intermediate
Kubectl CLI & Resource Management — Intermediate
Infrastructure as Code (Kubernetes YAML) — Intermediate

Secondary Tools / Frameworks

Python (REST-based ML inference API)
Flask (Lightweight web framework for model serving)
Docker Hub (Container image registry)
Postman (API testing and validation)
Linux / macOS Terminal (CLI-based operations)

Programming Languages

YAML (Infrastructure as Code configuration)
Python (ML inference API development)
CLI commands (Minikube, Kubectl, GitHub)

Cloud & DevOps Tools

Kubernetes · Minikube · Docker · Kubectl · Docker Hub

Kubernetes YAML Manifests

Declarative Infrastructure as Code

Deployment Manifest

    # Kubernetes Deployment Manifest
    # API version for Deployments
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: recruitment-rank-app
    spec:
      selector:
        matchLabels:
          app: recruitment-rank-app
      template:
        metadata:
          labels:
            app: recruitment-rank-app
        spec:
          containers:
          - name: placement-app
            image: rajesharigala/no-ui-placement-ml-model:v1-arm64
            resources:
              limits:
                memory: "128Mi"
                cpu: "500m"
            ports:
            - containerPort: 9696

Key Notes: This Deployment creates and manages Pods automatically. No replicas specified → defaults to 1. Labels connect Deployment ↔ Pods ↔ Service. Resource limits prevent node exhaustion. Port 9696 matches the Service targetPort.
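The limit values in the manifest use Kubernetes quantity notation: "500m" means 500 millicores (half a CPU core) and "128Mi" means 128 mebibytes. The sketch below converts the two forms used here. The function names are hypothetical, and it handles only the millicore suffix and the Ki/Mi/Gi suffixes, which is a subset of what Kubernetes accepts.

```python
# Binary memory suffixes handled by this sketch (Kubernetes accepts more).
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_cores(quantity):
    """Convert a CPU quantity such as "500m" or "2" to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


def memory_to_bytes(quantity):
    """Convert a memory quantity such as "128Mi" to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count


print(cpu_to_cores("500m"), memory_to_bytes("128Mi"))  # prints: 0.5 134217728
```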

Service Manifest

    # Kubernetes Service Manifest
    apiVersion: v1
    kind: Service
    metadata:
      name: recruitment-rank-app
    spec:
      type: LoadBalancer
      selector:
        app: recruitment-rank-app
      ports:
      - protocol: "TCP"
        port: 80
        targetPort: 9696

Key Notes: The Service provides a stable IP and DNS name, decoupling access from ephemeral Pod IPs. The selector links Service → Pods using labels. port (80) is the port the Service exposes; targetPort (9696) is the container's listening port. The LoadBalancer type works locally via Minikube tunneling.
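Because the Service only reaches Pods whose labels match its selector, and traffic only arrives if targetPort points at the port the container listens on, a quick consistency check can catch wiring mistakes before anything is applied. The following is a minimal sketch, with check_wiring as a hypothetical helper. It is stricter than Kubernetes itself, since declaring containerPort is largely informational, and it does not handle named ports.

```python
def check_wiring(deployment, service):
    """Return a list of human-readable wiring problems (empty if none found)."""
    problems = []
    template = deployment["spec"]["template"]
    pod_labels = template["metadata"].get("labels", {})

    # The Deployment's selector must match the labels on its own Pod template.
    for key, value in deployment["spec"]["selector"].get("matchLabels", {}).items():
        if pod_labels.get(key) != value:
            problems.append(f"Deployment selector {key}={value} not on Pod template")

    # The Service only routes to Pods carrying every label in its selector.
    for key, value in service["spec"].get("selector", {}).items():
        if pod_labels.get(key) != value:
            problems.append(f"Service selector {key}={value} not on Pod template")

    # Collect every containerPort declared in the Pod template.
    declared = {
        port["containerPort"]
        for container in template["spec"]["containers"]
        for port in container.get("ports", [])
    }
    for port in service["spec"].get("ports", []):
        target = port.get("targetPort", port["port"])  # targetPort defaults to port
        if target not in declared:
            problems.append(f"targetPort {target} is not a declared containerPort")
    return problems


# The two manifests from this page, reduced to the fields the check reads.
deployment = {
    "spec": {
        "selector": {"matchLabels": {"app": "recruitment-rank-app"}},
        "template": {
            "metadata": {"labels": {"app": "recruitment-rank-app"}},
            "spec": {
                "containers": [
                    {"name": "placement-app", "ports": [{"containerPort": 9696}]}
                ]
            },
        },
    }
}
service = {
    "spec": {
        "selector": {"app": "recruitment-rank-app"},
        "ports": [{"protocol": "TCP", "port": 80, "targetPort": 9696}],
    }
}

print(check_wiring(deployment, service))  # prints: []
```

An empty list means the labels and ports line up. Changing targetPort to a value other than 9696 would be reported as a problem.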

Challenges & Outcomes

Technical challenges and resolutions

Technical Challenges

Exposing Kubernetes-hosted application locally without cloud-managed load balancer
Configuring Kubernetes Services, selectors, and port mappings correctly
Managing ephemeral nature of Pods for stable access
Diagnosing container startup and networking issues

Resolutions

Used Minikube service tunneling for local external access
Carefully aligned Service selectors with Pod labels and container ports
Deployed using Kubernetes Deployment for automatic Pod recreation
Debugged with kubectl logs, describe, and resource status checks

Why Does LoadBalancer Show <pending> in Minikube?

Kubernetes assumes: "A cloud provider exists that can create an external load balancer."

AWS → Elastic Load Balancing (ELB)
Azure → Azure Load Balancer
GCP → Cloud Load Balancing

In Minikube, LoadBalancer services remain in <pending> state because there is no cloud provider to provision an external load balancer. Minikube solves this using service tunneling, which locally exposes the service while preserving production-like Kubernetes networking behavior.
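The <pending> state corresponds to an empty status.loadBalancer field on the Service object. The sketch below assumes the input is the JSON text from kubectl get service recruitment-rank-app -o json and reports whether an external address has been assigned. The function name external_address and both sample documents are hypothetical.

```python
import json


def external_address(service_json):
    """Return the Service's external IP or hostname, or "<pending>" if unset."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        # Providers report either an IP or a hostname for each ingress point.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return "<pending>"


# Hypothetical samples: no load balancer provisioned vs. an address assigned.
PENDING = json.dumps({"status": {"loadBalancer": {}}})
ASSIGNED = json.dumps({"status": {"loadBalancer": {"ingress": [{"ip": "127.0.0.1"}]}}})

print(external_address(PENDING))
print(external_address(ASSIGNED))
```

On Minikube, running minikube tunnel in a separate terminal is the command that assigns an external address to LoadBalancer Services, while minikube service gives direct access to a single Service.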

Kubernetes Commands Reference

Essential kubectl and Minikube commands

minikube start

Starts the Minikube local Kubernetes cluster

minikube status

Checks Minikube cluster status

kubectl get all

Lists all Kubernetes resources in current namespace

kubectl get namespaces

Lists all namespaces in the cluster

kubectl create -f deployment.yaml

Creates resources from deployment manifest

kubectl create -f service.yaml

Creates service from service manifest

kubectl get deployments

Lists all deployments in current namespace

kubectl get services

Lists all services in current namespace

minikube service <service-name>

Opens service in browser via Minikube tunnel

kubectl logs <pod-name>

Shows logs from a specific pod

kubectl describe pod <pod-name>

Shows detailed information about a pod

kubectl delete deployment <name>

Deletes a deployment and its pods

kubectl delete service <name>

Deletes a service

minikube stop

Stops the Minikube cluster

Model Inference Results

Prediction examples from the deployed ML service

Successful Prediction

Prediction	Probability
True	0.88671875

High probability indicates strong prediction confidence for positive outcome.

Negative Prediction

Prediction	Probability
False	0.33984375

Lower probability indicates prediction against the positive outcome.
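The page does not state how the service turns a probability into a True/False prediction. Both examples are consistent with a 0.5 decision threshold, so the sketch below uses that value purely as an assumption. The function name label_from_probability is hypothetical.

```python
def label_from_probability(probability, threshold=0.5):
    """Map a probability to a boolean label.

    The 0.5 default is an assumption that happens to match both examples
    on this page; the deployed model may use a different rule.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability >= threshold


print(label_from_probability(0.88671875), label_from_probability(0.33984375))
# prints: True False
```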

Accessing the Application

Access the deployed ML inference service through the Minikube tunnel:

    # Start Minikube tunnel for service access
    minikube service recruitment-rank-app

This opens the service in your default browser at a local URL such as 127.0.0.1:59256. The port is assigned at run time and changes between runs. The prediction endpoint is served under /predict.

Postman Inference Method: The ML inference API can also be tested using Postman with JSON payloads sent to the service endpoint for prediction requests.
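The same request can be sent from Python instead of Postman. This is a minimal sketch using only the standard library. It assumes the /predict endpoint accepts a JSON body over POST, which the page implies but does not spell out. The payload field feature_a is a placeholder, since the model's real input schema is not shown here, and the port in the example URL changes on every tunnel run.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build a POST request for the /predict endpoint (assumed to take JSON)."""
    url = base_url.rstrip("/") + "/predict"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, payload, timeout=10):
    """Send the request and decode the JSON response from the service."""
    request = build_predict_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder payload: replace with the fields the model actually expects.
example = build_predict_request("http://127.0.0.1:59256", {"feature_a": 1.0})
print(example.get_method(), example.full_url)
```

Building the request separately from sending it keeps the URL and body easy to inspect. Calling predict(...) with the URL printed by minikube service performs the actual call.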

Architecture & YAML Mapping

Architecture to Kubernetes YAML construct mapping

Architecture Block	Kubernetes YAML Construct
Client (Browser / Postman)	External consumer (outside cluster)
API Entry Point	Service
Service Type	spec.type: LoadBalancer
Service Port	spec.ports[].port: 80
Target Container Port	spec.ports[].targetPort: 9696
Traffic Routing	spec.selector
Stable Virtual IP	Service abstraction
Workload Controller	Deployment
Pod Lifecycle Management	Deployment
Pod Template	spec.template
Pod Labels	spec.template.metadata.labels
Selector Matching	spec.selector.matchLabels
Container Definition	spec.template.spec.containers
Container Image	spec.template.spec.containers[].image
Resource Limits	spec.template.spec.containers[].resources.limits
Application Port	spec.template.spec.containers[].ports[].containerPort
Self-Healing	Deployment (ReplicaSet)
Local LoadBalancer Access	minikube service <service-name>

Assets & References

Code, diagrams, study material

GitHub Repository

Kubernetes configuration files, manifests, and deployment scripts for the ML inference service.
