Kubernetes Single-Container ML Inference

Minikube Deployment with Service Tunneling

This project deploys a single-container machine-learning inference service on Kubernetes using Minikube, with service tunneling for local access. It demonstrates production-like Kubernetes networking, service abstraction, and operational behavior for ML model serving.

Docker Image: rajesharigala/no-ui-placement-ml-model:v1-arm64
GitHub: Kube-config-minikube-local

Project Summary

Comprehensive Project Overview

Project Category

Kubernetes · MLOps · DevOps (Primary: Kubernetes Infrastructure & Deployment)

Industry/Domain

Information Technology (AI/ML Infrastructure & Platform Engineering)

Domain Focus

Machine Learning Inference Deployment on Kubernetes

Key Technologies & Concepts

Core Technologies Used

Kubernetes ML Inference Keywords

  • Kubernetes (Deployment, Service, Pod)
  • Minikube (Local Kubernetes, Tunneling)
  • Kubectl CLI (Cluster Interaction)
  • Containerized ML Inference
  • Docker Image (Docker Hub, Image Pull)
  • Kubernetes YAML (Deployment.yaml, Service.yaml)
  • Service Types (LoadBalancer, ClusterIP)
  • Minikube Service Tunnel (Local Access)
  • Namespace Management (Default, Custom)
  • Resource Limits (CPU, Memory)
  • ReplicaSet & Pod Lifecycle
  • API-based Model Access (REST Inference)
  • Local Development & Testing Environment
  • Kubernetes Networking (Service → Pod Routing)
  • Infrastructure as Code (Declarative Manifests)

Problem & Objective

What problem did this project solve?

Problems Solved

  • Running an ML inference service reliably in a local Kubernetes development environment
  • Cloud-managed load balancers are unavailable in local environments
  • Pods are ephemeral, so direct container port access is not stable
  • Need for production-like Kubernetes networking during development

Primary Objectives

  • Deploy the containerized ML inference application as a single-container Pod
  • Expose the application using a Kubernetes Service of type LoadBalancer
  • Access the application externally through Minikube service tunneling
  • Ensure stable, production-like network access in a local environment

Solution & Architecture

Architectural Overview

Solution Overview

The solution deploys a containerized ML inference application as a single-container Pod managed by a Kubernetes Deployment on a local Minikube cluster. The application is exposed using a Kubernetes Service (LoadBalancer type) and accessed externally through Minikube service tunneling.

The entire setup is defined using declarative Kubernetes YAML manifests, ensuring reproducibility, scalability readiness, and alignment with real-world Kubernetes infrastructure patterns.

This approach provides stable, production-like network access while running locally, with the ability to seamlessly transition to cloud environments.

Deployment workflow:

  1. Minikube Start
  2. Apply Deployment
  3. Create Service
  4. Minikube Tunnel
  5. Access API
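The five workflow steps above map to the following command sequence; the manifest file names deployment.yaml and service.yaml are assumed to match the repository:

```shell
# 1. Start the local single-node cluster
minikube start

# 2. Apply the Deployment manifest (creates the Pod via a ReplicaSet)
kubectl create -f deployment.yaml

# 3. Create the LoadBalancer Service
kubectl create -f service.yaml

# 4. Open a tunnel so the LoadBalancer Service gets a local endpoint
minikube service recruitment-rank-app

# 5. The command above prints/opens a URL such as http://127.0.0.1:<port>;
#    send inference requests to its /predict path
```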

Key Components

  • Kubernetes (Local Cluster via Minikube)
  • Minikube (Single-node Kubernetes runtime)
  • Docker (Container image runtime)
  • Kubernetes Deployment (Pod lifecycle management)
  • Kubernetes Service - LoadBalancer (Application exposure)
  • Minikube Service Tunneling (Local external access)
  • Kubectl CLI (Cluster and resource management)
  • Containerized ML Inference Application (REST API)

Scalability & Reliability: Although implemented as a single-replica deployment, the solution uses a Kubernetes Deployment, enabling horizontal scaling by increasing the replica count. Kubernetes ensures self-healing by automatically restarting failed Pods, and resource limits (CPU and memory) prevent container overuse.
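Scaling out would be a one-line change to the Deployment spec; a sketch (the replica count of 3 is illustrative, not from the project):

```yaml
# Horizontal scaling: add an explicit replica count to the Deployment spec.
# The project's manifest omits this field, so Kubernetes defaults to 1.
spec:
  replicas: 3        # illustrative value; the ReplicaSet reconciles Pods to match
  selector:
    matchLabels:
      app: recruitment-rank-app
```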

AI/DevOps Details

MLOps Implementation & Automation

AI/ML Focus

DevOps & MLOps Infrastructure for Machine Learning Inference Deployment

  • Primary focus on Kubernetes-based deployment, networking, and operationalization
  • Emphasis on ML inference service rather than model training
  • Production-style model serving architecture

Automation & Orchestration

  • Pre-trained ML model containerized as REST-based API
  • Kubernetes-native operational workflows
  • Declarative Deployment and Service manifests
  • Automated Pod creation, lifecycle management, self-healing
  • Service discovery and load balancing
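To make the "pre-trained ML model containerized as a REST-based API" concrete, here is a minimal sketch of such an endpoint using only the Python standard library (the actual project serves its model with Flask). The feature names and the fixed-weight logistic "model" are hypothetical placeholders, not the project's model:

```python
# Minimal REST inference endpoint sketch (stdlib only; the project uses Flask).
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: dict) -> dict:
    # Placeholder scoring function standing in for the real trained model.
    score = 0.5 * features.get("experience", 0.0) + 0.8 * features.get("gpa", 0.0) - 3.0
    probability = 1.0 / (1.0 + math.exp(-score))
    return {"prediction": probability >= 0.5, "probability": probability}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


# To serve on the containerPort declared in the Deployment manifest:
#   HTTPServer(("0.0.0.0", 9696), InferenceHandler).serve_forever()
```

Whatever framework is used, the only contract Kubernetes cares about is that the container listens on the port named in the manifests (9696 here).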

Tools & Technologies

  • Docker (Containerization of ML inference application)
  • Kubernetes (Container orchestration and workload management)
  • Minikube (Local Kubernetes cluster for development)
  • Kubectl (CLI for deploying and managing resources)
  • Docker Hub (Container image registry for image retrieval)
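For context on how the image on Docker Hub fits in, an image like this is typically built from a Dockerfile along the following lines. This is a hypothetical sketch: base image, file names, and entrypoint are assumptions; only port 9696 comes from the project's manifests:

```dockerfile
# Hypothetical Dockerfile for a Flask inference image (not from the repo)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 9696
CMD ["python", "predict.py"]
```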

Monitoring & Optimization

  • Basic monitoring using kubectl commands (get pods, describe pod, logs)
  • Resource optimization via CPU and memory limits in Deployment
  • Foundation for production monitoring (Prometheus, Grafana)
  • Runtime visibility through container logs

Skills & Technologies Used

Technical Proficiency Demonstrated

Primary Skills

  • Kubernetes Deployment & Services — Intermediate
  • Container Orchestration (Pods, ReplicaSets, Networking) — Intermediate
  • Docker Containerization — Intermediate
  • Minikube (Local Kubernetes Operations) — Intermediate
  • Kubectl CLI & Resource Management — Intermediate
  • Infrastructure as Code (Kubernetes YAML) — Intermediate

Secondary Tools / Frameworks

  • Python (REST-based ML inference API)
  • Flask (Lightweight web framework for model serving)
  • Docker Hub (Container image registry)
  • Postman (API testing and validation)
  • Linux / macOS Terminal (CLI-based operations)

Programming Languages

  • YAML (Infrastructure as Code configuration)
  • Python (ML inference API development)
  • CLI commands (Minikube, Kubectl, Git)

Cloud & DevOps Tools

Kubernetes · Minikube · Docker · Kubectl · Docker Hub

Kubernetes YAML Manifests

Declarative Infrastructure as Code

Deployment Manifest

# Kubernetes Deployment Manifest
# API version for Deployments
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recruitment-rank-app
spec:
  selector:
    matchLabels:
      app: recruitment-rank-app
  template:
    metadata:
      labels:
        app: recruitment-rank-app
    spec:
      containers:
      - name: placement-app
        image: rajesharigala/no-ui-placement-ml-model:v1-arm64
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 9696

Key Notes: This Deployment creates and manages Pods automatically. No replicas specified → defaults to 1. Labels connect Deployment ↔ Pods ↔ Service. Resource limits prevent node exhaustion. Port 9696 matches the Service targetPort.

Service Manifest

# Kubernetes Service Manifest
apiVersion: v1
kind: Service
metadata:
  name: recruitment-rank-app
spec:
  type: LoadBalancer
  selector:
    app: recruitment-rank-app
  ports:
  - protocol: "TCP"
    port: 80
    targetPort: 9696

Key Notes: Service provides stable IP and DNS name. Decouples access from ephemeral Pod IPs. Selector links Service → Pods using labels. port (80) is external-facing. targetPort (9696) is container's listening port. LoadBalancer works locally via Minikube tunneling.

Challenges & Outcomes

Technical challenges and resolutions

Technical Challenges

  • Exposing Kubernetes-hosted application locally without cloud-managed load balancer
  • Configuring Kubernetes Services, selectors, and port mappings correctly
  • Managing ephemeral nature of Pods for stable access
  • Diagnosing container startup and networking issues

Resolutions

  • Used Minikube service tunneling for local external access
  • Carefully aligned Service selectors with Pod labels and container ports
  • Deployed using Kubernetes Deployment for automatic Pod recreation
  • Debugged with kubectl logs, describe, and resource status checks

Why Does LoadBalancer Show <pending> in Minikube?

Kubernetes assumes: "A cloud provider exists that can create an external load balancer."

  • AWS → ELB/ALB
  • Azure → Azure Load Balancer
  • GCP → Cloud Load Balancer

In Minikube, LoadBalancer services remain in <pending> state because there is no cloud provider to provision an external load balancer. Minikube solves this using service tunneling, which locally exposes the service while preserving production-like Kubernetes networking behavior.
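This is visible directly in kubectl: before the tunnel is running, `kubectl get services` reports output roughly like the following (the column layout is standard kubectl output; the IP, NodePort, and age values are illustrative):

```shell
$ kubectl get services
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
recruitment-rank-app   LoadBalancer   10.105.42.17   <pending>     80:31234/TCP   2m
```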

Kubernetes Commands Reference

Essential kubectl and Minikube commands

  • minikube start: Starts the Minikube local Kubernetes cluster
  • minikube status: Checks Minikube cluster status
  • kubectl get all: Lists all Kubernetes resources in the current namespace
  • kubectl get namespaces: Lists all namespaces in the cluster
  • kubectl create -f deployment.yaml: Creates resources from the Deployment manifest
  • kubectl create -f service.yaml: Creates the Service from the Service manifest
  • kubectl get deployments: Lists all Deployments in the current namespace
  • kubectl get services: Lists all Services in the current namespace
  • minikube service <service-name>: Opens the Service in a browser via a Minikube tunnel
  • kubectl logs <pod-name>: Shows logs from a specific Pod
  • kubectl describe pod <pod-name>: Shows detailed information about a Pod
  • kubectl delete deployment <name>: Deletes a Deployment and its Pods
  • kubectl delete service <name>: Deletes a Service
  • minikube stop: Stops the Minikube cluster

Model Inference Results

Prediction examples from the deployed ML service

Successful Prediction

  Prediction: True
  Probability: 0.88671875

The high probability indicates strong prediction confidence for the positive outcome.

Negative Prediction

  Prediction: False
  Probability: 0.33984375

The lower probability indicates a prediction against the positive outcome.
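The True/False labels in these responses come from thresholding the model's output probability. A sketch, assuming the conventional 0.5 cutoff (the deployed model's exact threshold is not stated in the project):

```python
# Map a model's output probability to a binary prediction.
# The 0.5 threshold is the conventional default, assumed here.
def to_prediction(probability: float, threshold: float = 0.5) -> bool:
    return probability >= threshold


print(to_prediction(0.88671875))  # True  (positive outcome)
print(to_prediction(0.33984375))  # False (negative outcome)
```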

Accessing the Application

Access the deployed ML inference service through Minikube tunnel:

# Start Minikube tunnel for service access
minikube service recruitment-rank-app

This opens the service in your default browser at a local URL such as http://127.0.0.1:59256/predict

Postman Inference Method: The ML inference API can also be tested using Postman with JSON payloads sent to the service endpoint for prediction requests.
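The same request can also be sent from the terminal with curl; the port below is the one shown earlier, and the JSON field names are hypothetical placeholders for the model's real input schema:

```shell
# Replace 59256 with the port printed by `minikube service`; the payload
# fields are placeholders for the model's actual feature names.
curl -X POST http://127.0.0.1:59256/predict \
  -H "Content-Type: application/json" \
  -d '{"feature_1": 1.0, "feature_2": 0.5}'
```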

Architecture & YAML Mapping

Architecture to Kubernetes YAML construct mapping

Architecture Block → Kubernetes YAML Construct

  • Client (Browser / Postman) → External consumer (outside cluster)
  • API Entry Point → Service
  • Service Type → spec.type: LoadBalancer
  • Service Port → spec.ports.port: 80
  • Target Container Port → spec.ports.targetPort: 9696
  • Traffic Routing → spec.selector
  • Stable Virtual IP → Service abstraction
  • Workload Controller → Deployment
  • Pod Lifecycle Management → Deployment
  • Pod Template → spec.template
  • Pod Labels → spec.template.metadata.labels
  • Selector Matching → spec.selector.matchLabels
  • Container Definition → spec.template.spec.containers
  • Container Image → containers.image
  • Resource Limits → containers.resources.limits
  • Application Port → containers.ports.containerPort
  • Self-Healing → Deployment (ReplicaSet)
  • Local LoadBalancer Access → minikube service <service-name>

Assets & References

Code, diagrams, study material

GitHub Repository

Kubernetes configuration files, manifests, and deployment scripts for the ML inference service.


Study Material Resources


  • Kubernetes Deployment Architecture: Complete architecture diagram and setup guide for Kubernetes deployments and services
  • Kubernetes YAML Configuration Guide: Official documentation and best practices for Kubernetes YAML manifests
  • Minikube & Local Kubernetes Setup: Complete guide to setting up and using Minikube for local Kubernetes development
  • Kubectl CLI Reference Guide: Comprehensive reference for kubectl commands and their usage patterns
  • MLOps on Kubernetes Guide: Complete guide to deploying ML models on Kubernetes for production serving
  • Kubernetes Networking & Services: Deep dive into Kubernetes networking, services, and ingress controllers
  • Production Kubernetes Patterns: Enterprise architecture patterns for scalable Kubernetes deployments
  • Infrastructure as Code Best Practices: Complete framework for implementing Infrastructure as Code with Kubernetes