Dockerized ML Inference Service

Containerized Machine Learning Model Serving

Built and containerized a backend machine learning inference service using Flask and Gunicorn, with model invocation validated via terminal and Postman. This project demonstrates production-ready ML model serving through Docker containerization.

Project Summary

Comprehensive Project Overview

Project Category

Model Serving & Inference (Containerized Execution Layer)

Industry/Domain

Platform Engineering / MLOps Infrastructure

MLOps Focus

MLOps - Model Serving & Inference Systems

Key Technologies & Concepts

Core Technologies Used

Docker ML Inference Keywords

Docker Engine / Docker Desktop, Dockerfile (Image Blueprint), Python Slim Base Image, Image Build (docker build), Image Tagging & Versioning (v1), Docker Images & Layers, Container Runtime (docker run), Port Mapping & Networking, Environment Variables (ENV), ENTRYPOINT vs CMD, Gunicorn (Production WSGI Server), Flask API (Model Inference Endpoint), Pickle Model Serialization, DictVectorizer (Feature Transformation), Stateless Model Serving, REST API Invocation, Terminal-based Invocation, Postman API Testing, Docker Registry (Docker Hub)

Problem & Objective

What problem did this project solve?

Problems Solved

  • Machine learning model inference was not production-ready due to dependency coupling
  • Inconsistent runtime environments across development and deployment
  • Lack of standardized API for invoking predictions

Primary Objectives

  • Package a trained machine learning model and its inference logic into a reproducible Docker container
  • Expose a stable API for consistent prediction serving across environments
  • Enable model invocation through multiple methods (terminal scripts, Postman)

Solution & Architecture

Architectural Overview

Solution Overview

The solution involves packaging a pre-trained ML model with a Flask-based REST API inside a Docker container. This creates a portable, consistent inference service that can be deployed anywhere Docker runs.

Docker containerization ensures reproducibility and eliminates "works on my machine" issues by encapsulating all dependencies, runtime, and the model itself in a single deployable unit.

The container exposes a REST API endpoint for prediction requests and can be invoked via multiple methods including terminal scripts and Postman. The image is stored in Docker Hub for distribution.

Dockerized ML Inference Architecture Diagram: Source Code + Model → Dockerfile → Docker Image → Container Runtime → Inference API

Key Components

  • Python-based Machine Learning Model (pre-trained classification model)
  • Flask API with inference endpoint (/predict)
  • Gunicorn as production WSGI server
  • Dockerfile for image building
  • Docker Image & Container for execution
  • Docker Engine / Docker Desktop for container runtime
  • Docker Hub for container registry and distribution

Scalability & Reliability: Stateless model-serving container enables horizontal scaling when deployed under an orchestrator. Gunicorn provides reliable, multi-worker request handling for concurrent inference requests.
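
To illustrate the multi-worker setup, here is a hypothetical gunicorn.conf.py (Gunicorn configuration files are plain Python); the worker count and timeout are assumptions for this sketch, not values taken from the project:

# gunicorn.conf.py (hypothetical): Gunicorn reads this file as Python
bind = "0.0.0.0:9696"  # same bind address the ENTRYPOINT passes on the command line
workers = 4            # assumed worker count; a common rule of thumb is 2 * CPU cores + 1
timeout = 30           # seconds before an unresponsive worker is recycled

With such a file, the container's entrypoint could shrink to gunicorn -c gunicorn.conf.py predict:app.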

Skills & Technologies Used

Technical Proficiency Demonstrated

Primary Skills

  • Docker & Containerization - Intermediate
  • Machine Learning Model Serving - Intermediate
  • Flask-based REST API Development - Intermediate
  • Python Runtime & Dependency Management - Intermediate

Secondary Tools / Frameworks

  • Gunicorn (Production WSGI Server)
  • Postman (API Testing & Validation)
  • Pickle (Model Serialization)
  • DictVectorizer (Feature Encoding; see the sketch after this list)
  • Linux Command Line
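
Because DictVectorizer handles the feature encoding, a minimal sketch of its behavior may help; the feature names come from the invocation example later on this page, while the values and the second record are invented for illustration:

from sklearn.feature_extraction import DictVectorizer

# Fit on dicts that mix categorical and numeric features (illustrative records)
records = [
  {"gender": "M", "ssc_p": 41.0, "ssc_b": "Central"},
  {"gender": "F", "ssc_p": 67.0, "ssc_b": "Others"},
]
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(records)  # string values are one-hot encoded, numbers pass through
print(dv.get_feature_names_out())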

Programming Languages

  • Python (Flask API, model inference logic)
  • Dockerfile configuration syntax

Docker & DevOps Tools

Docker Engine, Docker CLI, Docker Hub (Container Registry), Containerized Execution

AI/DevOps Details

MLOps Implementation & Automation

AI/ML Focus

AI/ML-focused project - containerized machine learning model inference service, emphasizing production-style model serving rather than model training.

  • Deployed a pre-trained machine learning classification model serialized using Pickle (serialization sketch after this list)
  • Implemented automated inference workflow via Flask-based REST API
  • Focus on on-demand prediction requests and model serving
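
Since the inference code unpickles a (dv, model) pair, the training side presumably serialized both objects in one artifact. Here is a minimal, self-contained sketch of that step; the model type and training records are stand-ins, and only the file name and tuple layout come from the project:

import pickle

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Stand-in training step (the real model was trained elsewhere)
records = [{"gender": "M", "ssc_p": 41.0, "ssc_b": "Central"},
           {"gender": "F", "ssc_p": 67.0, "ssc_b": "Others"}]
dv = DictVectorizer(sparse=False)
model = LogisticRegression().fit(dv.fit_transform(records), [0, 1])

# Persist the fitted vectorizer and model together, matching the
# (dv, model) tuple the Flask endpoint unpickles at startup
with open("project_one_model.pkl", "wb") as f_out:
  pickle.dump((dv, model), f_out)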

CI/CD & Containerization

  • Docker Engine / Docker Desktop for container runtime
  • Docker CLI for image building and container management
  • Docker Hub as container registry for distribution
  • Containerized execution ready for CI/CD integration

Monitoring & Optimization

  • Application and inference logs accessed via Docker container logs for runtime visibility (logging sketch after this list)
  • Optimized container size and startup time using Python slim base image
  • Stateless architecture for potential horizontal scaling
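
Because docker logs only surfaces what the process writes to stdout and stderr, application logging is typically routed there; a minimal sketch (the logger name and message are illustrative):

import logging
import sys

# Log to stdout so `docker logs <container-id>` can surface the messages
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("predict")
logger.info("Model loaded; inference service ready")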

Dockerfile Execution Steps

Detailed containerization process

Dockerfile Implementation

Production-Grade Implementation: The Dockerfile follows best practices for ML model serving, including a minimal base image, explicit dependency management, and a production-ready server configuration.

1. Base Image: FROM python:3.8.12-slim (Python slim base image for a minimal container size)
2. Working Directory: WORKDIR /app (sets the working directory inside the container)
3. Copy Dependency Files: COPY ["Pipfile", "Pipfile.lock", "./"] (copies the dependency manifests)
4. Install pipenv: RUN pip install pipenv (the slim base image does not ship with pipenv)
5. Install Dependencies: RUN pipenv install --deploy --system (installs the pinned dependencies into the system Python)
6. Copy Code + Model: COPY ["*.py", "project_one_model.pkl", "./"] (copies the source code and the serialized ML model)
7. Expose Port: EXPOSE 9696 (documents the API port; publishing it still requires -p at run time)
8. Entrypoint Command: ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"] (production server command; unlike CMD, it is not replaced by arguments passed to docker run)
# Minimal Python base image keeps the container small
FROM python:3.8.12-slim
WORKDIR /app
# pipenv is not preinstalled in the slim image
RUN pip install pipenv
COPY ["Pipfile", "Pipfile.lock", "./"]
RUN pipenv install --deploy --system
COPY ["*.py", "project_one_model.pkl", "./"]
EXPOSE 9696
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]

Challenges & Outcomes

Technical challenges and resolutions

Technical Challenges

  • Packaging a machine learning model and its dependencies into a lightweight, production-ready Docker image
  • Ensuring the inference service was accessible from outside the container via correct network binding and port configuration
  • Managing model serialization and feature transformation consistency

Resolutions & Outcomes

  • Used Python slim base image with explicit dependency installation for lightweight container
  • Configured the application to bind to 0.0.0.0 and exposed the correct port in both the application and the Dockerfile (see the sketch after this list)
  • Successfully created reproducible, portable ML inference service
  • Achieved consistent model invocation across different environments
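
A minimal sketch of the binding fix on the application side, assuming the Flask module is named predict.py as the Gunicorn target predict:app suggests (Gunicorn performs the binding inside the container, so this block only matters when running the API directly for local debugging):

# Tail of predict.py: bind to all interfaces so the service is reachable
# through Docker's port mapping when run without Gunicorn
if __name__ == "__main__":
  app.run(debug=True, host="0.0.0.0", port=9696)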

Docker Commands Reference

Essential Docker CLI commands used in the project

docker version
Displays Docker client, server, and engine version details
docker images
Lists all Docker images available locally
docker build -t <image:tag> .
Builds a Docker image from the Dockerfile in the current directory (the trailing dot is the required build context)
docker pull <image:tag>
Pulls an image from a container registry
docker run -it -p 9696:9696 <image>
Creates and runs a container interactively with port mapping
docker run -d -p 9696:9696 <image>
Runs a container in detached mode (background)
docker run -d --name ml-app -p 9696:9696 <image>
Runs container with custom name and port mapping
docker ps
Lists currently running containers
docker ps -a
Lists all containers (running and stopped)
docker stop <container-id>
Stops a running container
docker logs <container-id>
Fetches logs from a container
docker exec -it <container-id> /bin/bash
Opens interactive shell inside running container
docker login
Authenticates with Docker Hub or registry
docker push <repo/image:tag>
Pushes Docker image to container registry

Model Invocation Examples

How to access and use the inference service

Terminal Invocation

Python script to invoke the model via REST API:

import requests

candidate = [{
  "gender": "M",
  "ssc_p": 41.0,
  "ssc_b": "Central",
  # ... (other features)
}]

url = "http://localhost:9696/predict"
response = requests.post(url=url, json=candidate)

if response.status_code == 200:
  output = response.json()
  print(f"Candidate evaluation output: {output}")

Flask API Endpoint

Key Flask application code for the inference endpoint:

import pickle

from flask import Flask, request, jsonify

# Load the serialized DictVectorizer and model at startup
with open("project_one_model.pkl", "rb") as f_in:
  dv, model = pickle.load(f_in)

# Create Flask app
app = Flask('Predict')

# Define prediction endpoint
@app.route('/predict', methods=['POST'])
def predict():
  candidate = request.get_json()
  X = dv.transform(candidate)            # encode features with the fitted vectorizer
  y_pred = model.predict_proba(X)[0, 1]  # probability of the positive class
  placement = y_pred > 0.5               # threshold the probability
  return jsonify({
    'placement': bool(placement),
    'placement_probability': float(y_pred)
  })
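
One way to sanity-check this endpoint without building a container is Flask's built-in test client. The sketch below assumes the module is importable as predict (with the pickle file present) and reuses the sample features from the terminal example:

# Quick endpoint check using Flask's test client; no server or container needed
from predict import app

with app.test_client() as client:
  resp = client.post("/predict", json=[{
    "gender": "M",
    "ssc_p": 41.0,
    "ssc_b": "Central",
    # remaining features as in the terminal example
  }])
  print(resp.status_code, resp.get_json())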

Assets & References

Code, diagrams, study material

GitHub Repository

Source code repository containing the Dockerfile, Flask API, and ML model for the inference service.

Study Material Resources

  • Dockerized ML Inference Architecture: complete architecture diagram and setup guide for containerized ML model serving
  • Dockerfile Best Practices Guide: official documentation and best practices for Dockerfile configuration
  • ML Model Serving with Flask & Gunicorn: detailed guide to implementing production ML inference APIs
  • Advanced Docker Configurations: premium materials for multi-stage builds, security hardening, and optimization
  • MLOps Model Deployment Guide: complete guide to deploying ML models in production using containers
  • Docker Security & Best Practices: security guidelines and best practices for containerized applications
  • Production ML Inference Patterns: enterprise architecture patterns for scalable ML model serving
  • Docker CLI Reference Guide: complete reference for Docker commands and their usage