Dockerized ML Inference Service
Containerized Machine Learning Model Serving
Built and containerized a backend machine learning inference service using Flask and Gunicorn, with model invocation validated via terminal and Postman. This project demonstrates production-ready ML model serving through Docker containerization.
Project Summary
Comprehensive Project Overview
Project Category
Model Serving & Inference (Containerized Execution Layer)
Industry/Domain
Platform Engineering / MLOps Infrastructure
MLOps Focus
MLOps - Model Serving & Inference Systems
Key Technologies & Concepts
Core Technologies Used
Docker, Docker Hub, Flask, Gunicorn, Python, Pipenv, Pickle, DictVectorizer, Postman
Problem & Objective
What problem did this project solve?
Problems Solved
- Machine learning model inference was not production-ready due to dependency coupling
- Inconsistent runtime environments across development and deployment
- Lack of standardized API for invoking predictions
Primary Objectives
- Package a trained machine learning model and its inference logic into a reproducible Docker container
- Expose a stable API for consistent prediction serving across environments
- Enable model invocation through multiple methods (terminal scripts, Postman)
Solution & Architecture
Architectural Overview
Solution Overview
The solution involves packaging a pre-trained ML model with a Flask-based REST API inside a Docker container. This creates a portable, consistent inference service that can be deployed anywhere Docker runs.
Docker containerization ensures reproducibility and eliminates "works on my machine" issues by encapsulating all dependencies, runtime, and the model itself in a single deployable unit.
The container exposes a REST API endpoint for prediction requests and can be invoked via multiple methods including terminal scripts and Postman. The image is stored in Docker Hub for distribution.
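For illustration, a single prediction round-trip against the running container might look like the following. The endpoint path, port, and response fields follow the Flask code and Dockerfile shown later in this document; the candidate fields and returned values are illustrative only.
curl -X POST http://localhost:9696/predict \
     -H "Content-Type: application/json" \
     -d '[{"gender": "M", "ssc_p": 41.0, "ssc_b": "Central"}]'

# Illustrative response:
# {"placement": false, "placement_probability": 0.31}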
Key Components
- Python-based Machine Learning Model (pre-trained classification model)
- Flask API with inference endpoint (/predict)
- Gunicorn as production WSGI server
- Dockerfile for image building
- Docker Image & Container for execution
- Docker Engine / Docker Desktop for container runtime
- Docker Hub for container registry and distribution
Scalability & Reliability: Stateless model-serving container enables horizontal scaling when deployed under an orchestrator. Gunicorn provides reliable, multi-worker request handling for concurrent inference requests.
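As a sketch of how request concurrency could be scaled within a single container, Gunicorn's worker flag can be added to the container entrypoint; the worker count below is an assumption, not a documented project setting.
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "--workers=2", "predict:app"]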
Skills & Technologies Used
Technical Proficiency Demonstrated
Primary Skills
- Docker & Containerization - Intermediate
- Machine Learning Model Serving - Intermediate
- Flask-based REST API Development - Intermediate
- Python Runtime & Dependency Management - Intermediate
Secondary Tools / Frameworks
- Gunicorn (Production WSGI Server)
- Postman (API Testing & Validation)
- Pickle (Model Serialization)
- DictVectorizer (Feature Encoding)
- Linux Command Line
Programming Languages
- Python (Flask API, model inference logic)
- Dockerfile configuration syntax
Docker & DevOps Tools
- Docker Engine / Docker Desktop (container runtime)
- Docker CLI (image building and container management)
- Docker Hub (container registry and distribution)
AI/DevOps Details
MLOps Implementation & Automation
AI/ML Focus
AI/ML-focused project - containerized machine learning model inference service, emphasizing production-style model serving rather than model training.
- Deployed a pre-trained machine learning classification model serialized using Pickle
- Implemented automated inference workflow via Flask-based REST API
- Focus on on-demand prediction requests and model serving
CI/CD & Containerization
- Docker Engine / Docker Desktop for container runtime
- Docker CLI for image building and container management
- Docker Hub as container registry for distribution
- Containerized execution ready for CI/CD integration
Monitoring & Optimization
- Application and inference logs accessed via Docker container logs for runtime visibility
- Optimized container size and startup time using Python slim base image
- Stateless architecture for potential horizontal scaling
Dockerfile Execution Steps
Detailed containerization process
Dockerfile Implementation
Production-Grade Implementation: The Dockerfile follows best practices for ML model serving, including a minimal base image, explicit dependency management, and a production-ready server configuration.
| Step | Dockerfile Instruction | Purpose |
|---|---|---|
| 1. Base Image | FROM python:3.8.12-slim | Python slim base image for minimal container size |
| 2. Working Directory | WORKDIR /app | Set working directory inside container |
| 3. Copy Dependencies | COPY ["Pipfile", "Pipfile.lock", "./"] | Copy dependency management files |
| 4. Install Dependencies | RUN pipenv install --deploy --system | Install Python dependencies |
| 5. Copy Code + Model | COPY ["*.py", "project_one_model.pkl", "./"] | Copy source code and serialized ML model |
| 6. Expose Port | EXPOSE 9696 | Expose API port for external access |
| 7. Entrypoint Command | ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"] | Production server start command (fixed entrypoint) |
# Python slim base image keeps the container small
FROM python:3.8.12-slim
WORKDIR /app
# Pipenv is not part of the slim image, so install it before resolving dependencies
RUN pip install pipenv
COPY ["Pipfile", "Pipfile.lock", "./"]
RUN pipenv install --deploy --system
# Copy the inference code and the serialized model, then expose the API port
COPY ["*.py", "project_one_model.pkl", "./"]
EXPOSE 9696
# Start the production WSGI server
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
Challenges & Outcomes
Technical challenges and resolutions
Technical Challenges
- Packaging a machine learning model and its dependencies into a lightweight, production-ready Docker image
- Ensuring the inference service was accessible from outside the container via correct network binding and port configuration
- Managing model serialization and feature transformation consistency
Resolutions & Outcomes
- Used Python slim base image with explicit dependency installation for lightweight container
- Configured the application to bind to 0.0.0.0 and exposed the correct port in both the application and the Dockerfile (see the sketch after this list)
- Successfully created reproducible, portable ML inference service
- Achieved consistent model invocation across different environments
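A minimal sketch of the network-binding fix referenced above; the development-server fallback and the host-side run command are assumptions about how the service was typically started, since in the container the Gunicorn --bind flag plays this role.
# predict.py — development fallback; the Dockerfile ENTRYPOINT uses Gunicorn's --bind flag instead
if __name__ == "__main__":
    # 127.0.0.1 would only be reachable inside the container;
    # 0.0.0.0 listens on all interfaces so the published port can reach the API
    app.run(host="0.0.0.0", port=9696)

# On the host, the container port still has to be published, e.g.:
#   docker run -p 9696:9696 <image-name>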
Docker Commands Reference
Essential Docker CLI commands used in the project
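The commands below reconstruct the typical workflow for this project; image, repository, and container names are placeholders rather than values taken from the project.
# Build the image from the Dockerfile in the current directory
docker build -t ml-inference-service .

# Run the container and publish the API port declared with EXPOSE
docker run -it --rm -p 9696:9696 ml-inference-service

# Tag and push the image to Docker Hub for distribution
docker tag ml-inference-service <dockerhub-user>/ml-inference-service:latest
docker push <dockerhub-user>/ml-inference-service:latest

# Inspect application and inference logs at runtime
docker logs <container-id>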
Model Invocation Examples
How to access and use the inference service
Terminal Invocation
Python script to invoke the model via REST API:
import requests

# Candidate features sent to the inference endpoint (only a subset is shown here)
candidate = [{
    "gender": "M",
    "ssc_p": 41.0,
    "ssc_b": "Central",
    # ... (other features)
}]

url = "http://localhost:9696/predict"
response = requests.post(url=url, json=candidate)

if response.status_code == 200:
    output = response.json()
    print(f"Candidate evaluation output: {output}")
Flask API Endpoint
Key Flask application code for the inference endpoint:
import pickle
from flask import Flask, request, jsonify

# Load the feature vectorizer and the trained model from the serialized artifact
with open("project_one_model.pkl", "rb") as f_in:
    dv, model = pickle.load(f_in)

# Create Flask app
app = Flask('Predict')

# Define prediction endpoint
@app.route('/predict', methods=['POST'])
def predict():
    candidate = request.get_json()
    X = dv.transform(candidate)
    y_pred = model.predict_proba(X)[0, 1]
    placement = y_pred > 0.5
    return jsonify({
        'placement': bool(placement),
        'placement_probability': float(y_pred)
    })
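The endpoint above unpickles a (DictVectorizer, model) pair from project_one_model.pkl. Below is a minimal sketch of how such an artifact could have been produced at training time; the model class, feature set, and training data are assumptions for illustration, since the project documents only that a pre-trained classifier is deployed.
import pickle
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training records standing in for the real candidate data (assumed)
train_dicts = [
    {"gender": "M", "ssc_p": 41.0, "ssc_b": "Central"},
    {"gender": "F", "ssc_p": 75.0, "ssc_b": "Others"},
]
y_train = [0, 1]

# Fit the feature encoder and the classifier
dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(train_dicts)
model = LogisticRegression()
model.fit(X_train, y_train)

# Persist both objects as a single artifact so the API can load them together
with open("project_one_model.pkl", "wb") as f_out:
    pickle.dump((dv, model), f_out)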
Assets & References
Code, diagrams, study material
GitHub Repository
Source code repository containing the Dockerfile, Flask API, and ML model for the inference service.