Dockerized ML Inference Service
Containerized Machine Learning Model Serving
Built and containerized a backend machine learning inference service using Flask and Gunicorn, with model invocation validated via terminal and Postman. This project demonstrates production-ready ML model serving through Docker containerization.
Project Summary
Comprehensive Project Overview
Project Category
Model Serving & Inference (Containerized Execution Layer)
Industry/Domain
Platform Engineering / MLOps Infrastructure
MLOps Focus
MLOps - Model Serving & Inference Systems
Key Technologies & Concepts
Core Technologies Used
Docker, Docker Hub, Flask, Gunicorn, Python, Pipenv, Pickle, DictVectorizer, Postman
Problem & Objective
What problem did this project solve?
Problems Solved
- Machine learning model inference was not production-ready due to dependency coupling
- Inconsistent runtime environments across development and deployment
- Lack of standardized API for invoking predictions
Primary Objectives
- Package a trained machine learning model and its inference logic into a reproducible Docker container
- Expose a stable API for consistent prediction serving across environments
- Enable model invocation through multiple methods (terminal scripts, Postman)
Solution & Architecture
Architectural Overview
Solution Overview
The solution involves packaging a pre-trained ML model with a Flask-based REST API inside a Docker container. This creates a portable, consistent inference service that can be deployed anywhere Docker runs.
Docker containerization ensures reproducibility and eliminates "works on my machine" issues by encapsulating all dependencies, runtime, and the model itself in a single deployable unit.
The container exposes a REST API endpoint for prediction requests and can be invoked via multiple methods including terminal scripts and Postman. The image is stored in Docker Hub for distribution.
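For illustration, a single prediction round-trip against the running container might look like the following. The endpoint path, port, and response fields follow the Flask code and Dockerfile shown later in this document; the candidate fields and returned values are illustrative only.
curl -X POST http://localhost:9696/predict \
     -H "Content-Type: application/json" \
     -d '[{"gender": "M", "ssc_p": 41.0, "ssc_b": "Central"}]'

# Illustrative response:
# {"placement": false, "placement_probability": 0.31}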
Key Components
- Python-based Machine Learning Model (pre-trained classification model)
- Flask API with inference endpoint (/predict)
- Gunicorn as production WSGI server
- Dockerfile for image building
- Docker Image & Container for execution
- Docker Engine / Docker Desktop for container runtime
- Docker Hub for container registry and distribution
Scalability & Reliability: Stateless model-serving container enables horizontal scaling when deployed under an orchestrator. Gunicorn provides reliable, multi-worker request handling for concurrent inference requests.
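As a sketch of how request concurrency could be scaled within a single container, Gunicorn's worker flag can be added to the container entrypoint; the worker count below is an assumption, not a documented project setting.
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "--workers=2", "predict:app"]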
Skills & Technologies Used
Technical Proficiency Demonstrated
Primary Skills
- Docker & Containerization - Intermediate
- Machine Learning Model Serving - Intermediate
- Flask-based REST API Development - Intermediate
- Python Runtime & Dependency Management - Intermediate
Secondary Tools / Frameworks
- Gunicorn (Production WSGI Server)
- Postman (API Testing & Validation)
- Pickle (Model Serialization)
- DictVectorizer (Feature Encoding)
- Linux Command Line
Programming Languages
- Python (Flask API, model inference logic)
- Dockerfile configuration syntax
Docker & DevOps Tools
- Docker Engine / Docker Desktop (container runtime)
- Docker CLI (image building and container management)
- Docker Hub (container registry and distribution)
AI/DevOps Details
MLOps Implementation & Automation
AI/ML Focus
AI/ML-focused project - containerized machine learning model inference service, emphasizing production-style model serving rather than model training.
- Deployed a pre-trained machine learning classification model serialized using Pickle
- Implemented automated inference workflow via Flask-based REST API
- Focus on on-demand prediction requests and model serving
CI/CD & Containerization
- Docker Engine / Docker Desktop for container runtime
- Docker CLI for image building and container management
- Docker Hub as container registry for distribution
- Containerized execution ready for CI/CD integration
Monitoring & Optimization
- Application and inference logs accessed via Docker container logs for runtime visibility
- Optimized container size and startup time using Python slim base image
- Stateless architecture for potential horizontal scaling
Dockerfile Execution Steps
Detailed containerization process
Dockerfile Implementation
Production-Grade Implementation: The Dockerfile follows best practices for ML model serving, including a minimal base image, explicit dependency management, and a production-ready server configuration.
| Step | Dockerfile Instruction | Purpose |
|---|---|---|
| 1. Base Image | FROM python:3.8.12-slim | Python slim base image for minimal container size |
| 2. Working Directory | WORKDIR /app | Set working directory inside container |
| 3. Copy Dependencies | COPY ["Pipfile", "Pipfile.lock", "./"] | Copy dependency management files |
| 4. Install Dependencies | RUN pipenv install --deploy --system | Install Python dependencies |
| 5. Copy Code + Model | COPY ["*.py", "project_one_model.pkl", "./"] | Copy source code and serialized ML model |
| 6. Expose Port | EXPOSE 9696 | Expose API port for external access |
| 7. Entrypoint Command | ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"] | Production server start command (fixed entrypoint) |
# Python slim base image keeps the container small
FROM python:3.8.12-slim
WORKDIR /app
# Pipenv is not part of the slim image, so install it before resolving dependencies
RUN pip install pipenv
COPY ["Pipfile", "Pipfile.lock", "./"]
RUN pipenv install --deploy --system
# Copy the inference code and the serialized model, then expose the API port
COPY ["*.py", "project_one_model.pkl", "./"]
EXPOSE 9696
# Start the production WSGI server
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
Challenges & Outcomes
Technical challenges and resolutions
Technical Challenges
- Packaging a machine learning model and its dependencies into a lightweight, production-ready Docker image
- Ensuring the inference service was accessible from outside the container via correct network binding and port configuration
- Managing model serialization and feature transformation consistency
Resolutions & Outcomes
- Used Python slim base image with explicit dependency installation for lightweight container
- Configured the application to bind to 0.0.0.0 and exposed the correct port in both the application and the Dockerfile (see the sketch after this list)
- Successfully created reproducible, portable ML inference service
- Achieved consistent model invocation across different environments
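A minimal sketch of the network-binding fix referenced above; the development-server fallback and the host-side run command are assumptions about how the service was typically started, since in the container the Gunicorn --bind flag plays this role.
# predict.py — development fallback; the Dockerfile ENTRYPOINT uses Gunicorn's --bind flag instead
if __name__ == "__main__":
    # 127.0.0.1 would only be reachable inside the container;
    # 0.0.0.0 listens on all interfaces so the published port can reach the API
    app.run(host="0.0.0.0", port=9696)

# On the host, the container port still has to be published, e.g.:
#   docker run -p 9696:9696 <image-name>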
Docker Commands Reference
Essential Docker CLI commands used in the project
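The commands below reconstruct the typical workflow for this project; image, repository, and container names are placeholders rather than values taken from the project.
# Build the image from the Dockerfile in the current directory
docker build -t ml-inference-service .

# Run the container and publish the API port declared with EXPOSE
docker run -it --rm -p 9696:9696 ml-inference-service

# Tag and push the image to Docker Hub for distribution
docker tag ml-inference-service <dockerhub-user>/ml-inference-service:latest
docker push <dockerhub-user>/ml-inference-service:latest

# Inspect application and inference logs at runtime
docker logs <container-id>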
Model Invocation Examples
How to access and use the inference service
Terminal Invocation
Python script to invoke the model via REST API:
import requests

# Candidate features sent to the inference endpoint (only a subset is shown here)
candidate = [{
    "gender": "M",
    "ssc_p": 41.0,
    "ssc_b": "Central",
    # ... (other features)
}]

url = "http://localhost:9696/predict"
response = requests.post(url=url, json=candidate)

if response.status_code == 200:
    output = response.json()
    print(f"Candidate evaluation output: {output}")
Flask API Endpoint
Key Flask application code for the inference endpoint:
import pickle
from flask import Flask, request, jsonify

# Load the feature vectorizer and the trained model from the serialized artifact
with open("project_one_model.pkl", "rb") as f_in:
    dv, model = pickle.load(f_in)

# Create Flask app
app = Flask('Predict')

# Define prediction endpoint
@app.route('/predict', methods=['POST'])
def predict():
    candidate = request.get_json()
    X = dv.transform(candidate)
    y_pred = model.predict_proba(X)[0, 1]
    placement = y_pred > 0.5
    return jsonify({
        'placement': bool(placement),
        'placement_probability': float(y_pred)
    })
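The endpoint above unpickles a (DictVectorizer, model) pair from project_one_model.pkl. Below is a minimal sketch of how such an artifact could have been produced at training time; the model class, feature set, and training data are assumptions for illustration, since the project documents only that a pre-trained classifier is deployed.
import pickle
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training records standing in for the real candidate data (assumed)
train_dicts = [
    {"gender": "M", "ssc_p": 41.0, "ssc_b": "Central"},
    {"gender": "F", "ssc_p": 75.0, "ssc_b": "Others"},
]
y_train = [0, 1]

# Fit the feature encoder and the classifier
dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(train_dicts)
model = LogisticRegression()
model.fit(X_train, y_train)

# Persist both objects as a single artifact so the API can load them together
with open("project_one_model.pkl", "wb") as f_out:
    pickle.dump((dv, model), f_out)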
Assets & References
Code, diagrams, study material
GitHub Repository
Source code repository containing the Dockerfile, Flask API, and ML model for the inference service.