MLOps Deep Dive
Dan Sun
Before we start …
Technologies are ever changing and different people hold different opinions. The goal here is to focus on the fundamentals, which barely change over time (hopefully).
Terminology, not just in ML but in software development in general, is overloaded. This discussion tries to:
(1) include all searchable and accurate terms,
(2) provide short yet precise definitions of these terms, and
(3) stick to one term while explaining a specific concept, to avoid confusion and ensure terminological consistency.
MLOps stands for Machine Learning Operations
MLOps = Data Engineering + Machine Learning + DevOps (the combination is sometimes called ML Infrastructure, or MLInfra)
MLOps covers aspects related to automating, testing, managing, monitoring, and maintaining ML workflows in production environments.
The space is messy, with a flood of tools, practices, and competing industry standards. However, it is not that hard to figure out what to learn or look at once we clearly see the core components of ML systems.
Data Engineering
Knowing the Modern Data Infrastructure (MDI, also referred to as the Modern Data Stack (MDS)) is crucial, as data is the building block of machine learning. What on earth is MDI or MDS, then? Think of it as a collection of tools that let you perform all the actions you would want on your data, including stream processing (e.g., Spark Streaming, Kafka, Flink).
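To make the "actions on your data" concrete, here is a minimal extract-transform-load (ETL) sketch in plain Python. It is illustrative only: the function names, record fields, and in-memory "sink" are all hypothetical, and real data stacks would use tools like Spark, Kafka, or Flink for scale and fault tolerance.

```python
# Minimal ETL sketch (illustrative; names and fields are hypothetical).

def extract(raw_rows):
    """Extract: parse raw CSV-like strings into records."""
    for row in raw_rows:
        user_id, amount = row.split(",")
        yield {"user_id": user_id, "amount": float(amount)}

def transform(records):
    """Transform: drop invalid rows and derive a simple feature."""
    for r in records:
        if r["amount"] >= 0:                      # filter out bad amounts
            r["amount_bucket"] = int(r["amount"]) // 10
            yield r

def load(records, sink):
    """Load: append clean records to a sink (here, just a list)."""
    for r in records:
        sink.append(r)
    return sink

warehouse = []
load(transform(extract(["u1,25.0", "u2,-3.0", "u3,47.5"])), warehouse)
```

The same extract → transform → load shape shows up whether the pipeline is a batch job over a warehouse or a streaming job over a message bus; only the execution engine changes.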
Machine Learning
This is more often referred to as ML Experimentation, where ML scientists/researchers (1) perform EDA in an interactive notebook environment (e.g., Jupyter Notebook), (2) prototype model architectures (e.g., a 2-layer LSTM model), and (3) implement transformation and training routines.
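A training routine from step (3) can be sketched as below. A real prototype might be the 2-layer LSTM mentioned above, built in a framework like PyTorch; to stay dependency-free, this stand-in fits a toy linear model y = w*x + b with plain gradient descent. All names and hyperparameters here are illustrative assumptions.

```python
# Sketch of a training routine: fit y = w*x + b by gradient descent on MSE.
# (A stand-in for a real model such as a 2-layer LSTM in a DL framework.)

def train(xs, ys, lr=0.05, epochs=500):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of mean squared error w.r.t. w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
w, b = train(xs, ys)
```

In experimentation this loop typically lives in a notebook cell; turning it into a parameterized, testable script is part of the handoff to the DevOps side discussed next.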
DevOps - Automated Deployment & Serving
ML Deployment - release executable ML applications (for later ML Serving) from ML experimentation by pushing (deploying) them to pre-determined target environments (e.g., cloud-based servers, on-premise servers, edge devices).
ML Serving - expose the target environments to live traffic (e.g., unseen data, prediction requests), which triggers the ML inference pipeline to generate predictions (inference results) in order to serve end users.
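The serving side can be sketched with only the Python standard library: a deployed artifact (here, a toy fixed-weight scorer standing in for a trained model) is wrapped in an HTTP handler that receives prediction requests and returns inference results. The endpoint, payload shape, and weights are all hypothetical; production serving would use a purpose-built system (e.g., KServe, TorchServe, or a web framework).

```python
# Minimal online-serving sketch using only the standard library.
# The "model" and request format are hypothetical placeholders.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Toy model: a fixed linear scorer standing in for a trained artifact."""
    weights = [0.4, 0.6]
    return sum(w * f for w, f in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the request body, run inference, and return JSON.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        response = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)

# To actually expose the target env to live traffic (blocking call):
# HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```

Note the separation: deployment is getting `predict` (the artifact) onto the target environment; serving is opening the endpoint so live traffic can reach it.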
DevOps - Automated Monitoring
Monitoring ML systems includes (1) model effectiveness: identifying different types of drift before model performance degrades, and (2) model serving efficiency: low latency for online serving, high throughput for offline serving.
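One common effectiveness check is the Population Stability Index (PSI), which compares the distribution of a feature (or score) in live traffic against a reference (training-time) distribution. The sketch below is a minimal pure-Python version; the decile binning and the frequently quoted 0.2 alert threshold are common conventions, not standards.

```python
# Drift-check sketch: Population Stability Index (PSI) between a reference
# distribution and live traffic, using equal-width bins over the reference.
import math

def psi(reference, live, bins=10):
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            i = sum(v > e for e in edges)   # index of the bin containing v
            counts[i] += 1
        # tiny epsilon avoids log(0) when a bucket is empty
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]

    ref_f, live_f = bucket_fractions(reference), bucket_fractions(live)
    return sum((r - l) * math.log(r / l) for r, l in zip(ref_f, live_f))

reference = [i / 100 for i in range(100)]   # roughly uniform on [0, 1)
no_drift = psi(reference, reference)        # identical distributions -> ~0
drifted = psi(reference, [0.9] * 100)       # all mass in one bin -> large
```

In practice a job like this runs on a schedule over recent serving logs, and a PSI above an agreed threshold (often around 0.2) pages the on-call or triggers retraining; serving efficiency (latency/throughput) is monitored separately via standard service metrics.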