1 of 43

2 of 43

Sergio Méndez

Building MLOps POCs and Sandbox Environments Using K3s and Argo

3 of 43

About me

  • Operating Systems Professor
  • Cloud Native enthusiast
  • DevOps at Yalo
  • Cloud Native Guatemala Organizer
  • Linkerd Hero

Sergio Méndez

4 of 43

5 of 43

Basic Concepts

6 of 43

A POC is used to evaluate a solution across different dimensions: context, effort, learning curve, costs, and so on.

With a POC you can compare candidate solutions and technologies for a problem, running small experiments to determine which one fits best.

POC(Proof of Concept)

7 of 43

A sandbox is an environment similar to production where you can run untrusted software without risk. It is a controlled environment that can be used for testing and for POCs with production-like data.

Sandbox Environment

8 of 43

“The word edge in this context means literal geographic distribution. Edge computing is computing that’s done at or near the source of the data, instead of relying on the cloud at one of a dozen data centers to do all the work. It doesn’t mean the cloud will disappear. It means the cloud is coming to you.”

Ref: https://www.theverge.com/circuitbreaker/2018/5/7/17327584/edge-computing-cloud-google-microsoft-apple-amazon

Edge Computing

9 of 43

Edge Computing Use Cases

  • Machine Learning
  • IoT Applications
  • Data Processing
  • Games
  • Any workload

10 of 43

"DevOps is the collaboration between software developers and IT operations with the goal of automating the process of software delivery and infrastructure changes. It creates a culture and environment where building, testing and releasing software can happen rapidly, frequently, and more reliably.”

Ref: https://pivotal.io/de/cloud-native

DevOps

11 of 43

In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion. Some amount of buffer storage is often inserted between elements.

Ref: https://en.wikipedia.org/wiki/Pipeline_(computing)

Pipeline
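The definition above can be sketched in a few lines of Python (the stage names are illustrative, not from the talk):

```python
# A pipeline as a series of processing elements: each stage's output
# feeds the next stage's input.
def extract():
    return [1, 2, 3, 4]            # pretend this reads raw records

def transform(rows):
    return [r * 10 for r in rows]  # clean/enrich the records

def load(rows):
    return sum(rows)               # pretend this writes a summary

# Chain the stages in series
result = load(transform(extract()))
print(result)  # 100
```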

12 of 43

MLOps (Machine Learning Model Operationalization Management) aims to provide an end-to-end machine learning development process to design, build, and manage reproducible, testable, and evolvable ML-powered software.

Ref: https://ml-ops.org/

MLOps

13 of 43

Ref: INNOQ, https://ml-ops.org/

14 of 43

Differences between DevOps & MLOps

15 of 43

  • DevOps
    • Developers, Operators
      • SRE/DevOps/Cloud Engineers
  • MLOps
    • Data Scientists, Data/MLOps Engineers

Ref: https://ml-ops.org/

Different Users

16 of 43

  • DevOps
    • GitHub repositories; source files in multiple programming languages
  • MLOps
    • ML models, Datasets

Different Artifacts

17 of 43

  • DevOps
    • Frontend/backend frameworks, languages, DBs
      • VueJS, Python, Java, MySQL, etc.
  • MLOps
    • ML models, data
      • Python, Scikit Learn, TensorFlow, Jupyter Notebooks, etc.
      • Apache Spark, Hadoop, etc.

Different Technologies

18 of 43

  • DevOps
    • QA for software, fast deployments, CI/CD
  • MLOps
    • Automate the generation of models that predict or add business value, and optimize model deployments for predictions

Different Goals

19 of 43

  • Static experiments: ready to be promoted to a production environment; the code will not change and can be scheduled periodically
  • Dynamic experiments: under active development by Data Scientists; the code changes constantly and is unstable

MLOps Experiment Types

20 of 43

  • Package the logic
  • Freeze library versions
  • Portability for the experiments
  • Share across teams
  • GitOps implementation

Container benefits for MLOps

21 of 43

In MLOps you have to organize people, experiments, and data in order to get valuable predictions.

22 of 43

Technologies for

POCs and ML Pipelines

23 of 43

What is K3s

Ref: https://k3s.io/

K3s is a certified Kubernetes distribution built for IoT and Edge Computing.
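Installation is a single command, as documented on the K3s site (run on the target node):

```shell
# Installs K3s as a systemd service; the kubeconfig is written
# to /etc/rancher/k3s/k3s.yaml
curl -sfL https://get.k3s.io | sh -

# Verify the node is ready
sudo k3s kubectl get nodes
```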

24 of 43

Features

Ref: https://k3s.io/

  • Includes everything in a single binary.
  • Supports different storage backends: sqlite3, MySQL, and etcd.
  • Traefik as the default ingress controller.
  • Integrated Helm controller.
  • Flannel for networking.
  • containerd to manage the containers.

25 of 43

If you are using ARM devices you may have to recompile your libraries and applications, but then you will be ready for Edge Computing.

26 of 43

Companies are starting to migrate their workloads into Edge Computing platforms to reduce costs.

Ref: DevOps y el camino de baldosas amarillas, José Juan Mora Pérez

27 of 43

Argo Workflows

What is Argo Workflows?

Ref: https://argoproj.github.io/projects/argo/

“Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD.”

28 of 43

Argo Workflows

Features

Ref: https://argoproj.github.io/projects/argo/

  • Each step is executed as a Container
  • Workflows declaration as a DAG (directed acyclic graph)
  • Kubernetes Native and Cloud Agnostic
  • CI/CD with less complexity
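As a sketch of these features, a minimal hypothetical Workflow manifest declaring a two-task DAG, where each task runs as a container (names and image are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ml-pipeline-        # hypothetical name
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          - name: etl
            template: run-step
            arguments:
              parameters: [{name: step, value: etl}]
          - name: train
            dependencies: [etl]     # train runs after etl
            template: run-step
            arguments:
              parameters: [{name: step, value: train}]
    - name: run-step
      inputs:
        parameters:
          - name: step
      container:                    # each step is executed as a container
        image: python:3.9
        command: [python, -c, "print('running {{inputs.parameters.step}}')"]
```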

29 of 43

ArgoCD

What is ArgoCD?

Ref: https://argo-cd.readthedocs.io/en/stable/

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.

30 of 43

Features

ArgoCD

Ref: https://argo-cd.readthedocs.io/en/stable/

ArgoCD supports the following configuration formats:

  • Kustomize
  • Helm charts
  • ksonnet
  • jsonnet files
  • YAML/JSON manifests
  • Custom configurations with plugins
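A minimal hypothetical Application manifest, pointing ArgoCD at a Git folder of plain YAML manifests (repo URL, paths, and namespaces are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: model-serve                 # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/mlops-manifests  # hypothetical repo
    targetRevision: HEAD
    path: manifests                 # folder with YAML/JSON manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: mlops
  syncPolicy:
    automated: {}                   # auto-sync when Git changes
```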

31 of 43

Don’t forget Argo Events and Rollouts

32 of 43

33 of 43

  • Apache Airflow
  • Apache Beam
  • Luigi
  • MLFlow
  • Other ML solutions

Argo could replace

34 of 43

Or it’s a good complement to your current solution

35 of 43

  • Spend less money on POCs
  • Ready for Edge Computing and IoT
  • Cloud agnostic
  • Lightweight and highly scalable
  • No vendor lock-in, using open source

Benefits

36 of 43

Demonstration

37 of 43

  • K3s installation with Argo Workflows and ArgoCD
  • A machine learning pipeline using Argo Workflows and ArgoCD
  • Code that you can use as a quickstart

In this demo you will see

38 of 43

Demo Architecture

1. ETL (Argo Workflows): raw input CSV → processed input CSV
2. Model Training (Argo Workflows): produces scores.model
3. Model Deploy (ArgoCD): creates the Model Serve Deployment
4. Inference: request/response against http://mlops.tk/model1/predict, writing predictions to an output CSV
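As a toy stand-in for the training and inference steps (stdlib only; the threshold model, data, and scores.model filename are illustrative, not the code from the repository):

```python
import os
import pickle
import tempfile

def train(rows):
    # rows: (feature, label) pairs; learn a threshold halfway
    # between the mean of each class
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return {"threshold": (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2}

def predict(model, x):
    return 1 if x >= model["threshold"] else 0

# "Model Training" step: fit on processed data and save the artifact
data = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
model = train(data)

path = os.path.join(tempfile.gettempdir(), "scores.model")
with open(path, "wb") as f:   # artifact the deploy step would pick up
    pickle.dump(model, f)

# "Inference" step: load the artifact and serve a prediction
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(predict(loaded, 0.75))  # 1
```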

39 of 43

Resources

  • https://k3s.io/
  • https://argoproj.github.io/projects/argo/
  • https://argo-cd.readthedocs.io/en/stable/

40 of 43

Slides:

https://b.link/KubeconEU2021-k3s-argo

41 of 43

Repository:

https://github.com/sergioarmgpl/mlops-argo-k3s

42 of 43

Personal email

sergioarm.gpl@gmail.com

Social networks

@sergioarmgpl

43 of 43

Thanks