1 of 15

FeatureStore (with MLOps)

DATAMESH

06/2022 Jiří Steuer

This item's classification is Internal. It was created by and is in property of the Home Credit Group. Do not distribute outside of the organization.

2 of 15

What is a feature item?

Feature items are source data, aggregated data, predictors, and scores that are used, or have the potential to be used, when building your models

How do they differ from the data/information held in data warehouses or the data lake?

    • They are a small percentage of the data warehouse data (<5%) for which we know, or have verified, predictive power within a model

3 of 15

What is a FeatureStore?

It is a new layer between the data and the model that makes it possible to share, discover, and use features when building AI/ML models (it is an integral part of an MLOps solution)

[Diagram: MLOps + FeatureStore. Labels: Source applications; Raw Data; Data (DWH/DataLake, HDFS/S3/…); Federated query; Feature Engineering (data quality, cleaning, transformation, predictor calculation, etc.); FeatureStore; Model execution; Consumers (next best offer, risk-based pricing)]
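
The idea of a layer that sits between raw data and the model can be illustrated with a minimal in-memory sketch. Everything here is hypothetical and only for illustration: the `FeatureStore` class, its method names, and the example features (`debt_to_income`, `n_loans`) are assumptions, not the API of any product mentioned in this deck.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeatureStore:
    """Hypothetical in-memory stand-in for the layer between data and models."""

    # Feature name -> function that computes the feature from a raw record.
    _definitions: Dict[str, Callable[[dict], float]] = field(default_factory=dict)
    # Entity id -> {feature name -> computed value}.
    _values: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        """Share a feature definition so that other teams can reuse it."""
        self._definitions[name] = fn

    def list_features(self) -> List[str]:
        """Discover which features are available."""
        return sorted(self._definitions)

    def ingest(self, entity_id: str, raw: dict) -> None:
        """Feature engineering step: compute every registered feature."""
        row = self._values.setdefault(entity_id, {})
        for name, fn in self._definitions.items():
            row[name] = fn(raw)

    def get(self, entity_id: str, names: List[str]) -> List[float]:
        """Serve a feature vector to a model (the consumer side)."""
        row = self._values[entity_id]
        return [row[n] for n in names]


store = FeatureStore()
store.register("debt_to_income", lambda r: r["debt"] / r["income"])
store.register("n_loans", lambda r: float(len(r["loans"])))
store.ingest("client-1", {"debt": 500.0, "income": 2000.0, "loans": [1, 2, 3]})
vector = store.get("client-1", ["debt_to_income", "n_loans"])
```

The model code only calls `get` and never touches the raw record, which is the separation the diagram describes.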

4 of 15

What is typically part of a FeatureStore?

Metarepository

    • Online solution (real-time ML/AI execution, real-time storage)
    • Offline solution (model building/testing/tuning)

Historization, content versioning

FeatureStore is an integral part of MLOps, with a focus on

    • Data ingestion (event streaming/processing, messaging)
    • CI/CD - Continuous Integration, Continuous Delivery
    • CT/CM - Continuous Training, Continuous Monitoring
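
The online/offline split and the historization listed on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the `HistorizedFeature` class and its methods are hypothetical, the offline read is modelled as an "as of" lookup over the stored history, and the online read as the latest value.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class HistorizedFeature:
    """Hypothetical feature that keeps every value with its validity date."""

    def __init__(self):
        # Entity id -> list of (valid_from, value), kept sorted by date.
        self._history = defaultdict(list)

    def write(self, entity_id, valid_from, value):
        self._history[entity_id].append((valid_from, value))
        self._history[entity_id].sort(key=lambda item: item[0])

    def as_of(self, entity_id, when):
        """Offline use: the value that was valid on a given date (for training)."""
        history = self._history.get(entity_id, [])
        dates = [d for d, _ in history]
        idx = bisect_right(dates, when)
        return history[idx - 1][1] if idx else None

    def latest(self, entity_id):
        """Online use: the most recent value (for real-time execution)."""
        history = self._history.get(entity_id, [])
        return history[-1][1] if history else None


balance = HistorizedFeature()
balance.write("client-1", date(2022, 1, 1), 100.0)
balance.write("client-1", date(2022, 3, 1), 250.0)

offline_value = balance.as_of("client-1", date(2022, 2, 15))
online_value = balance.latest("client-1")
```

Reading training data "as of" the date of each historical decision avoids leaking later values into the model, which is one reason historization is usually listed as a core part of a feature store.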

5 of 15

Why FeatureStore with MLOps? The data science view

Time to market

    • Shorter data preparation (2 weeks, data parsing)
    • Data deployment (2 weeks, deployment process/tests)

Improved model performance

    • Easy work with new data sources (2-6 months of delay, removal of a single data source)
    • Model sharing (across departments/countries, elimination of duplicate work, etc.)

Reduced dependence on IT

    • Data streaming by default (2-8 weeks, without IT prioritization)
    • Most model development without major IT involvement (data science self-service)
    • Removal of blockers (a single data source)

Real-time scenarios

    • Easy transition to online
    • Focus on client behavior (value retention, complex event processing, etc.)

Model quality

    • Model monitoring and model drift/degradation over time
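
Monitoring drift over time can be illustrated with the population stability index (PSI), one commonly used drift measure. The slide does not name a specific metric, so the choice of PSI, the function name `psi`, and the equal-width binning are assumptions made for this sketch.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index between a reference and a current sample.

    Bins are equal-width over the range of the reference sample; values
    outside that range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in training]

stable_score = psi(training, training)
drift_score = psi(training, shifted)
```

A score of zero means the two distributions fall into the bins identically; the score grows as the current data moves away from the reference sample, so it can be tracked per feature or per model score over time.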

6 of 15

Why FeatureStore with MLOps? The IT view

Reduced workload on the IT side

    • Metamodel creation (KV model) outside IT
    • Support for versioning and historization

Separation of data (data at rest) from the compute cluster

    • Different scalability for data handling vs. data computation
    • HA/SLA for real-time

Delivery as PaaS/SaaS

    • Minimal support from IT

Performance

    • Scale-up/out for data science (CPU/GPU, Python, Spark, Dask, …)
    • On-prem/cloud, including the option of an exit strategy

Support for further topics

    • Data virtualization/federated query (Trino/Presto)
    • Hyperparameter tuning
    • Adaptable ML
7 of 15

ANNEX

HISTORY

8 of 15

with MLOps

9 of 15

Copyright © 2021 Accenture. All rights Reserved.

10 of 15

ANNEX

HISTORY

11 of 15

Gartner Hype Cycle | From 8/2021

12 of 15

Milestones | Feature Store topic evolution

MAY

IGUAZIO

MLRun

JUNE

IGUAZIO

Nuclio

13 of 15

ANNEX

CASE STUDIES & REFERENCES

14 of 15

References | Feature Store Case Studies from Iguazio

15 of 15

References | Feature Store Real Deliveries

Hopsworks Swedbank

Key Results:

  • Reduce the costs of alerts - Effective deep anomaly detection model that reduced the number of false-positives by 99% compared to a rule-based engine.
  • Faster Feature Engineering - Increased processing performance of data preparation and exploration, and building efficient feature pipelines that produced over 40TBs of feature data.
  • Faster Model Training - Distributed, multi-GPU training and parallel hyperparameter tuning.

Iguazio Ecolab

“Prior to implementing this solution, model deployment times exceeded 12 months. Thanks to Iguazio and Microsoft Azure, by 2020, these had been reduced to between 30 and 90 days.” Gregory Hayes, Data Science Director, Ecolab

Tecton Tide

Deploys Models 2x Faster with Tecton

  • 50% improvement
  • Time to deploy new models to production decreased to just 1 month from 2 – 4 months.
  • 9 months - Engineering time recovered from continuing to build internal feature store.
  • 3 engineers - Resources repurposed from maintaining internal feature store.

Tecton Atlassian

Atlassian has improved over 200,000 customer interactions per day with Tecton:

  • Accelerated time to build and deploy new features from 1-3 months to 1 day
  • Introduced management of ‘features as code’ and brought DevOps-like practices to feature engineering
  • Started building a central hub of production-ready features which will enable feature re-use and collaboration
  • Improved the prediction accuracy of existing models by 2%
  • Improved the accuracy of online features from 95%–97% accurate to 99.9% accurate
  • Freed up 2–3 FTE from maintaining the Atlassian feature store to focus on other priorities

Iguazio NetApp

  • 6-12x improvement in time to develop and deploy new AI services
  • 50% reduction in operating costs
  • 16x storage capacity reduction
  • 3-6x fewer compute nodes
  • End-to-end platform for analytics and AI
