Flyte
Ketan Umare
Jan 24th, 2022
What is Flyte?
Kubernetes-native
Workflow Automation Platform
for Business-critical
Machine Learning and
Data Processes
at Scale
What is Flyte?
What is Flyte?
Define: Workflow
The sequence of industrial, administrative, or other processes through which a piece of work passes from initiation to completion.
Not quite a DAG!
Directed Acyclic Graphs imply no loops/repeats. But complex processes may have repetitions. Runtime is still a DAG.
What is Flyte?
Goal
Flyte Platform Services
User Team 1
User Team 2
Platform Team
Python, java, pytorch, pyspark, tensorflow, etc
K8s, Cloud, Auto-scaling, cluster management, resource management, integrations, performance, monitoring, logs, spark-core, golang, databases etc
Data-scientists, ml engineers, data engineers, product SWEs
Platform/Infra engineers, devOps, SRE
ML
Ops
Challenges of ML Orchestration
What Flyte solves.
Challenge 1
Develop Incrementally & Constantly Iterate @ Scale
Scale the Job
e.g. one region -> all regions, more GPUs
Start with one Job, run it locally
e.g. spark job, a training job, a query etc
Create a pipeline, test it locally
e.g. Fetch data -> train model -> calculate metrics
Execute the pipeline on demand, at scale
e.g. Run a pipeline with parameters
Run the pipeline on a schedule or in-response to an event
e.g. Run every hour
Retrieve results for jobs / pipelines
1
3
5
2
4
6
Challenge 2
Tame Infrastructure, Self-serve & Efficient
Challenge 3
Parameterize Executions & Dynamism
Input: x=m1
Input: x=n
Input: x=m2
Challenge 4
Memoize, Recover & Reproduce
Input: x=m1
Input: x=m1
Challenge 5
Collaboration & Organizational Scaling
critical/complex algorithm
PipelineA
PipelineB
dataA
dataB
Team A
Team B
PipelineC
dataC
Composite Pipeline
Challenge 6
Extend Simply
Flyte
Vendor A
Inhouse
Vendor B
Consistent API
Organizations want flexibility
Control costs (Migrate vendors, bring capabilities inhouse)
Users velocity and existing code should just work!
3
Users want flexibility
Add simple python extensions (Airflow operators)
Maybe only for their teams
1
Platform wants to keep adding new capabilities
Distributed training support, Spark, Streaming etc
Continue adding and controlling roll-out of features
2
Flytekit makes it easy to add new user customizations
Flyte also allows you to run just your own containers
Flyte backend plugins are independently deployed, maintained and are in the hosted service
Flyte control plane makes it possible to switch plugin associations and OSS makes it possible to migrate
User Level Concepts
Building Blocks: Tasks
Task
inputs
outputs
Workflows
inputs
Task
inputs
outputs
Workflow
inputs
outputs
Task
inputs
outputs
Task
inputs
outputs
Outputs
Write Your Code
@task(cache=True, cache_version=”1.0”)
def pay_multiplier(df: pandas.DataFrame, scalar: int) -> pandas.DataFrame:
df["col"] = 2 * df["col"]
return df
@task
def total_spend(df: pyspark.DataFrame) -> int:
return df.agg(F.sum("col")).collect()[0][0]
@workflow
def calculate_spend(emp_df: pandas.DataFrame) -> int:
return total_spend(df=pay_multiplier(df=emp_df, scalar=2))
# Execute
pay_multiplier(df=pandas.DataFrame())
calculate_spend(emp_df=pandas.DataFrame())
Get Ready to Scale!
@task(limits=Resources(cpu="2", mem="150Mi"))
def pay_multiplier(df: pandas.DataFrame, scalar: int) -> pandas.DataFrame:
df["col"] = 2 * df["col"]
return df
@task(task_config=Spark(
spark_conf={"spark.driver.memory": "1000M"}
), retries=2)
def total_spend(df: pyspark.DataFrame) -> int:
return df.agg(F.sum("col")).collect()[0][0]
@workflow
def calculate_spend(emp_df: pandas.DataFrame) -> int:
return total_spend(df=pay_multiplier(df=emp_df, scalar=2))
LaunchPlan.get_or_create(name="...",
workflow=calculate_spend,
schedule=FixedRate(duration=timedelta(minutes=10)),
notifications=[
Email(
phases=[WorkflowExecutionPhase.FAILED],
recipients_email=[...])]),
)
Workflow Modalities
@workflow - deferred evaluation: deferred to launch!
@dynamic - deferred to the “return” statement
@dynamic(cache=True, cache_version="0.1", limits=Resources(mem="600Mi"))
def parallel_fit_predict(
multi_train: typing.List[pd.DataFrame],
multi_val: typing.List[pd.DataFrame],
multi_test: typing.List[pd.DataFrame],
) -> typing.List[typing.List[float]]:
preds = []
for loc, train, val, test in zip(LOCATIONS, multi_train, multi_val, multi_test):
model = fit(loc=loc, train=train, val=val)
preds.append(predict(test=test, model_ser=model))
return preds
@workflow
def calculate_spend(emp_df: pandas.DataFrame) -> int:
return total_spend(df=pay_multiplier(df=emp_df, scalar=2))
Execute and interact
# Simply run the script from command line
pyflyte run my.mod:wf [--remote]
# Fetch & Create Execution
flytectl get tasks -p proj -d dev core.advanced.run_merge_sort.merge --version v2 --execFile execution_spec.yaml
flytectl create execution --execFile execution_spec.yaml -p proj -d dev --targetProject proj
# Fetch results
flytectl get execution -p proj -d dev oeh94k9r2r --details -o yaml
Jupyter Notebook
Golang cli
UX Overview - UI (rendered graphs)
UX Overview - UI (error traces)
UX Overview - Customized visualizations
@task
def t2() -> Annotated[pd.DataFrame, TopFrameRenderer(10)]:
return iris_df
Concepts
Projects | Logical grouping & tenant isolation |
Launch plans | Customize invocation behavior, multiple schedules, notifications |
Execute One task Independently | Build & Debug iteratively |
Static DAG compilation | Get errors before execution |
Language and Framework Independence | Code in python, java, scala. Execute arbitrary language code. |
Local Execution | Implement before you scale |
Programmable and inspectable | Retrieve & compare historical results. Create your own centralized artifact repository. |
API driven execution | On-Demand, Scheduled and event triggered Execution |
User Interaction Model
User Journey
Retrieve & Replay
Ideate & Iterate
Productionize
Write Code, Not YAML!
Leave the infrastructure to platform
{
"id": { "resourceType": 1, "project": "flytesnacks", "domain": "development", "name": "core.flyte_basics.hello_world.say_hello", "version": "v0.2.247" },
"type": "python-task",
"container": {
"args": [ "pyflyte-execute", "--inputs", "{{.input}}", "--output-prefix", "{{.outputPrefix}}", "--raw-output-data-prefix", "{{.rawOutputDataPrefix}}", "--resolver", "flytekit.core.python_auto_container.default_task_resolver", "--", "task-module", "core.flyte_basics.hello_world", "task-name", "say_hello" ],
"image": "ghcr.io/flyteorg/flytecookbook:core",
}
@task
def square(n: int) -> int:
return n * n
$ pyflyte --pkgs myapp.workflows package --image ghcr.io/flyteorg/flytecookbook:core
Why Registration?
Before Serialize
After Serialize
@task
def say_hello() -> str:
return "hello world"
@workflow
def my_wf() -> str:
res = say_hello()
return res
"id": { "resourceType": "WORKFLOW", "project": "flytesnacks", "domain": "development", "name": "core.flyte_basics.hello_world.my_wf", "version": "v0.2.247" },
"tasks": [{"template": { TaskSpec 1 }}]
"closure": { "compiledWorkflow": { "primary": { "template": {
}
}
….
$ pyflyte --pkgs myapp.workflows package --image ghcr.io/flyteorg/flytecookbook:core
$ docker build -t ghcr.io/flyteorg/flytecookbook:core .
$ docker push ghcr.io/flyteorg/flytecookbook:core
Serialize Package
GRPC / REST
"id": { "resourceType": "WORKFLOW", "project": "flytesnacks", "domain": "development", "name": "core.flyte_basics.hello_world.my_wf", "version": "v0.2.247" },
"tasks": [{"template": { TaskSpec 1 }}]
"closure": { "compiledWorkflow": { "primary": { "template": {
}
}
$ flytectl register files -p flytesnacks -d development –version v1.0.0 flyte-packages.tar.gz —archive
Process of Registration
Before Serialize
After Fast Serialize
@task
def say_hello() -> str:
return "hello flyte"
@workflow
def my_wf() -> str:
res = say_hello()
return res
"id": { "resourceType": "WORKFLOW", "project": "flytesnacks", "domain": "development", "name": "core.flyte_basics.hello_world.my_wf", "version": "v0.2.247" },
"tasks": [{"template": { TaskSpec 1 }}]
"closure": { "compiledWorkflow": { "primary": { "template": {
}
}
….
$ pyflyte --pkgs myapp.workflows package --fast --image ghcr.io/flyteorg/flytecookbook:core
Code artifact
Fast Serialize Package
GRPC
Upload
"id": { "resourceType": "WORKFLOW", "project": "flytesnacks", "domain": "development", "name": "core.flyte_basics.hello_world.my_wf", "version": "v0.2.247" },
"tasks": [{"template": { TaskSpec… }}]
"closure": { "compiledWorkflow": { "primary": { "template": {
}
}
$ flytectl register files -p flytesnacks -d development –version v1.0.0 flyte-packages.tar.gz —archive
Blob store
Code artifact
Fast Registration
Code artifact
Re-use container
Blob store
Swiss Army Knife: Sandbox Environment
$ flytectl demo start --version v0.19.1
MLOps meets DevOps
Domains + Registration: DevOps Power!
1
2
3
Domains + Registration: DevOps Power!
1
2
3
Domains + Registration: DevOps Power!
1
2
3
Extensibility & Flexibility
Why?
Extensibility & Flexibility
Use Cases for Extending Flyte
Extensibility & Flexibility
Use case
Python flytekit plugin
User container plugin
Prebuilt container plugin
Flytekit Type Transformer
Backend Plugin
Meta DSL on flytekit
K8s Plugin
WebAPI plugin
Fancy plugin
Golang? Multi-language support? High performance plugin
Python? Custom extensions, try before invest in backend plugin
Library, with user defined
extensions
Provide prebuilt container — plug & play
Custom specialized domain specific types
Write new experiences for your users
New language SDK
Flyte Service API
Contributions :D
Customized interface, no-code solutions etc
Extensibility & Flexibility
FlyteKit Type Transformers
Allows you to create domain specific types and let Flyte understand them. They can be configured to be auto-loaded.
FlyteKit-Only Task Plugins
Allows syntactic sugar to be provided like a library. Flyte executes a python container, so you can potentially do anything in Flytekit itself.
FlyteKit Data Persistence Plugins
Allows you to persist data to various stores by automatically using URIs, e.g., s3://, gcs://, bq://...
$ pip install flytekitplugins-*
$ pip install flytekitplugins-data-*
Extensibility & Flexibility
DSLs in Other Languages
Use core protobuf to write SDK in any new language!
JAVA/Scala already available (incubating); they are contributed by Spotify
case class GreetTaskInput(name: String)
case class GreetTaskOutput(greeting: String)
class GreetTask
extends SdkRunnableTask(
SdkScalaType[GreetTaskInput],
SdkScalaType[GreetTaskOutput]
) {
override def run(input: GreetTaskInput): GreetTaskOutput = GreetTaskOutput(s"Welcome, ${input.name}!")
}
object GreetTask {
def apply(name: SdkBindingData): SdkTransform =
new GreetTask().withInput("name", name)
}
Extensibility & Flexibility
Backend Plugins
True power of Flyte!
Powerful, stateful plugins: starting multiple containers for a cluster, calling external APIs, performing complex auth flows, etc.
Unified API across languages
Maintain easily: patch-fix without deploying code fixes to users
Migrate seamlessly
@task(
task_config=MPIJob(
num_workers=2,
num_launcher_replicas=1,
slots=1,
),
retries=3, cache=True, cache_version="0.1",
requests=Resources(cpu='1', mem="300Mi"),
limits=Resources(cpu='2'),
)
def horovod_train_task(batch_size: int, buffer_size: int, dataset_size: int) -> FlyteDirectory:
hvd.init()
...
But wait, what about platform folks?
Architecture overview
Flyte Component Layer Cake
Built by Platform Engineers for Platform Teams!
Serverless | Provide a serverless environment for your users (central service) |
Platform Builders | Extend & customize everything. |
Incremental and recoverable | Best in class support for memoization and complete recovery from any transient failures |
Low footprint | Backend written in performant Golang |
OAuth2 & SSO | Oauth2 and SSO support available natively in Open source |
Observe and Audit | Published monitoring dashboard templates, extensive documentation |
Isolated & Secure Execution | Execute with separate Permissions, administer and manage quotas, queues etc |
gRPC | Fully documented Service specification |
Features for the Platform Folks
Sneak Peek…
UnionML
The easiest way to build and deploy machine learning microservices
UnionML bridges the gap between research and production by providing a unified interface for datasets, model training, and batch, online, and streaming prediction.
Dataset & Model
Dataset & Model
Serve Seamlessly
Flyte Community
Flyte Community
Flyte Overview
https://github.com/flyteorg/flyte/discussions
https://groups.google.com/u/0/a/flyte.org/g/users
https://www.getrevue.co/profile/flyte
https://www.addevent.com/calendar/kE355955
Collaborators, Contributors & Integrations
Flyte Overview
The Open Source Community
Flyte team understands well that community is a key factor in building a successful open source project. The team gave me and many others first class onboarding experience which created trust to build a closer collaboration with them.
-- Pradithya Aria Pura, Tech Lead - ML Platform, GoJek
It is a privilege to be part of an open-source community as vibrant and committed as the one Ketan and team have built around Flyte. The diversity of problems the community is tackling, along with the commitment and engagement from the core team is something special, and the team here at Striveworks is excited to be a part of it.
-- Jim Rebesco, co-founder, Striveworks`
Flyte is quickly becoming a workhorse at Freenome. The regular cadence of high quality backward-compatible releases, quick turnarounds on bug fixes, realtime support on Slack, and a vibrant community of helpful power users across many industries have really supercharged our adoption.
-- Jeev Balakrishnan, Tech Lead ML Platform, Freenome
Contributing to flyte.org projects was an incredibly streamlined, supportive, and collaborative experience. Many open-source projects are difficult to work with due to lack of communication, long delays, and inflexibility about making larger changes and supporting new use cases. Haytham and Ketan were both extremely helpful as far as helping me workshop my original idea, giving quick feedback and adding me to the Github organization so I could run CI builds. They really encourage new contributors and new ideas while maintaining a product vision and high code quality.
-- Claire Mcginty, Engineer, Spotify
Recap
Roadmap
Observability
Reproducibility
Few more items
Roadmap