Celery and Kubernetes for Fast, Scalable and Robust Workflow Orchestration
Param Rajani
SDE2 @ GoDaddy
About Me
Working at GoDaddy
3 years of experience in data engineering
Built products in Fintech and GenAI domains
Designed and maintained product architectures on both Serverless and Kubernetes
Pretext: Let's Talk Orchestration
Define Logical Steps
Break Down Complex Use Cases
Error Handling and Retries
Build Data Pipelines and Logical Journeys
Celery: A Pythonic Take On Workflow Orchestration
Topics We'll Explore in This Talk:
Intro to Celery
Celery Canvas Workflows
Compute and Scaling with K8s
Observability
What Is Celery?
Here's an Example: Classic Fire-and-Forget
The Working Of The Worker
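Conceptually, a worker blocks on a queue, deserializes each message, looks up the task by name, runs it and acknowledges the message. The sketch below models that loop with the standard library only; it is a simplified mental model, not Celery's actual implementation, and the registry, message format and task names are hypothetical.

```python
import json
import queue
import threading

# Task registry: name -> callable (Celery builds this from @app.task decorators).
REGISTRY = {
    "math.add": lambda x, y: x + y,
    "text.upper": lambda s: s.upper(),
}


def publish(broker: queue.Queue, name: str, *args) -> None:
    """Producer side: serialize the call into a message and enqueue it."""
    broker.put(json.dumps({"task": name, "args": list(args)}))


def worker_loop(broker: queue.Queue, results: list) -> None:
    """Worker side: wait for a message, look up the task, execute it, acknowledge."""
    while True:
        raw = broker.get()
        if raw is None:  # sentinel value: shut the worker down
            broker.task_done()
            break
        msg = json.loads(raw)
        func = REGISTRY[msg["task"]]
        results.append((msg["task"], func(*msg["args"])))
        broker.task_done()  # analogous to acking the message


def run_demo() -> list:
    broker: queue.Queue = queue.Queue()
    results: list = []
    worker = threading.Thread(target=worker_loop, args=(broker, results))
    worker.start()
    publish(broker, "math.add", 2, 3)
    publish(broker, "text.upper", "celery")
    broker.put(None)
    worker.join()
    return results
```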
Celery Is Much More Than A Background Worker
Celery Can Orchestrate Complex Workflows
Supports Chained, Parallel and Conditional Execution
Tasks can be routed to different workers based on resource needs
Celery Canvas Workflows
Task Routing
CELERY + K8S
COMPUTE FLEXIBILITY AND SCALING
PAIRED WITH KUBERNETES
Deploy Each Worker as a Separate Pod
Flexibility with CPU and RAM Requirements
Flexibility with Scaling Policies
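Because each worker type is its own Deployment, each can get its own scaling policy. A minimal sketch, assuming CPU-based autoscaling with a standard `autoscaling/v2` HorizontalPodAutoscaler; the Deployment name, replica bounds and 70% target are illustrative. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f -` accepts just like YAML.

```python
import json


def worker_hpa(deployment: str, min_replicas: int = 1, max_replicas: int = 10,
               cpu_target_percent: int = 70) -> dict:
    """Build an HPA manifest that scales one Celery worker Deployment on CPU usage."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_target_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Usage: python hpa.py | kubectl apply -f -
    print(json.dumps(worker_hpa("celery-worker-cpu-heavy", max_replicas=20), indent=2))
```

CPU utilization targets only work when the pods declare CPU requests and metrics-server is running. CPU is also a weak signal for I/O-bound workers, which may be better scaled on queue depth through an external metric.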
Interesting Design Decisions
Group Multiple Tasks Into a Single Worker
Less Control Over Scaling
Easy to Manage Scaling, Focused on a Single Queue
More RAM Consumption
Each Task Gets an Independent Worker
More Controlled Scaling
KUBERNETES DEPLOYMENT FILE
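A worker Deployment mostly boils down to the container command (which queue, pool and concurrency) plus resource requests and limits. The sketch below builds such a manifest as a Python dict and emits JSON, equivalent to the usual YAML file; the image, the Celery app module `orchestrator` and the default sizes are hypothetical.

```python
import json


def celery_worker_deployment(queue: str, pool: str = "prefork", concurrency: int = 4,
                             cpu: str = "1", memory: str = "2Gi", replicas: int = 1,
                             image: str = "registry.example.com/orchestrator:latest") -> dict:
    """One Deployment per queue, so CPU/RAM and replicas are sized per workload."""
    # Kubernetes object names must be DNS-1123 compliant: no underscores.
    name = f"celery-worker-{queue.replace('_', '-')}"
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "worker",
                        "image": image,
                        # Worker consumes only its own queue with the chosen pool.
                        "command": ["celery", "-A", "orchestrator", "worker",
                                    "-Q", queue, "-P", pool, "-c", str(concurrency),
                                    "--loglevel=INFO"],
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                            "limits": {"cpu": cpu, "memory": memory},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Usage: python deploy.py | kubectl apply -f -
    print(json.dumps(celery_worker_deployment("cpu_heavy", memory="4Gi"), indent=2))
```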
OBSERVABILITY BEST PRACTICES
Choosing The Right Concurrency Model
Prefork Worker Pool
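Celery’s default pool is prefork. Each worker forks a pool of child OS processes (by default, one per CPU core), so tasks run truly in parallel instead of contending for Python's GIL. This is great for CPU-bound tasks like: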
Running pandas data transformations
ML model inference
Anything that eats up memory and processor cycles.
gevent: The I/O-bound multitasker
On the other hand, gevent is built for I/O-bound tasks. It uses greenlets: lightweight, cooperative threads that let a single process juggle many tasks at once, as long as those tasks spend most of their time waiting rather than computing. This works best for workers that:
Just make decisions
Fetch data from APIs
Push messages around
Wait on queues or databases
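The pool is chosen per worker at startup (`-P`/`--pool`), so one codebase can run prefork workers for CPU-bound queues and gevent workers for I/O-bound ones. A minimal sketch of that choice as a helper that builds worker arguments; the helper name, the workload labels and the concurrency of 200 greenlets are assumptions, not recommendations from the talk. The gevent pool also requires the `gevent` package to be installed.

```python
import os


def worker_argv(queue: str, workload: str) -> list:
    """Pick a Celery pool and concurrency per workload type (illustrative values)."""
    if workload == "cpu":
        # prefork: one child process per core for pandas / ML inference
        pool, concurrency = "prefork", os.cpu_count() or 1
    elif workload == "io":
        # gevent: many greenlets in one process for API calls and DB waits
        pool, concurrency = "gevent", 200
    else:
        raise ValueError(f"unknown workload type: {workload!r}")
    return ["worker", "-Q", queue, "-P", pool, "-c", str(concurrency)]


# Usage: app.worker_main(argv=worker_argv("api_calls", "io"))
# Shell equivalent: celery -A orchestrator worker -Q api_calls -P gevent -c 200
```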
Task Queue Routing
Caching ML Models in Celery
Celery workers cache models on startup → no per-task reload
Eliminates redundant I/O → saves seconds per task
Fully asynchronous + parallel on Kubernetes
Similar to avoiding cold starts in Lambda-style systems