1 of 28

Celery and Kubernetes for Fast, Scalable and Robust Workflow Orchestration

Param Rajani

SDE2 @ GoDaddy

2 of 28

About Me

  • Working at GoDaddy
  • 3 years of experience in data engineering
  • Built products in Fintech and GenAI domains
  • Designed and maintained product architectures on both Serverless and Kubernetes

3 of 28

Pretext: Let's Talk Orchestration

  • Define Logical Steps
  • Break Down Complex Use Cases
  • Handle Errors and Retries
  • Build Data Pipelines and Logical Journeys

4 of 28

Celery: A Pythonic Take On Workflow Orchestration

  • Topics We'll Explore in This Talk:
    • Intro to Celery
    • Celery Canvas Workflows
    • Compute and scaling with K8s
    • Observability

5 of 28

What Is Celery?

6 of 28

Here's an Example: Classic Fire-and-Forget

7 of 28

The Working Of The Worker
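To make the worker's job concrete, here is a toy model in plain Python: a queue stands in for the broker, threads stand in for worker processes, and a dict stands in for the result backend. It is a conceptual sketch under those simplifications, not Celery's implementation, which adds serialization, acknowledgements, prefetching and retries on top of this loop.

```python
from __future__ import annotations

import queue
import threading
from typing import Any, Callable

# Toy model of producer -> broker -> worker -> result backend.
# Conceptual sketch only; NOT how Celery is implemented internally.
broker: queue.Queue = queue.Queue()             # stands in for Redis/RabbitMQ/SQS
result_backend: dict[str, Any] = {}             # stands in for the result store
registry: dict[str, Callable[..., Any]] = {}    # task name -> function, like @app.task


def task(fn: Callable[..., Any]) -> Callable[..., Any]:
    registry[fn.__name__] = fn
    return fn


@task
def add(x: int, y: int) -> int:
    return x + y


def worker_loop() -> None:
    # Block on the queue, run whichever task name arrives, and store the
    # return value (or the raised exception) under the task id.
    while True:
        message = broker.get()
        try:
            if message is None:  # sentinel: shut this worker down
                return
            task_id, name, args = message
            try:
                result_backend[task_id] = registry[name](*args)
            except Exception as exc:
                result_backend[task_id] = exc
        finally:
            broker.task_done()


def run_demo(jobs: list[tuple[str, str, tuple]], n_workers: int = 2) -> dict[str, Any]:
    workers = [threading.Thread(target=worker_loop, daemon=True) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for job in jobs:  # producer side: publish messages and move on
        broker.put(job)
    broker.join()  # wait until every published message has been processed
    for _ in workers:
        broker.put(None)
    for w in workers:
        w.join()
    return dict(result_backend)
```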

8 of 28

Celery Is Much More Than A Background Worker

  • Celery Can Orchestrate complex workflows
  • Supports Chained, Parallel and Conditional Execution
  • Tasks can be routed to different workers based on resource needs

9 of 28

Celery Canvas Workflows

10 of 28

11 of 28

12 of 28

Task Routing

13 of 28

14 of 28

CELERY + K8S

COMPUTE FLEXIBILITY AND SCALING

15 of 28

PAIRED WITH KUBERNETES

  • Deploy Each Worker as a Separate Pod
  • Flexibility with CPU and RAM Requirements
  • Flexibility with Scaling Policies

16 of 28

Interesting Design Decisions

Group Multiple Tasks Into a Single Worker

  • Less Control on Scaling
  • Easier to Manage Scaling, Focused on a Single Queue
  • More RAM Consumption

Each Task in an Independent Worker

  • More Controlled Scaling

17 of 28

KUBERNETES DEPLOYMENT FILE
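The idea behind the deployment file, one Deployment per worker type, can be sketched in Python by generating the manifest as a dict and serializing it to JSON, which `kubectl apply -f` accepts as well as YAML. The image name, app module, labels and resource values are hypothetical; the `celery worker` flags (`-A`, `-Q`, `-P`, `-c`, `--loglevel`) are the standard CLI options.

```python
import json


def worker_deployment(
    name: str,
    queue: str,
    pool: str,
    concurrency: int,
    cpu: str,
    memory: str,
    replicas: int = 1,
    image: str = "registry.example.com/workflows:latest",  # hypothetical image
    app_module: str = "proj",  # hypothetical Celery app module
) -> dict:
    """Return an apps/v1 Deployment for one Celery worker type consuming one queue."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "worker",
                            "image": image,
                            "command": [
                                "celery", "-A", app_module, "worker",
                                "-Q", queue, "-P", pool, "-c", str(concurrency),
                                "--loglevel=INFO",
                            ],
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory},
                                "limits": {"memory": memory},
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Usage: python gen_worker.py > worker.json && kubectl apply -f worker.json
    manifest = worker_deployment("ml-inference-worker", "cpu_heavy", "prefork", 4, "4", "8Gi")
    print(json.dumps(manifest, indent=2))
```

This sketch sets a memory limit but no CPU limit, which avoids CPU throttling of compute-heavy tasks while still protecting the node from runaway memory; whether that trade-off fits depends on the cluster's policies.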

18 of 28

OBSERVABILITY BEST PRACTICES

19 of 28

Choosing The Right Concurrency Model

Prefork Worker Pool

  • Celery's default pool is prefork. It forks separate OS child processes (one per concurrency slot), so CPU-bound Python code runs in parallel instead of contending for a single GIL. This is great for CPU-bound tasks like:
    • Running pandas data transformations
    • ML model inference
    • Anything that eats up memory and processor cycles

gevent: The I/O-Bound Multitasker

  • On the other hand, gevent is built for I/O-bound tasks. It uses greenlets, lightweight cooperatively scheduled coroutines, so one process can juggle many tasks at once while each waits on the network, as long as the tasks aren't CPU-heavy. This works best for workers that:
    • Just make decisions
    • Fetch data from APIs
    • Push messages around
    • Wait on queues or databases
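Since the pool is chosen per worker process on the command line, a small helper can pick it from the workload type. The `proj` module, the queue names and the gevent concurrency of 100 are assumptions to tune with load tests; `-P prefork` and `-P gevent` are Celery's pool options, and gevent must be installed in the image.

```python
from __future__ import annotations


def worker_argv(queue: str, workload: str, cpu_cores: int, app_module: str = "proj") -> list[str]:
    """Build a Celery worker command line whose pool matches the workload type.

    cpu_cores should be the pod's CPU request, passed explicitly.
    """
    if workload == "cpu":
        # prefork: one OS child process per core, so CPU-bound tasks run in parallel.
        pool, concurrency = "prefork", max(1, cpu_cores)
    elif workload == "io":
        # gevent: one process, many greenlets. 100 is a hypothetical starting
        # point to tune with load tests.
        pool, concurrency = "gevent", 100
    else:
        raise ValueError(f"unknown workload type: {workload!r}")
    return ["celery", "-A", app_module, "worker", "-Q", queue, "-P", pool, "-c", str(concurrency)]
```

Passing the core count explicitly matters on Kubernetes: inside a container `os.cpu_count()` reports the node's cores, so relying on Celery's default concurrency (the CPU count) can fork far more processes than the pod's CPU request allows.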

20 of 28

Task Queue Routing

21 of 28

Caching ML Models in Celery

  • Celery workers load the model once per worker process and keep it in memory → no per-task reload
  • Eliminates redundant I/O → saves the model load time (often seconds) on every task
  • Fully asynchronous + parallel on Kubernetes
  • Similar to avoiding cold starts in Lambda-style systems

22 of 28

Observability with Sentry: Real-Time Error Alerts
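A minimal sketch of wiring Sentry into Celery workers with the SDK's `CeleryIntegration`, so exceptions raised inside tasks are reported as Sentry events. The `SENTRY_DSN` and `APP_ENV` environment variables and the 10% trace sample rate are assumptions. The call belongs in the worker process at startup, for example in a `celeryd_init` signal handler or where the Celery app module is imported.

```python
import os


def init_sentry() -> bool:
    """Initialise Sentry with its Celery integration if a DSN is configured.

    Returns False (and does nothing) when SENTRY_DSN is unset, so local runs
    and tests need neither the sentry-sdk package nor network access.
    """
    dsn = os.environ.get("SENTRY_DSN")
    if not dsn:
        return False
    import sentry_sdk
    from sentry_sdk.integrations.celery import CeleryIntegration

    sentry_sdk.init(
        dsn=dsn,
        integrations=[CeleryIntegration()],
        environment=os.environ.get("APP_ENV", "production"),  # hypothetical env var
        traces_sample_rate=0.1,  # hypothetical: trace 10% of tasks for performance data
    )
    return True
```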

23 of 28

Task Signals Example

24 of 28

Custom EFK Stack
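In an EFK setup, the collector (Fluentd or Fluent Bit) tails container stdout and ships each line to Elasticsearch for Kibana. Emitting one JSON object per line keeps that pipeline free of custom parsing regexes. The sketch below is a stdlib-only formatter; the context field names are assumptions. It could be attached to Celery's loggers from `after_setup_logger` / `after_setup_task_logger` signal handlers, which receive the logger as an argument.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line for the log collector."""

    # Hypothetical context fields; pass them with logger.info(..., extra={...}).
    CONTEXT_FIELDS = ("task_id", "task_name", "queue")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def configure_json_logging(logger: logging.Logger, stream=sys.stdout) -> None:
    # Containers log to stdout; the node-level collector tails those streams.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
```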

25 of 28

Error Detection + Queue Monitoring = Reliable Systems

  • Sentry tracks real-time Celery task failures
  • SQS metrics reveal queue health: backlog, delays, consumption
  • Dashboards justify autoscaling decisions
  • Combined view enables proactive debugging
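To turn the queue backlog into a scaling decision, a common rule is: divide the backlog by the number of messages one pod should hold, round up, and clamp. This has the same shape as a Kubernetes HPA external metric with an `AverageValue` target, and KEDA's SQS scaler works along similar lines. The target of 50 messages per pod and the replica bounds are hypothetical; the backlog itself would come from the queue's `ApproximateNumberOfMessages` attribute.

```python
import math


def desired_replicas(backlog: int, target_per_pod: int, min_pods: int = 1, max_pods: int = 20) -> int:
    """ceil(backlog / target_per_pod), clamped to [min_pods, max_pods].

    min_pods/max_pods defaults are hypothetical; set them from load tests.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(backlog / target_per_pod)
    return max(min_pods, min(max_pods, wanted))
```

For example, a backlog of 120 messages with a target of 50 per pod asks for 3 worker pods, while an empty queue still keeps the minimum of 1 pod warm.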

26 of 28

27 of 28

Feature | Celery + Kubernetes | SFN + Lambdas
--- | --- | ---
1. Cost Drivers | EC2, EKS services, network management, DevOps engineering effort | Lambda execution cost, SFN state-transition cost
2. Concurrency | Virtually unlimited (bounded by cluster capacity) | Default of 1,000 concurrent Lambda executions per Region; can be increased to an extent
3. Scaling Behavior | Define scaling policies and tune them by analysing load tests; good for high RPS | Effortless scaling; good for most MVPs
4. Control Over Compute | Fine-grained control over RAM and CPU | Limited to Lambda specs
Ideal For | Teams needing full control and high throughput; DevOps-savvy setups | Rapid prototyping and event-driven workflows; teams prioritizing ease over control

28 of 28

KEY TAKEAWAYS & SUMMARY

  • Celery is a powerful tool for orchestrating workflows
  • Kubernetes controls compute and scaling
  • Observability ensures reliability and error tracking