Cloud Composer sizing guide
Last update date: October 04th, 2021

This sizing template is intended for Cloud Composer instances running on Airflow 1.x.
To start using this template, make a copy of this spreadsheet from the menu File > Make a copy. Then enter values in the cells highlighted in yellow to generate the recommended configuration.

Workload definition
Provide DAG and task information for the period of peak load.
Number of running DAGs: 15  <-- User input required
Median tasks / DAG: 5
Median task duration [sec]: 60
Desired median DAG run duration [min]: 20
Median DAG run duration [sec]: 1200  <-- DAG required period: desired median DAG run duration [min] * 60 [sec/min]

Intermediate calculations - informative only, no action required
Total tasks per period: 75  <-- Running DAGs * Median tasks per DAG
Cumulative tasks duration [sec]: 4500  <-- Total tasks per period * Task duration
Required parallel tasks: 3.75  <-- Cumulative tasks duration / DAG required period
vCPUs for tasks: 3.75  <-- Required parallel tasks
vCPU buffer: 0.75  <-- vCPUs for tasks * 0.20
Required vCPUs for workers: 5  <-- vCPUs for tasks + vCPU buffer, rounded up (4.5 -> 5)
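
The intermediate calculations above can be reproduced outside the spreadsheet. The following is a minimal Python sketch, not part of the original template: the function name `required_worker_vcpus` and its parameter names are hypothetical, and rounding the final value up to a whole vCPU is an assumption inferred from the sample values (3.75 + 0.75 = 4.5, shown as 5).

```python
import math


def required_worker_vcpus(running_dags, tasks_per_dag, task_duration_sec,
                          dag_run_duration_min, buffer_ratio=0.20):
    """Mirror the 'Intermediate calculations' rows of the sizing sheet."""
    dag_required_period_sec = dag_run_duration_min * 60
    total_tasks = running_dags * tasks_per_dag
    cumulative_duration_sec = total_tasks * task_duration_sec
    parallel_tasks = cumulative_duration_sec / dag_required_period_sec
    vcpus_for_tasks = parallel_tasks          # the sheet maps one parallel task to one vCPU
    vcpu_buffer = vcpus_for_tasks * buffer_ratio
    # Assumption: round up to a whole vCPU. The inner round() only strips
    # floating-point noise before taking the ceiling.
    required = math.ceil(round(vcpus_for_tasks + vcpu_buffer, 9))
    return {
        "total_tasks": total_tasks,
        "cumulative_duration_sec": cumulative_duration_sec,
        "parallel_tasks": parallel_tasks,
        "required_vcpus": required,
    }


# Sample values from the sheet: 15 DAGs, 5 tasks per DAG, 60 s tasks, 20 min runs.
sizing = required_worker_vcpus(15, 5, 60, 20)
print(sizing)
```

With the sample inputs, the sketch yields the same totals as the sheet. Changing `buffer_ratio` shows how sensitive the result is to the 20% headroom.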
Worker node requirements
In this section, select the number of vCPUs per worker node and the number of worker nodes needed to meet the required compute power for tasks.
vCPUs per worker node: 4  <-- Choose the number of vCPUs per machine.
Dedicated worker nodes: 2  <-- Enter the number of dedicated worker nodes.
Total vCPUs for workers: 8  <-- vCPUs per worker node * Dedicated worker nodes. When this value is below the required vCPUs for workers, the cell is highlighted in red.
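
The red-highlight check described above amounts to a simple comparison. A hedged Python sketch follows; the function name `worker_capacity` is hypothetical, and the required value of 5 vCPUs is taken from the sample figures in this template.

```python
def worker_capacity(vcpus_per_node, dedicated_worker_nodes, required_vcpus):
    """Return the total worker vCPUs and whether they cover the requirement."""
    total_vcpus = vcpus_per_node * dedicated_worker_nodes
    # The sheet highlights the cell in red when this comparison is False.
    return total_vcpus, total_vcpus >= required_vcpus


# Sample selection from the sheet: 2 worker nodes with 4 vCPUs each, 5 vCPUs required.
total_vcpus, is_sufficient = worker_capacity(4, 2, 5)
print(total_vcpus, is_sufficient)
```

A single 4-vCPU node would fall short of the 5 required vCPUs, which is the case the sheet flags in red.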
Cloud Composer instance configuration
This section outputs the total number of nodes and the machine type for the Composer instance. You can use this information when you create a Composer instance.
Node configuration
Node count: 3
Machine type: n1-standard-4  <-- This calculation is based on the default machine type. For all supported machine types, see https://cloud.google.com/composer/pricing#machine-type
Configuration overrides (airflow.cfg parameters)
This section outputs recommended Airflow configuration overrides based on the details you have provided.
[scheduler] parsing_processes: 3  <-- estimate = (vCPUs per node) - 1. Notes: DAG processing may block on file I/O. This setting was renamed in Airflow 1.10.14. In earlier versions, it was defined as max_threads.
[celery] worker_concurrency: 2  <-- estimate = min(cumulative task duration / DAG scheduling period / airflow workers, num_cpu_per_node * 6)
[core] parallelism: 6  <-- estimate = worker_concurrency * num_airflow_workers
[core] dag_concurrency: 6  <-- estimate = parallelism
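
The override estimates above can be sketched in Python as follows. This is an illustration under stated assumptions, not the template's own implementation: the function name `airflow_overrides` is hypothetical, the number of Airflow workers is assumed to equal the node count (3), which is the value that reproduces parallelism = 6, and fractional concurrency values are assumed to be rounded up.

```python
import math


def airflow_overrides(vcpus_per_node, num_airflow_workers, required_parallel_tasks):
    """Estimate airflow.cfg overrides following the formulas in the sheet's notes."""
    parsing_processes = vcpus_per_node - 1
    # required_parallel_tasks = cumulative task duration / DAG scheduling period
    per_worker = required_parallel_tasks / num_airflow_workers
    # Assumption: round the per-worker value up to a whole number of task slots.
    worker_concurrency = math.ceil(min(per_worker, vcpus_per_node * 6))
    parallelism = worker_concurrency * num_airflow_workers
    return {
        "scheduler": {"parsing_processes": parsing_processes},
        "celery": {"worker_concurrency": worker_concurrency},
        "core": {"parallelism": parallelism, "dag_concurrency": parallelism},
    }


# Sample values: 4 vCPUs per node, 3 Airflow workers (assumed), 3.75 parallel tasks.
overrides = airflow_overrides(4, 3, 3.75)
for section, settings in overrides.items():
    for key, value in settings.items():
        print(f"[{section}] {key} = {value}")
```

The nested dictionary mirrors the section and key layout of airflow.cfg, so each entry maps directly to one override.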
The recommendations are based on general best practices and must be customized after testing the behaviour of your specific workload.
These values serve as a good starting point. You may want to tune your environment further with other machine types and sizing, especially when you have tight workflow SLAs. For example, inter-task dependencies may also restrict the amount of parallelism the system can achieve.