https://medium.com/nerd-for-tech/an-overview-of-pipeline-parallelism-and-its-research-progress-7934e5e6d5b8
Pipeline Parallelism
Reza Jahadi Aye Sandar Thwe Xueyun Ye
Professor: Dr. Ehsan Atoofian
Narayanan, Deepak, et al. “PipeDream: generalized pipeline parallelism for DNN training.” Proceedings of the 27th ACM Symposium on Operating Systems Principles. 2019.
MNIST Dataset
The MNIST dataset consists of 28×28 grayscale images of handwritten digits (0–9): 60,000 training images and 10,000 test images.
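The slides do not show how the data is loaded; as a minimal sketch, assuming the Keras-bundled copy of MNIST is used (an assumption about tooling, not the authors' code), the split can be inspected like this:

```python
from tensorflow.keras.datasets import mnist

# 60,000 training images and 10,000 test images of 28x28 handwritten digits (0-9).
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
print(y_train.min(), y_train.max())  # 0 9
```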
Convolutional Neural Network
The model is a small CNN for MNIST: Input (1@28×28) → Convolution (16@14×14) → Convolution (32@7×7) → Dense Layer (1×200) → Dense Layer (1×200) → Output (1×10).
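The slides give only the layer shapes, not kernel sizes or downsampling details. The Keras definition below is one plausible reading of those shapes (3×3 convolutions with 2×2 max pooling are assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A CNN matching the shapes in the figure:
# 1@28x28 -> 16@14x14 -> 32@7x7 -> 1x200 -> 1x200 -> 1x10
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                            # 1@28x28
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),                          # 16@14x14
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),                          # 32@7x7
    layers.Flatten(),
    layers.Dense(200, activation="relu"),                      # 1x200
    layers.Dense(200, activation="relu"),                      # 1x200
    layers.Dense(10, activation="softmax"),                    # 1x10
])
model.summary()
```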
Model Parallelism
In naive model parallelism, the layers of the network are partitioned across machines. With a batch size of 1, only one partition is active at a time, which leads to under-utilization of computing resources (Narayanan et al., 2019). It is also not clear how to split the model optimally among the different machines.
Figure: the CNN (Input 1@28×28 → Conv 16@14×14 → Conv 32@7×7 → Dense 1×200 → Dense 1×200 → Output 1×10) expressed as a task/channel model, with each layer as a task and the activations passed between consecutive layers as the channels connecting them.
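The model-parallel code itself is not shown in the slides. The sketch below, using mpi4py (an assumed tool, not the authors' implementation), illustrates the under-utilization: each rank owns one layer, and with a batch size of 1 only one rank computes at any moment while the others sit idle.

```python
# Run with, e.g.: mpirun -np 5 python model_parallel.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # one rank (machine) per task/layer
last = comm.Get_size() - 1

def stage_forward(x):
    """Placeholder for the layer assigned to this rank."""
    return x

# Naive model parallelism with batch size = 1: a single image travels
# through the ranks one after another, so while one rank computes,
# every other rank is idle (the under-utilization described above).
if rank == 0:
    x = np.zeros((28, 28), dtype=np.float32)   # one input image
    comm.send(stage_forward(x), dest=1, tag=0)
elif rank < last:
    x = comm.recv(source=rank - 1, tag=0)
    comm.send(stage_forward(x), dest=rank + 1, tag=0)
else:
    prediction = stage_forward(comm.recv(source=rank - 1, tag=0))
```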
Pipeline Parallelism
Accuracy is unaffected, because the pipeline performs exactly the same computation as the sequential model. The number of processors is set equal to the number of layers, so each processor is responsible for one layer.
With one processor per layer, the test images stream through the five stages; at each step every processor works on a different image:

| | Conv 1 (P0) | Conv 2 (P1) | Full 1 (P2) | Full 2 (P3) | Output (P4) |
|---|---|---|---|---|---|
| step 1 | 1st image | | | | |
| step 2 | 2nd image | 1st image | | | |
| step 3 | 3rd image | 2nd image | 1st image | | |
| step 4 | 4th image | 3rd image | 2nd image | 1st image | |
| step 5 | 5th image | 4th image | 3rd image | 2nd image | 1st image |
| … | … | … | … | … | … |
| step i | ith image | … | … | … | … |

Batch size (i) = (5 × Test dataset size) / P, where P = number of processors.
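The original MPI code is not included in the slides; below is a minimal mpi4py sketch (an assumed tool, not the authors' implementation) of the schedule above: five ranks, one per layer, with MPI_Send/MPI_Recv passing activations so that every stage works on a different image at each step.

```python
# Run with, e.g.: mpirun -np 5 python pipeline.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # 0..4 -> Conv 1, Conv 2, Full 1, Full 2, Output
last = comm.Get_size() - 1
NUM_IMAGES = 10_000               # MNIST test set size

def stage_forward(x):
    """Placeholder for the layer owned by this rank (real code applies its weights)."""
    return x

predictions = []
for step in range(NUM_IMAGES):
    if rank == 0:
        # Stand-in for the (step+1)-th test image entering the pipeline.
        x = np.zeros((28, 28), dtype=np.float32)
    else:
        # Receive the activation produced by the previous stage.
        x = comm.recv(source=rank - 1, tag=step)

    y = stage_forward(x)

    if rank < last:
        comm.send(y, dest=rank + 1, tag=step)   # hand off to the next stage
    else:
        predictions.append(y)                   # Output stage collects the result
```

While rank 0 reads image i, rank 1 is still working on image i−1, and so on, which reproduces the diagonal fill of the table above. One reading of the batch-size formula, given the 20-CPU experiments below, is that four such five-stage pipelines run side by side, each fed a batch of 5 × 10,000 / 20 = 2,500 images.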
Results: Pipeline Parallelism vs. Data Parallelism
| | Sequential Code Exe Time (s) | Parallel Code Exe Time with 20 CPUs (s) | Speedup with 20 CPUs |
|---|---|---|---|
| Pipelining | 43.715269 | 5.89032 | 7.421543991 |
| Data Parallelism | 45.51055 | 2.647105 | 17.19257453 |
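Speedup is the sequential execution time divided by the parallel execution time, e.g. 43.715269 / 5.89032 ≈ 7.42 for pipelining and 45.51055 / 2.647105 ≈ 17.19 for data parallelism.

For comparison with the pipeline sketch above, here is a minimal data-parallel sketch (again using mpi4py, an assumed tool): the test set is split evenly across the processors, each rank evaluates its share, and MPI_Reduce combines the per-rank correct counts.

```python
# Run with, e.g.: mpirun -np 20 python data_parallel.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()            # e.g. 20 CPUs in the experiments above

TEST_SIZE = 10_000
chunk = TEST_SIZE // size         # each rank evaluates its own share of the test set

def predict(image):
    """Placeholder for a full forward pass through the CNN on one rank."""
    return np.random.randint(0, 10)

# Stand-ins for this rank's slice of test images and labels.
images = np.random.rand(chunk, 28, 28).astype(np.float32)
labels = np.random.randint(0, 10, size=chunk)

local_correct = sum(int(predict(img) == lab) for img, lab in zip(images, labels))

# MPI_Reduce combines the partial counts on rank 0.
total_correct = comm.reduce(local_correct, op=MPI.SUM, root=0)
if rank == 0:
    print("accuracy:", total_correct / (chunk * size))
```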
Communication Overhead
The pipeline version communicates with point-to-point MPI_Send / MPI_Recv calls between consecutive stages, so every image incurs a hand-off at each stage boundary. The data-parallel version only needs an MPI_Reduce to combine the partial results, whose cost grows as log p, where p = number of processors.
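As a rough worked example, with p = 20 processors (as in the experiments above) a tree-based MPI_Reduce takes about ⌈log₂ 20⌉ = 5 communication rounds in total, whereas the pipeline exchanges an activation over MPI_Send / MPI_Recv for every image at each of the four stage boundaries.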