GPIC: An Advanced Particle-In-Cell Code Using GPU Acceleration and its Application in Magnetic Reconnection
Shiyong Huang (黄狮勇)
Wuhan University, China
Collaborators: Qiyang Xiong, Zhigang Yuan, Kui Jiang, Jian Zhang (Wuhan University); Bharatkumar Sharma, Lvlin Kuang (NVIDIA)
Outline
Introduction of Particle-in-Cell Simulation
Fundamentals of PIC Simulation
Introduction
Development
Application
Performance
Conclusion
Introduction of Particle-in-Cell Method
General Concept of Particle-in-Cell
Common Steps For Solver:
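The slide's diagram is not reproduced in this text. In the textbook PIC cycle, which may differ in detail from the one shown, each time step (1) deposits particle charge or current onto the grid, (2) advances the fields on the grid, (3) interpolates the fields back to the particle positions, and (4) pushes the particles. The sketch below illustrates steps (1) and (3) on a periodic 1D grid with linear weighting. The function names, the 1D geometry, and the weighting order are illustrative assumptions, not GPIC's implementation.

```python
import math

def deposit_linear(positions, charge, nx, dx):
    """Scatter charge onto a periodic 1D grid with linear weights."""
    rho = [0.0] * nx
    for x in positions:
        s = x / dx
        i = math.floor(s)
        w = s - i                          # fractional distance from the left node
        rho[i % nx] += charge * (1.0 - w) / dx
        rho[(i + 1) % nx] += charge * w / dx
    return rho

def gather_linear(field, x, dx):
    """Interpolate a grid field to a particle position with the same weights."""
    nx = len(field)
    s = x / dx
    i = math.floor(s)
    w = s - i
    return field[i % nx] * (1.0 - w) + field[(i + 1) % nx] * w
```

Using the same weights for deposit and gather is a common choice because it avoids a self-force on the particle.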
Explicit Numerical Solver of Collisionless Electromagnetic Scheme
Mesh Grid: Yee staggered grid.
[Yee, 1966]
Field:
Solver: Faraday’s law and Ampere’s law in discrete form.
Particle:
Solver: Newton-Lorentz law.
Implementation: Buneman-Boris rotation.
[Boris, 1970; Buneman, 1976]
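A minimal sketch of the Boris rotation for the Newton-Lorentz equation, assuming the non-relativistic form and plain tuples for vectors (GPIC itself is written in CUDA Fortran and may use a relativistic variant). The names `boris_push` and `q_over_m` are illustrative. The velocity receives half of the electric impulse, is rotated about the magnetic field, and then receives the second half.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def boris_push(v, E, B, q_over_m, dt):
    """Advance velocity v by one step dt in fields E and B (non-relativistic)."""
    h = 0.5 * q_over_m * dt
    # First half of the electric acceleration.
    v_minus = tuple(v[k] + h * E[k] for k in range(3))
    # Magnetic rotation, built from the auxiliary vectors t and s.
    t = tuple(h * B[k] for k in range(3))
    t2 = sum(c * c for c in t)
    s = tuple(2.0 * c / (1.0 + t2) for c in t)
    vxt = cross(v_minus, t)
    v_prime = tuple(v_minus[k] + vxt[k] for k in range(3))
    vxs = cross(v_prime, s)
    v_plus = tuple(v_minus[k] + vxs[k] for k in range(3))
    # Second half of the electric acceleration.
    return tuple(v_plus[k] + h * E[k] for k in range(3))
```

With E = 0 the rotation step leaves the speed unchanged up to round-off, which is why this scheme is popular for long collisionless runs.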
Time Advance:
Solver: Leap-frog Method (Second-order in Time).
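As a hedged illustration of how the Yee staggering and the leap-frog method fit together, the sketch below advances a 1D vacuum field pair on a periodic grid: `Ey` lives on integer nodes at integer time levels, `Bz` on half nodes at half time levels. The 1D reduction, the absence of a current term, and the normalized units (c = 1 by default) are assumptions made for brevity; GPIC's examples are 2D.

```python
def yee_step(Ey, Bz, dt, dx, c=1.0):
    """One leap-frog step of the 1D vacuum Maxwell equations on a periodic grid.

    Ey[i] sits at node i; Bz[i] sits at node i + 1/2, half a time step offset.
    """
    n = len(Ey)
    # Faraday's law: dBz/dt = -dEy/dx, centered on the half nodes.
    Bz = [Bz[i] - dt / dx * (Ey[(i + 1) % n] - Ey[i]) for i in range(n)]
    # Ampere's law without current: dEy/dt = -c^2 dBz/dx, centered on the nodes.
    # Bz[i - 1] wraps around for i = 0 through Python's negative indexing.
    Ey = [Ey[i] - c * c * dt / dx * (Bz[i] - Bz[i - 1]) for i in range(n)]
    return Ey, Bz
```

Because each field is updated from centered differences of the other, the scheme is second-order in both space and time, and it stays stable in 1D as long as c·dt ≤ dx.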
Traditional HPC PIC Simulation
High-Performance Computing of PIC Simulation – MPI (Message Passing Interface)
CPU Index: [x, y] [Linear] (columns: X direction; rows: Y direction)
[0, 0] [0]   [1, 0] [1]   [2, 0] [2]   [3, 0] [3]
[0, 1] [4]   [1, 1] [5]   [2, 1] [6]   [3, 1] [7]
[0, 2] [8]   [1, 2] [9]   [2, 2] [A]   [3, 2] [B]
The global simulation area is decomposed into several parts.
Each CPU handles the computation of its corresponding subarea.
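The index table above follows the rule linear = x + 4·y for a 4×3 arrangement of CPUs. A small sketch of that mapping and of the cell range each rank would own is given below; the function names and the even block split of the global grid are assumptions for illustration, not the layout of any particular MPI code.

```python
def coords_to_rank(x, y, npx):
    """Linear rank of the process at position (x, y), x varying fastest."""
    return x + npx * y

def rank_to_coords(rank, npx):
    """Inverse mapping: (x, y) position of a linear rank."""
    return rank % npx, rank // npx

def subdomain(rank, npx, npy, nx_global, ny_global):
    """Cell ranges (x0, x1, y0, y1) owned by a rank; end indices are exclusive."""
    px, py = rank_to_coords(rank, npx)
    x0 = px * nx_global // npx
    x1 = (px + 1) * nx_global // npx
    y0 = py * ny_global // npy
    y1 = (py + 1) * ny_global // npy
    return x0, x1, y0, y1
```

For example, `coords_to_rank(2, 2, 4)` gives 10, which is the entry labeled A in the table.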
Supercomputer: first, it is expensive; second, it is too slow!
Development of PIC Simulation Using GPU Computing
HPC PIC with GPU Computing
General Computing of GPU Device – Thread & Block
Simulation domain
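The slide's figure is not reproduced here. As a rough illustration of how threads and blocks can tile a 2D simulation domain, the sketch below emulates the usual CUDA index arithmetic on the host, using the 0-based CUDA C convention (CUDA Fortran indices are 1-based). The one-thread-per-cell mapping, the 16×16 block size, and the function names are assumptions for illustration only.

```python
def launch_config(nx, ny, block=(16, 16)):
    """Number of blocks per direction needed to cover an nx-by-ny domain."""
    grid = ((nx + block[0] - 1) // block[0],
            (ny + block[1] - 1) // block[1])
    return grid, block

def cells_for_block(block_idx, block, nx, ny):
    """Cells handled by one block: global index = blockIdx * blockDim + threadIdx."""
    cells = []
    for ty in range(block[1]):
        for tx in range(block[0]):
            i = block_idx[0] * block[0] + tx
            j = block_idx[1] * block[1] + ty
            if i < nx and j < ny:          # guard threads that fall outside the domain
                cells.append((i, j))
    return cells
```

The bounds check matters whenever the domain size is not a multiple of the block size, since the last block in each direction then contains idle threads.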
Scheme Design of PIC on GPU
Multiple Threads Handling a Single Grid Cell
Three-Level Data Exchange Strategy
Multi-GPU Computing Pattern
Each GPU holds identical field data and a different subset of the particle data.
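One way to picture this pattern, sketched here on the host in plain Python: the particles are split among the devices, each device deposits the current of its own particles onto a full-size grid copy, and a sum reduction across devices leaves every copy identical again. The round-robin split, the nearest-grid-point deposit, and the explicit all-reduce step are assumptions for illustration; the slides do not spell out how GPIC synchronizes its field copies.

```python
def split_particles(particles, n_gpus):
    """Round-robin split of a particle list into one sublist per device."""
    return [particles[g::n_gpus] for g in range(n_gpus)]

def deposit_current(particles, nx):
    """Nearest-grid-point deposit of (position, current) pairs on a periodic grid."""
    j = [0.0] * nx
    for x, jx in particles:
        j[int(x) % nx] += jx
    return j

def all_reduce_sum(partials):
    """Sum the per-device grids and give every device a copy of the total."""
    total = [sum(vals) for vals in zip(*partials)]
    return [list(total) for _ in partials]
```

After the reduction every device can run the same field update on the same data, so no field halo exchange between devices is needed in this picture.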
Summary of GPIC (GPU-PIC) Program
Computing Platform: NVIDIA HPC SDK
Language: CUDA Fortran (.f90, .f08)
Math Libraries: Thrust, cuRAND, cuTENSOR
Communication Libraries: HPC-X, NCCL (NVIDIA Collective Communications Library)
Compiler: nvfortran/mpif90
Supported Hardware: All NVIDIA Series GPUs (Capability > 2.5, CUDA Version > 6.0)
Examples of GPIC Simulations
Magnetic Reconnection, Grid: 9600x3200, PPC: 320
Perpendicular Shock, Grid: 28800x2000, PPC: 160
Plasma Turbulence, Grid: 2400x2400, PPC: 3200
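To give a sense of scale, the macro-particle count of each run can be estimated as grid cells times particles per cell. The snippet assumes PPC is the total number of particles per cell; if it is quoted per species, the totals scale with the number of species.

```python
# (nx, ny, particles per cell) as listed for each example run.
runs = {
    "Magnetic Reconnection": (9600, 3200, 320),
    "Perpendicular Shock": (28800, 2000, 160),
    "Plasma Turbulence": (2400, 2400, 3200),
}

def total_particles(nx, ny, ppc):
    """Macro-particle count, assuming ppc is the total per cell."""
    return nx * ny * ppc

totals = {name: total_particles(*cfg) for name, cfg in runs.items()}
for name, count in totals.items():
    print(f"{name}: {count:.3e} particles")
```

Under that assumption each run involves roughly 10^10 macro-particles.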
Performance of GPIC
Peak Performance of Single GPU Device
Time Per 10,000 Iterations – Relative Performance
CPU Only: 1X | V100: 122X | A100: 724X
CPU Only: Intel Xeon Gold 6248 @ 2.50 GHz | V100: NVIDIA TESLA V100-SXM2-16GB | A100: NVIDIA A100-SXM4-40GB
Acceleration Rate on Multiple GPU Devices
Internal Link: NVLink, 600 GB/s; External Link: NVIDIA ConnectX-6, InfiniBand EDR, 100 Gb/s
Computing Speedup
Up to 724 times faster than the CPU-based PIC code, at 5% of its cost.
Application in Magnetic Reconnection
Instruments and Methods
MMS Spacecraft Observations
[Burch et al., 2016]
GPIC Simulation Program
[Xiong, Huang, et al., 2023, 2024]
Data Resolutions
Crater Structure Location
Application in Magnetic Reconnection (Ⅰ) – Crater Structure behind RF
Simulation results are highly consistent with observations!
Formation of Crater Structure
Evolving Process of Crater Structure in Two-Dimensional Presentation:
Appearance of Turbulent Outflow
Application in Magnetic Reconnection (Ⅱ) – Turbulent Reconnection Outflow
Status of Turbulent Outflow Under Different Guide Field Levels
Energy Conversion in Turbulent Outflow
Energy Conversion and Magnetic Topology in Turbulent Outflow
[Figure panel labels: O-type and X-type magnetic topologies]
Evidence From MMS Observations
(122 Events are Captured.)
[Figure panel labels: O-type and X-type magnetic topologies]
Summary
References:
[1] S. Y. Huang, Q. Y. Xiong, Z. G. Yuan, et al. (2024), Crater Structure Behind Reconnection Front, Geophys. Res. Lett., 51, e2023GL106581.
[2] S. Y. Huang, J. Zhang, Q. Y. Xiong, Z. G. Yuan, et al. (2023), Kinetic-scale Topological Structures Associated with Energy Dissipation in the Turbulent Reconnection Outflow, Astrophys. J., 958, 189, https://doi.org/10.3847/1538-4357/acf847
[3] Q. Y. Xiong, S. Y. Huang, J. Zhang, et al. (2024), Guide Field Dependence of Energy Conversion and Magnetic Topologies in Reconnection Turbulent Outflow, Geophys. Res. Lett., 51, e2024GL109356.
[4] Q. Y. Xiong, S. Y. Huang, Z. G. Yuan, et al. (2024), GPIC: A Set of High-Efficiency CUDA Fortran Code Using GPU for Particle-in-cell simulation in space physics, Comput. Phys. Commun., 295, 108994.
[5] Q. Y. Xiong, S. Y. Huang, Z. G. Yuan, et al. (2023), A Scheme of Full Kinetic Particle-in-cell Algorithms for GPU Acceleration Using CUDA Fortran Programming, Astrophys. J. Suppl. Ser., 264, 3.
Thank You!