1 of 22


GPIC: An Advanced Particle-In-Cell Code Using GPU Acceleration and its Application in Magnetic Reconnection

Shiyong Huang (黄狮勇)

Wuhan University, China

Collaborators: Qiyang Xiong, Zhigang Yuan, Kui Jiang, Jian Zhang (Wuhan University);

Bharatkumar Sharma, Lvlin Kuang (NVIDIA)

2 of 22


Outline

  • Introduction of PIC

  • Development of GPIC

  • Performance of GPIC

  • Application in MR (Magnetic Reconnection)

  • Conclusions

3 of 22


Introduction of Particle-in-Cell Simulation

4 of 22

Fundamentals of PIC Simulation


Introduction of Particle-in-Cell Method

General Concept of Particle-in-Cell

Common Steps of the Solver:

  1. Particles are accelerated by the local fields;
  2. Currents/charges are deposited onto the grid by the particles;
  3. Fields are advanced according to the governing relations.
  • A finite spatial domain is meshed at a chosen grid resolution for the fields;
  • A finite number of macro-particles represents the plasma of a given density in real space;
  • The system evolves self-consistently in time, following the physical laws.
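The three steps above can be sketched as a minimal one-dimensional electrostatic PIC cycle (an illustrative Python/NumPy sketch, not the GPIC implementation; linear weighting and an FFT Poisson solve are choices made here for brevity):

```python
import numpy as np

def pic_step(x, v, E_grid, dx, dt, qm, L):
    """One cycle of the PIC loop (1D electrostatic sketch):
    gather fields -> push particles -> deposit charge -> solve field."""
    ng = E_grid.size
    # Step 1: particles are forced by the local field (linear gather).
    g = x / dx
    i = np.floor(g).astype(int) % ng
    w = g - np.floor(g)
    E_p = (1.0 - w) * E_grid[i] + w * E_grid[(i + 1) % ng]
    v = v + qm * E_p * dt                      # Newton's law, electric force only
    x = (x + v * dt) % L                       # periodic domain
    # Step 2: charges are contributed by the particles (linear deposit).
    g = x / dx
    i = np.floor(g).astype(int) % ng
    w = g - np.floor(g)
    rho = np.zeros(ng)
    np.add.at(rho, i, 1.0 - w)
    np.add.at(rho, (i + 1) % ng, w)
    # Step 3: solve the field from the charge (spectral Poisson solve).
    k = 2.0 * np.pi * np.fft.fftfreq(ng, d=dx)
    k[0] = 1.0                                 # k = 0 mode is removed below anyway
    phi_hat = np.fft.fft(rho - rho.mean()) / k**2
    E_grid = np.real(np.fft.ifft(-1j * k * phi_hat))
    return x, v, E_grid
```

A uniformly loaded plasma stays field-free under this cycle, which is a quick self-consistency check of the gather/deposit/solve chain.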

5 of 22

Fundamentals of PIC Simulation


Explicit Numerical Solver for the Collisionless Electromagnetic Scheme

Mesh Grid: Yee staggered grid.

[Yee, 1966]

Field:

Solver: Faraday’s law and Ampere’s law in discrete form.
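The discrete update equations did not survive this export; in a common normalized form (c = 1; an assumption about GPIC's exact units), Faraday's and Ampere's laws advanced with time-staggered leapfrog on the Yee grid read:

```latex
% Semi-discrete Maxwell update, normalized units (assumed), leapfrog staggering
\begin{aligned}
\mathbf{B}^{\,n+1/2} &= \mathbf{B}^{\,n-1/2} - \Delta t\,\nabla\times\mathbf{E}^{\,n}
  && \text{(Faraday's law)} \\
\mathbf{E}^{\,n+1} &= \mathbf{E}^{\,n}
  + \Delta t\left(\nabla\times\mathbf{B}^{\,n+1/2} - \mathbf{J}^{\,n+1/2}\right)
  && \text{(Ampere's law)}
\end{aligned}
```

The staggering places B at half-integer time levels and E at integer levels, which is what makes the scheme second-order in time.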

 

 

Particle:

Solver: Newton-Lorentz law.

 

Implementation: Buneman-Boris rotation.

[Boris, 1970; Buneman, 1976]

Time Advance:

Solver: Leap-frog method (second-order accurate in time).
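The particle advance can be sketched as follows: the Newton-Lorentz equation dv/dt = (q/m)(E + v x B) is integrated with the Buneman-Boris rotation, which splits the update into a half electric kick, a magnetic rotation, and a second half kick (a non-relativistic Python sketch; GPIC's actual CUDA Fortran kernel is not shown here):

```python
import numpy as np

def boris_push(v, E, B, qm, dt):
    """One Buneman-Boris velocity update for the Newton-Lorentz equation:
    half electric kick, magnetic rotation, half electric kick."""
    v_minus = v + 0.5 * qm * dt * E            # first half electric kick
    t = 0.5 * qm * dt * B                      # rotation vector
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)   # rotation, step 1
    v_plus = v_minus + np.cross(v_prime, s)    # rotation, step 2
    return v_plus + 0.5 * qm * dt * E          # second half electric kick
```

The rotation step conserves kinetic energy exactly when E = 0, which is the property that makes the Boris scheme the standard pusher in PIC codes.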

6 of 22


Traditional HPC PIC Simulation


High-Performance Computing of PIC Simulation – MPI (Message Passing Interface)

[Figure] The global simulation area is decomposed into a 4 x 3 array of subareas (X direction by Y direction); CPU index [x, y] maps to a linear rank (shown in hexadecimal, x varying fastest):

    y\x     0        1        2        3
     0   [0,0]=0  [1,0]=1  [2,0]=2  [3,0]=3
     1   [0,1]=4  [1,1]=5  [2,1]=6  [3,1]=7
     2   [0,2]=8  [1,2]=9  [2,2]=A  [3,2]=B

  • Field-Decomposition Method:

Each CPU handles the computation of its corresponding subarea.
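The [x, y] to linear-rank mapping in the figure can be sketched as follows (illustrative Python; nx = 4 matches the 4 x 3 example above):

```python
def coords_to_rank(x, y, nx):
    """Linear MPI rank of the subarea at column x, row y (x varies fastest)."""
    return x + nx * y

def rank_to_coords(rank, nx):
    """Inverse mapping: linear rank back to [x, y] subarea coordinates."""
    return rank % nx, rank // nx
```

For example, subarea [3, 2] is rank 11 (hex B), matching the figure.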


First, it is expensive; second, it is too slow!

7 of 22


Development of PIC Simulation Using GPU Computing

8 of 22


HPC PIC of GPU Computing


General Computing Model of a GPU Device – Threads & Blocks

  • A single GPU device contains numerous blocks, and each block contains numerous threads;

  • Each thread can execute computing instructions independently.

Simulation domain
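The thread/block execution model above can be emulated serially (a Python sketch of the model only, not actual CUDA; the kernel name is illustrative):

```python
def launch(kernel, n, block_dim, *args):
    """Serial emulation of a GPU launch: every (block, thread) pair runs the
    same kernel body independently on its own global index."""
    n_blocks = (n + block_dim - 1) // block_dim   # ceil-divide the domain
    for block_id in range(n_blocks):
        for thread_id in range(block_dim):
            i = block_id * block_dim + thread_id  # global thread index
            if i < n:                             # guard threads past the end
                kernel(i, *args)

def scale_kernel(i, field, factor):
    """Example kernel: each thread updates one cell of the domain."""
    field[i] = field[i] * factor
```

On a real GPU the two loops run concurrently across the hardware; the global-index formula and the out-of-range guard are the same.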

9 of 22


Scheme Design of PIC on GPU


Multiple Threads Dealing With a Single Grid

Three-Level Data Exchange Strategy

10 of 22


Scheme Design of PIC on GPU


Multi-GPU Computing Pattern

  • Field-Duplication Method:

Each GPU holds identical field data and a different portion of the particle data.
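The field-duplication pattern can be sketched as follows (illustrative Python; in a real multi-GPU run the summation would be an NCCL/MPI allreduce and the per-device loop would run concurrently):

```python
import numpy as np

def deposit_currents(particle_chunks, ng):
    """Each device deposits current from only its own particle chunk onto a
    full-size grid; the partial grids are then summed so every device ends
    up holding the identical total field source."""
    partial = [np.zeros(ng) for _ in particle_chunks]
    for dev, chunk in enumerate(particle_chunks):   # one chunk per device
        for cell, current in chunk:                 # (cell index, current)
            partial[dev][cell] += current
    total = np.sum(partial, axis=0)                 # stands in for an allreduce
    return [total.copy() for _ in particle_chunks]  # identical on all devices
```

The design choice here is that field memory is duplicated to avoid halo exchanges, while the (much larger) particle arrays are what get partitioned.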

11 of 22


Scheme Design of PIC on GPU


Summary of GPIC (GPU-PIC) Program

Computing Platform:

NVIDIA HPC SDK

Language:

CUDA Fortran

(.f90, .f08)

Math Library:

Thrust, cuRand, cuTensor

Communication Library:

HPC-X, NCCL (NVIDIA Collective Communication Library)

Compiler:

nvfortran/mpif90

Supported Hardware:

All NVIDIA GPU series (compute capability > 2.5, CUDA version > 6.0)

12 of 22


Examples of GPIC Simulations


Magnetic Reconnection, Grid: 9600x3200, PPC: 320

Perpendicular Shock, Grid: 28800x2000, PPC: 160

Plasma Turbulence, Grid: 2400x2400, PPC: 3200
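The run sizes above imply very large macro-particle counts; a back-of-the-envelope sketch (the two-species factor, ions plus electrons, is an assumption here, since the slide lists only grid size and PPC):

```python
def macro_particles(nx, ny, ppc, species=2):
    """Total macro-particle count: cells x particles-per-cell x species.
    species=2 (ions + electrons) is an illustrative assumption."""
    return nx * ny * ppc * species

# Magnetic reconnection run above: 9600 x 3200 cells at 320 PPC
n_mr = macro_particles(9600, 3200, 320)   # on the order of 1e10 macro-particles
```

Counts at this scale are what make GPU memory bandwidth, rather than field arithmetic, the dominant cost in PIC.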

13 of 22


Performance of GPIC


Peak Performance of Single GPU Device

Time per 10,000 iterations – relative performance:

    CPU Only:   1X
    V100:     122X
    A100:     724X

CPU Only: Intel Xeon Gold 6248 @ 2.50 GHz | V100: NVIDIA TESLA V100-SXM2-16GB | A100: NVIDIA A100-SXM4-40GB

Acceleration Rate on Multiple GPU Devices

Internal link: NVLink, 600 GB/s; external link: NVIDIA ConnectX-6, InfiniBand EDR, 100 Gb/s

Computing Speedup

Up to 724 times faster than the CPU-based PIC, at about 5% of the cost of the previous CPU-based runs.

14 of 22


Application in Magnetic Reconnection

15 of 22


Instruments and Methods


MMS Spacecraft Observations

[Burch et al., 2016]

GPIC Simulation Program

[Xiong, Huang, et al., 2023, 2024]

Data Resolutions

  • FGM: 128 Hz;
  • EDP: 8192 Hz;
  • FPI: 30 ms for electrons; 150 ms for ions.

 

 

16 of 22


Crater Structure Location


Application in Magnetic Reconnection – Crater Structure Behind the RF (Reconnection Front)

  • The crater structure is located between the outer EDR (electron diffusion region) and the RF;

  • All four MMS spacecraft cross the crater structure successively, mainly along the N direction.

 

Simulation results are highly consistent with observations!

17 of 22


Formation of Crater Structure


Application in Magnetic Reconnection – Crater Structure Behind the RF

Evolution of the Crater Structure in a Two-Dimensional Presentation:

 

18 of 22


Appearance of Turbulent Outflow


Application in Magnetic Reconnection – Turbulent Reconnection Outflow

Status of the Turbulent Outflow Under Different Guide-Field Levels

  • Four runs are performed with different guide-field levels;

  • Under a larger guide field, the reconnection outflow becomes more chaotic and more intense currents are generated.

 

19 of 22


Energy Conversion in Turbulent Outflow


Application in Magnetic Reconnection – Turbulent Reconnection Outflow

Energy Conversion and Magnetic Topology in the Turbulent Outflow

  • The turbulent outflow with a larger guide field attains higher PVI and current values, associated with stronger energy conversion;
  • Using the geometrical invariants, it is found that a larger guide field promotes the generation of O-type topologies.
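PVI here is the standard Partial Variance of Increments measure of turbulent intermittency; a minimal sketch of the standard definition (not necessarily the exact pipeline used in these runs):

```python
import numpy as np

def pvi(b, lag):
    """Partial Variance of Increments of a field series:
    PVI = |dB| / sqrt(<|dB|^2>), dB = B(t + tau) - B(t).
    Large PVI values flag intense, intermittent current structures."""
    db = b[lag:] - b[:-lag]                  # increments at lag tau
    if db.ndim > 1:                          # vector field: take magnitudes
        mag = np.linalg.norm(db, axis=1)
    else:
        mag = np.abs(db)
    return mag / np.sqrt(np.mean(mag**2))
```

By construction the PVI series has unit rms, so thresholds such as PVI > 3 pick out events far above the average fluctuation level.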

[Panels: O-type vs. X-type topology identification for each of the four guide-field runs]

20 of 22


Energy Conversion in Turbulent Outflow


Application in Magnetic Reconnection – Turbulent Reconnection Outflow

Evidence From MMS Observations

(122 events captured)

  • The observations are well consistent with the simulation results.

[Panels: O-type and X-type topologies identified in the MMS observations]

21 of 22



Summary

  • GPU computing can be applied to fully kinetic PIC simulation (GPIC) and dramatically accelerates the computation.
  • A novel crater structure, caused by the high-speed electron outflow, is found behind the reconnection front in both GPIC simulations and in-situ observations.
  • Both simulations and observations show that a larger guide field promotes the generation of O-type topology structures and energy conversion in the turbulent outflow.

References:

[1] S. Y. Huang, Q. Y. Xiong, Z. G. Yuan, et al. (2024), Crater structure behind reconnection front, Geophys. Res. Lett., 51, e2023GL106581.

[2] S. Y. Huang, J. Zhang, Q. Y. Xiong, Z. G. Yuan, et al. (2023), Kinetic-scale topological structures associated with energy dissipation in the turbulent reconnection outflow, Astrophys. J., 958, 189, https://doi.org/10.3847/1538-4357/acf847.

[3] Q. Y. Xiong, S. Y. Huang, J. Zhang, et al. (2024), Guide field dependence of energy conversion and magnetic topologies in reconnection turbulent outflow, Geophys. Res. Lett., 51, e2024GL109356.

[4] Q. Y. Xiong, S. Y. Huang, Z. G. Yuan, et al. (2024), GPIC: A set of high-efficiency CUDA Fortran code using GPU for particle-in-cell simulation in space physics, Comput. Phys. Commun., 295, 108994.

[5] Q. Y. Xiong, S. Y. Huang, Z. G. Yuan, et al. (2023), A scheme of full kinetic particle-in-cell algorithms for GPU acceleration using CUDA Fortran programming, Astrophys. J. Suppl. Ser., 264, 3.

22 of 22


Thank You !