1 of 31

High-performance Seismology CyberTraining 2023: Cross-correlation of Distributed Acoustic Sensing on the Cloud

Yiyu Ni1*, Marine A. Denolle1, Chengxin Jiang2

1Department of Earth and Space Sciences, University of Washington

2Research School of Earth Sciences, the Australian National University

*niyiyu@uw.edu

May 10, 2023

2 of 31

Seismic Dense Array


  • Several thousand stations
  • Several-hundred-meter spacing
  • Several months of recording

Long Beach Array

Lin et al., 2013

LArge-n Seismic Survey in Oklahoma (LASSO)

Dougherty et al., 2019

3 of 31

Distributed Acoustic Sensing


Zhan, 2019

4 of 31

Distributed Acoustic Sensing


Zhan, 2019; Nayak et al., 2021

  • Several thousand channels
  • Several months of recording
  • Several-meter spacing
  • Much less fieldwork

5 of 31

Distributed Acoustic Sensing


Zhan, 2019; Nayak et al., 2021

6 of 31

Distributed Acoustic Sensing


Zhan, 2019; Nayak et al., 2021

7 of 31

Ambient Noise Seismology + Distributed Acoustic Sensing


DAS data:

  • Big data → better cyber-infrastructure and formats

  • Large-N → efficient tools

  • Advanced computing infrastructure

8 of 31

Distributed Acoustic Sensing


Modified from Spica et al., 2023

Data volume rate: 10 to 1600 GB per day

6-7 years of DAS data > the entire 31-year IRIS archive

9 of 31

DAS data format


Structure of a Sintela Onyx DAS HDF5 file

Groups: Raw, Acquisition, Custom (plus Attributes)

Datasets:

  • RawData (ncha, nt, int/float32)
  • RawDataTime (nt, int64)
  • GpBits (nt, uint8)
  • GpsStatus (nt, uint8)
  • PpsOffset (nt, uint32)
  • SampleCount (nt, int64)

nt: number of time samples; ncha: number of channels
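As a sketch of what working with such a file looks like, the snippet below builds a tiny HDF5 file mimicking the dataset names on this slide and reads it back with h5py. The group path `Acquisition/Raw` and the attribute name are assumptions for illustration; real Onyx files may nest datasets differently.

```python
import h5py
import numpy as np

# Build a toy file mimicking the slide's layout (group paths are
# assumptions; real Sintela Onyx files may organize groups differently).
ncha, nt = 4, 100
with h5py.File("demo_onyx.h5", "w") as f:
    grp = f.create_group("Acquisition/Raw")
    grp.create_dataset("RawData", data=np.zeros((ncha, nt), dtype=np.int32))
    grp.create_dataset("RawDataTime", data=np.arange(nt, dtype=np.int64))
    f.attrs["ChannelSpacing_m"] = 4.78  # user-defined attribute (hypothetical)

# Reading back: HDF5 is self-describing, so shape and dtype come for free.
with h5py.File("demo_onyx.h5", "r") as f:
    raw = f["Acquisition/Raw/RawData"][:]
    t = f["Acquisition/Raw/RawDataTime"][:]
print(raw.shape, t.dtype)  # (4, 100) int64
```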

10 of 31

DAS data format


[Figure: a continuous channel × time record split along the time axis into one file per minute (00:00:00, 00:01:00, …, 23:59:00); a sub-array is a subset of channels spanning all files]

  • 1 minute: 1 file (50 MB), shape [nCha, fs×60]

  • 1 day: 1440 files (70 GB), shape [nCha, fs×60×60×24]

  • 1 month: 44640 files (2.2 TB), shape [nCha, fs×60×60×24×31]
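These volumes can be sanity-checked with back-of-envelope arithmetic; the snippet assumes roughly 2000 channels at 100 Hz stored as 4-byte samples, which is close to (but not exactly) the deployment described later.

```python
# Rough check of the per-minute file size and monthly volume,
# assuming ~2000 channels at 100 Hz with 4-byte (int32) samples.
ncha, fs, bps = 2000, 100, 4
mb_per_minute = ncha * fs * 60 * bps / 1e6   # ~48 MB, matching "50 MB"
files_per_day = 24 * 60                      # 1440 one-minute files
files_per_month = files_per_day * 31         # 44640 files
tb_per_month = mb_per_minute * files_per_month / 1e6
print(mb_per_minute, files_per_month, round(tb_per_month, 1))  # 48.0 44640 2.1
```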

11 of 31

DAS data format


[Figure: byte layout (0x00000 to 0xFFFFF) of a one-minute DAS HDF5 file, with the Raw, Acquisition, and Custom groups, the RawData (ncha, nt, int32), RawDataTime (nt, int64), GpBits (nt, uint8), GpsStatus (nt, uint8), PpsOffset (nt, uint32), and SampleCount (nt, int64) datasets, and Attributes scattered through the file]

Pros:

  • Self-describing hierarchical structure
  • Supports compression
  • Allows user-defined attributes
  • No upper limit on file size

Cons:

  • Complex structure on disk
  • Hard to serve range requests (sub-array requests)
  • Expensive (or prohibitive) to make a full copy of the data

12 of 31

Object Storage for DAS


  1. Convert HDF5 into a cloud-optimized format

  2. Deploy MinIO as a local object storage

13 of 31

Cloud-optimized format


[Figure: the per-minute files (00:00:00, 00:01:00, …, 23:59:00) rechunked into data chunks and data tiles spanning both the channel and time axes]

  • Compression

  • Key-value attributes

  • Self-describing hierarchical structure

  • Sub-array request

14 of 31

Chunking


[Figure: one day of data (00:00:00, 00:01:00, …, 23:59:00) divided into data chunks along both the channel and time axes]

15 of 31

Chunking


Small chunk:

  • Slow writing but fast reading
  • Too many small objects

Large chunk:

  • Fast writing
  • More network overhead when reading

Optimize based on:

  • Data set
  • Storage hardware
  • Reading pattern
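The trade-off can be made concrete by counting how many objects a given read touches; the numbers below are illustrative arithmetic, not a benchmark of the actual store.

```python
import math

# For a (2000 x 6000) one-minute array, count the chunk objects a
# 100-channel, full-minute read must fetch under two chunk shapes.
ncha, nt = 2000, 6000

def chunks_touched(chunk_cha, chunk_nt, read_cha, read_nt):
    # Upper bound, assuming the read is aligned to chunk boundaries.
    return math.ceil(read_cha / chunk_cha) * math.ceil(read_nt / chunk_nt)

small = chunks_touched(10, 600, 100, nt)     # many tiny objects per read
large = chunks_touched(1000, 6000, 100, nt)  # one object, mostly wasted bytes
print(small, large)  # 100 1
```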

16 of 31

Object Storage: MinIO

  • Open-source

  • Easy to deploy

  • Good scalability

  • Access control

  • S3-compatible

17 of 31

[Figure: local object storage receiving data queries from Group A and from Groups B and C in other departments]

A PI-level data server that serves the group, other departments, and other institutions.

18 of 31

[Figure: the same local storage also answering remote data queries from Institutions D, E, and F]

A PI-level data server that serves the group, other departments, and other institutions.

19 of 31

[Figure: local storage mirrored to cloud storage; local groups and remote institutions query either copy]

An S3-compatible storage for development and data sharing.

20 of 31

[Figure: cloud object storage serving as the data center for both local and remote data queries]

S3 as a data center.

21 of 31


NoisePy for DAS

LArge-n Seismic Survey in Oklahoma (LASSO)

Dougherty et al., 2019

N stations → N(N+1)/2 station pairs

N channels → N(N+1)/2 channel pairs
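The pair count follows from choosing unordered pairs with repetition: each pair is computed once, autocorrelations included.

```python
# N sensors -> N(N+1)/2 unique pairs, counting each autocorrelation once.
def n_pairs(n: int) -> int:
    return n * (n + 1) // 2

# For the 2089-channel SeaDAS-N array described on the next slides:
print(n_pairs(2089))  # 2183005, i.e. ~2.18 million channel pairs
```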

22 of 31

Application: SeaDAS-N


  • UW Seattle - Bothell
  • April 2022 – March 2023 (11 months)

  • 2089 channels (4.78 m spacing) @ 100 Hz
  • Raw data in minute-long HDF5 files (72 GB/day)

UW - Photonic Sensing Facility

https://psf.uw.edu

23 of 31


NoisePy for DAS

source channel 0 → receiver channels 0–2088: 2089 channel pairs (xcorr)

source channel 200 → receiver channels 200–2088: 1889 channel pairs (xcorr)

source channel 1000 → receiver channels 1000–2088: 1089 channel pairs (xcorr)

source channel 1800 → receiver channels 1800–2088: 289 channel pairs (xcorr)

source channel 2088 → receiver channel 2088: 1 channel pair (auto-corr)

Total: 2089 × 2090 / 2 = 2.18 million correlations
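The virtual-source loop above can be sketched in NumPy: compute each channel's FFT once, then for source channel s correlate against all receivers r ≥ s, yielding N(N+1)/2 correlations. Preprocessing (one-bit normalization, whitening) is omitted; this is a toy version, not NoisePy's implementation.

```python
import numpy as np

def all_pairs_xcorr(data, nfft=None):
    """Cross-correlate every channel pair (s, r) with r >= s via FFT."""
    ncha, nt = data.shape
    nfft = nfft or 2 * nt
    spec = np.fft.rfft(data, nfft)       # one FFT per channel, reused
    ccfs = {}
    for s in range(ncha):                # virtual source channel
        cc = np.fft.irfft(np.conj(spec[s]) * spec[s:], nfft)
        for k, r in enumerate(range(s, ncha)):
            ccfs[(s, r)] = cc[k]         # (s, s) is the autocorrelation
    return ccfs

data = np.random.randn(8, 256)
ccfs = all_pairs_xcorr(data)
print(len(ccfs))  # 8*9/2 = 36 pairs
```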

24 of 31

Application: SeaDAS-N - Client to query data

[Figure: querying a 600-channel sub-array of the cable]

25 of 31

Application: SeaDAS-N - Cross-correlation on AWS Batch

  • Containerized NoisePy4DAS

  • One-bit normalization, spectral whitening

  • One month of SeaDAS-N: 496 vCPUs, 12 h, < $100 of EC2

  • Chunked Zarr data: ~$1.3 per day

  • Hourly stacked CCFs: ~$1.8 per day

[Figure: jobs 1–12 dispatched from local to cloud as an AWS Batch job array]
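Inside an AWS Batch array job, each child task receives its index through the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable. A common pattern, sketched here with an illustrative start date, is to map that index to one day of the deployment.

```python
import os
from datetime import date, timedelta

# Each array-job child picks its day from its Batch-provided index.
# (Submitting with arrayProperties={"size": 31} yields indices 0..30.)
idx = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
day = date(2022, 4, 1) + timedelta(days=idx)   # illustrative start date
print(f"array index {idx} -> cross-correlate {day.isoformat()}")
```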

26 of 31

Application: SeaDAS-N - Cross-correlation on AWS Batch

  • Virtual source gather
  • Hourly linear stacking
  • ~100 billion correlations

27 of 31

Application: SeaDAS-N - Cross-correlation on AWS Batch

  • Virtual source gather
  • Hourly linear stacking
  • ~100 billion correlations

28 of 31

Thanks

Follow project on GitHub

https://github.com/niyiyu/DASstore

https://github.com/niyiyu/NoisePy4DAS-SeaDAS/

29 of 31

Hands-on Google Colab

https://colab.research.google.com/

File -> Open notebook

-> GitHub section

-> URL: https://github.com/niyiyu/NoisePy4DAS-SeaDAS/

-> select the example notebook

30 of 31

Hands-on through Docker on AWS EC2

1. Launch an EC2 instance in the us-west-2 (Oregon) region, connect to the instance, and install the required packages (just like yesterday).

2. Run these commands in the instance to pull the image:

sudo systemctl start docker

sudo docker pull ghcr.io/niyiyu/noisepy4das-seadas:latest

3. Run the Jupyter notebook in the instance

sudo docker run --rm -p 8888:8888 ghcr.io/niyiyu/noisepy4das-seadas jupyter notebook --ip 0.0.0.0

4. Go to the Jupyter notebook through your local browser.

Tips to launch instances

https://seisscoped.org/HPS/softhardware/AWS_101.html

31 of 31

Hands-on through Docker on local (anyone?)

1. Launch an EC2 instance in the us-west-2 (Oregon) region, connect to the instance, and install the required packages (just like yesterday).

2. Run these commands in the instance to pull the image:

sudo systemctl start docker

sudo docker pull ghcr.io/niyiyu/noisepy4das-seadas:latest

3. Run the Jupyter notebook in the instance

sudo docker run --rm -p 8888:8888 ghcr.io/niyiyu/noisepy4das-seadas jupyter notebook --ip 0.0.0.0

4. Go to the Jupyter notebook through your local browser.

Tips to launch instances

https://seisscoped.org/HPS/softhardware/AWS_101.html