High-performance Seismology CyberTraining 2023
Cross-correlation of Distributed Acoustic Sensing on the Cloud
Yiyu Ni1*, Marine A. Denolle1, Chengxin Jiang2
1Department of Earth and Space Sciences, University of Washington
2Research School of Earth Sciences, the Australian National University
*niyiyu@uw.edu
May 10, 2023
Seismic Dense Array
Long Beach Array
Lin et al., 2013
LArge-n Seismic Survey in Oklahoma (LASSO)
Dougherty et al., 2019
Distributed Acoustic Sensing
Zhan, 2019
Distributed Acoustic Sensing
Zhan, 2019; Nayak et al., 2021
Ambient Noise Seismology + Distributed Acoustic Sensing
DAS data
Distributed Acoustic Sensing
Modified from Spica et al., 2023
Data volume rate: 10 to 1600 GB per day
6-7 years of DAS data > 31 years of the IRIS archive (by volume)
DAS data format
Structure of a Sintela Onyx DAS HDF5 file
Groups: Acquisition, Raw, Custom
Attributes
Datasets:
RawData (ncha, nt, int/float32)
RawDataTime (nt, int64)
GpBits (nt, uint8)
GpsStatus (nt, uint8)
PpsOffset (nt, uint32)
SampleCount (nt, int64)
nt: number of time samples
ncha: number of channels
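As a rough sanity check on this layout, the uncompressed footprint of each dataset follows directly from its shape and dtype. The sketch below is illustrative only: it assumes ncha = 2089 and nt = 6000 (one minute at a hypothetical 100 Hz sampling rate, which the slide does not state), takes RawData as int32, and ignores HDF5 metadata and compression.

```python
# Bytes per element for the dtypes listed on the slide.
DTYPE_BYTES = {"int32": 4, "float32": 4, "int64": 8, "uint32": 4, "uint8": 1}


def dataset_sizes(ncha, nt):
    """Return the uncompressed size in bytes of each dataset, keyed by name."""
    layout = {
        "RawData": ((ncha, nt), "int32"),
        "RawDataTime": ((nt,), "int64"),
        "GpBits": ((nt,), "uint8"),
        "GpsStatus": ((nt,), "uint8"),
        "PpsOffset": ((nt,), "uint32"),
        "SampleCount": ((nt,), "int64"),
    }
    sizes = {}
    for name, (shape, dtype) in layout.items():
        n_elements = 1
        for dim in shape:
            n_elements *= dim
        sizes[name] = n_elements * DTYPE_BYTES[dtype]
    return sizes


# Assumed values: 2089 channels, 100 Hz * 60 s = 6000 samples per file.
sizes = dataset_sizes(ncha=2089, nt=6000)
total = sum(sizes.values())
for name, nbytes in sizes.items():
    print(f"{name:12s} {nbytes / 1e6:8.3f} MB")
print(f"{'total':12s} {total / 1e6:8.3f} MB")
```

Under these assumptions RawData accounts for nearly all of the file, so the timing and GPS datasets barely matter for storage planning.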
DAS data format
Figure: a day of recording split along the Time axis into one-minute files (00:00:00, 00:01:00, ..., 23:59:00), each spanning the Channel axis.
1 minute: 1 file (50 MB) [nCha, fs*60]
1 day: 1440 files (70 GB) [nCha, fs*60*60*24]
1 month: 44640 files (2.2 TB) [nCha, fs*60*60*24*31]
sub-array
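The file counts and volumes above can be reproduced with a few lines of arithmetic. This sketch takes the 50 MB per one-minute file from the slide and assumes decimal units (1 GB = 1000 MB); the slide's 70 GB per day is a rounded figure.

```python
MB_PER_FILE = 50          # from the slide: one 1-minute file is about 50 MB
FILES_PER_DAY = 24 * 60   # one file per minute


def volume(days):
    """Return (number of files, volume in GB) for a given number of days."""
    n_files = FILES_PER_DAY * days
    return n_files, n_files * MB_PER_FILE / 1000


for label, days in [("1 day", 1), ("1 month (31 days)", 31)]:
    n_files, gigabytes = volume(days)
    print(f"{label}: {n_files} files, {gigabytes:.0f} GB")
```

This gives 72 GB per day and about 2.2 TB for a 31-day month, in line with the numbers quoted on the slide.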
DAS data format
One-minute DAS HDF5 file (0x00000 to 0xFFFFF)
Groups: Acquisition, Raw, Custom
Attributes
GpBits (nt, uint8)
GpsStatus (nt, uint8)
PpsOffset (nt, uint32)
SampleCount (nt, int64)
RawData (ncha, nt, int32)
RawDataTime (nt, int64)
Object Storage for DAS
Cloud-optimized format
Figure: a day of one-minute files (00:00:00, 00:01:00, ..., 23:59:00) on Time and Channel axes, shown split into data chunks and into data tiles.
Chunking
Figure: a day of data (00:00:00, 00:01:00, ..., 23:59:00) on Time and Channel axes, divided into data chunks.
Chunking
Small chunk: slow writing but fast reading; too many small objects
Large chunk: fast writing; more network overhead
Optimize based on
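To make the chunk-size trade-off concrete, the sketch below counts how many objects a one-day array would produce for a few chunk shapes, and how large each object is. The array shape (2089 channels, 100 Hz for 24 hours), the int32 sample type, and the candidate chunk shapes are all illustrative assumptions, not values from the slide.

```python
import math


def chunk_stats(shape, chunk, itemsize=4):
    """Return (number of chunks, uncompressed bytes per full chunk)."""
    n_chunks = 1
    chunk_bytes = itemsize
    for dim, c in zip(shape, chunk):
        n_chunks *= math.ceil(dim / c)   # chunks needed along this axis
        chunk_bytes *= c
    return n_chunks, chunk_bytes


# Assumed one-day array: (channels, samples) at a hypothetical 100 Hz.
day_shape = (2089, 100 * 60 * 60 * 24)
# Hypothetical chunk shapes, from small to large.
candidates = [(100, 6000), (2089, 6000), (2089, 360000)]
for chunk in candidates:
    n, nbytes = chunk_stats(day_shape, chunk)
    print(f"chunk {chunk}: {n} objects, {nbytes / 1e6:.1f} MB each")
```

Small chunks let a reader fetch only the channels and time window it needs, at the cost of many more objects to write and track; large chunks reverse that balance.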
Object Storage
Diagram: Local Storage with Data Query from Group A, Dept. B / Group B, and Dept. C / Group C.
A PI-level data server that serves the group, other departments, and other institutions.
Diagram: Local Storage with Data Query from local users (Group A, Dept. B / Group B, Dept. C / Group C) and remote users (Institution D, Institution E, Institution F).
Diagram: Local Storage (Group A, Dept. B / Group B, Dept. C / Group C) alongside Cloud Storage, with Data Query from local users and remote users (Institution D, Institution E, Institution F).
An S3-compatible storage for development and data sharing.
Diagram: Local Storage and Cloud Storage, with Data Query from local users and remote users (Institution D, Institution E, Institution F).
S3 as a data center
NoisePy for DAS
LArge-n Seismic Survey in Oklahoma (LASSO)
Dougherty et al., 2019
N stations
N(N+1)/2 station-pairs
N channels
N(N+1)/2 channel pairs
Application: SeaDAS-N
UW - Photonic Sensing Facility
NoisePy for DAS
source channel 0, receiver channels 0 to 2088: 2089 channel pairs (xcorr)
source channel 200, receiver channels 200 to 2088: 1889 channel pairs (xcorr)
source channel 1000, receiver channels 1000 to 2088: 1089 channel pairs (xcorr)
source channel 1800, receiver channels 1800 to 2088: 289 channel pairs (xcorr)
source channel 2088, receiver channel 2088: 1 channel pair (auto-corr)
Total: 2089 x 2090 / 2 = 2.18 million correlations
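The pair counts above follow from pairing each source channel with itself and every later channel. A minimal sketch of that bookkeeping, with hypothetical helper names, checks the per-source counts against the closed form N(N+1)/2:

```python
def n_pairs(n_channels):
    """Number of channel pairs including auto-correlations: N(N+1)/2."""
    return n_channels * (n_channels + 1) // 2


def receivers_for(source, n_channels):
    """Receivers paired with a source: the source itself and all later channels."""
    return range(source, n_channels)


N = 2089  # channels 0 to 2088
for source in (0, 200, 1000, 1800, 2088):
    print(f"source {source}: {len(receivers_for(source, N))} channel pairs")

# Summing over every source channel recovers the closed-form total.
total = sum(len(receivers_for(s, N)) for s in range(N))
assert total == n_pairs(N)
print(f"total: {total}")
```

Because each pair is visited once with the lower-numbered channel as the source, the workload per source shrinks linearly from 2089 pairs to 1.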
Application: SeaDAS-N
Client to query data
subarray
600 channels
Application: SeaDAS-N
Cross-correlation on AWS Batch
Containerized NoisePy4DAS
One-bit normalization, spectral whitening
496 vCPUs, 12 h
< $100 on EC2
One month of SeaDAS-N data, chunked in Zarr format: ~$1.3 per day
Hourly stacked CCFs: ~$1.8 per day
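For readers new to the preprocessing named above, here is a minimal pure-Python sketch of one-bit normalization followed by a time-domain cross-correlation. It is not the NoisePy4DAS implementation: spectral whitening is omitted, the function names are hypothetical, and the synthetic traces (a random source and a copy delayed by 3 samples) are made up for illustration.

```python
import random


def one_bit(trace):
    """Keep only the sign of each sample (+1, 0, or -1)."""
    return [(x > 0) - (x < 0) for x in trace]


def cross_correlate(a, b, max_lag):
    """Time-domain cross-correlation; returns {lag: sum of a[i] * b[i + lag]}."""
    n = len(a)
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        # Restrict i so that both i and i + lag stay inside the traces.
        lo, hi = max(0, -lag), min(n, n - lag)
        ccf[lag] = sum(a[i] * b[i + lag] for i in range(lo, hi))
    return ccf


# Synthetic data: the receiver records the source signal 3 samples later.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
delay = 3
receiver = [0.0] * delay + source[:-delay]

ccf = cross_correlate(one_bit(source), one_bit(receiver), max_lag=10)
peak_lag = max(ccf, key=ccf.get)
print(f"peak lag: {peak_lag} samples")
```

One-bit normalization discards amplitude and keeps only polarity, so the correlation peak is expected at the imposed delay even though the amplitudes are gone.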
Diagram: Local and Cloud; a job array of 12 jobs (1 to 12).
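One way a job array can divide the cross-correlation work is to give each child job its own set of source channels. The sketch below is an assumption about how such a split could look, not a description of the authors' setup: it uses the 600-channel sub-array and 12 jobs as example sizes and deals source channels out round-robin, since later source channels have fewer pairs. AWS Batch exposes the child index through the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable, which starts at 0 rather than 1.

```python
import os


def sources_for_job(job_index, n_jobs, n_channels):
    """Source channels for one array child, dealt out round-robin."""
    return range(job_index, n_channels, n_jobs)


def pairs_for_job(job_index, n_jobs, n_channels):
    """Channel pairs handled by one child: source s pairs with channels s..N-1."""
    return sum(n_channels - s for s in sources_for_job(job_index, n_jobs, n_channels))


N_CHANNELS = 600   # assumed: the 600-channel sub-array
N_JOBS = 12        # assumed: array size shown in the diagram

# AWS Batch sets this variable for each child of an array job (0-based);
# default to 0 so the script also runs outside Batch.
job_index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
my_sources = sources_for_job(job_index, N_JOBS, N_CHANNELS)
print(f"job {job_index}: {len(my_sources)} source channels, "
      f"{pairs_for_job(job_index, N_JOBS, N_CHANNELS)} channel pairs")
```

With a round-robin split the per-job pair counts stay within a few percent of each other, whereas contiguous blocks of source channels would leave the first job with far more pairs than the last.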
Thanks
Follow project on GitHub
https://github.com/niyiyu/DASstore
https://github.com/niyiyu/NoisePy4DAS-SeaDAS/
Hands-on Google Colab
https://colab.research.google.com/
File -> Open notebook
-> GitHub section
-> URL: https://github.com/niyiyu/NoisePy4DAS-SeaDAS/
-> select the example notebook
Hands-on through Docker on AWS EC2
1. Launch an EC2 instance in the us-west-2 (Oregon) region, connect to the instance, and install the required packages (just like yesterday).
2. Start Docker and pull the image on the instance:
sudo systemctl start docker
sudo docker pull ghcr.io/niyiyu/noisepy4das-seadas:latest
3. Run the Jupyter notebook on the instance:
sudo docker run --rm -p 8888:8888 ghcr.io/niyiyu/noisepy4das-seadas jupyter notebook --ip 0.0.0.0
4. Open the Jupyter notebook in your local browser.
Tips to launch instances
https://seisscoped.org/HPS/softhardware/AWS_101.html
Hands-on through Docker on a local machine (anyone?)
1. Install Docker on your local machine if it is not already installed.
2. Start Docker and pull the image:
sudo systemctl start docker
sudo docker pull ghcr.io/niyiyu/noisepy4das-seadas:latest
3. Run the Jupyter notebook in the container:
sudo docker run --rm -p 8888:8888 ghcr.io/niyiyu/noisepy4das-seadas jupyter notebook --ip 0.0.0.0
4. Open the Jupyter notebook in your local browser.