ABSTRACT
The SCOPED project (Seismic COmputational Platform for Empowering Discovery) aims to develop cyberinfrastructure that enables hybrid model-data research. We present an update on our platform development. The software platform includes tools for data discovery (earthquake catalog building and ambient noise seismology) and theoretical seismology (wavefield simulations and inversion workflows for Earth velocity models). It includes containerized software that can run on HPC and cloud systems, along with tutorials that guide new users through running the software. A preliminary implementation of the SCOPED Gateway offers these containers as services over the web. The SCOPED seismoDB is a pilot cloud data lake for storing heterogeneous data products in seismology (e.g., velocity models, seismic waveforms, earthquake metadata). We demonstrate the performance and new software using research examples.
PI TEAM
Carl Tape (ctape@alaska.edu), lead PI: full waveform modeling, earthquake source characteristics and uncertainties
Ebru Bozdag (bozdag@mines.edu): full waveform inversion and tomography
SCOPED Update: a Cloud and HPC software platform for computational seismology
Marine Denolle (mdenolle@uw.edu): ambient noise seismology and cloud computing
Felix Waldhauser (felixw@ldeo.columbia.edu): precision seismology and earthquake catalogs
Ian Wang (iwang@tacc.utexas.edu): high-performance seismology, filesystem, database
WAVEFIELD MODELING with spectral-element method simulations
Wavefield simulations within 3D Earth models provide synthetic seismograms that can be compared with recorded seismograms, either to better understand earthquake sources or to improve the subsurface characterization of Earth’s structure.
A schematic diagram of the proposed SCOPED cyberinfrastructure, which combines computation (left) and data (right). The SCOPED deliverables will be either fully functional (dark blue) or prototyped (light blue). The external components that SCOPED will interact with (gray) are computing facilities, data archives, and users.
SCOPED Container Registry
The SCOPED Container Registry now includes adjtomo, MTUQ, pysep, specufex, and MsPASS, all built on the SCOPED Base Container. We also added an external software package, SeisSol, to our container collection using the base container.
TRAINING WITH SCOPED
SCOPED contributed to two training workshops: Specfem on October 5–7, 2022 (Bryant Chow, Carl Tape, ~186 participants), and High-Performance Seismology Cybertraining on May 9–12 (Marine Denolle, Alice Gabriel, Ian Wang, and SCEC colleagues, ~50 participants). Together, these free workshops drew more than 200 participants and included downloadable containerized software that gave users research-level experience.
SEISMIC IMAGING - FULL WAVEFORM MODELING
WHOLE EARTH IMAGING
We participated in Texascale Days on TACC’s Frontera system and successfully performed two iterations, scaling our simulations up to ~8000 nodes at our current resolution. The main takeaways are: 1) GPU computing is necessary to further increase the number of earthquakes or the resolution of the simulations; 2) the ADIOS library and compression help reduce I/O challenges.
Iterative workflow for seismic imaging using 3D wavefield simulations and adjoint methods (Chow et al., 2020). This workflow is automated with the help of Pyatoa software.
The seismic imaging workflow represents an optimization problem whereby the misfit between recorded and simulated seismograms is minimized while iterating toward a more accurate representation of the subsurface Earth properties (Vp, Vs, density).
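The idea of iterating toward a lower misfit can be illustrated with a deliberately small sketch. It is not the Pyatoa or adjoint workflow: the forward model here is a hypothetical straight-ray travel time (distance times slowness) rather than a 3D wavefield simulation, the gradient is analytic rather than computed with adjoint methods, and all function names and values are invented for illustration.

```python
# Toy 1D inversion: recover a uniform slowness (1 / velocity) from travel times.
# All names and numbers are hypothetical; real workflows use 3D simulations.

def simulate(slowness, distances):
    """Forward model: travel time = distance * slowness."""
    return [x * slowness for x in distances]

def misfit(synthetic, observed):
    """Least-squares misfit between simulated and recorded data."""
    return 0.5 * sum((s - d) ** 2 for s, d in zip(synthetic, observed))

def gradient(slowness, distances, observed):
    """Derivative of the misfit with respect to slowness."""
    synthetic = simulate(slowness, distances)
    return sum(x * (s - d) for x, s, d in zip(distances, synthetic, observed))

def invert(distances, observed, start, n_iter=50):
    """Gradient descent: update the model and record the misfit each iteration."""
    step = 0.5 / sum(x * x for x in distances)  # conservative fixed step length
    model = start
    history = [misfit(simulate(model, distances), observed)]
    for _ in range(n_iter):
        model -= step * gradient(model, distances, observed)
        history.append(misfit(simulate(model, distances), observed))
    return model, history

distances = [10.0, 20.0, 30.0, 40.0]   # source-receiver distances in km
true_slowness = 1.0 / 3.5              # s/km, i.e. Vs = 3.5 km/s
observed = simulate(true_slowness, distances)  # stand-in for recorded data

model, history = invert(distances, observed, start=1.0 / 2.5)
print(f"recovered velocity: {1.0 / model:.3f} km/s")
print(f"misfit: {history[0]:.3e} -> {history[-1]:.3e}")
```

In the full workflow, each evaluation of the forward model and the gradient is a pair of large wavefield simulations, which is why automation and HPC resources matter.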
PRECISION SEISMOLOGY: large-scale, high-resolution catalog production/analysis
SCOPED Gateway
The SCOPED Gateway has been created using Tapis. Currently, Specfem3D_Globe, MTUQ, and MsPASS are integrated into the gateway as HPC applications. Ultimately, the gateway will enable all SCOPED users to access containerized applications and execute them on either cloud or HPC systems.
Above. Snapshot from a seismic wavefield simulation, showing an S wavefront entering into a deep sedimentary basin. Left. Cross sections of four 3D models used by Yuan Tian to investigate the influence of basin structures on the seismic wavefield.
LEFT: Azimuthally anisotropic global adjoint model GLAD-AZI-M50 (Bozdag, Örsvuran et al. in prep.)
RIGHT: 3D global tests on TACC’s Frontera system to explore the parameter trade-offs between wavespeeds and attenuation (Carmona et al. in revision for Geophys. J. Int.).
YouTube recordings
JupyterBook (in progress)
Establishing the SCOPED platform
We have established a base software container for SCOPED that contains basic Python packages, as well as ObsPy, pysep, and MsPASS (see below). The container can run on the gateway, offering both batch and interactive modes through Jupyter Lab.
MsPASS: Massive Parallel Analysis System for Seismologists
Proposed Cloud Native Workflow
SeaDAS-N cross-correlation on AWS: example with one month of data
One-bit normalization
Spectral whitening
8 billion correlation operations
$15 with 67 Spot C4 instances
50 channels, 1-minute chunks
$1.3 per day
$0.3 per day
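The preprocessing steps named above (one-bit normalization, spectral whitening) followed by a cross-correlation can be sketched generically with NumPy. This is an illustration under stated assumptions, not the code run on AWS: the function names are hypothetical, the input is synthetic random noise, and the correlation is circular for simplicity.

```python
import numpy as np

def one_bit(trace):
    """One-bit normalization: keep only the sign of each sample."""
    return np.sign(trace)

def whiten(trace, eps=1e-10):
    """Spectral whitening: flatten Fourier amplitudes to ~1, keep the phase."""
    spec = np.fft.rfft(trace)
    return np.fft.irfft(spec / (np.abs(spec) + eps), n=len(trace))

def cross_correlate(a, b):
    """Circular cross-correlation via FFT; a positive lag means b trails a."""
    n = len(a)
    cc = np.fft.irfft(np.fft.rfft(b) * np.conj(np.fft.rfft(a)), n=n)
    lags = np.arange(n) - n // 2
    return lags, np.fft.fftshift(cc)

# Synthetic test: channel b records the same noise as channel a, 25 samples later.
rng = np.random.default_rng(0)
delay = 25
a = rng.standard_normal(4096)
b = np.roll(a, delay)

a_proc = whiten(one_bit(a))
b_proc = whiten(one_bit(b))
lags, cc = cross_correlate(a_proc, b_proc)
print("lag of correlation peak (samples):", lags[np.argmax(cc)])
```

Applied to every channel pair and every short chunk of a dense array, this small kernel is what multiplies into billions of correlation operations.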
Prototyped
In progress
Framework ready
https://specfem.org/training
DAS on the cloud: data format and cross-correlations (Ni et al., 2023)
We developed a cloud-optimized framework for Distributed Acoustic Sensing (DAS) data management and ambient noise seismology.
SCOPED Codes
Find all of our open-source software at https://seisscoped.org/about/scoped/
Python, Julia, C++, Fortran, …
Local
Compute
HPC
Cloud
PyMongo
AWS CLI
Coming up:
Detection, phase picking, and classification
(e.g., SeisBench with PhaseNet)
Cloud Storage
raw data and config YAML files
Event association & location
(e.g., NoisePy)
Imaging or Monitoring
Atlas
Earthquake catalog workflows: read data once and produce small metadata in a database or on S3.
Ambient noise seismology workflows: read data, write a large volume of time-series output to S3, then re-aggregate the data.
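The write-then-re-aggregate pattern of the ambient noise workflow can be sketched as follows. This is a minimal illustration only: a Python dictionary stands in for an object store such as S3, and the key layout, station names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for an object store (keys map to time series).
store = {}

def write_daily_result(pair, day, samples):
    """Write step: store one per-day time series under a pair/day key."""
    store[f"ccf/{pair}/{day}"] = samples

def reaggregate(prefix="ccf/"):
    """Re-aggregate step: average all daily series that share a station pair."""
    groups = defaultdict(list)
    for key, samples in store.items():
        if key.startswith(prefix):
            _, pair, _day = key.split("/")
            groups[pair].append(samples)
    return {
        pair: [sum(column) / len(column) for column in zip(*series)]
        for pair, series in groups.items()
    }

# Many small, independent writes (one per station pair and day) ...
write_daily_result("STA1_STA2", "2023-01-01", [1.0, 2.0, 3.0])
write_daily_result("STA1_STA2", "2023-01-02", [3.0, 2.0, 1.0])
write_daily_result("STA1_STA3", "2023-01-01", [0.0, 4.0, 0.0])

# ... followed by one pass that stacks them per station pair.
stacks = reaggregate()
print(stacks)
```

Because each write is independent, the first stage is embarrassingly parallel; only the stacking pass needs to see all results for a given pair.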
Simple Storage Service (S3): cloud object store, good for big data volumes (images, time series)
Cloud-hosted database: good for big “metadata”
Cloud-hosted JupyterHub: good for exploration and visualization
Cloud-managed clusters: good for big-data processing and embarrassingly parallel workloads
Hybrid CyberInfrastructure
HPC