
ABSTRACT

The SCOPED project (Seismic COmputational Platform for Empowering Discovery) aims to develop cyberinfrastructure to enable hybrid model-data research. We present an update on our platform development. Our software platform includes tools for data discovery (earthquake catalog building and ambient noise seismology) and theoretical seismology (wavefield simulations and inversion workflows for Earth velocity models). The platform includes containerized software that can run on HPC and cloud systems, along with tutorials that guide new users through running the software. There is also a preliminary implementation of the SCOPED Gateway, which exposes these containers as services over the web. The SCOPED seismoDB is a pilot cloud data lake for storing heterogeneous data products in seismology (e.g., velocity models, seismic waveforms, earthquake metadata). We demonstrate the performance and the new software using research examples.

PI TEAM

Carl Tape (ctape@alaska.edu); lead PI: full waveform modeling, earthquake source characteristics and uncertainties

Ebru Bozdag (bozdag@mines.edu): full waveform inversion and tomography

SCOPED Update: a Cloud and HPC software platform for computational seismology

Marine Denolle (mdenolle@uw.edu): ambient noise seismology and cloud computing

Felix Waldhauser (felixw@ldeo.columbia.edu): precision seismology and earthquake catalogs

Ian Wang (iwang@tacc.utexas.edu): high-performance seismology, filesystem, database

WAVEFIELD MODELING with spectral-element method simulations

Wavefield simulations within 3D Earth models provide synthetic seismograms that can be compared with recorded seismograms, either to better understand earthquake sources or to improve the subsurface characterization of Earth’s structure.

A schematic diagram of the proposed SCOPED Cyberinfrastructure that combines computation (left) and data (right). The SCOPED deliverables will be either fully functional (dark blue) or prototyped (light blue). The external components that SCOPED will interact with (gray) are computing facilities, data archives, and users.

SCOPED Container Registry

The SCOPED Container Registry now includes adjtomo, MTUQ, pysep, specufex, and MsPASS, all built on the SCOPED Base Container. We also added an external software package, SeisSol, to our container collection using the base container.

TRAINING WITH SCOPED

SCOPED contributed to two training workshops: Specfem on October 5–7, 2022 (Bryant Chow, Carl Tape, ~186 participants), and High-Performance Seismology Cybertraining on May 9–12 (Marine Denolle, Alice Gabriel, Ian Wang, and SCEC colleagues, ~50 participants). These free workshops were attended by hundreds and included downloadable containerized software for users to gain research-level experience.

SEISMIC IMAGING - FULL WAVEFORM MODELING

WHOLE EARTH IMAGING

We participated in Texascale days on TACC’s Frontera system and successfully performed two iterations, scaling up our simulations to ~8000 nodes at our current resolution. The main takeaways are: 1) GPU computing is necessary to further increase the number of earthquakes or the resolution of the simulations. 2) The ADIOS library and compression help reduce the I/O challenges.

Iterative workflow for seismic imaging using 3D wavefield simulations and adjoint methods (Chow et al., 2020). This workflow is automated with the help of Pyatoa software.

The seismic imaging workflow represents an optimization problem whereby the misfit between recorded and simulated seismograms is minimized while iterating toward a more accurate representation of the subsurface Earth properties (Vp, Vs, density).
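The structure of that optimization loop can be illustrated with a toy problem. The sketch below is not the SCOPED or Pyatoa implementation: it assumes a small linear operator `G` as a stand-in for a 3D wavefield simulation, so that the transpose `G.T` plays the role of the adjoint simulation, and the names `forward`, `misfit_and_gradient`, and `invert` are hypothetical. It only shows the simulate, measure misfit, compute gradient, update model cycle.

```python
import numpy as np

def forward(G, m):
    """Toy stand-in for a wavefield simulation: model m -> synthetic data."""
    return G @ m

def misfit_and_gradient(G, m, d_obs):
    """Least-squares waveform misfit and its gradient with respect to m."""
    residual = forward(G, m) - d_obs        # simulated minus recorded
    chi = 0.5 * float(residual @ residual)  # chi(m) = 1/2 * sum(residual**2)
    grad = G.T @ residual                   # G.T acts as the adjoint operator here
    return chi, grad

def invert(G, d_obs, m0, n_iter=30):
    """Steepest descent with an exact line search (valid for this linear toy)."""
    m = m0.copy()
    history = []
    for _ in range(n_iter):
        chi, grad = misfit_and_gradient(G, m, d_obs)
        history.append(chi)
        Gg = G @ grad
        denom = float(Gg @ Gg)
        if denom == 0.0:                    # gradient vanished: already converged
            break
        step = float(grad @ grad) / denom   # step length minimizing chi along -grad
        m = m - step * grad
    return m, history

# Assumed toy setup: 50 "recordings", 3 model parameters
# (hypothetical stand-ins for Vp, Vs, and density).
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 3))
m_true = np.array([3.0, 1.7, 2.5])
d_obs = forward(G, m_true)

m_est, history = invert(G, d_obs, m0=np.ones(3))
print("initial misfit:", history[0])
print("final misfit:  ", history[-1])
```

In a real imaging workflow each call to `forward` and each gradient evaluation is a full 3D simulation, which is why the number of iterations and earthquakes drives the computational cost.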

PRECISION SEISMOLOGY: large-scale, high-resolution catalog production/analysis

SCOPED Gateway

The SCOPED Gateway has been created using Tapis. Currently, we have integrated Specfem3D_Globe, MTUQ, and MsPASS into the gateway as HPC applications. Ultimately, this will enable all SCOPED users to access containerized applications and execute them on either Cloud or HPC systems.

Above. Snapshot from a seismic wavefield simulation, showing an S wavefront entering into a deep sedimentary basin. Left. Cross sections of four 3D models used by Yuan Tian to investigate the influence of basin structures on the seismic wavefield.

LEFT: Azimuthally anisotropic global adjoint model GLAD-AZI-M50 (Bozdag, Örsvuran et al. in prep.)

RIGHT: 3D global tests on TACC’s Frontera system to explore the parameter trade-offs between wavespeeds and attenuation (Carmona et al. in revision for Geophys. J. Int.).

YouTube recordings

JupyterBook (in progress)

Establishing the SCOPED platform

We have established a base software container for SCOPED that contains basic Python packages, as well as ObsPy, pysep, and MsPASS (see below). The container can run on the gateway, offering both batch and interactive modes through JupyterLab.

MsPASS: Massive Parallel Analysis System for Seismologists:

  • Core implementation in Python and C++
  • Data management using MongoDB
  • Scalable parallel processing framework using Spark or Dask
  • All components are containerized and can be pulled with Docker or Singularity for distribution
  • Data provenance support at global and object levels

Proposed Cloud Native Workflow

SeaDAS-N cross-correlation on AWS: example with one month of data

One-bit normalization

Spectrum whitening

8 billion correlation operations

$15 with 67 Spot C4 instances

50-channel, 1-minute chunks

$1.3 per day

$0.3 per day
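The preprocessing steps named above (one-bit normalization, then spectrum whitening) followed by cross-correlation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the NoisePy or SeaDAS-N production code: the function names `one_bit`, `whiten`, and `cross_correlate` are hypothetical, the whitening is applied over the full band without tapering, and the two "channels" are synthetic noise with a known 7-sample circular delay.

```python
import numpy as np

def one_bit(trace):
    """One-bit normalization: keep only the sign of each sample."""
    return np.sign(trace)

def whiten(trace, eps=1e-10):
    """Spectral whitening: flatten the amplitude spectrum, keep the phase."""
    spec = np.fft.rfft(trace)
    return np.fft.irfft(spec / (np.abs(spec) + eps), n=len(trace))

def cross_correlate(a, b):
    """Frequency-domain cross-correlation of two equal-length traces.

    Returns lags in samples, from -(n-1) to n-1, and the correlation values.
    A peak at a positive lag means b is delayed relative to a.
    """
    n = len(a)
    nfft = 2 * n                                  # zero-pad to avoid wrap-around
    spec = np.conj(np.fft.rfft(a, nfft)) * np.fft.rfft(b, nfft)
    cc = np.fft.irfft(spec, nfft)
    cc = np.concatenate((cc[-(n - 1):], cc[:n]))  # reorder to negative..positive lags
    lags = np.arange(-(n - 1), n)
    return lags, cc

# Synthetic example: channel b is channel a delayed by 7 samples
# (a circular shift, used here as a stand-in for a propagation delay).
rng = np.random.default_rng(42)
a = rng.normal(size=512)
b = np.roll(a, 7)

lags, cc = cross_correlate(whiten(one_bit(a)), whiten(one_bit(b)))
best_lag = int(lags[np.argmax(cc)])
print("lag of correlation peak (samples):", best_lag)
```

Each channel pair and time chunk is independent of the others, which is what makes this workload straightforward to spread over many cloud instances.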

Prototyped

In progress

Framework ready

https://specfem.org/training

DAS on the cloud: data format and cross-correlations (Ni et al., 2023)

We developed a cloud-optimized framework for Distributed Acoustic Sensing (DAS) data management and ambient noise seismology.

SCOPED Codes

Find all of our open-source software

https://seisscoped.org/about/scoped/ (QR code next door)

Python, Julia, C++, Fortran, …

Local

Compute

HPC

Cloud

  • Earth Imaging
  • Source Characterization
  • Ground Motions

PyMongo

AWS CLI

  • Data Mining

Coming up:

  • SCOPED Workshop - Seattle - May 2024
  • Containerized software using Docker

Detection & Phase Pick & classification

(e.g., SeisBench with PhaseNet)

Cloud Storage

raw data and config YAML files

Event association & location

  • Cross-correlation
  • Stacking

(e.g., NoisePy)

Imaging or Monitoring

Atlas

Earthquake catalog workflows: read data once and write small metadata products to a database or to S3.

Ambient noise seismology workflows: read data, write a large volume of time-series output to S3, then re-aggregate the data.
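The re-aggregation step of the ambient noise workflow can be pictured as a group-and-stack operation over the many small outputs. The sketch below assumes a simple linear stack (the mean over days) and an in-memory list of `(station_pair, correlation)` tuples in place of objects read back from cloud storage; the function name `stack_by_pair` and the station names are hypothetical.

```python
from collections import defaultdict

import numpy as np

def stack_by_pair(daily_results):
    """Group per-day cross-correlations by station pair and average them."""
    groups = defaultdict(list)
    for pair, cc in daily_results:
        groups[pair].append(np.asarray(cc, dtype=float))
    # Linear stack: sample-by-sample mean over all days for each pair.
    return {pair: np.mean(ccs, axis=0) for pair, ccs in groups.items()}

# Hypothetical per-day outputs, as they might be listed from cloud storage.
daily_results = [
    (("STA1", "STA2"), [1.0, 2.0, 3.0]),  # day 1
    (("STA1", "STA2"), [3.0, 2.0, 1.0]),  # day 2
    (("STA1", "STA3"), [0.0, 0.0, 6.0]),  # day 1
]

stacked = stack_by_pair(daily_results)
for pair, cc in stacked.items():
    print(pair, cc)
```

The catalog workflow differs mainly in output size: it reads the same waveform data once but emits only a few metadata records per event, so a database is a natural target rather than bulk object storage.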

Simple Storage Service (S3) - Cloud store

Good for big data volumes (images, time series)

Cloud-hosted database

Good for big “metadata”

Cloud-hosted JupyterHub

Good for exploration, visualization

Cloud-managed clusters

Good for big-data processing, embarrassingly parallel workloads

Hybrid CyberInfrastructure

HPC

  • Ambient field Seismology
  • Quake Catalogs