1 of 68

NSDF-Services: Integrating Networking, Storage, and Computing Services into a Testbed for Democratization of Data Delivery

http://nationalsciencedatafabric.org/

Jakob Luettgau1, Heberth Martinez1, Paula Olaya1, Giorgio Scorzelli2, Glenn Tarcea3, Jay Lofstead4,

Christine R. Kirkpatrick5, Valerio Pascucci2, Michela Taufer1.

1University of Tennessee, 2University of Utah, 3University of Michigan, 4Sandia National Laboratories, 5University of California San Diego

NSF: 2138811 (NSDF) and 2028923 (SOMOSPIE); IBM; XSEDE: TG-CIS210128; Chameleon: CHI-210923

December 5th, 2023, Taormina, Italy.

16th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2023)

2 of 68

Acknowledgements

2

This research is supported by the National Science Foundation (NSF) awards

#1841758, #2028923, #2103836, #2103845, #2138811, #2127548,

#2223704, #2330582, #2331152, #2334945.

It is also supported by DoE award DE-FE0031880; the Intel oneAPI Centers of Excellence at the University of Utah; the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the DoE and the NNSA; and UT-Battelle, LLC under contract DE-AC05-00OR22725.

Results presented in this paper were obtained in part using resources from ACCESS TG-CIS210128, CloudLab PID-16202, Chameleon Cloud CHI-210923, FABRIC, and an IBM Shared University Research Award.

3 of 68

http://nationalsciencedatafabric.org/

Mission of National Science Data Fabric (NSDF):

We are building a holistic ecosystem to democratize data-driven scientific discovery by connecting an open network of institutions, including minority serving institutions, with a shared, modular, containerized data delivery environment.

4 of 68

4

[Map: UMICH, UTK, JHU, UTAH, SDSC]

Institutions and universities with resources to share

5 of 68

5

[Map: previous sites plus JSUMS, MS-CC, MMC, UTEP]

Initiative to integrate minority serving institutions

6 of 68

6

[Map: previous sites plus PRISMS Materials Commons (UMICH), CHESS/Cornell, SDSC + OSG, MGHPCC + OSN, Digital Rocks, UTAH/SCI]

Initiative to integrate scientific projects

7 of 68

7

[Map: previous sites plus XenonNT, IceCube]

Initiative to integrate large-scale projects

8 of 68

8

[Map: previous sites plus TACC, SNL, LLNL, CHPC]

Initiative to integrate HPC resources

9 of 68

9

[Map: previous sites plus XSEDE/ACCESS, Jetstream2, CloudLab, Chameleon, Internet2, CyVerse]

Initiative to integrate research-oriented HPC and cloud resources

10 of 68

10

[Map: previous sites plus CloudBank (AWS, Azure), IBM Cloud]

Initiative to integrate public cloud resources

11 of 68

11

[Map: previous sites plus Storage Providers (Seal, MinIO)]

Initiative to integrate enterprise storage resources

12 of 68

12


A data fabric must be accessible and tightly integrated to coordinate data movement between geographically distributed teams or organizations

13 of 68

13


A data fabric must be accessible and tightly integrated to coordinate data movement between geographically distributed teams or organizations

[Diagram callouts: Computation, Storage, Network]

14 of 68

14


A data fabric must be accessible and tightly integrated to coordinate data movement between geographically distributed teams or organizations

Suite of services to manage networking, computing, and storage resources across the academic and commercial cloud, lowering the barriers to cloud cyberinfrastructure (CI)

[Diagram callouts: Computation, Storage, Network]

15 of 68

15


The NSDF architecture integrates a suite of networking (both local and global), storage, and computing services.

16 of 68

16


Users access the services through NSDF’s entry points across different providers

The entry points enable

  • interoperability of different applications and storage solutions
  • fast data transfer and caching among data sources

The NSDF architecture integrates a suite of networking (both local and global), storage, and computing services.

17 of 68

17


The current NSDF testbed comprises eight entry points that are heterogeneous in their connections, institution types, and research domains

18 of 68

18


The NSDF architecture integrates a suite of networking (both local and global), computing, and storage services

[Highlighted: Network]

19 of 68

Networking Services

19

National research and education networks enable researchers to exchange data across institutions and domains

20 of 68

Networking Services

20

National research and education networks enable researchers to exchange data across institutions and domains

Can we access and move data efficiently across the entry points?

21 of 68

NSDF-Plugin

21

Minimum resources required to handle large NSDF data transfers:

  • 8 cores
  • 30 GB RAM
  • 60 GiB external storage

22 of 68

NSDF-Plugin: High-Speed Network

22

23 of 68

NSDF-Plugin Performance

23

We use two benchmarks to measure the capabilities of our NSDF-Plugin service, enabling us to identify areas for improvement and detect anomalous behaviors

We measure throughput (MiB/s), latency (ms), and packet loss (percentage) using Round-Trip Time (RTT) over three months

We evaluate the performance of our testbed for large-scale scientific data scenarios

24 of 68

NSDF-Plugin Performance

24

We use two benchmarks to measure the capabilities of our NSDF-Plugin service, enabling us to identify areas for improvement and detect anomalous behaviors

  • Throughput: using iperf3 to understand the throughput constraints between NSDF entry points
  • Latency: using owping to understand the latency constraints between NSDF entry points
  • Traceroute: using traceroute to understand the routing patterns between NSDF entry points
  • Throughput with XRootD: using the XRootD application to validate the throughput constraints between NSDF entry points
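To make these probes concrete, below is a minimal sketch (not the NSDF-Plugin code) that runs one iperf3 test, one owping one-way test, and one traceroute toward a remote entry point and prints the results. It assumes iperf3, owping (from perfSONAR's owamp tools), and traceroute are installed, that the remote side runs an iperf3 server and owampd, and that the host name is a placeholder.

```python
#!/usr/bin/env python3
"""Minimal point-to-point probe between two entry points (not NSDF-Plugin code).

Assumes iperf3, owping (perfSONAR owamp), and traceroute are installed locally,
that the remote host runs `iperf3 -s` and owampd, and that REMOTE is replaced
with a real entry point.
"""
import json
import subprocess

REMOTE = "entrypoint-b.example.org"  # placeholder host name


def iperf3_throughput_mib_s(host: str, seconds: int = 10) -> float:
    """Run an iperf3 client test and return received throughput in MiB/s."""
    out = subprocess.run(["iperf3", "-c", host, "-J", "-t", str(seconds)],
                         check=True, capture_output=True, text=True)
    bits_per_s = json.loads(out.stdout)["end"]["sum_received"]["bits_per_second"]
    return bits_per_s / 8 / 2**20


def owping_report(host: str) -> str:
    """Return owping's raw one-way latency/loss report for the remote host."""
    out = subprocess.run(["owping", host],
                         check=True, capture_output=True, text=True)
    return out.stdout


def route_hops(host: str) -> list[str]:
    """Return the traceroute hop lines toward the remote host."""
    out = subprocess.run(["traceroute", host],
                         check=True, capture_output=True, text=True)
    return out.stdout.splitlines()[1:]  # drop the header line


if __name__ == "__main__":
    print(f"throughput: {iperf3_throughput_mib_s(REMOTE):.1f} MiB/s")
    print(owping_report(REMOTE))
    print("\n".join(route_hops(REMOTE)))
```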

25 of 68

NSDF-Plugin Performance: Throughput

25

The NSDF testbed allows us to monitor throughput, latency, and routing between entry points over time, identifying areas for improvement and detecting anomalous behaviors

Focus: throughput, using iperf3 to understand the throughput constraints between NSDF entry points

26 of 68

NSDF-Plugin: Throughput Performance

26

We present the point-to-point throughput performance measurements across the entry points in our testbed

27 of 68

NSDF-Plugin: Throughput Performance

27

We collect throughput measurements on the routes between the entry points in the testbed over three months

28 of 68

NSDF-Plugin: Throughput Performance

28

We observe throughput asymmetry depending on the direction of the data transfer

[Plot: bi-directional throughput between Wisconsin and Utah, in both directions]

29 of 68

NSDF-Plugin: Throughput Performance

29

We observe variability across point-to-point pairs in our testbed

[Plots: bi-directional throughput between Clemson and Massachusetts and between Wisconsin and Utah, in both directions]

30 of 68

NSDF-Plugin Performance: Latency

30

The NSDF testbed allows us to monitor throughput, latency, and routing between entry points over time, identifying areas for improvement and detecting anomalous behaviors

Focus: latency, using owping to understand the latency constraints between NSDF entry points

31 of 68

NSDF-Plugin: Latency Performance

31

We present the point-to-point latency performance measurements across the entry points in our testbed

32 of 68

NSDF-Plugin: Performance Variability

32

Are the throughput and latency variabilities connected to path instability?

33 of 68

NSDF-Plugin Performance: Traceroute

33

The NSDF testbed allows us to monitor throughput, latency, and routing between entry points over time, identifying areas for improvement and detecting anomalous behaviors

Focus: routing, using traceroute to understand the routing patterns between NSDF entry points

34 of 68

NSDF-Plugin: Traceroute

34

We identify the network hops through which we transfer the data and measure the performance across our entry points


35 of 68

NSDF-Plugin: Traceroute

35

We visualize the superposition of all observed routes for the eight entry points of the testbed

perfSONAR reported more than 210 network hops, about half of which involve Internet2 (93) and ESnet (13)

36 of 68

NSDF-Plugin: Traceroute

36

We visualize the superposition of all observed routes for the eight entry points of the testbed

Only CloudLab Wisconsin shows alternating routing patterns.

We cannot attribute the variability to path instability.

37 of 68

NSDF-Plugin Performance

37

The NSDF testbed allows us to monitor throughput, latency, and routing between entry points over time, identifying areas for improvement and detecting anomalous behaviors

Focus: throughput with XRootD, using the XRootD application to validate the throughput constraints between NSDF entry points

38 of 68

NSDF-Plugin: Throughput with XRootD

38

We move to a large-scale scientific data scenario where we measure the throughput between clients and servers at different entry points using XRootD

Results from Wisconsin to Utah:

  • The number of copy jobs plays a key role in optimization
  • Fewer streams result in higher performance

It is critical to integrate parameter adaptability into our testbed
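Below is a minimal sketch of the kind of parameter sweep behind these observations, assuming an XRootD 5.x client whose xrdcp supports the --parallel and --streams options; the root:// URLs are placeholders, and this is an illustration rather than the exact experiment driver.

```python
#!/usr/bin/env python3
"""Sketch of an xrdcp parameter sweep between two entry points.

Assumes an XRootD 5.x client is installed and that the root:// URLs below
point at real XRootD servers; both URLs are placeholders.
"""
import subprocess
import time

SRC = "root://entrypoint-a.example.org//data/sample-1gib"     # placeholder
DST = "root://entrypoint-b.example.org//scratch/sample-1gib"  # placeholder


def timed_copy(parallel_jobs: int, streams: int) -> float:
    """Copy SRC to DST with the given xrdcp settings; return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(
        ["xrdcp", "--force",
         "--parallel", str(parallel_jobs),
         "--streams", str(streams),
         SRC, DST],
        check=True,
    )
    return time.monotonic() - start


if __name__ == "__main__":
    # Sweep the two knobs the slide identifies as dominant: copy jobs and streams.
    for jobs in (1, 4, 8):
        for streams in (1, 4, 16):
            elapsed = timed_copy(jobs, streams)
            print(f"jobs={jobs:2d} streams={streams:2d} -> {elapsed:6.1f} s")
```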

39 of 68

39


A data fabric must be accessible and tightly integrated to coordinate data movement between geographically distributed teams or organizations

The NSDF architecture integrates a suite of networking (both local and global), computing, and storage services

[Highlighted: Computation]

40 of 68

Computing Services: NSDF-Cloud

40

Can a unified API provide scalable resource management across different providers?

  • We design computing services built on a unified API for handling diverse jobs across platforms
    • Parallel creation/deletion of many VMs using command-line tools
    • Automatic generation of Ansible inventory files
    • Integration of credentials for multiple providers via a configuration file
  • The NSDF-Cloud unified API, exposed as both Python and CLI tools, consists of:

[Diagram: nsdf-cloud operations (create nodes, get nodes, delete nodes) targeting AWS, Chameleon, CloudLab, Vultr, and Jetstream2]
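To illustrate the pattern (this is not the actual NSDF-Cloud implementation), the sketch below shows a unified create/get/delete interface that dispatches to per-provider adapters and emits a minimal Ansible inventory. The adapter protocol and all names are hypothetical; a real adapter would wrap the provider's own SDK or CLI.

```python
"""Illustrative sketch of a unified create/get/delete pattern across cloud
providers. This is NOT the actual NSDF-Cloud code; the adapter protocol and
all names are hypothetical. A real adapter would wrap the provider's SDK
(e.g., boto3 for AWS or the OpenStack clients for Chameleon/Jetstream2)."""
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Node:
    provider: str
    name: str
    ip: str


class ProviderAdapter(Protocol):
    """Minimal interface each provider backend would implement."""
    def create_nodes(self, count: int, image: str) -> list[Node]: ...
    def get_nodes(self) -> list[Node]: ...
    def delete_nodes(self, names: list[str]) -> None: ...


class UnifiedCloud:
    """Dispatches the three verbs (create/get/delete nodes) to per-provider adapters."""

    def __init__(self, adapters: dict[str, ProviderAdapter]):
        self.adapters = adapters

    def create_nodes(self, provider: str, count: int, image: str) -> list[Node]:
        return self.adapters[provider].create_nodes(count, image)

    def get_nodes(self) -> list[Node]:
        return [n for a in self.adapters.values() for n in a.get_nodes()]

    def delete_nodes(self, provider: str, names: list[str]) -> None:
        self.adapters[provider].delete_nodes(names)


def write_ansible_inventory(nodes: list[Node], path: str = "inventory.ini") -> None:
    """Emit a minimal INI-style Ansible inventory, grouping hosts by provider."""
    with open(path, "w") as fh:
        for provider in sorted({n.provider for n in nodes}):
            fh.write(f"[{provider}]\n")
            for n in nodes:
                if n.provider == provider:
                    fh.write(f"{n.name} ansible_host={n.ip}\n")
```

In this sketch, provider-specific authentication and image handling stay inside each adapter, so the same three verbs can target any configured provider.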

41 of 68

NSDF-Cloud Supported Cloud Providers

41

Provider   | Type       | Credentials        | Regions    | Stack                   | Custom Images
AWS        | Commercial | Token+Secret       | Yes (Int.) | Custom                  | Yes
Chameleon  | Academic   | Token              | Yes (US)   | CHI on OpenStack        | Yes*
CloudLab   | Academic   | Certificate        | Yes (US)   | Custom                  | Yes
Vultr      | Commercial | Token+IP-Whitelist | Yes (Int.) | Custom                  | Yes
Jetstream2 | Academic   | Token              | Yes (US)   | Atmosphere on OpenStack | Yes*

We enable scalable compute resources across different commercial and academic cloud sites

* Provider accepts user-provided images, but they will be public

42 of 68

NSDF-Cloud Latency

42

We measure the NSDF-Cloud latency to:

  • Create ad hoc clusters of up to 16 VMs and SSH into them → task performed in < 15 minutes
  • Delete a set of up to 16 VMs in a cluster once an experiment is over → task performed in < 5 minutes
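As a rough illustration of how the cluster-creation latency can be bounded, the sketch below (not NSDF-Cloud code) waits until SSH (port 22) answers on every freshly created VM address and reports the elapsed time; the address list is a placeholder.

```python
#!/usr/bin/env python3
"""Sketch of timing 'cluster ready' latency: wait until SSH (port 22) answers
on every freshly created VM. Not NSDF-Cloud code; the address list is a placeholder."""
import socket
import time

NEW_VM_IPS = ["203.0.113.10", "203.0.113.11"]  # placeholder addresses (TEST-NET-3)


def ssh_reachable(ip: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to port 22 succeeds within the timeout."""
    try:
        with socket.create_connection((ip, 22), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_cluster(ips: list[str], deadline_s: float = 15 * 60) -> float:
    """Block until every VM accepts SSH connections; return elapsed seconds."""
    start = time.monotonic()
    pending = set(ips)
    while pending and time.monotonic() - start < deadline_s:
        pending = {ip for ip in pending if not ssh_reachable(ip)}
        if pending:
            time.sleep(10)
    if pending:
        raise TimeoutError(f"VMs never became reachable: {sorted(pending)}")
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"cluster reachable after {wait_for_cluster(NEW_VM_IPS):.0f} s")
```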

43 of 68

NSDF-Cloud Latency

43

We measure the NSDF-Cloud latency to:

  • Create ad hoc clusters of up to 16 VMs and SSH into them → task performed in < 15 minutes
  • Delete a set of up to 16 VMs in a cluster once an experiment is over → task performed in < 5 minutes

The scalability is provider-dependent, e.g., when creating VMs:

  • AWS is stable, with low latency
  • Jetstream2 is unstable, with high latency

44 of 68

44


The NSDF architecture integrates a suite of networking (both local and global), computing, and storage services

[Highlighted: Storage]

45 of 68

Storage Services: NSDF-FUSE

45

Can we enable HPC legacy applications to deploy object storage technology in cloud environments?

46 of 68

Storage Services: NSDF-FUSE Capabilities

46

NSDF-FUSE capabilities:

  • Creation/deletion of buckets
  • Installation of the mapping package
  • Mounting/unmounting buckets as file systems
  • Evaluation of I/O performance through I/O jobs

NSDF-FUSE is a service for mapping object storage into POSIX namespaces for legacy support
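As an illustration of this workflow (not the NSDF-FUSE implementation), the sketch below mounts a bucket with s3fs, runs one fio job resembling Job 1 from the next slide, and unmounts; the bucket, endpoint, mount point, credentials file, and fio settings are placeholders or assumptions.

```python
#!/usr/bin/env python3
"""Minimal sketch of the NSDF-FUSE mount-and-measure cycle (not the actual
NSDF-FUSE code). It uses s3fs and fio as stand-ins for the mapping package
and the I/O job; bucket, mount point, endpoint, and credentials are placeholders."""
import subprocess

BUCKET = "nsdf-example-bucket"        # placeholder bucket
MOUNT = "/mnt/nsdf-fuse"              # placeholder mount point
ENDPOINT = "https://s3.example.org"   # placeholder S3-compatible endpoint


def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1. Mount the bucket as a POSIX file system through FUSE (s3fs here).
run(["s3fs", BUCKET, MOUNT,
     "-o", f"url={ENDPOINT}", "-o", "passwd_file=/etc/passwd-s3fs"])

# 2. Run one I/O job: sequential write of large files with a single writer
#    (the pattern the deck calls Job 1), using fio.
run(["fio", "--name=job1", "--rw=write", "--bs=1m", "--size=1g",
     "--numjobs=1", f"--directory={MOUNT}"])

# 3. Unmount when the measurement is done.
run(["fusermount", "-u", MOUNT])
```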

47 of 68

NSDF-FUSE I/O Jobs

47

I/O Jobs:

  • Job 1: sequential write, large files, 1 writer
  • Job 2: sequential read, large files, 1 reader
  • Job 3: sequential write, large files, 8 writers
  • Job 4: sequential read, large files, 8 readers
  • Job 5: random write, small files, 16 writers
  • Job 6: random read, small files, 16 readers

Each pattern mimics possible I/O accesses in real applications on the cloud and at the edge

NSDF-FUSE supports multiple mapping packages, I/O jobs, and cloud platforms
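One plausible fio encoding of the six patterns is sketched below; the block sizes, file sizes, and file counts are assumptions chosen for illustration, not the settings used in the NSDF-FUSE experiments.

```python
# Hypothetical fio argument sets for the six NSDF-FUSE I/O jobs; sizes and
# block sizes are illustrative assumptions, not the experiments' exact settings.
FIO_JOBS = {
    "job1": ["--rw=write",     "--bs=1m", "--size=1g",  "--numjobs=1"],   # seq write, large files, 1 writer
    "job2": ["--rw=read",      "--bs=1m", "--size=1g",  "--numjobs=1"],   # seq read,  large files, 1 reader
    "job3": ["--rw=write",     "--bs=1m", "--size=1g",  "--numjobs=8"],   # seq write, large files, 8 writers
    "job4": ["--rw=read",      "--bs=1m", "--size=1g",  "--numjobs=8"],   # seq read,  large files, 8 readers
    "job5": ["--rw=randwrite", "--bs=4k", "--size=64m", "--numjobs=16"],  # rand write, small files, 16 writers
    "job6": ["--rw=randread",  "--bs=4k", "--size=64m", "--numjobs=16"],  # rand read,  small files, 16 readers
}


def fio_command(name: str, directory: str) -> list[str]:
    """Build the fio command line for one job against a mounted bucket."""
    return ["fio", f"--name={name}", f"--directory={directory}", *FIO_JOBS[name]]
```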

Mapping Packages:

Mapping package | POSIX-compliant | Data mapping | Metadata location
Goofys          | Partial         | Direct       | In name
GeeseFS         | Partial         | Direct       | In name
JuiceFS         | Full            | Chunked      | In bucket*
ObjectiveFS     | Full            | Chunked      | In bucket
rclone          | Partial         | Direct       | In bucket
s3backer        | Full            | Chunked      | In bucket
s3fs            | Partial         | Direct       | In name
S3QL            | Full            | Chunked      | In bucket

48 of 68

I/O Performance Using NSDF-FUSE

48

Peak throughput [MiB/s]. Columns A-J1 through A-J6 are Jobs 1-6 on Cloud A; B-J1 through B-J6 are Jobs 1-6 on Cloud B.

Mapping package | A-J1 | A-J2 | A-J3 | A-J4 | A-J5 | A-J6 | B-J1 | B-J2 | B-J3 | B-J4 | B-J5 | B-J6
Goofys          |  248 |  546 |  481 | 1638 |    9 |   28 |  136 |  431 |  356 |  910 |   15 |   78
GeeseFS         |  248 |  455 |  910 |  585 |   19 |   34 |  136 |  409 |  356 |  146 |   28 |   51
JuiceFS         |  455 |  327 |  744 |  431 |   13 |   25 |  148 |   47 |  327 |   43 |   11 |   15
ObjectiveFS     |  195 |  315 |  273 |  327 |   41 |   39 |  117 |  240 |  282 |  356 |   62 |   40
rclone          |  107 |   85 |  372 |  682 |    8 |   16 |   89 |   95 |  372 |  630 |   32 |   47
s3backer        |   84 |   81 |  102 |   91 |   62 |   51 |   39 |  130 |   42 |  126 |   29 |   34
s3fs            |   74 |  117 |   91 |  136 |    1 |    3 |   34 |  512 |   41 |  585 |    4 |   12
S3QL            |   44 |   64 |   56 |  117 |   32 |    9 |   13 |   46 |    6 |   31 |   12 |    9

We deploy NSDF-FUSE to measure peak I/O performance for six I/O jobs on two cloud platforms

[Legend: highlighted cells in the original slide mark the best I/O per job]

49 of 68

49


We present our NSDF testbed that integrates networking, computing, and storage services, which users access through entry points with different providers

Network: NSDF-Plugin enables efficient data sharing, transfer, and monitoring across networks while hiding the technical complexity of the process

Computation: NSDF-Cloud facilitates users at any entry level in the deployment of the cloud → one single API can generate a cluster of many VMs across multiple providers

Storage: NSDF-FUSE allows the user to reach comprehensive conclusions about mapping packages given different data patterns and cloud platforms

http://nationalsciencedatafabric.org/

Reach out for more information!

Michela Taufer - taufer@utk.edu

Valerio Pascucci - valerio.pascucci@utah.edu

50 of 68

The 16th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2023)

50

[Photo: eruption on the 2nd of December]

51 of 68

Contributions

  • We present the logical structure and services of our NSDF testbed.
  • We evaluate the performance of the three types of services (networking, storage, and computing) for academic and commercial clouds.
  • We discuss the benefits of NSDF services for scientific research.

51

52 of 68

Networking: Geographic overview

52

Geographic overview of our testbed with entry points at different locations, including local resources on the University of Utah and University of Michigan campuses and academic clouds such as Chameleon, CloudLab, and Jetstream2

53 of 68

Networking: Tests definition

53

The following tests perform the networking validation:

  1. Throughput
  2. Latency
  3. Traceroute
  4. Throughput with XRootD

54 of 68

Networking: Latency results

54

Point-to-point collectable metrics with perfSONAR.

Summary distribution of latency measurements for different entry points in our testbed

[Plots: bi-directional latency between Clemson and Massachusetts and between Wisconsin and Utah, in both directions]

55 of 68

Networking: Traceroute results

55

Summary of routing pattern measurements for different entry points in our testbed.

Point-to-point collectable metrics with perfSONAR.

56 of 68

Networking: XRootD validation

56

Summary of multiple experiments to validate the throughput reported by perfSONAR. We used the XRootD application, varying the number of parallel copy jobs and the number of streams. The experiments are:

  • Copy 1 GiB with a single file
  • Copy 1 GiB with 100 files
  • Copy 1 GiB with 1000 files

These results are for transfers from the Wisconsin entry point to the Utah entry point
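For reproducing a similar setup, the sketch below generates the three 1 GiB payloads (1, 100, and 1000 files) locally before they are copied with xrdcp as in the earlier parameter-sweep sketch; the directory layout is an assumption, not the exact configuration used in the experiments.

```python
#!/usr/bin/env python3
"""Generate three 1 GiB datasets split into 1, 100, and 1000 equally sized
files, as stand-ins for the XRootD validation payloads. The layout is an
assumption, not the exact setup used in the experiments."""
import os
from pathlib import Path

TOTAL_BYTES = 1 * 2**30          # 1 GiB per dataset
BASE = Path("xrootd-payloads")   # placeholder output directory

for n_files in (1, 100, 1000):
    dataset = BASE / f"{n_files}-files"
    dataset.mkdir(parents=True, exist_ok=True)
    size = TOTAL_BYTES // n_files  # integer division: totals slightly under 1 GiB for n > 1
    for i in range(n_files):
        # Random payloads avoid any accidental compression along the path.
        (dataset / f"part-{i:04d}.bin").write_bytes(os.urandom(size))
    print(f"wrote {n_files} files of {size} bytes each to {dataset}")
```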

57 of 68

NSDF-Entry Points

57

We integrate networking services into the NSDF testbed to provide efficient data sharing and transfer across networks while hiding the technical complexity of the process

58 of 68

NSDF-Entry Points: Geolocation

58

8 Diverse Entry Points

59 of 68

NSDF-Plugin Performance

59

The NSDF testbed allows us to monitor throughput, latency, and routing between entry points over time, identifying areas for improvement and detecting anomalous behaviors

(Benchmark overview repeated from slide 24: iperf3 for throughput, owping for latency, traceroute for routing, and XRootD for throughput validation.)

60 of 68

Computing Services

60

Cloud computing capabilities are increasingly supplied through academic and commercial cloud providers

61 of 68

Computing Services

61

No universal or standard interface exists for common actions (e.g., configuration, launching, and termination of virtual resources) across providers

Using diverse computing resources effectively imposes a significant technical burden on domain scientists and other users

Cloud computing capabilities are increasingly supplied through academic and commercial cloud providers

62 of 68

NSDF-Cloud Latency

62

We measure the NSDF-Cloud latency to:

  • Create ad hoc clusters of up to 16 VMs and SSH into them → task performed in < 15 minutes
  • Delete a set of up to 16 VMs in a cluster once an experiment is over → task performed in < 5 minutes

NSDF-Cloud facilitates users at any entry level in the deployment of the cloud → one single API can generate a cluster of many VMs across multiple providers

63 of 68

Storage Services

63

Cloud Storage Mirrors provide scalable and resilient solutions for data storage

64 of 68

Storage Services for Legacy Applications

64

Cloud Storage Mirrors provide scalable and resilient solutions for data storage

How can we enable HPC legacy applications to deploy object storage technology in cloud environments?

65 of 68

Storage Services: NSDF-FUSE

65

  • We design a storage service to address the challenge of directly mounting cloud object storage data into a file system
  • We enable different mapping packages that use Filesystem in USErspace (FUSE) technology serving as bridges to object storage for legacy applications

Can we enable HPC legacy applications to deploy object storage technology in cloud environments?

66 of 68

Storage Services: NSDF-FUSE

66

Users need to understand the merits and pitfalls of existing packages when mapping object storage to file systems

  • We enable different mapping packages that use Filesystem in USErspace (FUSE) technology serving as bridges to object storage for legacy applications

67 of 68

NSDF-FUSE Mapping Packages

67

Mapping package | Open source | POSIX-compliant | Data mapping | Metadata location | Compression | Consistency      | Multi-client reads | Multi-client writes
Goofys          | Yes         | Partial         | Direct       | In name           | No          | None             | Yes                | No
GeeseFS         | Yes         | Partial         | Direct       | In name           | No          | read-after-write | Yes                | No
JuiceFS         | Yes         | Full            | Chunked      | In bucket*        | Yes         | close-to-open    | Yes                | Yes
ObjectiveFS     | No          | Full            | Chunked      | In bucket         | Yes         | read-after-write | Yes                | Yes
rclone          | Yes         | Partial         | Direct       | In bucket         | No          | None             | Yes                | No
s3backer        | Yes         | Full            | Chunked      | In bucket         | Yes         | PUT or DELETE    | No                 | No
s3fs            | Yes         | Partial         | Direct       | In name           | No          | None             | Yes                | No
S3QL            | Yes         | Full            | Chunked      | In bucket         | No          | copy-on-write    | None               | No

*JuiceFS offers a dedicated server for the metadata
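As a small illustration of using this comparison programmatically, the sketch below encodes a few columns of the table and filters for packages that are fully POSIX-compliant and allow multi-client writes (JuiceFS and ObjectiveFS, per the table); the field names are illustrative.

```python
# A few columns of the mapping-package table above, encoded for filtering;
# field names are illustrative, values are taken from the table.
PACKAGES = [
    {"name": "Goofys",      "posix": "Partial", "multi_client_writes": False, "open_source": True},
    {"name": "GeeseFS",     "posix": "Partial", "multi_client_writes": False, "open_source": True},
    {"name": "JuiceFS",     "posix": "Full",    "multi_client_writes": True,  "open_source": True},
    {"name": "ObjectiveFS", "posix": "Full",    "multi_client_writes": True,  "open_source": False},
    {"name": "rclone",      "posix": "Partial", "multi_client_writes": False, "open_source": True},
    {"name": "s3backer",    "posix": "Full",    "multi_client_writes": False, "open_source": True},
    {"name": "s3fs",        "posix": "Partial", "multi_client_writes": False, "open_source": True},
    {"name": "S3QL",        "posix": "Full",    "multi_client_writes": False, "open_source": True},
]

# Example: candidates for a legacy application that needs full POSIX semantics
# and concurrent writers -> JuiceFS and ObjectiveFS.
candidates = [p["name"] for p in PACKAGES
              if p["posix"] == "Full" and p["multi_client_writes"]]
print(candidates)
```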

68 of 68

I/O Performance Using NSDF-FUSE

68

[Peak-throughput table repeated from slide 48 (I/O Performance Using NSDF-FUSE)]

We deploy NSDF-FUSE to measure peak I/O performance for six I/O jobs on two cloud platforms

NSDF-FUSE allows the user to reach comprehensive conclusions about mapping packages given different data patterns and cloud platforms