Introduction
Method
Evaluation
Conclusions
Links
Introduction
In the world of high-performance computing (HPC), efficient data management is critical to achieving optimal system performance. One widely used parallel file system in HPC environments is Lustre, an open-source, distributed file system designed for high-performance computing and big data applications. In this research project, we aimed to create a simulated Lustre File Management System and evaluate its I/O performance under various workloads. Specifically, we measured I/O speeds for three different write workloads and two different read workloads. By conducting these tests, we hoped to gain insight into Lustre's capabilities and limitations when handling different types of data-intensive workloads.
Background
Setup/Configuration
Lustre File Management System
-Lustre automatically configures the nodes listed below with default parameters (these can be adjusted, and I will continue to research how these parameters affect the system).
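As a minimal sketch (not the exact commands used in this project; the mount point, parameter, and values are placeholders), runtime parameters can be inspected and adjusted with lctl, and striping can be set per directory with lfs:

  # Inspect and adjust a client-side tunable (illustrative parameter and value)
  lctl get_param osc.*.max_rpcs_in_flight
  lctl set_param osc.*.max_rpcs_in_flight=16

  # Set stripe count and stripe size for a directory (illustrative values and path)
  lfs setstripe -c 1 -S 1M /mnt/lustre/testdir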
Lustre Configuration
[ 3 GB RAM | 20 GB M.2 SSD | 3 CPUs ]
-Metadata Target (MDT) x1
-Management Target (MGT) x1
-Object Storage Target (OST) x1
-Client x1
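For reference, a one-node-per-role deployment like the one above is typically formatted and mounted with mkfs.lustre and mount; the device paths, hostnames, and fsname below are placeholders rather than this project's exact values:

  # MGT (management server)
  mkfs.lustre --mgs /dev/sdb
  mount -t lustre /dev/sdb /mnt/mgt

  # MDT (metadata server, index 0, pointing at the MGS node)
  mkfs.lustre --fsname=lustrefs --mgsnode=mgs@tcp --mdt --index=0 /dev/sdb
  mount -t lustre /dev/sdb /mnt/mdt

  # OST (object storage server, index 0)
  mkfs.lustre --fsname=lustrefs --mgsnode=mgs@tcp --ost --index=0 /dev/sdb
  mount -t lustre /dev/sdb /mnt/ost0

  # Client mounts the file system over the network
  mount -t lustre mgs@tcp:/lustrefs /mnt/lustre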
Filebench Configuration
-Filebench provides a large set of workloads for testing file systems and applications.
-I utilized the following:
Read Operations:
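As an illustration only (the specific workload files used for the read and write tests are not reproduced here), Filebench's bundled workloads are pointed at the Lustre client mount and run from the command line:

  # Filebench typically asks that ASLR be disabled before a run
  echo 0 > /proc/sys/kernel/randomize_va_space

  # Edit "set $dir=..." at the top of the workload file to target the Lustre mount
  # (e.g. /mnt/lustre), then run a bundled read or write workload:
  filebench -f workloads/randomread.f
  filebench -f workloads/randomwrite.f

Filebench's end-of-run IO Summary reports aggregate throughput (mb/s) and per-operation latency, the kinds of figures compared in the Findings below.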
https://github.com/filebench/filebench
https://www.lustre.org/
In conclusion, Lustre performs better in write operations, achieving higher throughput in MB/s with significantly lower average write operation times. These findings have important implications for optimizing Lustre file system configurations and workloads in HPC environments.

Future research can investigate Lustre's performance with different workloads and hardware configurations, and in comparison with other file systems commonly used in HPC. Such research can help further optimize Lustre's performance and inform the selection of the most suitable file system for specific HPC use cases.
Lustre Write Operations Comparison
Lustre Read Operations Comparison
Motivations
Lustre File System I/O Analysis
Connor Carroll, UNC Charlotte
Md. Hasanur Rashid, UNC Charlotte
Write Operations:
Findings
-Write workloads tend to have a higher throughput than read workloads.
-Write operations tend to take longer per operation than read operations.
-Lustre performs better in write operations, achieving higher throughput in MB/s compared to read operations, with significantly lower average write operation times.
Future Work (Ongoing)
Future research could use machine learning and CAPES to optimize Lustre's performance for data-intensive workloads. These advanced techniques could identify the best configuration settings based on the hardware and workload used, further improving Lustre's efficiency.