Implementing a Modern Tri-Use Computing Cluster
for Educational and Research Purposes
Ervin Pangilinan, Stefan Mykytyn, Willem Wilcox, Dr. Andrew J. Pounds
Department of Computer Science, Mercer University, 1501 Mercer University Dr., Macon, GA 31207
Introduction
We present Popcorn, a modern redesign of the computing cluster that was formerly located in the old Computer Science Building. The former Olympus Computing Cluster was configured for distributed parallel computing across all of the systems in the CS computer labs, letting users store, develop, and execute distributed-memory parallel applications on as many as 64 nodes. It was used for calculations in several publications and conference presentations in heterogeneous GPU computing, self-tuning HPC applications, and chemical physics[1]. This work describes the development of a new system that will restore these HPC resources for educational and research purposes.
Acknowledgements
We thank Dr. Andrew J. Pounds for his extensive guidance, expertise, and prior experience. We also thank Jesse Sowell, System Administrator of the Computer Science Department, for support with numerous networking utilities, Linux tools, and general know-how.
References
[1] Andrew J. Pounds, Rajeev Nalluri, and Bennie L. Coleman. 2005. The development of a tri-use cluster for general computer education, high performance computing education, and Computationally Intensive Research. Proceedings of the 43rd annual Southeast regional conference - Volume 1 (March 2005). DOI:http://dx.doi.org/10.1145/1167350.1167446
[2] SergioMEV. 2024. Slurm for Dummies. GitHub repository (March 18, 2024). Retrieved from https://github.com/SergioMEV/slurm-for-dummies
Abstract
In 2005, Mercer University’s computer science department had its own computing cluster for general computer science education, high performance computing (HPC) education, and computationally intensive research. Changes in the associated operating systems (both Linux and MS Windows) and the relocation of the computer science department to the Willet Science Center caused this cluster functionality to be lost. This project aims to reimplement the tri-use cluster with newer operating systems and computers. Our new cluster will be able to dual-boot between Windows and Linux, serving students who need classroom computers with both systems while routinely booting into Linux at night for research-level HPC calculations. The scheduled OS swap is achieved through Linux and Windows scripts that change the operating system upon reboot. Students accessing the cluster also need authentication and a filesystem to store their programs; these were provided by implementing the Network Information Service (NIS) and the Network File System (NFS). Finally, job scheduling for the research cluster is handled by the Simple Linux Utility for Resource Management (SLURM).
Conclusions
Current Configuration Diagram
Future Plans
While Popcorn is more than capable of meeting Mercer’s HPC needs, further improvements and performance gains are planned. As the needs and requirements of Mercer students change, Popcorn must grow and change with them.
Potential Improvements Include:
Cluster Components
To meet the tri-use demands (classroom, HPC, and research), we identified the following needs:
Dual Booting Capability:
File and System Management:
Job Scheduling:
Dual Booting Capability
Most of the work is handled on the Linux side. On reboot, the GRUB boot loader selects Rocky Linux as its default entry. To boot a different entry, we wrote a short Bash script that selects menu choice #2 (the Windows operating system) for the next boot and then reboots the machine. Windows relies on its built-in Task Scheduler for the reverse switch; since Rocky Linux is GRUB’s default, a reboot from Windows lands back in Linux.
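A minimal sketch of the one-time switch, written in Python for illustration. It assumes GRUB 2’s `grub2-reboot` helper as packaged on Rocky Linux, which only takes effect when `GRUB_DEFAULT=saved` is set in `/etc/default/grub`; the entry value "2" comes from the description above, and the function names are hypothetical. By default it only prints the commands instead of rebooting.

```python
import shlex
import subprocess

# Assumption: the "choice #2" above is passed as-is; GRUB numbers menu
# entries from 0, so confirm the Windows entry's index on your own menu.
WINDOWS_ENTRY = "2"


def reboot_commands(entry: str = WINDOWS_ENTRY) -> list[list[str]]:
    """Commands that boot `entry` once; later boots fall back to the GRUB default."""
    return [
        ["grub2-reboot", entry],  # one-shot selection for the next boot only
        ["systemctl", "reboot"],  # restart the machine
    ]


def switch_to_windows(dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the one-time switch; returns the command lines."""
    lines = []
    for cmd in reboot_commands():
        lines.append(shlex.join(cmd))
        if dry_run:
            print(lines[-1])
        else:
            subprocess.run(cmd, check=True)  # needs root privileges
    return lines


if __name__ == "__main__":
    switch_to_windows()  # dry run: prints the two commands
```

Because `grub2-reboot` only changes the next boot, the saved default (Rocky Linux) applies again on the following restart, which is why a plain scheduled reboot from Windows is enough to return to Linux.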
Booting Automatically:
To switch automatically from Linux to Windows every morning at 7am, we used cron, the Linux utility that runs commands at specified times. We simply schedule our script to run every morning just before 7am.
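To make the schedule concrete, here is a small Python sketch that renders the crontab entry and computes when it would next fire. The 6:55 run time and the script path are hypothetical stand-ins for "just before 7am" and "our script". Because rebooting requires root, such an entry would go in root’s crontab (edited with `crontab -e` as root).

```python
from datetime import datetime, timedelta

# Hypothetical values: the poster only specifies "just before 7am".
RUN_HOUR, RUN_MINUTE = 6, 55
SCRIPT = "/usr/local/sbin/boot-windows.sh"  # hypothetical script path


def crontab_line(hour: int = RUN_HOUR, minute: int = RUN_MINUTE,
                 script: str = SCRIPT) -> str:
    """Daily entry; fields are minute, hour, day-of-month, month, day-of-week."""
    return f"{minute} {hour} * * * {script}"


def next_run(now: datetime, hour: int = RUN_HOUR,
             minute: int = RUN_MINUTE) -> datetime:
    """Next time after `now` that a daily `minute hour * * *` entry fires."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


if __name__ == "__main__":
    print(crontab_line())
    print(next_run(datetime.now()))
```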
Figure 2: Current configuration of Popcorn’s prototype, which consists of one head node and one worker node.
File and System Management
This capability is what enables Popcorn to restore the time-shared server use once available to faculty and students in the Computer Science Department. Its two components, NIS and NFS, work hand-in-hand.
Network Information Service (NIS):
NIS is a client-server directory service protocol that distributes system configuration data, such as usernames and passwords, across the cluster from a centralized repository. Users don’t “see” or interact with NIS as they do with NFS, but it is what lets every node recognize the same user accounts.
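As a toy model of the centralized repository, the Python sketch below builds a username-keyed map from `/etc/passwd`-format text, roughly what an NIS master publishes as its `passwd.byname` map, and answers a lookup the way a client query would. It is an illustration only: real NIS maps are built and served by the `ypserv` tools, and the `min_uid` cutoff for skipping system accounts is an assumed parameter.

```python
from typing import Optional


def build_passwd_byname(passwd_text: str, min_uid: int = 1000) -> dict[str, str]:
    """Map username -> full passwd entry, skipping system accounts and bad lines."""
    nis_map: dict[str, str] = {}
    for raw in passwd_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # passwd fields: name:password:uid:gid:gecos:home:shell
        fields = line.split(":")
        if len(fields) != 7 or not fields[2].isdigit():
            continue  # malformed entry
        if int(fields[2]) >= min_uid:
            nis_map[fields[0]] = line
    return nis_map


def lookup(nis_map: dict[str, str], user: str) -> Optional[str]:
    """A client lookup is a keyed query against the map the master publishes."""
    return nis_map.get(user)


if __name__ == "__main__":
    sample = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/bash\n"
    )
    users = build_passwd_byname(sample)
    print(sorted(users))  # only the regular account is published
    print(lookup(users, "jdoe"))
```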
Network File System (NFS):
NFS is a distributed file system protocol that lets users log in from their own computer and access files stored on Popcorn as if they were on their local machine. Because the files live in one central location, it also lets administrators oversee users and update permissions, which is ideal for an educational university environment and coding sandbox.
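A sketch of the two configuration lines typically involved, generated with Python so they can be checked: the server’s `/etc/exports` entry and a client’s `/etc/fstab` entry for mounting shared home directories. The host name `popcorn-head`, the `/home` path, and the `10.0.0.0/24` subnet are hypothetical; `rw`, `sync`, and `no_subtree_check` are standard export options.

```python
# Hypothetical values for illustration only.
SERVER = "popcorn-head"
EXPORT_PATH = "/home"
CLIENT_NET = "10.0.0.0/24"


def exports_line(path: str = EXPORT_PATH, clients: str = CLIENT_NET,
                 options: tuple[str, ...] = ("rw", "sync", "no_subtree_check")) -> str:
    """Server side (/etc/exports): share `path` read-write with the lab subnet."""
    return f"{path} {clients}({','.join(options)})"


def fstab_line(server: str = SERVER, path: str = EXPORT_PATH,
               mountpoint: str = EXPORT_PATH) -> str:
    """Client side (/etc/fstab): mount the export at boot so files appear local."""
    # fields: device, mount point, type, options, dump, fsck pass
    return f"{server}:{path} {mountpoint} nfs defaults 0 0"


if __name__ == "__main__":
    print(exports_line())
    print(fstab_line())
```

After editing `/etc/exports`, running `exportfs -ra` on the server re-reads it. Because NIS gives every node the same user IDs, file ownership on the shared directories is consistent across machines.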
Job Scheduling
SLURM is an open-source job scheduler used at many supercomputing sites. It is compatible with Rocky Linux and offers backward compatibility with PBS/Torque through wrapper commands such as qsub.
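For readers new to SLURM, a job is usually submitted as a shell script whose `#SBATCH` lines request resources. The Python sketch below assembles such a script; the job name, node and task counts, time limit, and program `./my_mpi_app` are hypothetical. Under PBS/Torque the same requests would be `#PBS` directives such as `#PBS -l nodes=1:ppn=4`.

```python
def sbatch_script(job_name: str, nodes: int, tasks_per_node: int,
                  time_limit: str, command: str) -> str:
    """Build a minimal SLURM batch script; submit the result with `sbatch`."""
    if nodes < 1 or tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be positive")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",  # HH:MM:SS wall-clock limit
        f"srun {command}",  # srun starts nodes x ntasks-per-node tasks
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job for the prototype's single worker node.
    print(sbatch_script("demo", nodes=1, tasks_per_node=4,
                        time_limit="00:10:00", command="./my_mpi_app"), end="")
```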
Munge:
We configured Munge by first installing the package on the head node and running its key creation script. We then installed Munge on the worker node and ran the same script. Finally, we copied the head node’s Munge key to the worker node, replacing the key generated there, so that both nodes share a single key.
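Munge credentials only verify when every node holds an identical key, so a quick consistency check is useful after copying it. The Python sketch below compares SHA-256 digests of two key files and checks that a key grants no permissions to group or others (Munge keys are typically owned by the munge user with mode 0400). `/etc/munge/munge.key` is Munge’s usual key location; the function names are hypothetical, and how the worker’s copy is fetched for comparison is left open.

```python
import hashlib
import os
import stat
from pathlib import Path

DEFAULT_KEY = Path("/etc/munge/munge.key")  # Munge's usual key location


def key_digest(path: Path) -> str:
    """SHA-256 of a key file, so keys can be compared without printing them."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def key_is_private(path: Path) -> bool:
    """True if neither group nor others have any permission on the key file."""
    mode = os.stat(path).st_mode
    return not mode & (stat.S_IRWXG | stat.S_IRWXO)


def keys_match(head_key: Path, worker_key: Path) -> bool:
    """Both nodes must hold byte-identical keys for Munge credentials to verify."""
    return key_digest(head_key) == key_digest(worker_key)


if __name__ == "__main__":
    try:
        print(key_digest(DEFAULT_KEY), key_is_private(DEFAULT_KEY))
    except OSError as err:  # key missing or not running as root
        print(f"cannot read {DEFAULT_KEY}: {err}")
```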
SLURM:
We configured SLURM by first downloading it from the official website. We then used its configurator to describe Popcorn’s cluster architecture, including the number of worker nodes, processors per node, and threads per processor. The configurator generates SLURM’s configuration file (slurm.conf), while SLURM itself is built separately from the downloaded source. Further testing of SLURM is still needed.
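In SLURM, the node layout (worker nodes, processors per node, threads per processor) is recorded in `slurm.conf`. The Python sketch below renders the relevant lines using the real `slurm.conf` keys `SlurmctldHost`, `NodeName`, `Sockets`, `CoresPerSocket`, `ThreadsPerCore`, and `PartitionName`; the host names, hardware counts, and partition name are hypothetical, chosen only to match the prototype’s one-head, one-worker layout. It is an excerpt, not a complete file.

```python
def slurm_conf(head: str, workers: list[str], sockets: int,
               cores_per_socket: int, threads_per_core: int) -> str:
    """Render the controller, node, and partition lines of a slurm.conf excerpt."""
    cpus = sockets * cores_per_socket * threads_per_core  # logical CPUs per node
    nodes = ",".join(workers)
    return "\n".join([
        f"SlurmctldHost={head}",
        f"NodeName={nodes} CPUs={cpus} Sockets={sockets} "
        f"CoresPerSocket={cores_per_socket} ThreadsPerCore={threads_per_core} "
        "State=UNKNOWN",
        f"PartitionName=main Nodes={nodes} Default=YES MaxTime=INFINITE State=UP",
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical prototype: one head node and one worker with
    # 1 socket, 4 cores per socket, and 2 threads per core.
    print(slurm_conf("popcorn-head", ["popcorn-w1"], 1, 4, 2), end="")
```

Running `slurmd -C` on a worker prints the hardware it detects as a `slurm.conf`-style `NodeName` line, which is a convenient way to confirm the counts before writing them down.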
Figure 1: The reboot script that handles Popcorn’s dual-booting capability, and the Linux cron job that schedules it.