Keeneland FAQ
What is the process for transferring from FORGE?
What is the difference between the two Keeneland systems: KIDS and KFS?
What is the maximum length of a job?
How do I get an account?
Will the GPUs be upgraded to Kepler?
Can we add more GPUs per node to KFS?
Is ECC enabled on the GPUs?
Is it configurable by user applications?
What is the impact on performance, reliability, and memory capacity?
How do I use ‘modules’ on Keeneland?
Is it possible to add software to Keeneland?
How do I use MVAPICH for MPI rather than OpenMPI?
How do I use MVAPICH’s CUDA support?
What OpenACC compilers are available?
Most Forge users with remaining allocations will be transitioned to KFS. Some Forge users will be transitioned to KIDS. We recommend that new users review the Keeneland documentation and, if possible, watch the video of the Keeneland Tutorial on the Keeneland web site.
KIDS and KFS are very similar systems. KFS uses a newer server architecture, including 8-core Sandy Bridge CPUs and FDR InfiniBand, but both systems have nodes with two CPUs and three NVIDIA M2090 GPUs, and both run the same software.
The maximum length of a job is 48 hours. Extending the maximum length of a job is something we can consider.
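For reference, a minimal sketch of a batch script that stays within this limit is shown below. It assumes the Torque/PBS-style batch system described in the Keeneland user documentation; the job name and resource string (node and core counts) are illustrative only, so check the user guide for the exact syntax.

#!/bin/bash
#PBS -N my_gpu_job                  # illustrative job name
#PBS -l walltime=48:00:00           # must not exceed the 48-hour limit
#PBS -l nodes=2:ppn=12              # illustrative; see the user guide for the exact resource string

cd $PBS_O_WORKDIR                   # run from the directory the job was submitted from
mpirun $PWD/exe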
KIDS: Go to http://keeneland.gatech.edu/support/new-account and follow the directions.
KFS: KFS is an XSEDE-allocated resource. You must follow the process outlined at https://www.xsede.org/allocations.
We would like to do a Kepler upgrade and are convinced that it would be a very cost-effective performance boost, but we currently do not have funding to do so. The decision to upgrade would require additional funds from NSF.
No, the physical size and shape of the nodes imposes a limit on the number of GPUs per node.
Yes.
This has been a continuing issue that we have discussed many times, and so far the answer is no. We can open it up for more discussion, but we don't think the small performance gain would justify it.
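If you want to verify the current ECC mode for yourself, you can query it from a compute node. A hedged example, assuming nvidia-smi is available in your path there:

nvidia-smi -q -d ECC                # reports the current and pending ECC mode (and error counts) for each GPU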
With ECC on, 12.5% of each GPU's 6 GB of memory is used for ECC bits, leaving 5.25 GB per GPU.
Fermi is the first GPU to support Error Correcting Code (ECC) based protection of data in memory. ECC was requested by GPU computing users to enhance data integrity in high performance computing environments. ECC is a highly desired feature in areas such as medical imaging and large-scale cluster computing.
Naturally occurring radiation can cause a bit stored in memory to be altered, resulting in a soft error. ECC technology detects and corrects single-bit soft errors before they affect the system. Because the probability of such radiation-induced errors increases linearly with the number of installed systems, ECC is an essential requirement in large cluster installations.
Fermi supports Single-Error Correct Double-Error Detect (SECDED) ECC codes that correct any single bit error in hardware as the data is accessed. In addition, SECDED ECC ensures that all double bit errors and many multi-bit errors are also detected and reported so that the program can be re-run rather than being allowed to continue executing with bad data.
Fermi’s register files, shared memories, L1 caches, L2 cache, and DRAM memory are ECC protected, making it not only the most powerful GPU for HPC applications, but also the most reliable. In addition, Fermi supports industry standards for checking of data during transmission from chip to chip. All NVIDIA GPUs include support for the PCI Express standard for CRC check with retry at the data link layer. Fermi also supports the similar GDDR5 standard for CRC check with retry (aka “EDC”) during transmission of data across the memory bus.
There’s always a price to pay for ECC, even when implemented in hardware — and especially when ECC is implemented as extensively as it is throughout Fermi chips. NVIDIA estimates that performance will suffer by 5% to 20%, depending on the application. For critical programs requiring absolute accuracy, that amount of overhead is of no concern. It’s certainly much better than the overhead imposed by existing work-arounds.
For KIDS, see: http://keeneland.gatech.edu/software/tools.
For KFS, see: https://www.xsede.org/gatech-keeneland#computing.
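The basic module commands are the same on both systems. The package names below are only examples; use module avail to see what is actually installed:

module avail                        # list the software packages provided through modules
module list                         # show the modules currently loaded in your environment
module load cuda                    # load a package (example name)
module swap openmpi mvapich2        # replace one loaded module with another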
We currently license the PGI, Intel, and GNU compilers; the Allinea DDT debugger; and a few applications, like AMBER. Users can request additional licenses. However, the Keeneland Project has limited funds for software licenses. We will have to evaluate how much usage the software package will have and balance the cost with our limited funds.
Issue the following command: module swap openmpi mvapich2.
This makes the MVAPICH2 environment available. After that, rebuild your code with the correct flags, pointing to the appropriate headers and libraries. Finally, run using mpiexec or mpirun_rsh rather than mpirun.
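Put together, the steps look roughly like this. The compiler wrapper, flags, and process count are illustrative; use whatever your code's build system requires:

module swap openmpi mvapich2        # switch the MPI environment from OpenMPI to MVAPICH2
mpicc -o exe mycode.c               # rebuild against the MVAPICH2 headers and libraries (mycode.c is a placeholder)
mpiexec -np 24 $PWD/exe             # launch with mpiexec rather than mpirun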
Set the MV2_USE_CUDA environment variable to 1 when running your program. For instance, you could issue your mpirun command as something like:
MV2_USE_CUDA=1 mpirun $PWD/exe
The mpirun (or equivalent) command for some MPI implementations also allows defining and propagating environment variables to all processes in the program. For details, see the man page of the mpirun command for the MPI implementation you use.
This can also be done programmatically using putenv() from within the program's code. However, if the environment variable is set programmatically, it must be done before the call to MPI_Init().
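For the launcher-based approaches mentioned above, here are a couple of hedged shell examples; the exact behavior and flag syntax vary between MPI implementations and versions, so check the relevant man page:

export MV2_USE_CUDA=1               # many launchers forward exported variables to the MPI processes, but this is launcher-dependent
mpiexec -np 2 $PWD/exe

mpirun_rsh -np 2 -hostfile $PBS_NODEFILE MV2_USE_CUDA=1 $PWD/exe    # mpirun_rsh accepts VAR=value settings before the executable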
We currently have a recent version of the PGI compiler, which supports OpenACC. We are also developing a research compiler designed to give the end user more control over the compilation process.
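As a hedged example, an OpenACC code can be built with PGI roughly as follows (assuming the PGI module is loaded; saxpy.c is a placeholder source file, and the flag names follow PGI's OpenACC documentation):

pgcc -acc -Minfo=accel -o saxpy saxpy.c    # -acc enables OpenACC directives; -Minfo=accel reports what the compiler accelerated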