Keeneland FAQ

http://keeneland.gatech.edu

General

What is the process for transferring from FORGE?

What is the difference between the two Keeneland Systems: KIDS and KFS?

What is the maximum length of a job?

How do I get an account?

Architecture

Will KFS have Keplers?

Can we add more GPUs per node to KFS?

ECC and Keeneland GPUs

Is ECC enabled for KFS?

Is it configurable by user applications?

What is the impact on performance, reliability, and memory capacity?

Software

How do I use ‘modules’ on Keeneland?

Is it possible to add software to Keeneland?

How do I use MVAPICH for MPI rather than OpenMPI?

How do I use MVAPICH’s CUDA support?

What OpenACC compilers are available?

General

What is the process for transferring from FORGE?

Most Forge users with remaining allocations will be transitioned to KFS.  Some Forge users will be transitioned to KIDS.  We recommend that new users review the Keeneland documentation and, if possible, watch the video of the Keeneland Tutorial on the Keeneland web site.

What is the difference between the two Keeneland Systems: KIDS and KFS?

KIDS and KFS are very similar systems.  KFS uses a newer server architecture, including 8-core Sandy Bridge CPUs and FDR InfiniBand, but both systems have nodes with two CPUs and three NVIDIA M2090 GPUs, and both run the same software stack.

What is the maximum length of a job?

The maximum length of a job is 48 hours.  Extending this limit is something we can consider.

How do I get an account?

KIDS: Go to http://keeneland.gatech.edu/support/new-account and follow the directions.

KFS: KFS is an XSEDE allocated resource.  You must follow the process outlined at https://www.xsede.org/allocations.

Architecture

Will KFS have Keplers?

We would like to do a Kepler upgrade and are convinced it would be a very cost-effective performance boost, but we currently do not have funding to do so.  Such an upgrade would require additional funds from NSF.

Can we add more GPUs per node to KFS?

No, the physical size and shape of the nodes impose a limit on the number of GPUs per node.

ECC and Keeneland GPUs

Is ECC enabled for KFS?

Yes.

Is it configurable by user applications?

This has been a continuing issue that we have discussed many times, and so far the answer is no.  We can open it up for further discussion, but we do not think disabling ECC is worth the small performance gain.

What is the impact on performance, reliability, and memory capacity?

With ECC on, 12.5% of each M2090's 6 GB of memory is reserved for ECC bits, leaving 5.25 GB available per GPU (6 GB × 0.875 = 5.25 GB).

Fermi is the first GPU to support Error Correcting Code (ECC) based protection of data in memory. ECC was requested by GPU computing users to enhance data integrity in high performance computing environments. ECC is a highly desired feature in areas such as medical imaging and large-scale cluster computing.

Naturally occurring radiation can cause a bit stored in memory to be altered, resulting in a soft error. ECC technology detects and corrects single-bit soft errors before they affect the system. Because the probability of such radiation-induced errors increases linearly with the number of installed systems, ECC is an essential requirement in large cluster installations.

Fermi supports Single-Error Correct Double-Error Detect (SECDED) ECC codes that correct any single bit error in hardware as the data is accessed. In addition, SECDED ECC ensures that all double bit errors and many multi-bit errors are also detected and reported so that the program can be re-run rather than being allowed to continue executing with bad data.

Fermi’s register files, shared memories, L1 caches, L2 cache, and DRAM memory are ECC protected, making it not only the most powerful GPU for HPC applications, but also the most reliable. In addition, Fermi supports industry standards for checking of data during transmission from chip to chip. All NVIDIA GPUs include support for the PCI Express standard for CRC check with retry at the data link layer. Fermi also supports the similar GDDR5 standard for CRC check with retry (aka “EDC”) during transmission of data across the memory bus.

There’s always a price to pay for ECC, even when implemented in hardware — and especially when ECC is implemented as extensively as it is throughout Fermi chips. NVIDIA estimates that performance will suffer by 5% to 20%, depending on the application. For critical programs requiring absolute accuracy, that amount of overhead is of no concern. It’s certainly much better than the overhead imposed by existing work-arounds.  

Software

How do I use ‘modules’ on Keeneland?

For KIDS, see:  http://keeneland.gatech.edu/software/tools.

For KFS, see:  https://www.xsede.org/gatech-keeneland#computing.
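
As a quick illustration of the basic module commands (the cuda module name is only an example; run module avail to see what is actually installed on each system):

    module avail          # list the modules available on the system
    module list           # show the modules currently loaded
    module load cuda      # load a module, e.g. the CUDA toolkit
    module unload cuda    # unload it again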

Is it possible to add software to Keeneland?

We currently license the PGI, Intel, and GNU compilers; the Allinea DDT debugger; and a few applications, such as AMBER.  Users can request additional licenses.  However, the Keeneland Project has limited funds for software licenses, so we must weigh how much use a requested package would get against its cost.

How do I use MVAPICH for MPI rather than OpenMPI?

Issue the following command:

    module swap openmpi mvapich2

The command above makes the MVAPICH2 environment available.  After that, rebuild your code with the correct flags, pointing to the appropriate headers and libraries, and run it using mpiexec or mpirun_rsh rather than mpirun; see the sketch below.
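
As a minimal sketch of the full sequence (the source file myprog.c, the process count, and the compiler flags are illustrative, not Keeneland-specific requirements):

    module swap openmpi mvapich2   # switch from OpenMPI to MVAPICH2
    mpicc -O2 -o myprog myprog.c   # rebuild against MVAPICH2 headers and libraries
    mpiexec -np 16 ./myprog        # launch with an MVAPICH2 launcher, not mpirun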

How do I use MVAPICH’s CUDA support?

Set the MV2_USE_CUDA environment variable to 1 when running your program.  For instance, you could issue your mpirun command as something like:

    MV2_USE_CUDA=1 mpirun $PWD/exe

The mpirun (or equivalent) command for some MPI implementations also allows you to define and propagate environment variables to all processes in the job.  See the man page of the mpirun command for your MPI implementation for details.
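
For example, MVAPICH2's mpirun_rsh launcher accepts VAR=value assignments on the command line and propagates them to all processes (the hostnames and process count here are illustrative):

    mpirun_rsh -np 2 node001 node002 MV2_USE_CUDA=1 $PWD/exe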

This can also be done programmatically using putenv() from within the program's code.  However, if the environment variable is set programmatically, it must be done before the call to MPI_Init().
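
A minimal C sketch of the programmatic approach (error checking omitted, and the communication itself is elided):

    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        /* Must happen before MPI_Init() so MVAPICH2 sees the
           setting during startup. */
        putenv("MV2_USE_CUDA=1");

        MPI_Init(&argc, &argv);
        /* ... CUDA-aware MPI communication ... */
        MPI_Finalize();
        return 0;
    }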

What OpenACC compilers are available?

Currently, we have a recent version of the PGI compiler.  We are also developing a research OpenACC compiler designed to give the end user more control over the compilation process.