NAMD Wiki: NamdOnInfiniBand

NAMD is rumored to run on several InfiniBand clusters, although the developers don't have direct access to any production systems. The porting process should be as simple as building the MPI version of Charm++ against an InfiniBand-aware MPI library.

See also NamdAtTexas, NamdAtNCSA, and NamdOnInfiniPath.

I've ported NAMD to NCSA's T2 (Tungsten-2) cluster.

Tungsten-2 is a 512-node Dell cluster made up of 526 Dell PowerEdge 1850 systems. Each system is powered by dual 3.6 GHz Nocona Xeon processors with EM64T support and equipped with 6 GB of DDR-2 memory. The high-speed interconnect for the system is TopSpin InfiniBand subscribed at 3:1. Each system runs Red Hat Enterprise Linux Release 3 and is managed via Platform Rocks. The system has a peak performance of 7.4 TFlops and was listed at number 47 on the Top500 Supercomputer list in June 2005 with a throughput of 6.118 TFlops.

This build uses mpich based on VMI and the g++ compiler.

First, build charm-5.9 with

cd ~/charm-5.9
vi charm-5.9/src/arch/mpi-linux-amd64/  # change mpiCC to mpicxx
./build charm++ mpi-linux-amd64 --no-build-shared -O -DCMK_OPTIMIZE=1
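
The mpiCC-to-mpicxx change is needed because MPI installations differ in what they call their C++ compiler wrapper. Before editing, it may help to check which name is actually on your PATH. The following is only a sketch: the function name find_mpi_cxx is made up for illustration, and the list of candidate names is an assumption that may not cover every MPI library.

```shell
# Print the first MPI C++ compiler wrapper found on PATH.
# find_mpi_cxx is a hypothetical helper, not part of Charm++ or NAMD.
find_mpi_cxx() {
    for wrapper in mpicxx mpiCC mpic++; do
        if command -v "$wrapper" >/dev/null 2>&1; then
            echo "$wrapper"
            return 0
        fi
    done
    return 1    # none of the candidate names was found
}

find_mpi_cxx || echo "no MPI C++ wrapper found on PATH" >&2
```

Whatever name this prints is the one the Charm++ arch files should reference.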

Then configure and build NAMD with

cd ~/namd2
vi arch/Linux-amd64-MPI.arch  # add "CHARMOPTS = -thread pthreads -memory os"
./config tcl fftw Linux-amd64-MPI
cd Linux-amd64-MPI
make
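
If you would rather script the arch file edit than do it in vi, something like the following might work. It is a sketch under assumptions: add_charmopts is a made-up helper name, it assumes that appending the line at the end of the file is acceptable, and the demonstration runs on a scratch file with placeholder content rather than on the real arch/Linux-amd64-MPI.arch.

```shell
# Append the CHARMOPTS line to an arch file unless one is already present.
# add_charmopts is a hypothetical helper, not part of the NAMD distribution.
add_charmopts() {
    arch_file=$1
    if grep -q '^CHARMOPTS' "$arch_file"; then
        echo "unchanged"
    else
        printf '%s\n' 'CHARMOPTS = -thread pthreads -memory os' >> "$arch_file"
        echo "added"
    fi
}

# Demonstration on a scratch file (placeholder content, not the real arch file).
demo=$(mktemp)
echo 'CHARMARCH = mpi-linux-amd64' > "$demo"
add_charmopts "$demo"    # appends the CHARMOPTS line
add_charmopts "$demo"    # second call leaves the file alone
cat "$demo"
rm -f "$demo"
```

To apply it for real, run add_charmopts arch/Linux-amd64-MPI.arch from ~/namd2 before running ./config.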

In my first attempt, before adding CHARMOPTS, I got a hang or the following error at the end of startup (about when threads start):

Info: Entering startup phase 8 with 143173 kB of memory in use.
Info: Finished startup with 144335 kB of memory in use.

0 - MPI_ISEND : Invalid rank -2
[0]  Aborting program !

This first effort yielded an unimpressive speedup of about 90 on 128 processors (70% efficiency). There is a direct VMI port of Charm++ that may provide better performance than the MPI interface.
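
For anyone comparing their own runs, the efficiency figure is simply speedup divided by processor count. A minimal sketch of that arithmetic (the function name efficiency is made up for illustration, and awk is assumed to be available):

```shell
# Parallel efficiency as a percentage: 100 * speedup / processors.
# efficiency is a hypothetical helper name.
efficiency() {
    # $1 = observed speedup, $2 = number of processors
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * s / p }'
}

efficiency 90 128    # prints 70.3
```

Perfect scaling would give 100.0, so a speedup of 90 on 128 processors leaves roughly 30% of the machine idle or waiting on communication.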

Using the TopSpin MPI library was straightforward (no CHARMOPTS modification needed), but the performance was equally disappointing, especially compared to NamdOnInfiniPath, which has excellent scaling even on a dual-core, dual-socket system.

Benchmarks on the forthcoming NCSA Lincoln cluster should prove enlightening.

-JimPhillips (May 19, 2006)