NAMD GPU+ibverbs on multiple nodes: timeout problems

From: Robert Sawko
Date: Thu Nov 17 2016 - 05:51:21 CST


I am trying to run NAMD on a Firestone cluster, i.e. Power8+4xK80,
and I am having problems with multi-node runs. With the help of
Jim Phillips, I compiled an ibverbs, smp version of Charm++ with XL and
a Power-xLC version of NAMD. I can confirm that I can run on a single node
and observe utilisation of all four GPUs.

Our cluster uses LSF as its batch system, and rsh and even ssh between
compute nodes have been switched off by the admins. Jim therefore advised
using mpirun, so I am using OpenMPI. My understanding is that it is used
only to spawn the processes. Additionally, for the Power processors I set
the processor affinity of the PE and communication threads. I also added
a runscript that sets LD_LIBRARY_PATH. This is my script:

#BSUB -J namd
#BSUB -oo namd.stdout
#BSUB -eo namd.stderr
#BSUB -q panther
#BSUB -W 01:00
#BSUB -R "span[ptile=4]"
#BSUB -n 8
#BSUB -data /gpfs/fairthorpe/local/HCP004/pxs01/rrs59-pxs01/benchmarks/namd/benchmarks/namd_case

## This is data movement...
rm -rf ${HOME}/namd_on_2nodes 2> /dev/null
mkdir -p ${HOME}/namd_on_2nodes
cd ${HOME}/namd_on_2nodes
bstage in -all

AFFINITY="+commap 0,8,112,120 +pemap 16-111:8.2"

charmrun ++verbose \
    ++runscript ./runscript \
    +p48 ++ppn 6 \
    ++mpiexec ++remote-shell mpiexec \
    ${NAMDBIN} ++verbose +idlepoll +devices 0,1,2,3 ${AFFINITY} \
    29.conf > log.namd2
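
For reference, the affinity and process layout requested above can be
checked by hand. The following is a minimal sketch, not part of the job
script. It assumes the Charm++ map syntax L-U:S.R means "starting at L,
step by S up to U, and take R consecutive cores at each step", and that
LSF places the 8 slots on 2 hosts (ptile=4). The function name expand_map
and the variable names are made up for illustration.

```shell
#!/bin/bash
# Expand one map entry of the form "L-U:S.R" into a list of core ids.
# Assumption: cores beyond the upper bound U are dropped.
expand_map() {
    local spec=$1
    local range=${spec%%:*}          # e.g. "16-111"
    local step=${spec#*:}            # e.g. "8.2"
    local lo=${range%-*} hi=${range#*-}
    local stride=${step%.*} run=${step#*.}
    local start i out=()
    for ((start = lo; start <= hi; start += stride)); do
        for ((i = 0; i < run && start + i <= hi; i++)); do
            out+=("$((start + i))")
        done
    done
    echo "${out[*]}"
}

# Worker-thread cores requested by "+pemap 16-111:8.2"
expand_map 16-111:8.2

# Process layout implied by "+p48 ++ppn 6" on 2 hosts (assumed from ptile=4, -n 8)
pes=48 ppn=6 nodes=2
procs=$((pes / ppn))
echo "processes: $procs, per node: $((procs / nodes)), worker PEs per node: $((pes / nodes))"
```

Under these assumptions the map expands to 24 cores (16 17 24 25 ... 104 105),
which matches the 24 worker PEs per node, and the four +commap entries
match the four processes per node.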

I get a timeout error from Charm.

Charmrun> charmrun started...
Charmrun> mpiexec started
Charmrun> node programs all started
Charmrun> error attaching to node '':
Timeout waiting for node-program to connect

I am also attaching the standard output from NAMD.

There is clearly a connection problem. I have found similar problems
on the mailing list, for instance here:
but I am not sure whether they were resolved.

Please let me know if you can assist with this.

Best wishes,

Dr Robert Sawko
Research Staff Member, IBM
Daresbury Laboratory
Keckwick Lane, Warrington
United Kingdom
Phone (office): +44 (0) 1925 60 3967
Phone (mobile): +44 778 830 8522
