From: Shubhra Ghosh Dastidar (sgducd_at_gmail.com)
Date: Tue May 28 2013 - 02:15:15 CDT
I am trying to run NAMD over InfiniBand.
First I tried the multicore version, which runs smoothly on 32 cores within
a single node.
Then I tried the TCP version (which uses Ethernet); it runs across multiple
nodes, e.g. 32 cores in total (16 cores from node-1 and 16 cores from
node-2).
Then I tried both the InfiniBand (ibverbs) version and the InfiniBand-smp
version. If the job is restricted to the 32 cores on one node, both run
smoothly.
But if the job is asked to run across multiple nodes (i.e. communicating
over InfiniBand), I get an error. The last few lines of the output are the
following:
Charmrun> All clients connected.
Charmrun> IP tables sent.
Charmrun> node programs all connected
Charmrun> started all node programs in 3.995 seconds.
Charmrun: error on request socket--
Socket closed before recv.
Can anyone help?
The command I am using is the following:
~/NAMD_2.9_Linux-x86_64-ibverbs/charmrun ++p 16 ++verbose ++remote-shell
ssh ++nodelist nodelist ~/NAMD_2.9_Linux-x86_64-ibverbs/namd2 namd-input
(InfiniBand has been tested with another program, CHARMM-37, which seems to
be working fine.)
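
For reference, the file passed to ++nodelist in the command above follows
the usual charmrun layout: a "group main" line followed by one "host" line
per machine. The snippet below is only a sketch of that layout, not a copy
of my actual file: the host names node-1 and node-2 are placeholders, and
the file path and the list_hosts helper are made up for illustration. It
writes a sample nodelist and prints the hosts charmrun would be asked to
reach.

```shell
# Path of the sample file (hypothetical, for illustration only).
nodelist_example="${TMPDIR:-/tmp}/nodelist.example"

# Write a two-host nodelist; node-1 and node-2 are placeholder host names.
cat > "$nodelist_example" <<'EOF'
group main
host node-1
host node-2
EOF

# Print the second field of every "host" line (hypothetical helper).
list_hosts() {
    awk '$1 == "host" { print $2 }' "$1"
}

list_hosts "$nodelist_example"
```

Running list_hosts on the real nodelist is a quick way to confirm that the
file names every node the job is supposed to span.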
Regards
-- Dr. Shubhra Ghosh Dastidar