From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Wed Dec 04 2013 - 07:36:48 CST
This is a different issue. The charmrun ibverbs layer does not work with
all InfiniBand hardware, which is why one would usually build NAMD against MPI.
Did you try "unset SGE_ROOT" and starting with mpiexec? Also try leaving
out charmrun entirely and simply using mpirun.
Norman Geist.
From: Anna Gorska [mailto:gvalchca.agr_at_gmail.com]
Sent: Wednesday, 4 December 2013 12:52
To: Norman Geist
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Hello,
I identified the problem, although I don't know how to fix it.
You were right, it also works without MPI, as you suggested:
charmrun ++verbose ++nodelist namd-machines-test +p10 namd2 conf > out
but then it runs on the requested nodes and CPUs for about a minute and
crashes with:
Charmrun: error on requested socket-
Socket closed before recv.
This happens every time, regardless of the number of CPUs, memory, or nodes.
Regards,
Anna Gorska
If you use the Sun Grid Engine, try "unset SGE_ROOT" within your job script,
before calling mpirun/charmrun. We see similar behavior on our cluster, and it
seems to be related to the SGE support within Open MPI.
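A minimal sketch of such a job script follows. The "#$" directives and the
parallel environment name "mpi" are assumptions for illustration and will
differ per site; the charmrun line is the one used elsewhere in this thread.

```shell
#!/bin/bash
#$ -cwd
#$ -pe mpi 104   # hypothetical parallel environment name and slot count

# Remove SGE_ROOT from the environment before the launcher starts.
unset SGE_ROOT

# Launch only if charmrun is actually available on this machine.
if command -v charmrun >/dev/null 2>&1; then
    charmrun ++nodelist namd-machines +p104 namd2 in > out
else
    echo "charmrun not found in PATH" >&2
fi
```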
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf
of Anna Gorska
Sent: Tuesday, 3 December 2013 16:24
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Hello,
Thank you for the quick response, although it still doesn't work. Without MPI
it does not even produce an error message. But you were right:
I set the MPI environment variable to pass the hosts file to MPI.
Still, it runs on only one node.
My new command looks like this:
charmrun ++mpiexec ++remote-shell mpirun +p124 namd2 conf > out
Regards,
Anna Gorska
Of course: if you use mpiexec, the queuing system is expected to pass the
list of machines directly to mpirun, so the nodelist is not used. Try
charmrun ++nodelist namd-machines +p104 namd2 in > out
Also reread the instructions for using mpiexec, as I don't know them by
heart right now.
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf
of Anna Gorska
Sent: Tuesday, 3 December 2013 12:26
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Hello,
I am trying to run NAMD (version NAMD_2.9_Linux-x86_64) on multiple nodes
of an SGE cluster.
I am able to generate a proper file listing the nodes assigned to me by
the queuing system,
but NAMD always runs on only one node, using the specified number of
cores;
it behaves as if ++nodelist were not there at all.
This is the command I use:
charmrun ++mpiexec ++remote-shell mpirun ++nodelist namd-machines +p104 namd2 in > out
and the namd-machines file looks as follows:
group main ++pathfix /step2_2_tmp
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node503
host node503
host node503
host node503
host node503
host node503
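A nodelist in this format could be generated from the host file that SGE
hands to parallel jobs. The following is only a sketch: the function name
make_nodelist is hypothetical, it assumes each input line starts with
"<hostname> <slots>" (as in the file named by $PE_HOSTFILE), and it leaves
out the ++pathfix option shown above.

```shell
# Print a Charm++ nodelist built from an SGE-style host file.
# Assumption: field 1 is the host name, field 2 the number of slots.
make_nodelist() {
    echo "group main"
    # Emit one "host" line per slot granted on each node.
    awk '{ for (i = 0; i < $2; i++) print "host " $1 }' "$1"
}
```

Inside a job script this might be called as
make_nodelist "$PE_HOSTFILE" > namd-machines
before charmrun is started.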
Sincerely,
Anna Gorska
____________
PHD student
Algorithms in Bioinformatics
University of Tuebingen
Germany
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:21:58 CST