From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Dec 05 2013 - 04:57:18 CST
Maybe, to verify that your namd build is working at all, write a machinefile
manually and run namd from the command line. A machinefile simply contains
node names, like:
C01
C02
C03
C04
Then start like:
mpirun -machinefile machinefile -np 20 namd2 conf > out 2> error
Did you notice my typo?
"-np $TMPDIR/machines"
should of course be
"-machinefile $TMPDIR/machines"
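The manual test above can be sketched as follows (the node names C01..C04 are placeholders for your real hostnames, and the mpirun command is only printed, not executed, since namd2/mpirun may not be installed where this sketch runs):

```shell
# Write a machinefile by hand (one hostname per line; hostnames here
# are placeholders for your real compute nodes).
cat > machinefile <<'EOF'
C01
C02
C03
C04
EOF

# 20 MPI ranks spread over the 4 listed nodes; 'conf' is your NAMD
# configuration file.
echo mpirun -machinefile machinefile -np 20 namd2 conf
```

If this manual launch works, the problem lies in the jobscript or queuing-system integration rather than in the namd build itself.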
Kind regards,
Norman Geist.
From: Anna Gorska [mailto:gvalchca.agr_at_gmail.com]
Sent: Thursday, December 5, 2013 11:44
To: Norman Geist
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Everything is bad; mpi just runs it twice. In the NAMD output the same step
appears twice:
OPENING EXTENDED SYSTEM TRAJECTORY FILE
TCL: Setting parameter constraintScaling to 0.5
TCL: Running for 200000 steps
PRESSURE: 0 597.333 103.702 184.113 135.273 640.616 -180.201 172.061 -244.037 329.661
GPRESSURE: 0 638.681 100.807 234.545 104.568 709.205 -175.087 129.792 -190.157 410.687
ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP POTENTIAL TOTAL3 TEMPAVG PRESSURE GPRESSURE VOLUME PRESSAVG GPRESSAVG
ENERGY: 0 724.7455 2809.4883 4471.3331 2.6115 -255531.9064 20541.9404 343.2310 0.0000 34858.3561 -191780.2005 267.0207 -226638.5566 -191752.7939 267.0207 522.5365 586.1911 691151.1680 522.5365 586.1911
OPENING EXTENDED SYSTEM TRAJECTORY FILE
TCL: Setting parameter constraintScaling to 0.5
TCL: Running for 200000 steps
PRESSURE: 0 695.049 118.602 177.091 134.388 795.617 -215.887 171.065 -247.805 508.959
GPRESSURE: 0 736.37 116.273 227.721 104.245 864.461 -211.449 129.008 -194.59 589.271
ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP POTENTIAL TOTAL3 TEMPAVG PRESSURE GPRESSURE VOLUME PRESSAVG GPRESSAVG
ENERGY: 0 724.7455 2809.4883 4471.3331 2.6115 -255531.9064 20541.9404 171.6155 0.0000 34858.3561 -191951.8160 267.0207 -226810.1721 -191924.3864 267.0207 666.5418 730.0339 691151.1680 666.5418 730.0339
Maybe I should specify something for NAMD?
Anna Gorska
And does namd write a simulation log?
Norman Geist.
From: Anna Gorska [mailto:gvalchca.agr_at_gmail.com]
Sent: Thursday, December 5, 2013 11:14
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
This is what I got:
[proxy:0:0_at_node502] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
[proxy:0:0_at_node502] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0_at_node502] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec_at_node502] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec_at_node502] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec_at_node502] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
[mpiexec_at_node502] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
Regards,
Anna Gorska
Something in the namd log? Also, to catch the stderr output you can append
“2> errors” to your call.
Norman Geist.
From: Anna Gorska [mailto:gvalchca.agr_at_gmail.com]
Sent: Thursday, December 5, 2013 11:01
To: Norman Geist
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Ok,
it runs on both nodes in parallel for some time and then finishes with:
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
Regards,
Anna Gorska
Doesn't your SGE provide a machinefile? I usually use something like
mpirun -np $NSLOTS -np $TMPDIR/machines namd2 conf > out
Norman Geist.
From: Anna Gorska [mailto:gvalchca.agr_at_gmail.com]
Sent: Wednesday, December 4, 2013 15:47
To: Norman Geist
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Hello,
running like this via qsub with SGE_ROOT unset
charmrun ++verbose ++mpiexec ++remote-shell /usr/bin/mpirun +p20 namd2
conf > out
makes namd run on only one node (although the node file is correct).
If I run the following from qsub, it runs on one server and doesn't crash:
mpirun -np 20 namd2 conf > out
I am using the SGE parallel environment,
Regards,
Anna Gorska
This is another issue. The charmrun ibverbs stuff doesn't work with every
infiniband hardware. Therefore one would usually build namd against mpi.
Did you try "unset SGE_ROOT" and starting with mpiexec? Also try leaving
out charmrun entirely and simply using mpirun.
Norman Geist.
From: Anna Gorska [mailto:gvalchca.agr_at_gmail.com]
Sent: Wednesday, December 4, 2013 12:52
To: Norman Geist
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Hello,
I identified the problem, although I don't know how to fix it.
You were right, it also works without mpi, as you suggested:
charmrun ++verbose ++nodelist namd-machines-test +p10 namd2 conf > out
but then it runs on the requested nodes and cpus for about a minute and
crashes with:
Charmrun: error on requested socket--
Socket closed before recv.
It always repeats, independently of the number of CPUs/memory/nodes.
Regards,
Anna Gorska
If you use the Sun Grid Engine, try "unset SGE_ROOT" within your jobscript,
before hitting mpirun/charmrun. It's similar on our cluster and seems to
be related to the sge support within openmpi.
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of Anna Gorska
Sent: Tuesday, December 3, 2013 16:24
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Hello,
thank you for the quick response, although it still doesn't work. Without MPI
it does not even produce an error message, but you were right:
I set the MPI environment variable to give the file with hosts to mpi,
but still it runs only on one node.
My new command looks like this:
charmrun ++mpiexec ++remote-shell mpirun +p124 namd2 conf > out
Regards,
Anna Gorska
Of course, if you use mpiexec, it is expected that the queuing system
provides the list of machines to mpirun directly, so the nodelist is not
used. Try
charmrun ++nodelist namd-machines +p104 namd2 in > out
and read the instructions for using mpiexec again, as I don't know them by
heart right now.
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of Anna Gorska
Sent: Tuesday, December 3, 2013 12:26
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: Running NAMD on SGE cluster - multiple modes and cores
Hello,
I am trying to run NAMD (version NAMD_2.9_Linux-x86_64) on multiple nodes
of an SGE cluster.
I am able to generate the file with the list of nodes assigned to me by
the queuing system, but NAMD always runs on only one node, taking the
specified number of cores -
it behaves as if ++nodelist was not there at all.
This is the command I use:
charmrun ++mpiexec ++remote-shell mpirun ++nodelist namd-machines +p104
namd2 in > out
and the namd-machines file looks as follows:
group main ++pathfix /step2_2_tmp
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node508
host node503
host node503
host node503
host node503
host node503
host node503
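Incidentally, SGE can generate such a file for you: inside a parallel-environment job, $PE_HOSTFILE lists one "hostname slots queue ..." line per granted node, which can be expanded into charmrun's nodelist format. A sketch (the hostfile contents are simulated here to match the file above; in a real job you would read the file SGE provides, and the ++pathfix line is omitted):

```shell
# Simulate SGE's $PE_HOSTFILE ("hostname slots queue processors" per line).
cat > pe_hostfile.example <<'EOF'
node508 11 all.q@node508 UNDEFINED
node503 6 all.q@node503 UNDEFINED
EOF
PE_HOSTFILE=pe_hostfile.example

# Emit "host <name>" once per granted slot, in charmrun nodelist format.
echo "group main" > namd-machines
while read -r host slots _; do
    for _ in $(seq "$slots"); do
        echo "host $host" >> namd-machines
    done
done < "$PE_HOSTFILE"
```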
Sincerely,
Anna Gorska
____________
PhD student
Algorithms in Bioinformatics
University of Tuebingen
Germany
_____
This email is free of viruses and malware because avast! Antivirus
protection ( <http://www.avast.com/> ) is active.
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:21:58 CST