Re: mpi problems on opteron

From: Kyle Gustafson
Date: Mon Jul 25 2005 - 19:04:56 CDT


Thanks for the reply. I've not decided between ssh and rsh
yet. Do you run with MPI?


---- Original message ----
>Date: Mon, 25 Jul 2005 21:00:04 -0300
>From: Leandro Martínez
>Subject: Re: namd-l: mpi problems on opteron
>To: Kyle Gustafson
>Hi Kyle,
>We have a cluster similar to yours, but running Fedora. The problem is
>probably that ssh needs to be set up to work without passwords
>between the nodes. We are actually using rsh on our nodes instead
>because it was easier to configure. You need to put in your
>home directory a file named .rhosts containing
> username
> username
> username
> username
> username
>and this file should have its permissions changed with
>chmod og-rwx .rhosts
>This file must be in your home directory on all nodes (in our case all
>nodes share the same /home, so it was simpler).
>You can search the web for better documentation on this. I'm not
>quite an expert on this subject; I only did what was necessary to get
>NAMD running.
>Leandro Martinez
>Institute of Chemistry
>State University of Campinas
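
A minimal sketch of the .rhosts setup described above, in case it helps. The host names in the archived message were lost, so this assumes the usual .rhosts layout of one "hostname username" pair per line, and borrows hypothetical host names (head, node001, node002) from the nodelist further down. The output path ./rhosts.sketch is also an assumption, chosen so that nothing in the home directory is overwritten; after checking the contents, the file would be copied to ~/.rhosts on every node.

```shell
#!/bin/sh
# Sketch only: the host names, user name, and output path are assumptions.
RHOSTS_FILE="${RHOSTS_FILE:-./rhosts.sketch}"
CLUSTER_USER="${CLUSTER_USER:-$(id -un)}"
CLUSTER_HOSTS="${CLUSTER_HOSTS:-head node001 node002}"

# Start from an empty file, then write one "hostname username" line per node.
: > "$RHOSTS_FILE"
for host in $CLUSTER_HOSTS; do
    printf '%s %s\n' "$host" "$CLUSTER_USER" >> "$RHOSTS_FILE"
done

# Remove all group and other permissions, as in the message above.
chmod og-rwx "$RHOSTS_FILE"
ls -l "$RHOSTS_FILE"
```

With a shared /home, as in Leandro's cluster, a single copy in the home directory would cover all nodes.
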
>On 7/25/05, Kyle Gustafson wrote:
>> Hi all,
>> I have an 18-Opteron cluster running SuSE (kernel 2.4.21-143-numa).
>> I'm trying to install NAMD, which requires me to install
>> After ./build charm++ mpi-linux-amd64 -nobs -O -DCMK_OPTIMIZE
>> I ran megatest. !!All of the one processor tests work fine!!,
>> but with +p2, I get the error below, where it looks like
>> charmrun is unable to use ssh. I can ssh back and forth from
>> any one node to any other, so I don't understand how this
>> problem could occur, because I don't know enough about ssh and
>> charm++. It seems like charm++ doesn't have access to the ssh
>> keys, but this seems crazy. My .nodelist file is shown below;
>> head is the master and each node00x is a slave. The nodelist file
>> is located in the HOME/charm directory, but I also tried
>> putting .nodelist in the megatest directory.
>> group main
>> host head ++shell ssh
>> host node001 ++shell ssh
>> host node002 ++shell ssh
>> host node003 ++shell ssh
>> host node004 ++shell ssh
>> host node005 ++shell ssh
>> host node006 ++shell ssh
>> host node007 ++shell ssh
>> host node008 ++shell ssh
>> This is the error I get when I run charmrun.
>> I greatly appreciate your attention.
>> head:/home/namd2/NAMD_2.5_Source/charm/tests/charm++/megatest # ./charmrun +p2 ./pgm
>> Running on 2 processors: ./pgm
>> 26005: ssh_exchange_identification: Connection closed by remote host
>> p0_26000: p4_error: Child process exited while making connection to remote process on head: 0
>> Kyle B. Gustafson
>> Department of Physics
>> University of Maryland
>> Box 45
>> 082 Regents Drive
>> College Park, MD 20742
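
One way to narrow down the error quoted above is to test a non-interactive ssh login to every host in the nodelist, including head itself, since the failure mentions a connection to head. The sketch below is only an illustration: the function names nodelist_hosts and check_ssh_hosts are made up for this example, and it assumes OpenSSH, whose BatchMode option makes ssh fail rather than prompt for a password.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; assumes the OpenSSH client.

# Print the host names from the "host <name> ..." lines of a nodelist file.
nodelist_hosts() {
    awk '$1 == "host" { print $2 }' "$1"
}

# Attempt a login to each host without any password prompt.
# Returns non-zero if at least one host could not be reached.
check_ssh_hosts() {
    nodelist=$1
    failed=0
    for host in $(nodelist_hosts "$nodelist"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    return "$failed"
}
```

After sourcing the functions, running check_ssh_hosts with the path to the nodelist file as its argument, from the same account and node that launches charmrun, should show which hosts (if any) refuse a password-free login.
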

Kyle B. Gustafson
Department of Physics
University of Maryland
College Park, MD USA
