Re: Running Charmrun/NAMD with more than 10 processes fails

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Dec 16 2014 - 04:06:49 CST

Search for ssh maxstartups and you will find it ;)

It has to be changed in /etc/ssh/sshd_config
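
For reference, OpenSSH's MaxStartups option limits the number of concurrent unauthenticated connections to sshd, in the form start:rate:full. Its documented default of 10 (10:30:100 in newer releases) would be consistent with runs failing above 10 processes. A minimal sketch for checking what a config file currently sets is below. The helper name show_ssh_limits is hypothetical, and the default path is an assumption: some systems keep the file at /etc/sshd_config instead.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) MaxStartups and
# MaxSessions lines from an sshd config file.
# The default path is an assumption; pass another path as the first argument.
show_ssh_limits() {
    config="${1:-/etc/ssh/sshd_config}"
    # Match only lines that start with the keyword, so "#MaxStartups ..."
    # (a commented-out default) is not reported.
    grep -Ei '^[[:space:]]*(MaxStartups|MaxSessions)[[:space:]]' "$config"
}

# Example of the kind of line to look for or add (value taken from the
# original post, not a recommendation):
#   MaxStartups 50:30:100
```

Usage would be `show_ssh_limits` or `show_ssh_limits /etc/sshd_config`. If nothing is printed, the built-in defaults apply. After editing the file, sshd usually has to be restarted or reloaded before the new value takes effect; how to do that depends on the system.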

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of K Miura
> Sent: Tuesday, December 16, 2014 09:54
> To: namd-l_at_ks.uiuc.edu
> Subject: namd-l: Running Charmrun/NAMD with more than 10 processes
> fails
>
> I am using NAMD 2.10 on two Mac Pro (2014) nodes running Mac OS X 10.9.
> I downloaded the MacOSX-x86_64-netlrts binary from the NAMD web site and
> put the program in /usr/local/bin/ on each computer.
> When I run NAMD across the two Macs with the command below:
>
> charmrun +p11 ++verbose /usr/local/bin/namd2.10 ++remote-shell ssh
> NAMD.z.cfg
>
> Then the Charmrun program stops with the following log output:
>
> Charmrun>charmrun started...
> Charmrun>using ./nodelist as nodesfile
> Charmrun>adding client 0: "192.168.x.49", IP:192.168.x.49
> Charmrun>adding client 1: "192.168.x.50", IP:192.168.x.50
> Charmrun>adding client 2: "192.168.x.49", IP:192.168.x.49
> ..
> Charmrun>adding client 13: "192.168.x.50", IP:192.168.x.50
> Charmrun>adding client 14: "192.168.x.49", IP:192.168.x.49
> start_nodes_rsh
> Charmrun> Sending "0 192.168.x.45 52597 8206 0" to client 0.
> ..
> Charmrun> clisnt 1 connectd (IP=192.168.x.50) data_port=54928)
> Charmrun> clisnt 13 connectd (IP=192.168.x.50) data_port=55722)
> Charmrun> clisnt 11 connectd (IP=192.168.x.50) data_port=50657)
> Charmrun> Waiting for 13-th client to connect.
> Timeout waiting for node-program to connect
>
> When I run charmrun with +p10 or lower, NAMD runs successfully
> on each computer node.
> The calculation failed when I ran the same program with the ++local option:
>
> charmrun ++local +p11 ++verbose /usr/local/bin/namd2.10
> ++remote-shell ssh +idlepoll NAMD.z.cfg
>
> I suspected that I had misconfigured the ssh server or client, so I
> googled the term "ssh max connections"
> but did not get any helpful hits.
> I also searched the namd-l logs, the NAMD tutorial and the FAQs for the
> error message "Timeout waiting for" etc., but I could not find a solution.
>
> My nodelist file is:
>
> group main
> host 192.168.x.49
> host 192.168.x.50
>
> Hosts 192.168.x.49 and 192.168.x.50 allow ssh login by public key,
> without a password prompt.
>
> The max connection settings in /etc/sshd_config are as follows:
> MacSessions 50
> MaxStartups 50:30:100
>
> Could anyone give me ideas?
>
> Thank you in advance.
>
> -----------------------------
> Kenji Miura
> Kobe University Graduate school of Medicine
