bpti example compiled source charmrun++ does not launch

From: Hazard, E. Starr (hazards_at_musc.edu)
Date: Thu Nov 01 2018 - 09:31:05 CDT

The cluster runs RHEL v6 with LSF as the job manager.

I compiled NAMD and Charm++ from source in

~/COMPILE3/NAMD_Git-2018-09-21_Source/charm-6.8.2

Here is my smart-build log:

cat ~/COMPILE3/NAMD_Git-2018-09-21_Source/charm-6.8.2/smart-build.log
Fri Sep 21 12:40:12 EDT 2018
Using the following build command:
./build charm++ mpi-linux-x86_64 -j4 -g -O0

Fri Sep 21 12:47:14 EDT 2018
Using the following build command:
./build charm++ mpi-linux-x86_64 smp -j4 -g -O0

Fri Sep 21 12:48:28 EDT 2018
Using the following build command:
./build charm++ mpi-linux-x86_64 -j4 -g -O0

Fri Sep 21 12:50:01 EDT 2018
Using the following build command:
./build charm++ netlrts-linux-x86_64 gcc gfortran -j4 -g -O0

Wed Oct 3 17:00:28 EDT 2018
Using the following build command:
./build LIBS netlrts-linux-x86_64 gcc -j4 --with-production

My LSF job script:

#!/bin/bash
#BSUB -J NAMD2018
#BSUB -o NAMD2018_OUT%J
#BSUB -e NAMDERR.e%J
#BSUB -n 80
#BSUB -u hazards_at_musc.edu
export PWD=/home/hazards/NAMD/:$PWD
export PATH=/home/hazards/NAMD/toppar:$PATH
/shared/app/NAMD_Git-2018-09-21_Source/charmrun +p80 ++verbose ++remote-shell ssh ++nodelist /home/hazards/NAMD/nodelist \
/shared/app/NAMD_Git-2018-09-21_Source/namd2 +isomalloc_sync /home/hazards/NAMD/bpti.namd > \
/home/hazards/NAMD/BPTI-namdcompilecharm_allnodes80.out

The LSF error file captures this:

cat NAMDERR.e9041
Charmrun> charmrun started...
Charmrun> using /home/hazards/NAMD/nodelist as nodesfile
Charmrun> remote shell (10.200.1.3:0) started
Charmrun> remote shell (10.200.1.5:7) started
Charmrun> remote shell (10.200.1.6:14) started
Charmrun> remote shell (10.200.1.7:21) started
Charmrun> remote shell (10.200.1.8:28) started
Charmrun> remote shell (10.200.1.9:35) started
Charmrun> remote shell (10.200.1.10:42) started
Charmrun> remote shell (10.200.1.12:49) started
Charmrun> remote shell (10.200.1.13:56) started
Charmrun> remote shell (10.200.1.15:62) started
Charmrun> remote shell (10.200.1.16:68) started
Charmrun> remote shell (10.200.1.17:74) started
Charmrun> node programs all started
Charmrun> error attaching to node '10.200.1.3':
Timeout waiting for node-program to connect
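
The timeout above suggests that the node program on 10.200.1.3 did not connect back to charmrun. As a rough sketch of how that return path could be probed from a compute node, assuming the node program opens a TCP connection to the charmrun address and port reported in the verbose output: the helper name `can_connect` is made up for illustration, and it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command rather than on anything in NAMD or Charm++.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if a TCP connection to HOST:PORT
# can be opened within 5 seconds, non-zero otherwise.
# Uses bash's /dev/tcp redirection; "timeout" comes from GNU coreutils.
can_connect() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

Run on a compute node while something is listening on the charmrun host, for example `can_connect 10.200.1.13 60873 && echo reachable`. The port in that example is the one from this particular run and may well differ in another run.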

The NAMD output looks like this:

more BPTI-namdcompilecharm_allnodes80.out
Charmrun remote shell(10.200.1.13.56)> remote responding...
Charmrun remote shell(10.200.1.13.56)> starting node-program...
Charmrun remote shell(10.200.1.13.56)> remote shell phase successful.
Charmrun remote shell(10.200.1.17.74)> remote responding...
Charmrun remote shell(10.200.1.16.68)> remote responding...
Charmrun remote shell(10.200.1.6.14)> remote responding...
Charmrun remote shell(10.200.1.17.74)> starting node-program...
Charmrun remote shell(10.200.1.17.74)> remote shell phase successful.
Charmrun remote shell(10.200.1.6.14)> starting node-program...
Charmrun remote shell(10.200.1.6.14)> remote shell phase successful.
Charmrun remote shell(10.200.1.7.21)> remote responding...
..
Charmrun remote shell(10.200.1.15.62)> remote responding...
Charmrun remote shell(10.200.1.5.7)> starting node-program...
Charmrun remote shell(10.200.1.5.7)> remote shell phase successful.
Charmrun remote shell(10.200.1.12.49)> remote responding...
...
Charmrun remote shell(10.200.1.7.21)> starting node-program...
Charmrun remote shell(10.200.1.9.35)> starting node-program...
Charmrun remote shell(10.200.1.9.35)> remote shell phase successful.
Charmrun remote shell(10.200.1.12.49)> starting node-program...
Charmrun remote shell(10.200.1.12.49)> remote shell phase successful.
Charmrun remote shell(10.200.1.3.0)> starting node-program...
Charmrun remote shell(10.200.1.3.0)> remote shell phase successful.
Charmrun> scalable start enabled.
Charmrun> adding client 0: "10.200.1.3", IP:10.200.1.3
Charmrun> adding client 1: "10.200.1.3", IP:10.200.1.3
Charmrun> adding client 2: "10.200.1.3", IP:10.200.1.3
Charmrun> adding client 3: "10.200.1.3", IP:10.200.1.3
Charmrun> adding client 4: "10.200.1.3", IP:10.200.1.3
Charmrun> adding client 5: "10.200.1.3", IP:10.200.1.3
Charmrun> adding client 6: "10.200.1.3", IP:10.200.1.3
Charmrun> adding client 7: "10.200.1.5", IP:10.200.1.5
Charmrun> adding client 8: "10.200.1.5", IP:10.200.1.5
Charmrun> adding client 9: "10.200.1.5", IP:10.200.1.5
Charmrun> adding client 10: "10.200.1.5", IP:10.200.1.5
Charmrun> adding client 11: "10.200.1.5", IP:10.200.1.5
Charmrun> adding client 12: "10.200.1.5", IP:10.200.1.5
Charmrun> adding client 13: "10.200.1.5", IP:10.200.1.5
Charmrun> adding client 14: "10.200.1.6", IP:10.200.1.6
Charmrun> adding client 15: "10.200.1.6", IP:10.200.1.6
Charmrun> adding client 16: "10.200.1.6", IP:10.200.1.6
Charmrun> adding client 17: "10.200.1.6", IP:10.200.1.6
Charmrun> adding client 18: "10.200.1.6", IP:10.200.1.6
Charmrun> adding client 19: "10.200.1.6", IP:10.200.1.6
Charmrun> adding client 20: "10.200.1.6", IP:10.200.1.6
Charmrun> adding client 21: "10.200.1.7", IP:10.200.1.7
Charmrun> adding client 22: "10.200.1.7", IP:10.200.1.7
Charmrun> adding client 23: "10.200.1.7", IP:10.200.1.7
Charmrun> adding client 24: "10.200.1.7", IP:10.200.1.7
Charmrun> adding client 25: "10.200.1.7", IP:10.200.1.7
Charmrun> adding client 26: "10.200.1.7", IP:10.200.1.7
Charmrun> adding client 27: "10.200.1.7", IP:10.200.1.7
Charmrun> adding client 28: "10.200.1.8", IP:10.200.1.8
Charmrun> adding client 29: "10.200.1.8", IP:10.200.1.8
Charmrun> adding client 30: "10.200.1.8", IP:10.200.1.8
Charmrun> adding client 31: "10.200.1.8", IP:10.200.1.8
Charmrun> adding client 32: "10.200.1.8", IP:10.200.1.8
Charmrun> adding client 33: "10.200.1.8", IP:10.200.1.8
Charmrun> adding client 34: "10.200.1.8", IP:10.200.1.8
Charmrun> adding client 35: "10.200.1.9", IP:10.200.1.9
Charmrun> adding client 36: "10.200.1.9", IP:10.200.1.9
Charmrun> adding client 37: "10.200.1.9", IP:10.200.1.9
Charmrun> adding client 38: "10.200.1.9", IP:10.200.1.9
Charmrun> adding client 39: "10.200.1.9", IP:10.200.1.9
Charmrun> adding client 40: "10.200.1.9", IP:10.200.1.9
Charmrun> adding client 41: "10.200.1.9", IP:10.200.1.9
Charmrun> adding client 42: "10.200.1.10", IP:10.200.1.10
Charmrun> adding client 43: "10.200.1.10", IP:10.200.1.10
Charmrun> adding client 44: "10.200.1.10", IP:10.200.1.10
Charmrun> adding client 45: "10.200.1.10", IP:10.200.1.10
Charmrun> adding client 46: "10.200.1.10", IP:10.200.1.10
Charmrun> adding client 47: "10.200.1.10", IP:10.200.1.10
Charmrun> adding client 48: "10.200.1.10", IP:10.200.1.10
Charmrun> adding client 49: "10.200.1.12", IP:10.200.1.12
Charmrun> adding client 50: "10.200.1.12", IP:10.200.1.12
Charmrun> adding client 51: "10.200.1.12", IP:10.200.1.12
Charmrun> adding client 52: "10.200.1.12", IP:10.200.1.12
Charmrun> adding client 53: "10.200.1.12", IP:10.200.1.12
Charmrun> adding client 54: "10.200.1.12", IP:10.200.1.12
Charmrun> adding client 55: "10.200.1.12", IP:10.200.1.12
Charmrun> adding client 56: "10.200.1.13", IP:10.200.1.13
Charmrun> adding client 57: "10.200.1.13", IP:10.200.1.13
Charmrun> adding client 58: "10.200.1.13", IP:10.200.1.13
Charmrun> adding client 59: "10.200.1.13", IP:10.200.1.13
Charmrun> adding client 60: "10.200.1.13", IP:10.200.1.13
Charmrun> adding client 61: "10.200.1.13", IP:10.200.1.13
Charmrun> adding client 62: "10.200.1.15", IP:10.200.1.15
Charmrun> adding client 63: "10.200.1.15", IP:10.200.1.15
Charmrun> adding client 64: "10.200.1.15", IP:10.200.1.15
Charmrun> adding client 65: "10.200.1.15", IP:10.200.1.15
Charmrun> adding client 66: "10.200.1.15", IP:10.200.1.15
Charmrun> adding client 67: "10.200.1.15", IP:10.200.1.15
Charmrun> adding client 68: "10.200.1.16", IP:10.200.1.16
Charmrun> adding client 69: "10.200.1.16", IP:10.200.1.16
Charmrun> adding client 70: "10.200.1.16", IP:10.200.1.16
Charmrun> adding client 71: "10.200.1.16", IP:10.200.1.16
Charmrun> adding client 72: "10.200.1.16", IP:10.200.1.16
Charmrun> adding client 73: "10.200.1.16", IP:10.200.1.16
Charmrun> adding client 74: "10.200.1.17", IP:10.200.1.17
Charmrun> adding client 75: "10.200.1.17", IP:10.200.1.17
Charmrun> adding client 76: "10.200.1.17", IP:10.200.1.17
Charmrun> adding client 77: "10.200.1.17", IP:10.200.1.17
Charmrun> adding client 78: "10.200.1.17", IP:10.200.1.17
Charmrun> adding client 79: "10.200.1.17", IP:10.200.1.17
Charmrun> Charmrun = 10.200.1.13, port = 60873
start_nodes_ssh
Charmrun> Sending "0 10.200.1.13 60873 24622 0" to client 0.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 0.
Charmrun> Starting ssh 10.200.1.3 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "7 10.200.1.13 60873 24622 0" to client 7.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 7.
Charmrun> Starting ssh 10.200.1.5 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "14 10.200.1.13 60873 24622 0" to client 14.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 14.
Charmrun> Starting ssh 10.200.1.6 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "21 10.200.1.13 60873 24622 0" to client 21.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 21.
Charmrun> Starting ssh 10.200.1.7 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "28 10.200.1.13 60873 24622 0" to client 28.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 28.
Charmrun> Starting ssh 10.200.1.8 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "35 10.200.1.13 60873 24622 0" to client 35.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 35.
Charmrun> Starting ssh 10.200.1.9 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "42 10.200.1.13 60873 24622 0" to client 42.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 42.
Charmrun> Starting ssh 10.200.1.10 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "49 10.200.1.13 60873 24622 0" to client 49.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 49.
Charmrun> Starting ssh 10.200.1.12 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "56 10.200.1.13 60873 24622 0" to client 56.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 56.
Charmrun> Starting ssh 10.200.1.13 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "62 10.200.1.13 60873 24622 0" to client 62.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 62.
Charmrun> Starting ssh 10.200.1.15 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "68 10.200.1.13 60873 24622 0" to client 68.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 68.
Charmrun> Starting ssh 10.200.1.16 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Sending "74 10.200.1.13 60873 24622 0" to client 74.
Charmrun> find the node program "/shared/app/NAMD_Git-2018-09-21_Source/namd2" at "/home/hazards/NAMD" for 74.
Charmrun> Starting ssh 10.200.1.17 -l hazards -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o NoHostAuthenticationForLocalhost=yes /bin/bash -f
Charmrun> Waiting for 0-th client to connect.

Here are my nodelist files. I have tried both:

cat nodelist
group main ++shell ssh
host 10.200.1.3
host 10.200.1.5
host 10.200.1.6
host 10.200.1.7
host 10.200.1.8
host 10.200.1.9
host 10.200.1.10
host 10.200.1.12
host 10.200.1.13
host 10.200.1.15
host 10.200.1.16
host 10.200.1.17

hpcc3:/home/hazards/NAMD: cat nodelist.Oct31
group main ++shell ssh
host compute000
host compute002
host compute003
host compute004
host compute005
host compute006
host compute007
host compute009
host compute010
host compute012
host compute013
host compute013
host compute014

I have tried to understand the advice given here: https://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2013-2014/0538.html
I can ping my hostname from any and all nodes.
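
Ping only shows that the hosts are reachable, so a further check could be to repeat the non-interactive ssh step that charmrun logs. The following is a minimal sketch under stated assumptions, not a procedure from the NAMD or Charm++ documentation: the function names `nodelist_hosts` and `check_ssh` are made up for illustration, `ConnectTimeout=10` is an added assumption, and the two authentication options are copied from the "Starting ssh" lines in the charmrun output.

```shell
#!/bin/bash
# Hypothetical helper: print the host names from a charmrun nodelist file,
# i.e. the second field of every line whose first field is "host".
nodelist_hosts() {
    awk '$1 == "host" { print $2 }' "$1"
}

# Hypothetical helper: try a password-less ssh to every host in the nodelist
# and report whether a trivial remote command ("true") succeeded.
check_ssh() {
    local nodelist="$1" host
    for host in $(nodelist_hosts "$nodelist"); do
        # stdin is redirected so ssh cannot swallow input or prompt
        if ssh -o KbdInteractiveAuthentication=no \
               -o PasswordAuthentication=no \
               -o ConnectTimeout=10 \
               "$host" true </dev/null; then
            echo "$host ok"
        else
            echo "$host FAILED"
        fi
    done
}
```

For example, `check_ssh /home/hazards/NAMD/nodelist` would print one line per host. Running it from the node where charmrun starts would be closest to what the job itself does.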

I need some help. Thanks in advance

Starr

