From: Thanassis Silis (djnass_18_at_hotmail.com)
Date: Thu May 07 2015 - 10:29:26 CDT
I am attempting to run a simulation on a Dell Blade system across up to 6 identical systems each with 64 cores.
The simulations seem to run fine, and they benefit as more blade servers are used.
The problem is that, from the controlling node (i.e. the node from which I started the simulation and on which the log file is saved), I do not see any output files being generated during the simulation.
I use the following command to run each simulation:
/usr/local/namd/charmrun /usr/local/namd/namd2 ++nodelist ./nodelist +setcpuaffinity +p128 ./test.conf > test.log &
I do not have permissions for the path /usr/local/namd, where the ibverbs versions of the NAMD executables reside, but the output files should normally be saved in the same folder where the test.conf file resides (and from which I run the simulation).
I have used the multicore version of NAMD to run a few simulations locally with the command
./../namd-multicore/namd2 +setcpuaffinity +p32 test.conf > test.log &
and this saves/generates the output files fine.
What could be the problem in the ibverbs case? Is it simply not possible for the distributed threads to write data out before the end of the simulation?
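One way to narrow this down is to look for the output files on every host named in the nodelist, not only on the launching node. The following is a minimal sketch under several assumptions: the nodelist uses the usual charmrun layout of a "group main" line followed by "host <name>" lines, passwordless ssh works between the blades, and the run directory exists under the same path on each of them. The helper names list_hosts and check_outputs are hypothetical, and the output prefix is whatever outputName is set to in test.conf.

```shell
# Print the host names found in a charmrun nodelist file.
# Assumes lines of the form "host <name> [options]"; other lines are ignored.
list_hosts() {
    awk '$1 == "host" { print $2 }' "$1"
}

# Hypothetical helper: list files starting with an output prefix on each host.
# Arguments: nodelist file, directory to inspect, output file prefix.
check_outputs() {
    nodelist=$1
    dir=$2
    prefix=$3
    for h in $(list_hosts "$nodelist"); do
        echo "== $h =="
        # </dev/null keeps ssh from consuming the script's standard input.
        ssh "$h" "ls -l '$dir'/'$prefix'* 2>/dev/null" </dev/null \
            || echo "(no matching files, or host unreachable)"
    done
}
```

For example, check_outputs ./nodelist "$PWD" test would inspect the current directory on each blade, assuming the output prefix is "test". If the files show up on a host other than the launching node, that would point to where the writing process is running rather than to a problem with writing during the run.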
Also, I cannot connect to the running simulation through IMD. But I presume that while no files are being saved there is nothing to audit, right?
So no IMD listening thread is spawned. I have verified that by running netstat -paet; it does not show anything listening on the high port (12345) that I set with the IMDport directive in the config file, even though the log states "IMD INTERACTIVE LISTENING PORT 12345".
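The listener check can be repeated in a way that does not depend on how netstat prints port names. This is a minimal sketch, assuming Linux-style output from "netstat -lnt" or "ss -lnt", where -n keeps ports numeric instead of replacing them with a service name from /etc/services. The function name has_listener is hypothetical.

```shell
# Succeeds if the netstat/ss listing on standard input contains a TCP
# listener whose address field ends in ":<port>".
has_listener() {
    port=$1
    awk -v p="$port" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ (":" p "$")) found = 1
        }
        END { exit !found }
    '
}
```

Typical use would be: netstat -lnt | has_listener 12345 && echo "port 12345 is listening". The same pipeline can be run on any other blade through ssh, since the process that opens the IMD port may not be on the node from which charmrun was started.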
Thank you in advance for your help!