From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue May 18 2010 - 04:18:20 CDT
2010/5/18 王棽 <corarbor_at_163.com>:
> Dear NAMD users:
> I am running NAMD on the Dawning5000A supercomputer,
> "http://www.ssc.net.cn/en/resources.asp". However, I found my NAMD processes
> vulnerable on such a platform. They usually die with an input/output error
> on the *.restart.coor, *.restart.vel or *.restart.xsc files. There is an
> example of standard output below:
> I contacted the engineers of the supercomputer center, and they found that
> a temporary lustre terminal connection break and reconnect event occurred
> whenever such an input/output error happened; this is observed quite often
> in the communication between compute nodes and OSS nodes.
> Do you have any suggestions on this problem?
call your "super" engineers again and tell them to do their job!
this is definitely a problem of the machine and its configuration.
i find it pretty hilarious that the system managers tell you that
they see this error happen and imply that it is a failure of your
application. NAMD is being used successfully on lustre file systems
at a very large scale (NCSA's abe and lincoln clusters, NICS'
cray xt5 and others).
and NAMD is not really putting a large strain on the I/O
subsystem. other programs should create much worse issues.
--
Dr. Axel Kohlmeyer akohlmey_at_gmail.com
http://sites.google.com/site/akohlmey/
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.