From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Thu Jan 15 2015 - 14:20:58 CST
Look further up in the output and you will likely see a non-fatal error
message about renaming that file failing.
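For illustration, here is a minimal Python sketch of how a failed rename can surface later as "File exists". It assumes the writer moves the old restart file aside and then creates the new one exclusively. This is an assumption about the general pattern, not NAMD's actual code, and the names backup_then_create, failing_rename, and demo are hypothetical.

```python
import errno
import os
import tempfile


def backup_then_create(path, backup_suffix=".old", rename=os.replace):
    """Move any existing file aside, then create `path` exclusively.

    Hypothetical helper: a failed rename is only reported, not raised,
    so the failure shows up later when the exclusive create is attempted.
    """
    if os.path.exists(path):
        try:
            rename(path, path + backup_suffix)
        except OSError as err:
            # Non-fatal: print a message and carry on.
            print(f"ERROR: renaming {path} failed: {err}")
    # O_EXCL makes the create fail with EEXIST if the old file is still there.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    return os.fdopen(fd, "wb")


def failing_rename(src, dst):
    """Stand-in for a rename that fails, e.g. on a busy shared file system."""
    raise OSError("simulated rename failure")


def demo():
    """Return the errno raised when the rename step fails."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vararea.out.restart.coor")
        with backup_then_create(path) as f:
            f.write(b"step 1")
        # Normal case: the old file is moved aside and the create succeeds.
        with backup_then_create(path) as f:
            f.write(b"step 2")
        # Failure case: the rename fails, the old file stays in place,
        # and the exclusive create raises FileExistsError.
        try:
            backup_then_create(path, rename=failing_rename)
        except FileExistsError as err:
            return err.errno
    return None
```

In this sketch the message printed by the rename step is the real clue, which is why it is worth searching the earlier output for it.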
On Thu, 15 Jan 2015, Alexander Tzanov wrote:
> Hi Jim/Guys
> Sorry to bother you with this problem. Recently I have been seeing an intermittent problem with NAMD 10.
> Sometimes it gives the following error:
> Processor 0 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: Unable to open binary file vararea.out.restart.coor: File exists
> Charm++ fatal error:
> FATAL ERROR: Unable to open binary file vararea.out.restart.coor: File exist
> The permissions are correct, however.
> I googled the error, but only two old messages from the old mailing list came up. Has anyone seen the same
> problem? Or should I direct the question to PPL? I am running on 128 cores on a large IB cluster.
> Alexander Tzanov, PhD
> On Jan 14, 2015, at 6:16 PM, Jim Phillips <jim_at_ks.uiuc.edu<mailto:jim_at_ks.uiuc.edu>> wrote:
> Hi Ryan,
> First, if at all possible, avoid MPI-smp in favor of ibverbs-smp, assuming you do have InfiniBand. Then "charmrun ++mpiexec" will use mpiexec to launch across nodes. This works with most MPI versions, and you can specify a runscript to fix the rest. If you don't have InfiniBand (or a Cray, or just maybe 10Gbit ethernet) then multi-node runs are going to be slow, period.
This archive was generated by hypermail 2.1.6 : Thu Dec 31 2015 - 23:21:33 CST