Re: Bug with FEP?

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Fri Apr 13 2018 - 02:49:14 CDT

Today, while some other FEPs completed normally, another FEP (frwd-10) of
the group described in my earlier message (quoted below) crashed at window 32
(i.e., 8 windows before completion) with the same issue:

> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 500000
> TCL: Running FEP window 32: Lambda1 0.9774999999999984 Lambda2
> 0.9799999999999983 [dLambda 0.0024999999999999467]
> TCL: Setting parameter firsttimestep to 0
> TCL: Setting parameter alchLambda to 0.9774999999999984
> Info: NONBONDED TABLE R-SQUARED SPACING: 0.0625
> .......................
> ........................
> FEP: 73500 130.5390 -9397.2284 901.6677
>
> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 73500
> colvars: Synchronizing (emptying the buffer of) trajectory file
> "frwd-09_0.colvars.traj".
> colvars: Writing the state file "frwd-09.colvars.state".
> WRITING COORDINATES TO DCD FILE frwd-09.dcd AT STEP 74000
> WRITING COORDINATES TO RESTART FILE AT STEP 74000
> FATAL ERROR: Unable to open binary file frwd-09.coor: File exists
>
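As an aside, the odd-looking Lambda1 value in the log (0.9774999999999984
instead of 0.9775) is ordinary double-precision drift from accumulating
dLambda window by window, not a sign of corruption. A minimal sketch,
assuming (this is a reconstruction, not the actual NAMD/Tcl code) that the
schedule is built by repeated addition of dLambda = 1/400, with the window
count 391 inferred from 0.9775 / 0.0025:

```python
# Sketch: repeated addition of a binary-inexact dLambda drifts away from
# the exact decimal value. The loop structure and window count are
# assumptions reconstructed from the log, not taken from the NAMD source.
dLambda = 1.0 / 400          # 0.0025 is not exactly representable in binary
acc = 0.0
for _ in range(391):         # 391 steps of 0.0025 would give 0.9775 in exact decimal
    acc += dLambda
# acc is within a few ulps of 0.9775, but typically not exactly equal,
# which is why the log prints a value like 0.9774999999999984.
assert abs(acc - 0.9775) < 1e-12
```

The drift is harmless at this magnitude (~1e-15), so it is unlikely to be
related to the crash itself.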

I am also forwarding this message to the cluster staff, since either a bug
exists in the NAMD FEP code or there are defective nodes (it was job ID
695703, for which "scontrol show 695703" answers "invalid job ID";
meanwhile, I am deleting all generated frwd-10 files and restarting).
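For what it is worth, "File exists" is the standard errno message for
EEXIST, which an exclusive create (O_CREAT|O_EXCL) raises when the target
file is already present. A hypothetical repro of that failure mode,
assuming (not verified against the NAMD source) that the binary output
file is created exclusively after the previous copy is renamed to .BAK;
the file name below is illustrative:

```python
# Hypothetical illustration (assumption, not the actual NAMD code): if the
# binary restart/output file is opened with O_CREAT|O_EXCL, a stale
# frwd-10.coor left behind by an earlier run makes the create fail with
# errno EEXIST, whose message is exactly "File exists".
import errno
import os
import tempfile

def write_binary_exclusively(path: str, data: bytes) -> None:
    # O_EXCL makes the create fail if the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    coor = os.path.join(d, "frwd-10.coor")          # illustrative name
    write_binary_exclusively(coor, b"coords")       # first write succeeds
    try:
        write_binary_exclusively(coor, b"coords")   # stale file present
    except FileExistsError as e:
        assert e.errno == errno.EEXIST              # strerror: "File exists"
```

If this is indeed the mechanism, deleting the stale output files before
resubmitting (as I am doing) should avoid the error, unless the
filesystem itself is misbehaving.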

francesco

On Thu, Apr 12, 2018 at 10:28 PM, Francesco Pietra <chiendarret_at_gmail.com>
wrote:

> Hello
> I am carrying out a FEP with namd2.12 for the unbound ligand in water (as
> a preliminary to the ligand-protein complex), using 10 forward (frwd)
> segments and 10 backward (back), for a total of 400 windows forward and
> the same number backward.
>
> Segment back-02 has already completed. Of all the others still running,
> frwd-09 crashed at step 54,000 of its first window:
>
> colvars: The restart output state file will be "frwd-09.colvars.state".
>> colvars: The final output state file will be "frwd-09_0.colvars.state".
>> FEP: RESETTING FOR NEW FEP WINDOW LAMBDA SET TO 0.8 LAMBDA2 0.8025
>> FEP: WINDOW TO HAVE 100000 STEPS OF EQUILIBRATION PRIOR TO FEP DATA
>> COLLECTION.
>> FEP: USING CONSTANT TEMPERATURE OF 300 K FOR FEP CALCULATION
>> PRESSURE: 0 -368.104 569.455 -539.08 569.454 -304.251 -1415.15 -539.08
>> -1415.15 181.994
>> GPRESSURE: 0 -260.121 387.556 -718.786 295.641 -123.294 -1219.32 -365.275
>> -1092.12 269.401
>> ETITLE: TS BOND ANGLE DIHED IMPRP ...
>> ...................................
>> .....................................
>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 53500
>> WRITING COORDINATES TO DCD FILE frwd-09.dcd AT STEP 53500
>> WRITING COORDINATES TO RESTART FILE AT STEP 53500
>> FINISHED WRITING RESTART COORDINATES
>> WRITING VELOCITIES TO RESTART FILE AT STEP 53500
>> FINISHED WRITING RESTART VELOCITIES
>> colvars: Synchronizing (emptying the buffer of) trajectory file
>> "frwd-09_0.colvars.traj".
>> colvars: Writing the state file "frwd-09.colvars.state".
>> WRITING COORDINATES TO DCD FILE frwd-09.dcd AT STEP 54000
>> WRITING COORDINATES TO RESTART FILE AT STEP 54000
>> FATAL ERROR: Unable to open binary file frwd-09.coor: File exists
>> [0] Stack Traceback:
>>
>
>
> I had never encountered such a problem and wonder whether it stems from
> the code or the cluster (each FEP runs on one node, 36 cores, NextScale).
> frwd-09.coor (which is a bincoor file) has normal size (I did not try
> opening it with the psf in vmd), while, curiously, frwd-09.dcd and
> frwd-09.dcd.BAK were generated alongside back-09.fepout and
> back-09.fepout.BAK, as if the dcd and fepout files had been present
> initially (but they were not). Also, no anomaly can be seen in
> frwd-09.namd or frwd-09.job.
>
> In any event, I am deleting all generated frwd-09 files and restarting
> from scratch, in the hope that it was a non-systematic error.
>
> thanks for advice
> francesco pietra
>

This archive was generated by hypermail 2.1.6 : Sat Sep 21 2019 - 23:19:31 CDT