Fwd: FEP & node/core number

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Mon May 07 2018 - 09:34:14 CDT

Revision: the ligand-protein complex FEP was carried out on 4 nodes (144
cores), not 1 node as erroneously said, and it does not scale well beyond
that.
fp
---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Mon, May 7, 2018 at 3:19 PM
Subject: Fwd: namd-l: FEP & node/core number
To: brian.radak_at_gmail.com, NAMD <namd-l_at_ks.uiuc.edu>

The puzzle has now been clarified. Despite prolonged equilibration (and
flat RMSD vs frame), FEP 0.0 0.2 0.05, with alchequil 400,000 and numSteps
1,800,000, kept crashing, if not immediately, then before 500,000 steps.

The solution was changing to FEP 0.0 0.2 0.02 with alchequil 150,000 and
numSteps 900,000. However, with one node (36 cores), in 18 hr (out of the
24 hr limit) the simulation only reached lambda 0.1.

As it does not scale beyond one node, and alchequil and/or numSteps should
not be reduced further, I'll ask for a special two-day walltime.
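
The walltime request follows from simple arithmetic, which can be sketched
as below. This is only a rough estimate under two assumptions that the
message does not state: that "reached lambda 0.1" means the windows up to
0.1 had completed, and that every lambda window costs about the same wall
time. The variable and function names are illustrative, not NAMD keywords.

```python
import math

def window_count(start, stop, step):
    """Number of lambda windows between start and stop for a fixed step."""
    return round((stop - start) / step)

# Values quoted in the message
start, stop, step = 0.0, 0.2, 0.02
reached = 0.1              # lambda reached after 18 hr on one node
elapsed_hr = 18.0
walltime_limit_hr = 24.0

total_windows = window_count(start, stop, step)
done_windows = window_count(start, reached, step)

# Assumption: all windows take the same wall time.
hours_per_window = elapsed_hr / done_windows
estimated_total_hr = hours_per_window * total_windows

# How many standard 24 hr allocations the full range would span
allocations_needed = math.ceil(estimated_total_hr / walltime_limit_hr)

print(f"windows: {done_windows} of {total_windows} done")
print(f"estimated total: {estimated_total_hr:.1f} hr "
      f"({allocations_needed} x {walltime_limit_hr:.0f} hr)")
```

Under those assumptions the full 0.0 to 0.2 range needs more than one
24 hr allocation but fits within two days.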

thanks

francesco

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Wed, May 2, 2018 at 7:53 AM
Subject: Re: namd-l: FEP & node/core number
To: Brian Radak <brian.radak_at_gmail.com>
Cc: namd-l <namd-l_at_ks.uiuc.edu>

Atoms belonged to the protein.

Restarting without alch on runs normally.

In any event, the system is now under MD equilibration again, for a further
10 ns.

thanks

francesco

On Tue, May 1, 2018 at 6:58 PM, Brian Radak <brian.radak_at_gmail.com> wrote:

> Which atoms are moving too fast? Are they in the alchemical region?
>
> Does this happen if you restart without alch on?
>
> In general you shouldn't get instabilities just by changing the number of
> nodes (an exception might be when changing to CUDA).
>
> On Mon, Apr 30, 2018 at 11:36 AM, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>
>> Hello:
>> In the context of ligand-protein FEP, the system (total 50,555 atoms) was
>> MD equilibrated for 12 ns on two nodes (72 cores).
>>
>> With the same hardware, or six nodes (216 cores), FEP immediately crashed
>> because of protein atoms moving too fast.
>>
>> With four nodes (144 cores) the trial (10 min) safely reached step
>> 72,000, with performance 0.09 days/ns.
>>
>> All three simulations above were repeated with identical results.
>>
>> I am curious about this because I have recently, on occasion, run into the
>> same FEP problem of atoms moving too fast (at the first step, as in the
>> case above) for ligand-protein systems that had been carefully
>> equilibrated. At that time I did not try changing the number of
>> nodes/cores.
>>
>> Memory was more than enough in all cases.
>>
>> Thanks for paying attention to that.
>>
>> francesco pietra
>>
>
>
