Re: FEP & node/core number

From: Brian Radak (brian.radak_at_gmail.com)
Date: Mon May 07 2018 - 08:38:37 CDT

I guess this is off-topic now, but why does this not scale beyond one node?
Alchemy should scale more or less identically to regular MD, albeit with a
higher overall cost.
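
As a rough check on the two-day walltime request quoted below, here is a
minimal Python sketch of the window arithmetic. It assumes that "reached
lambda 0.1" means five of the ten 0.02-wide windows had completed, and that
every window costs about the same walltime. The function name and its
arguments are hypothetical and not part of NAMD.

```python
def estimate_fep_walltime(lambda_start, lambda_end, d_lambda,
                          lambda_reached, hours_elapsed):
    """Extrapolate total walltime for a FEP leg from partial progress.

    Assumption: all lambda windows take the same walltime, and
    lambda_reached marks the end of the last completed window.
    """
    # Number of windows in the whole leg and number already finished.
    total_windows = round((lambda_end - lambda_start) / d_lambda)
    done_windows = round((lambda_reached - lambda_start) / d_lambda)
    if done_windows <= 0:
        raise ValueError("no completed window to extrapolate from")

    hours_per_window = hours_elapsed / done_windows
    total_hours = hours_per_window * total_windows
    return total_windows, done_windows, hours_per_window, total_hours


# Numbers taken from the quoted message: FEP 0.0 0.2 0.02,
# lambda 0.1 reached after 18 hr on one node.
total, done, per_window, total_hours = estimate_fep_walltime(
    0.0, 0.2, 0.02, 0.1, 18.0)
print(f"{done}/{total} windows done, "
      f"{per_window:.1f} hr per window, "
      f"{total_hours:.0f} hr estimated for the full leg")
```

Under those assumptions the full 0.0 to 0.2 range needs about 36 hr on one
node, so a 48 hr allocation would leave some margin.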

On Mon, May 7, 2018 at 9:19 AM, Francesco Pietra <chiendarret_at_gmail.com>
wrote:

> The puzzle has now been clarified. Despite prolonged equilibration (and a
> flat RMSD vs. frame), FEP 0.0 0.2 0.05, with alchequil 400,000 and numSteps
> 1,800,000, kept crashing, if not immediately, then before 500,000 steps.
>
> The solution was changing to FEP 0.0 0.2 0.02 with alchequil 150,000 and
> numSteps 900,000. However, on one node (36 cores), in 18 hr (out of the
> 24 hr limit) the simulation reached only lambda 0.1.
>
> As it does not scale beyond one node, and alchequil and/or numSteps should
> not be reduced further, I'll ask for a special two-day walltime.
>
> thanks
>
> francesco
>
> ---------- Forwarded message ----------
> From: Francesco Pietra <chiendarret_at_gmail.com>
> Date: Wed, May 2, 2018 at 7:53 AM
> Subject: Re: namd-l: FEP & node/core number
> To: Brian Radak <brian.radak_at_gmail.com>
> Cc: namd-l <namd-l_at_ks.uiuc.edu>
>
>
> The atoms belonged to the protein.
>
> Restarting without alch on runs normally.
>
> In any event, the system is now under MD equilibration again, for a further
> 10 ns.
>
> thanks
>
> francesco
>
> On Tue, May 1, 2018 at 6:58 PM, Brian Radak <brian.radak_at_gmail.com> wrote:
>
>> Which atoms are moving too fast? Are they in the alchemical region?
>>
>> Does this happen if you restart without alch on?
>>
>> In general you shouldn't get instabilities just by changing the number of
>> nodes (an exception might be when changing to CUDA).
>>
>> On Mon, Apr 30, 2018 at 11:36 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>>
>>> Hello:
>>> In the context of ligand-protein FEP, the system (50,555 atoms in total)
>>> was equilibrated by MD for 12 ns on two nodes (72 cores).
>>>
>>> With the same hardware, or with six nodes (216 cores), FEP crashed
>>> immediately because protein atoms were moving too fast.
>>>
>>> With four nodes (144 cores), the 10 min trial safely reached step
>>> 72,000, with a performance of 0.09 days/ns.
>>>
>>> All three simulations above were repeated with identical results.
>>>
>>> I am curious about that, as in the recent past I had already
>>> sporadically run into FEP problems with atoms moving too fast (at the
>>> first step, as in the case above) for ligand-protein systems that had been
>>> carefully equilibrated. At that time I did not try changing the number of
>>> nodes/cores.
>>>
>>> Memory was more than enough in all cases.
>>>
>>> Thanks for paying attention to that.
>>>
>>> francesco pietra
>>>
>>
>>
>
>
