Re: Decreasing performance of cluster running FEP

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Wed Jul 11 2018 - 16:31:55 CDT

Thanks for your answer.

Each node has 36 Intel® Xeon® Broadwell cores and 115 GB of memory, so
the problem must lie elsewhere.

The system has 50555 atoms, including waters; 4 nodes proved to be the
best choice for MD, where the performance was excellent.

With the ligand alone in water the best choice proved to be one node.

In retrospect, are colvars driven by a single CPU? Is that the problem? I
could not set fewer colvars than those I described and still keep the
ligand in place.
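
To make the single-CPU question concrete, here is a minimal sketch of the usual Amdahl-type argument. It assumes, purely for illustration, that some fixed part of each step runs serially while the rest is spread evenly over all cores. The function name and every number in it are hypothetical, not measurements from this cluster, and the sketch says nothing about how colvars are actually scheduled in NAMD.

```python
def step_time(parallel_work_s, serial_s, cores):
    """Seconds per step when parallel_work_s (core-seconds) is split evenly
    over `cores` and serial_s is executed on a single core."""
    return parallel_work_s / cores + serial_s


# Hypothetical values, chosen only to illustrate the trend.
parallel_work_s = 1.0  # parallelizable work per step, in core-seconds
serial_s = 0.005       # assumed serial cost per step, in seconds

for cores in (36, 72, 144):
    t = step_time(parallel_work_s, serial_s, cores)
    print(f"{cores:4d} cores: {t:.4f} s/step, serial share {serial_s / t:.0%}")
```

In this toy model the serial share of each step grows as cores are added, so a single-core component would cap the scaling. On its own it would not explain a step time that keeps growing from window to window at a fixed core count.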

francesco

On Wed, Jul 11, 2018 at 7:21 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
wrote:

> What is the hardware on your cluster? FEP is not accelerated with GPUs.
> Neither are colvars, which I think is where the problem may actually be.
> How many atoms are in your colvar definitions?
>
> -Josh
>
>
>
> On 2018-07-10 23:36:51-06:00 owner-namd-l_at_ks.uiuc.edu wrote:
>
> Hello:
> I am observing a marked decrease in the performance of a NextScale cluster
> running a FEP for protein-ligand, previously equilibrated for over 100ns.
> No such problems occur when running MD equilibration on the same system.
> The code is NAMD 2.12, compiled ad hoc in house with Intel 2016 (the NAMD
> 2.12 compiled with a more recent Intel, available as a module on the
> cluster, proved unable to run a FEP).
>
> The system is made of ca 460 residues in water, FEP 0.2-1.0 lambda 0.025
> (32 windows), preeq 175,000/numSteps 750,000, ts=1.0fs.
> FEP on 4 nodes/144 cores (optimal for scaling) starts with 0.0078/step
> performance until window 3, then 0.014/step until window 5, then
> 0.021/step until the present window 9. The ligand, under modest
> r/angle/dih colvars, remains in place with no detectable rotation or
> distortion. The slowdown is such that carrying out a FEP becomes extremely
> expensive, even if divided into two sectors as now. The same problems
> occur for FEP 0.0-0.2.
>
> I observed the same problem when running FEP on the ligand alone in water
> on one node /36 core.
>
> In all cases, having the code write to disk less frequently did not
> help.
> These problems have been observed since the start of this project,
> a few months ago.
>
> Thanks for advice.
>
> francesco pietra
>
>
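
For a sense of scale, the per-step timings quoted above can be converted into wall-clock cost per window. This is only a back-of-the-envelope sketch: it assumes the "/step" figures are wall-clock seconds per step and that the 750,000 numSteps of each window already include the 175,000 pre-equilibration steps. Neither is stated explicitly in the post, and the names below are made up for the illustration.

```python
# Assumption: numSteps per window, taken to include the pre-equilibration steps.
STEPS_PER_WINDOW = 750_000


def hours_per_window(seconds_per_step, steps=STEPS_PER_WINDOW):
    """Wall-clock hours needed for one lambda window at a given step cost."""
    return seconds_per_step * steps / 3600.0


# Timings reported in the post, assumed to be seconds per step.
timings = [
    ("up to window 3", 0.0078),
    ("up to window 5", 0.014),
    ("up to window 9", 0.021),
]

baseline = timings[0][1]
for label, s_per_step in timings:
    hours = hours_per_window(s_per_step)
    print(f"{label}: {hours:.2f} h/window, {s_per_step / baseline:.2f}x the initial cost")
```

Under these assumptions a window costs about 1.6 hours at the initial rate and about 4.4 hours at the latest one, roughly 2.7 times as much.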
