Re: Decreasing performance of cluster running FEP

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Sat Jul 14 2018 - 10:57:38 CDT

alchVdwLambdaEnd 1.0
alchElecLambdaStart 0.5
alchVdWShiftCoeff 4.0
alchDecouple off
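
To make the effect of these settings concrete, here is a minimal sketch of how the global lambda maps onto separate electrostatic and van der Waals coupling factors. It assumes the staggered scaling described in the NAMD user guide for a group that appears as lambda goes from 0 to 1 (a vanishing group would see the same functions evaluated at 1 - lambda). The function names elec_scale and vdw_scale are illustrative only and are not NAMD keywords.

```python
# Values taken from the settings quoted above.
ALCH_ELEC_LAMBDA_START = 0.5
ALCH_VDW_LAMBDA_END = 1.0


def elec_scale(lam, start=ALCH_ELEC_LAMBDA_START):
    """Electrostatic coupling of an appearing group (assumed scheme)."""
    if lam <= start:
        return 0.0  # charges stay switched off until lambda passes start
    return (lam - start) / (1.0 - start)


def vdw_scale(lam, end=ALCH_VDW_LAMBDA_END):
    """van der Waals coupling of an appearing group (assumed scheme)."""
    if lam >= end:
        return 1.0  # fully coupled once lambda reaches end
    return lam / end


for lam in (0.0, 0.25, 0.5, 0.75, 0.8, 1.0):
    print(f"lambda={lam:.2f}  elec={elec_scale(lam):.2f}  vdw={vdw_scale(lam):.2f}")
```

Under these assumptions the electrostatic coupling of an appearing group is zero up to lambda 0.5 and then rises linearly, while the van der Waals coupling simply follows lambda.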

Since I can no longer request more than 24 hr on the cluster, my plan now is
to go on with small sectors:

0.0-0.2
0.2-0.4
0.4-0.6
0.6-0.8
0.8-1.0

beginning with the last one (along 9 windows, lambda 0.025) to see what
happens, before investing too much.
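
The split into sectors can be sketched as below. This is only an illustration, assuming that "lambda 0.025" is the width of each window. With that width, a sector of 0.2 is covered by 8 intervals bounded by 9 lambda values. The helper name sector_windows is hypothetical.

```python
def sector_windows(start, end, dlambda=0.025):
    """Return the (lambda_i, lambda_i+1) pairs covering [start, end]."""
    # Count intervals with round() so float noise cannot drop a window.
    n = round((end - start) / dlambda)
    points = [round(start + i * dlambda, 6) for i in range(n + 1)]
    return list(zip(points[:-1], points[1:]))


sectors = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1.0)]

# Start with the last sector, as planned above.
for lo, hi in reversed(sectors):
    wins = sector_windows(lo, hi)
    print(f"{lo:.1f}-{hi:.1f}: {len(wins)} windows, "
          f"first {wins[0]}, last {wins[-1]}")
```

Each pair could then be fed to one short job, so that no single submission exceeds the walltime limit.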

fp

On Sat, Jul 14, 2018 at 4:25 PM Brian Radak <brian.radak_at_gmail.com> wrote:

> Follow up - what are the values of alchElecLambdaStart and
> alchVdwLambdaEnd? The former in particular may change the cost of PME,
> especially if you have alchDecouple on.
>
> On Thu, Jul 12, 2018, 9:24 AM Brian Radak <brian.radak_at_gmail.com> wrote:
>
>> Determining if colvars or FEP is the culprit here is a necessary first
>> step. We need a minimal example that reproduces the issue.
>>
>> Does the slowdown only occur on the cluster? When running on multiple
>> nodes? Does the problem occur sooner if you run fewer steps per lambda or
>> does it occur after a set walltime?
>>
>> On Thu, Jul 12, 2018, 3:00 AM Francesco Pietra <chiendarret_at_gmail.com>
>> wrote:
>>
>>> I was also perplexed by the performance degrading as the lambda changes,
>>> which occurred sooner or later, not always at the same lambda value, and to
>>> the same extent whether one node (ligand alone) or four nodes
>>> (ligand+protein) are involved.
>>>
>>> As I said, the rmsd is good; in particular, the structure and pose of the
>>> ligand (a polycyclic diterpenoid with a mobile side chain, a rather exotic
>>> structure) are well conserved during FEP. The ligand was parameterized
>>> for charmm36 with dihedral fitting at the HF/6-31G* level, and MD
>>> equilibration was pretty long (>100ns) with absolutely flat rmsd/frame.
>>>
>>> The only thing I can do (actually I am just doing that) is to decrease the
>>> number of steps per lambda in order to keep the calculation within 70 hours
>>> (which still requires a special permission at the cluster). Hopefully it
>>> will not bring the calculation out of pseudo-convergence, which occurred,
>>> as expected, when I tried decreasing the number of windows while
>>> increasing the number of steps per window.
>>>
>>> Unfortunately there is little specific recent literature on namd/FEP for
>>> complicated organic ligands. This is why I asked you about topogromacs, to
>>> compare with gromacs running charmm36. However, even the literature on FEP
>>> with gromacs is limited to rather simple organic ligands and, what
>>> surprised me very much, it agrees with experiment even though the ligands
>>> had been parameterized with the gaff ff at semiempirical level. Probably
>>> I'll see all these matters with a different eye when my experience is ripe.
>>>
>>> francesco
>>>
>>> On Thu, Jul 12, 2018 at 1:23 AM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
>>> wrote:
>>>
>>> > Colvars are indeed driven by a single CPU. Most of the colvars perform
>>> > well if the number of atoms involved isn't too big, and bond lengths and
>>> > angles are typical examples of that. But if you are asking for colvars
>>> > that involve many atoms in a complicated relationship, performance isn't
>>> > all that good. To me, the weird thing is that the performance degrades
>>> > only as the lambda changes. Are you getting any absurd bonds as the
>>> > trajectory progresses?
>>> >
>>> > -Josh
>>> >
>>> >
>>> >
>>> > On 2018-07-11 15:32:12-06:00 Francesco Pietra wrote:
>>> >
>>> > Thanks for your answer.
>>> >
>>> > 36-core Intel® Xeon® Broadwell per node, memory 115 GB/node, so the
>>> > problem must lie elsewhere.
>>> >
>>> > 50555 atoms, including waters, for which 4 nodes proved to be the best
>>> > choice for MD, where the performance was excellent.
>>> >
>>> > With the ligand alone in water the best choice proved to be one node.
>>> >
>>> > In retrospect, are colvars driven by a single CPU? Is that the problem?
>>> > I could not set fewer colvars than those I described and still
>>> > keep the ligand in place.
>>> > francesco
>>> >
>>> > On Wed, Jul 11, 2018 at 7:21 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
>>> > wrote:
>>> >
>>> >> What is the hardware on your cluster? FEP is not accelerated with GPUs.
>>> >> Neither are colvars, which I think is where the problem may actually be.
>>> >> How many atoms are in your colvar definitions?
>>> >>
>>> >> -Josh
>>> >>
>>> >>
>>> >>
>>> >> On 2018-07-10 23:36:51-06:00 owner-namd-l_at_ks.uiuc.edu wrote:
>>> >>
>>> >> Hello:
>>> >> I am observing a marked decrease in the performance of a NextScale
>>> >> cluster running a FEP for protein-ligand, previously equilibrated for
>>> >> over 100ns. No such problems when running MD equilibration on the same
>>> >> system
>>
>>

This archive was generated by hypermail 2.1.6 : Tue Sep 17 2019 - 23:20:03 CDT