Re: Decreasing performance of cluster running FEP

From: Brian Radak (brian.radak_at_gmail.com)
Date: Thu Jul 12 2018 - 09:24:19 CDT

Determining if colvars or FEP is the culprit here is a necessary first
step. We need a minimal example that reproduces the issue.

Does the slowdown only occur on the cluster? When running on multiple
nodes? Does the problem occur sooner if you run fewer steps per lambda or
does it occur after a set walltime?
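
One way to tell a walltime-dependent slowdown from a lambda-dependent one is to pull the per-step timings out of the log and compare the start of the run with the end. Below is a minimal Python sketch. It assumes the log contains lines of the form "TIMING: <step>  CPU: <total>, <s>/step  Wall: <total>, <s>/step, ...", so check that against your own output first. The function names parse_timing and slowdown_ratio are made up for illustration.

```python
import re

# Assumed log line layout (verify against your own log), for example:
# TIMING: 1000  CPU: 25.1, 0.0250/step  Wall: 25.3, 0.0250/step, ...
TIMING_RE = re.compile(
    r"^TIMING:\s+(\d+)\s+CPU:\s+([\d.eE+-]+),\s+([\d.eE+-]+)/step\s+"
    r"Wall:\s+([\d.eE+-]+),\s+([\d.eE+-]+)/step"
)


def parse_timing(lines):
    """Return (step, cumulative_wall_seconds, wall_seconds_per_step) tuples."""
    records = []
    for line in lines:
        match = TIMING_RE.match(line)
        if match:
            step = int(match.group(1))
            wall_total = float(match.group(4))
            wall_per_step = float(match.group(5))
            records.append((step, wall_total, wall_per_step))
    return records


def slowdown_ratio(records, n=1):
    """Mean s/step over the last n records divided by the mean over the first n."""
    if n < 1 or len(records) < 2 * n:
        raise ValueError("need at least 2*n timing records")
    first = sum(r[2] for r in records[:n]) / n
    last = sum(r[2] for r in records[-n:]) / n
    return last / first


if __name__ == "__main__":
    import sys

    # Usage: python timing_check.py run.log
    with open(sys.argv[1]) as handle:
        recs = parse_timing(handle)
    for step, wall_total, wall_per_step in recs:
        print(step, wall_total, wall_per_step)
    print("slowdown ratio:", slowdown_ratio(recs))
```

Plotting s/step against cumulative wall time for runs with different numbers of steps per lambda should show whether the slowdown tracks elapsed time or the lambda schedule.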

On Thu, Jul 12, 2018, 3:00 AM Francesco Pietra <chiendarret_at_gmail.com>
wrote:

> I was also perplexed by the performance degrading as lambda changes. The
> slowdown occurred sooner or later, not always at the same lambda value, and
> to the same extent whether one node (ligand alone) or four nodes
> (ligand+protein) were involved.
>
> As I said, the rmsd is good; in particular, the structure and pose of the
> ligand (a polycyclic diterpenoid with a mobile side chain, a rather exotic
> structure) are well conserved during FEP. The ligand was parameterized for
> charmm36 with dihedral fitting at the HF/6-31G* level, and MD equilibration
> was pretty long (>100 ns) with an absolutely flat rmsd/frame.
>
> The only thing I can do (and am doing right now) is decrease the number
> of steps per lambda in order to keep the calculation within 70 hours (which
> still requires special permission on the cluster). Hopefully this will not
> bring the calculation out of pseudo-convergence, which is what happened, as
> expected, when I decreased the number of windows while increasing the
> number of steps per window.
>
> Unfortunately there is little specific recent literature on namd/FEP for
> complicated organic ligands. This is why I asked you about topogromacs, to
> compare with gromacs running charmm36. However, even the literature on FEP
> with gromacs is limited to rather simple organic ligands and, what
> surprised me very much, agrees with experiment even though the ligands
> had been parameterized with the gaff ff at semiempirical level. Probably
> I'll see all of this with a different eye when my experience is ripe.
>
> francesco
>
> On Thu, Jul 12, 2018 at 1:23 AM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
> wrote:
>
> > Colvars are indeed driven by a single CPU. Most of the colvars perform
> > well if the number of atoms involved isn't too big, and bond lengths and
> > angles are typical examples of that. But if you are asking for colvars
> > that involve many atoms in a complicated relationship, performance isn't
> > all that good. To me, the weird thing is that the performance degrades
> > only as the lambda changes. Are you getting any absurd bonds as the
> > trajectory progresses?
> >
> > -Josh
> >
> >
> >
> > On 2018-07-11 15:32:12-06:00 Francesco Pietra wrote:
> >
> > Thanks for your answer.
> >
> > 36-core Intel® Xeon® Broadwell per node, 115 GB memory per node, so the
> > problem must lie elsewhere.
> >
> > 50555 atoms, including waters, for which 4 nodes proved to be the best
> > choice for MD, where the performance was excellent.
> >
> > With the ligand alone in water the best choice proved to be one node.
> >
> > In retrospect, are colvars driven by a single CPU? Is that the problem? I
> > could not set fewer colvars than I described and still keep the ligand
> > in place.
> > francesco
> >
> > On Wed, Jul 11, 2018 at 7:21 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
> > wrote:
> >
> >> What is the hardware on your cluster? FEP is not accelerated with GPUs

This archive was generated by hypermail 2.1.6 : Wed Sep 18 2019 - 23:20:08 CDT