Re: Running QM-MM MOPAC on a cluster

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Tue Jan 08 2019 - 11:06:27 CST

Now there must still be something misconfigured (which would explain why it
ran as slowly as without CPU affinity, until it crashed), if I followed your
indications correctly.

Slurm setting
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=36

/galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
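[Editor's note: a minimal sanity check, not part of the original job script, that can be dropped into the Slurm script before launching namd2. It only assumes a Linux node; with --cpus-per-task=36 on a 36-core node the mask should be 0-35.]

```shell
# Show the CPU mask Slurm actually granted to this job step.
# +setcpuaffinity aborts with "pthread_setaffinity: Invalid argument"
# if NAMD tries to bind a thread to a core outside this mask, so
# verifying the mask first rules that failure mode out.
grep Cpus_allowed_list /proc/self/status
```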

NAMD setting
qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD QMMM
GEO-OK THREADS=24"
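[Editor's note: a sketch of the intended core split, using only numbers from the settings above. Note that the numactl range 5-35 spans 31 CPUs, so THREADS=24 keeps MOPAC comfortably inside its mask.]

```shell
# Core budget on the 36-core node, from the settings in this thread:
NAMD_CORES=5        # namd2 +p5, pinned to CPUs 0-4 by +setcpuaffinity
TOTAL_CORES=36      # --cpus-per-task=36 reserves the whole node
MOPAC_CPUS=$((TOTAL_CORES - NAMD_CORES))  # CPUs 5-35 left for MOPAC
echo "CPUs available to MOPAC: $MOPAC_CPUS"
# THREADS=24 stays below 31, so MOPAC never oversubscribes its mask.
```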

qmExecPath "numactl -a -C 5-35
/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"

# qmExecPath "numactl -C +5-35
/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"

# qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
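[Editor's note: if passing a multi-word command string through qmExecPath turns out to be fragile, one alternative (a sketch only; the wrapper name run_mopac.sh is hypothetical, not from this thread) is to point qmExecPath at a tiny wrapper that does the pinning.]

```shell
# Create a wrapper that pins MOPAC to absolute CPUs 5-35
# (numactl -a / --all ignores the affinity mask inherited from NAMD)
# and forwards whatever arguments NAMD passes to the QM executable.
cat > run_mopac.sh <<'EOF'
#!/bin/bash
exec numactl -a --physcpubind=5-35 \
    /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe "$@"
EOF
chmod +x run_mopac.sh
# namd-01.conf would then point at the wrapper, e.g.:
#   qmExecPath "/full/path/to/run_mopac.sh"   (path is hypothetical)
```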

NAMD log
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
[2] pthread affinity is: 2
[4] pthread affinity is: 4
[1] pthread affinity is: 1
[3] pthread affinity is: 3
[0] pthread affinity is: 0

Info: 1 NAMD Git-2018-11-22 Linux-x86_64-multicore 5 node216 fpietra0
Info: Running on 5 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.

TIMING: 3185 CPU: 996.776, 0.314889/step Wall: 999.169, 0.315218/step,

WRITING COORDINATES TO DCD FILE PolyAla_out.dcd AT STEP 3185
FATAL ERROR: Error running command for QM forces calculation.

namd-01.err
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: Error running command for QM forces calculation.

Charm++ fatal error:
FATAL ERROR: Error running command for QM forces calculation.

/var/spool/slurmd/job592440/slurm_script: line 14: 33492 Aborted

 /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log

cat /0/*aux
..............
 LMO_ENERGY_LEVELS[020]=
  -17.192 -17.113 -16.969 -16.435 -13.952 -12.606 -12.244 -11.755 -11.148 -10.968
    1.002   1.567   3.340   3.366   3.593   3.697   3.796   3.855   4.365   4.522
 MOLECULAR_ORBITAL_OCCUPANCIES[00020]=
 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000
 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
 CPU_TIME:SECONDS[1]= 0.20

which indicates a normal termination of MOPAC.
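[Editor's note: since the .aux file ends normally, the failure NAMD reports most likely comes from the launch command itself (e.g., numactl rejecting its options under the inherited mask) rather than from MOPAC. NAMD apparently treats a nonzero exit status of the whole qmExecPath command line as a failed QM run; the trivial sketch below, with two sh -c calls standing in for a successful and a failed QM launch, illustrates the distinction worth checking.]

```shell
# NAMD only sees the exit status of the complete qmExecPath command;
# if numactl itself fails, MOPAC's own clean output never matters.
sh -c 'exit 0'; echo "successful launch -> status $?"
sh -c 'exit 1'; echo "failed launch     -> status $?"
```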

francesco

On Mon, Jan 7, 2019 at 5:40 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:

>
> Thanks for the great news!
>
> I suspect that "numactl -C +5-35 ..." is failing because it is conflicting
> with the affinity set by NAMD. I think the following should work since
> the -a option ignores the current affinity of the launching thread. Also
> note that the + is removed so these are absolute cpu ids.
>
> qmExecPath "numactl -a -C 5-35 ..."
>
> Jim
>
>
> On Fri, 4 Jan 2019, Francesco Pietra wrote:
>
> > Slurm setting
> > #SBATCH --nodes=1
> > #SBATCH --ntasks=1
> > #SBATCH --cpus-per-task=36
> >
> > /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> > namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
> >
> > NAMD setting
> > qmConfigLine "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD QMMM
> > GEO-OK THREADS=24"
> >
> > # qmExecPath "numactl -C +5-35
> > /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >
> > qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >
> > NAMD log
> > [1] pthread affinity is: 1
> > [3] pthread affinity is: 3
> > [0] pthread affinity is: 0
> > [2] pthread affinity is: 2
> > [4] pthread affinity is: 4
> >
> > Info: Running on 5 processors, 1 nodes, 1 physical nodes.
> > Info: CPU topology information available.
> >
> > TIMING: 12926 CPU: 666.423, 0.050845/step
> > TIMING: 14828 CPU: 763.82, 0.045536/step
> > TIMING: 19676 CPU: 1013.25, 0.050659/step
> >
> > WallClock: 1049.411743 CPUTime: 1040.567749 Memory: 432.250000 MB
> >
> > That is an amazing speedup, roughly ten times faster than in previous
> > trials. It seems quite good given that /dev/shm is not being used
> > (I was unable to set it up; I have asked the cluster staff whether it
> > is at all possible or useful on the node).
> >
> > VARIANTS:
> > -- With THREADS=30 for MOPAC it was a bit slower:
> > TIMING: 13822 CPU: 720.347, 0.052429/step Wall: 726.456, 0.055537/step,
> > perhaps because the Polyala tutorial uses a small system.
> > -- Assigning ten cores to NAMD made it somewhat slower.
> >
> > -- I was unable to get numactl working by interpreting your suggestion
> > as +number_of_cores_to_namd-total_number_of_cores_less_one, as follows:
> > qmExecPath "numactl -C +5-35
> > /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >
> > # qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >
> >
> > ------------ Processor 2 Exiting: Called CmiAbort ------------
> > Reason: FATAL ERROR: Error running command for QM forces calculation.
> >
> > Charm++ fatal error:
> > FATAL ERROR: Error running command for QM forces calculation.
> >
> > /var/spool/slurmd/job582681/slurm_script: line 14: 957 Aborted
> >
> >
> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> > namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
> >
> > Thanks a lot for these advanced lessons
> >
> > francesco
> >
> >
> > On Wed, Jan 2, 2019 at 11:40 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> >
> >>
> >> For starters, use the faster settings from the previous emails:
> >>
> >>> #SBATCH --ntasks=1
> >>> #SBATCH --cpus-per-task=34
> >>
> >> For a little more information add +showcpuaffinity.
> >>
> >> I suspect that +setcpuaffinity isn't looking at the limits on affinity
> >> that are enforced by the queueing system, so it's trying to use a
> >> forbidden CPU. If you request all cores on the node with
> >> --cpus-per-task=36 that might make the problem go away.
> >>
> >> Jim
> >>
> >>
> >> On Tue, 1 Jan 2019, Francesco Pietra wrote:
> >>
> >>> Thanks a lot for these suggestions. There must be some restriction
> >>> hindering the suggested settings. Slurm, namd-01.conf, and the error
> >>> are shown below, in that order:
> >>>
> >>> #!/bin/bash
> >>> #SBATCH --nodes=1
> >>> #SBATCH --ntasks=10
> >>> #SBATCH --cpus-per-task=1
> >>> #SBATCH --time=00:30:00
> >>> #SBATCH --job-name=namd-01
> >>> #SBATCH --output namd-01.out
> >>> #SBATCH --error namd-01.err
> >>> #SBATCH --partition=gll_usr_prod
> >>> #SBATCH --mem=115GB
> >>> #SBATCH --account=IscrC_QMMM-FER_1
> >>> # goto launch directory
> >>> cd $SLURM_SUBMIT_DIR
> >>>
> >>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>> namd-01.conf +p10 +setcpuaffinity > namd-01.log
> >>>
> >>> qmExecPath "numactl -C +10-33
> >>> /galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >>>
> >>> $ cat *err
> >>> pthread_setaffinity: Invalid argument
> >>> pthread_setaffinity: Invalid argument
> >>> pthread_setaffinity: Invalid argument
> >>> ------------- Processor 7 Exiting: Called CmiAbort ------------
> >>> Reason: set cpu affinity abort!
> >>>
> >>> Charm++ fatal error:
> >>> set cpu affinity abort!
> >>>
> >>> /var/spool/slurmd/job540826/slurm_script: line 14: 21114 Segmentation
> >>> fault
> >>>
> >>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>> namd-01.conf +p10 +setcpuaffinity > namd-01.log
> >>>
> >>> fp
> >>>
> >>> On Mon, Dec 31, 2018 at 4:42 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> >>>
> >>>>
> >>>> Well, that's progress at least. I have one other idea to ensure that
> >> NAMD
> >>>> and MOPAC aren't competing with each other for the same cores:
> >>>>
> >>>> 1) Add "+setcpuaffinity" to the NAMD command line before ">".
> >>>>
> >>>> 2) Add "numactl -C +10-33" to the beginning of qmExecPath in
> >>>> namd-01.conf (quote the string, e.g., "numactl -C +10-33
> >>>> /path/to/MOPAC.exe")
> >>>>
> >>>> This should keep NAMD on your first ten cores and MOPAC on the next 24.
> >>>>
> >>>> What is qmBaseDir set to?  Something in /dev/shm is the best choice.
> >>>> If qmBaseDir is on a network filesystem, that could slow things down.
> >>>>
> >>>> Jim
> >>>>
> >>>>
> >>>> On Fri, 21 Dec 2018, Francesco Pietra wrote:
> >>>>
> >>>>> I finally learned how to ssh on a given node. The results for
> >>>>> #SBATCH --nodes=1
> >>>>> #SBATCH --ntasks=10
> >>>>> #SBATCH --cpus-per-task=1
> >>>>>
> >>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>>>> namd-01.conf +p10 > namd-01.log
> >>>>>
> >>>>> qmConfigLine  "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD
> >>>>> QMMM GEO-OK THREADS=24"
> >>>>>
> >>>>> are
> >>>>>
> >>>>> ssh node181
> >>>>> namd %cpu 720-750
> >>>>> mopac %cpu 1-30
> >>>>> 1 (per-core load):
> >>>>> %Cpu0-4: 90-100
> >>>>> %Cpu18-22: 60-100
> >>>>> %Cpu5-17: 0.0
> >>>>> %Cpu23-34: 0.0
> >>>>>
> >>>>> namd.log: 0.5/step (at 11min executed 1346 steps)
> >>>>> ______________________
> >>>>> As above, only changing
> >>>>>
> >>>>> SBATCH --nodes=1
> >>>>> #SBATCH --ntasks=1
> >>>>> #SBATCH --cpus-per-task=34
> >>>>>
> >>>>> ssh node181
> >>>>> namd %cpu 900
> >>>>> mopac %cpu 0-34
> >>>>> 1
> >>>>> %Cpu0-34: 0.3-100.0
> >>>>>
> >>>>> namd.log: 0.3/step (at 11min executed 2080 steps)
> >>>>>
> >>>>> Despite all CPUs being used, the performance is disappointing. I
> >>>>> can't say whether NAMD and MOPAC compete, at least in part, for
> >>>>> the same cores.
> >>>>>
> >>>>> francesco
> >>>>>
> >>>>>
> >>>>> On Mon, Dec 17, 2018 at 4:12 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> >>>>>
> >>>>>>
> >>>>>> Since you are asking Slurm for 10 tasks with 1 cpu-per-task it is
> >>>>>> possible that all 34 threads are running on a single core. You can
> >>>>>> check this with top (hit "1" to see per-core load) if you can ssh
> >>>>>> to the execution host.
> >>>>>>
> >>>>>> You should probably request --ntasks=1 --cpus-per-task=34 (or 36)
> >>>>>> so that Slurm will allocate all of the cores you wish to use. The
> >>>>>> number of cores used by NAMD is controlled by +p10 and you will
> >>>>>> need THREADS=24 for MOPAC.
> >>>>>>
> >>>>>> It is a good idea to use top to confirm that all cores are being used.
> >>>>>>
> >>>>>> Jim
> >>>>>>
> >>>>>>
> >>>>>> On Sun, 16 Dec 2018, Francesco Pietra wrote:
> >>>>>>
> >>>>>>> I had earlier taken the relative number of threads into account,
> >>>>>>> by setting them for MOPAC as well.
> >>>>>>> Out of the many such trials, namd.config:
> >>>>>>>
> >>>>>>> qmConfigLine  "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD
> >>>>>>> QMMM GEO-OK THREADS=24"
> >>>>>>>
> >>>>>>> qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >>>>>>>
> >>>>>>> corresponding SLURM:
> >>>>>>> #SBATCH --nodes=1
> >>>>>>> #SBATCH --ntasks=10
> >>>>>>> #SBATCH --cpus-per-task=1
> >>>>>>>
> >>>>>>>
> >>>>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>>>>>> namd-01.conf +p10 > namd-01.log
> >>>>>>>
> >>>>>>> Thus, 24+10=34, while the number of cores on the node was 36.
> >>>>>>> Again, execution took nearly two hours, slower than on my vintage
> >>>>>>> VAIO with two cores (an hour and a half).
> >>>>>>>
> >>>>>>> As to MKL_NUM_THREADS, I am lost: there is no such environment
> >>>>>>> variable in MOPAC's list. On the other hand, the NAMD nightly
> >>>>>>> build I used performs as effectively as it should with classical
> >>>>>>> MD simulations on one node of the same cluster.
> >>>>>>>
> >>>>>>> thanks
> >>>>>>> fp
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Dec 14, 2018 at 4:29 PM Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> >>>>>>>
> >>>>>>>>
> >>>>>>>> The performance of a QM/MM simulation is typically limited by the
> >>>>>>>> QM program, not the MD program. Do you know how many threads MOPAC
> >>>>>>>> is launching? Do you need to set the MKL_NUM_THREADS environment
> >>>>>>>> variable? You want the number of NAMD threads (+p#) plus the
> >>>>>>>> number of MOPAC threads to be less than the number of cores on
> >>>>>>>> your machine.
> >>>>>>>>
> >>>>>>>> Jim
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, 14 Dec 2018, Francesco Pietra wrote:
> >>>>>>>>
> >>>>>>>>> Hi all
> >>>>>>>>> I resumed my attempts at finding the best settings for running
> >>>>>>>>> NAMD QM/MM on a cluster. I used Example 1 (Polyala).
> >>>>>>>>>
> >>>>>>>>> In order to use the NAMD 2.13 multicore nightly build, I was
> >>>>>>>>> limited to a single multicore node: 2x18-core Intel(R) Xeon(R)
> >>>>>>>>> E5-2697 v4 @ 2.30GHz with 128 GB RAM (Broadwell).
> >>>>>>>>>
> >>>>>>>>> Settings
> >>>>>>>>> qmConfigLine  "PM7 XYZ T=2M 1SCF MOZYME CUTOFF=9.0 AUX LET GRAD
> >>>>>>>>> QMMM GEO-OK"
> >>>>>>>>>
> >>>>>>>>> qmExecPath "/galileo/home/userexternal/fpietra0/mopac/MOPAC2016.exe"
> >>>>>>>>>
> >>>>>>>>> of course, on the cluster the simulation can't be run on shm
> >>>>>>>>>
> >>>>>>>>> execution line
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2
> >>>>>>>>> namd-01.conf +p# > namd-01.log
> >>>>>>>>>
> >>>>>>>>> where # was either 4, 10, 15, 36
> >>>>>>>>>
> >>>>>>>>> With either 36 or 15 cores: segmentation fault.
> >>>>>>>>>
> >>>>>>>>> With either 4 or 10 cores, execution of the 20,000 steps of
> >>>>>>>>> Example 1 took nearly two hours. From the .ou file in folder /0,
> >>>>>>>>> the execution took 0.18 seconds.
> >>>>>>>>>
> >>>>>>>>> My question is: what is wrong in my setup that could explain
> >>>>>>>>> such disappointing performance?
> >>>>>>>>>
> >>>>>>>>> Thanks for advice
> >>>>>>>>>
> >>>>>>>>> francesco pietra
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

This archive was generated by hypermail 2.1.6 : Sat Dec 07 2019 - 23:20:22 CST