**From:** Francesco Pietra (*chiendarret_at_gmail.com*)

**Date:** Fri Feb 01 2019 - 09:24:44 CST

**Next message:**Francesco Pietra: "Fwd: Tuning QM-MM with namd-orca on one cluster node"**Previous message:**Francesco Pietra: "Re: Fwd: Tuning QM-MM with namd-orca on one cluster node"**In reply to:**Marcelo C. R. Melo: "Re: Tuning QM-MM with namd-orca on one cluster node"**Next in thread:**Francesco Pietra: "Fwd: Tuning QM-MM with namd-orca on one cluster node"**Reply:**Francesco Pietra: "Fwd: Tuning QM-MM with namd-orca on one cluster node"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Hi Marcelo

Now that the !PAL directive has been fixed, I looked (unsuccessfully) for a way to implement the %pal block

!BP86 ...

%pal

nprocs 34

end

in place of the !PAL keyword in

qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL8 SlowConv"

qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"

in order to go beyond the PAL8 limit and exploit all the cores of the node (while also allowing a multinode setup).
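One way the %pal block might be passed is as an additional qmConfigLine. This rests on an assumption that is not confirmed in this thread: that NAMD copies each qmConfigLine into the ORCA input and reduces "%%" to "%", as the %%output line above suggests. The Python sketch below only builds the two configuration lines. `orca_config_lines` is a hypothetical helper, not part of NAMD or ORCA, and the keyword is spelled `nprocs`, as in the ORCA manual.

```python
def orca_config_lines(nprocs, method="! UKS BP86 RI SV def2/J enGrad SlowConv"):
    """Return NAMD qmConfigLine entries that ask ORCA for `nprocs` processes.

    Assumption: NAMD writes each qmConfigLine into the ORCA input and turns
    "%%" into "%", so the %pal block carries a doubled percent sign here.
    The PALn keyword is left out of the "!" line because %pal replaces it.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    return [
        f'qmConfigLine "{method}"',
        f'qmConfigLine "%%pal nprocs {nprocs} end"',
    ]


if __name__ == "__main__":
    # Print the lines so they can be pasted into the NAMD configuration file.
    for line in orca_config_lines(34):
        print(line)
```

The %%output line would be kept as a third qmConfigLine, unchanged.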

thanks for advice

francesco

PS: As far as I understand, on "my" cluster, requesting nearly all of the memory (as I did) gives me exclusive use of all resources of the allocated node.
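The "not enough slots" error quoted below is consistent with a job script that requests a single task (--ntasks=1). A plausible reading, offered as an assumption rather than a confirmed diagnosis, is that Open MPI inside a SLURM allocation takes its slot count from the number of tasks and not from --cpus-per-task. The Python sketch below shows the two checks involved: formatting an Open MPI style hostfile and comparing a PAL request with `SLURM_NTASKS`. Both function names are hypothetical.

```python
import os


def hostfile_text(hostnames, slots_per_host):
    """Format an Open MPI style hostfile: one "<host> slots=<n>" line per node."""
    if slots_per_host < 1:
        raise ValueError("slots_per_host must be positive")
    return "".join(f"{host} slots={slots_per_host}\n" for host in hostnames)


def enough_slots(requested, env=None):
    """Return True if `requested` processes fit within SLURM_NTASKS.

    Assumption: the MPI launcher treats the SLURM task count as its slot
    count. A missing SLURM_NTASKS is treated as a single slot.
    """
    env = os.environ if env is None else env
    return requested <= int(env.get("SLURM_NTASKS", "1"))


if __name__ == "__main__":
    print(hostfile_text(["node01"], 36), end="")
    print(enough_slots(4, {"SLURM_NTASKS": "1"}))
```

On a SLURM cluster the node names would typically come from `scontrol show hostnames "$SLURM_JOB_NODELIST"`. How the hostfile is then handed to ORCA depends on the local installation and is not settled in this thread.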

On Thu, Jan 31, 2019 at 11:42 PM Marcelo C. R. Melo <melomcr_at_gmail.com> wrote:

> They are referring to tasks, not nodes. One could request 8 tasks in a
> 4-core multi-threaded system, for example. Makes sense? (Though that would
> not be advisable in your case).
>
> As I mentioned in my previous e-mail, you should check the commands that
> control how ORCA distributes its computations in a cluster, as you may need
> to provide a "hostfile" indicating the name(s) of the node(s) where ORCA
> will find available processors. This is something every cluster makes
> available when the queuing system reserves nodes for a job, so you should
> find out how to access that in your cluster.
>
> I imagine the cluster's MPI system is not making the tasks available when
> ORCA calls MPI.
> And yes, ORCA will use MPI even to parallelize within a single node.
>
> Best
> ---
> Marcelo Cardoso dos Reis Melo, PhD
> Postdoctoral Research Associate
> Luthey-Schulten Group
> University of Illinois at Urbana-Champaign
> crdsdsr2_at_illinois.edu
> +1 (217) 244-5983
>

> On Thu, 31 Jan 2019 at 15:30, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>
>> Are PAL4 and PAL8 expecting four or eight nodes, respectively, rather
>> than cores?
>>
>> ---------- Forwarded message ---------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Thu, Jan 31, 2019 at 10:22 PM
>> Subject: Re: namd-l: Tuning QM-MM with namd-orca on one cluster node
>> To: Marcelo C. R. Melo <melomcr_at_gmail.com>
>> Cc: NAMD <namd-l_at_ks.uiuc.edu>
>>
>> Hi Marcelo:
>> First, thanks.
>> I moved away from MOPAC as I could not obtain SCF convergence, which was
>> not unexpected because of the two iron ions. ORCA reached single-point
>> convergence in two runs of 125 iterations each (I was unable to set a flag
>> for more iterations: "maxiter #" on the qmConfigLine was not accepted, and a
>> perusal of the manual did not help me). I used ORCA extensively years ago
>> for CD simulation (excited states), but never since.
>> As to the size of the system, I am a biochemist, therefore interested in
>> real systems (which is no justification, I admit). Anyway, I used a very
>> sloppy DFT setup and convergence in the hope that it is still more
>> appropriate than a semiempirical method for my system.
>>
>> I must correct my previous post, as I failed to notice the line
>>
>>> Charm++> cpu affinity enabled.
>>
>> In new runs, described below, the affinity info was complete in namd.log:
>>
>>> Charm++> cpu affinity enabled.
>>> [1] pthread affinity is: 1
>>> [3] pthread affinity is: 3
>>> [4] pthread affinity is: 4
>>> [2] pthread affinity is: 2
>>> [0] pthread affinity is: 0
>>
>> I ran into trouble with PAL# before and then (badly) forgot to reactivate
>> it but, in my hands, the trouble remains. I.e., with either PAL8 or PAL4,
>> the error, reported in /0/*TmpOut, was
>>
>>> There are not enough slots available in the system to satisfy the 4 slots
>>> that were requested by the application:
>>> /cineca/prod/opt/applications/orca/4.0.1/binary/bin/orca_gtoint_mpi
>>>
>>> Either request fewer slots for your application, or make more slots available
>>> for use.
>>
>> Settings were
>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad PAL4 SlowConv" (or PAL8)
>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"
>>
>> #SBATCH --nodes=1
>> #SBATCH --ntasks=1
>> #SBATCH --cpus-per-task=36
>> #SBATCH --time=00:30:00
>> module load profile/archive
>> module load autoload openmpi/2.1.1--gnu--6.1.0
>> (without loading MPI, the system complains that mpirun is unavailable and
>> crashes. I must admit to being confused about that, because for a single
>> node MPI should not be needed)
>>
>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p5 +setcpuaffinity +showcpuaffinity > namd-01.log
>>
>> It seems that my settings are not providing enough hardware to ORCA
>> despite the full node of 36 cores.
>>
>> Thanks for advice
>>
>> francesco
>>

>> On Thu, Jan 31, 2019 at 8:08 PM Marcelo C. R. Melo <melomcr_at_gmail.com> wrote:
>>
>>> Hi Francesco,
>>>
>>> The first line in your namd.log says
>>> "Info: Running on 5 processors, 1 nodes, 1 physical nodes."
>>> which indicates NAMD is indeed using the 5 cores you requested with
>>> "+p5". Sometimes "top" will show just one process, but the CPU usage of
>>> the process will show 500%, for example, indicating 5 cores. This happens
>>> in some cluster management systems too.
>>>
>>> As for ORCA, your "qm config line" does not indicate you are requesting
>>> it to use multiple cores, so it most likely is really using just one. You
>>> should be using the keyword "PAL?", where the question mark indicates the
>>> number of requested cores: use "PAL8", for example, to ask for 8 cores.
>>> You should familiarize yourself with the commands that control how ORCA
>>> distributes its computations in a cluster (their manual is very good), as
>>> you may need to provide a "hostfile" indicating the name(s) of the node(s)
>>> where ORCA will find available processors. This is something every cluster
>>> makes available when the queuing system reserves nodes for a job, so you
>>> should find out how to access that in your cluster.
>>>
>>> As a final note, even in parallel, calculating 341 QM atoms (QM system +
>>> link atoms) using DFT will be slow. Really slow. Maybe not 10 hours per
>>> timestep, but you just went from a medium-sized semi-empirical (parallel
>>> MOPAC) calculation to a large DFT one. Even in parallel, MOPAC could take a
>>> couple of seconds per timestep (depending on CPU power). ORCA/DFT will take
>>> much more than that.
>>>
>>> Best,
>>> Marcelo
>>> ---
>>> Marcelo Cardoso dos Reis Melo, PhD
>>> Postdoctoral Research Associate
>>> Luthey-Schulten Group
>>> University of Illinois at Urbana-Champaign
>>> crdsdsr2_at_illinois.edu
>>> +1 (217) 244-5983
>>>

>>> On Thu, 31 Jan 2019 at 12:27, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>>>
>>>> Hello
>>>> Having obtained very good performance of NAMD(nightbuild)-MOPAC on one
>>>> cluster node on my system (large QM part, see below, including two iron
>>>> ions), I am now trying the same with NAMD(nightbuild)-ORCA on the same
>>>> cluster (36 cores across two sockets). So far I have been unable to get
>>>> namd and orca running on more than one core each.
>>>>
>>>> namd.conf
>>>> qmConfigLine "! UKS BP86 RI SV def2/J enGrad SlowConv"
>>>> qmConfigLine "%%output Printlevel Mini Print\[ P_Mulliken \] 1 Print\[P_AtCharges_M\] 1 end"
>>>> (SCF already converged by omitting "enGrad")
>>>>
>>>> namd.job
>>>> #SBATCH --nodes=1
>>>> #SBATCH --ntasks=1
>>>> #SBATCH --cpus-per-task=36
>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p5 +setcpuaffinity + showcpuaffinity > namd-01.log
>>>>
>>>> namd.log
>>>> Info: Running on 5 processors, 1 nodes, 1 physical nodes.
>>>> Info: Number of QM atoms (excluding Dummy atoms): 315
>>>> Info: We found 26 QM-MM bonds.
>>>> Info: Applying user defined multiplicity 1 to QM group ID 1
>>>> Info: 1) Group ID: 1 ; Group size: 315 atoms ; Total PSF charge: -1
>>>> Info: Found user defined charge 1 for QM group ID 1. Will ignore PSF charge.
>>>> Info: MM-QM pair: 180:191 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 208:195 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 243:258 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 273:262 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 296:313 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 324:317 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 358:373 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 394:377 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 704:724 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 742:728 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 756:769 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 799:788 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 820:830 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 864:851 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 1461:1479 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 1511:1500 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 1532:1547 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 1566:1551 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 1933:1946 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 1991:1974 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 2011:2018 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 2050:2037 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 2072:2083 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 2098:2087 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 2139:2154 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> Info: MM-QM pair: 2174:2158 -> Value (distance or ratio): 1.09 (QM Group 0 ID 1)
>>>> TCL: Minimizing for 200 steps
>>>> Info: List of ranks running QM simulations: 0.
>>>> Nothing about affinity!! (which was clearly displayed in the MOPAC case)
>>>>
>>>> /0/qmm_0_input.TmpOut shows SCF ITERATIONS
>>>>
>>>> "top" showed a single PR for both namd and orca.
>>>> ___-
>>>> I had already tried a different job setting
>>>> #SBATCH --nodes=1
>>>> #SBATCH --ntasks-per-node=4
>>>> #SBATCH --ntasks-per-socket=2
>>>> module load profile/archive
>>>> module load autoload openmpi/2.1.1--gnu--6.1.0
>>>> /galileo/home/userexternal/fpietra0/NAMD_Git-2018-11-22_Linux-x86_64-multicore/namd2 namd-01.conf +p5 > namd-01.log
>>>>
>>>> Here too, "top" showed a single PR for both namd and orca, so that in
>>>> about 20 hours, namd.log was at "ENERGY 2", indicating that 1400 hrs were
>>>> needed to complete the simulation.
>>>>
>>>> Thanks for advice
>>>> francesco pietra
>>>>


*This archive was generated by hypermail 2.1.6: Tue Dec 31 2019 - 23:20:28 CST*