From: René Hafner TUK (hamburge_at_physik.uni-kl.de)
Date: Sat Aug 07 2021 - 17:44:01 CDT
Yes, I noticed that icc was used in the precompiled one, but did not
think it would make such a big difference whether it's gcc or icc...
But indeed it does, very much!
Using the intel/2019 compiler
and comparing performance on hardware like:
* 12 cores (of a Xeon SP 6126, the best available on the cluster) + 1x V100 GPU
Now I even get a slightly higher speed than the
precompiled one (compared without colvars)! :)
With colvars I now get:
* on 12 cores + V100: ~350 ns/day
* on 6 cores + V100: ~250 ns/day
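Giacomo's suggestion below about tuning the number of CPU threads per GPU can be scanned quickly. A minimal sketch, assuming a local multicore-CUDA `namd2` binary and a config file (both are placeholder names); it only prints the candidate launch commands, so remove the `echo` to actually run them:

```shell
# Sketch: scan CPU thread counts per GPU to find the sweet spot.
# NAMD and CONF are hypothetical placeholders for your binary and config.
NAMD=./namd2          # assumed path to the multicore-CUDA binary
CONF=membrane.namd    # assumed simulation config
for p in 4 6 8 12; do
  echo "$NAMD +p$p +devices 0 $CONF"   # remove echo to run the benchmark
done
```

Timing each run (e.g. comparing the "Benchmark time" lines in the logs) then shows where CPU-GPU communication stops paying off for a small system.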
And with the Intel build, CUDA 11.3 vs. CUDA 10.1 still makes no difference.
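For comparison, the Intel build recipe mirrors the gcc recipe quoted below, swapping the compiler in both the Charm++ and NAMD steps. This is a sketch only: the module names (intel/2019, nvidia/10.1) are specific to my cluster and will differ elsewhere.

```shell
# Sketch of the Intel-compiler build; module names are cluster-specific assumptions
module purge
module load intel/2019
./build charm++ multicore-linux-x86_64 icc -j16 --with-production
module load nvidia/10.1
./config Linux-x86_64-icc --charm-arch multicore-linux-x86_64-icc \
    --with-tcl --with-python --with-fftw --with-cuda
cd Linux-x86_64-icc
make -j12
```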
On 8/7/2021 10:40 PM, Giacomo Fiorin wrote:
> Hello René, if Colvars is not active in both runs (with pre-compiled
> and self-compiled) it is very unlikely that changes in its source
> files can impact performance. You could probably confirm this by
> using your own build of an /unmodified/ 2.14 source tree, which I
> would expect to behave the same way.
> One thing to note is that most pre-compiled builds of NAMD are built
> with the Intel compiler, see e.g. the following when you launch the
> "Linux-x86_64-multicore-CUDA" build:
> Info: Based on Charm++/Converse 61002 for
> Info: Built Mon Aug 24 10:10:58 CDT 2020 by jim on belfast.ks.uiuc.edu
> Jim Phillips, or one of the other core maintainers at UIUC, may be able
> to comment further about what you could do on your end to reproduce
> the optimizations of the pre-compiled build in your own build.
> In general, I would also look into changing the number of CPU threads
> associated with the GPU in 2.x NAMD. You have fairly good performance to begin
> with, consistent with such a small system. The CPU-GPU communication
> step is one of the main factors limiting simulation speed, and this is
> definitely affected by how many CPU threads communicate with the GPU.
> For such a small system, fewer CPU threads per GPU would be more
> appropriate. (Note that this is valid for 2.x, NAMD 3.0 is entirely
> On Sat, Aug 7, 2021 at 3:18 PM René Hafner TUK
> <hamburge_at_physik.uni-kl.de <mailto:hamburge_at_physik.uni-kl.de>> wrote:
> Dear NAMD maintainers,
> I tried implementing a new colvar (which was successful) but
> wondered about speed reduction by it.
> However, I compared plain MD simulations run with my self-compiled
> version (without colvars) against the precompiled binary
> from the website.
> The only change in the code is in the Colvars module files,
> which are not active for the following comparison.
> I obtain the following speed of simulations for a single
> standard cutoff etc. simulation (membrane + water, 7k atoms)
> Precompiled: 300 ns/day (4 fs timestep, HMR)
> Self-compiled: 162 ns/day (4 fs timestep, HMR)
> This is not CUDA-version dependent, as this result is stable with
> both CUDA 11.3 and CUDA 10.1 (the latter was
> used for the precompiled binary).
> Any help is appreciated.
> Kind regards
> I compiled it with the following settings:
> # building charm++
> module purge
> module load gcc/8.4
> ./build charm++ multicore-linux-x86_64 gcc -j16 --with-production
> module purge
> module load gcc/8.4
> module load nvidia/10.1
> ./config Linux-x86_64-g++ --charm-arch multicore-linux-x86_64-gcc
> --with-tcl --with-python --with-fftw --with-cuda --arch-suffix
> cd Linux-x86_64-g++
> # append the line CXXOPTS=-lstdc++ -std=c++11 to Make.config
> ## if no CXXOPTS are defined (as e.g. --with-debug would set), the build will not work
> echo "CXXOPTS=-lstdc++ -std=c++11" >> Make.config
> echo "showing Make.config"
> cat Make.config
> # then run it
> make -j 12 | tee
> Dipl.-Phys. René Hafner
> TU Kaiserslautern
--
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany
This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST