Re: NAMD2.9 single-node benchmarks, 0-2 Kepler GPU's.

From: Aron Broom (broomsday_at_gmail.com)
Date: Tue Apr 30 2013 - 23:59:32 CDT

Thanks for sharing the benchmarks!

As general remarks/answers to your questions:

1) NAMD is currently limited considerably by bandwidth to the GPU, so you
might see some big improvements if you get PCIe 3.0 working. This is not
the case for AMBER.

2) I wouldn't do any hyper-threading. I've never seen it help and have
often seen it hurt, which is not shocking given the intensity of the task.
Certainly if you want better performance more CPU cores will help,
particularly if you get PCIe 3.0 working.

3) NAMD can run using AMBER inputs. There is a section in the manual about
how to get equivalent behavior if you really want to compare.

4) For implicit solvent or small explicit systems ( < 10k atoms ) AMBER
will dominate NAMD for performance. By the time your systems are > 100k
the difference will be less extreme.

5) The GPU-enhanced AMBER is missing a lot of the AMBER functionality. You
should make sure it can actually do what you want, as missing features
could be years in coming. NAMD does not have this problem.

6) I think currently for all the MD codes out there if you have multiple
GPUs the best use is to run multiple simulations.

7) If you want performance and function, and have even casual programming
abilities, check out OpenMM.
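
As a rough illustration of point 6, here is a minimal Python sketch that
builds one NAMD command line per simulation, each pinned to its own GPU. It
assumes a CUDA build of NAMD 2.9 started as "namd2" and the +p, +idlepoll
and +devices options as described in the NAMD release notes; the config
file names and the helper name namd_commands are hypothetical. It only
prints the commands, so check them against your own install before
launching anything.

```python
def namd_commands(configs, threads_per_run=2, binary="namd2"):
    """Build one NAMD command line per config file, one GPU per run.

    Assumption: GPU indices 0, 1, ... match the order of `configs`.
    """
    commands = []
    for gpu_index, config in enumerate(configs):
        commands.append([
            binary,
            f"+p{threads_per_run}",      # worker threads for this run
            "+idlepoll",                 # poll the GPU instead of sleeping
            "+devices", str(gpu_index),  # restrict this run to one GPU
            config,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical config names; two independent runs on a 2-GPU box.
    for cmd in namd_commands(["run_a.conf", "run_b.conf"]):
        print(" ".join(cmd))
```

On a 4-core machine with two GPUs, two threads per run keeps each
simulation on its own pair of physical cores rather than relying on
hyper-threading.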

On Tue, Apr 30, 2013 at 10:17 PM, Aaron Cann <aaron_at_canncentral.net> wrote:

> Hello all, I thought I'd write up some of my experiences setting up a
> basic NAMD 2.9 GPU workstation. Lots of benchmarks, and some conclusions
> and a few questions for the illuminati.
>
> SETUP: System is an Intel LGA 2011 system with two Nvidia GTX 670 4 GB
> cards in the x16 slots. They're running at PCIe 2.0 thanks to the strange
> snafus with Sandy Bridge-E CPUs at 3.0 speeds. CPU is a 4-core
> hyperthreaded i7, 3.6 GHz. Running Ubuntu 13.04, NAMD 2.9, either with or
> without CUDA, 64-bit, latest NVIDIA drivers. Displays were hanging off the
> GPUs, not doing anything during the runs. Switching to console mode didn't
> change anything. Deliberately loading the GPU with a large VMD rotation
> slowed runs down.
>
> Note that I cite thread numbers: up to 8 threads on 4 cores. More than 4
> threads means fake extra CPUs (hyperthreads).
>
> Standard NAMD benchmarks except outputEnergies=600. DHFR was adapted from
> the AMBER benchmark by Charles Brooks, 2 fs timestep.
>
> STMV benchmark.
>
> ns/day. T = number of threads (may be 2x the number of cores).
>
> T    1GPU    2GPU
> 1    0.099   0.100
> 2    0.151   0.193
> 3    0.175   0.220
> 4    0.182   0.282
> 8    0.186   0.282
>
> Thoughts:
>
> STMV is a large dataset. 2 threads gets 94% of the horsepower out of 1
> GPU, and moving from 2T/1G to 4T/2G gives pretty good scaling with this
> dataset (94% of doubled output). This dataset looks largely GPU bound,
> although a six-core CPU would still have been slightly better. Adding a
> 3rd GPU here (on the existing four-core CPU) would be an inefficient use
> of the 3rd GPU.
>
> APOA1 benchmark, ns/day.
>
> T    0GPU    1GPU    2GPU
> 1    0.12    1.10    1.10
> 2    0.24    1.94    2.18
> 3    0.33    2.21    2.79
> 4    0.30    2.31    3.10
> 8    0.31    2.28    3.70
>
> Moving from 1 thread on 1 GPU to 2T/2G again has excellent scaling, 99%
> of doubled output, although most of the increase was from the second core,
> not the GPU. Getting to 96% of peak output of 1 GPU required 3 threads,
> not two. Moving from 2 threads/1 GPU to 4/2 gave only 80% of doubled
> output, suggesting communication was becoming an issue instead of GPU
> horsepower.
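
The scaling percentages discussed in the quoted message can be recomputed
from the benchmark tables with a short script. This is a minimal sketch:
the numbers are transcribed from the tables as posted, and the helper names
doubling_efficiency and fraction_of_peak are made up for this example.
"Doubling efficiency" here means the throughput after doubling both threads
and GPUs, divided by twice the starting throughput.

```python
# ns/day indexed as table[threads][gpus], transcribed from the posted tables.
STMV = {
    1: {1: 0.099, 2: 0.100},
    2: {1: 0.151, 2: 0.193},
    3: {1: 0.175, 2: 0.220},
    4: {1: 0.182, 2: 0.282},
    8: {1: 0.186, 2: 0.282},
}
APOA1 = {
    1: {0: 0.12, 1: 1.10, 2: 1.10},
    2: {0: 0.24, 1: 1.94, 2: 2.18},
    3: {0: 0.33, 1: 2.21, 2: 2.79},
    4: {0: 0.30, 1: 2.31, 2: 3.10},
    8: {0: 0.31, 1: 2.28, 2: 3.70},
}


def doubling_efficiency(table, threads):
    """(2*threads, 2 GPUs) throughput as a fraction of 2x (threads, 1 GPU)."""
    return table[2 * threads][2] / (2.0 * table[threads][1])


def fraction_of_peak(table, threads, gpus=1):
    """Throughput at `threads` relative to the best row for that GPU count."""
    peak = max(row[gpus] for row in table.values())
    return table[threads][gpus] / peak


if __name__ == "__main__":
    for name, table in (("STMV", STMV), ("APOA1", APOA1)):
        for t in (1, 2):
            eff = doubling_efficiency(table, t)
            print(f"{name}: {t}T/1G -> {2 * t}T/2G = {eff:.0%} of doubled")
        for t in (2, 3):
            frac = fraction_of_peak(table, t)
            print(f"{name}: {t} threads on 1 GPU = {frac:.0%} of 1-GPU peak")
```

Keeping the tables as nested dictionaries makes it easy to add a row for a
six-core CPU or a column for a third GPU and rerun the same checks.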

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:11 CST