From: Mitchell Gleed (aliigleed16_at_gmail.com)
Date: Sun Jul 12 2015 - 22:36:54 CDT
Sorry for the late reply; I've been using the university
supercomputer's GPU nodes for other simulations the past week and couldn't
test this out until those simulations finished.
Since the GPU nodes have 24 cores, I followed your suggestion to do 4
replicas with 8 processes since I can't do 16 replicas with 32 processes.
With this setup, I started getting the error "CUDA-enabled NAMD requires at
least one patch per thread" for the namd/lib/replica/example test case.
I thought maybe the error meant I could only use CUDA-enabled NAMD with a
PME system, so I decided to make a test case for a PME system, adapting the
lib/replica/umbrella-2d case. I'm now able to get the GPUs to accelerate
the replica exchange simulations, even at 1 replica : 1 GPU : 1 process.
However, I've found the GPUs only help if there's one GPU per replica; when
#replicas > #GPUs, simulations run slower with the GPUs than without. I
assume that might just be the way things will have to be, but if there's
anything else I can try in order to get my ideal case of
16 replicas : 16 procs : 4 GPUs to benefit from the GPUs, that'd be great.
Here are the benchmark results for the ~30k atom system I tested:

replicas  procs  gpus  days/ns
4         4      0     1.61468
4         4      4     0.669901
4         8      0     1.11726
4         8      4     0.445677
4         16     0     1.03864
16        16     0     1.87094
16        16     4     2.52038
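
For reference, the GPU speedup implied by these figures can be worked out with a short Python sketch. It is not part of the NAMD setup; the dictionary layout and the function name gpu_speedup are illustrative assumptions, and the numbers are copied from the benchmark results above.

```python
# Benchmark figures from this post: (replicas, procs, gpus) -> days/ns.
# Lower days/ns means a faster simulation.
benchmarks = {
    (4, 4, 0): 1.61468,
    (4, 4, 4): 0.669901,
    (4, 8, 0): 1.11726,
    (4, 8, 4): 0.445677,
    (4, 16, 0): 1.03864,
    (16, 16, 0): 1.87094,
    (16, 16, 4): 2.52038,
}


def gpu_speedup(replicas, procs, gpus):
    """CPU-only days/ns divided by GPU days/ns for the same replica/process count.

    A value above 1 means the GPUs helped; below 1 means they slowed the run.
    """
    cpu_only = benchmarks[(replicas, procs, 0)]
    with_gpu = benchmarks[(replicas, procs, gpus)]
    return cpu_only / with_gpu


if __name__ == "__main__":
    for replicas, procs, gpus in [(4, 4, 4), (4, 8, 4), (16, 16, 4)]:
        ratio = gpu_speedup(replicas, procs, gpus)
        print(f"{replicas} replicas, {procs} procs, {gpus} gpus: {ratio:.2f}x")
```

With one GPU per replica the ratio comes out at roughly 2.4x to 2.5x, while the 16 replicas : 16 procs : 4 GPUs case gives a ratio of about 0.74x, i.e. a slowdown.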
Thanks for your help, Norman.