Diagnosing CUDA (non)performance

tl;dr: When I run the f1atpase benchmark on a machine with a GTX 1080, I get a speedup of roughly 7x to 8x with the CUDA version of NAMD 2.11. When I run my own system on the same machine, I get almost no speedup. I’m trying to figure out why.

I’m running NAMD 2.11 on a Linux box with a 16-core processor and a GTX 1080 GPU. When I run the f1atpase benchmark, I get the following results:

CUDA: 2.3 ns/day
Non-CUDA: 0.3 ns/day

The speedup is approximately 7.7x (2.3 / 0.3).

I’m using this machine to simulate water flowing through a carbon nanotube. It’s a fairly small system: a few thousand carbon atoms and a few hundred waters. When I run my system on the same machine, I get:

CUDA: 39 ns/day
Non-CUDA: 36 ns/day

That is a speedup of only 1.08x. I’m wondering why this is, and whether there’s anything I can do about it, or whether it’s just down to the particulars of my system and how NAMD allocates resources. Any insight would be greatly appreciated. I’ve included my .conf file below.
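
For reference, the two speedup ratios can be reproduced from the rounded ns/day figures quoted above. This is only a small Python sketch; the function name speedup is made up for illustration and is not part of NAMD.

```python
def speedup(cuda_ns_per_day: float, cpu_ns_per_day: float) -> float:
    """Ratio of CUDA throughput to CPU-only throughput (both in ns/day)."""
    if cpu_ns_per_day <= 0:
        raise ValueError("CPU throughput must be positive")
    return cuda_ns_per_day / cpu_ns_per_day


# Rounded throughputs from the post, in ns/day.
f1atpase = speedup(2.3, 0.3)
nanotube = speedup(39.0, 36.0)

print(f"f1atpase benchmark: {f1atpase:.2f}x")
print(f"nanotube system:    {nanotube:.2f}x")
```

With the rounded inputs this gives about 7.67x for the benchmark and 1.08x for the nanotube system.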


## Start of namd configuration file

# Limit the length of the log file

outputEnergies 100000

# Set up periodic boundary conditions

cellBasisVector1 50.000 0.000 0.000
cellBasisVector2 0.000 50.000 0.000
cellBasisVector3 0.000 0.000 491.200
cellOrigin 0 0 0

wrapWater off
wrapAll off

# Set the input files

structure Config.psf
coordinates Config.pdb

# Set the force field parameters

paraTypeCharmm on
parameters par_all27_prot_lipid.prm
exclude scaled1-4
cutoff 12.0
pairlistdist 14.0
switching on
switchdist 10.0
PME yes
PMEGridSpacing 1.0
FFTWWisdomFile FFTW_NAMD_2.11_nanotube.txt

# Set the integration parameters

timestep 1
nonbondedFreq 2
fullElectFrequency 4
stepspercycle 20
rigidBonds water

restartfreq 1000
dcdfreq 1000
veldcdfreq 1000
forcedcdfreq 1000

# Set the restraints on the carbon

constraints on
consref Config-restraint.pdb
conskfile Config-restraint.pdb
conskcol O

# Set the external forcing

constantForce yes
consForceScaling 0.014392621
consForceFile Config-forcing.pdb

# Set the Langevin thermostat

langevin on
langevinFile Config-langevin.pdb
langevinCol O
langevinTemp 300

# Set the execution parameters

outputname Data
bincoordinates Config.restart.coor
binvelocities Config.restart.vel
run 10000000
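
To put the nanotube numbers in per-step terms, the ns/day figures can be converted to wall-clock time per step using the timestep (1 fs) and run length (10000000 steps) from the configuration above. This is a rough Python sketch; the helper names ms_per_step and run_hours are hypothetical, and it assumes throughput stays constant over the whole run.

```python
SECONDS_PER_DAY = 86_400


def ms_per_step(ns_per_day: float, timestep_fs: float) -> float:
    """Wall-clock milliseconds spent on one integration step."""
    steps_per_day = ns_per_day * 1e6 / timestep_fs  # 1 ns = 1e6 fs
    return SECONDS_PER_DAY / steps_per_day * 1e3


def run_hours(n_steps: int, timestep_fs: float, ns_per_day: float) -> float:
    """Wall-clock hours needed to finish n_steps at a constant throughput."""
    simulated_ns = n_steps * timestep_fs * 1e-6
    return simulated_ns / ns_per_day * 24


timestep_fs = 1.0      # "timestep 1" in the config
n_steps = 10_000_000   # "run 10000000" in the config

# Throughputs reported for the nanotube system, in ns/day.
for label, rate in (("CUDA", 39.0), ("Non-CUDA", 36.0)):
    print(f"{label}: {ms_per_step(rate, timestep_fs):.2f} ms/step, "
          f"{run_hours(n_steps, timestep_fs, rate):.2f} h for the full run")
```

Both builds come out at a little over 2 ms per step, which is a convenient number to compare against the per-step timings NAMD prints in its log.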

