Re: Diagnosing CUDA (non)performance

From: Brian Radak (bradak_at_anl.gov)
Date: Mon Oct 31 2016 - 13:37:13 CDT

Hi Chris,

I haven't profiled CUDA performance much myself, but what you describe
doesn't sound crazy - a system that small just won't scale very well.
I'd still be happy that you get a 3 ns/day boost. It could be worse:
you could be getting negative scaling!
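
For reference, the speedups implied by the figures quoted below can be checked with a few lines of Python. This is only a sketch: the function name `speedup` is made up for illustration, and the inputs are the rounded ns/day values from the quoted message.

```python
def speedup(cuda_ns_per_day, cpu_ns_per_day):
    """Ratio of CUDA throughput to non-CUDA throughput."""
    return cuda_ns_per_day / cpu_ns_per_day

# Rounded ns/day figures taken from the quoted message.
f1atpase = speedup(2.3, 0.3)
nanotube = speedup(39.0, 36.0)

print(f"f1atpase: {f1atpase:.2f}x")
print(f"nanotube: {nanotube:.2f}x, absolute gain {39.0 - 36.0:.0f} ns/day")
```

The f1atpase ratio from these rounded inputs comes out slightly above the quoted 7.5x, which is well within the rounding of the 0.3 ns/day figure.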

This is just pure conjecture, but using a bigger system might actually
make things faster in the long run. However, you'll be dealing with a
lot of complex algorithmic prefactors, like the difference between
proper Ewald and PME, etc., so it's really hard to say what would happen.
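
As a rough illustration of one such prefactor, the sketch below estimates the smallest PME grid consistent with the cell and `PMEGridSpacing` in the quoted config. It assumes the grid only needs enough points per axis that the spacing does not exceed the requested value; the helper name `min_pme_grid` is hypothetical, and NAMD may choose larger, FFT-friendly sizes, so treat the result as a lower bound rather than what NAMD actually allocates.

```python
import math

def min_pme_grid(cell_lengths, spacing):
    """Smallest grid count per axis such that length / count <= spacing."""
    return tuple(math.ceil(length / spacing) for length in cell_lengths)

# Orthorhombic cell edge lengths (Angstrom) from cellBasisVector1-3 in the
# quoted config, and PMEGridSpacing 1.0.
cell = (50.0, 50.0, 491.2)
grid = min_pme_grid(cell, 1.0)

# Total number of grid points (lower bound under the assumption above).
points = math.prod(grid)
print(grid, points)
```

For a system with only a few thousand atoms, a grid of this size is large relative to the particle count, which is one reason the reciprocal-space part of the cost need not shrink along with the number of atoms.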

Cheers,

Brian

On 10/31/2016 07:48 AM, Chris Goedde wrote:
> Hi all,
>
> tl;dr: When I run the f1atpase benchmark on a machine with a GTX 1080, I get a speedup of about 7.5x with the CUDA version of namd 2.11. When I run my system on the same machine, I get almost no speedup. I’m trying to figure out why.
>
> I’m running namd 2.11 on a linux box with a 16-core processor and a GTX 1080 GPU. When I run the f1atpase benchmarks, I get the following results:
>
> CUDA: 2.3 ns/day
> Non-CUDA: 0.3 ns/day
>
> The speedup is approximately 7.5x.
>
> I’m using this machine to simulate water flowing through a carbon nanotube. It’s a fairly small system, a few thousand carbon atoms and a few hundred waters. When I run my system on the same machine, I get:
>
> CUDA: 39 ns/day
> Non-CUDA: 36 ns/day
>
> For a speedup of 1.08x. I’m wondering why this is, and if there’s anything I can do about it, or if it’s just the particulars of my system and how namd allocates resources. Any insight would be greatly appreciated. I’ve included my .conf file below.
>
> Thanks.
>
> Chris Goedde
>
> ## Start of namd configuration file
>
> # Limit the length of the log file
>
> outputEnergies 100000
>
> # Set up periodic boundary conditions
>
> cellBasisVector1 50.000 0.000 0.000
> cellBasisVector2 0.000 50.000 0.000
> cellBasisVector3 0.000 0.000 491.200
> cellOrigin 0 0 0
>
> wrapWater off
> wrapAll off
>
> # Set the input files
>
> structure Config.psf
> coordinates Config.pdb
>
> # Set the force field parameters
>
> paraTypeCharmm on
> parameters par_all27_prot_lipid.prm
> exclude scaled1-4
> cutoff 12.0
> pairlistdist 14.0
> switching on
> switchdist 10.0
> PME yes
> PMEGridSpacing 1.0
> FFTWWisdomFile FFTW_NAMD_2.11_nanotube.txt
>
> # Set the integration parameters
>
> timestep 1
> nonbondedFreq 2
> fullElectFrequency 4
> stepspercycle 20
> rigidBonds water
>
> restartfreq 1000
> dcdfreq 1000
> veldcdfreq 1000
> forcedcdfreq 1000
>
> # Set the restraints on the carbon
>
> constraints on
> consref Config-restraint.pdb
> conskfile Config-restraint.pdb
> conskcol O
>
> # Set the external forcing
>
> constantForce yes
> consForceScaling 0.014392621
> consForceFile Config-forcing.pdb
>
> # Set the Langevin thermostat
>
> langevin on
> langevinFile Config-langevin.pdb
> langevinCol O
> langevinTemp 300
>
> # Set the execution parameters
>
> outputname Data
> bincoordinates Config.restart.coor
> binvelocities Config.restart.vel
> run 10000000
>
>

-- 
Brian Radak
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240
Argonne, IL 60439-4854
(630) 252-8643
brian.radak_at_anl.gov
