From: John Stone (johns_at_ks.uiuc.edu)
Date: Fri Sep 11 2015 - 11:19:34 CDT

Simon,
  Thanks for the followup email. I'm very happy to hear that you were
able to get ILS running well on the K20!

Cheers,
  John Stone

On Fri, Sep 11, 2015 at 10:15:12AM -0400, Simon Drr wrote:
> Just a followup for those who also run into this error.
>
> Use a bigger GPU!
> With an NVIDIA Tesla GPU (in my case a K20 with 4.7 GB RAM), the
> calculation takes only ~5 s per frame and runs with CUDA.
>
> Cheers,
> Simon
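
To put the per-frame times reported in this thread in perspective, here is a rough Python sketch that projects them onto the full 10000-frame run. It assumes the cost per frame stays constant over the trajectory, which the thread does not confirm, and the ~20 s and ~5 s figures are the approximate values quoted in the emails. The function name `projected_hours` is just for illustration.

```python
# Per-frame times reported in this thread, in seconds.
FRAME_TIMES_S = {
    "CPU fallback (GTX 580 node, subres > 1)": 632.988717,  # from the log
    "CUDA on GTX 580 (subres 1)": 20.0,                     # approximate, "~20sec"
    "CUDA on Tesla K20": 5.0,                               # approximate, "~5sec"
}
N_FRAMES = 10_000  # from "ILS frame 1/10000" in the log


def projected_hours(seconds_per_frame, n_frames=N_FRAMES):
    """Projected wall-clock hours, ASSUMING every frame costs the same."""
    return seconds_per_frame * n_frames / 3600.0


for label, seconds in FRAME_TIMES_S.items():
    hours = projected_hours(seconds)
    print(f"{label}: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under that assumption the CPU fallback would need roughly 73 days for the whole trajectory, against about 14 hours at ~5 s per frame.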
>
> 2015-09-02 11:03 GMT-04:00 John Stone <johns_at_ks.uiuc.edu>:
> > Hi,
> > The current ILS code was written back in 2009.
> > The ILS GPU kernels have several hard limits that arise
> > from the size of some performance-critical data structures
> > that are put into tiny areas of very fast on-chip memory.
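
The general pattern behind such a hard limit can be sketched in a few lines of Python. This is not VMD's code: the capacity value, the function names, and the toy computation are all hypothetical, and the sketch only illustrates "use the fast path while the table fits in a fixed-size buffer, otherwise fall back".

```python
# Hypothetical capacity; NOT the real value of VMD's MAX_BINOFFSETS.
MAX_TABLE_ENTRIES = 1024


class CapacityExceeded(Exception):
    """Raised when the lookup table does not fit in the fixed-size buffer."""


def fast_path(offsets):
    # Stand-in for a GPU kernel whose table must fit in a small,
    # fixed-size on-chip buffer.
    if len(offsets) > MAX_TABLE_ENTRIES:
        raise CapacityExceeded(
            f"{len(offsets)} entries exceed limit of {MAX_TABLE_ENTRIES}"
        )
    return sum(offsets)


def slow_path(offsets):
    # Stand-in for the CPU code path: no size limit, but slower in practice.
    total = 0
    for value in offsets:
        total += value
    return total


def evaluate(offsets):
    """Try the size-limited fast path first, fall back if the table is too big."""
    try:
        return fast_path(offsets), "fast"
    except CapacityExceeded:
        return slow_path(offsets), "fallback"


print(evaluate(list(range(100))))    # table fits
print(evaluate(list(range(5000))))   # table too large, falls back
```

Both paths return the same result; only the route taken differs, which matches the behaviour in the log, where the run continues on the CPU after the CUDA kernel rejects the problem size.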
> >
> > On the most recent GPUs (e.g. GeForce 980s and later) it would
> > likely be possible to relax some of the ILS hard-limits because
> > the new hardware architectures have much greater flexibility in
> > terms of caching read-only data. This would require writing
> > new GPU kernels and exploring the design trade-offs in current
> > hardware.
> >
> > With significant work, the ILS algorithms could be parallelized
> > much more on both CPU and GPU (thereby allowing multi-GPU runs),
> > but this would require a big time investment.
> >
> > ILS isn't currently a VMD development focus because the postdoc
> > who was developing the ILS code left for industry, and there
> > hasn't been any new ILS development work here since then.
> >
> > I would suggest using the ILS tools as they are now and seeing
> > whether they can work for you. If more people use these features
> > of VMD, there will be more motivation to revisit the performance
> > of the currently implemented algorithms at some later time.
> >
> > Cheers,
> > John Stone
> > vmd_at_ks.uiuc.edu
> >
> >
> > On Wed, Sep 02, 2015 at 10:42:03AM -0400, Simon Drr wrote:
> >> Hi all,
> >>
> >> I'm trying to run an ILS calculation on a GPU cluster.
> >> The stats of the system I'm using:
> >> - Scientific Linux 6.1
> >> - 24 GB RAM
> >> - 8 CPUs
> >> - 7 GPUs (NVIDIA GeForce GTX 580 with 1.5 GB RAM)
> >>
> >> I use VMD 1.9.1 (OpenGL/CUDA enabled) and CUDA Driver v6.5
> >>
> >> My system has 60,000 atoms and is a 10 ns equilibration from NAMD 2.9.
> >> The DCD contains one frame per ps.
> >>
> >> When I try to run ILS with oxygen and a subres greater than 1, I
> >> cannot use CUDA to accelerate the calculation (".....max_binoffsets
> >> exceeded, using CPU....").
> >> Also, it seems I'm using only one GPU, not all 7 available ones. VMD
> >> detects them all, and when I set subres to 1 the calculation uses
> >> CUDA, but only on GPU [0] (frame time ~20 s).
> >>
> >> My questions:
> >> Is it possible to use all GPUs for the calculation?
> >> Is it possible that the memory of the GPUs is not sufficient to
> >> accelerate this calculation with CUDA when subres >= 2 ?
> >> Is there any way to circumvent this?
> >>
> >>
> >> See parts of the log below
> >>
> >> Cheers,
> >> Simon
> >>
> >> Info) Multithreading available, 8 CPUs detected.
> >> Info) Free system memory: 21775MB (90%)
> >> Info) Creating CUDA device pool and initializing hardware...
> >> Info) Detected 7 available CUDA accelerators:
> >> Info) [0] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
> >> Info) [1] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
> >> Info) [2] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
> >> Info) [3] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
> >> Info) [4] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
> >> Info) [5] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
> >> Info) [6] GeForce GTX 580 16 SM_2.0 @ 1.59 GHz, 1.5GB RAM, OIO, ZCP
> >>
> >> Info) ILS frame 1/10000
> >> Info) Coord setup: 0.088089 s
> >> Info) Aligning frames.
> >> Info) ComputeOccupancyMap_setup() 0.016058 s
> >> Using CUDA device: 0
> >> ***** ERROR: Exceeded MAX_BINOFFSETS for CUDA kernel
> >> Info) vmd_cuda_evaluate_occupancy_map() FAILED, using CPU for calculation
> >> Info) ComputeOccupancyMap: find_distance_exclusions() 0.232957 s
> >> Info) ComputeOccupancyMap: find_energy_exclusions() 329.343837 s -> 6690550 exclusions
> >> Info) ComputeOccupancyMap: compute_occupancy_multiatom() 303.300753 s
> >> Info) ComputeOccupancyMap_calculate_slab() 632.882040 s
> >> Info) Total frame time = 632.988717 s
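
For anyone comparing runs, a small Python sketch like the following can pull the stage timings out of a log in this format and flag whether VMD fell back to the CPU. The regular expressions are written only against the lines quoted in this post, so other VMD versions may print something different. The name `summarize` is made up for this example.

```python
import re

# Lines copied from the log in this post, with the quote prefixes removed
# and the wrapped find_energy_exclusions() line joined back together.
SAMPLE_LOG = """\
Info) ComputeOccupancyMap_setup() 0.016058 s
Using CUDA device: 0
***** ERROR: Exceeded MAX_BINOFFSETS for CUDA kernel
Info) vmd_cuda_evaluate_occupancy_map() FAILED, using CPU for calculation
Info) ComputeOccupancyMap: find_distance_exclusions() 0.232957 s
Info) ComputeOccupancyMap: find_energy_exclusions() 329.343837 s -> 6690550 exclusions
Info) ComputeOccupancyMap: compute_occupancy_multiatom() 303.300753 s
Info) ComputeOccupancyMap_calculate_slab() 632.882040 s
Info) Total frame time = 632.988717 s
"""

# "<name>() <seconds> s" timing lines, and the per-frame total.
TIMING_RE = re.compile(r"(\w+)\(\)\s+([0-9.]+) s")
TOTAL_RE = re.compile(r"Total frame time = ([0-9.]+) s")


def summarize(log_text):
    """Collect stage timings, the frame total, and whether CUDA fell back to CPU."""
    timings = {}
    total = None
    cuda_fallback = False
    for line in log_text.splitlines():
        if "FAILED, using CPU" in line:
            cuda_fallback = True
        match = TIMING_RE.search(line)
        if match:
            timings[match.group(1)] = float(match.group(2))
        match = TOTAL_RE.search(line)
        if match:
            total = float(match.group(1))
    return {"cuda_fallback": cuda_fallback, "timings": timings, "total": total}


result = summarize(SAMPLE_LOG)
print("CUDA fallback to CPU:", result["cuda_fallback"])
for name, seconds in result["timings"].items():
    print(f"{name}: {seconds:.3f} s ({100 * seconds / result['total']:.1f}% of frame)")
```

On this log, find_energy_exclusions() and compute_occupancy_multiatom() together account for over 99% of the frame time.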
> >
> > --
> > NIH Center for Macromolecular Modeling and Bioinformatics
> > Beckman Institute for Advanced Science and Technology
> > University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> > http://www.ks.uiuc.edu/~johns/ Phone: 217-244-3349
> > http://www.ks.uiuc.edu/Research/vmd/