VMD-L Mailing List
From: Martin Aumüller (aumueller_at_uni-koeln.de)
Date: Wed Jan 28 2009 - 06:35:19 CST
- Next message: John Stone: "Re: [patch] potential CUDA acceleration - load-balancing on different CUDA devices"
- Previous message: John Stone: "Re: trans routine"
- Next in thread: John Stone: "Re: [patch] potential CUDA acceleration - load-balancing on different CUDA devices"
- Reply: John Stone: "Re: [patch] potential CUDA acceleration - load-balancing on different CUDA devices"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Hi,
When trying out the CUDA-accelerated potential computation, I ran into a
problem with our hardware configuration: we have a Quadro FX 5800 (240 cores)
and a Quadro NVS 290 (16 cores) in one workstation. I experienced a tremendous
slow-down when using both CUDA devices: the work is distributed evenly across
all CUDA devices, so the slowest device has to do as much work as each of the
other devices and hence determines the total run time.
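A rough estimate of the effect described above, as a sketch only: it assumes that slice throughput scales linearly with core count (a simplification, not a measurement), with N slices in total and a hypothetical per-core rate c in slices per second.

```latex
\begin{align*}
T_{\text{even}}    &= \max\!\left(\frac{N/2}{240\,c},\ \frac{N/2}{16\,c}\right) = \frac{N}{32\,c} \\
T_{\text{FX only}} &= \frac{N}{240\,c} \\
\frac{T_{\text{even}}}{T_{\text{FX only}}} &= \frac{240}{32} = 7.5
\end{align*}
```

Under these assumptions, an even split across both devices would take about 7.5 times as long as using the Quadro FX 5800 alone.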
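I solved it by providing a mutex-protected global counter for the slice loop
that is shared by all threads, so that each thread fetches its next slice from
the counter as soon as it has finished the previous one. As this is a rather
coarse-grained load distribution scheme, I hope that the mutex does not lead
to much overhead.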
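The scheme described above can be sketched roughly as follows. This is a minimal illustration and not the attached patch: the names SliceCounter and run_balanced are hypothetical, the devices are simulated by threads that sleep for a made-up per-slice cost, and no VMD or CUDA calls are used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared work counter: hands out slice indices one at a time.
class SliceCounter {
public:
    explicit SliceCounter(int num_slices) : next_(0), end_(num_slices) {}

    // Returns the next unclaimed slice index, or -1 once all slices are taken.
    int next() {
        std::lock_guard<std::mutex> lock(mutex_);
        return next_ < end_ ? next_++ : -1;
    }

private:
    std::mutex mutex_;
    int next_;
    int end_;
};

// Runs one worker thread per simulated device. cost_per_slice_ms[i] is the
// (assumed) time device i needs for one slice. Returns, for each slice, the
// index of the device that computed it.
std::vector<int> run_balanced(int num_slices,
                              const std::vector<int>& cost_per_slice_ms) {
    assert(!cost_per_slice_ms.empty());
    SliceCounter counter(num_slices);
    std::vector<int> owner(num_slices, -1);
    std::vector<std::thread> workers;

    for (std::size_t dev = 0; dev < cost_per_slice_ms.size(); ++dev) {
        workers.emplace_back([&, dev] {
            // Each worker keeps claiming slices until the counter runs out,
            // so a faster device naturally ends up claiming more of them.
            for (int k = counter.next(); k >= 0; k = counter.next()) {
                // Stand-in for computing slice k on this device.
                std::this_thread::sleep_for(
                    std::chrono::milliseconds(cost_per_slice_ms[dev]));
                owner[k] = static_cast<int>(dev);  // each k is written once
            }
        });
    }
    for (auto& w : workers) w.join();
    return owner;
}
```

Calling run_balanced(40, {1, 15}), for example, simulates one fast and one slow device sharing 40 slices. The lock is taken only once per slice, which is why the overhead should stay small as long as a slice takes much longer to compute than a mutex acquisition.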
I'd be happy if you can apply the attached patch to VMD.
Regards,
Martin
- text/x-patch attachment: vmd-cuda-potential-balance.diff