Re: The Future of NAMD was Re: GeForce vs. Tesla

From: Axel Kohlmeyer
Date: Wed Nov 25 2009 - 17:07:27 CST

On Wed, Nov 25, 2009 at 4:22 PM, Biff Forbush wrote:
> Hi Tian et al.,

hi biff,

>   Most often, specific questions (like yours) about NAMD and CUDA/GPUs go
> unanswered on this site.  Presumably this shows that the number of users who
> have real experience with NAMD/CUDA behavior is extremely low.  One hopes
> that this also indicates the developers are too busy working on NAMD / CUDA
> code to deal with individual replies.

well, at this point it is probably worth pointing out that "the developers"
is a _very_ small number of people, particularly considering the quite
large number of people that are actually using NAMD.

giving good answers takes time, and answering requests for purchase
recommendations is particularly risky: without proper research into the
usage patterns and competence of the users, one can easily give bad
advice. so it is very understandable that with a piece of code that is
not yet mature, like the CUDA modules, there is some hesitation to
give an answer.

>   It would be extremely helpful  if the NAMD team could make available a
> short "white paper" on the situation with GPUs and NAMD.  Potential users

i think the right people to do this are not the NAMD developers,
but rather the enthusiast users that _did_ already try the current NAMD
code on GPUs. there have been enough questions about it that one
can assume that this is a significant number. all of this would just need
to be summarized.

> face purchasing decisions and are currently in the dark as to the present
> and future usefulness of GPUs for NAMD.  Perhaps some information could be
> supplied on the following: (1) Difficulties in applying GPUs to MD that have
> limited GPU utility.  (2) Current NAMD-CUDA -- is it production quality?
>  What is the timeline for bringing the CUDA (PTX 1.0) version to full
> functionality -- will that ever happen, given the clear incentive to migrate
> immediately to Fermi?  (3) Some discussion of best architectures (gpu/cpu,
> scaling, multi-node, etc.) and compilation of available benchmark results
> would be extremely useful.

i would say that all of this is still too much a matter of experimentation
to have a final word on it. again, this is something where the user community
could be extremely helpful by sharing their experience.

in my personal(!) opinion, which is mostly based on tests with an older
version of NAMD with CUDA support, i would say the following:

- i have only seen significant speedup with GTX285/GTX260/Tesla C1060
  hardware. in this case a _single_ GPU would be about as fast as a
  dual processor quad core nehalem EP node. any CUDA capable
  hardware with significantly fewer cores can be used for development
  and testing, but is not worth trying for production.

- GPU accelerated NAMD will allow you to run a given job on a
  smaller number of nodes, but it is still possible to outrun it with
  a well balanced massively parallel machine with a large number
  of CPU cores.

- I don't share your pessimism about fermi making older hardware
  obsolete for NAMD. since namd _can_ use single precision the
  nominal speedup from fermi is of the order of a factor of two.
  the new features in fermi will particularly help people that have
  little experience in GPU programming to get good performance
  and those that absolutely need double precision math.

- writing good software that is trusted by a large number of people
  with their research takes a lot of care and guts. i think folks occasionally
  forget what kind of a responsibility this is that jim and others are carrying.
  everybody wants to play with the latest toys, but one also has to
  appreciate the "it is done when it is done" approach.

- as far as reliability is concerned, i would say that the GPU version
  of NAMD is something for 'enthusiasts', i.e. people who don't mind
  if not everything is working perfectly and who don't plan to stake their
  career on it working. it can be useful and effective for some calculations,
  but it doesn't support some other features that people may need.

- the hardware configuration that gets the most out of a GPU seems to be
  two CPU cores per GPU. bus, chipset and other details also play a role,
  so it is impossible to give sound advice without running tests on the
  exact planned configuration. i consider a machine with 8 GTX285 or Tesla
  C1060 GPUs per node a nice marketing gag, and i would expect it to be
  difficult to run in practice. to the best of my knowledge CUDA has never
  been tested by nvidia people with more than 8 GPUs. dual GPU per PCIe
  slot solutions like Tesla S1070 or GTX295 give up some individual GPU
  performance in exchange for higher GPU density, so a larger compute
  problem is needed to offset the reduction in bandwidth.
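
to put rough numbers on the rules of thumb in the list above, here is a
small python sketch. it is only back-of-the-envelope arithmetic based on my
impressions (two CPU cores per GPU, one GPU roughly matching an 8-core dual
quad core node). the function names and constants are made up for
illustration, and the sketch ignores bus, chipset and bandwidth effects, so
it is no substitute for testing the actual configuration.

```python
# rule-of-thumb sizing sketch. both constants are assumptions taken from
# the personal impressions in this mail, not from measurements.
CORES_PER_GPU = 2        # host CPU cores needed to keep one GPU busy
CPU_CORES_PER_GPU = 8    # one GPU ~ one dual processor quad core node


def max_useful_gpus(cpu_cores: int, gpu_slots: int) -> int:
    """GPUs a node can feed: limited by free slots and by host cores."""
    if cpu_cores < 0 or gpu_slots < 0:
        raise ValueError("core and slot counts must be non-negative")
    return min(gpu_slots, cpu_cores // CORES_PER_GPU)


def cpu_core_equivalent(n_gpus: int) -> int:
    """Optimistic estimate of the CPU cores that n_gpus GPUs replace."""
    if n_gpus < 0:
        raise ValueError("GPU count must be non-negative")
    return n_gpus * CPU_CORES_PER_GPU
```

for example, max_useful_gpus(4, 4) gives 2: by this rule a quad core host
with four cards would only feed two of them well. treat the result of
cpu_core_equivalent() as an upper bound, since it assumes every GPU reaches
the single-GPU speedup.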

>  If so, users should be advised that significant investment in
> GT200-generation GPUs of any sort is a complete mistake (probably a moot
> point in a month or so due to availability), as backward compatibility will
> not be there if major improvements are made.

according to announcements at SC09, no fermi type hardware is expected
to be available to regular people within the next 6 months.
the new architecture will be a major step and thus - as with any other
new compute architecture - i would expect that it will take a while until
all teething troubles have been resolved.

>   While it sounds like Anton may soon be the only game in town at the high
> end of long simulations, it seems NAMD should still have its place for

anton serves very special purposes. there is a lot of room
for several other MD codes with a different focus.

> shorter simulations, particularly if it can finally utilize the increasing
> power of GPU technology in small commodity clusters.

the biggest benefit of GPUs in my opinion is currently that you get
the power of a small cluster on your desktop without having to have
a cluster. parallel GPU clusters have a lot of challenges to overcome
before they reach production level.

right now i would suggest not building an entire cluster around
GPUs if you don't know what you are doing and have to rely on
other people's advice. it is better to take a two-tier approach and
make sure that you get good performance with the CPUs, as
they are still much more flexible and consistent in performance,
while GPUs excel at some applications, but are still disappointing
at others. this will change over time, when people have gathered
more experience and know-how on how to use GPUs well and
how to adapt or rewrite applications to better suit the needs of
GPUs. this is much more a "people constraint" than
a "hardware constraint" (see my remarks above about
"the developers").

in scientific computing the smarter user and algorithm always
have an advantage over the brute force of the hardware, but
one has to give it some room.

hope that helps,

p.s.: just to reiterate and to make sure there is no misunderstanding:
all of the above are my personal impressions and opinions.

> Regards,
> Biff Forbush
> Tian, Pu (NIH/NIDDK) [C] wrote:
>> Hello NAMD users,
>> I am planning to build a GPU cluster of 4 or 8 nodes, each with 4 GPUs.
>> The Tesla cards are very expensive compared to the GeForce series. I am not
>> sure if the expanded memories would matter that much for MD simulations of
>> systems with less than 100,000 atoms. The reliability is another
>> consideration. So as long as the GeForce cards have reasonable reliability,
>> I would want to wait for the consumer series (Ge3XX) of the "Fermi"
>> architecture cards instead of Tesla C2050/70.
>> I would really appreciate it if anybody who has used GeForce2XX cards for
>> a while could share their experiences! I assume that the coming GeForce3XX
>> would have similar reliability as GeForce2XX series. Additionally, is it
>> possible to lower the clock rate of these cards a little bit to improve
>> their reliability?
>> Best,
>> Pu

Dr. Axel Kohlmeyer
Institute for Computational Molecular Science
College of Science and Technology
Temple University, Philadelphia PA, USA.
