Re: Performance problem about NAMD-CUDA benchmarks

From: Axel Kohlmeyer
Date: Tue Apr 13 2010 - 13:44:21 CDT

On Sun, 2010-04-11 at 19:53 +0800, xiaoguang liu wrote:

> These results are much poorer than yours.
> As my hardware is almost the same as yours, why is the performance
> poorer than yours?
> Am I missing something important?

there are two issues to consider.

first, your main board has the same total bus bandwidth
as a 4-slot board. you still have "only" two
tylersburg chipsets with 36 PCIe lanes. to get
to 8 slots, the board has to use PCIe bridges (or
multiplexers). it could just be that the PCIe bridge
inside the GTX-295 is more efficient or more
suitable for GPUs.
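
to put rough numbers on the bandwidth point, here is a
back-of-envelope sketch. it assumes the 36 lanes are per
chipset and that all of them go to the GPU slots, which is
optimistic since other devices need lanes too. the function
name and constants are made up for this illustration.

```python
# back-of-envelope: average PCIe lanes available per GPU slot.
# ASSUMPTION: 36 lanes per chipset, two chipsets, and every
# lane is routed to a GPU slot (real boards reserve some).
LANES_PER_CHIPSET = 36
NUM_CHIPSETS = 2


def lanes_per_slot(num_slots,
                   lanes_per_chipset=LANES_PER_CHIPSET,
                   num_chipsets=NUM_CHIPSETS):
    """Average number of lanes each slot gets if the total is shared evenly."""
    total_lanes = lanes_per_chipset * num_chipsets
    return total_lanes / num_slots


for slots in (4, 8):
    print(f"{slots} slots: {lanes_per_slot(slots):.1f} lanes per slot on average")
```

under these assumptions a 4-slot board averages 18 lanes per
slot and an 8-slot board 9, so doubling the slot count with
bridges halves the share each slot can count on when all
GPUs transfer at once.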

second, your GPU to CPU assignment may be poor.
due to having two southbridges, half of the PCIe
slots "belong" to one CPU and the other half to
the other. if you transfer data from main memory
to GPU memory, you have an advantage if the memory
and the bus are physically "closer" to the same CPU.
otherwise you have to transfer via QPI from one
CPU to the other _twice_.
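
on linux you can check which CPU a GPU's slot "belongs" to
through sysfs. the sketch below is only an illustration, not
something NAMD does for you: it reads the `numa_node` file of
a PCI device and the `cpulist` of that node, and can pin the
calling process to those cores. the helper names and the
`sysfs_root` parameter are made up here, and whether your
kernel and BIOS report a useful node (instead of -1) is an
assumption you have to verify.

```python
import os


def gpu_numa_node(pci_addr, sysfs_root="/sys"):
    """Return the NUMA node of a PCI device, or None if it is not reported."""
    path = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "numa_node")
    try:
        with open(path) as f:
            node = int(f.read().strip())
    except (OSError, ValueError):
        return None
    # the kernel writes -1 when the platform does not expose the locality
    return node if node >= 0 else None


def node_cpus(node, sysfs_root="/sys"):
    """Parse a cpulist such as '0-3,8-11' into a list of core numbers."""
    path = os.path.join(sysfs_root, "devices", "system", "node",
                        f"node{node}", "cpulist")
    with open(path) as f:
        text = f.read().strip()
    cpus = []
    for part in text.split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pin_to_gpu_node(pci_addr, sysfs_root="/sys"):
    """Pin the calling process to the cores next to the given GPU (Linux only)."""
    node = gpu_numa_node(pci_addr, sysfs_root)
    if node is None:
        return None
    cpus = node_cpus(node, sysfs_root)
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

you would call `pin_to_gpu_node()` with the PCI address of
the GPU that the process is going to use (`lspci` lists the
addresses), before allocating the host buffers, so that the
memory ends up on the same side as the bus.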

with some mainboards there are BIOS settings that
change the logical to physical mapping of the memory
(in the one i saw, it was called "NUMA aware OS").
setting this to "yes" helps on linux. otherwise
memory will be interleaved, which gives more even
but on average slower performance of individual
tasks (e.g. on windows, which seems to qualify as
a non-NUMA aware OS in that BIOS's manual).
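
a quick way to see what the BIOS setting did is to count the
NUMA nodes that linux exposes in sysfs. this is a minimal
sketch with a made-up helper name; the interpretation (one
node on a two-socket board usually means the memory is
presented interleaved) is a rule of thumb, not a guarantee.

```python
import os
import re


def visible_numa_nodes(sysfs_root="/sys"):
    """Return the sorted NUMA node numbers the kernel exposes, [] if none found."""
    base = os.path.join(sysfs_root, "devices", "system", "node")
    try:
        entries = os.listdir(base)
    except OSError:
        return []
    nodes = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            nodes.append(int(match.group(1)))
    return sorted(nodes)


if __name__ == "__main__":
    nodes = visible_numa_nodes()
    print(f"NUMA nodes visible to the OS: {nodes}")
    if len(nodes) <= 1:
        print("only one (or no) node visible: memory is probably not "
              "exposed per CPU socket")
```

if a dual-socket machine shows two nodes here, pinning
processes and memory per socket has a chance to help.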

> Many thanks!
> Liuxg

Dr. Axel Kohlmeyer
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.
