Re: Reducing the amount of work being done on CPU

From: Norman Geist
Date: Tue Sep 01 2015 - 12:47:56 CDT

It sometimes helps to artificially increase the number of patches by adding

twoawayx yes

to the NAMD configuration file. If that brings an improvement, one can additionally try twoawayy and twoawayz.
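
As a rough sketch, the relevant lines could look like the fragment below. This assumes they are placed in the NAMD configuration file for the benchmark; the comments are my own illustration, and whether the Y and Z options help at all has to be measured.

```tcl
# Fragment of a NAMD configuration file (illustrative sketch).
# Split patches in half along X, which roughly doubles the patch count.
twoawayx yes

# Uncomment one at a time, and only if twoawayx alone improved performance.
#twoawayy yes
#twoawayz yes
```

Comparing the benchmark timing with and without each option is the simplest way to see whether the extra patches pay off.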

There might be two major problems here:

1. With more computing power applied to a system of the same size, you may have passed the point where it still scales.
2. The PCI-E bandwidth is saturated.

I suspect your system might be too small. If the twoawayx setting above helps, that would already point in this direction.

Norman Geist

-----Original Message-----
From: [] On Behalf Of Maxime Boissonneault
Sent: Tuesday, 1 September 2015 16:48
To: namd-l <>
Subject: namd-l: Reducing the amount of work being done on CPU

We have received very fat GPU nodes, which have 16 GPUs (8 x K80) and two
12-core sockets (24 CPU cores).

While I got rather good scaling with the ApoA1 benchmark on nodes with
8 x K20 + 2 x 10-core sockets (scaling was almost perfect between 1 GPU
+ 2 cores and 8 GPUs + 20 cores), the scaling is not nearly as
impressive on our very fat nodes.

I suspect the reason is that the low number of CPU cores per GPU is
becoming a bottleneck.
Is there any setting I should change in the benchmark to bias more
workload toward the GPUs rather than the CPUs?


Maxime Boissonneault
Computing Analyst - Calcul Québec, Université Laval
Software Carpentry Instructor
Chair - Calcul Québec Research Support Coordination Committee
Ph.D. in Physics
