Reducing the amount of work being done on CPU

From: Maxime Boissonneault (
Date: Tue Sep 01 2015 - 09:48:24 CDT

We have received very fat GPU nodes, each with 16 GPUs (8 x K80) and two
12-core sockets (24 CPU cores).

While I got rather good scaling with the ApoA1 benchmark on nodes with
8 x K20 + 2 x 10-core sockets (scaling was almost perfect between 1 GPU
+ 2 cores and 8 GPUs + 20 cores), the scaling is not nearly as
impressive on our very fat nodes.
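
To put numbers on the difference between the two node types, here is a
minimal sketch that computes the CPU-core-to-GPU ratio for each, using only
the GPU and core counts quoted above. The helper name cores_per_gpu and the
dictionary layout are just for illustration.

```python
# Node layouts taken from the figures quoted in this message:
# (number of GPUs, number of CPU cores) per node.
nodes = {
    "8 x K20 + 2 x 10-core sockets": (8, 20),
    "8 x K80 (16 GPUs) + 2 x 12-core sockets": (16, 24),
}


def cores_per_gpu(gpus, cores):
    """Return how many CPU cores are available to drive each GPU."""
    return cores / gpus


for name, (gpus, cores) in nodes.items():
    print(f"{name}: {cores_per_gpu(gpus, cores):.2f} CPU cores per GPU")
```

The K20 nodes come out at 2.5 cores per GPU and the K80 nodes at 1.5, so each
GPU on the new nodes has 40% fewer CPU cores feeding it.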

I suspect the reason is that the low number of CPU cores per GPU is
becoming a bottleneck.
Is there any setting I should change in the benchmark to shift more of the
workload toward the GPUs rather than the CPUs?


Maxime Boissonneault
Computing Analyst - Calcul Québec, Université Laval
Software Carpentry Instructor
Chair - Calcul Québec Research Support Coordination Committee
Ph.D. in Physics

