Re: Why CPU Usage is low when I run ibverbs-smp-cuda version NAMD

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Nov 11 2014 - 07:03:53 CST

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Bin He
Sent: Tuesday, November 11, 2014 13:19
To: namd-l_at_ks.uiuc.edu; Norman Geist
Subject: Re: namd-l: Why CPU Usage is low when I run ibverbs-smp-cuda version NAMD

 

Hi,

 

Thanks a lot for your kind reply.

 

I am sorry that the timing data I provided was confusing.

 

So I used the default binaries (downloaded from the NAMD website) to test again.

 

The binaries I used:

NAMD_2.10b1_Linux-x86_64-multicore-CUDA.tar.gz

NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA.tar.gz

Hardware:

CPU: E5-2670 x2
GPU: K20m x2
Network: IB

Commands:

1 node with multicore-CUDA version:

./namd2 +p16 +devices 0,1 ../workload/f1atpase2000/f1atpase.namd

 

1 node with ibverbs-smp-CUDA version:

/home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 16 ++ppn 8 ++nodelist nodelist ++scalable-start ++verbose /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/namd2 +devices 0,1 /home/gpuusr/binhe/namd/workload/f1atpase2000/f1atpase.namd

With "++local", the application can not start. So I have to run with nodelist.

nodelist content:

  group main ++shell ssh
  host node330
  host node330

 

2 nodes with ibverbs-smp-CUDA version:

/home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 32 ++ppn 8 ++nodelist nodelist2node ++scalable-start ++verbose /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/namd2 +devices 0,1 /home/gpuusr/binhe/namd/workload/f1atpase2000/f1atpase.namd

nodelist content:

  group main ++shell ssh
  host node330
  host node330
  host node329
  host node329


4 nodes with ibverbs-smp-CUDA version:

/home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 64 ++ppn 8 ++nodelist nodelist4node ++scalable-start ++verbose /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/namd2 +devices 0,1 /home/gpuusr/binhe/namd/workload/f1atpase2000/f1atpase.namd

nodelist content:

  group main ++shell ssh
  host node330
  host node330
  host node329
  host node329
  host node328
  host node328
  host node332
  host node332

 

TIME

f1atpase: numsteps 2000; outputEnergies 100

version             CPU/node   GPU/node   NODE   TIME (s)
multicore-CUDA      16         2          1      90
ibverbs-smp-CUDA    16         2          1      111.24
ibverbs-smp-CUDA    16         2          2      60
ibverbs-smp-CUDA    16         2          4      35
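As a rough check of the scaling (illustrative arithmetic from the timings above, assuming they are wall-clock seconds for the 2000-step run):

1 node -> 2 nodes: speedup 111.24 / 60 ≈ 1.85 of an ideal 2, i.e. about 93% parallel efficiency
2 nodes -> 4 nodes: speedup 60 / 35 ≈ 1.71 of an ideal 2, i.e. about 86% parallel efficiency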

 

 

Actually, the ibverbs-smp-CUDA version does not scale badly. BUT the CPU usage:

Cpu(s): 53.1%us, 29.0%sy, 0.0%ni, 17.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

shows that not all computing resources are being used well.

 

Hey Binhe,

 

Your benchmarking procedure was better this time. Please notice that the distribution of processes and threads can also influence performance, meaning that different ++ppn values can cause significantly different timings.
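For example (a hypothetical variation, not a tested recommendation: ++ppn 7 with ++p 14 leaves one core per process free for the communication thread that the smp build starts, instead of oversubscribing with ++ppn 8):

/home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 14 ++ppn 7 ++nodelist nodelist ++scalable-start ++verbose /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/namd2 +devices 0,1 /home/gpuusr/binhe/namd/workload/f1atpase2000/f1atpase.namd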

 

And we can see that ibverbs-smp-CUDA is slower than multicore-CUDA on a single node. Yes, network bandwidth and latency may cause this, but the ibverbs version without CUDA scales well, and its CPU usage is perfect when running on several nodes.

 

What you forgot is that the demands on network bandwidth and latency scale with the computing power of the communication endpoints. So without CUDA the endpoints' computing power is much lower than with CUDA. Generally speaking:

 

The more computing power per node -> the faster the partial problems are solved -> the more messaging is required -> the more waiting and redistribution of work occurs. ;)

 

So I do not think network bandwidth and latency are the key reason. How can I increase the CPU usage and accelerate NAMD?

 

For any binary, CUDA or not, that runs across the network, add “+idlepoll” to the namd2 command line.
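For example (a sketch only, the two-node command from above with +idlepoll added to the namd2 arguments):

/home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 32 ++ppn 8 ++nodelist nodelist2node ++scalable-start ++verbose /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/namd2 +idlepoll +devices 0,1 /home/gpuusr/binhe/namd/workload/f1atpase2000/f1atpase.namd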

 

Cheers Norman Geist

 

Thanks

 

Binhe

 

------------------------
Best Regards!
Bin He
Member of IT
Unique Studio
Room 811, Building LiangSheng, 1037 Luoyu Road, Wuhan 430074, P.R. China
☎: (+86) 13163260252
Weibo: 何斌_HUST
Email: binhe_at_hustunique.com
Email: binhe22_at_gmail.com

 

 

2014-11-11 16:48 GMT+08:00 Norman Geist <norman.geist_at_uni-greifswald.de>:

Ok, you actually DON’T have a problem! You are comparing apples with oranges. To compare the performance of different binaries, you SHOULD use the same hardware. So you would want to test the ibverbs version on the machine with 4 GPUs + 16 cores, or vice versa the multicore binary on one of the 2 GPU + 12 core nodes.

 

Apart from that, using multiple nodes introduces a new bottleneck: network bandwidth and latency. So you will always have losses due to the additional overhead and to your CPUs spending time waiting for communication rather than working. This varies with system size (Amdahl’s law). BUT actually your scaling isn’t that bad. From 2 to 4 nodes it scales by 46% instead of the ideal 50% (you are missing the 1-node case, by the way).

 

So don’t worry about CPU usage, only about the actual timings. Also try adding “+idlepoll” to namd2, which can improve parallel scaling across the network.

Also, for CUDA and small systems, try the following in the config:

 

twoAwayX yes

Only if that brings an improvement, try

twoAwayX yes
twoAwayY yes

Only if that brings an improvement, try

twoAwayX yes
twoAwayY yes
twoAwayZ yes

In most cases twoAwayX is enough or already too much.

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Bin He
Sent: Monday, November 10, 2014 20:51
To: Norman Geist
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: Why CPU Usage is low when I run ibverbs-smp-cuda version NAMD

 

1. Using the servers mentioned above, I got the following results:

 

multicore-CUDA

GPU   CORE   TIME (s)
4     16     64

ibverbs-smp-CUDA

GPU          CORE          NODE   TIME (s)
2 per node   12 per node   2      57
2 per node   12 per node   4      37

 

When running ibverbs-smp-CUDA, the CPU user (us) usage is less than 50%, and the system (sy) usage is about 30%.

 

The CPU usage looks ugly. What I want to do is find out why the CPU usage is so strange.

 

2. If I want to get the best performance with CUDA, which parameters in the config file can I modify?

 

 

 

------------------------
Best Regards!
Bin He
Member of IT
Unique Studio
Room 811, Building LiangSheng, 1037 Luoyu Road, Wuhan 430074, P.R. China
☎: (+86) 13163260252
Weibo: 何斌_HUST
Email: binhe_at_hustunique.com
Email: binhe22_at_gmail.com

 

 

2014-11-10 14:53 GMT+08:00 Norman Geist <norman.geist_at_uni-greifswald.de>:

What you observe might be expected, as the CUDA code of NAMD is officially tuned for the multicore version. BUT, do you actually notice any performance difference regarding time/step?

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Bin He
Sent: Saturday, November 8, 2014 08:25
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: Why CPU Usage is low when I run ibverbs-smp-cuda version NAMD

 

Hi everyone,

 

I am new to NAMD.

 

The description of our cluster:

CPU: E5-2670 (8 cores)
Memory: 32 GB
Sockets: 2
Network: IB
GPU: K20m x2
CUDA: 6.5
Workload: f1atpase (numsteps 2000)

 

When I run the multicore-CUDA version, the CPU usage is about 100% and the GPU usage is about 50%.

CMD: ./namd2 +p16 +devices 0,1 ../workload/f1atpase/f1atpase.namd

CPU time is about 88 s.

When I run the ibverbs-smp-CUDA version, the CPU usage is only about 40% us and 30% sy. GPU usage is about 50%.

CMD: /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/charmrun ++p 60 ++ppn 15 ++nodelist nodelist ++scalable-start ++verbose /home/gpuusr/binhe/namd/NAMD_2.10b1_Linux-x86_64-ibverbs-smp-CUDA/namd2 +devices 0,1 /home/gpuusr/binhe/namd/workload/f1atpase/f1atpase.namd

CPU time is about 37 s.

 

When I try to use +setcpuaffinity, the result is even worse.

So what am I doing wrong?

Thanks

------------------------
Best Regards!
Bin He
Member of IT
Unique Studio

 


This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:59 CST