From: Andrew Pearson (andrew.j.pearson_at_gmail.com)
Date: Tue Feb 26 2013 - 08:14:52 CST
Hi Norman
OK, the first thing to note is that I have several 12-core nodes and
several 16-core nodes in my cluster. I restricted my initial runs to the
12-core nodes to keep things consistent. You were correct: Charm++ reported
a 24-way SMP node because HT was enabled. Now that HT is disabled, Charm++
sees a 12-way SMP node.
Later, when I did further tests, the 12-core nodes were occupied by another
user, so I switched to the 16-core nodes. There Charm++ sees a 16-way SMP
node (because I've now disabled HT on those as well). I imagine that if I
had used the 16-core nodes previously, Charm++ would have seen a 32-way SMP
node.
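For anyone else who wants to check whether HT is active on a node before a
run, something like this should show it (assuming a reasonably recent Linux
kernel that exposes the CPU topology files; lscpu ships with util-linux):

  lscpu | grep 'Thread(s) per core'    # 2 = HT on, 1 = HT off
  # two thread IDs listed per core also means HT is on
  cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list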
I performed a scaling test on a single 16-core node, first with HT enabled
and then with it disabled. The HT result shows linear scaling up to
approximately 8 processors, and by 12 processors the departure from
linearity is significant. The non-HT result shows the same initial linear
scaling, but it holds up all the way to 16 processors: the speedup is 9.5
at 12 processors and 12.3 at 16.
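In case it helps anyone repeat this, a scaling test like the one above can
be driven by a loop along these lines (the config file name and log paths
below are placeholders; the speedup is the one-process s/step divided by
the N-process s/step taken from NAMD's Benchmark lines):

  #!/bin/bash
  # Sketch only: run the same NAMD job at several core counts on one node
  CONFIG=myrun.namd          # placeholder config file
  for NP in 1 2 4 8 12 16; do
      charmrun +p $NP ++mpiexec namd2 $CONFIG > scaling_${NP}.log
      # NAMD prints lines like "Info: Benchmark time: ... s/step ..."
      grep "Benchmark time" scaling_${NP}.log | tail -1
  done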
I admit I am not an expert in the numerical methods NAMD uses, but I imagine
they involve a lot of communication, so the resulting speedup will not be
ideal. Is that correct, or should I expect nearly 16x speedup on 16
processors?
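As a rough sanity check on that question: if a fraction p of the work
parallelizes perfectly, Amdahl's law predicts a speedup of
1 / ((1 - p) + p/N). My measured 12.3x at N = 16 corresponds to p of about
0.98, so even ~2% of serial/communication overhead is already enough to
explain the gap from 16x:

  # Illustrative only: Amdahl's law speedup for a 98% parallel job on 16 cores
  awk 'BEGIN { p = 0.98; n = 16; printf "%.1f\n", 1 / ((1 - p) + p / n) }'   # prints 12.3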
I think this explains everything. I'll send you my /proc/cpuinfo if you
really want to see it, or if you think there are still unanswered questions.
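If it helps in the meantime, the relevant summary can be pulled out of
/proc/cpuinfo like this (assuming the standard Linux layout of that file);
with HT disabled the two counts should match:

  # logical CPUs the kernel sees
  grep -c '^processor' /proc/cpuinfo
  # unique physical cores, counted as (physical id, core id) pairs
  awk -F: '/physical id/ {s = $2} /core id/ {print s "-" $2}' /proc/cpuinfo | sort -u | wc -l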
Thanks,
Andrew
On Tue, Feb 26, 2013 at 1:27 AM, Norman Geist <
norman.geist_at_uni-greifswald.de> wrote:
> Hi Andrew,
>
> Nice to hear that so far. But I'm still confused about:
>
> 1. Charm++ saying it's a 24-way SMP node.
> 2. The speedup being 12.
> 3. You saying it's a 16-core node.
>
> Could you post the output of "cat /proc/cpuinfo", so we can make sure we
> fully understand what's going on?
>
> Norman Geist.
>
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> behalf of Andrew Pearson
> Sent: Monday, 25 February 2013 18:50
> To: Norman Geist
> Cc: Namd Mailing List
> Subject: Re: namd-l: Always 24-way SMP?
>
> Hello again Norman,
>
> Yes, this was exactly the problem. I disabled hyperthreading on a compute
> node and performed my scaling test again, and this time the results were
> far better. The speedup is now nearly linear, and I get 12.3x for a
> 16-core run on a single 16-core node. Thank you for your advice and for
> pointing out this problem -- it would have affected many of our users, not
> just NAMD users.
>
> Andrew
>
> On Mon, Feb 25, 2013 at 10:06 AM, Norman Geist <
> norman.geist_at_uni-greifswald.de> wrote:
>
> Andrew,
>
> What kind of CPU are you using on this node? What you describe reminds me
> of hyper-threading. Could it be that your machine has only 12 physical
> cores, and the rest are the hyper-threading "logical" cores? If so, it's
> no wonder that NAMD can't get any benefit out of the virtual cores (really
> just a second instruction stream per physical core). They are meant to
> fill gaps in the CPU schedule during multitasking, since tasks also
> produce wait times, for example with disk I/O. Because NAMD is highly
> optimized code and leaves few such gaps, a maximum speedup of 12 is
> reasonable.
>
> So I think you have two six-core CPUs on your node. Please let us know
> this first.
>
> Furthermore, I have never observed problems with the precompiled NAMD
> builds, and most of what I have read about them concerned InfiniBand and
> OFED issues. Also, those problems were about successfully starting NAMD,
> not about bad parallel scaling.
>
> Norman Geist.
>
> From: Andrew Pearson [mailto:andrew.j.pearson_at_gmail.com]
> Sent: Monday, 25 February 2013 13:28
> To: Norman Geist
> Cc: Namd Mailing List
> Subject: Re: namd-l: Always 24-way SMP?
>
> Hi Norman,
>
> Thanks for the response. I didn't phrase my question well - I know I'm
> experiencing scaling problems, and I'm trying to determine whether the
> precompiled NAMD binaries are known to cause them. I ask because many
> people seem to say that you should compile NAMD yourself to save
> headaches.
>
> Your explanation of the Charm++ message about the number of cores makes
> sense. I'll bet that's what's happening.
>
> My scaling problem is that for a given system (27 patches, 50,000 atoms) I
> get perfect speedup up to nprocs = 12, and then the speedup curve goes
> almost flat. This occurs for runs performed on a single 16-core node.
>
> Andrew
>
> On Monday, February 25, 2013, Norman Geist wrote:
>
> Hi Andrew,
>
> It's a bad idea to ask someone else whether you have scaling problems; you
> should know whether you do or not. The line in the output file just comes
> from the Charm++ startup and is simply information about the underlying
> hardware. It doesn't mean NAMD is using SMP; it just tells you it's a
> multiprocessor/multicore node. Watch the output carefully and you will see
> that it uses the right number of CPUs (for example in the Benchmark
> lines). So what kind of scaling problem do you have? Are you not getting
> the expected speedup?
>
> Norman Geist.
>
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> behalf of Andrew Pearson
> Sent: Friday, 22 February 2013 19:30
> To: namd-l_at_ks.uiuc.edu
> Subject: namd-l: Always 24-way SMP?
>
> I'm investigating scaling problems with NAMD. I'm running the precompiled
> linux-64-tcp binaries on a Linux cluster with 12-core nodes using "charmrun
> +p $NPROCS ++mpiexec".
>
> I know scaling problems have been covered before, but I can't find the
> answer to my specific question. No matter how many cores I use or how many
> nodes they are spread over, at the top of stdout Charm++ always reports
> "Running on # unique compute nodes (24-way SMP)". It gets # correct, but
> it's always 24-way SMP. Is it supposed to be this way? If so, why?
>
> Everyone seems to say that you should recompile NAMD against your own MPI
> library, but I don't seem to have any problems running NAMD jobs to
> completion with charmrun + OpenMPI built with the Intel compilers (apart
> from the scaling). Could using the precompiled binaries cause scaling
> problems?
>
> Thank you.
>
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:20:57 CST