Re: NAMD run on Intel hyperthreaded cores

From: Souvik Sinha (souvik.sinha893_at_gmail.com)
Date: Tue Feb 06 2018 - 14:17:50 CST

Surely, I will check it. Thank you.
On 7 Feb 2018 1:25 a.m., "Giacomo Fiorin" <giacomo.fiorin_at_gmail.com> wrote:

> Try running numactl -H to see how the "real" and the hyper-threaded
> cores are numbered. In your case the problem may be the argument to
> +pemap, not +setcpuaffinity itself.
>
> Giacomo
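
A minimal sketch of another way to see the numbering, assuming a Linux node that exposes the usual sysfs topology files. It groups logical CPUs by physical core from thread_siblings_list; the function names are made up for illustration and are not part of NAMD or Charm++.

```python
import glob


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,32' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_siblings(sibling_texts):
    """Collapse per-CPU sibling lists into one tuple per physical core."""
    return sorted({tuple(parse_cpu_list(t)) for t in sibling_texts})


def read_sibling_texts():
    """Read thread_siblings_list for every logical CPU (Linux sysfs only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    texts = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            texts.append(fh.read())
    return texts


if __name__ == "__main__":
    # One line per physical core, listing the logical CPUs that share it.
    for core in group_siblings(read_sibling_texts()):
        print(core)
```

Each printed tuple lists the logical CPU numbers that share one physical core, which is the information needed to choose a +pemap argument.
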
>
> On Tue, Feb 6, 2018 at 9:20 AM, Souvik Sinha <souvik.sinha893_at_gmail.com>
> wrote:
>
>> Sorry, that's my mistake. It was actually 2 CPUs with 16 cores each and 2
>> threads per core.
>>
>> Thanks for your reply.
>>
>> On Tue, Feb 6, 2018 at 6:01 PM, Jérôme Hénin <jerome.henin_at_ibpc.fr>
>> wrote:
>>
>>> Hi Souvik,
>>>
>>> This CPU has 16 physical cores; the 32 "cores" are hardware threads.
>>> You may get similar throughput with just 16 threads. At any rate, 64
>>> seems excessive.
>>>
>>> https://ark.intel.com/products/91766/Intel-Xeon-Processor-E5-2683-v4-40M-Cache-2_10-GHz
>>>
>>> Jerome
>>>
>>> On 6 February 2018 at 09:41, Souvik Sinha <souvik.sinha893_at_gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I am working on a cluster whose nodes each have 32 CPU cores with two
>>>> threads per core; the processor is "Intel(R) Xeon(R) CPU E5-2683 v4". I
>>>> am currently using the "NAMD_2.12_Linux-x86_64-multicore" binary on that
>>>> cluster, and I am not sure how to distribute jobs across hyperthreaded
>>>> cores. To check their performance, I tried the following commands; the
>>>> resulting "Benchmark time" values are given with each:
>>>>
>>>> charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p64 <inputfile > :
>>>> Benchmark: 0.170056 days/ns
>>>>
>>>> charmrun namd2 +p64 <inputfile > : Benchmark: 0.168904 days/ns
>>>>
>>>>
>>>> charmrun namd2 +setcpuaffinity +pemap 0-31+32 +p32 <inputfile > :
>>>> Benchmark: 0.228512 days/ns
>>>>
>>>> charmrun namd2 +p32 <inputfile > : Benchmark: 0.157081 days/ns
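
The four figures are easier to compare as throughput. A small sketch that converts the days/ns values quoted above to ns/day and shows how far each run is from the fastest one; the labels and helper names are only illustrative.

```python
# Benchmark values copied from the four runs above, in days/ns (lower is faster).
runs = {
    "+p64 with +setcpuaffinity +pemap 0-31+32": 0.170056,
    "+p64 default placement": 0.168904,
    "+p32 with +setcpuaffinity +pemap 0-31+32": 0.228512,
    "+p32 default placement": 0.157081,
}


def ns_per_day(days_per_ns):
    """Invert days/ns into the more familiar ns/day."""
    return 1.0 / days_per_ns


def slowdown_percent(days_per_ns, best_days_per_ns):
    """Extra wall-clock time relative to the fastest run, in percent."""
    return 100.0 * (days_per_ns / best_days_per_ns - 1.0)


best = min(runs.values())
# Print the runs from fastest to slowest.
for label, value in sorted(runs.items(), key=lambda kv: kv[1]):
    print(f"{label:45s} {ns_per_day(value):6.2f} ns/day"
          f"  +{slowdown_percent(value, best):5.1f}%")
```

On these numbers the pinned +p32 run is roughly 45% slower than the unpinned +p32 run, while the two +p64 runs are within about 1% of each other.
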
>>>>
>>>> I can't see how "+setcpuaffinity" is helping, since the runs without an
>>>> explicit mapping of PEs to threads work fine (judging by the Benchmark
>>>> time). Does the multicore binary, by default, distribute processes over
>>>> all available threads (i.e. 64 in this case), so that "+setcpuaffinity"
>>>> is not needed? If that is true, then why does the benchmark time differ
>>>> so much with and without "+setcpuaffinity" when launching 32 processes?
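
One possible reading of the +p32 slowdown with +pemap 0-31+32, sketched under two assumptions that nothing in this thread confirms: (1) the Charm++ offset form interleaves, so that 0-7+8+16 expands to 0,8,16,1,9,17, as I recall from the Charm++ manual; (2) logical CPUs i and i+32 are hyper-thread siblings of the same core on this node, which is what numactl -H would have to confirm. The helper expand_pemap is hypothetical and only handles plain numbers, L-U ranges and +O offsets, not strides or runs.

```python
def expand_pemap(spec):
    """Expand a simplified +pemap string into an ordered list of CPU numbers.

    Assumed semantics: each core in the range is followed immediately by
    the same core plus each offset, e.g. '0-7+8+16' -> 0, 8, 16, 1, 9, 17, ...
    """
    cpus = []
    for item in spec.split(","):
        base, *offset_parts = item.split("+")
        offsets = [0] + [int(o) for o in offset_parts]
        if "-" in base:
            lo, hi = (int(x) for x in base.split("-"))
        else:
            lo = hi = int(base)
        for core in range(lo, hi + 1):
            for off in offsets:
                cpus.append(core + off)
    return cpus


# With +p32, only the first 32 entries of the map would be used.
used = expand_pemap("0-31+32")[:32]

# ASSUMPTION: CPU i and CPU i+32 share one physical core on this node.
physical_cores = {cpu % 32 for cpu in used}

print("logical CPUs used:", used)
print("distinct physical cores:", len(physical_cores))
```

If both assumptions hold, the first 32 entries of the map cover only 16 physical cores, so +p32 would place two PEs on each of them and leave the other 16 cores idle, which would fit the slower benchmark. With +p64 every logical CPU is used either way, which would fit the near-identical timings there.
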
>>>>
>>>> Please help me to understand this. Thank you.
>>>>
>>>> --
>>>> Souvik Sinha
>>>> Research Fellow
>>>> Bioinformatics Centre (SGD LAB)
>>>> Bose Institute
>>>>
>>>> Contact: 033 25693275
>>>>
>>>
>>>
>>
>>
>> --
>> Souvik Sinha
>> Research Fellow
>> Bioinformatics Centre (SGD LAB)
>> Bose Institute
>>
>> Contact: 033 25693275
>>
>
>
>
> --
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Contractor, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin
>

This archive was generated by hypermail 2.1.6 : Sat Dec 14 2019 - 23:19:34 CST