From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Mar 07 2013 - 00:47:59 CST
Ok, it’s not surprising that using additional features increases the runtime; nevertheless, from there on it should still scale, or at least scale better, I think. The settings you posted for the ib0 network show a very low MTU; maybe ask your admins whether they can increase it to 65520, which always improved scaling a lot when I tested it. Or use an ibverbs binary of NAMD.
It seems that LES needs a lot more bandwidth.
Additionally, to improve scaling, try adding +idlepoll to the namd2 command; it sometimes gives a very nice gain across multiple nodes because it decreases latency. It cannot hurt the timing either, so it is not risky to use and does not have to be re-checked for being faster every time.
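For example, something like this (just a sketch; the process count and the config/log file names are placeholders for whatever your job script already uses):
mpirun -np 24 namd2 +idlepoll your_les_config.namd > les_run.log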
Norman Geist.
From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Thursday, March 7, 2013 04:51
To: Norman Geist
Subject: Re: namd-l: LES very slow
Hi Norman,
One thing we have realized that may be important is that LES seems to be performing poorly even on a single CPU, which may suggest a problem with (our?) setup of LES in NAMD rather than network issues. For example, if we take the simulation system we are using for LES (28084 atoms) and run it on a single CPU, we get the following benchmark time:
Info: Benchmark time: 1 CPUs 1.16222 s/step 13.4516 days/ns 392.777 MB memory
If we take the equivalent system with only one copy of the dye rather than 20 (24702 atoms in total) and run it without LES, we get the following:
Info: Benchmark time: 1 CPUs 0.35795 s/step 2.07147 days/ns 211.812 MB memory
So even on a single CPU, the use of LES is slowing the run down considerably. The issues with scaling on multiple CPUs may stem from the same problem that causes this massive slowdown on a single CPU.
Regarding your network question: I asked the people running the supercomputer, and they said the nodes communicate via an InfiniBand connection.
2013/3/6 Norman Geist <norman.geist_at_uni-greifswald.de>
Hi again Siri,
which of these network connections do you use? If you do not know, look at the machinefile/nodelist from your queuing system and try to find out which network is used between the nodes when they resolve each other via the hostnames there. The easiest way to do this is to log on to one node and ping another node via the hostname used in the machinefile; this should show which IP answers.
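For example (the hostname is just a placeholder; take one from your machinefile):
ping -c 3 node042
Judging from the ifconfig output below, a reply from a 172.16.x.x address would mean IPoIB (ib0/ib1), while a 10.2.x.x address would mean the eth0 Ethernet is used.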
Norman Geist.
From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Tuesday, March 5, 2013 08:14
To: Norman Geist
Subject: Re: namd-l: LES very slow
Yes, I'm using a queuing system.
The output is:
eth0 Link encap:Ethernet HWaddr 98:4B:E1:74:EE:0C
inet addr:10.2.0.19 Bcast:10.2.255.255 Mask:255.255.0.0
inet6 addr: fe80::9a4b:e1ff:fe74:ee0c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:43247 errors:0 dropped:0 overruns:0 frame:0
TX packets:16604 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7073185 (6.7 MiB) TX bytes:2108682 (2.0 MiB)
Interrupt:30 Memory:ec000000-ec012800
eth1 Link encap:Ethernet HWaddr 98:4B:E1:74:EE:0E
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:37 Memory:ea000000-ea012800
eth2 Link encap:Ethernet HWaddr 98:4B:E1:74:EE:24
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:31 Memory:f0000000-f0012800
eth3 Link encap:Ethernet HWaddr 98:4B:E1:74:EE:26
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:39 Memory:ee000000-ee012800
eth4 Link encap:Ethernet HWaddr 28:92:4A:D1:77:08
inet addr:202.8.34.206 Bcast:202.8.34.223 Mask:255.255.255.224
inet6 addr: fe80::2a92:4aff:fed1:7708/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:295569 errors:0 dropped:0 overruns:0 frame:0
TX packets:291870 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1545433802 (1.4 GiB) TX bytes:37439917 (35.7 MiB)
Interrupt:67
eth5 Link encap:Ethernet HWaddr 28:92:4A:D1:77:0C
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:71
Ifconfig uses the ioctl access method to get the full address information,
which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are
displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib0 Link encap:InfiniBand HWaddr
80:00:00:03:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:172.16.0.19 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::211:7500:79:4810/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2770 errors:0 dropped:0 overruns:0 frame:0
TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:170764 (166.7 KiB) TX bytes:4539 (4.4 KiB)
Ifconfig uses the ioctl access method to get the full address information,
which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are
displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib1 Link encap:InfiniBand HWaddr
80:00:00:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:172.16.10.19 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::211:7500:79:4811/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2754 errors:0 dropped:0 overruns:0 frame:0
TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:169867 (165.8 KiB) TX bytes:1060 (1.0 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:28 errors:0 dropped:0 overruns:0 frame:0
TX packets:28 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2238 (2.1 KiB) TX bytes:2238 (2.1 KiB)
2013/3/5 Norman Geist <norman.geist_at_uni-greifswald.de>
Ok, so let’s try something. Could you please post the output of
“/sbin/ifconfig -a”? Also, are you using a queuing system for submitting your jobs?
Regarding the output you posted yesterday, connected mode is ok, but the MTU should be 65520 IMHO. If you are already using the right network, which I want to find out with the questions above, changing this setting should improve scaling a lot.
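For reference, the two values you posted are presumably the files mode (“connected”) and mtu (“1500”) under /sys/class/net/ib0/. Changing the MTU is typically done via sysfs on each node and needs root, so it is something for your admins (a sketch, assuming the interface is ib0 as in your output and stays in connected mode):
echo connected > /sys/class/net/ib0/mode
echo 65520 > /sys/class/net/ib0/mtu
(or, equivalently for the MTU, ip link set ib0 mtu 65520)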
Norman Geist.
From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Monday, March 4, 2013 09:09
To: Norman Geist
Subject: Re: namd-l: LES very slow
I don't know what InfiniBand is. I don't know about an ibverbs MPI or IPoIB either, but since the run file contains "mpirun namd2" I'm thinking the MPI option?
The output of cat /sys/class/net/ib0/m* is: connected
1500
I really appreciate your help! Thanks a lot!
2013/3/4 Norman Geist <norman.geist_at_uni-greifswald.de>
Hi Siri,
did you use the InfiniBand in this test?
Well, the numbers are not very accurate, but there shouldn’t be too much difference here. Do you use an ibverbs MPI or IPoIB? What’s the output of
“cat /sys/class/net/ib0/m*”?
Norman Geist.
From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Thursday, February 28, 2013 23:24
To: Norman Geist
Subject: Re: namd-l: LES very slow
This is the head of the log file:
Charm++> Running on MPI version: 2.1
Charm++> level of thread support used: MPI_THREAD_SINGLE (desired:
MPI_THREAD_SINGLE)
Charm++> Running on non-SMP mode
Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (12-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD 2.9 for Linux-x86_64-MPI
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
With 2 CPUs on one node the time is now 11-12 days per ns.
With 1 CPU on two nodes the time is 13 days per ns.
1 CPU on one node is 18 days per ns today.
2013/2/28 Norman Geist <norman.geist_at_uni-greifswald.de>
Hi again Siri,
ok, so your basic setup is fine. But what about LES? I just can’t imagine a reason for this kind of simulation being limited in scaling. You are right about the 255 copies; it seems I had an older manual. Can we see the output (head) of this LES simulation? Also, as a quick test of your node interconnect, could you try the following with your LES simulation (example commands sketched below):
1. 2 cores @ 1 node = 2 processes
2. 2 cores @ 2 nodes = 2 processes
This way we can see whether using your network makes a big difference. Nevertheless, the scaling on one node alone should already be better.
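A sketch of what I mean (the machinefile names and mpirun flags are placeholders; use whatever your queuing system provides):
1. mpirun -np 2 -machinefile one_node.txt namd2 les.conf > les_1node.log
2. mpirun -np 2 -machinefile two_nodes.txt namd2 les.conf > les_2nodes.log
where one_node.txt lists the same node twice and two_nodes.txt lists two different nodes once each.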
Now the developers could jump in and tell us whether there are known scaling issues when using LES.
Norman Geist.
From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Wednesday, February 27, 2013 23:22
To: Norman Geist
Subject: Re: namd-l: LES very slow
Hi
The manual for NAMD 2.9 says up to 255 copies are supported. When I do a normal simulation with 1 CPU and with 12 CPUs, the simulation time is estimated to be 2.2 and 0.4 days per ns, respectively. If I increase the number to 24 (2 nodes times 12 CPUs), the estimated time is 0.2 days per ns. If I do the same for the LES system, I get no decrease in simulation time.
2013/2/27 Norman Geist <norman.geist_at_uni-greifswald.de>
Hi Siri,
so far, I couldn’t find a reason for your problem in your hardware. I don’t know what LES actually does, but the manual says that NAMD only supports up to 15 copies.
Nevertheless, I can’t see a reason why this kind of computation should hurt NAMD’s usually good scaling. Does “normal” MD scale better? Then we can identify whether it is a general problem with your setup or whether it is due to LES.
Regards
Norman Geist.
From: Siri Søndergaard [mailto:siris2501_at_gmail.com]
Sent: Wednesday, February 27, 2013 00:11
To: Norman Geist
Subject: Re: namd-l: LES very slow
I've attached the files... I hope this is what you were looking for.
2013/2/26 Norman Geist <norman.geist_at_uni-greifswald.de>
Hi Siri,
to help you, we could use some information about the hardware you are running on. Assuming you use Linux, please supply the output of the following commands:
1. cat /proc/cpuinfo
2. lspci
This should be enough for a start.
PS: If you are not using Linux, please provide the hardware information some other way.
Norman Geist.
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Siri Søndergaard
Sent: Tuesday, February 26, 2013 01:00
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: LES very slow
Hi
I'm trying to run LES on a system of ~30,000 atoms. I'm using 20 copies of each of two dyes attached to DNA. The problem is that when I extend the simulation to more than one CPU, the performance does not scale accordingly. Going from one to 12 CPUs only reduces the simulation time from ~9 days to ~4 days per ns. Does anybody know how to solve this?
Best regards, Siri