From: James Starlight (jmsstarlight_at_gmail.com)
Date: Wed Nov 06 2013 - 23:32:18 CST
I've come to the conclusion that using 2 GPUs simultaneously gives me the
same performance as 1 GPU, e.g.

charmrun +p12 ++runscript ./runscript.sh namd2 +idlepoll +devices 0
./md.conf >> ./output/log_1gpus_12proc

charmrun +p12 +ppn6 ++runscript ./runscript.sh namd2 +idlepoll +devices
0,1 ./md.conf >> ./output/log_2gpus_12proc_ppn2

Could this be due to the small number of CPU cores, or is additional RAM
needed (this system has 32 GB)? Or maybe some extra options are needed in
the config?
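
A quick sanity check (a sketch assuming the log path from the second
command above; the exact wording may vary between NAMD versions, but the
CUDA builds normally report which device each process attaches to):

grep -i "cuda device" ./output/log_2gpus_12proc_ppn2

Both device 0 and device 1 should show up there; if only one does, the
second GPU was never actually used.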
James
2013/11/4 Norman Geist <norman.geist_at_uni-greifswald.de>
> Yes, but notice that using virtual cores (Intel Hyperthreading - HT)
> usually comes with no or negative speedup. So assuming 6 instead of 12
> cores might be better. The ratio between processes and threads should be
> benchmarked.
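>
> A minimal sketch of such a benchmark (the paths and config file are
> placeholders, and it assumes the network build plus the runscript method
> described further down in the quoted messages):
>
> for ppn in 2 3 6; do
>   charmrun +p6 +ppn$ppn ++runscript ./runscript.sh namd2 +idlepoll \
>     +devices 0,1 ./md.conf >> ./output/log_ppn$ppn
> done
>
> Then compare the reported speed (days/ns) across the resulting logs.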
>
>
>
> Norman Geist.
>
>
>
> *From:* James Starlight [mailto:jmsstarlight_at_gmail.com]
> *Sent:* Monday, November 4, 2013 11:43
>
> *To:* Norman Geist; Namd Mailing List
> *Subject:* Re: namd-l: Two GPU-based workstation
>
>
>
> Norman,
>
> In the case of my system I've noticed a simple pattern: using a higher
> number of CPUs (with both GPUs active) gives higher performance.
>
> My 6-core i7 is recognized as 12 logical cores in Debian. According to
> your suggestions, the best parallelization on my workstation, without any
> connection to other nodes, would be:
>
> charmrun +p12 +ppn6 ++runscript ./runscript.sh namd2 +idlepoll +devices
> 0,1 ./md.conf >> ./output/log_2gpus_12proc_ppn2
>
> wouldn't it?
>
> James
>
>
>
> 2013/11/4 Norman Geist <norman.geist_at_uni-greifswald.de>
>
>
>
> *From:* James Starlight [mailto:jmsstarlight_at_gmail.com]
>
> *Sent:* Monday, November 4, 2013 08:52
>
> *To:* Norman Geist; Namd Mailing List
>
>
> *Subject:* Re: namd-l: Two GPU-based workstation
>
>
>
> Norman,
>
>
>
> James,
>
> thanks for the suggestions! As I've noticed, the NAMD directory also
> contains libcudart.so.4, the same as I've found in the VMD directory,
> which corresponds to the 4.x version of the CUDA toolkit (I have CUDA 5.0
> installed). Could this be a source of conflicts between older and newer
> NVIDIA drivers?
>
> Usually not, and it's recommended to use the libcudart shipped with NAMD
> anyway.
>
> What are the other advantages of running NAMD via charmrun? Is it
> possible to show the time remaining until the end of the simulation?
>
> To understand this, you need to know that there are two main ways
> parallelism can be implemented in software. The first is "shared memory",
> the second is "distributed memory". Shared memory parallelism, also called
> "threading", usually uses one single process that spreads its work over
> multiple CPU cores with threads (top shows one process using a multiple of
> 100 %). Usually it just runs mostly independent loop iterations on
> different cores to speed up the software, instead of running them
> serially. Distributed memory parallelism uses multiple processes, each
> doing a predefined part of the work in parallel while exchanging
> information over a network protocol. So if you have a multicore-only build
> of namd, without network support, you cannot run across multiple machines,
> as they do not share the same memory. Both parallelization methods have
> advantages and disadvantages. Luckily, namd has two layers of parallelism
> using both methods. Usually distributed memory is faster for most
> applications, but there can be sweet spots on various platforms using a
> mixture of both, for example running one process per CPU socket, threading
> over all of its cores, but that's just theory.
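>
> As a rough one-node illustration of the two modes (the exact binary
> depends on which NAMD build is installed, multicore or net; treat these as
> sketches rather than exact commands):
>
> # shared memory only: one process, 6 worker threads (multicore build)
> namd2 +p6 +idlepoll your.config
>
> # distributed memory: 6 separate communicating processes (net build)
> charmrun +p6 ++local namd2 +idlepoll your.config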
>
> I didn't notice that you are just using the multicore version. Possibly
> running a dedicated process per GPU will come with an advantage in speed
> and is worth trying. You will need a build with network support. As long
> as you stay on one node, you can simply pass ++local to charmrun. For
> multiple nodes you will need passwordless ssh login between the nodes and
> the mentioned runscript method, as sketched below.
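>
> A minimal multi-node sketch (the hostnames are placeholders; charmrun
> reads them from a plain-text nodelist file):
>
> # nodelist file, e.g. ./nodelist:
> #   group main ++shell ssh
> #     host node1
> #     host node2
> charmrun +p12 ++nodelist ./nodelist ++runscript ./runscript.sh namd2
> +idlepoll +devices 0,1 ./md.conf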
>
>
>
> So for one node, something like the following should show 2 processes,
> threading over the rest of the cores:
>
> charmrun +p6 +ppn3 ++local namd2 +idlepoll +devices 0,1 your.config
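>
> To confirm that layout, a rough check (not NAMD-specific) that counts the
> threads per namd2 process should show two processes, each with several
> threads:
>
> ps -eLf | grep [n]amd2 | awk '{print $2}' | sort | uniq -c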
>
> Norman Geist
>
> James
>
>
>
>
>
> 2013/11/4 Norman Geist <norman.geist_at_uni-greifswald.de>
>
> The log file is simply what you see on the screen when starting namd the
> way you obviously did. To get a file of it, append >> my.log 2>> my.errors
> to the command to redirect stdout and stderr.
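>
> For example (the config and file names are just placeholders):
>
> namd2 +idlepoll +p6 +devices 0,1 ./md.conf >> my.log 2>> my.errors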
>
> GPU load monitoring is only enabled on Quadro and Tesla series cards.
>
> The easiest option to bypass the libcudart stuff is to use the charmrun
> ++runscript option. Save the following lines to a file called
> runscript.sh in your namd folder and make it executable with "chmod +x
> runscript.sh", adapting the path to your namd location.
>
> #!/bin/bash
> # put the NAMD folder first so its bundled libcudart is picked up
> export LD_LIBRARY_PATH=/your/namd/folder/:$LD_LIBRARY_PATH
> # run the command charmrun passes in (namd2 plus its arguments)
> $*
>
> Now always start namd like "/your/namd/folder/charmrun +p6 ++runscript
> /your/namd/folder/runscript.sh /your/namd/folder/namd2 +idlepoll +devices
> 0,1 your.config >> log 2>> errors"
>
> Norman Geist.
>
> > -----Original Message-----
>
> > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> > behalf of James Starlight
>
> > Sent: Sunday, November 3, 2013 15:04
>
> > To: Ajasja Ljubetič; Namd Mailing List
>
> > Subject: Re: namd-l: Two GPU-based workstation
> >
>
> > It's strange, but no log file is found in the work directory, and I
> > could not find a suitable option in the conf file for saving the log. :)
> > All I have is the 645000 steps computed during 6 hours of simulation
> > (a 100k-atom protein in explicit water).
> > I'd also be thankful for a list of all possible switches that can
> > accompany the namd2 terminal command
> >
> > James
> >
> >
> >
> >
> > 2013/11/3 Ajasja Ljubetič <ajasja.ljubetic_at_gmail.com>
> >
> > >
> > > On 3 November 2013 09:38, James Starlight <jmsstarlight_at_gmail.com>
> > wrote:
> > >
> > >> updating
> > >>
> > >> using namd2 +idlepoll +p4 +devices 0,1 ./restart.conf
> > >> I've launched the simulation on both GPUs (according to thermal
> > >> monitoring in nvidia-settings) but only half of the CPUs were fully
> > >> loaded.
> > >>
> > > Yes, naturally. Look up what the +p4 switch does. (Also read up on
> > > hyperthreading.)
> > >
> > >> By the way, how can I monitor real GPU load as well as NAMD
> > >> performance (in ns/day or GFLOPS)?
> > >>
> > >
> > > Try looking in the namd log file for the ns/day speed.
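> > >
> > > A rough sketch (assuming output was redirected to a file named my.log;
> > > the exact wording can vary between NAMD versions):
> > >
> > > grep "Benchmark time" my.log        # reported speed in days/ns
> > > grep "TIMING:" my.log | tail -n 1   # latest step timing, including an
> > >                                     # estimate of the hours remaining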
> > >
> > > And out of interest do report the ns/day of
> > >
> > > namd2 +idlepoll +p6 +devices 0 ./restart.conf
> > > vs
> > > namd2 +idlepoll +p6 +devices 0,1 ./restart.conf
> > >
> > > Regards,
> > > Ajasja
> > >
> > >
> > >>
> > >>
> > >> James
> > >>
> > >>
> > >> 2013/11/1 James Starlight <jmsstarlight_at_gmail.com>
> > >>
> > >>> OK. I'll try to run some simulations with this configuration. The
> > >>> main issue I might face is a possible conflict between that older
> > >>> CUDA library (used by VMD) and the newer development driver (version
> > >>> 5.5) which comes with the installed cuda-5.5.
> > >>>
> > >>> By the way, how can I use both GPUs simultaneously? Just use the
> > >>> command below?
> > >>>
> > >>> namd2 +idlepoll +p4 +devices 0,1 ./restart.conf
> > >>>
> > >>> where 0 and 1 are the IDs of my GPUs? Are there additional options
> > >>> for synchronizing the simulations in dual-GPU mode?
> > >>>
> > >>> James
> > >>>
> > >>>
> > >>> 2013/10/31 Aron Broom <broomsday_at_gmail.com>
> > >>>
> > >>>> Don't replace anything; just point to the version of the library
> > >>>> in your NAMD directory, as you did. It should work fine.
> > >>>>
> > >>>>
> > >>>> On Thu, Oct 31, 2013 at 1:24 PM, James Starlight <
> > >>>> jmsstarlight_at_gmail.com> wrote:
> > >>>>
> > >>>>> Dear Namd users,
> > >>>>>
> > >>>>> I've built my new workstation, consisting of two Titans with an i7
> > >>>>> (Linux recognizes it as a 12-core processor, but it actually has 6
> > >>>>> physical cores)
>
>
>
>
>
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:21:53 CST