Re: pre compiled charmm-6.8.2 for namd2.13 nightly version compilation for multiple GPU node simulations

From: Jim Phillips (jim_at_ks.uiuc.edu)
Date: Wed Jan 30 2019 - 16:34:39 CST

A few suggestions:

1) Run ldd verbs-linux-x86_64-smp/tests/charm++/simplearrayhello so you
can see what shared libraries it needs.

2) Test the netlrts version to be sure your problem is not related to the
InfiniBand verbs library.

3) Show the actual command you are using to run and use ++verbose.
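
Suggestions 1 and 3 can be sketched in shell as below. This is only a sketch: it assumes the test program was built with make, so that hello and charmrun exist in the simplearrayhello directory. The helper names missing_libs, check_libs, and run_hello_verbose are hypothetical and not part of Charm++.

```shell
#!/bin/sh
# Sketch only. The helper names below are hypothetical, not part of Charm++.

# Read ldd output on stdin and print the names of libraries that the
# dynamic loader could not resolve ("=> not found" lines).
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Suggestion 1: list unresolved shared libraries for a given binary.
check_libs() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "no executable at $bin" >&2
        return 1
    fi
    ldd "$bin" | missing_libs
}

# Suggestion 3: run the test program with ++verbose so charmrun prints
# what it is doing while launching. $1 is the directory assumed to
# contain both charmrun and hello.
run_hello_verbose() {
    ( cd "$1" && ./charmrun ++verbose +p2 ./hello )
}
```

For example, check_libs verbs-linux-x86_64-smp/tests/charm++/simplearrayhello/hello prints nothing when every library resolves. For suggestion 2, the same build line with netlrts-linux-x86_64 in place of verbs-linux-x86_64 gives a build that does not use the verbs library.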

Jim

On Tue, 29 Jan 2019, Aravinda Munasinghe wrote:

> Hi Josh,
> Thank you very much for your reply. There was no specific reason for using
> the Intel compilers. As per your suggestion, I tried building without icc (and
> also with iccstatic), and charmrun still fails to run. Compilation does
> complete with
>
> charm++ built successfully.
> Next, try out a sample program like
> verbs-linux-x86_64-smp/tests/charm++/simplearrayhello
>
> But when I try to run the hello executable with charmrun, I get the
> following error:
>
> Charmrun> remote shell (localhost:0) started
> Charmrun> node programs all started
> Charmrun remote shell(localhost.0)> remote responding...
> Charmrun remote shell(localhost.0)> starting node-program...
> Charmrun remote shell(localhost.0)> remote shell phase successful.
> Charmrun> Waiting for 0-th client to connect.
> Charmrun> error attaching to node 'localhost':
> Timeout waiting for node-program to connect
>
> This is the same error I have been getting all along when I compile it
> myself. The one thing I cannot figure out is why the precompiled version
> works perfectly, but a build from scratch never works.
> Any thoughts on this?
> Best,
> AM
>
>
> On Tue, Jan 29, 2019 at 12:42 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
> wrote:
>
>> Hi Aravinda,
>>
>> Any particular reason you want to use the intel compilers? Since your goal
>> is to use CUDA anyway, and the integration between the CUDA toolkit and the
>> intel compilers tends to be hit or miss depending on the machine, I'd try
>> the GNU compilers first (just drop the icc from the build line). If you can
>> get that working, then you can spend a bit more time debugging exactly what
>> your error messages mean. It could just be as simple as using iccstatic
>> instead of icc, so that the libraries are bundled into the executable at
>> compile time, which would solve your LD_LIBRARY_PATH issues.
>>
>> -Josh
>>
>>
>>
>> On 2019-01-29 09:42:41-07:00 owner-namd-l_at_ks.uiuc.edu wrote:
>>
>> Dear NAMD users and developers,
>> I have recently attempted to compile the namd2.13 nightly build to run
>> multiple GPU node replica exchange simulations using the REST2 methodology.
>> First, I was able to run the current version of the namd 2.13
>> Linux-x86_64-verbs-smp-CUDA (Multi-copy algorithms on InfiniBand) binaries
>> with charmrun on our university cluster using a multiple node/GPU setup (with
>> slurm).
>> Then, I tried compiling the namd 2.13 nightly version to use REST2 (since the
>> current version has a bug with selecting solute atom IDs, as described here -
>> https://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2018-2019/1424.html
>> ), following the information on the NVIDIA site as well as what is mentioned
>> in the release notes. But I failed miserably, as several others have (as I
>> can see from the mailing list thread). Since the precompiled binaries of the
>> current version work perfectly, I cannot think of a reason why my attempts
>> failed other than some issue related to the library files and compilers I am
>> loading when building charm++ for a multiple-node GPU setup. I used the
>> following flags to build charm++:
>> ./build charm++ verbs-linux-x86_64 icc smp --with-production
>> I have used ifort and Intel/2018 compilers.
>> One thing I have noticed is that with the precompiled namd2.13 I did not
>> have to set LD_LIBRARY_PATH, but I had to do so when I compiled it
>> myself (otherwise I keep getting missing-library errors).
>> It would be a great help if any of you who have successfully compiled
>> multiple-node GPU namd 2.13 could share your charm++ 6.8.2 files along with
>> information on the compilers you used, so I could compile namd myself. Any
>> advice on how to solve this, or sharing precompiled namd2.13 binaries
>> for the nightly version, would also be highly appreciated.
>> Thank you,
>> Best,
>> --
>> Aravinda Munasinghe,
>>
>>
>
> --
> Aravinda Munasinghe,
>
