VMD-L Mailing List
From: Lenz Fiedler (l.fiedler_at_hzdr.de)
Date: Mon Apr 04 2022 - 10:45:52 CDT
- Next message: yjcoshc: "Re: VMD 1.9.4 crashes after running the "Merge Structures" plugin (with psfgen 2.0)"
- Previous message: John Stone: "Re: VMD 1.9.4 crashes after running the "Merge Structures" plugin (with psfgen 2.0)"
- In reply to: John Stone: "Re: VMD MPI error"
- Next in thread: John Stone: "Re: VMD MPI error"
- Reply: John Stone: "Re: VMD MPI error"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Hi John,
Thank you so much - the error was indeed caused by the Tachyon MPI build!
It was just as you described: I had compiled both VMD and the built-in
Tachyon with MPI enabled. After switching the latter to the serial build,
I don't get the crash anymore! :)
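In case it helps anyone finding this thread later, the only change was in
how the bundled Tachyon library is built before VMD is configured; roughly
like this (the directory and make target names are from my source tree and
may differ in other Tachyon versions, so please double-check them):

   # build the Tachyon library with threads only, i.e. without MPI,
   # and then point the VMD configure/Makefile at this build as before
   cd tachyon/unix
   make linux-64-thr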
Does this mean, then, that the rendering will be done serially on rank 0
only? I am trying to render an image (with an isosurface) from a very
large (9 GB) .cube file, and so far runs on 1, 2, and 4 nodes with 360 GB
of shared memory have all ended in a segmentation fault. I assume the
problem is memory related, because I can render smaller files just fine.

Also, thanks for the info regarding the threading - I will keep that in
mind!
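A minimal sketch of the per-rank thread restriction as I understand it,
assuming VMDFORCECPUCOUNT is the CPU-count override honored by this VMD
build and render.tcl is just a placeholder script (both still to be
verified against the documentation):

   # cap each of the 8 ranks at 6 worker threads when they share one node;
   # -x exports the variable to every Open MPI rank
   mpirun -np 8 -x VMDFORCECPUCOUNT=6 vmd -dispdev text -e render.tcl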
Kind regards,
Lenz
--
Lenz Fiedler, M. Sc.
PhD Candidate | Matter under Extreme Conditions

Tel.: +49 3581 37523 55
E-Mail: l.fiedler_at_hzdr.de
https://www.casus.science

CASUS - Center for Advanced Systems Understanding
Helmholtz-Zentrum Dresden-Rossendorf e.V. (HZDR)
Untermarkt 20
02826 Görlitz

Vorstand: Prof. Dr. Sebastian M. Schmidt, Dr. Diana Stiller
Vereinsregister: VR 1693 beim Amtsgericht Dresden

On 4/4/22 17:04, John Stone wrote:
> Hi,
>   The MPI bindings for VMD are really intended for multi-node runs
> rather than for dividing up the CPUs within a single node. The output
> you're seeing shows that VMD is counting 48 CPUs (hyperthreading, no doubt)
> for each MPI rank, even though they're all being launched on the same node.
> The existing VMD startup code doesn't automatically determine when sharing
> like this occurs, so it's just behaving the same way it would if you had
> launched the job on 8 completely separate cluster nodes. You can set some
> environment variables to restrict the number of shared memory threads
> VMD/Tachyon use if you really want to run all of your ranks on the same node.
>
> The warning you're getting from OpenMPI about multiple initialization
> is interesting. When you compiled VMD, you didn't compile both VMD
> and the built-in Tachyon with MPI enabled did you? If Tachyon is also
> trying to call MPI_Init() or MPI_Init_Thread() that might explain
> that particular error message. Have a look at that and make sure
> that (for now at least) you're not compiling the built-in Tachyon
> with MPI turned on, and let's see if we can rid you of the
> OpenMPI initialization errors+warnings.
>
> Best,
>   John Stone
>   vmd_at_ks.uiuc.edu
>
> On Mon, Apr 04, 2022 at 04:39:17PM +0200, Lenz Fiedler wrote:
>> Dear VMD users and developers,
>>
>> I am facing a problem in running VMD using MPI.
>>
>> I compiled VMD from source (alongside Tachyon, which I would like to
>> use for rendering). I had first checked everything in serial, there
>> it worked. Now, after parallel compilation, I struggle to run VMD.
>>
>> E.g. I am allocating 8 CPUs on a cluster node that has 24 CPUs in
>> total. Afterwards, I am trying to do:
>>
>> mpirun -np 8 vmd
>>
>> and I get this output:
>>
>> Info) VMD for LINUXAMD64, version 1.9.3 (April 4, 2022)
>> Info) http://www.ks.uiuc.edu/Research/vmd/
>> Info) Email questions and bug reports to vmd_at_ks.uiuc.edu
>> Info) Please include this reference in published work using VMD:
>> Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
>> Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
>> Info) -------------------------------------------------------------
>> Info) Initializing parallel VMD instances via MPI...
>> Info) Found 8 VMD MPI nodes containing a total of 384 CPUs and 0 GPUs:
>> Info)   0: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> Info)   1: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> Info)   2: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> Info)   3: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> Info)   4: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> Info)   5: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> Info)   6: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> Info)   7: 48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
>> --------------------------------------------------------------------------
>> Open MPI has detected that this process has attempted to initialize
>> MPI (via MPI_INIT or MPI_INIT_THREAD) more than once. This is
>> erroneous.
>> --------------------------------------------------------------------------
>> [gv002:139339] *** An error occurred in MPI_Init
>> [gv002:139339] *** reported by process [530644993,2]
>> [gv002:139339] *** on a NULL communicator
>> [gv002:139339] *** Unknown error
>> [gv002:139339] *** MPI_ERRORS_ARE_FATAL (processes in this
>> communicator will now abort,
>> [gv002:139339] *** and potentially your MPI job)
>>
>> From the output it seems to me that each of the 8 MPI ranks assumes
>> it is rank zero? At least the fact that each rank gives 48 CPUs
>> (24*2 I assume?) makes me believe that.
>>
>> Could anyone give me a hint on what I might be doing wrong? The
>> OpenMPI installation I am using has been used for many other
>> programs on this cluster, so I would assume it is working correctly.
>>
>> Kind regards,
>>
>> Lenz
>>
>> --
>> Lenz Fiedler, M. Sc.
>> PhD Candidate | Matter under Extreme Conditions
>>
>> Tel.: +49 3581 37523 55
>> E-Mail: l.fiedler_at_hzdr.de
>> https://www.casus.science
>>
>> CASUS - Center for Advanced Systems Understanding
>> Helmholtz-Zentrum Dresden-Rossendorf e.V. (HZDR)
>> Untermarkt 20
>> 02826 Görlitz
>>
>> Vorstand: Prof. Dr. Sebastian M. Schmidt, Dr. Diana Stiller
>> Vereinsregister: VR 1693 beim Amtsgericht Dresden