Re: How to solve the segmentation faults in compiling NAMD to build and test the Charm++/Converse library (MPI version)?

From: Brian Radak (brian.radak_at_gmail.com)
Date: Fri May 04 2018 - 08:05:21 CDT

I'm not a Charm++ expert, but this looks like a problem in the Charm++
build rather than in NAMD itself. I almost exclusively use the smartbuild.pl
script, which has been fairly robust for me, so long as you tell it to use
the MPI installation that it autodetects (that's not the default). Maybe
give this a try?
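
Before rerunning the build, it may help to confirm which MPI compiler
wrapper and launcher the shell actually resolves, since the build and the
test run should use the same MPI installation. The following is only a
minimal sketch: describe_tool is a hypothetical helper name (not part of
Charm++ or NAMD), and the smartbuild.pl location in the trailing comment is
an assumption about the charm-6.7.1 source tree.

```shell
#!/bin/sh
# Sketch: report where the MPI compiler wrapper and launcher come from,
# so a wrapper from one MPI and a launcher from another is easy to spot.
# describe_tool is a hypothetical helper, not part of Charm++ or NAMD.
describe_tool() {
    tool=$1
    # command -v prints the resolved path and fails if the tool is missing.
    path=$(command -v "$tool" 2>/dev/null) || {
        echo "$tool: not found on PATH"
        return 1
    }
    echo "$tool: $path"
}

# Check the two tools used in the build and test commands.
for tool in mpicxx mpiexec; do
    describe_tool "$tool" || true
done

# Assumption: smartbuild.pl sits at the top of the Charm++ source tree and
# is run interactively from there, e.g.
#   cd charm-6.7.1 && ./smartbuild.pl
```

If the two paths point into different MPI installations, that mismatch is
worth resolving before anything else.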

On Fri, May 4, 2018 at 8:07 AM, LIAO Mingling <Mingling1949_at_hotmail.com>
wrote:

> Dear all,
>
> I am trying to compile the MPI version of NAMD from source on the
> upgraded cluster (x86_64 GNU/Linux).
>
> When I typed the command to build and test the Charm++/Converse library:
>
> env MPICXX=mpicxx ./build charm++ mpi-linux-x86_64 --with-production
>
> I got the following error. Does anyone know how to solve this problem?
> Thanks in advance!
>
>
>
> *ERROR:*
>
> ###########
>
> [1525432978.493880] [login01:30113:0] sys.c:744 MXM WARN
> Conflicting CPU frequencies detected, using: 2000.00
>
> [1525432978.504097] [login01:30114:0] sys.c:744 MXM WARN
> Conflicting CPU frequencies detected, using: 2000.00
>
> [1525432978.507528] [login01:30111:0] sys.c:744 MXM WARN
> Conflicting CPU frequencies detected, using: 2000.00
>
> [1525432978.514122] [login01:30112:0] sys.c:744 MXM WARN
> Conflicting CPU frequencies detected, using: 2000.00
>
> [login01:30113:0] Caught signal 11 (Segmentation fault)
>
> [login01:30114:0] Caught signal 11 (Segmentation fault)
>
> [login01:30111:0] Caught signal 11 (Segmentation fault)
>
> [login01:30112:0] Caught signal 11 (Segmentation fault)
>
> ==== backtrace ====
>
> 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
>
> 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
>
> 4 0x0000000000035270 killpg() ??:0
>
> 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
>
> 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
>
> 7 0x00000000006bff0d ConverseInit() ??:0
>
> 8 0x0000000000591799 main() ??:0
>
> 9 0x0000000000021c05 __libc_start_main() ??:0
>
> 10 0x00000000004c66b9 _start() ??:0
>
> ===================
>
> ==== backtrace ====
>
> 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
>
> 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
>
> 4 0x0000000000035270 killpg() ??:0
>
> 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
>
> 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
>
> 7 0x00000000006bff0d ConverseInit() ??:0
>
> 8 0x0000000000591799 main() ??:0
>
> 9 0x0000000000021c05 __libc_start_main() ??:0
>
> 10 0x00000000004c66b9 _start() ??:0
>
> ===================
>
> ==== backtrace ====
>
> 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
>
> 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
>
> 4 0x0000000000035270 killpg() ??:0
>
> 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
>
> 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
>
> 7 0x00000000006bff0d ConverseInit() ??:0
>
> 8 0x0000000000591799 main() ??:0
>
> 9 0x0000000000021c05 __libc_start_main() ??:0
>
> 10 0x00000000004c66b9 _start() ??:0
>
> ===================
>
> ==== backtrace ====
>
> 2 0x00000000000687bc mxm_handle_error() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:641
>
> 3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
>
> 4 0x0000000000035270 killpg() ??:0
>
> 5 0x000000000002ccb0 ompi_comm_dup_with_info() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/communicator/comm.c:976
>
> 6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
>
> 7 0x00000000006bff0d ConverseInit() ??:0
>
> 8 0x0000000000591799 main() ??:0
>
> 9 0x0000000000021c05 __libc_start_main() ??:0
>
> 10 0x00000000004c66b9 _start() ??:0
>
> ===================
>
> #################
>
>
>
>
>
> *COMMANDS:*
>
> ###################
>
> $ source /etc/profile.d/modules.sh
>
> $ module load impi
>
> $ module load intel
>
> $ tar xzf NAMD_2.12_Source.tar.gz
>
> $ cd NAMD_2.12_Source
>
>
>
> $ wget http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux-x86_64.tar.gz
>
> $ tar xzf fftw-linux-x86_64.tar.gz
>
> $ mv linux-x86_64 fftw
>
> $ wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64.tar.gz
>
> $ wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
>
> $ tar xzf tcl8.5.9-linux-x86_64.tar.gz
>
> $ tar xzf tcl8.5.9-linux-x86_64-threaded.tar.gz
>
> $ mv tcl8.5.9-linux-x86_64 tcl
>
> $ mv tcl8.5.9-linux-x86_64-threaded tcl-threaded
>
>
>
> $ tar xf charm-6.7.1.tar
>
> $ cd charm-6.7.1
>
> $ env MPICXX=mpicxx ./build charm++ mpi-linux-x86_64 --with-production
>
> $ cd mpi-linux-x86_64/tests/charm++/megatest
>
> $ make pgm
>
> $ mpiexec -n 4 ./pgm
>
> ############################
>
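
One thing the log itself can show is which MPI library the test binary was
running against. The sketch below pulls the distinct library build
directories out of a backtrace; libs_in_trace is a hypothetical helper name,
and its pattern only covers the two prefixes (openmpi, mxm) that appear in
the backtrace quoted above. On that backtrace it reports an Open MPI build,
whereas the quoted commands load a module named impi. If impi is Intel MPI on
this cluster (an assumption, the thread does not say), the wrapper or
launcher may be coming from a different MPI than intended.

```shell
#!/bin/sh
# Sketch: list the distinct library build directories named in a backtrace.
# libs_in_trace is a hypothetical helper; it reads the log on stdin.
libs_in_trace() {
    # Match tokens such as "openmpi-3.0.0rc6" or "mxm-3.6.3104". Only the
    # two prefixes seen in the quoted backtrace are covered here.
    grep -oE '(openmpi|mxm)-[0-9][^/ ]*' | sort -u
}

# Example using two frames copied from the quoted backtrace.
libs_in_trace <<'EOF'
3 0x0000000000068d0c mxm_error_signal_handler() /var/tmp/OFED_topdir/BUILD/mxm-3.6.3104/src/mxm/util/debug/debug.c:616
6 0x000000000005cba6 PMPI_Comm_dup() /var/tmp/OFED_topdir/BUILD/openmpi-3.0.0rc6/ompi/mpi/c/profile/pcomm_dup.c:63
EOF
```

The same function can be fed a saved log, for example
libs_in_trace < megatest.log, where megatest.log is a placeholder file name.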
