Re: NAMD_2.9_Linux-x86_64-multicore-CUDA Segfaults

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Wed Jun 11 2014 - 02:42:48 CDT

On Wed, Jun 11, 2014 at 3:36 AM, Vlastimil Zíma <zima_at_karlov.mff.cuni.cz>
wrote:

> Hi,
>
> I occasionally run into segmentation faults in my simulations, but recently
> they have become more frequent. I use NAMD 2.9, which I downloaded from the
> NAMD site as a binary.
>
> It has shown up on two of my machines so far, so I don't suspect a
> hardware problem. Both machines are running Debian wheezy, but with
> different GPU cards. I have tried several versions of the nvidia drivers.
>

have you considered that the seasons have changed, so temperatures are
rising and cooling of computers is becoming a bigger problem? (basic
thermodynamics teaches us that the amount of heat transferred over an
interface depends on the temperature difference.)

> The simulation usually runs for 10 to 25 hours before it crashes.
>

that, and the fact that the crashes are becoming more frequent now,
supports my hypothesis. check the temperature on the GPUs. do they have
ECC? if yes, try turning it on and monitor the error logs.
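
One way to do that check, as a rough sketch only: poll nvidia-smi and flag hot GPUs or non-zero ECC counters. This assumes a driver whose nvidia-smi supports --query-gpu with the field names below (see `nvidia-smi --help-query-gpu`). The function names and the 85 C threshold are made up for illustration, not taken from NAMD or NVIDIA documentation.

```python
import csv
import subprocess

# Field names as listed by `nvidia-smi --help-query-gpu`. Assumption: the
# installed driver supports them; unsupported ones come back as e.g. "[N/A]".
FIELDS = [
    "index",
    "temperature.gpu",
    "ecc.mode.current",
    "ecc.errors.corrected.volatile.total",
    "ecc.errors.uncorrected.volatile.total",
]

TEMP_LIMIT_C = 85  # hypothetical alert threshold; check your card's spec


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    lines = text.strip().splitlines()
    for record in csv.reader(lines, skipinitialspace=True):
        if record:
            rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def warnings_for(row):
    """Return human-readable warnings for one parsed GPU row."""
    msgs = []
    temp = row["temperature.gpu"]
    if temp.isdigit() and int(temp) >= TEMP_LIMIT_C:
        msgs.append("GPU %s is at %s C" % (row["index"], temp))
    for key in FIELDS[3:]:
        count = row[key]
        # Non-numeric values mean ECC is unsupported or disabled: skip them.
        if count.isdigit() and int(count) > 0:
            msgs.append("GPU %s: %s = %s" % (row["index"], key, count))
    return msgs


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_gpu_csv(out)
```

calling query_gpus() once a minute from a small loop, and writing the rows plus any warnings_for() messages to a timestamped log, would let you compare the readings just before a crash with those from a healthy run.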

axel.

>
> Are there any actions I should take in order to debug the segfault?
>
> Regards
> Vlastik
>

-- 
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:27 CST