From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Wed Jun 11 2014 - 02:42:48 CDT
On Wed, Jun 11, 2014 at 3:36 AM, Vlastimil Zíma <zima_at_karlov.mff.cuni.cz> wrote:
> Hi,
>
> I occasionally run into a segmentation fault in my simulations, but recently
> it has become more frequent. I use NAMD 2.9, which I downloaded as a
> binary from the NAMD site.
>
> So far it happens on two of my machines, so I don't suspect a hardware
> problem. Both machines run Debian wheezy, but with different GPU cards.
> I have tried several versions of the NVIDIA drivers.
>
have you considered that the seasons have changed, so temperatures are
rising and cooling the computers is becoming a bigger problem? (basic
thermodynamics teaches us that the amount of heat transferred across an
interface depends on the temperature difference.)
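As a rough illustration of that point, assume the GPU cooler behaves according to Newton's law of cooling with a fixed heat-transfer coefficient h and contact area A (a simplification; real heat sinks and fans are more complex). At steady state the cooler must remove the GPU's power P:

$$
P = \dot{Q} = h A \left(T_{\mathrm{GPU}} - T_{\mathrm{air}}\right)
\quad\Longrightarrow\quad
T_{\mathrm{GPU}} = T_{\mathrm{air}} + \frac{P}{h A}
$$

With h, A and P unchanged, every degree the intake air warms up raises the GPU temperature by the same degree, which is why a card that was fine in winter can run too hot in summer.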
> The simulation usually runs for 10 to 25 hours before it crashes.
>
both that and the fact that the crashes have become more frequent recently
support my hypothesis. check the temperature of the GPUs. do they support
ECC memory? if yes, try turning it on and monitor the error logs.
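A minimal monitoring sketch for the check above, assuming the NVIDIA driver's `nvidia-smi` tool is installed. The query field names (`temperature.gpu`, `ecc.mode.current`, `ecc.errors.uncorrected.volatile.total`) should be verified against `nvidia-smi --help-query-gpu` for your driver version, and the 85 C threshold (`TEMP_LIMIT_C`) is a hypothetical value, not a vendor limit. The helper names are illustrative only.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Fields requested from nvidia-smi; on GPUs without ECC the ecc.* fields
# come back as non-numeric strings such as "[N/A]".
QUERY_FIELDS = "index,temperature.gpu,ecc.mode.current,ecc.errors.uncorrected.volatile.total"

# Hypothetical warning threshold in degrees Celsius; adjust for your card.
TEMP_LIMIT_C = 85


@dataclass
class GpuStatus:
    index: int
    temperature_c: Optional[int]
    ecc_mode: str
    uncorrected_ecc_errors: Optional[int]


def _to_int(field: str) -> Optional[int]:
    """Return the field as an int, or None for values like '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_status(csv_text: str) -> List[GpuStatus]:
    """Parse 'nvidia-smi --format=csv,noheader,nounits' output, one GPU per line."""
    statuses = []
    for line in csv_text.strip().splitlines():
        index, temp, ecc_mode, ecc_errors = [f.strip() for f in line.split(",")]
        statuses.append(GpuStatus(int(index), _to_int(temp), ecc_mode, _to_int(ecc_errors)))
    return statuses


def warnings_for(statuses: List[GpuStatus]) -> List[str]:
    """Flag GPUs that run hot or have logged uncorrectable ECC errors."""
    messages = []
    for s in statuses:
        if s.temperature_c is not None and s.temperature_c >= TEMP_LIMIT_C:
            messages.append(f"GPU {s.index}: temperature {s.temperature_c} C")
        if s.uncorrected_ecc_errors:
            messages.append(f"GPU {s.index}: {s.uncorrected_ecc_errors} uncorrected ECC errors")
    return messages


def query_gpus() -> List[GpuStatus]:
    """Run nvidia-smi once and parse its output (needs the NVIDIA driver tools)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_status(out)
```

Calling `warnings_for(query_gpus())` every minute or so while a long run is in progress, and logging the result with a timestamp, makes it possible to line up a crash with the GPU temperature and ECC counters shortly before it. Keeping the parsing separate from the `nvidia-smi` call lets the parsing logic be checked without a GPU present.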
axel.
>
> Are there any actions I should take in order to debug the segfault?
>
> Regards
> Vlastik
>
--
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:27 CST