CUDA error in cuda_check_local_progress

From: Abhishek TYAGI (atyagiaa_at_connect.ust.hk)
Date: Thu Apr 17 2014 - 02:41:11 CDT

Hi,

I am running a simulation of a graphene and DNA system. On my CPU it runs without error, but on a GPU cluster (NVIDIA, CUDA), using the NAMD build available on the website (NAMD_2.9_Linux-x86_64-multicore-CUDA.tar.gz), the following error appears every time. I have tried changing the timestep, the frequencies and other settings, but I really don't understand what to do in this case.

I run the following command for minimization, but it fails every time:

% charmrun namd2 +idlepoll +p4 eq1.namd > eq1.log &

------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 0 (gpu10 device 0): unspecified launch failure

Charm++ fatal error:
FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 0 (gpu10 device 0): unspecified launch failure

The eq1.namd conf file is as follows:

#############################################################
## JOB DESCRIPTION ##
#############################################################

# Minimization and Equilibration of
# COMMENT ON YOUR SYSTEM HERE

#############################################################
## ADJUSTABLE PARAMETERS ##
#############################################################

structure ionized.psf
coordinates ionized.pdb

set temperature 298
set outputname eq1

firsttimestep 0

#############################################################
## SIMULATION PARAMETERS ##
#############################################################

# Input
paraTypeCharmm on
parameters par_all27_na.prm
parameters par_graphene.prm
temperature $temperature

# Force-Field Parameters
exclude scaled1-4
1-4scaling 1.0
cutoff 12.
switching on
switchdist 10.
pairlistdist 13.5

# Integrator Parameters
timestep 0.5
rigidBonds all
nonbondedFreq 2
fullElectFrequency 4
stepspercycle 10

# Constant Temperature Control
langevin off
langevinDamping 5
langevinTemp $temperature
langevinHydrogen off

# Output
outputName $outputname

restartfreq 500 ;# 500 steps = every 0.25 ps (at timestep 0.5 fs)
dcdfreq 300
outputEnergies 100
outputPressure 100

#############################################################
## PBC PARAMETERS ##
#############################################################

# Periodic Boundary Conditions
cellBasisVector1 40.0 0.0 0.0
cellBasisVector2 0.0 40.0 0.0
cellBasisVector3 0.0 0.0 30.0
cellOrigin 0.0 0.0 0.0

#############################################################
## EXECUTION SCRIPT ##
#############################################################

# Minimization
minimize 100000
reinitvels $temperature

run 50000
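
For reference, the step-based intervals in the configuration above can be converted to simulated time with a short calculation. The following is only a minimal Python sketch: the values are copied from the eq1.namd listing, and the helper name steps_to_ps is hypothetical (it is not part of NAMD).

```python
# Sketch: convert NAMD step-based intervals to simulated time.
# Values are copied from the eq1.namd listing above; steps_to_ps is a
# hypothetical helper for illustration, not a NAMD function.

TIMESTEP_FS = 0.5  # "timestep 0.5" in the config (femtoseconds per step)


def steps_to_ps(steps, timestep_fs=TIMESTEP_FS):
    """Return the simulated time in picoseconds covered by `steps` steps."""
    return steps * timestep_fs / 1000.0


# Step counts taken from the Output and Execution Script sections.
intervals = {
    "restartfreq": 500,
    "dcdfreq": 300,
    "outputEnergies": 100,
    "outputPressure": 100,
    "run": 50000,
}

for name, steps in intervals.items():
    print(f"{name:15s} {steps:6d} steps = {steps_to_ps(steps):g} ps")
```

With a 0.5 fs timestep, a restart file every 500 steps corresponds to 0.25 ps of simulated time, and the 50000-step run covers 25 ps.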

Please suggest how I can resolve this issue.

Thanks in advance

Abhishek
