From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Sat Apr 06 2013 - 17:17:13 CDT
Hello:
Could you please help me understand which GPU card is which on a
Linux box with six processors
running NAMD 2.9?
The MD simulation was allocated (according to the log file) to the two GPU cards as follows:
Pe 1 physical rank 1 will use CUDA device of pe 2
Pe 4 physical rank 4 binding to CUDA device 1 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Pe 3 physical rank 3 will use CUDA device of pe 4
Pe 5 physical rank 5 will use CUDA device of pe 4
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 680' Mem: 2047MB Rev: 3.0
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 will use CUDA device of pe 2
(where gig64 is the machine name)
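One way to read the allocation above is to resolve each "will use CUDA device of pe N" line to the device that pe N is bound to. The following is a minimal Python sketch, not part of NAMD; the name pe_to_device and the regular expressions are assumptions based only on the log lines quoted in this message.

```python
import re

# Sample log text copied from the lines quoted above.
LOG = """\
Pe 1 physical rank 1 will use CUDA device of pe 2
Pe 4 physical rank 4 binding to CUDA device 1 on gig64: 'GeForce GTX 680'
Pe 3 physical rank 3 will use CUDA device of pe 4
Pe 5 physical rank 5 will use CUDA device of pe 4
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 680'
Pe 0 physical rank 0 will use CUDA device of pe 2
"""

# Patterns are assumptions derived from the quoted log format only.
BIND = re.compile(r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+)")
SHARE = re.compile(r"Pe (\d+) physical rank \d+ will use CUDA device of pe (\d+)")


def pe_to_device(log_text):
    """Return {pe: cuda_device_index} for every Pe found in the log."""
    bound = {}   # Pes that bind to a device directly
    shared = {}  # Pes that reuse the device of another Pe
    for line in log_text.splitlines():
        m = BIND.search(line)
        if m:
            bound[int(m.group(1))] = int(m.group(2))
            continue
        m = SHARE.search(line)
        if m:
            shared[int(m.group(1))] = int(m.group(2))
    mapping = dict(bound)
    for pe, owner in shared.items():
        if owner in bound:  # skip Pes whose owner never appeared
            mapping[pe] = bound[owner]
    return mapping


for pe, dev in sorted(pe_to_device(LOG).items()):
    print(f"Pe {pe} -> CUDA device {dev}")
```

For the log above this assigns Pes 0, 1 and 2 to CUDA device 0 and Pes 3, 4 and 5 to CUDA device 1.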
The card IDs are:
nvidia-smi -L
GPU0 UID 600f64d0-2996-8e71-dca8-8d66f139f772
GPU1 UID 704bb625-95a7-8779-cfdc-14a90e6581fc
and, under normal conditions:
nvidia-smi
driver v. 304.48
0 GTX 680 Bus-Id 0000:02:00.0 mem-usage 4% 89MB/2047MB temp 70C
1 GTX 680 Bus-Id 0000:03:00.0 mem-usage 5% 93MB/2047MB temp 69C
The run then stopped with the following error:
FATAL ERROR: CUDA error in cuda_check_remote_progress on Pe 2 (gig64 device 0): unspecified launch failure.
To which card does the error refer?
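The error message itself names a Pe, a host and a CUDA device index. Below is a minimal Python sketch that extracts them and looks up the Bus-Id from the nvidia-smi listing above. It assumes that CUDA device indices follow the same order as the nvidia-smi listing, which is not guaranteed; the names parse_cuda_error and BUS_IDS are hypothetical.

```python
import re

# Index -> Bus-Id, copied from the nvidia-smi listing in this message.
# Assumption: CUDA device indices match the nvidia-smi indices.
BUS_IDS = {0: "0000:02:00.0", 1: "0000:03:00.0"}

# Error text as quoted in this message.
ERROR = """FATAL ERROR: CUDA error in cuda_check_remote_progress on Pe 2 (gig64 device
0):
unspecified launch failure."""

# \s+ tolerates a line break between "device" and the index.
ERR = re.compile(r"on Pe (\d+) \((\S+) device\s+(\d+)\)")


def parse_cuda_error(text):
    """Return (pe, host, device_index), or None if no match is found."""
    m = ERR.search(text)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))


parsed = parse_cuda_error(ERROR)
if parsed is not None:
    pe, host, dev = parsed
    bus = BUS_IDS.get(dev, "unknown")
    print(f"Pe {pe} on {host}: CUDA device {dev}, Bus-Id {bus}")
```

Under that assumption, the error on Pe 2 points to CUDA device 0, listed by nvidia-smi with Bus-Id 0000:02:00.0.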
Thanks for any advice about this error, which appeared to have been fixed
but reappeared the next day.
francesco pietra