How to use printf in device/global code in CUDA

If your card has compute capability 2.0 or higher, you can call printf inside your __device__ and __global__ functions to print variables for debugging purposes.
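As a minimal sketch of what this looks like, the program below calls printf from a kernel. The kernel name printThreadInfo, the launch sizes, and the value 42 are made up for illustration. Output from device-side printf is buffered, so the host synchronizes with the device before exiting; the order in which the threads' lines appear is not guaranteed.

```cuda
#include <cstdio>

// Hypothetical kernel: each thread prints its block index, thread index,
// and the value passed in from the host.
__global__ void printThreadInfo(int value)
{
    printf("block %u, thread %u: value = %d\n", blockIdx.x, threadIdx.x, value);
}

int main()
{
    // Launch 2 blocks of 4 threads each (sizes chosen only for illustration).
    printThreadInfo<<<2, 4>>>(42);

    // Device-side printf output is buffered; synchronizing with the device
    // flushes it to stdout and also reports any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If nothing is printed, check that the synchronization call is present and that it returns no error.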

You may get a compile error though. If that happens, add -arch=sm_20 to the end of your compile command. For example (the file names here are placeholders):

nvcc my_kernel.cu -o my_kernel -arch=sm_20

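This way nvcc compiles for the compute capability of your card. Without the flag, older toolkits target sm_10 by default for compatibility reasons, and that target does not support printf in device code. If your card has a higher compute capability, you can use the matching value instead (for example sm_35).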