13 CUDA Environment Routines

psb_cuda_init — Initializes PSBLAS-CUDA environment

call psb_cuda_init(ctxt [, device])

This subroutine initializes the PSBLAS-CUDA environment.

Type:

Synchronous.

On Entry

ctxt

The communication context identifying the virtual parallel machine.
Scope: global.
Type: required.
Intent: in.
Specified as: an integer variable.

device

ID of CUDA device to attach to.
Scope: local.
Type: optional.
Intent: in.
Specified as: an integer value.
Default: mod(iam,ngpu), where iam is the calling process index and ngpu is the total number of CUDA devices available on the current node.

Notes

  1. A call to this routine must precede any other PSBLAS-CUDA call.
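
The default device assignment described for the device argument can be illustrated with a small standalone sketch. It does not call PSBLAS-CUDA. The process count np and device count ngpu are hypothetical values chosen for illustration, and iam is assumed to be a zero-based process index.

```fortran
program default_device_map
  implicit none
  integer :: iam, np, ngpu

  np   = 6   ! hypothetical number of processes on one node
  ngpu = 4   ! hypothetical number of CUDA devices on that node

  ! Default rule from the device argument description: mod(iam,ngpu)
  do iam = 0, np - 1
    write(*,'(a,i0,a,i0)') 'process ', iam, ' -> device ', mod(iam, ngpu)
  end do
end program default_device_map
```

With more processes than devices, as in this sketch, the mapping wraps around, so several processes share the same device.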

psb_cuda_exit — Exit from PSBLAS-CUDA environment

call psb_cuda_exit(ctxt)

This subroutine exits from the PSBLAS-CUDA environment.

Type:

Synchronous.

On Entry

ctxt

The communication context identifying the virtual parallel machine.
Scope: global.
Type: required.
Intent: in.
Specified as: an integer variable.

psb_cuda_DeviceSync — Synchronize CUDA device

call psb_cuda_DeviceSync()

This subroutine ensures that all previously invoked kernels, i.e. all invocations of CUDA-side code, have completed.

psb_cuda_getDeviceCount

ngpus = psb_cuda_getDeviceCount()

Returns the number of CUDA devices available on the current computing node.

psb_cuda_getDevice

dev = psb_cuda_getDevice()

Returns the ID of the CUDA device in use by the current process.

psb_cuda_setDevice

info = psb_cuda_setDevice(dev)

Sets the CUDA device to be used by the current process.

psb_cuda_DeviceHasUVA

hasUva = psb_cuda_DeviceHasUVA()

Returns true if the device currently in use supports UVA (Unified Virtual Addressing).

psb_cuda_WarpSize

nw = psb_cuda_WarpSize()

Returns the warp size.

psb_cuda_MultiProcessors

nmp = psb_cuda_MultiProcessors()

Returns the number of multiprocessors in the CUDA device.

psb_cuda_MaxThreadsPerMP

nt = psb_cuda_MaxThreadsPerMP()

Returns the maximum number of threads per multiprocessor.
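
As a rough illustration of how the warp size, multiprocessor count and per-multiprocessor thread limit relate, the following standalone sketch derives two figures from them. The numeric values are placeholders standing in for the results of psb_cuda_WarpSize(), psb_cuda_MultiProcessors() and psb_cuda_MaxThreadsPerMP(). They are not taken from any particular device, and the sketch does not call the library.

```fortran
program device_capacity
  implicit none
  integer :: nw, nmp, nt
  integer :: warps_per_mp, max_threads

  ! Placeholder values; in a real program these would come from
  ! psb_cuda_WarpSize(), psb_cuda_MultiProcessors() and
  ! psb_cuda_MaxThreadsPerMP() respectively.
  nw  = 32
  nmp = 80
  nt  = 2048

  warps_per_mp = nt / nw     ! resident warps per multiprocessor
  max_threads  = nmp * nt    ! resident threads on the whole device

  write(*,'(a,i0)') 'warps per multiprocessor: ', warps_per_mp
  write(*,'(a,i0)') 'max resident threads:     ', max_threads
end program device_capacity
```

These figures are upper bounds only. The number of threads actually resident for a given kernel also depends on its register and shared memory usage.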

psb_cuda_MaxRegistersPerBlock

nr = psb_cuda_MaxRegistersPerBlock()

Returns the maximum number of registers per thread block.

psb_cuda_MemoryClockRate

cl = psb_cuda_MemoryClockRate()

Returns the memory clock rate in kHz, as an integer.

psb_cuda_MemoryBusWidth

nb = psb_cuda_MemoryBusWidth()

Returns the memory bus width in bits.

psb_cuda_MemoryPeakBandwidth

bw = psb_cuda_MemoryPeakBandwidth()

Returns the peak memory bandwidth in MB/s, as a double precision real.
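
A common estimate of peak memory bandwidth combines the memory clock rate and the bus width, with a factor of two for double data rate memory. The standalone sketch below applies that estimate. Whether psb_cuda_MemoryPeakBandwidth() uses exactly this formula is an assumption, and the values of cl and nb are illustrative placeholders standing in for psb_cuda_MemoryClockRate() and psb_cuda_MemoryBusWidth().

```fortran
program peak_bandwidth_estimate
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer  :: cl, nb
  real(dp) :: bw

  cl = 877000   ! illustrative memory clock rate in kHz
  nb = 4096     ! illustrative memory bus width in bits

  ! kHz * bytes per transfer gives kB/s; dividing by 1000 gives MB/s.
  ! The factor 2 assumes double data rate memory.
  bw = 2.0_dp * real(cl, dp) * (real(nb, dp) / 8.0_dp) / 1.0e3_dp

  write(*,'(a,f12.1,a)') 'estimated peak bandwidth: ', bw, ' MB/s'
end program peak_bandwidth_estimate
```

For these placeholder values the estimate is 898048 MB/s, i.e. roughly 898 GB/s.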