call psb_cuda_init(ctxt [, device])
This subroutine initializes the PSBLAS-CUDA environment.
Type: Synchronous.
On Entry
ctxt
the communication context identifying the virtual parallel machine.
Scope: global.
Type: required.
Intent: in.
Specified as: an integer variable.
device
ID of CUDA device to attach to.
Scope: local.
Type: optional.
Intent: in.
Specified as: an integer value. Default: use mod(iam,ngpu), where iam is the calling process index and ngpu is the total number of CUDA devices available on the current node.
Notes
A call to this routine must precede any other PSBLAS-CUDA call.
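The default device choice described above can be illustrated with a small stand-alone sketch. The module and function names below (default_device_sketch, default_device) are hypothetical and are not part of PSBLAS; the code only reproduces the documented rule mod(iam,ngpu), on the assumption that process indices start at 0.

```fortran
module default_device_sketch
  implicit none
contains
  ! Hypothetical helper, not a PSBLAS routine: it mirrors the documented
  ! default mapping mod(iam,ngpu) used when "device" is not supplied.
  pure integer function default_device(iam, ngpu) result(dev)
    integer, intent(in) :: iam   ! calling process index (assumed to start at 0)
    integer, intent(in) :: ngpu  ! number of CUDA devices on the current node
    dev = mod(iam, ngpu)
  end function default_device
end module default_device_sketch
```

Under these assumptions, on a node with ngpu = 2 the processes with indices 0, 1, 2 and 3 would attach to devices 0, 1, 0 and 1 respectively, so processes are spread round-robin over the available devices.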
call psb_cuda_exit(ctxt)
This subroutine exits from the PSBLAS-CUDA environment.
Type: Synchronous.
On Entry
ctxt
the communication context identifying the virtual parallel machine.
Scope: global.
Type: required.
Intent: in.
Specified as: an integer variable.
call psb_cuda_DeviceSync()
This subroutine ensures that all previously invoked kernels, i.e. all invocations of CUDA-side code, have completed.
ngpus = psb_cuda_getDeviceCount()
Get the number of devices available on the current computing node.
dev = psb_cuda_getDevice()
Get the device in use by the current process.
info = psb_cuda_setDevice(dev)
Set the device to be used by the current process.
hasUva = psb_cuda_DeviceHasUVA()
Returns true if the device currently in use supports UVA (Unified Virtual Addressing).
nw = psb_cuda_WarpSize()
Returns the warp size.
nmp = psb_cuda_MultiProcessors()
Returns the number of multiprocessors in the CUDA device.
nt = psb_cuda_MaxThreadsPerMP()
Returns the maximum number of threads per multiprocessor.
nr = psb_cuda_MaxRegistersPerBlock()
Returns the maximum number of registers per thread block.
cl = psb_cuda_MemoryClockRate()
Returns the memory clock rate in kHz, as an integer.
nb = psb_cuda_MemoryBusWidth()
Returns the memory bus width in bits.
bw = psb_cuda_MemoryPeakBandwidth()
Returns the peak memory bandwidth in MB/s (real double precision).