You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
psblas3/docs/html/userhtmlse13.html

300 lines
12 KiB
HTML

6 months ago
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html >
<head><title>CUDA Environment Routines</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="generator" content="TeX4ht (https://tug.org/tex4ht/)">
<meta name="originator" content="TeX4ht (https://tug.org/tex4ht/)">
<!-- html,3 -->
<meta name="src" content="userhtml.tex">
<link rel="stylesheet" type="text/css" href="userhtml.css">
</head><body
>
<!--l. 87--><div class="crosslinks"><p class="noindent">[<a
href="userhtmlse12.html" >prev</a>] [<a
href="userhtmlse12.html#tailuserhtmlse12.html" >prev-tail</a>] [<a
href="userhtmlse10.html#tailuserhtmlse13.html">tail</a>] [<a
href="userhtml.html# " >up</a>] </p></div>
<h3 class="sectionHead"><span class="titlemark">13 </span> <a
id="x20-15300013"></a>CUDA Environment Routines</h3>
<!--l. 91--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-154000"></a>psb_cuda_init &#8212; Initializes PSBLAS-CUDA environment</h4>
<a
id="Q1-20-191"></a>
<div class="center"
>
<!--l. 99--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-110">
call&#x00A0;psb_cuda_init(ctxt&#x00A0;[,&#x00A0;device])
</pre>
<!--l. 103--><p class="nopar" > </div></div>
<!--l. 108--><p class="noindent" >This subroutine initializes the PSBLAS-CUDA environment.
<dl class="description"><dt class="description">
<!--l. 110--><p class="noindent" >
<span
class="cmbx-10">Type:</span> </dt><dd
class="description">
<!--l. 110--><p class="noindent" >Synchronous.
</dd><dt class="description">
<!--l. 111--><p class="noindent" >
<span
class="cmbx-10">On Entry</span> </dt><dd
class="description">
<!--l. 111--><p class="noindent" >
</dd><dt class="description">
<!--l. 112--><p class="noindent" >
<span
class="cmbx-10">device</span> </dt><dd
class="description">
<!--l. 112--><p class="noindent" >ID of CUDA device to attach to.<br
class="newline" />Scope: <span
class="cmbx-10">local</span>.<br
class="newline" />Type: <span
class="cmbx-10">optional</span>.<br
class="newline" />Intent: <span
class="cmbx-10">in</span>.<br
class="newline" />Specified as: an integer value. &#x00A0;Default: use <code class="lstinline"><span style="color:#000000">mod</span><span style="color:#000000">(</span><span style="color:#000000">iam</span><span style="color:#000000">,</span><span style="color:#000000">ngpu</span><span style="color:#000000">)</span></code> where <code class="lstinline"><span style="color:#000000">iam</span></code> is
the calling process index and <code class="lstinline"><span style="color:#000000">ngpu</span></code> is the total number of CUDA devices
available on the current node.</dd></dl>
<!--l. 123--><p class="noindent" ><span
class="cmbx-12">Notes</span>
<ol class="enumerate1" >
<li
class="enumerate" id="x20-154002x1">
<!--l. 125--><p class="noindent" >A call to this routine must precede any other PSBLAS-CUDA call.</li></ol>
<!--l. 129--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-155000"></a>psb_cuda_exit &#8212; Exit from PSBLAS-CUDA environment</h4>
<a
id="Q1-20-193"></a>
<div class="center"
>
<!--l. 137--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-111">
call&#x00A0;psb_cuda_exit(ctxt)
</pre>
<!--l. 141--><p class="nopar" > </div></div>
<!--l. 146--><p class="noindent" >This subroutine exits from the PSBLAS CUDA context.
<dl class="description"><dt class="description">
<!--l. 148--><p class="noindent" >
<span
class="cmbx-10">Type:</span> </dt><dd
class="description">
<!--l. 148--><p class="noindent" >Synchronous.
</dd><dt class="description">
<!--l. 149--><p class="noindent" >
<span
class="cmbx-10">On Entry</span> </dt><dd
class="description">
<!--l. 149--><p class="noindent" >
</dd><dt class="description">
<!--l. 150--><p class="noindent" >
<span
class="cmbx-10">ctxt</span> </dt><dd
class="description">
<!--l. 150--><p class="noindent" >the communication context identifying the virtual parallel machine.<br
class="newline" />Scope: <span
class="cmbx-10">global</span>.<br
class="newline" />Type: <span
class="cmbx-10">required</span>.<br
class="newline" />Intent: <span
class="cmbx-10">in</span>.<br
class="newline" />Specified as: an integer variable.</dd></dl>
<!--l. 161--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-156000"></a>psb_cuda_DeviceSync &#8212; Synchronize CUDA device</h4>
<a
id="Q1-20-195"></a>
<div class="center"
>
<!--l. 169--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-112">
call&#x00A0;psb_cuda_DeviceSync()
</pre>
<!--l. 173--><p class="nopar" > </div></div>
<!--l. 178--><p class="noindent" >This subroutine ensures that all previosly invoked kernels, i.e. all invocation of
CUDA-side code, have completed.
<!--l. 182--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-157000"></a>psb_cuda_getDeviceCount </h4>
<a
id="Q1-20-197"></a>
<div class="center"
>
<!--l. 190--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-113">
ngpus&#x00A0;=&#x00A0;&#x00A0;psb_cuda_getDeviceCount()
</pre>
<!--l. 194--><p class="nopar" > </div></div>
<!--l. 199--><p class="noindent" >Get number of devices available on current computing node.
<!--l. 201--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-158000"></a>psb_cuda_getDevice </h4>
<a
id="Q1-20-199"></a>
<div class="center"
>
<!--l. 209--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-114">
ngpus&#x00A0;=&#x00A0;&#x00A0;psb_cuda_getDevice()
</pre>
<!--l. 213--><p class="nopar" > </div></div>
<!--l. 218--><p class="noindent" >Get device in use by current process.
<!--l. 220--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-159000"></a>psb_cuda_setDevice </h4>
<a
id="Q1-20-201"></a>
<div class="center"
>
<!--l. 228--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-115">
info&#x00A0;=&#x00A0;psb_cuda_setDevice(dev)
</pre>
<!--l. 232--><p class="nopar" > </div></div>
<!--l. 237--><p class="noindent" >Set device to be used by current process.
<!--l. 239--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-160000"></a>psb_cuda_DeviceHasUVA </h4>
<a
id="Q1-20-203"></a>
<div class="center"
>
<!--l. 247--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-116">
hasUva&#x00A0;=&#x00A0;psb_cuda_DeviceHasUVA()
</pre>
<!--l. 251--><p class="nopar" > </div></div>
<!--l. 256--><p class="noindent" >Returns true if device currently in use supports UVA (Unified Virtual Addressing).
<!--l. 259--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-161000"></a>psb_cuda_WarpSize </h4>
<a
id="Q1-20-205"></a>
<div class="center"
>
<!--l. 267--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-117">
nw&#x00A0;=&#x00A0;psb_cuda_WarpSize()
</pre>
<!--l. 271--><p class="nopar" > </div></div>
<!--l. 276--><p class="noindent" >Returns the warp size.
<!--l. 279--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-162000"></a>psb_cuda_MultiProcessors </h4>
<a
id="Q1-20-207"></a>
<div class="center"
>
<!--l. 287--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-118">
nmp&#x00A0;=&#x00A0;psb_cuda_MultiProcessors()
</pre>
<!--l. 291--><p class="nopar" > </div></div>
<!--l. 296--><p class="noindent" >Returns the number of multiprocessors in the CUDA device.
<!--l. 298--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-163000"></a>psb_cuda_MaxThreadsPerMP </h4>
<a
id="Q1-20-209"></a>
<div class="center"
>
<!--l. 306--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-119">
nt&#x00A0;=&#x00A0;psb_cuda_MaxThreadsPerMP()
</pre>
<!--l. 310--><p class="nopar" > </div></div>
<!--l. 315--><p class="noindent" >Returns the maximum number of threads per multiprocessor.
<!--l. 318--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-164000"></a>psb_cuda_MaxRegistersPerBlock </h4>
<a
id="Q1-20-211"></a>
<div class="center"
>
<!--l. 326--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-120">
nr&#x00A0;=&#x00A0;psb_cuda_MaxRegistersPerBlock()
</pre>
<!--l. 330--><p class="nopar" > </div></div>
<!--l. 335--><p class="noindent" >Returns the maximum number of register per thread block.
<!--l. 338--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-165000"></a>psb_cuda_MemoryClockRate </h4>
<a
id="Q1-20-213"></a>
<div class="center"
>
<!--l. 346--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-121">
cl&#x00A0;=&#x00A0;psb_cuda_MemoryClockRate()
</pre>
<!--l. 350--><p class="nopar" > </div></div>
<!--l. 355--><p class="noindent" >Returns the memory clock rate in KHz, as an integer.
<!--l. 357--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-166000"></a>psb_cuda_MemoryBusWidth </h4>
<a
id="Q1-20-215"></a>
<div class="center"
>
<!--l. 365--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-122">
nb&#x00A0;=&#x00A0;psb_cuda_MemoryBusWidth()
</pre>
<!--l. 369--><p class="nopar" > </div></div>
<!--l. 374--><p class="noindent" >Returns the memory bus width in bits.
<!--l. 376--><p class="noindent" >
<h4 class="likesubsectionHead"><a
id="x20-167000"></a>psb_cuda_MemoryPeakBandwidth </h4>
<a
id="Q1-20-217"></a>
<div class="center"
>
<!--l. 384--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-123">
bw&#x00A0;=&#x00A0;psb_cuda_MemoryPeakBandwidth()
</pre>
<!--l. 388--><p class="nopar" > </div></div>
<!--l. 392--><p class="noindent" >Returns the peak memory bandwidth in MB/s (real double precision).
<!--l. 126--><p class="indent" >
<!--l. 2--><div class="crosslinks"><p class="noindent">[<a
href="userhtmlse12.html" >prev</a>] [<a
href="userhtmlse12.html#tailuserhtmlse12.html" >prev-tail</a>] [<a
href="userhtmlse13.html" >front</a>] [<a
href="userhtml.html# " >up</a>] </p></div>
<!--l. 2--><p class="indent" > <a
id="tailuserhtmlse13.html"></a>
</body></html>