|
|
@ -16,7 +16,7 @@ href="userhtmlse11.html#tailuserhtmlse11.html" >prev-tail</a>] [<a
|
|
|
|
href="userhtmlse9.html#tailuserhtmlse12.html">tail</a>] [<a
|
|
|
|
href="userhtmlse9.html#tailuserhtmlse12.html">tail</a>] [<a
|
|
|
|
href="userhtml.html# " >up</a>] </p></div>
|
|
|
|
href="userhtml.html# " >up</a>] </p></div>
|
|
|
|
<h3 class="sectionHead"><span class="titlemark">12 </span> <a
|
|
|
|
<h3 class="sectionHead"><span class="titlemark">12 </span> <a
|
|
|
|
id="x19-14400012"></a>Extensions</h3>
|
|
|
|
id="x19-14500012"></a>Extensions</h3>
|
|
|
|
<!--l. 3--><p class="noindent" >The EXT, CUDA and RSB subdirectories contain a set of extensions to the base
|
|
|
|
<!--l. 3--><p class="noindent" >The EXT, CUDA and RSB subdirectories contain a set of extensions to the base
|
|
|
|
library. The extensions provide additional storage formats beyond the ones already
|
|
|
|
library. The extensions provide additional storage formats beyond the ones already
|
|
|
|
contained in the base library, as well as interfaces to:
|
|
|
|
contained in the base library, as well as interfaces to:
|
|
|
@ -49,7 +49,7 @@ in <span class="cite">[<a
|
|
|
|
href="userhtmlli2.html#XOurTechRep">22</a>]</span>.
|
|
|
|
href="userhtmlli2.html#XOurTechRep">22</a>]</span>.
|
|
|
|
<!--l. 19--><p class="noindent" >
|
|
|
|
<!--l. 19--><p class="noindent" >
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.1 </span> <a
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.1 </span> <a
|
|
|
|
id="x19-14500012.1"></a>Using the extensions</h4>
|
|
|
|
id="x19-14600012.1"></a>Using the extensions</h4>
|
|
|
|
<!--l. 21--><p class="noindent" >A sample application using the PSBLAS extensions will contain the following
|
|
|
|
<!--l. 21--><p class="noindent" >A sample application using the PSBLAS extensions will contain the following
|
|
|
|
steps:
|
|
|
|
steps:
|
|
|
|
<ul class="itemize1">
|
|
|
|
<ul class="itemize1">
|
|
|
@ -142,7 +142,7 @@ speed of the sparse matrix-vector product with the various data structures inclu
|
|
|
|
in the library.
|
|
|
|
in the library.
|
|
|
|
<!--l. 146--><p class="noindent" >
|
|
|
|
<!--l. 146--><p class="noindent" >
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.2 </span> <a
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.2 </span> <a
|
|
|
|
id="x19-14600012.2"></a>Extensions’ Data Structures</h4>
|
|
|
|
id="x19-14700012.2"></a>Extensions’ Data Structures</h4>
|
|
|
|
<!--l. 150--><p class="noindent" >Access to the facilities provided by the EXT library is mainly achieved through
|
|
|
|
<!--l. 150--><p class="noindent" >Access to the facilities provided by the EXT library is mainly achieved through
|
|
|
|
the data types that are provided within. The data classes are derived from
|
|
|
|
the data types that are provided within. The data classes are derived from
|
|
|
|
the base classes in PSBLAS, through the Fortran 2003 mechanism of <span
|
|
|
|
the base classes in PSBLAS, through the Fortran 2003 mechanism of <span
|
|
|
@ -153,20 +153,20 @@ href="userhtmlli2.html#XMRC:11">17</a>]</span>.
|
|
|
|
<!--l. 155--><p class="indent" > The data classes are divided between the general purpose CPU extensions, the
|
|
|
|
<!--l. 155--><p class="indent" > The data classes are divided between the general purpose CPU extensions, the
|
|
|
|
GPU interfaces and the RSB interfaces. In the description we will make use of the
|
|
|
|
GPU interfaces and the RSB interfaces. In the description we will make use of the
|
|
|
|
notation introduced in Table <a
|
|
|
|
notation introduced in Table <a
|
|
|
|
href="#x19-146001r21">21<!--tex4ht:ref: tab:notation --></a>.
|
|
|
|
href="#x19-147001r21">21<!--tex4ht:ref: tab:notation --></a>.
|
|
|
|
<div class="table">
|
|
|
|
<div class="table">
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!--l. 160--><p class="indent" > <a
|
|
|
|
<!--l. 160--><p class="indent" > <a
|
|
|
|
id="x19-146001r21"></a><hr class="float"><div class="float"
|
|
|
|
id="x19-147001r21"></a><hr class="float"><div class="float"
|
|
|
|
>
|
|
|
|
>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<div class="caption"
|
|
|
|
<div class="caption"
|
|
|
|
><span class="id">Table 21: </span><span
|
|
|
|
><span class="id">Table 21: </span><span
|
|
|
|
class="content">Notation for parameters describing a sparse matrix</span></div><!--tex4ht:label?: x19-146001r21 -->
|
|
|
|
class="content">Notation for parameters describing a sparse matrix</span></div><!--tex4ht:label?: x19-147001r21 -->
|
|
|
|
<div class="center"
|
|
|
|
<div class="center"
|
|
|
|
>
|
|
|
|
>
|
|
|
|
<!--l. 162--><p class="noindent" >
|
|
|
|
<!--l. 162--><p class="noindent" >
|
|
|
@ -274,7 +274,7 @@ class="td11"> </td></tr></table>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-146002r5"></a>
|
|
|
|
id="x19-147002r5"></a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -283,18 +283,18 @@ src="mat.png" alt="PIC"
|
|
|
|
width="147" height="147" >
|
|
|
|
width="147" height="147" >
|
|
|
|
<br /> <div class="caption"
|
|
|
|
<br /> <div class="caption"
|
|
|
|
><span class="id">Figure 5: </span><span
|
|
|
|
><span class="id">Figure 5: </span><span
|
|
|
|
class="content">Example of sparse matrix</span></div><!--tex4ht:label?: x19-146002r5 -->
|
|
|
|
class="content">Example of sparse matrix</span></div><!--tex4ht:label?: x19-147002r5 -->
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!--l. 198--><p class="indent" > </div><hr class="endfigure">
|
|
|
|
<!--l. 198--><p class="indent" > </div><hr class="endfigure">
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.3 </span> <a
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.3 </span> <a
|
|
|
|
id="x19-14700012.3"></a>CPU-class extensions</h4>
|
|
|
|
id="x19-14800012.3"></a>CPU-class extensions</h4>
|
|
|
|
<!--l. 203--><p class="noindent" >
|
|
|
|
<!--l. 203--><p class="noindent" >
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
id="x19-148000"></a>ELLPACK</h5>
|
|
|
|
id="x19-149000"></a>ELLPACK</h5>
|
|
|
|
<!--l. 205--><p class="noindent" >The ELLPACK/ITPACK format (shown in Figure <a
|
|
|
|
<!--l. 205--><p class="noindent" >The ELLPACK/ITPACK format (shown in Figure <a
|
|
|
|
href="#x19-148001r6">6<!--tex4ht:ref: fig:ell --></a>) comprises two 2-dimensional
|
|
|
|
href="#x19-149001r6">6<!--tex4ht:ref: fig:ell --></a>) comprises two 2-dimensional
|
|
|
|
arrays <span class="obeylines-h"><span class="verb"><span
|
|
|
|
arrays <span class="obeylines-h"><span class="verb"><span
|
|
|
|
class="cmtt-10">AS</span></span></span> and <span class="obeylines-h"><span class="verb"><span
|
|
|
|
class="cmtt-10">AS</span></span></span> and <span class="obeylines-h"><span class="verb"><span
|
|
|
|
class="cmtt-10">JA</span></span></span> with <span class="obeylines-h"><span class="verb"><span
|
|
|
|
class="cmtt-10">JA</span></span></span> with <span class="obeylines-h"><span class="verb"><span
|
|
|
@ -315,7 +315,7 @@ row.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-148001r6"></a>
|
|
|
|
id="x19-149001r6"></a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -325,13 +325,13 @@ width="233" height="233" >
|
|
|
|
<br /> <div class="caption"
|
|
|
|
<br /> <div class="caption"
|
|
|
|
><span class="id">Figure 6: </span><span
|
|
|
|
><span class="id">Figure 6: </span><span
|
|
|
|
class="content">ELLPACK compression of matrix in Figure <a
|
|
|
|
class="content">ELLPACK compression of matrix in Figure <a
|
|
|
|
href="#x19-146002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-148001r6 -->
|
|
|
|
href="#x19-147002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-149001r6 -->
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!--l. 225--><p class="indent" > </div><hr class="endfigure">
|
|
|
|
<!--l. 225--><p class="indent" > </div><hr class="endfigure">
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-148002r1"></a>
|
|
|
|
id="x19-149002r1"></a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -341,8 +341,8 @@ href="#x19-146002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:l
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!--l. 231-->
|
|
|
|
<!--l. 231-->
|
|
|
|
<pre class="lstlisting" id="listing-168"><span class="label"><a
|
|
|
|
<pre class="lstlisting" id="listing-169"><span class="label"><a
|
|
|
|
id="x19-148003r1"></a></span><span
|
|
|
|
id="x19-149003r1"></a></span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
@ -352,7 +352,7 @@ class="cmtt-9">i</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">=1,</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">=1,</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">n</span></span>
|
|
|
|
class="cmtt-9">n</span></span>
|
|
|
|
<span class="label"><a
|
|
|
|
<span class="label"><a
|
|
|
|
id="x19-148004r2"></a></span><span
|
|
|
|
id="x19-149004r2"></a></span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
@ -362,7 +362,7 @@ class="cmtt-9"> </span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">t</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">t</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">=0</span></span>
|
|
|
|
class="cmtt-9">=0</span></span>
|
|
|
|
<span class="label"><a
|
|
|
|
<span class="label"><a
|
|
|
|
id="x19-148005r3"></a></span><span
|
|
|
|
id="x19-149005r3"></a></span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
@ -374,7 +374,7 @@ class="cmtt-9">j</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">=1,</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">=1,</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">maxnzr</span></span>
|
|
|
|
class="cmtt-9">maxnzr</span></span>
|
|
|
|
<span class="label"><a
|
|
|
|
<span class="label"><a
|
|
|
|
id="x19-148006r4"></a></span><span
|
|
|
|
id="x19-149006r4"></a></span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
@ -403,7 +403,7 @@ class="cmtt-9">,</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">j</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">j</span></span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">))</span></span>
|
|
|
|
class="cmtt-9">))</span></span>
|
|
|
|
<span class="label"><a
|
|
|
|
<span class="label"><a
|
|
|
|
id="x19-148007r5"></a></span><span
|
|
|
|
id="x19-149007r5"></a></span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
@ -413,7 +413,7 @@ class="cmtt-9"> </span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">end</span></span><span style="color:#000000"> </span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">end</span></span><span style="color:#000000"> </span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">do</span></span>
|
|
|
|
class="cmtt-9">do</span></span>
|
|
|
|
<span class="label"><a
|
|
|
|
<span class="label"><a
|
|
|
|
id="x19-148008r6"></a></span><span
|
|
|
|
id="x19-149008r6"></a></span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
@ -427,7 +427,7 @@ class="cmtt-9">)</span></span><span style="color:#000000"> </span><span style="c
|
|
|
|
class="cmtt-9">=</span></span><span style="color:#000000"> </span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">=</span></span><span style="color:#000000"> </span><span style="color:#000000"><span
|
|
|
|
class="cmtt-9">t</span></span>
|
|
|
|
class="cmtt-9">t</span></span>
|
|
|
|
<span class="label"><a
|
|
|
|
<span class="label"><a
|
|
|
|
id="x19-148009r7"></a></span><span
|
|
|
|
id="x19-149009r7"></a></span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
|
class="cmtt-9"> </span><span
|
|
|
@ -436,9 +436,9 @@ class="cmtt-9">end</span></span><span style="color:#000000"> </span><span style=
|
|
|
|
class="cmtt-9">do</span></span></pre>
|
|
|
|
class="cmtt-9">do</span></span></pre>
|
|
|
|
|
|
|
|
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-148010r1"></a>
|
|
|
|
id="x19-149010r1"></a>
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-148011"></a>
|
|
|
|
id="x19-149011"></a>
|
|
|
|
<span
|
|
|
|
<span
|
|
|
|
class="cmbx-10">Algorithm</span><span
|
|
|
|
class="cmbx-10">Algorithm</span><span
|
|
|
|
class="cmbx-10"> 1:</span>  Matrix-Vector product in ELL format
|
|
|
|
class="cmbx-10"> 1:</span>  Matrix-Vector product in ELL format
|
|
|
@ -450,7 +450,7 @@ class="cmbx-10"> 1:</span>  Matrix-Vector product in ELL format
|
|
|
|
class="cmmi-10">y </span>= <span
|
|
|
|
class="cmmi-10">y </span>= <span
|
|
|
|
class="cmmi-10">Ax </span>can be computed with the code shown in
|
|
|
|
class="cmmi-10">Ax </span>can be computed with the code shown in
|
|
|
|
Alg. <a
|
|
|
|
Alg. <a
|
|
|
|
href="#x19-148010r1">1<!--tex4ht:ref: alg:ell --></a>; it costs one memory write per outer iteration, plus three memory reads and
|
|
|
|
href="#x19-149010r1">1<!--tex4ht:ref: alg:ell --></a>; it costs one memory write per outer iteration, plus three memory reads and
|
|
|
|
two floating-point operations per inner iteration.
|
|
|
|
two floating-point operations per inner iteration.
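<p class="indent" >As a language-neutral illustration, the double loop of Alg. 1 can be sketched in Python. This is only a sketch under stated assumptions: the helper names dense_to_ell and ell_matvec are hypothetical and not part of PSBLAS, indices are 0-based, and padded slots are assumed to hold a zero coefficient together with a valid column index.</p>

```python
def dense_to_ell(a):
    """Build ELLPACK arrays (AS, JA) from a dense list-of-lists matrix.

    Hypothetical helper for illustration only; 0-based column indices.
    """
    n = len(a)
    rows = [[(v, j) for j, v in enumerate(row) if v != 0] for row in a]
    maxnzr = max((len(r) for r in rows), default=0)
    as_ = [[0.0] * maxnzr for _ in range(n)]
    ja = [[0] * maxnzr for _ in range(n)]
    for i, r in enumerate(rows):
        for k, (v, j) in enumerate(r):
            as_[i][k] = float(v)
            ja[i][k] = j
        # Assumption: padding repeats the last valid column index of the row,
        # so that x[ja[i][k]] is always a legal access.
        pad = r[-1][1] if r else 0
        for k in range(len(r), maxnzr):
            ja[i][k] = pad
    return as_, ja


def ell_matvec(as_, ja, x):
    """Compute y = A x with A stored in ELLPACK form."""
    n = len(as_)
    y = [0.0] * n
    for i in range(n):                  # outer iteration: one write to y[i]
        t = 0.0
        for j in range(len(as_[i])):    # inner iteration: j = 1..maxnzr
            # reads as_[i][j], ja[i][j], x[...]; one multiply and one add
            t += as_[i][j] * x[ja[i][j]]
        y[i] = t
    return y
```

<p class="indent" >Calling ell_matvec on the arrays returned by dense_to_ell gives the same result as a dense product; the padded entries contribute only multiplications by zero.</p>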
|
|
|
|
<!--l. 247--><p class="indent" > Unless all rows have exactly the same number of nonzeros, some of the coefficients
|
|
|
|
<!--l. 247--><p class="indent" > Unless all rows have exactly the same number of nonzeros, some of the coefficients
|
|
|
|
in the <span class="obeylines-h"><span class="verb"><span
|
|
|
|
in the <span class="obeylines-h"><span class="verb"><span
|
|
|
@ -459,12 +459,12 @@ in terms of memory space and redundant operations (multiplications by zero). The
|
|
|
|
overhead can be acceptable if:
|
|
|
|
overhead can be acceptable if:
|
|
|
|
<ol class="enumerate1" >
|
|
|
|
<ol class="enumerate1" >
|
|
|
|
<li
|
|
|
|
<li
|
|
|
|
class="enumerate" id="x19-148013x1">
|
|
|
|
class="enumerate" id="x19-149013x1">
|
|
|
|
<!--l. 253--><p class="noindent" >The maximum number of nonzeros per row is not much larger than the
|
|
|
|
<!--l. 253--><p class="noindent" >The maximum number of nonzeros per row is not much larger than the
|
|
|
|
average;
|
|
|
|
average;
|
|
|
|
</li>
|
|
|
|
</li>
|
|
|
|
<li
|
|
|
|
<li
|
|
|
|
class="enumerate" id="x19-148015x2">
|
|
|
|
class="enumerate" id="x19-149015x2">
|
|
|
|
<!--l. 255--><p class="noindent" >The regularity of the data structure allows for faster code, e.g. by allowing
|
|
|
|
<!--l. 255--><p class="noindent" >The regularity of the data structure allows for faster code, e.g. by allowing
|
|
|
|
vectorization, thereby offsetting the additional storage requirements.</li></ol>
|
|
|
|
vectorization, thereby offsetting the additional storage requirements.</li></ol>
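<p class="noindent" >The first condition above can be checked numerically before choosing the format. The following Python sketch, with the hypothetical name ell_padding_overhead, assumes that the ELLPACK arrays hold n rows of maxnzr slots each and reports the ratio of stored slots to actual nonzeros.</p>

```python
def ell_padding_overhead(nnz_per_row):
    """Return (stored slots) / (actual nonzeros) for ELLPACK storage.

    nnz_per_row: number of nonzeros in each row of the matrix.
    A value of 1.0 means no padding at all.
    """
    n = len(nnz_per_row)
    maxnzr = max(nnz_per_row)   # width of the AS and JA arrays
    stored = n * maxnzr         # slots allocated, padding included
    nnz = sum(nnz_per_row)      # coefficients actually needed
    return stored / nnz


# Uniform rows: no padding is needed.
uniform = ell_padding_overhead([3, 3, 3, 3])

# 999 rows with 3 nonzeros plus one full row of length 1000.
one_full_row = ell_padding_overhead([3] * 999 + [1000])
```

<p class="noindent" >In this sketch uniform rows give a ratio of exactly 1, while a single full row in an otherwise sparse matrix of size 1000 inflates storage by a factor of roughly 250.</p>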
|
|
|
|
<!--l. 259--><p class="noindent" >In the extreme case where the input matrix has one full row, the ELLPACK
|
|
|
|
<!--l. 259--><p class="noindent" >In the extreme case where the input matrix has one full row, the ELLPACK
|
|
|
@ -492,7 +492,7 @@ class="cmtt-10">psb_T_ell_sparse_mat</span></span></span>:
|
|
|
|
</pre>
|
|
|
|
</pre>
|
|
|
|
<!--l. 295--><p class="nopar" > </div></div>
|
|
|
|
<!--l. 295--><p class="nopar" > </div></div>
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
id="x19-149000"></a>Hacked ELLPACK</h5>
|
|
|
|
id="x19-150000"></a>Hacked ELLPACK</h5>
|
|
|
|
<!--l. 303--><p class="noindent" >The <span
|
|
|
|
<!--l. 303--><p class="noindent" >The <span
|
|
|
|
class="cmti-10">hacked ELLPACK </span>(<span
|
|
|
|
class="cmti-10">hacked ELLPACK </span>(<span
|
|
|
|
class="cmbx-10">HLL</span>) format alleviates the main problem of the ELLPACK
|
|
|
|
class="cmbx-10">HLL</span>) format alleviates the main problem of the ELLPACK
|
|
|
@ -558,7 +558,7 @@ format.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-149001r7"></a>
|
|
|
|
id="x19-150001r7"></a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -568,7 +568,7 @@ width="248" height="248" >
|
|
|
|
<br /> <div class="caption"
|
|
|
|
<br /> <div class="caption"
|
|
|
|
><span class="id">Figure 7: </span><span
|
|
|
|
><span class="id">Figure 7: </span><span
|
|
|
|
class="content">Hacked ELLPACK compression of matrix in Figure <a
|
|
|
|
class="content">Hacked ELLPACK compression of matrix in Figure <a
|
|
|
|
href="#x19-146002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-149001r7 -->
|
|
|
|
href="#x19-147002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-150001r7 -->
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -595,9 +595,9 @@ class="cmtt-10">psb_T_hll_sparse_mat</span></span></span>:
|
|
|
|
</pre>
|
|
|
|
</pre>
|
|
|
|
<!--l. 388--><p class="nopar" > </div></div>
|
|
|
|
<!--l. 388--><p class="nopar" > </div></div>
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
id="x19-150000"></a>Diagonal storage</h5>
|
|
|
|
id="x19-151000"></a>Diagonal storage</h5>
|
|
|
|
<!--l. 396--><p class="noindent" >The DIAgonal (DIA) format (shown in Figure <a
|
|
|
|
<!--l. 396--><p class="noindent" >The DIAgonal (DIA) format (shown in Figure <a
|
|
|
|
href="#x19-150001r8">8<!--tex4ht:ref: fig:dia --></a>) has a 2-dimensional array <span class="obeylines-h"><span class="verb"><span
|
|
|
|
href="#x19-151001r8">8<!--tex4ht:ref: fig:dia --></a>) has a 2-dimensional array <span class="obeylines-h"><span class="verb"><span
|
|
|
|
class="cmtt-10">AS</span></span></span>
|
|
|
|
class="cmtt-10">AS</span></span></span>
|
|
|
|
containing in each column the coefficients along a diagonal of the matrix, and an
|
|
|
|
containing in each column the coefficients along a diagonal of the matrix, and an
|
|
|
|
integer array <span class="obeylines-h"><span class="verb"><span
|
|
|
|
integer array <span class="obeylines-h"><span class="verb"><span
|
|
|
@ -607,7 +607,7 @@ are padded with zeros as necessary.
|
|
|
|
<!--l. 402--><p class="indent" > The code to compute the matrix-vector product <span
|
|
|
|
<!--l. 402--><p class="indent" > The code to compute the matrix-vector product <span
|
|
|
|
class="cmmi-10">y </span>= <span
|
|
|
|
class="cmmi-10">y </span>= <span
|
|
|
|
class="cmmi-10">Ax </span>is shown in Alg. <a
|
|
|
|
class="cmmi-10">Ax </span>is shown in Alg. <a
|
|
|
|
href="#x19-150003r2">2<!--tex4ht:ref: alg:dia --></a>; it
|
|
|
|
href="#x19-151003r2">2<!--tex4ht:ref: alg:dia --></a>; it
|
|
|
|
costs one memory read per outer iteration, plus three memory reads, one memory
|
|
|
|
costs one memory read per outer iteration, plus three memory reads, one memory
|
|
|
|
write and two floating-point operations per inner iteration. The accesses to
|
|
|
|
write and two floating-point operations per inner iteration. The accesses to
|
|
|
|
<span class="obeylines-h"><span class="verb"><span
|
|
|
|
<span class="obeylines-h"><span class="verb"><span
|
|
|
@ -620,7 +620,7 @@ required.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-150001r8"></a>
|
|
|
|
id="x19-151001r8"></a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -630,13 +630,13 @@ width="248" height="248" >
|
|
|
|
<br /> <div class="caption"
|
|
|
|
<br /> <div class="caption"
|
|
|
|
><span class="id">Figure 8: </span><span
|
|
|
|
><span class="id">Figure 8: </span><span
|
|
|
|
class="content">DIA compression of matrix in Figure <a
|
|
|
|
class="content">DIA compression of matrix in Figure <a
|
|
|
|
href="#x19-146002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-150001r8 -->
|
|
|
|
href="#x19-147002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-151001r8 -->
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!--l. 419--><p class="indent" > </div><hr class="endfigure">
|
|
|
|
<!--l. 419--><p class="indent" > </div><hr class="endfigure">
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-150002r2"></a>
|
|
|
|
id="x19-151002r2"></a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -662,9 +662,9 @@ href="#x19-146002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:l
|
|
|
|
</pre>
|
|
|
|
</pre>
|
|
|
|
<!--l. 450--><p class="nopar" > </div></div>
|
|
|
|
<!--l. 450--><p class="nopar" > </div></div>
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-150003r2"></a>
|
|
|
|
id="x19-151003r2"></a>
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-150004"></a>
|
|
|
|
id="x19-151004"></a>
|
|
|
|
<span
|
|
|
|
<span
|
|
|
|
class="cmbx-10">Algorithm</span><span
|
|
|
|
class="cmbx-10">Algorithm</span><span
|
|
|
|
class="cmbx-10"> 2:</span>  Matrix-Vector product in DIA format
|
|
|
|
class="cmbx-10"> 2:</span>  Matrix-Vector product in DIA format
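<p class="indent" >A minimal Python sketch of a matrix-vector product in diagonal storage is given here for illustration. It rests on assumptions that are not taken from the library source: the name dia_matvec is hypothetical, indices are 0-based, and column j of AS is assumed to hold the diagonal with offset OFFSET(j), so that AS(i,j) corresponds to the coefficient in row i and column i+OFFSET(j), with zero padding where that column falls outside the matrix.</p>

```python
def dia_matvec(as_, offset, x):
    """Compute y = A x with A stored by diagonals (illustrative sketch).

    as_[i][j] is assumed to hold A[i][i + offset[j]], or 0.0 as padding.
    """
    n = len(as_)
    ncols = len(x)
    y = [0.0] * n
    for j, off in enumerate(offset):    # outer iteration: one read of offset
        lo = max(0, -off)               # first row with a valid column
        hi = min(n, ncols - off)        # one past the last such row
        for i in range(lo, hi):
            # reads as_[i][j], x[i + off], y[i]; writes y[i]; two flops
            y[i] += as_[i][j] * x[i + off]
    return y


# Tridiagonal example: A = [[1, 2, 0], [3, 4, 5], [0, 6, 7]]
AS = [[0.0, 1.0, 2.0],
      [3.0, 4.0, 5.0],
      [6.0, 7.0, 0.0]]
OFFSET = [-1, 0, 1]
y = dia_matvec(AS, OFFSET, [1.0, 1.0, 1.0])
```

<p class="indent" >Note that no column-index array is read in the inner loop: the column is computed from the row index and the offset of the diagonal.</p>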
|
|
|
@ -691,7 +691,7 @@ class="cmtt-10">psb_T_dia_sparse_mat</span></span></span>:
|
|
|
|
</pre>
|
|
|
|
</pre>
|
|
|
|
<!--l. 486--><p class="nopar" > </div></div>
|
|
|
|
<!--l. 486--><p class="nopar" > </div></div>
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
<h5 class="likesubsubsectionHead"><a
|
|
|
|
id="x19-151000"></a>Hacked DIA</h5>
|
|
|
|
id="x19-152000"></a>Hacked DIA</h5>
|
|
|
|
<!--l. 495--><p class="noindent" >Storage by DIAgonals is an attractive option for matrices whose coefficients are
|
|
|
|
<!--l. 495--><p class="noindent" >Storage by DIAgonals is an attractive option for matrices whose coefficients are
|
|
|
|
located on a small set of diagonals, since it does away with explicitly storing the
|
|
|
|
located on a small set of diagonals, since it does away with explicitly storing the
|
|
|
|
indices and therefore significantly reduces memory traffic. However, having a few
|
|
|
|
indices and therefore significantly reduces memory traffic. However, having a few
|
|
|
@ -738,7 +738,7 @@ class="cmti-10">hackOffsets[k]</span>.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<a
|
|
|
|
<a
|
|
|
|
id="x19-151001r9"></a>
|
|
|
|
id="x19-152001r9"></a>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -748,7 +748,7 @@ width="248" height="248" >
|
|
|
|
<br /> <div class="caption"
|
|
|
|
<br /> <div class="caption"
|
|
|
|
><span class="id">Figure 9: </span><span
|
|
|
|
><span class="id">Figure 9: </span><span
|
|
|
|
class="content">Hacked DIA compression of matrix in Figure <a
|
|
|
|
class="content">Hacked DIA compression of matrix in Figure <a
|
|
|
|
href="#x19-146002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-151001r9 -->
|
|
|
|
href="#x19-147002r5">5<!--tex4ht:ref: fig:dense --></a></span></div><!--tex4ht:label?: x19-152001r9 -->
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@ -793,7 +793,7 @@ class="cmtt-10">psb_T_hdia_sparse_mat</span></span></span>:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.4 </span> <a
|
|
|
|
<h4 class="subsectionHead"><span class="titlemark">12.4 </span> <a
|
|
|
|
id="x19-15200012.4"></a>CUDA-class extensions</h4>
|
|
|
|
id="x19-15300012.4"></a>CUDA-class extensions</h4>
|
|
|
|
<!--l. 4--><p class="noindent" >For computing with CUDA we define a dual memorization strategy in which each
|
|
|
|
<!--l. 4--><p class="noindent" >For computing with CUDA we define a dual memorization strategy in which each
|
|
|
|
variable on the CPU (“host”) side has a GPU (“device”) counterpart. When a GPU-type
|
|
|
|
variable on the CPU (“host”) side has a GPU (“device”) counterpart. When a GPU-type
|
|
|
|
variable is initialized, the data contained is (usually) the same on both sides. Each
|
|
|
|
variable is initialized, the data contained is (usually) the same on both sides. Each
|
|
|
|