Merge branch 'development' of github.com:sfilippone/amg4psblas into development

savebcmatch
Salvatore Filippone 4 years ago
commit 4e177ce926

Binary file not shown.

@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR</span><br
class="newline" /> <span class="newline" /> <span
class="cmr-12">Software version: 1.0</span><br class="cmr-12">Software version: 1.0</span><br
class="newline" /><span class="newline" /><span
class="cmr-12">April 12, 2021</span> class="cmr-12">May 11th, 2021</span>

@ -149,6 +149,7 @@ div.abstract {width:100%;}
.Ovalbox-thick { padding-left:3pt; padding-right:3pt; border:solid thick; } .Ovalbox-thick { padding-left:3pt; padding-right:3pt; border:solid thick; }
.shadowbox { padding-left:3pt; padding-right:3pt; border:solid thin; border-right:solid thick; border-bottom:solid thick; } .shadowbox { padding-left:3pt; padding-right:3pt; border:solid thin; border-right:solid thick; border-bottom:solid thick; }
.doublebox { padding-left:3pt; padding-right:3pt; border-style:double; border:solid thick; } .doublebox { padding-left:3pt; padding-right:3pt; border-style:double; border:solid thick; }
.rotatebox{display: inline-block;}
.figure img.graphics {margin-left:10%;} .figure img.graphics {margin-left:10%;}
.lstlisting .label{margin-right:0.5em; } .lstlisting .label{margin-right:0.5em; }
div.lstlisting{font-family: monospace; white-space: nowrap; margin-top:0.5em; margin-bottom:0.5em; } div.lstlisting{font-family: monospace; white-space: nowrap; margin-top:0.5em; margin-bottom:0.5em; }

@ -211,7 +211,7 @@ class="cmr-12">&#x00A0;Pothen, </span><span
class="cmti-12">Distributed-memory parallel algorithms for matching and</span> class="cmti-12">Distributed-memory parallel algorithms for matching and</span>
<span <span
class="cmti-12">coloring</span><span class="cmti-12">coloring</span><span
class="cmr-12">, in PCO11 New Trends in Parallel Computing and Optimization,</span> class="cmr-12">, in PCO&#8217;11 New Trends in Parallel Computing and Optimization,</span>
<span <span
class="cmr-12">IEEE International Symposium on Parallel and Distributed Processing</span> class="cmr-12">IEEE International Symposium on Parallel and Distributed Processing</span>
<span <span

@ -122,7 +122,7 @@ class="cmr-12">abide by its terms:</span>
&#x00A0;<br /> &#x00A0;<br />
</div> </div>
<!--l. 87--><p class="nopar" > <span <!--l. 87--><p class="nopar" > <span
class="cmr-12">AMG4PSBLAS is distributed together with (a small part) of the graph-matching</span> class="cmr-12">AMG4PSBLAS is distributed together with (a small part of) the graph-matching</span>
@ -133,7 +133,7 @@ class="cmr-12">[</span><a
href="userhtmlli5.html#XMatchBoxP"><span href="userhtmlli5.html#XMatchBoxP"><span
class="cmr-12">9</span></a><span class="cmr-12">9</span></a><span
class="cmr-12">]</span></span><span class="cmr-12">]</span></span><span
class="cmr-12">. Per the license requirements, we reproduce the relative part</span> class="cmr-12">. Per the license requirements, we reproduce the relevant part</span>
<span <span
class="cmr-12">here.</span> class="cmr-12">here.</span>

@ -91,7 +91,7 @@ class="cmr-12">Trolling, insulting or derogatory comments, and personal or polit
class="cmr-12">Public or private harassment</span> class="cmr-12">Public or private harassment</span>
</li> </li>
<li class="itemize"><span <li class="itemize"><span
class="cmr-12">Publishing others private information, such as a physical or email address,</span> class="cmr-12">Publishing others&#8217; private information, such as a physical or email address,</span>
<span <span
class="cmr-12">without their explicit permission</span> class="cmr-12">without their explicit permission</span>
</li> </li>
@ -234,7 +234,7 @@ class="cmr-12">_of</span><span
class="cmr-12">_conduct</span> class="cmr-12">_conduct</span>
<span <span
class="cmr-12">.html</span></a><span class="cmr-12">.html</span></a><span
class="cmr-12">. Community Impact Guidelines were inspired by Mozillas code of conduct</span> class="cmr-12">. Community Impact Guidelines were inspired by Mozilla&#8217;s code of conduct</span>
<span <span
class="cmr-12">enforcement ladder. For answers to common questions about this code of conduct, see</span> class="cmr-12">enforcement ladder. For answers to common questions about this code of conduct, see</span>
<span <span

@ -4227,9 +4227,9 @@ class="cmtt-12">github</span><span
class="cmtt-12">.</span><span class="cmtt-12">.</span><span
class="cmtt-12">com</span><span class="cmtt-12">com</span><span
class="cmtt-12">/</span><span class="cmtt-12">/</span><span
class="cmtt-12">sfilippone</span><span class="cmtt-12">psctoolkit</span><span
class="cmtt-12">/</span><span class="cmtt-12">/</span><span
class="cmtt-12">amg4psblas</span><span class="cmtt-12">psctoolkit</span><span
class="cmtt-12">/</span><span class="cmtt-12">/</span><span
class="cmtt-12">issues</span><span class="cmtt-12">issues</span><span
class="cmtt-12">&#x003E;.</span> class="cmtt-12">&#x003E;.</span>

@ -36,8 +36,8 @@ class="cmr-12">If you find any bugs in our codes, please report them through our
<span <span
class="cmr-12">on</span><br class="cmr-12">on</span><br
class="newline" /> <a class="newline" /> <a
href="https://github.com/psctoolkit/amg4psblas/issues" class="url" ><span href="https://github.com/psctoolkit/psctoolkit/issues" class="url" ><span
class="cmtt-12">https://github.com/psctoolkit/amg4psblas/issues</span></a><br class="cmtt-12">https://github.com/psctoolkit/psctoolkit/issues</span></a><br
class="newline" /> class="newline" />
<!--l. 195--><p class="indent" > <span <!--l. 195--><p class="indent" > <span
class="cmr-12">To enable us to track the bug, please provide a log from the failing application, the</span> class="cmr-12">To enable us to track the bug, please provide a log from the failing application, the</span>

@ -29,40 +29,41 @@ class="cmr-12">3.5 </span></span> <a
id="x13-120003.5"></a><span id="x13-120003.5"></a><span
class="cmr-12">Example and test programs</span></h4> class="cmr-12">Example and test programs</span></h4>
<!--l. 200--><p class="noindent" ><span <!--l. 200--><p class="noindent" ><span
class="cmr-12">The package contains the </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">The package contains a </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">examples</span></span></span> <span class="cmtt-12">samples</span></span></span> <span
class="cmr-12">and </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">directory, divided in two subdirs </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">tests</span></span></span> <span class="cmtt-12">simple</span></span></span> <span
class="cmr-12">directories; both of them are</span> class="cmr-12">and</span>
<span <span class="obeylines-h"><span class="verb"><span
class="cmr-12">further divided into </span><span class="obeylines-h"><span class="verb"><span class="cmtt-12">advanced</span></span></span><span
class="cmr-12">; both of them are further divided into </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">fileread</span></span></span> <span class="cmtt-12">fileread</span></span></span> <span
class="cmr-12">and </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">and </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">pdegen</span></span></span> <span class="cmtt-12">pdegen</span></span></span> <span
class="cmr-12">subdirectories. Their purpose is as</span> class="cmr-12">subdirectories.</span>
<span <span
class="cmr-12">follows:</span> class="cmr-12">Their purpose is as follows:</span>
<dl class="description"><dt class="description"> <dl class="description"><dt class="description">
<span <span
class="cmtt-12">examples</span> </dt><dd class="cmtt-12">simple</span> </dt><dd
class="description"><span class="description"><span
class="cmr-12">contains a set of simple example programs with a predefined choice</span> class="cmr-12">contains a set of simple example programs with a predefined choice of</span>
<span <span
class="cmr-12">of preconditioners, selectable via integer values. These are intended to get</span> class="cmr-12">preconditioners, selectable via integer values. These are intended to get</span>
<span <span
class="cmr-12">acquainted with the multilevel preconditioners available in AMG4PSBLAS.</span> class="cmr-12">acquainted with the multilevel preconditioners available in AMG4PSBLAS.</span>
</dd><dt class="description"> </dd><dt class="description">
<span <span
class="cmtt-12">tests</span> </dt><dd class="cmtt-12">advanced</span> </dt><dd
class="description"><span class="description"><span
class="cmr-12">contains a set of more sophisticated examples that will allow the user, via</span> class="cmr-12">contains a set of more sophisticated examples that will allow the user,</span>
<span <span
class="cmr-12">the input files in the </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">via the input files in the </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">runs</span></span></span> <span class="cmtt-12">runs</span></span></span> <span
class="cmr-12">subdirectories, to experiment with the full range</span> class="cmr-12">subdirectories, to experiment with the full</span>
<span <span
class="cmr-12">of preconditioners implemented in the package.</span></dd></dl> class="cmr-12">range of preconditioners implemented in the package.</span></dd></dl>
<!--l. 213--><p class="noindent" ><span <!--l. 214--><p class="noindent" ><span
class="cmr-12">The </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">The </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">fileread</span></span></span> <span class="cmtt-12">fileread</span></span></span> <span
class="cmr-12">directories contain sample programs that read sparse matrices from files,</span> class="cmr-12">directories contain sample programs that read sparse matrices from files,</span>

@ -71,38 +71,39 @@ class="cmr-12">The part of the code dealing with reading and assembling the spar
<span <span
class="cmr-12">right-hand side vector and the deallocation of the relevant data structures, performed</span> class="cmr-12">right-hand side vector and the deallocation of the relevant data structures, performed</span>
<span <span
class="cmr-12">through the PSBLAS routines for sparse matrix and vector management, is not</span> class="cmr-12">through the PSBLAS routines for sparse matrix and vector management,</span>
<span <span
class="cmr-12">reported here for the sake of conciseness. The complete code can be found in the</span> class="cmr-12">is not reported here for the sake of conciseness. The complete code can be</span>
<span <span
class="cmr-12">example program file </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">found in the example program file </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">amg_dexample_ml.f90</span></span></span><span class="cmtt-12">amg_dexample_ml.f90</span></span></span><span
class="cmr-12">, in the directory </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">, in the directory</span>
class="cmtt-12">examples/fileread</span></span></span> <span <span class="obeylines-h"><span class="verb"><span
class="cmr-12">of</span> class="cmtt-12">samples/simple/file</span></span></span><span class="obeylines-h"><span class="verb"><span
<span class="cmtt-12">read</span></span></span> <span
class="cmr-12">the AMG4PSBLAS implementation (see Section</span><span class="cmr-12">of the AMG4PSBLAS implementation (see Section</span><span
class="cmr-12">&#x00A0;</span><a class="cmr-12">&#x00A0;</span><a
href="userhtmlsu5.html#x13-120003.5"><span href="userhtmlsu5.html#x13-120003.5"><span
class="cmr-12">3.5</span><!--tex4ht:ref: sec:ex_and_test --></a><span class="cmr-12">3.5</span><!--tex4ht:ref: sec:ex_and_test --></a><span
class="cmr-12">). A sample test problem along</span> class="cmr-12">). A</span>
<span <span
class="cmr-12">with the relevant input data is available in </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">sample test problem along with the relevant input data is available in</span>
class="cmtt-12">examples/fileread/runs</span></span></span><span <span class="obeylines-h"><span class="verb"><span
class="cmr-12">. For details on</span> class="cmtt-12">samples/simple/fileread/runs</span></span></span><span
class="cmr-12">. For details on the use of the PSBLAS routines, see</span>
<span <span
class="cmr-12">the use of the PSBLAS routines, see the PSBLAS User&#8217;s Guide</span><span class="cmr-12">the PSBLAS User&#8217;s Guide</span><span
class="cmr-12">&#x00A0;</span><span class="cite"><span class="cmr-12">&#x00A0;</span><span class="cite"><span
class="cmr-12">[</span><a class="cmr-12">[</span><a
href="userhtmlli5.html#XPSBLASGUIDE"><span href="userhtmlli5.html#XPSBLASGUIDE"><span
class="cmr-12">20</span></a><span class="cmr-12">20</span></a><span
class="cmr-12">]</span></span><span class="cmr-12">]</span></span><span
class="cmr-12">.</span> class="cmr-12">.</span>
<!--l. 138--><p class="indent" > <span
class="cmr-12">The setup and application of the default multilevel preconditioner for the real single</span>
<!--l. 138--><p class="indent" > <span
class="cmr-12">The setup and application of the default multilevel preconditioner for the real single</span>
<span <span
class="cmr-12">precision and the complex, single and double precision, versions are obtained</span> class="cmr-12">precision and the complex, single and double precision, versions are obtained</span>
<span <span
@ -114,7 +115,8 @@ class="cmr-12">for</span>
<span <span
class="cmr-12">details). If these versions are installed, the corresponding codes are available in</span> class="cmr-12">details). If these versions are installed, the corresponding codes are available in</span>
<span class="obeylines-h"><span class="verb"><span <span class="obeylines-h"><span class="verb"><span
class="cmtt-12">examples/fileread/</span></span></span><span class="cmtt-12">samples/simple/file</span></span></span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">read</span></span></span><span
class="cmr-12">.</span> class="cmr-12">.</span>
@ -278,7 +280,7 @@ class="cmr-12">For all the previous preconditioners, example programs where the
class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet</span> class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet</span>
<span <span
class="cmr-12">boundary conditions are also available in the directory </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">boundary conditions are also available in the directory </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">examples/pdegen</span></span></span><span class="cmtt-12">samples/simple/pdegen</span></span></span><span
class="cmr-12">.</span> class="cmr-12">.</span>

@ -145,7 +145,7 @@ class="cmr-12">GPU environment</span>
<div class="center" <div class="center"
> >
<!--l. 552--><p class="noindent" > <!--l. 553--><p class="noindent" >
<div class="minipage"><div class="verbatim" id="verbatim-12"> <div class="minipage"><div class="verbatim" id="verbatim-12">
&#x00A0;&#x00A0;call&#x00A0;desc_a%cnv(mold=igmold) &#x00A0;&#x00A0;call&#x00A0;desc_a%cnv(mold=igmold)
&#x00A0;<br />&#x00A0;&#x00A0;call&#x00A0;a%cscnv(info,mold=agmold) &#x00A0;<br />&#x00A0;&#x00A0;call&#x00A0;a%cscnv(info,mold=agmold)
@ -172,7 +172,7 @@ class="cmr-12">GPU environment</span>
&#x00A0;<br /> &#x00A0;<br />
&#x00A0;<br />&#x00A0; &#x00A0;<br />&#x00A0;
</div> </div>
<!--l. 579--><p class="nopar" ></div></div> <!--l. 580--><p class="nopar" ></div></div>
<br /> <div class="caption" <br /> <div class="caption"
><span class="id">Listing 7: </span><span ><span class="id">Listing 7: </span><span
class="content">setup of a GPU-enabled test program part three.</span></div><!--tex4ht:label?: x16-15003r7 --> class="content">setup of a GPU-enabled test program part three.</span></div><!--tex4ht:label?: x16-15003r7 -->
@ -180,25 +180,30 @@ class="content">setup of a GPU-enabled test program part three.</span></div><!--
</div><hr class="endfloat" /> </div><hr class="endfloat" />
<!--l. 587--><p class="indent" > <span <!--l. 588--><p class="indent" > <span
class="cmr-12">It is very important to employ solvers that are suited to the GPU, i.e. solvers that</span> class="cmr-12">It is very important to employ smoothers and coarsest solvers that are suited to the</span>
<span <span
class="cmr-12">do NOT employ triangular system solve kernels. Solvers that satisfy this constraint</span> class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that</span>
<span <span
class="cmr-12">include:</span> class="cmr-12">satisfy this constraint include:</span>
<ul class="itemize1"> <ul class="itemize1">
<li class="itemize"><span class="obeylines-h"><span class="verb"><span <li class="itemize"><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">JACOBI</span></span></span> class="cmtt-12">JACOBI</span></span></span>
</li> </li>
<li class="itemize"><span class="obeylines-h"><span class="verb"><span <li class="itemize"><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">BJAC</span></span></span> <span
class="cmr-12">with the following methods on the local blocks:</span>
<ul class="itemize2">
<li class="itemize"><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">INVK</span></span></span> class="cmtt-12">INVK</span></span></span>
</li> </li>
<li class="itemize"><span class="obeylines-h"><span class="verb"><span <li class="itemize"><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">INVT</span></span></span> class="cmtt-12">INVT</span></span></span>
</li> </li>
<li class="itemize"><span class="obeylines-h"><span class="verb"><span <li class="itemize"><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">AINV</span></span></span></li></ul> class="cmtt-12">AINV</span></span></span></li></ul>
<!--l. 596--><p class="noindent" ><span </li></ul>
<!--l. 600--><p class="noindent" ><span
class="cmr-12">and their </span><span class="cmr-12">and their </span><span
class="cmmi-12">&#x2113;</span><sub><span class="cmmi-12">&#x2113;</span><sub><span
class="cmr-8">1</span></sub> <span class="cmr-8">1</span></sub> <span

@ -325,7 +325,7 @@ class="td11"><!--l. 124--><p class="noindent" ><span
class="cmr-12">An auxiliary input argument that can be passed to the underlying</span> class="cmr-12">An auxiliary input argument that can be passed to the underlying</span>
<span <span
class="cmr-12">objects.</span> </td> class="cmr-12">objects.</span> </td>
</tr></table></div> </tr></table></div>
<!--l. 129--><p class="noindent" > <!--l. 129--><p class="noindent" >
<!--l. 134--><p class="indent" > <span <!--l. 134--><p class="indent" > <span
class="cmr-12">A variety of preconditioners can be obtained by setting the appropriate</span> class="cmr-12">A variety of preconditioners can be obtained by setting the appropriate</span>

@ -190,22 +190,23 @@ make install
\subsection{Bug reporting} \subsection{Bug reporting}
If you find any bugs in our codes, please report them through our If you find any bugs in our codes, please report them through our
issues page on \\[2mm] issues page on \\[2mm]
\url{https://github.com/psctoolkit/amg4psblas/issues}\\ \url{https://github.com/psctoolkit/psctoolkit/issues}\\
To enable us to track the bug, please provide a log from the failing To enable us to track the bug, please provide a log from the failing
application, the test conditions, and ideally a self-contained test application, the test conditions, and ideally a self-contained test
program reproducing the issue. program reproducing the issue.
\subsection{Example and test programs\label{sec:ex_and_test}} \subsection{Example and test programs\label{sec:ex_and_test}}
The package contains the \verb|examples| and \verb|tests| directories; The package contains a \verb|samples| directory, divided in two
subdirs \verb|simple| and \verb|advanced|;
both of them are further divided into \verb|fileread| and both of them are further divided into \verb|fileread| and
\verb|pdegen| subdirectories. Their purpose is as follows: \verb|pdegen| subdirectories. Their purpose is as follows:
\begin{description} \begin{description}
\item[\tt examples] contains a set of simple example programs with a \item[\tt simple] contains a set of simple example programs with a
predefined choice of preconditioners, selectable via integer predefined choice of preconditioners, selectable via integer
values. These are intended to get acquainted with the values. These are intended to get acquainted with the
multilevel preconditioners available in AMG4PSBLAS. multilevel preconditioners available in AMG4PSBLAS.
\item[\tt tests] contains a set of more sophisticated examples that \item[\tt advanced] contains a set of more sophisticated examples that
will allow the user, via the input files in the \verb|runs| will allow the user, via the input files in the \verb|runs|
subdirectories, to experiment with the full range of preconditioners subdirectories, to experiment with the full range of preconditioners
implemented in the package. implemented in the package.
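Scripts that still point at the old layout need their paths rewritten to match the renaming in this hunk. The following is a minimal Python sketch of that mapping, not part of AMG4PSBLAS: `migrate_path` and `RENAMED_ROOTS` are hypothetical names, and the mapping of `tests` to `samples/advanced` is an assumption inferred from the description of `tests` being carried over to `advanced`.

```python
from pathlib import PurePosixPath

# Old top-level directory -> new location under samples/.
# ASSUMPTION: `tests` corresponds to `samples/advanced`, since the text that
# described `tests` now describes `advanced`.
RENAMED_ROOTS = {
    "examples": PurePosixPath("samples/simple"),
    "tests": PurePosixPath("samples/advanced"),
}


def migrate_path(old_path: str) -> str:
    """Rewrite a relative path from the old layout to the new one.

    Hypothetical helper for illustration only. Paths whose first component
    is not a renamed root are returned unchanged.
    """
    parts = PurePosixPath(old_path).parts
    if not parts or parts[0] not in RENAMED_ROOTS:
        return old_path
    # Keep everything below the renamed root (fileread, pdegen, runs, ...).
    return str(RENAMED_ROOTS[parts[0]].joinpath(*parts[1:]))
```

For instance, `migrate_path("examples/fileread/runs")` gives the `samples/simple/fileread/runs` path used in the updated text.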

@ -159,4 +159,4 @@ Some influential environment variables:
Use these variables to override the choices made by `configure' or to help Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations. it to find libraries and programs with nonstandard names/locations.
Report bugs to <https://github.com/sfilippone/amg4psblas/issues>. Report bugs to <https://github.com/psctoolkit/psctoolkit/issues>.

@ -129,9 +129,9 @@ relevant data structures, performed
through the PSBLAS routines for sparse matrix and vector management, is not reported through the PSBLAS routines for sparse matrix and vector management, is not reported
here for the sake of conciseness. here for the sake of conciseness.
The complete code can be found in the example program file \verb|amg_dexample_ml.f90|, The complete code can be found in the example program file \verb|amg_dexample_ml.f90|,
in the directory \verb|examples/fileread| of the AMG4PSBLAS implementation (see in the directory \verb|samples/simple/file|\-\verb|read| of the AMG4PSBLAS implementation (see
Section~\ref{sec:ex_and_test}). A sample test problem along with the relevant Section~\ref{sec:ex_and_test}). A sample test problem along with the relevant
input data is available in \verb|examples/fileread/runs|. input data is available in \verb|samples/simple/fileread/runs|.
For details on the use of the PSBLAS routines, see the PSBLAS User's For details on the use of the PSBLAS routines, see the PSBLAS User's
Guide~\cite{PSBLASGUIDE}. Guide~\cite{PSBLASGUIDE}.
@ -139,7 +139,7 @@ The setup and application of the default multilevel preconditioner
for the real single precision and the complex, single and double for the real single precision and the complex, single and double
precision, versions are obtained with straightforward modifications of the previous precision, versions are obtained with straightforward modifications of the previous
example (see Section~\ref{sec:userinterface} for details). If these versions are installed, example (see Section~\ref{sec:userinterface} for details). If these versions are installed,
the corresponding codes are available in \verb|examples/fileread/|. the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
\begin{listing}[tbp] \begin{listing}[tbp]
\begin{center} \begin{center}
@ -300,7 +300,7 @@ The corresponding example program is available in the file
For all the previous preconditioners, example programs where the sparse matrix and For all the previous preconditioners, example programs where the sparse matrix and
the right-hand side are generated by discretizing a PDE with Dirichlet the right-hand side are generated by discretizing a PDE with Dirichlet
boundary conditions are also available in the directory \verb|examples/pdegen|. boundary conditions are also available in the directory \verb|samples/simple/pdegen|.
\vspace{-1em}\begin{listing}[tbh] \vspace{-1em}\begin{listing}[tbh]
\ifpdf% \ifpdf%
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran} \begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
@ -535,7 +535,8 @@ Krylov method. At the end of the code, we close the GPU environment
call prec%allocate_wrk(info) call prec%allocate_wrk(info)
t1 = psb_wtime() t1 = psb_wtime()
call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,& call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
& desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,& & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,&
& itrace=s_choice%itrace,&
& istop=s_choice%istopc,irst=s_choice%irst) & istop=s_choice%istopc,irst=s_choice%irst)
call prec%deallocate_wrk(info) call prec%deallocate_wrk(info)
call psb_barrier(ctxt) call psb_barrier(ctxt)
@ -584,15 +585,18 @@ Krylov method. At the end of the code, we close the GPU environment
\caption{setup of a GPU-enabled test program part three.\label{fig:gpu-ex3}} \caption{setup of a GPU-enabled test program part three.\label{fig:gpu-ex3}}
\end{listing} \end{listing}
It is very important to employ solvers that are suited It is very important to employ smoothers and coarsest solvers that are suited
to the GPU, i.e. solvers that do NOT employ triangular to the GPU, i.e. methods that do NOT employ triangular
system solve kernels. Solvers that satisfy this constraint include: system solve kernels. Methods that satisfy this constraint include:
\begin{itemize} \begin{itemize}
\item \verb|JACOBI| \item \verb|JACOBI|
\item \verb|BJAC| with the following methods on the local blocks:
\begin{itemize}
\item \verb|INVK| \item \verb|INVK|
\item \verb|INVT| \item \verb|INVT|
\item \verb|AINV| \item \verb|AINV|
\end{itemize} \end{itemize}
\end{itemize}
and their $\ell_1$ variants. and their $\ell_1$ variants.
%%% Local Variables: %%% Local Variables:
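The constraint in the updated list can be expressed as a small check on a smoother choice before running on the GPU. This is a minimal Python sketch, not an AMG4PSBLAS routine: `is_gpu_suited` is a hypothetical helper, and the `L1-` prefix for the ℓ1 variants is an assumption about how those variants are named.

```python
from typing import Optional

# Local-block methods the list names as usable under BJAC on the GPU.
GPU_BLOCK_SOLVERS = {"INVK", "INVT", "AINV"}

# ASSUMPTION: an l1 variant is written as the base name with this prefix.
L1_PREFIX = "L1-"


def is_gpu_suited(smoother: str, sub_solve: Optional[str] = None) -> bool:
    """Return True if the choice avoids triangular system solve kernels.

    Hypothetical helper: accepts JACOBI, or BJAC combined with one of
    INVK/INVT/AINV on the local blocks, plus their l1 variants.
    """
    name = smoother.upper()
    if name.startswith(L1_PREFIX):
        name = name[len(L1_PREFIX):]  # l1 variants follow the same rule
    if name == "JACOBI":
        return True
    if name == "BJAC":
        # BJAC qualifies only with an approximate-inverse block solver.
        return sub_solve is not None and sub_solve.upper() in GPU_BLOCK_SOLVERS
    return False
```

With this check, `BJAC` without a listed local solver is rejected, which mirrors the nesting this hunk introduces in the list.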

@ -87,9 +87,9 @@ terms: {\small
\end{verbatim} \end{verbatim}
} }
\pagebreak \pagebreak
AMG4PSBLAS is distributed together with (a small part) of the graph-matching AMG4PSBLAS is distributed together with (a small part of) the graph-matching
library MatchBox-P~\cite{MatchBoxP}. Per the license requirements, we reproduce library MatchBox-P~\cite{MatchBoxP}. Per the license requirements, we reproduce
the relative part here. the relevant part here.
{\small {\small
\begin{verbatim} \begin{verbatim}
// *********************************************************************** // ***********************************************************************

@ -154,7 +154,7 @@ Preconditioners Package based on PSBLAS}
\flushright \flushright
\large Software version: 1.0\\ \large Software version: 1.0\\
%\todaym %\todaym
\large April 12, 2021 \large May 11th, 2021
\end{minipage}} \end{minipage}}
%\addtolength{\textwidth}{\centeroffset} %\addtolength{\textwidth}{\centeroffset}
\vspace{\stretch{2}} \vspace{\stretch{2}}

@ -114,7 +114,7 @@
%\today %\today
Software version: 1.0\\ Software version: 1.0\\
%\today %\today
April 12, 2021 May 11th, 2021
\clearpage \clearpage
\ \\ \ \\
\thispagestyle{empty} \thispagestyle{empty}

@ -39,23 +39,18 @@
! !
! This sample program solves a linear system obtained by discretizing a ! This sample program solves a linear system obtained by discretizing a
! PDE with Dirichlet BCs. The solver is CG, coupled with one of the ! PDE with Dirichlet BCs. The solver is CG, coupled with one of the
! following multi-level preconditioners, as explained in Section 4.1 of ! following multi-level preconditioners, as explained in Section 4.2 of
! the AMG4PSBLAS User's and Reference Guide: ! the AMG4PSBLAS User's and Reference Guide:
! !
! - choice = 1, the default multi-level preconditioner solver, i.e., ! - choice = 1, a V-cycle with decoupled smoothed aggregation, 4 Jacobi
! V-cycle with decoupled smoothed aggregation, 1 hybrid forward/backward ! sweeps as pre/post-smoother and 8 Jacobi sweeps as coarsest-level
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level ! solver with replicated coarsest matrix
! solver (Sec. 4.1, Listing 1)
! !
! - choice = 2, a V-cycle preconditioner with 1 block-Jacobi sweep ! - choice = 2, a W-cycle based on the coupled aggregation relying on matching,
! (with ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi ! with maximum size of aggregates equal to 8 and smoothed prolongators,
! sweeps (with ILU(0) on the blocks) as coarsest-level solver (Sec. 4.1, Listing 2) ! 2 sweeps of Block-Jacobi pre/post-smoother using approximate inverse INVK and
! ! 4 sweeps of Block-Jacobi with INVK as coarsest-level solver on distributed
! - choice = 3, W-cycle preconditioner based on the coupled aggregation relying ! coarsest matrix
! on matching, with maximum size of aggregates equal to 8 and smoothed prolongators,
! 2 hybrid forward/backward GS sweeps as pre/post-smoother, a distributed coarsest
! matrix, and preconditioned Flexible Conjugate Gradient as coarsest-level solver
! (Sec. 4.1, Listing 3)
! !
! The matrix and the rhs are read from files (if an rhs is not available, the ! The matrix and the rhs are read from files (if an rhs is not available, the
! unit rhs is set). ! unit rhs is set).
@ -183,8 +178,9 @@ program amg_dexample_gpu
case(1) case(1)
! initialize a V-cycle preconditioner with 4 Jacobi sweep ! initialize a V-cycle preconditioner, relying on decoupled smoothed aggregation
! and 8 Jacobi sweeps as coarsest-level solver ! with 4 Jacobi sweeps as pre/post-smoother
! and 8 Jacobi sweeps as coarsest-level solver on replicated coarsest matrix
call P%init(ctxt,'ML',info) call P%init(ctxt,'ML',info)
call P%set('SMOOTHER_TYPE','JACOBI',info) call P%set('SMOOTHER_TYPE','JACOBI',info)
@ -195,19 +191,22 @@ program amg_dexample_gpu
case(2) case(2)
! initialize a V-cycle preconditioner based on the coupled aggregation relying on matching, ! initialize a W-cycle preconditioner based on the coupled aggregation relying on matching,
! with maximum size of aggregates equal to 8 and smoothed prolongators, ! with maximum size of aggregates equal to 8 and smoothed prolongators,
! Block-Jacobi smoother using approximate inverse INVK and ! 2 sweeps of Block-Jacobi pre/post-smoother using approximate inverse INVK and
! and 4 sweeps of INVK on he coarsest level ! 4 sweeps of Block-Jacobi with INVK on the coarsest level distributed matrix
call P%init(ctxt,'ML',info) call P%init(ctxt,'ML',info)
call P%set('PAR_AGGR_ALG','COUPLED',info) call P%set('PAR_AGGR_ALG','COUPLED',info)
call P%set('AGGR_TYPE','MATCHBOXP',info) call P%set('AGGR_TYPE','MATCHBOXP',info)
call P%set('AGGR_SIZE',8,info) call P%set('AGGR_SIZE',8,info)
call P%set('ML_CYCLE','WCYCLE',info) call P%set('ML_CYCLE','WCYCLE',info)
call P%set('SMOOTHER_TYPE','BJAC',info)
call P%set('SMOOTHER_SWEEPS',2,info) call P%set('SMOOTHER_SWEEPS',2,info)
call P%set('SUB_SOLVE','INVK',info) call P%set('SUB_SOLVE','INVK',info)
call P%set('COARSE_SOLVE','INVK',info) call P%set('COARSE_SOLVE','BJAC',info)
call P%set('COARSE_SUBSOLVE','INVK',info)
call P%set('COARSE_SWEEPS',4,info)
call P%set('COARSE_MAT','DIST',info) call P%set('COARSE_MAT','DIST',info)
kmethod = 'CG' kmethod = 'CG'

@ -1,14 +1,14 @@
! !
! !
! MLD2P4 version 2.2 ! AMG4PSBLAS version 1.0
! MultiLevel Domain Decomposition Parallel Preconditioners Package ! Algebraic Multigrid Package
! based on PSBLAS (Parallel Sparse BLAS version 3.5) ! based on PSBLAS (Parallel Sparse BLAS version 3.7)
! !
! (C) Copyright 2008-2018 ! (C) Copyright 2021
! !
! Salvatore Filippone ! Salvatore Filippone
! Pasqua D'Ambra ! Pasqua D'Ambra
! Daniela di Serafino ! Fabio Durastante
! !
! Redistribution and use in source and binary forms, with or without ! Redistribution and use in source and binary forms, with or without
! modification, are permitted provided that the following conditions ! modification, are permitted provided that the following conditions
@ -18,14 +18,14 @@
! 2. Redistributions in binary form must reproduce the above copyright ! 2. Redistributions in binary form must reproduce the above copyright
! notice, this list of conditions, and the following disclaimer in the ! notice, this list of conditions, and the following disclaimer in the
! documentation and/or other materials provided with the distribution. ! documentation and/or other materials provided with the distribution.
! 3. The name of the MLD2P4 group or the names of its contributors may ! 3. The name of the AMG4PSBLAS group or the names of its contributors may
! not be used to endorse or promote products derived from this ! not be used to endorse or promote products derived from this
! software without specific written permission. ! software without specific written permission.
! !
! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR ! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS ! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS
! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR ! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS

@ -1,14 +1,14 @@
! !
! !
! MLD2P4 version 2.2 ! AMG4PSBLAS version 1.0
! MultiLevel Domain Decomposition Parallel Preconditioners Package ! Algebraic Multigrid Package
! based on PSBLAS (Parallel Sparse BLAS version 3.5) ! based on PSBLAS (Parallel Sparse BLAS version 3.7)
! !
! (C) Copyright 2008-2018 ! (C) Copyright 2021
! !
! Salvatore Filippone ! Salvatore Filippone
! Pasqua D'Ambra ! Pasqua D'Ambra
! Daniela di Serafino ! Fabio Durastante
! !
! Redistribution and use in source and binary forms, with or without ! Redistribution and use in source and binary forms, with or without
! modification, are permitted provided that the following conditions ! modification, are permitted provided that the following conditions
@ -18,14 +18,14 @@
! 2. Redistributions in binary form must reproduce the above copyright ! 2. Redistributions in binary form must reproduce the above copyright
! notice, this list of conditions, and the following disclaimer in the ! notice, this list of conditions, and the following disclaimer in the
! documentation and/or other materials provided with the distribution. ! documentation and/or other materials provided with the distribution.
! 3. The name of the MLD2P4 group or the names of its contributors may ! 3. The name of the AMG4PSBLAS group or the names of its contributors may
! not be used to endorse or promote products derived from this ! not be used to endorse or promote products derived from this
! software without specific written permission. ! software without specific written permission.
! !
! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR ! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS ! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS
! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR ! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
