Update documentation for release

gpucinterfaces
sfilippone 5 months ago
parent 43a75e3171
commit 06f8f83114

Binary file not shown.

@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR</span><br
class="newline" /> <span
class="cmr-12">Software version: 1.2</span><br
class="newline" /><span
class="cmr-12">December 31st, 2025</span>
class="cmr-12">December 23rd, 2025</span>

@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR</span><br
class="newline" /> <span
class="cmr-12">Software version: 1.2</span><br
class="newline" /><span
class="cmr-12">December 31st, 2025</span>
class="cmr-12">December 23rd, 2025</span>

@ -72,32 +72,34 @@ class="small-caps">n</span></span>
class="cmcsc-10x-x-120">PSBLAS</span><span
class="cmr-12">) is a package of parallel algebraic multilevel preconditioners included in the</span>
<span
class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It is a progress</span>
class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It</span>
<span
class="cmr-12">of a software development project started in 2007, named MLD2P4, which originally</span>
class="cmr-12">is an evolution of a software development project started in 2007, named</span>
<span
class="cmr-12">implemented a multilevel version of some domain decomposition preconditioners of</span>
class="cmr-12">MLD2P4, which originally implemented a multilevel version of some domain</span>
<span
class="cmr-12">additive-Schwarz type, and was based on a parallel decoupled version of the well-known</span>
class="cmr-12">decomposition preconditioners of additive-Schwarz type, and was based on a parallel</span>
<span
class="cmr-12">smoothed aggregation method to generate the multilevel hierarchy of coarser</span>
class="cmr-12">decoupled version of the well-known smoothed aggregation method to generate the</span>
<span
class="cmr-12">matrices. In the last years, within the context of the EU-H2020 EoCoE project</span>
class="cmr-12">multilevel hierarchy of coarser matrices. In the last few years the package</span>
<span
class="cmr-12">(Energy Oriented Center of Excellence), the package was extended to include</span>
class="cmr-12">was extended to include new algorithms and functionalities for the setup</span>
<span
class="cmr-12">new algorithms and functionalities for the setup and application of new AMG</span>
class="cmr-12">and application of new AMG preconditioners with the final aims of improving</span>
<span
class="cmr-12">preconditioners with the final aims of improving efficiency and scalability when tens of</span>
class="cmr-12">efficiency and scalability when tens of thousands of cores are used, and of boosting</span>
<span
class="cmr-12">thousands of cores are used, and of boosting reliability in dealing with general</span>
class="cmr-12">reliability in dealing with general symmetric positive definite linear systems; these</span>
<span
class="cmr-12">symmetric positive definite linear systems. Due to the significant number</span>
class="cmr-12">developments have been supported in the context of the EU-H2020 EoCoE</span>
<span
class="cmr-12">project (Energy Oriented Center of Excellence). Due to the significant number</span>
<span
class="cmr-12">of changes and the increase in scope, we decided to rename the package as</span>
<span
class="cmr-12">AMG4PSBLAS.</span>
<!--l. 16--><p class="indent" > <span
<!--l. 27--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS has been designed to provide scalable and easy-to-use</span>
<span
class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra</span>
@ -111,14 +113,14 @@ class="cmr-12">algebraic approach; therefore users level interfaces assume that
class="cmr-12">and preconditioners are represented as PSBLAS distributed sparse matrices.</span>
<span
class="cmr-12">AMG4PSBLAS enables the user to easily specify different features of an algebraic</span>
<span
class="cmr-12">multilevel preconditioner, thus allowing users to experiment with different preconditioners for</span>
<span
class="cmr-12">multilevel preconditioner, thus allowing users to experiment with different preconditioners for</span>
<span
class="cmr-12">the problem and parallel computers at hand.</span>
<!--l. 27--><p class="indent" > <span
<!--l. 39--><p class="indent" > <span
class="cmr-12">The package employs object-oriented design techniques in Fortran</span><span
class="cmr-12">&#x00A0;2003, with</span>
<span
@ -132,7 +134,7 @@ class="cmr-12">parallel implementation is based on a Single Program Multiple Dat
class="cmr-12">paradigm; the inter-process communication is based on MPI and is managed mainly</span>
<span
class="cmr-12">through PSBLAS.</span>
<!--l. 35--><p class="indent" > <span
<!--l. 47--><p class="indent" > <span
class="cmr-12">This guide provides a brief description of the functionalities and the user interface</span>
<span
class="cmr-12">of AMG4PSBLAS.</span>

@ -100,18 +100,18 @@ src="userhtml0x.png" alt="Ax = b,
id="x4-3001r1"></a></div>
</td><td class="equation-label"><span
class="cmr-12">(1)</span></td></tr></table>
<!--l. 11--><p class="nopar" ><span
<!--l. 13--><p class="nopar" ><span
class="cmr-12">where </span><span
class="cmmi-12">A </span><span
class="cmr-12">is a square, real or complex, sparse symmetric positive definite (s.p.d.)</span>
<span
class="cmr-12">matrix.</span>
<!--l. 19--><p class="indent" > <span
<!--l. 21--><p class="indent" > <span
class="cmr-12">The preconditioners implemented in AMG4PSBLAS are obtained by combining 3</span>
<span
class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. Available</span>
class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. We provide a</span>
<span
class="cmr-12">multigrid cycles include the V-, W-, and a version of a Krylov-type cycle</span>
class="cmr-12">number of multigrid cycles, including the V-, W-, and a version of a Krylov-type cycle</span>
<span
class="cmr-12">(K-cycle)</span><span
class="cmr-12">&#x00A0;</span><span class="cite"><span
@ -140,7 +140,7 @@ href="userhtmlli3.html#XDDF2020"><span
class="cmr-12">14</span></a><span
class="cmr-12">]</span></span><span
class="cmr-12">.</span>
<!--l. 30--><p class="indent" > <span
<!--l. 34--><p class="indent" > <span
class="cmr-12">An algebraic approach is used to generate a hierarchy of coarse-level matrices and</span>
<span
class="cmr-12">operators, without explicitly using any information on the geometry of the original</span>
@ -150,7 +150,7 @@ class="cmr-12">problem, e.g., the discretization of a PDE. To this end, two diff
class="cmr-12">strategies, based on aggregation, are available:</span>
<ul class="itemize1">
<li class="itemize">
<!--l. 35--><p class="noindent" ><span
<!--l. 39--><p class="noindent" ><span
class="cmr-12">a decoupled version of the smoothed aggregation procedure proposed in</span><span
class="cmr-12">&#x00A0;</span><span class="cite"><span
class="cmr-12">[</span><a
@ -178,7 +178,7 @@ class="cmr-12">;</span>
</li>
<li class="itemize">
<!--l. 39--><p class="noindent" ><span
<!--l. 43--><p class="noindent" ><span
class="cmr-12">a coupled, parallel implementation of the Coarsening based on Compatible</span>
<span
class="cmr-12">Weighted Matching introduced in</span><span
@ -198,7 +198,7 @@ href="userhtmlli3.html#XDDF2020"><span
class="cmr-12">14</span></a><span
class="cmr-12">]</span></span><span
class="cmr-12">;</span></li></ul>
<!--l. 43--><p class="noindent" ><span
<!--l. 47--><p class="noindent" ><span
class="cmr-12">Either exact or approximate solvers can be used on the coarsest-level system. We provide</span>
<span
class="cmr-12">interfaces to various parallel and sequential sparse LU factorizations from external</span>
@ -210,7 +210,7 @@ class="cmr-12">parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solve
class="cmr-12">preconditioned Krylov methods; all smoothers can also be exploited as one-level</span>
<span
class="cmr-12">preconditioners.</span>
<!--l. 50--><p class="indent" > <span
<!--l. 55--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS is written in Fortran</span><span
class="cmr-12">&#x00A0;2003, following an object-oriented design</span>
<span
@ -225,12 +225,12 @@ class="cmr-12">Single and double precision implementations of AMG4PSBLAS are ava
class="cmr-12">for both the real and the complex case, which can be used through a single</span>
<span
class="cmr-12">interface.</span>
<!--l. 60--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use</span>
<!--l. 65--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use multilevel</span>
<span
class="cmr-12">multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)</span>
class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse BLAS) computational</span>
<span
class="cmr-12">computational framework</span><span
class="cmr-12">framework</span><span
class="cmr-12">&#x00A0;</span><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#Xpsblas_00"><span
@ -240,37 +240,35 @@ class="cmr-12">&#x00A0;</span><a
href="userhtmlli3.html#XPSBLAS3"><span
class="cmr-12">22</span></a><span
class="cmr-12">]</span></span><span
class="cmr-12">. PSBLAS provides basic linear algebra operators</span>
class="cmr-12">. PSBLAS provides basic linear algebra operators and data</span>
<span
class="cmr-12">and data management facilities for distributed sparse matrices, kernels for</span>
class="cmr-12">management facilities for distributed sparse matrices, kernels for sequential incomplete</span>
<span
class="cmr-12">sequential incomplete factorizations needed for the parallel block-Jacobi and</span>
class="cmr-12">factorizations needed for the parallel block-Jacobi and additive Schwarz smoothers, and</span>
<span
class="cmr-12">additive Schwarz smoothers, and parallel Krylov solvers which can be used with</span>
class="cmr-12">parallel Krylov solvers which can be used with the AMG4PSBLAS preconditioners.</span>
<span
class="cmr-12">the AMG4PSBLAS preconditioners. The choice of PSBLAS has been mainly</span>
class="cmr-12">The choice of PSBLAS has been mainly motivated by the need for a portable</span>
<span
class="cmr-12">motivated by the need for a portable and efficient software infrastructure</span>
class="cmr-12">and efficient software infrastructure implementing &#8220;de facto&#8221; standard parallel sparse</span>
<span
class="cmr-12">implementing &#8220;de facto&#8221; standard parallel sparse linear algebra kernels, to</span>
class="cmr-12">linear algebra kernels, to pursue goals such as performance, portability, modularity</span>
<span
class="cmr-12">pursue goals such as performance, portability, modularity and extensibility</span>
class="cmr-12">and extensibility in the development of the preconditioner package. On the</span>
<span
class="cmr-12">in the development of the preconditioner package. On the other hand, the</span>
class="cmr-12">other hand, the implementation of AMG4PSBLAS, which was driven by the</span>
<span
class="cmr-12">implementation of AMG4PSBLAS, which was driven by the need to face the exascale</span>
class="cmr-12">need to face the exascale challenge, has led to some important revisions and</span>
<span
class="cmr-12">challenge, has led to some important revisions and extensions of the PSBLAS</span>
class="cmr-12">extensions of the PSBLAS infrastructure. The inter-process communication</span>
<span
class="cmr-12">infrastructure. The inter-process communication required by AMG4PSBLAS</span>
class="cmr-12">required by AMG4PSBLAS is encapsulated in the PSBLAS routines; therefore,</span>
<span
class="cmr-12">is encapsulated in the PSBLAS routines; therefore, AMG4PSBLAS can be</span>
class="cmr-12">AMG4PSBLAS can be run on any parallel machine where PSBLAS implementations</span>
<span
class="cmr-12">run on any parallel machine where PSBLAS implementations are available.</span>
class="cmr-12">are available. The most recent version of PSBLAS (release 3.9) includes a plug-in for</span>
<span
class="cmr-12">In the most recent version of PSBLAS (release 3.7), a plug-in for GPU is</span>
<span
class="cmr-12">included; it includes CUDA versions of main vector operations and of sparse</span>
class="cmr-12">GPU; it contains CUDA versions of main vector operations and of sparse</span>
<span
class="cmr-12">matrix-vector multiplication, so that Krylov methods coupled with AMG4PSBLAS</span>
<span
@ -279,17 +277,17 @@ class="cmr-12">preconditioners relying on Jacobi and block-Jacobi smoothers with
class="cmr-12">approximate inverses on the blocks can be efficiently executed on clusters of</span>
<span
class="cmr-12">GPUs.</span>
<!--l. 85--><p class="indent" > <span
<!--l. 90--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS has a layered and modular software architecture where three main</span>
<span
class="cmr-12">layers can be identified. The lower layer consists of the PSBLAS kernels, the middle</span>
<span
class="cmr-12">one implements the construction and application phases of the preconditioners, and the</span>
<span
class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span>
<span
class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span>
<span
class="cmr-12">allows for different levels of use of the package: a few black-box routines at the upper</span>
<span
@ -304,7 +302,7 @@ class="cmr-12">&#x00A0;</span><a
href="userhtmlse6.html#x9-310006"><span
class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span
class="cmr-12">).</span>
<!--l. 96--><p class="indent" > <span
<!--l. 102--><p class="indent" > <span
class="cmr-12">This guide is organized as follows. General information on the distribution of the</span>
<span
class="cmr-12">source code is reported in Section</span><span

@ -58,9 +58,9 @@ class="cmr-12">. Most Fortran compilers provide this feature; in particular, thi
<span
class="cmr-12">supported by the GNU Fortran compiler, for which we recommend using at least</span>
<span
class="cmr-12">version 4.8. The software defines data types and interfaces for real and complex data,</span>
class="cmr-12">version 12. The software defines data types and interfaces for real and complex data, in</span>
<span
class="cmr-12">in both single and double precision.</span>
class="cmr-12">both single and double precision.</span>
<!--l. 20--><p class="indent" > <span
class="cmr-12">Building AMG4PSBLAS requires some base libraries (see Section</span><span
class="cmr-12">&#x00A0;</span><a
@ -85,19 +85,19 @@ class="cmr-12">&#8220;developer&#8221; part; in order to build AMG4PSBLAS you ne
class="cmr-12">the base and optional software used by AMG4PSBLAS is given in the next</span>
<span
class="cmr-12">sections.</span>
<!--l. 30--><p class="noindent" >
<!--l. 31--><p class="noindent" >
<h4 class="subsectionHead"><span class="titlemark"><span
class="cmr-12">3.1 </span></span> <a
id="x6-80003.1"></a><span
class="cmr-12">Prerequisites</span></h4>
<!--l. 32--><p class="noindent" ><span
<!--l. 33--><p class="noindent" ><span
class="cmr-12">The following base libraries are needed:</span>
<dl class="description"><dt class="description">
<!--l. 34--><p class="noindent" >
<!--l. 35--><p class="noindent" >
<span
class="cmbx-12">BLAS</span> </dt><dd
class="description">
<!--l. 34--><p class="noindent" ><span class="cite"><span
<!--l. 35--><p class="noindent" ><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#Xblas3"><span
class="cmr-12">18</span></a><span
@ -152,11 +152,11 @@ class="cmr-12">for including -fPIC compilation option in the make.inc file of th
<span
class="cmr-12">library.</span>
</dd><dt class="description">
<!--l. 51--><p class="noindent" >
<!--l. 52--><p class="noindent" >
<span
class="cmbx-12">MPI</span> </dt><dd
class="description">
<!--l. 51--><p class="noindent" ><span class="cite"><span
<!--l. 52--><p class="noindent" ><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#XMPI2"><span
class="cmr-12">25</span></a><span
@ -169,11 +169,11 @@ class="cmr-12">A version of MPI is available on most high-performance computing<
<span
class="cmr-12">systems.</span>
</dd><dt class="description">
<!--l. 53--><p class="noindent" >
<!--l. 54--><p class="noindent" >
<span
class="cmbx-12">PSBLAS</span> </dt><dd
class="description">
<!--l. 53--><p class="noindent" ><span class="cite"><span
<!--l. 54--><p class="noindent" ><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#XPSBLASGUIDE"><span
class="cmr-12">21</span></a><span
@ -192,13 +192,13 @@ class="cmr-12">; version</span>
class="cmr-12">3.9.0 (or later) is required. Indeed, all the prerequisites listed so far are also</span>
<span
class="cmr-12">prerequisites of PSBLAS.</span></dd></dl>
<!--l. 60--><p class="noindent" ><span
<!--l. 61--><p class="noindent" ><span
class="cmr-12">Please note that the four previous libraries must have Fortran interfaces compatible with</span>
<span
class="cmr-12">AMG4PSBLAS; usually this means that they should all be built with the same</span>
<span
class="cmr-12">compiler being used for AMG4PSBLAS.</span>
<!--l. 64--><p class="indent" > <span
<!--l. 65--><p class="indent" > <span
class="cmr-12">If you want to use the PSBLAS support for NVIDIA GPUs, you will also</span>
<span
class="cmr-12">need a working version of the CUDA Toolkit that is compatible with the</span>
@ -214,7 +214,7 @@ class="cmr-12">options:</span>
<pre class="verbatim" id="verbatim-2">
./configure&#x00A0;--enable-cuda&#x00A0;--with-cudadir=${CUDA_HOME}&#x00A0;--with-cudacc=xx,yy,zz
</pre>
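As a minimal sketch of how the command above might be assembled in a build script, the helper below only prints the resulting command line and never runs it. The function name `cuda_configure_cmd` is hypothetical, and reading `xx,yy,zz` as a comma-separated list of target compute capabilities is an assumption; the values `70` and `80` in the usage note are placeholders, not recommendations.

```shell
# Hypothetical helper (not part of the package): print, but do not run,
# a CUDA-enabled configure command.
# Arguments: CUDA install directory, then one or more compute capabilities
# (assumed to be what xx,yy,zz stand for in --with-cudacc).
cuda_configure_cmd() {
  if [ "$#" -lt 2 ] || [ -z "$1" ]; then
    echo "usage: cuda_configure_cmd CUDA_DIR CC [CC...]" >&2
    return 1
  fi
  cuda_dir="$1"
  shift
  # Join the remaining arguments with commas, e.g. "70 80" becomes "70,80".
  cc_list=$(IFS=,; printf '%s' "$*")
  printf './configure --enable-cuda --with-cudadir=%s --with-cudacc=%s\n' \
    "$cuda_dir" "$cc_list"
}
```

For instance, `cuda_configure_cmd "${CUDA_HOME}" 70 80` prints the command with `--with-cudacc=70,80`, which can be inspected before being run by hand.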
<!--l. 89--><p class="nopar" > <span
<!--l. 90--><p class="nopar" > <span
class="cmr-12">Previous versions required you to have the auxiliary libraries SPGPU and</span>
<span
class="cmr-12">PSBLAS-EXT compiled; this is no longer necessary because they have been integrated</span>
@ -226,24 +226,24 @@ class="cmr-12">&#x00A0;</span><a
href="userhtmlse4.html#x7-150004.2"><span
class="cmr-12">4.2</span><!--tex4ht:ref: sec:gpu-example --></a><span
class="cmr-12">.</span>
<!--l. 96--><p class="noindent" >
<!--l. 97--><p class="noindent" >
<h4 class="subsectionHead"><span class="titlemark"><span
class="cmr-12">3.2 </span></span> <a
id="x6-90003.2"></a><span
class="cmr-12">Optional third party libraries</span></h4>
<!--l. 98--><p class="noindent" ><span
<!--l. 99--><p class="noindent" ><span
class="cmr-12">We provide interfaces to the following third-party software libraries; note that these are</span>
<span
class="cmr-12">optional, but if you enable them some defaults for multilevel preconditioners may</span>
<span
class="cmr-12">change to reflect their presence.</span>
<!--l. 102--><p class="indent" >
<!--l. 103--><p class="indent" >
<dl class="description"><dt class="description">
<!--l. 103--><p class="noindent" >
<!--l. 104--><p class="noindent" >
<span
class="cmbx-12">UMFPACK</span> </dt><dd
class="description">
<!--l. 103--><p class="noindent" ><span class="cite"><span
<!--l. 104--><p class="noindent" ><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#XUMFPACK"><span
class="cmr-12">16</span></a><span
@ -266,11 +266,11 @@ class="cmr-12">provide the right path to the BLAS and LAPACK libraries
class="cmtt-12">SuiteSparse_config/SuiteSparse_config.mk</span></span></span> <span
class="cmr-12">file.</span>
</dd><dt class="description">
<!--l. 110--><p class="noindent" >
<!--l. 111--><p class="noindent" >
<span
class="cmbx-12">MUMPS</span> </dt><dd
class="description">
<!--l. 110--><p class="noindent" ><span class="cite"><span
<!--l. 111--><p class="noindent" ><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#XMUMPS"><span
class="cmr-12">2</span></a><span
@ -286,14 +286,14 @@ class="cmr-12">solution for single and double precision, real and complex data.
<span
class="cmr-12">versions 4.10.0 and 5.0.1.</span>
</dd><dt class="description">
<!--l. 115--><p class="noindent" >
<!--l. 116--><p class="noindent" >
<span
class="cmbx-12">SuperLU</span> </dt><dd
class="description">
<!--l. 115--><p class="noindent" ><span class="cite"><span
<!--l. 116--><p class="noindent" ><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#XSUPERLU"><span
class="cmr-12">17</span></a><span
@ -312,12 +312,12 @@ class="cmr-12">data. We tested versions 4.3 and 5.0. If you installed BLAS from
<span
class="cmr-12">remember to define the BLASLIB variable in the make.inc file.</span>
</dd><dt class="description">
<!--l. 121--><p class="noindent" >
<!--l. 122--><p class="noindent" >
<span
class="cmbx-12">SuperLU</span><span
class="cmbx-12">_Dist</span> </dt><dd
class="description">
<!--l. 121--><p class="noindent" ><span class="cite"><span
<!--l. 122--><p class="noindent" ><span class="cite"><span
class="cmr-12">[</span><a
href="userhtmlli3.html#XSUPERLUDIST"><span
class="cmr-12">28</span></a><span
@ -341,18 +341,18 @@ class="cmr-12">parallel graph partitioning and fill-reducing matrix ordering, av
href="glaros.dtc.umn.edu/gkhome/metis/parmetis/overview" class="url" ><span
class="cmtt-12">glaros.dtc.umn.edu/gkhome/metis/parmetis/overview</span></a><span
class="cmr-12">.</span></dd></dl>
<!--l. 133--><p class="noindent" >
<!--l. 134--><p class="noindent" >
<h4 class="subsectionHead"><span class="titlemark"><span
class="cmr-12">3.3 </span></span> <a
id="x6-100003.3"></a><span
class="cmr-12">Configuration options</span></h4>
<!--l. 135--><p class="noindent" ><span
<!--l. 136--><p class="noindent" ><span
class="cmr-12">In order to build AMG4PSBLAS, the first step is to use the </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">configure</span></span></span> <span
class="cmr-12">script in the</span>
<span
class="cmr-12">main directory to generate the necessary makefile.</span>
<!--l. 139--><p class="indent" > <span
<!--l. 140--><p class="indent" > <span
class="cmr-12">As a minimal example consider the following:</span>
@ -360,7 +360,7 @@ class="cmr-12">As a minimal example consider the following:</span>
<pre class="verbatim" id="verbatim-3">
./configure&#x00A0;--with-psblas=PSB-INSTALL-DIR
</pre>
<!--l. 147--><p class="nopar" > <span
<!--l. 148--><p class="nopar" > <span
class="cmr-12">which assumes that the various MPI compilers and support libraries are available in</span>
<span
class="cmr-12">the standard directories on the system, and specifies only the PSBLAS install directory</span>
@ -374,7 +374,7 @@ class="cmtt-12">./configure</span><span
class="cmtt-12">&#x00A0;--help</span></span></span><span
class="cmr-12">, which</span>
<span
class="cmr-12">produces: </span><!--l. 158--><pre class="lstinputlisting" id="listing-1"><span class="label"><a
class="cmr-12">produces: </span><!--l. 159--><pre class="lstinputlisting" id="listing-1"><span class="label"><a
id="x6-10002r1"></a></span><span style="color:#000000"><span
class="cmtt-12">&#8216;</span></span><span style="color:#000000"><span
class="cmtt-12">configure</span></span><span style="color:#000000"><span
@ -3910,8 +3910,8 @@ class="cmtt-12">/</span></span><span style="color:#000000"><span
class="cmtt-12">issues</span></span><span style="color:#000000"><span
class="cmtt-12">&#x003E;.</span></span>
</pre>
<!--l. 160--><p class="noindent" ><span
class="cmr-12">For instance, if a user has built and installed PSBLAS 3.7 under the </span><span class="obeylines-h"><span class="verb"><span
<!--l. 161--><p class="noindent" ><span
class="cmr-12">For instance, if a user has built and installed PSBLAS 3.9 under the </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">/opt</span></span></span> <span
class="cmr-12">directory and is</span>
<span
@ -3922,10 +3922,10 @@ class="cmr-12">might be configured with:</span>
<pre class="verbatim" id="verbatim-4">
./configure&#x00A0;--with-psblas=/opt/psblas-3.7/&#x00A0;\
./configure&#x00A0;--with-psblas=/opt/psblas-3.9/&#x00A0;\
--with-umfpackincdir=/usr/include/suitesparse/
</pre>
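The configuration shown above can also be composed by a small wrapper, sketched here under the assumption that only the PSBLAS install directory is mandatory and the UMFPACK include directory is optional. The function name `amg_configure_cmd` is hypothetical; the two flags are the ones used in the example, and the helper only prints the command rather than executing it.

```shell
# Hypothetical helper (not part of the package): print the configure
# command for a given PSBLAS install directory, optionally adding the
# UMFPACK include directory.
amg_configure_cmd() {
  psb_dir="${1:-}"
  umf_inc="${2:-}"
  if [ -z "$psb_dir" ]; then
    echo "usage: amg_configure_cmd PSBLAS_DIR [UMFPACK_INCDIR]" >&2
    return 1
  fi
  cmd="./configure --with-psblas=${psb_dir}"
  # Append the UMFPACK option only when an include directory was given.
  if [ -n "$umf_inc" ]; then
    cmd="${cmd} --with-umfpackincdir=${umf_inc}"
  fi
  printf '%s\n' "$cmd"
}
```

Calling `amg_configure_cmd /opt/psblas-3.9/ /usr/include/suitesparse/` reproduces the command line of the example on a single line; omitting the second argument gives the minimal form.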
<!--l. 172--><p class="nopar" > <span
<!--l. 173--><p class="nopar" > <span
class="cmr-12">Once the configure script has completed execution, it will have generated the file</span>
<span class="obeylines-h"><span class="verb"><span
class="cmtt-12">Make.inc</span></span></span> <span
@ -3934,7 +3934,7 @@ class="cmr-12">which will then be used by all Makefiles in the directory tree; t
class="cmr-12">copied in the install directory under the name </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">Make.inc.AMG4PSBLAS</span></span></span><span
class="cmr-12">.</span>
<!--l. 179--><p class="indent" > <span
<!--l. 180--><p class="indent" > <span
class="cmr-12">To use the MUMPS solver package, the user has to add the appropriate options to</span>
<span
class="cmr-12">the configure script; by default the script looks for the libraries </span><span class="obeylines-h"><span class="verb"><span
@ -3954,7 +3954,7 @@ class="cmtt-12">--with-extra-libs</span></span></span> <span
class="cmr-12">configure</span>
<span
class="cmr-12">option.</span>
<!--l. 187--><p class="indent" > <span
<!--l. 188--><p class="indent" > <span
class="cmr-12">To build the library the user will now enter</span>
@ -3962,7 +3962,7 @@ class="cmr-12">To build the library the user will now enter</span>
<pre class="verbatim" id="verbatim-5">
make
</pre>
<!--l. 195--><p class="nopar" > <span
<!--l. 196--><p class="nopar" > <span
class="cmr-12">followed (optionally) by</span>
@ -3970,12 +3970,12 @@ class="cmr-12">followed (optionally) by</span>
<pre class="verbatim" id="verbatim-6">
make&#x00A0;install
</pre>
<!--l. 205--><p class="nopar" >
<!--l. 206--><p class="nopar" >
<h4 class="subsectionHead"><span class="titlemark"><span
class="cmr-12">3.4 </span></span> <a
id="x6-110003.4"></a><span
class="cmr-12">Bug reporting</span></h4>
<!--l. 208--><p class="noindent" ><span
<!--l. 209--><p class="noindent" ><span
class="cmr-12">If you find any bugs in our code, please report them through our issues page</span>
<span
class="cmr-12">on</span><br
@ -3983,18 +3983,18 @@ class="newline" /> <a
href="https://github.com/psctoolkit/psctoolkit/issues" class="url" ><span
class="cmtt-12">https://github.com/psctoolkit/psctoolkit/issues</span></a><br
class="newline" />
<!--l. 212--><p class="indent" > <span
<!--l. 213--><p class="indent" > <span
class="cmr-12">To enable us to track the bug, please provide a log from the failing application, the</span>
<span
class="cmr-12">test conditions, and ideally a self-contained test program reproducing the</span>
<span
class="cmr-12">issue.</span>
<!--l. 216--><p class="noindent" >
<!--l. 217--><p class="noindent" >
<h4 class="subsectionHead"><span class="titlemark"><span
class="cmr-12">3.5 </span></span> <a
id="x6-120003.5"></a><span
class="cmr-12">Example and test programs</span></h4>
<!--l. 217--><p class="noindent" ><span
<!--l. 218--><p class="noindent" ><span
class="cmr-12">The package contains a </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">samples</span></span></span> <span
class="cmr-12">directory, divided into two subdirectories </span><span class="obeylines-h"><span class="verb"><span
@ -4010,22 +4010,22 @@ class="cmr-12">subdirectories.</span>
<span
class="cmr-12">Their purpose is as follows:</span>
<dl class="description"><dt class="description">
<!--l. 222--><p class="noindent" >
<!--l. 223--><p class="noindent" >
<span
class="cmtt-12">simple</span> </dt><dd
class="description">
<!--l. 222--><p class="noindent" ><span
<!--l. 223--><p class="noindent" ><span
class="cmr-12">contains a set of simple example programs with a predefined choice of</span>
<span
class="cmr-12">preconditioners, selectable via integer values. These are intended to help users get</span>
<span
class="cmr-12">acquainted with the multilevel preconditioners available in AMG4PSBLAS.</span>
</dd><dt class="description">
<!--l. 226--><p class="noindent" >
<!--l. 227--><p class="noindent" >
<span
class="cmtt-12">advanced</span> </dt><dd
class="description">
<!--l. 226--><p class="noindent" ><span
<!--l. 227--><p class="noindent" ><span
class="cmr-12">contains a set of more sophisticated examples that will allow the user, via</span>
<span
class="cmr-12">the input files in the </span><span class="obeylines-h"><span class="verb"><span
@ -4033,7 +4033,7 @@ class="cmtt-12">runs</span></span></span> <span
class="cmr-12">subdirectories, to experiment with the full range</span>
<span
class="cmr-12">of preconditioners implemented in the package.</span></dd></dl>
<!--l. 231--><p class="noindent" ><span
<!--l. 232--><p class="noindent" ><span
class="cmr-12">The </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">fileread</span></span></span> <span
class="cmr-12">directories contain sample programs that read sparse matrices from files,</span>

@ -351,7 +351,7 @@ class="content">Preconditioner types, corresponding strings and default choices.
</div><hr class="endfloat" />
</div>
<!--l. 98--><p class="indent" > <span
<!--l. 97--><p class="indent" > <span
class="cmr-12">Note that the module </span><code class="lstinline"><span style="color:#000000">amg_prec_mod</span></code><span
class="cmr-12">, containing the definition of the preconditioner</span>
<span
@ -370,7 +370,7 @@ class="cmr-12">4.1</span><!--tex4ht:ref: sec:examples --></a><span
class="cmr-12">).</span>
<br
class="newline" />
<!--l. 105--><p class="indent" > <span
<!--l. 104--><p class="indent" > <span
class="cmbx-12">Remark 1. </span><span
class="cmr-12">Coarsest-level solvers based on the LU factorization, such as those</span>
<span
@ -385,11 +385,24 @@ class="cmr-12">problems. However, this does not necessarily correspond to the sh
<span
class="cmr-12">on parallel</span><span
class="cmr-12">&#x00A0;computers.</span>
<!--l. 112--><p class="indent" > <span
class="cmbx-12">Remark 2. </span><span
class="cmr-12">Memory allocation on GPUs is a costly operation implying a</span>
<span
class="cmr-12">synchronization; therefore, it is convenient to preallocate internal preconditioner</span>
<span
class="cmr-12">workspace with the method </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">prec%allocate_wrk(info)</span></span></span> <span
class="cmr-12">before invoking an iterative</span>
<span
class="cmr-12">method, and release it upon exit with </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">prec%deallocate_wrk(info)</span></span></span><span
class="cmr-12">.</span>
<h4 class="subsectionHead"><span class="titlemark"><span
class="cmr-12">4.1 </span></span> <a
id="x7-140004.1"></a><span
class="cmr-12">Examples</span></h4>
<!--l. 116--><p class="noindent" ><span
<!--l. 121--><p class="noindent" ><span
class="cmr-12">The code reported in Figure</span><span
class="cmr-12">&#x00A0;</span><a
href="#x7-14001r1"><span
@ -418,7 +431,7 @@ class="cmr-12">and </span><code class="lstinline"><span style="color:#000000">ps
class="cmr-12">must be used by the example</span>
<span
class="cmr-12">program.</span>
<!--l. 126--><p class="indent" > <span
<!--l. 131--><p class="indent" > <span
class="cmr-12">The part of the code dealing with reading and assembling the sparse matrix and the</span>
<span
class="cmr-12">right-hand side vector and the deallocation of the relevant data structures, performed</span>
@ -451,7 +464,7 @@ href="userhtmlli3.html#XPSBLASGUIDE"><span
class="cmr-12">21</span></a><span
class="cmr-12">]</span></span><span
class="cmr-12">.</span>
<!--l. 138--><p class="indent" > <span
<!--l. 143--><p class="indent" > <span
class="cmr-12">The setup and application of the default multilevel preconditioner for the real single</span>
<span
class="cmr-12">precision and the complex, single and double precision, versions are obtained</span>
@ -461,6 +474,9 @@ class="cmr-12">&#x00A0;</span><a
href="userhtmlse5.html#x8-160005"><span
class="cmr-12">5</span><!--tex4ht:ref: sec:userinterface --></a> <span
class="cmr-12">for</span>
<span
class="cmr-12">details). If these versions are installed, the corresponding codes are available in</span>
<span class="obeylines-h"><span class="verb"><span
@ -470,7 +486,7 @@ class="cmr-12">.</span>
<!--l. 144--><p class="indent" > <a
<!--l. 148--><p class="indent" > <a
id="x7-14001r1"></a><hr class="float"><div class="float"
>
@ -478,7 +494,7 @@ class="cmr-12">.</span>
<div class="center"
>
<!--l. 145--><p class="noindent" >
<!--l. 149--><p class="noindent" >
@ -535,7 +551,7 @@ class="cmr-12">.</span>
&#x00A0;&#x00A0;call&#x00A0;psb_exit(ctxt)
&#x00A0;&#x00A0;stop
</pre>
<!--l. 255--><p class="nopar" > </div>
<!--l. 259--><p class="nopar" > </div>
@ -548,7 +564,7 @@ class="content">setup and application of the default multilevel preconditioner (
</div><hr class="endfloat" />
<!--l. 264--><p class="indent" > <span
<!--l. 267--><p class="indent" > <span
class="cmr-12">Different versions of the multilevel preconditioner can be obtained by changing the</span>
<span
class="cmr-12">default values of the preconditioner parameters. The code reported in Figure</span><span
@ -557,42 +573,40 @@ href="#x7-14002r2"><span
class="cmr-12">2</span><!--tex4ht:ref: fig:ex2 --></a> <span
class="cmr-12">shows</span>
<span
class="cmr-12">how to set a V-cycle preconditioner which applies 1 block-Jacobi sweep as pre-</span>
class="cmr-12">how to set a V-cycle preconditioner which applies 1 block-Jacobi sweep as pre- and</span>
<span
class="cmr-12">and post-smoother, and solves the coarsest-level system with 8 block-Jacobi</span>
class="cmr-12">post-smoother, and solves the coarsest-level system with 8 block-Jacobi sweeps. Note</span>
<span
class="cmr-12">sweeps. Note that the ILU(0) factorization (plus triangular solve) is used as</span>
class="cmr-12">that the ILU(0) factorization (plus triangular solve) is used as local solver for the</span>
<span
class="cmr-12">local solver for the block-Jacobi sweeps, since this is the default associated</span>
class="cmr-12">block-Jacobi sweeps, since this is the default associated with block-Jacobi and set</span>
<span
class="cmr-12">with block-Jacobi and set by</span><span
class="cmr-12">by</span><span
class="cmr-12">&#x00A0;</span><code class="lstinline"><span style="color:#000000">P</span><span style="color:#000000">%</span><span style="color:#000000">init</span></code><span
class="cmr-12">. Furthermore, specifying block-Jacobi as</span>
<span
class="cmr-12">coarsest-level solver implies that the coarsest-level matrix is distributed among</span>
class="cmr-12">. Furthermore, specifying block-Jacobi as coarsest-level solver implies that</span>
<span
class="cmr-12">the processes. Figure</span><span
class="cmr-12">the coarsest-level matrix is distributed among the processes. Figure</span><span
class="cmr-12">&#x00A0;</span><a
href="#x7-14003r3"><span
class="cmr-12">3</span><!--tex4ht:ref: fig:ex3 --></a> <span
class="cmr-12">shows how to set a W-cycle preconditioner using the</span>
class="cmr-12">shows how</span>
<span
class="cmr-12">Coarsening based on Compatible Weighted Matching, aggregates of size at</span>
class="cmr-12">to set a W-cycle preconditioner using the Coarsening based on Compatible</span>
<span
class="cmr-12">most 8 and smoothed prolongators. It applies 2 hybrid Gauss-Seidel sweeps as</span>
class="cmr-12">Weighted Matching, aggregates of size at most 8 and smoothed prolongators. It</span>
<span
class="cmr-12">pre- and post-smoother, and solves the coarsest-level system with the parallel</span>
class="cmr-12">applies 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother, and solves the</span>
<span
class="cmr-12">flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi</span>
class="cmr-12">coarsest-level system with the parallel flexible Conjugate Gradient method (KRM)</span>
<span
class="cmr-12">preconditioner having ILU(0) on the blocks. Default parameters are used for stopping</span>
class="cmr-12">coupled with the block-Jacobi preconditioner having ILU(0) on the blocks, with</span>
<span
class="cmr-12">criterion of the coarsest solver. Note that, also in this case, specifying KRM as</span>
class="cmr-12">default parameters used for the coarsest solver. Note that specifying KRM as</span>
<span
class="cmr-12">coarsest-level solver implies that the coarsest-level matrix is distributed among the</span>
<span
class="cmr-12">processes.</span>
<!--l. 291--><p class="indent" > <span
<!--l. 299--><p class="indent" > <span
class="cmr-12">The code fragments shown in Figures</span><span
class="cmr-12">&#x00A0;</span><a
href="#x7-14002r2"><span
@ -605,7 +619,7 @@ class="cmr-12">are included in the example program</span>
class="cmr-12">file </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">amg_dexample_ml.f90</span></span></span> <span
class="cmr-12">too.</span>
<!--l. 294--><p class="indent" > <span
<!--l. 302--><p class="indent" > <span
class="cmr-12">Finally, Figure</span><span
class="cmr-12">&#x00A0;</span><a
href="#x7-14004r4"><span
@ -620,7 +634,7 @@ class="cmr-12">nonsymmetric. The corresponding example program is available in t
<span class="obeylines-h"><span class="verb"><span
class="cmtt-12">amg_dexample_1lev.f90</span></span></span><span
class="cmr-12">.</span>
<!--l. 301--><p class="indent" > <span
<!--l. 309--><p class="indent" > <span
class="cmr-12">For all the previous preconditioners, example programs where the sparse matrix</span>
<span
class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet</span>
@ -631,7 +645,7 @@ class="cmr-12">.</span>
<!--l. 304--><p class="indent" > <a
<!--l. 312--><p class="indent" > <a
id="x7-14002r2"></a><hr class="float"><div class="float"
>
@ -639,7 +653,7 @@ class="cmr-12">.</span>
<div class="center"
>
<!--l. 318--><p class="noindent" >
<!--l. 326--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-8">
...&#x00A0;...
!&#x00A0;build&#x00A0;a&#x00A0;V-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;1&#x00A0;block-Jacobi&#x00A0;sweep&#x00A0;(with
@ -653,7 +667,7 @@ class="cmr-12">.</span>
&#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info)
...&#x00A0;...
</pre>
<!--l. 333--><p class="nopar" > </div></div>
<!--l. 341--><p class="nopar" > </div></div>
<br /><div class="caption"
><span class="id">Listing 2: </span><span
class="content">setup of a multilevel preconditioner based on the default decoupled coarsening</span></div><!--tex4ht:label?: x7-14002r2 -->
@ -664,7 +678,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
<!--l. 340--><p class="indent" > <a
<!--l. 348--><p class="indent" > <a
id="x7-14003r3"></a><hr class="float"><div class="float"
>
@ -672,7 +686,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
<div class="center"
>
<!--l. 362--><p class="noindent" >
<!--l. 370--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-9">
...&#x00A0;...
!&#x00A0;build&#x00A0;a&#x00A0;W-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;2&#x00A0;hybrid&#x00A0;Gauss-Seidel&#x00A0;sweeps
@ -692,7 +706,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
&#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info)
...&#x00A0;...
</pre>
<!--l. 383--><p class="nopar" > </div></div>
<!--l. 391--><p class="nopar" > </div></div>
<br /> <div class="caption"
><span class="id">Listing 3: </span><span
class="content">setup of a multilevel preconditioner based on the coupled coarsening using
@ -704,7 +718,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
<!--l. 390--><p class="indent" > <a
<!--l. 398--><p class="indent" > <a
id="x7-14004r4"></a><hr class="float"><div class="float"
>
@ -712,7 +726,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
<div class="center"
>
<!--l. 402--><p class="noindent" >
<!--l. 410--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-10">
...&#x00A0;...
!&#x00A0;set&#x00A0;RAS&#x00A0;with&#x00A0;overlap&#x00A0;2&#x00A0;and&#x00A0;ILU(0)&#x00A0;on&#x00A0;the&#x00A0;local&#x00A0;blocks
@ -723,7 +737,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
!&#x00A0;solve&#x00A0;Ax=b&#x00A0;with&#x00A0;preconditioned&#x00A0;BiCGSTAB
&#x00A0;&#x00A0;call&#x00A0;psb_krylov(&#8217;BICGSTAB&#8217;,A,P,b,x,tol,desc_A,info)
</pre>
<!--l. 414--><p class="nopar" > </div></div>
<!--l. 422--><p class="nopar" > </div></div>
<br /> <div class="caption"
><span class="id">Listing 4: </span><span
class="content">setup of a one-level Schwarz preconditioner.</span></div><!--tex4ht:label?: x7-14004r4 -->
@ -735,7 +749,7 @@ class="content">setup of a one-level Schwarz preconditioner.</span></div><!--tex
class="cmr-12">4.2 </span></span> <a
id="x7-150004.2"></a><span
class="cmr-12">GPU example</span></h4>
<!--l. 426--><p class="noindent" ><span
<!--l. 434--><p class="noindent" ><span
class="cmr-12">The code discussed here shows how to set up a program exploiting the combined GPU</span>
<span
class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the</span>
@ -743,14 +757,14 @@ class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is availa
class="cmr-12">source distribution directory </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">amg4psblas/examples/gpu</span></span></span><span
class="cmr-12">.</span>
<!--l. 431--><p class="indent" > <span
<!--l. 439--><p class="indent" > <span
class="cmr-12">First of all, we need to include the appropriate modules and declare some auxiliary</span>
<span
class="cmr-12">variables:</span>
<!--l. 433--><p class="indent" > <a
<!--l. 441--><p class="indent" > <a
id="x7-15001r5"></a><hr class="float"><div class="float"
>
@ -758,7 +772,7 @@ class="cmr-12">variables:</span>
<div class="center"
>
<!--l. 452--><p class="noindent" >
<!--l. 460--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-11">
program&#x00A0;amg_dexample_gpu
&#x00A0;&#x00A0;use&#x00A0;psb_base_mod
@ -777,7 +791,7 @@ program&#x00A0;amg_dexample_gpu
&#x00A0;
</pre>
<!--l. 471--><p class="nopar" > </div></div>
<!--l. 479--><p class="nopar" > </div></div>
<br /> <div class="caption"
><span class="id">Listing 5: </span><span
class="content">setup of a GPU-enabled test program part one.</span></div><!--tex4ht:label?: x7-15001r5 -->
@ -785,7 +799,7 @@ class="content">setup of a GPU-enabled test program part one.</span></div><!--te
</div><hr class="endfloat" />
<!--l. 478--><p class="indent" > <span
<!--l. 486--><p class="indent" > <span
class="cmr-12">In this particular example we are choosing to employ a </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">HLG</span></span></span> <span
class="cmr-12">data structure for</span>
@ -793,14 +807,14 @@ class="cmr-12">data structure for</span>
class="cmr-12">sparse matrices on GPUs; for more information please refer to the PSBLAS users&#8217;</span>
<span
class="cmr-12">guide.</span>
<!--l. 482--><p class="indent" > <span
<!--l. 490--><p class="indent" > <span
class="cmr-12">We then have to initialize the GPU environment, and pass the appropriate MOLD</span>
<span
class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217; guide).</span>
<!--l. 485--><p class="indent" > <a
<!--l. 493--><p class="indent" > <a
id="x7-15002r6"></a><hr class="float"><div class="float"
>
@ -808,7 +822,7 @@ class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217;
<div class="center"
>
<!--l. 501--><p class="noindent" >
<!--l. 509--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-12">
&#x00A0;&#x00A0;call&#x00A0;psb_init(ctxt)
&#x00A0;&#x00A0;call&#x00A0;psb_info(ctxt,iam,np)
@ -823,7 +837,7 @@ class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217;
&#x00A0;
</pre>
<!--l. 516--><p class="nopar" > </div></div>
<!--l. 524--><p class="nopar" > </div></div>
<br /> <div class="caption"
><span class="id">Listing 6: </span><span
class="content">setup of a GPU-enabled test program part two.</span></div><!--tex4ht:label?: x7-15002r6 -->
@ -831,7 +845,7 @@ class="content">setup of a GPU-enabled test program part two.</span></div><!--te
</div><hr class="endfloat" />
<!--l. 523--><p class="indent" > <span
<!--l. 531--><p class="indent" > <span
class="cmr-12">Finally, we convert the input matrix, the descriptor and the vectors to use a</span>
<span
class="cmr-12">GPU-enabled internal storage format. We then preallocate the preconditioner</span>
@ -842,7 +856,7 @@ class="cmr-12">GPU environment</span>
<!--l. 527--><p class="indent" > <a
<!--l. 535--><p class="indent" > <a
id="x7-15003r7"></a><hr class="float"><div class="float"
>
@ -850,7 +864,7 @@ class="cmr-12">GPU environment</span>
<div class="center"
>
<!--l. 557--><p class="noindent" >
<!--l. 565--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-13">
&#x00A0;&#x00A0;call&#x00A0;desc_a%cnv(mold=igmold)
&#x00A0;&#x00A0;call&#x00A0;a%cscnv(info,mold=agmold)
@ -877,7 +891,7 @@ class="cmr-12">GPU environment</span>
&#x00A0;
</pre>
<!--l. 584--><p class="nopar" > </div></div>
<!--l. 592--><p class="nopar" > </div></div>
<br /> <div class="caption"
><span class="id">Listing 7: </span><span
class="content">setup of a GPU-enabled test program part three.</span></div><!--tex4ht:label?: x7-15003r7 -->
@ -885,7 +899,7 @@ class="content">setup of a GPU-enabled test program part three.</span></div><!--
</div><hr class="endfloat" />
<!--l. 592--><p class="indent" > <span
<!--l. 600--><p class="indent" > <span
class="cmr-12">It is very important to employ smoothers and coarsest solvers that are suited to the</span>
<span
class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that</span>
@ -893,30 +907,30 @@ class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kern
class="cmr-12">satisfy this constraint include:</span>
<ul class="itemize1">
<li class="itemize">
<!--l. 596--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
<!--l. 604--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">JACOBI</span></span></span>
</li>
<li class="itemize">
<!--l. 597--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
<!--l. 605--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">BJAC</span></span></span> <span
class="cmr-12">with the following methods on the local blocks:</span>
<ul class="itemize2">
<li class="itemize">
<!--l. 599--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
<!--l. 607--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">INVK</span></span></span>
</li>
<li class="itemize">
<!--l. 600--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
<!--l. 608--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">INVT</span></span></span>
</li>
<li class="itemize">
<!--l. 601--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
<!--l. 609--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">AINV</span></span></span></li></ul>
</li>
<li class="itemize">
<!--l. 603--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
<!--l. 611--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">POLY</span></span></span></li></ul>
<!--l. 605--><p class="noindent" ><span
<!--l. 613--><p class="noindent" ><span
class="cmr-12">and their </span><span
class="cmmi-12">&#x2113;</span><sub><span
class="cmr-8">1</span></sub> <span


@ -64,6 +64,10 @@ class="cmr-12">.</span>
<!--l. 148--><p class="indent" >

@ -38,46 +38,47 @@ class="cmr-12">AMG4PSBLAS is freely distributable under the following copyright
<pre class="verbatim" id="verbatim-15">
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;AMG4PSBLAS&#x00A0;&#x00A0;version&#x00A0;1.0
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Algebraic&#x00A0;MultiGrid&#x00A0;Preconditioners&#x00A0;Package
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;based&#x00A0;on&#x00A0;PSBLAS&#x00A0;(Parallel&#x00A0;Sparse&#x00A0;BLAS&#x00A0;version&#x00A0;3.7)
&#x00A0;&#x00A0;(C)&#x00A0;Copyright&#x00A0;2021
&#x00A0;&#x00A0;Pasqua&#x00A0;D&#8217;Ambra&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;IAC-CNR,&#x00A0;IT
&#x00A0;&#x00A0;Fabio&#x00A0;Durastante&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;University&#x00A0;of&#x00A0;Pisa&#x00A0;and&#x00A0;IAC-CNR,&#x00A0;IT
&#x00A0;&#x00A0;Salvatore&#x00A0;Filippone&#x00A0;&#x00A0;&#x00A0;&#x00A0;University&#x00A0;of&#x00A0;Rome&#x00A0;Tor-Vergata&#x00A0;and&#x00A0;IAC-CNR,&#x00A0;IT
&#x00A0;&#x00A0;Redistribution&#x00A0;and&#x00A0;use&#x00A0;in&#x00A0;source&#x00A0;and&#x00A0;binary&#x00A0;forms,&#x00A0;with&#x00A0;or&#x00A0;without
&#x00A0;&#x00A0;modification,&#x00A0;are&#x00A0;permitted&#x00A0;provided&#x00A0;that&#x00A0;the&#x00A0;following&#x00A0;conditions
&#x00A0;&#x00A0;are&#x00A0;met:
&#x00A0;&#x00A0;&#x00A0;&#x00A0;1.&#x00A0;Redistributions&#x00A0;of&#x00A0;source&#x00A0;code&#x00A0;must&#x00A0;retain&#x00A0;the&#x00A0;above&#x00A0;copyright
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer.
&#x00A0;&#x00A0;&#x00A0;&#x00A0;2.&#x00A0;Redistributions&#x00A0;in&#x00A0;binary&#x00A0;form&#x00A0;must&#x00A0;reproduce&#x00A0;the&#x00A0;above&#x00A0;copyright
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions,&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer&#x00A0;in&#x00A0;the
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;documentation&#x00A0;and/or&#x00A0;other&#x00A0;materials&#x00A0;provided&#x00A0;with&#x00A0;the&#x00A0;distribution.
&#x00A0;&#x00A0;&#x00A0;&#x00A0;3.&#x00A0;The&#x00A0;name&#x00A0;of&#x00A0;the&#x00A0;MLD2P4&#x00A0;group&#x00A0;or&#x00A0;the&#x00A0;names&#x00A0;of&#x00A0;its&#x00A0;contributors&#x00A0;may
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;not&#x00A0;be&#x00A0;used&#x00A0;to&#x00A0;endorse&#x00A0;or&#x00A0;promote&#x00A0;products&#x00A0;derived&#x00A0;from&#x00A0;this
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;software&#x00A0;without&#x00A0;specific&#x00A0;written&#x00A0;permission.
&#x00A0;&#x00A0;THIS&#x00A0;SOFTWARE&#x00A0;IS&#x00A0;PROVIDED&#x00A0;BY&#x00A0;THE&#x00A0;COPYRIGHT&#x00A0;HOLDERS&#x00A0;AND&#x00A0;CONTRIBUTORS
&#x00A0;&#x00A0;&#8216;&#8216;AS&#x00A0;IS&#8217;&#8217;&#x00A0;AND&#x00A0;ANY&#x00A0;EXPRESS&#x00A0;OR&#x00A0;IMPLIED&#x00A0;WARRANTIES,&#x00A0;INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED
&#x00A0;&#x00A0;TO,&#x00A0;THE&#x00A0;IMPLIED&#x00A0;WARRANTIES&#x00A0;OF&#x00A0;MERCHANTABILITY&#x00A0;AND&#x00A0;FITNESS&#x00A0;FOR&#x00A0;A&#x00A0;PARTICULAR
&#x00A0;&#x00A0;PURPOSE&#x00A0;ARE&#x00A0;DISCLAIMED.&#x00A0;IN&#x00A0;NO&#x00A0;EVENT&#x00A0;SHALL&#x00A0;THE&#x00A0;MLD2P4&#x00A0;GROUP&#x00A0;OR&#x00A0;ITS&#x00A0;CONTRIBUTORS
&#x00A0;&#x00A0;BE&#x00A0;LIABLE&#x00A0;FOR&#x00A0;ANY&#x00A0;DIRECT,&#x00A0;INDIRECT,&#x00A0;INCIDENTAL,&#x00A0;SPECIAL,&#x00A0;EXEMPLARY,&#x00A0;OR
&#x00A0;&#x00A0;CONSEQUENTIAL&#x00A0;DAMAGES&#x00A0;(INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED&#x00A0;TO,&#x00A0;PROCUREMENT&#x00A0;OF
&#x00A0;&#x00A0;SUBSTITUTE&#x00A0;GOODS&#x00A0;OR&#x00A0;SERVICES;&#x00A0;LOSS&#x00A0;OF&#x00A0;USE,&#x00A0;DATA,&#x00A0;OR&#x00A0;PROFITS;&#x00A0;OR&#x00A0;BUSINESS
&#x00A0;&#x00A0;INTERRUPTION)&#x00A0;HOWEVER&#x00A0;CAUSED&#x00A0;AND&#x00A0;ON&#x00A0;ANY&#x00A0;THEORY&#x00A0;OF&#x00A0;LIABILITY,&#x00A0;WHETHER&#x00A0;IN
&#x00A0;&#x00A0;CONTRACT,&#x00A0;STRICT&#x00A0;LIABILITY,&#x00A0;OR&#x00A0;TORT&#x00A0;(INCLUDING&#x00A0;NEGLIGENCE&#x00A0;OR&#x00A0;OTHERWISE)
&#x00A0;&#x00A0;ARISING&#x00A0;IN&#x00A0;ANY&#x00A0;WAY&#x00A0;OUT&#x00A0;OF&#x00A0;THE&#x00A0;USE&#x00A0;OF&#x00A0;THIS&#x00A0;SOFTWARE,&#x00A0;EVEN&#x00A0;IF&#x00A0;ADVISED&#x00A0;OF&#x00A0;THE
&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;AMG4PSBLAS&#x00A0;version&#x00A0;1.2
&#x00A0;&#x00A0;&#x00A0;&#x00A0;Algebraic&#x00A0;Multigrid&#x00A0;Package
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;based&#x00A0;on&#x00A0;PSBLAS&#x00A0;(Parallel&#x00A0;Sparse&#x00A0;BLAS&#x00A0;version&#x00A0;3.9)
&#x00A0;&#x00A0;&#x00A0;&#x00A0;(C)&#x00A0;Copyright&#x00A0;2025
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Salvatore&#x00A0;Filippone
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Pasqua&#x00A0;D&#8217;Ambra
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Fabio&#x00A0;Durastante
&#x00A0;&#x00A0;&#x00A0;&#x00A0;Redistribution&#x00A0;and&#x00A0;use&#x00A0;in&#x00A0;source&#x00A0;and&#x00A0;binary&#x00A0;forms,&#x00A0;with&#x00A0;or&#x00A0;without
&#x00A0;&#x00A0;&#x00A0;&#x00A0;modification,&#x00A0;are&#x00A0;permitted&#x00A0;provided&#x00A0;that&#x00A0;the&#x00A0;following&#x00A0;conditions
&#x00A0;&#x00A0;&#x00A0;&#x00A0;are&#x00A0;met:
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;1.&#x00A0;Redistributions&#x00A0;of&#x00A0;source&#x00A0;code&#x00A0;must&#x00A0;retain&#x00A0;the&#x00A0;above&#x00A0;copyright
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer.
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;2.&#x00A0;Redistributions&#x00A0;in&#x00A0;binary&#x00A0;form&#x00A0;must&#x00A0;reproduce&#x00A0;the&#x00A0;above&#x00A0;copyright
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions,&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer&#x00A0;in&#x00A0;the
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;documentation&#x00A0;and/or&#x00A0;other&#x00A0;materials&#x00A0;provided&#x00A0;with&#x00A0;the&#x00A0;distribution.
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;3.&#x00A0;The&#x00A0;name&#x00A0;of&#x00A0;the&#x00A0;AMG4PSBLAS&#x00A0;group&#x00A0;or&#x00A0;the&#x00A0;names&#x00A0;of&#x00A0;its&#x00A0;contributors&#x00A0;may
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;not&#x00A0;be&#x00A0;used&#x00A0;to&#x00A0;endorse&#x00A0;or&#x00A0;promote&#x00A0;products&#x00A0;derived&#x00A0;from&#x00A0;this
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;software&#x00A0;without&#x00A0;specific&#x00A0;written&#x00A0;permission.
&#x00A0;&#x00A0;&#x00A0;&#x00A0;THIS&#x00A0;SOFTWARE&#x00A0;IS&#x00A0;PROVIDED&#x00A0;BY&#x00A0;THE&#x00A0;COPYRIGHT&#x00A0;HOLDERS&#x00A0;AND&#x00A0;CONTRIBUTORS
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#8216;&#8216;AS&#x00A0;IS&#8217;&#8217;&#x00A0;AND&#x00A0;ANY&#x00A0;EXPRESS&#x00A0;OR&#x00A0;IMPLIED&#x00A0;WARRANTIES,&#x00A0;INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED
&#x00A0;&#x00A0;&#x00A0;&#x00A0;TO,&#x00A0;THE&#x00A0;IMPLIED&#x00A0;WARRANTIES&#x00A0;OF&#x00A0;MERCHANTABILITY&#x00A0;AND&#x00A0;FITNESS&#x00A0;FOR&#x00A0;A&#x00A0;PARTICULAR
&#x00A0;&#x00A0;&#x00A0;&#x00A0;PURPOSE&#x00A0;ARE&#x00A0;DISCLAIMED.&#x00A0;IN&#x00A0;NO&#x00A0;EVENT&#x00A0;SHALL&#x00A0;THE&#x00A0;AMG4PSBLAS&#x00A0;GROUP&#x00A0;OR&#x00A0;ITS&#x00A0;CONTRIBUTORS
&#x00A0;&#x00A0;&#x00A0;&#x00A0;BE&#x00A0;LIABLE&#x00A0;FOR&#x00A0;ANY&#x00A0;DIRECT,&#x00A0;INDIRECT,&#x00A0;INCIDENTAL,&#x00A0;SPECIAL,&#x00A0;EXEMPLARY,&#x00A0;OR
&#x00A0;&#x00A0;&#x00A0;&#x00A0;CONSEQUENTIAL&#x00A0;DAMAGES&#x00A0;(INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED&#x00A0;TO,&#x00A0;PROCUREMENT&#x00A0;OF
&#x00A0;&#x00A0;&#x00A0;&#x00A0;SUBSTITUTE&#x00A0;GOODS&#x00A0;OR&#x00A0;SERVICES;&#x00A0;LOSS&#x00A0;OF&#x00A0;USE,&#x00A0;DATA,&#x00A0;OR&#x00A0;PROFITS;&#x00A0;OR&#x00A0;BUSINESS
&#x00A0;&#x00A0;&#x00A0;&#x00A0;INTERRUPTION)&#x00A0;HOWEVER&#x00A0;CAUSED&#x00A0;AND&#x00A0;ON&#x00A0;ANY&#x00A0;THEORY&#x00A0;OF&#x00A0;LIABILITY,&#x00A0;WHETHER&#x00A0;IN
&#x00A0;&#x00A0;&#x00A0;&#x00A0;CONTRACT,&#x00A0;STRICT&#x00A0;LIABILITY,&#x00A0;OR&#x00A0;TORT&#x00A0;(INCLUDING&#x00A0;NEGLIGENCE&#x00A0;OR&#x00A0;OTHERWISE)
&#x00A0;&#x00A0;&#x00A0;&#x00A0;ARISING&#x00A0;IN&#x00A0;ANY&#x00A0;WAY&#x00A0;OUT&#x00A0;OF&#x00A0;THE&#x00A0;USE&#x00A0;OF&#x00A0;THIS&#x00A0;SOFTWARE,&#x00A0;EVEN&#x00A0;IF&#x00A0;ADVISED&#x00A0;OF&#x00A0;THE
&#x00A0;&#x00A0;&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.
</pre>
<!--l. 44--><p class="nopar" >
<!--l. 45--><p class="nopar" >
<!--l. 47--><p class="indent" > <span
<!--l. 48--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS is an evolution of MLD2P4, whose license we reproduce here to</span>
<span
class="cmr-12">abide by its terms:</span>
@ -123,7 +124,7 @@ class="cmr-12">abide by its terms:</span>
&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.
</pre>
<!--l. 87--><p class="nopar" > <span
<!--l. 88--><p class="nopar" > <span
class="cmr-12">AMG4PSBLAS is distributed together with (a small part of) the graph-matching</span>
@ -183,7 +184,7 @@ class="cmr-12">here.</span>
//
//&#x00A0;************************************************************************
</pre>
<!--l. 135--><p class="nopar" >
<!--l. 136--><p class="nopar" >

@ -4,30 +4,42 @@
\fi
\textsc{AMG4PSBLAS (Algebraic MultiGrid Preconditioners Package
based on PSBLAS}) is a package of parallel algebraic multilevel preconditioners included in the PSCToolkit (Parallel Sparse Computation Toolkit) software framework.
It is a progress of a software development project started in 2007, named MLD2P4, which originally implemented a
multilevel version of some domain decomposition preconditioners of additive-Schwarz type, and was based on a parallel decoupled version of the well known smoothed
based on PSBLAS}) is a package of parallel algebraic multilevel
preconditioners included in the PSCToolkit (Parallel Sparse
Computation Toolkit) software framework.
It is an evolution of a software development project started in 2007,
named MLD2P4, which originally implemented a
multilevel version of some domain decomposition preconditioners of
additive-Schwarz type, and was based on a parallel decoupled version
of the well known smoothed
aggregation method to generate the multilevel hierarchy of coarser matrices.
In the last years, within the context of the EU-H2020 EoCoE project (Energy Oriented Center of Excellence), the package was extended for including new algorithms and
functionalities for the setup and application new AMG preconditioners with the final aims of improving efficiency and scalability when tens of thousands cores are
used, and of boosting reliability in dealing with general symmetric positive definite linear systems.
Due to the significant number of changes and the increase in scope, we decided to rename the package as AMG4PSBLAS.
In the last few years the package has been extended with
new algorithms and
functionalities for the setup and application of new AMG preconditioners,
with the final aims of improving efficiency and scalability when tens
of thousands of cores are used, and of boosting reliability in dealing
with general symmetric positive definite linear systems; these
developments have been supported in the context of the EU-H2020 EoCoE
project (Energy Oriented Center of Excellence).
Due to the significant number of changes and the increase in scope, we
decided to rename the package as AMG4PSBLAS.
AMG4PSBLAS has been designed to provide scalable and easy-to-use preconditioners
in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms)
computational framework and can be used in conjuction with the Krylov solvers
available in this framework.
AMG4PSBLAS has been designed to provide scalable and easy-to-use
preconditioners in the context of the PSBLAS (Parallel Sparse Basic
Linear Algebra Subprograms) computational framework and can be used in
conjunction with the Krylov solvers available in this framework.
Our package is based on a completely algebraic approach; therefore
user-level interfaces assume that the system matrix and
preconditioners are represented as PSBLAS distributed sparse matrices.
AMG4PSBLAS enables the user to easily specify different
features of an algebraic multilevel preconditioner, thus allowing to experiment
with different preconditioners for the problem and parallel computers at hand.
features of an algebraic multilevel preconditioner, thus making it easy to
experiment with different preconditioners for the problem and parallel
computers at hand.
The package employs object-oriented design techniques in
Fortran~2003, with interfaces to additional third party libraries
Fortran~2003, with interfaces to additional third party libraries
such as MUMPS, UMFPACK, SuperLU, and SuperLU\_Dist, which
can be exploited in building multilevel preconditioners. The parallel
can be exploited in building multilevel preconditioners. The parallel
implementation is based on a Single Program Multiple Data (SPMD)
paradigm; the inter-process communication is based on MPI and
is managed mainly through PSBLAS.

@ -13,19 +13,20 @@ must support the Fortran~2003 standard plus the extension \verb|MOLD=|
feature, which enhances the usability of \verb|ALLOCATE|.
Most Fortran compilers provide this feature; in particular, this is
supported by the GNU Fortran compiler, for which we
recommend to use at least version 4.8.
recommend using at least version 12.
The software defines data types and interfaces for
real and complex data, in both single and double precision.
Building AMG4PSBLAS requires some base libraries (see
Section~\ref{sec:prerequisites}); interfaces to optional third-party
Section~\ref{sec:prerequisites}); interfaces to optional third-party
libraries, which extend the functionalities of AMG4PSBLAS (see
Section~\ref{sec:third-party}), are also available. A number of Linux
distributions (e.g., Ubuntu, Fedora, CentOS) provide precompiled
Section~\ref{sec:third-party}), are also available. A number of Linux
distributions (e.g., Ubuntu, Fedora, CentOS) provide precompiled
packages for the prerequisite and optional software. In many cases
these packages are split between a runtime part and a ``developer''
part; in order to build AMG4PSBLAS you need both. A description of the
base and optional software used by AMG4PSBLAS is given in the next sections.
base and optional software used by AMG4PSBLAS is given in the next
sections.
\subsection{Prerequisites\label{sec:prerequisites}}
@ -157,17 +158,17 @@ The full set of options may be looked at by issuing the command
\else
\lstinputlisting{../configureout.txt}
\fi
For instance, if a user has built and installed PSBLAS 3.7 under the
For instance, if a user has built and installed PSBLAS 3.9 under the
\verb|/opt| directory and is
using the SuiteSparse package (which includes UMFPACK), then AMG4PSBLAS
might be configured with:
\ifpdf
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{console}
./configure --with-psblas=/opt/psblas-3.7/ --with-umfpackincdir=/usr/include/suitesparse/
./configure --with-psblas=/opt/psblas-3.9/ --with-umfpackincdir=/usr/include/suitesparse/
\end{minted}
\else
\begin{verbatim}
./configure --with-psblas=/opt/psblas-3.7/ \
./configure --with-psblas=/opt/psblas-3.9/ \
--with-umfpackincdir=/usr/include/suitesparse/
\end{verbatim}
\fi

@ -94,7 +94,6 @@ Multilevel &\fortinline|'ML'| & V-cycle with one hybrid forward Gauss-
\label{tab:precinit}}
\end{center}
\end{table}
Note that the module \fortinline|amg_prec_mod|, containing the definition of the
preconditioner data type and the interfaces to the routines of AMG4PSBLAS,
must be used in any program calling such routines.
@ -110,6 +109,12 @@ a standard discretization of basic scalar elliptic PDE problems. However,
this does not necessarily correspond to the shortest execution time
on parallel~computers.
\textbf{Remark 2.} Memory allocation on GPUs is a costly operation
implying a synchronization; therefore, it is convenient to preallocate
internal preconditioner workspace with the method
\verb|prec%allocate_wrk(info)| before invoking an iterative method,
and release it upon exit with \verb|prec%deallocate_wrk(info)|.
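A minimal sketch of this pattern is given below. The subroutine name
\verb|solve_with_wrk| and its argument list are hypothetical and serve
only as an illustration; the sketch assumes real double precision data,
a preconditioner \verb|prec| that has already been built, and the
\verb|psb_krylov| calling sequence used in the sample programs.
\begin{verbatim}
subroutine solve_with_wrk(a, desc_a, prec, b, x, tol, info)
  use psb_base_mod
  use amg_prec_mod
  use psb_krylov_mod
  implicit none
  type(psb_dspmat_type), intent(in)    :: a
  type(psb_desc_type),   intent(in)    :: desc_a
  type(amg_dprec_type),  intent(inout) :: prec
  type(psb_d_vect_type), intent(inout) :: b, x
  real(psb_dpk_),        intent(in)    :: tol
  integer(psb_ipk_),     intent(out)   :: info
  integer(psb_ipk_) :: info_wrk

  ! Preallocate the preconditioner workspace once, before iterating
  call prec%allocate_wrk(info)
  if (info /= psb_success_) return

  ! Iterative solve; 'CG' is only an example choice of method
  call psb_krylov('CG', a, prec, b, x, tol, desc_a, info)

  ! Release the workspace; keep the solver return code in info
  call prec%deallocate_wrk(info_wrk)
  if (info == psb_success_) info = info_wrk
end subroutine solve_with_wrk
\end{verbatim}
Using a separate return code for the deallocation is a design choice of
this sketch, so that an error reported by the iterative method is not
overwritten.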
\subsection{Examples\label{sec:examples}}
@ -140,7 +145,6 @@ for the real single precision and the complex, single and double
precision, versions are obtained with straightforward modifications of the previous
example (see Section~\ref{sec:userinterface} for details). If these versions are installed,
the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
\begin{listing}[tbp]
\begin{center}
\begin{minipage}{.90\textwidth}
@ -260,7 +264,6 @@ stop
\label{fig:ex1}}
\end{center}
\end{listing}
Different versions of the multilevel preconditioner can be obtained by changing
the default values of the preconditioner parameters. The code reported in
Figure~\ref{fig:ex2} shows how to set a V-cycle preconditioner
@ -272,10 +275,15 @@ with block-Jacobi and set by~\fortinline|P%init|.
Furthermore, specifying block-Jacobi as coarsest-level
solver implies that the coarsest-level matrix is distributed
among the processes.
Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using the Coarsening based on Compatible Weighted Matching, aggregates of size at most $8$ and smoothed prolongators. It applies
Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using
the coarsening based on Compatible Weighted Matching, aggregates of
size at most $8$ and smoothed prolongators. It applies
2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
and solves the coarsest-level system with the parallel flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi preconditioner having ILU(0) on the blocks. Default parameters are used for stopping criterion of the coarsest solver.
Note that, also in this case, specifying KRM as coarsest-level
and solves the coarsest-level system with the parallel flexible
Conjugate Gradient method (KRM) coupled with the block-Jacobi
preconditioner having ILU(0) on the blocks, with default parameters
used for the coarsest solver.
Note that specifying KRM as coarsest-level
solver implies that the coarsest-level matrix is distributed
among the processes.
%It is specified that the coarsest-level

@ -6,40 +6,41 @@
AMG4PSBLAS is freely distributable under the following copyright
terms: {\small
\begin{verbatim}
AMG4PSBLAS version 1.0
Algebraic MultiGrid Preconditioners Package
based on PSBLAS (Parallel Sparse BLAS version 3.7)
(C) Copyright 2021
Pasqua D'Ambra IAC-CNR, IT
Fabio Durastante University of Pisa and IAC-CNR, IT
Salvatore Filippone University of Rome Tor-Vergata and IAC-CNR, IT
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions, and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The name of the MLD2P4 group or the names of its contributors may
not be used to endorse or promote products derived from this
software without specific written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
AMG4PSBLAS version 1.2
Algebraic Multigrid Package
based on PSBLAS (Parallel Sparse BLAS version 3.9)
(C) Copyright 2025
Salvatore Filippone
Pasqua D'Ambra
Fabio Durastante
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions, and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The name of the AMG4PSBLAS group or the names of its contributors may
not be used to endorse or promote products derived from this
software without specific written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
\end{verbatim}
}

@ -3,7 +3,9 @@
{\textsc{\ref{sec:overview} General Overview}}
The \textsc{Algebraic MultiGrid Preconditioners Package based on
PSBLAS} (\textsc{AMG\-4\-PSBLAS}) provides parallel Algebraic MultiGrid (AMG) preconditioners (see, e.g., \cite{Briggs2000,Stuben_01}),
PSBLAS} (\textsc{AMG\-4\-PSBLAS}) provides parallel Algebraic
MultiGrid (AMG) preconditioners (see, e.g.,
\cite{Briggs2000,Stuben_01}),
to be used in the iterative solution of linear systems,
\begin{equation}
Ax=b,
@ -18,12 +20,14 @@ where $A$ is a square, real or complex, sparse symmetric positive definite (s.p.
The preconditioners implemented in AMG4PSBLAS are obtained by combining
3 different types of AMG cycles with smoothers and coarsest-level
solvers. Available multigrid cycles include the V-, W-, and a version of a Krylov-type cycle
solvers. We provide a number of multigrid cycles, including the V-,
W-, and a version of a Krylov-type cycle
(K-cycle)~\cite{Briggs2000,Notay2008}; they can be
combined with Jacobi, hybrid
%\footnote{see Note 2 in Table~\ref{tab:p_coarse}, p.~28.}
forward/backward Gauss-Seidel, block-Jacobi and additive Schwarz
smoothers with various versions of local incomplete factorizations and approximate inverses
smoothers with various versions of local incomplete factorizations and
approximate inverses
on the blocks. The Jacobi, block-Jacobi and
Gauss-Seidel smoothers are also available in the $\ell_1$ version~\cite{DDF2020}.
@ -41,7 +45,8 @@ two different coarsening strategies, based on aggregation, are available:
and described in detail in~\cite{DDF2020};
\end{itemize}
Either exact or approximate solvers can be used on the coarsest-level
system. We provide interfaces to various parallel and sequential sparse LU factorizations from external
system. We provide interfaces to various parallel and sequential
sparse LU factorizations from external
packages, sequential native incomplete LU and approximate inverse factorizations,
parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solvers and
calls to preconditioned Krylov methods; all
@ -74,36 +79,41 @@ important revisions and extentions of the PSBLAS infrastructure.
The inter-process communication required by AMG4PSBLAS is encapsulated
in the PSBLAS routines;
therefore, AMG4PSBLAS can be run on any parallel machine where PSBLAS
implementations are available. In the most recent version of PSBLAS
(release 3.7), a plug-in for GPU is included; it includes CUDA
implementations are available. The most recent version of PSBLAS
(release 3.9) includes a plug-in for GPU; it contains CUDA
versions of main vector operations and of sparse matrix-vector
multiplication, so that Krylov methods coupled with AMG4PSBLAS
preconditioners relying on Jacobi and block-Jacobi smoothers with
sparse approximate inverses on the blocks can be efficiently executed
preconditioners relying on Jacobi and block-Jacobi smoothers with
sparse approximate inverses on the blocks can be efficiently executed
on clusters of GPUs.
AMG4PSBLAS has a layered and modular software architecture where three main layers can be
identified. The lower layer consists of the PSBLAS kernels, the middle one implements
the construction and application phases of the preconditioners, and the upper one
provides a uniform interface to all the preconditioners.
This architecture allows for different levels of use of the package:
few black-box routines at the upper layer allow all users to easily
build and apply any preconditioner available in AMG4PSBLAS;
facilities are also available allowing expert users to extend the set of smoothers
and solvers for building new versions of the preconditioners (see
Section~\ref{sec:adding}).
AMG4PSBLAS has a layered and modular software architecture where three
main layers can be identified. The lower layer consists of the PSBLAS
kernels, the middle one implements the construction and application
phases of the preconditioners, and the upper one provides a uniform
interface to all the preconditioners. This architecture allows for
different levels of use of the package: few black-box routines at the
upper layer allow all users to easily build and apply any
preconditioner available in AMG4PSBLAS; facilities are also available
allowing expert users to extend the set of smoothers and solvers for
building new versions of the preconditioners (see
Section~\ref{sec:adding}).
This guide is organized as follows. General information on the distribution of the source
code is reported in Section~\ref{sec:distribution}, while details on the configuration
and installation of the package are given in Section~\ref{sec:building}. The basics for building and applying the
preconditioners with the Krylov solvers implemented in PSBLAS are reported
in~Section~\ref{sec:started}, where the Fortran codes of a few sample programs
are also shown. A reference guide for the user interface routines is provided
in Section~\ref{sec:userinterface}. Information on the extension of the package
through the addition of new smoothers and solvers is reported in Section~\ref{sec:adding}.
The error handling mechanism used by the package
is briefly described in Section~\ref{sec:errors}. The copyright terms concerning the
distribution and modification of AMG4PSBLAS are reported in Appendix~\ref{sec:license}.
This guide is organized as follows. General information on the
distribution of the source code is reported in
Section~\ref{sec:distribution}, while details on the configuration and
installation of the package are given in
Section~\ref{sec:building}. The basics for building and applying the
preconditioners with the Krylov solvers implemented in PSBLAS are
reported in~Section~\ref{sec:started}, where the Fortran codes of a
few sample programs are also shown. A reference guide for the user
interface routines is provided in
Section~\ref{sec:userinterface}. Information on the extension of the
package through the addition of new smoothers and solvers is reported
in Section~\ref{sec:adding}. The error handling mechanism used by the
package is briefly described in Section~\ref{sec:errors}. The
copyright terms concerning the distribution and modification of
AMG4PSBLAS are reported in Appendix~\ref{sec:license}.
%%% Local Variables:
%%% mode: latex

@ -154,7 +154,7 @@ Preconditioners Package based on PSBLAS}
\flushright
\large Software version: 1.2\\
%\todaym
\large December 31st, 2025
\large December 23rd, 2025
\end{minipage}}
%\addtolength{\textwidth}{\centeroffset}
\vspace{\stretch{2}}

@ -114,7 +114,7 @@
%\today
Software version: 1.2\\
%\today
December 31st, 2025
December 23rd, 2025
\clearpage
\ \\
\thispagestyle{empty}

@ -160,7 +160,7 @@ the smoothers. However, for simplicity, shortcuts are
provided to set all versions of point-Jacobi, hybrid (forward) Gauss-Seidel, and
hybrid backward Gauss-Seidel, i.e., the previous smoothers can be defined
just by setting \fortinline|'SMOOTHER_TYPE'| to certain specific
values (see Tables~\ref{tab:p_smoother}), without the need to set
values (see Table~\ref{tab:p_smoother}), without the need to set
\fortinline|'SUB_SOLVE'| as well.
The smoother and solver objects are arranged in a
@ -182,47 +182,50 @@ the polynomial used. Consequently, the \fortinline|'SMOOTHER_SWEEPS'| option is
the \fortinline|'POLY_DEGREE'| option. This smoother is paired with a base smoother
object, whose iterations are accelerated using the specified polynomial smoothing technique.
By default, the $\ell_1$-Jacobi smoother serves as the base smoother, offering theoretical
guarantees on the resulting convergence factor~\cite{DDFMT2024,LOTTES}. Alternative combinations
are experimental and lack established guarantees.\\
guarantees on the resulting convergence
factor~\cite{DDFMT2024,LOTTES}. Alternative combinations are
experimental.\\
% and lack established guarantees.\\
\textbf{Remark 4.} Many of the coarsest-level solvers apply to a
specific coarsest-matrix layout;
therefore, setting the solver after the layout may change the layout
to either distributed or replicated.
Similarly, setting the layout after the solver may change the solver.
More precisely, UMFPACK and SuperLU require the coarsest-level
matrix to be replicated, while SuperLU\_Dist and KRM require it to be distributed.
In these cases, setting the coarsest-level solver implies that
the layout is redefined according to the solver, overriding any
specific coarsest-matrix layout; therefore, setting the solver after
the layout may change the layout to either distributed or replicated,
and similarly, setting the layout after the solver may change the
solver. More specifically, UMFPACK and SuperLU require the coarsest-level
matrix to be replicated, while SuperLU\_Dist and KRM require it to be
distributed; therefore, setting the coarsest-level solver implies
that the layout is redefined according to the solver, overriding any
previous settings. MUMPS, point-Jacobi,
hybrid Gauss-Seidel and block-Jacobi can be applied to
replicated and distributed matrices, thus their choice
does not modify any previously specified layout.
It is worth noting that, when the matrix is replicated,
the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers and their $\ell_1$ versions
reduce to the corresponding local solver objects (see Remark~2).
For the point-Jacobi and Gauss-Seidel solvers, these objects
correspond to a \emph{single} point-Jacobi sweep and a \emph{single}
Gauss-Seidel sweep, respectively, which are very poor solvers.
On the other hand, the distributed layout can be used with any solver
but UMFPACK and SuperLU; therefore, if any of these two solvers has already
been selected, the coarsest-level solver is changed to block-Jacobi,
with the previously chosen solver applied to the local blocks.
Likewise, the replicated layout can be used with any solver but SuperLU\_Dist and KRM;
therefore, if SuperLU\_Dist or KRM have been previously set, the coarsest-level
solver is changed to the default sequential solver.
In a parallel setting with many cores, we suggest to the users to change the default
coarsest solver for using the KRM choice, i.e. a parallel distributed iterative solution of the
coarsest system based on Krylov methods.
\textbf{Remark 4.} The argument \fortinline|idx| can be used to allow finer
control for those solvers; for instance, by specifying the keyword
\fortinline|'MUMPS_IPAR_ENTRY'| and an appropriate value for \fortinline|idx|, it is
possible to set any entry in the MUMPS integer control array.
See also Sec.~\ref{sec:adding}.
the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers and
their $\ell_1$ versions reduce to the corresponding local solver
objects (see Remark~2). For the point-Jacobi and Gauss-Seidel solvers,
these objects correspond to a \emph{single} point-Jacobi sweep and a
\emph{single} Gauss-Seidel sweep, respectively, which are very poor
solvers.
On the other hand, the distributed layout can be used with any solver
except UMFPACK and SuperLU; therefore, if either of these two solvers has
already been selected, the coarsest-level solver is changed to
block-Jacobi, with the previously chosen solver applied to the local
blocks. Likewise, the replicated layout can be used with any solver
except SuperLU\_Dist and KRM; therefore, if SuperLU\_Dist or KRM has
been previously set, the coarsest-level solver is changed to the
default sequential solver.
In a parallel setting with many cores, we suggest that users replace
the default coarsest solver with KRM, i.e., a parallel distributed
iterative solution of the coarsest system based on Krylov methods.
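The interplay between layout and solver described in this remark can be
sketched as follows. This is an illustrative fragment, not a complete
program: it assumes a preconditioner object \verb|P| already
initialized as multilevel, an integer \verb|info|, and the keywords
\verb|'COARSE_MAT'| and \verb|'COARSE_SOLVE'| with the values listed in
the reference tables of this guide.
\begin{verbatim}
! Ask for a replicated coarsest-level matrix ...
call P%set('COARSE_MAT', 'REPL', info)
! ... then select KRM, which needs a distributed matrix:
! the layout is redefined as distributed, overriding the line above
call P%set('COARSE_SOLVE', 'KRM', info)

! Reverse order: setting the layout last may change the solver.
! UMFPACK needs a replicated matrix, so asking afterwards for a
! distributed layout turns the coarsest solver into block-Jacobi,
! with UMFPACK applied to the local blocks
call P%set('COARSE_SOLVE', 'UMF', info)
call P%set('COARSE_MAT', 'DIST', info)
\end{verbatim}
Since the last call wins, it is safer to set only one of the two
options, or to set the solver last when a specific solver is wanted.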
\textbf{Remark 4.} The argument \fortinline|idx| can be used to allow
finer control for those solvers; for instance, by specifying the
keyword \fortinline|'MUMPS_IPAR_ENTRY'| and an appropriate value for
\fortinline|idx|, it is possible to set any entry in the MUMPS integer
control array. See also Sec.~\ref{sec:adding}.
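As an illustration, the fragment below sets one entry of the MUMPS
integer control array. It is a sketch under stated assumptions: the
variable names \verb|entry_index| and \verb|entry_value| are
hypothetical, their values are placeholders to be chosen according to
the MUMPS documentation, and \verb|P| is a preconditioner object that
has already been initialized, in a build of AMG4PSBLAS with MUMPS
support.
\begin{verbatim}
integer(psb_ipk_) :: entry_index, entry_value, info

! Select MUMPS as coarsest-level solver
call P%set('COARSE_SOLVE', 'MUMPS', info)

! Placeholder values: position in the control array and its new value
entry_index = 4
entry_value = 1
call P%set('MUMPS_IPAR_ENTRY', entry_value, info, idx=entry_index)
\end{verbatim}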
%The \verb|what,val| pairs described here are those of the predefined
%moother/solver objects; newly developed solvers may define new pairs
%according to their needs.
