Update documentation for release

5 months ago · 06f8f83114
parent 43a75e3171
commit 06f8f83114
18 changed files with 975 additions and 922 deletions
--- a/docs/amg4psblas_1.2-guide.pdf
+++ b/docs/amg4psblas_1.2-guide.pdf
--- a/docs/html/index.html
+++ b/docs/html/index.html
@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR</span><br
 class="newline" /> <span 
 class="cmr-12">Software version: 1.2</span><br 
 class="newline" /><span 
-class="cmr-12">December 31st, 2025</span>
+class="cmr-12">December 23rd, 2025</span>
                                                                               

                                                                               
--- a/docs/html/userhtml.html
+++ b/docs/html/userhtml.html
@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR</span><br
 class="newline" /> <span 
 class="cmr-12">Software version: 1.2</span><br 
 class="newline" /><span 
-class="cmr-12">December 31st, 2025</span>
+class="cmr-12">December 23rd, 2025</span>
                                                                               

                                                                               
--- a/docs/html/userhtmlli1.html
+++ b/docs/html/userhtmlli1.html
@ -72,32 +72,34 @@ class="small-caps">n</span></span>
 class="cmcsc-10x-x-120">PSBLAS</span><span 
 class="cmr-12">) is a package of parallel algebraic multilevel preconditioners included in the</span>
 <span 
-class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It is a progress</span>
+class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It</span>
 <span 
-class="cmr-12">of a software development project started in 2007, named MLD2P4, which originally</span>
+class="cmr-12">is an evolution of a software development project started in 2007, named</span>
 <span 
-class="cmr-12">implemented a multilevel version of some domain decomposition preconditioners of</span>
+class="cmr-12">MLD2P4, which originally implemented a multilevel version of some domain</span>
 <span 
-class="cmr-12">additive-Schwarz type, and was based on a parallel decoupled version of the well known</span>
+class="cmr-12">decomposition preconditioners of additive-Schwarz type, and was based on a parallel</span>
 <span 
-class="cmr-12">smoothed aggregation method to generate the multilevel hierarchy of coarser</span>
+class="cmr-12">decoupled version of the well known smoothed aggregation method to generate the</span>
 <span 
-class="cmr-12">matrices. In the last years, within the context of the EU-H2020 EoCoE project</span>
+class="cmr-12">multilevel hierarchy of coarser matrices. In the last few years the package</span>
 <span 
-class="cmr-12">(Energy Oriented Center of Excellence), the package was extended for including</span>
+class="cmr-12">was extended to include new algorithms and functionalities for the setup</span>
 <span 
-class="cmr-12">new algorithms and functionalities for the setup and application new AMG</span>
+class="cmr-12">and application of new AMG preconditioners with the final aims of improving</span>
 <span 
-class="cmr-12">preconditioners with the final aims of improving efficiency and scalability when tens of</span>
+class="cmr-12">efficiency and scalability when tens of thousands of cores are used, and of boosting</span>
 <span 
-class="cmr-12">thousands cores are used, and of boosting reliability in dealing with general</span>
+class="cmr-12">reliability in dealing with general symmetric positive definite linear systems; these</span>
 <span 
-class="cmr-12">symmetric positive definite linear systems. Due to the significant number</span>
+class="cmr-12">developments have been supported in the context of the EU-H2020 EoCoE</span>
+<span 
+class="cmr-12">project (Energy Oriented Center of Excellence). Due to the significant number</span>
 <span 
 class="cmr-12">of changes and the increase in scope, we decided to rename the package as</span>
 <span 
 class="cmr-12">AMG4PSBLAS.</span>
-<!--l. 16--><p class="indent" >   <span 
+<!--l. 27--><p class="indent" >   <span 
 class="cmr-12">AMG4PSBLAS has been designed to provide scalable and easy-to-use</span>
 <span 
 class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra</span>
@ -111,14 +113,14 @@ class="cmr-12">algebraic approach; therefore users level interfaces assume that
 class="cmr-12">and preconditioners are represented as PSBLAS distributed sparse matrices.</span>
 <span 
 class="cmr-12">AMG4PSBLAS enables the user to easily specify different features of an algebraic</span>
-<span 
-class="cmr-12">multilevel preconditioner, thus allowing to experiment with different preconditioners for</span>
                                                                               

                                                                               
+<span 
+class="cmr-12">multilevel preconditioner, thus allowing to experiment with different preconditioners for</span>
 <span 
 class="cmr-12">the problem and parallel computers at hand.</span>
-<!--l. 27--><p class="indent" >   <span 
+<!--l. 39--><p class="indent" >   <span 
 class="cmr-12">The package employs object-oriented design techniques in Fortran</span><span 
 class="cmr-12">&#x00A0;2003, with</span>
 <span 
@ -132,7 +134,7 @@ class="cmr-12">parallel implementation is based on a Single Program Multiple Dat
 class="cmr-12">paradigm; the inter-process communication is based on MPI and is managed mainly</span>
 <span 
 class="cmr-12">through PSBLAS.</span>
-<!--l. 35--><p class="indent" >   <span 
+<!--l. 47--><p class="indent" >   <span 
 class="cmr-12">This guide provides a brief description of the functionalities and the user interface</span>
 <span 
 class="cmr-12">of AMG4PSBLAS.</span>
--- a/docs/html/userhtmlse1.html
+++ b/docs/html/userhtmlse1.html
@ -100,18 +100,18 @@ src="userhtml0x.png" alt="Ax  = b,
 id="x4-3001r1"></a></div>
   </td><td class="equation-label"><span 
 class="cmr-12">(1)</span></td></tr></table>
-<!--l. 11--><p class="nopar" ><span 
+<!--l. 13--><p class="nopar" ><span 
 class="cmr-12">where </span><span 
 class="cmmi-12">A </span><span 
 class="cmr-12">is a square, real or complex, sparse symmetric positive definite (s.p.d)</span>
 <span 
 class="cmr-12">matrix.</span>
-<!--l. 19--><p class="indent" >   <span 
+<!--l. 21--><p class="indent" >   <span 
 class="cmr-12">The preconditioners implemented in AMG4PSBLAS are obtained by combining 3</span>
 <span 
-class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. Available</span>
+class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. We provide a</span>
 <span 
-class="cmr-12">multigrid cycles include the V-, W-, and a version of a Krylov-type cycle</span>
+class="cmr-12">number of multigrid cycles, including the V-, W-, and a version of a Krylov-type cycle</span>
 <span 
 class="cmr-12">(K-cycle)</span><span 
 class="cmr-12">&#x00A0;</span><span class="cite"><span 
@ -140,7 +140,7 @@ href="userhtmlli3.html#XDDF2020"><span
 class="cmr-12">14</span></a><span 
 class="cmr-12">]</span></span><span 
 class="cmr-12">.</span>
-<!--l. 30--><p class="indent" >   <span 
+<!--l. 34--><p class="indent" >   <span 
 class="cmr-12">An algebraic approach is used to generate a hierarchy of coarse-level matrices and</span>
 <span 
 class="cmr-12">operators, without explicitly using any information on the geometry of the original</span>
@ -150,7 +150,7 @@ class="cmr-12">problem, e.g., the discretization of a PDE. To this end, two diff
 class="cmr-12">strategies, based on aggregation, are available:</span>
     <ul class="itemize1">
     <li class="itemize">
-     <!--l. 35--><p class="noindent" ><span 
+     <!--l. 39--><p class="noindent" ><span 
 class="cmr-12">a decoupled version of the smoothed aggregation procedure proposed in</span><span 
 class="cmr-12">&#x00A0;</span><span class="cite"><span 
 class="cmr-12">[</span><a 
@ -178,7 +178,7 @@ class="cmr-12">;</span>
                                                                               
     </li>
     <li class="itemize">
-     <!--l. 39--><p class="noindent" ><span 
+     <!--l. 43--><p class="noindent" ><span 
 class="cmr-12">a coupled, parallel implementation of the Coarsening based on Compatible</span>
     <span 
 class="cmr-12">Weighted Matching introduced in</span><span 
@ -198,7 +198,7 @@ href="userhtmlli3.html#XDDF2020"><span
 class="cmr-12">14</span></a><span 
 class="cmr-12">]</span></span><span 
 class="cmr-12">;</span></li></ul>
-<!--l. 43--><p class="noindent" ><span 
+<!--l. 47--><p class="noindent" ><span 
 class="cmr-12">Either exact or approximate solvers can be used on the coarsest-level system. We provide</span>
 <span 
 class="cmr-12">interfaces to various parallel and sequential sparse LU factorizations from external</span>
@ -210,7 +210,7 @@ class="cmr-12">parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solve
 class="cmr-12">preconditioned Krylov methods; all smoothers can be also exploited as one-level</span>
 <span 
 class="cmr-12">preconditioners.</span>
-<!--l. 50--><p class="indent" >   <span 
+<!--l. 55--><p class="indent" >   <span 
 class="cmr-12">AMG4PSBLAS is written in Fortran</span><span 
 class="cmr-12">&#x00A0;2003, following an object-oriented design</span>
 <span 
@ -225,12 +225,12 @@ class="cmr-12">Single and double precision implementations of AMG4PSBLAS are ava
 class="cmr-12">for both the real and the complex case, which can be used through a single</span>
 <span 
 class="cmr-12">interface.</span>
-<!--l. 60--><p class="indent" >   <span 
-class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use</span>
+<!--l. 65--><p class="indent" >   <span 
+class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use multilevel</span>
 <span 
-class="cmr-12">multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)</span>
+class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse BLAS) computational</span>
 <span 
-class="cmr-12">computational framework</span><span 
+class="cmr-12">framework</span><span 
 class="cmr-12">&#x00A0;</span><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#Xpsblas_00"><span 
@ -240,37 +240,35 @@ class="cmr-12">&#x00A0;</span><a
 href="userhtmlli3.html#XPSBLAS3"><span 
 class="cmr-12">22</span></a><span 
 class="cmr-12">]</span></span><span 
-class="cmr-12">. PSBLAS provides basic linear algebra operators</span>
+class="cmr-12">. PSBLAS provides basic linear algebra operators and data</span>
 <span 
-class="cmr-12">and data management facilities for distributed sparse matrices, kernels for</span>
+class="cmr-12">management facilities for distributed sparse matrices, kernels for sequential incomplete</span>
 <span 
-class="cmr-12">sequential incomplete factorizations needed for the parallel block-Jacobi and</span>
+class="cmr-12">factorizations needed for the parallel block-Jacobi and additive Schwarz smoothers, and</span>
 <span 
-class="cmr-12">additive Schwarz smoothers, and parallel Krylov solvers which can be used with</span>
+class="cmr-12">parallel Krylov solvers which can be used with the AMG4PSBLAS preconditioners.</span>
 <span 
-class="cmr-12">the AMG4PSBLAS preconditioners. The choice of PSBLAS has been mainly</span>
+class="cmr-12">The choice of PSBLAS has been mainly motivated by the need of having a portable</span>
 <span 
-class="cmr-12">motivated by the need of having a portable and efficient software infrastructure</span>
+class="cmr-12">and efficient software infrastructure implementing &#8220;de facto&#8221; standard parallel sparse</span>
 <span 
-class="cmr-12">implementing &#8220;de facto&#8221; standard parallel sparse linear algebra kernels, to</span>
+class="cmr-12">linear algebra kernels, to pursue goals such as performance, portability, modularity</span>
 <span 
-class="cmr-12">pursue goals such as performance, portability, modularity ed extensibility</span>
+class="cmr-12">and extensibility in the development of the preconditioner package. On the</span>
 <span 
-class="cmr-12">in the development of the preconditioner package. On the other hand, the</span>
+class="cmr-12">other hand, the implementation of AMG4PSBLAS, which was driven by the</span>
 <span 
-class="cmr-12">implementation of AMG4PSBLAS, which was driven by the need to face the exascale</span>
+class="cmr-12">need to face the exascale challenge, has led to some important revisions and</span>
 <span 
-class="cmr-12">challenge, has led to some important revisions and extentions of the PSBLAS</span>
+class="cmr-12">extensions of the PSBLAS infrastructure. The inter-process communication</span>
 <span 
-class="cmr-12">infrastructure. The inter-process comunication required by AMG4PSBLAS</span>
+class="cmr-12">required by AMG4PSBLAS is encapsulated in the PSBLAS routines; therefore,</span>
 <span 
-class="cmr-12">is encapsulated in the PSBLAS routines; therefore, AMG4PSBLAS can be</span>
+class="cmr-12">AMG4PSBLAS can be run on any parallel machine where PSBLAS implementations</span>
 <span 
-class="cmr-12">run on any parallel machine where PSBLAS implementations are available.</span>
+class="cmr-12">are available. The most recent version of PSBLAS (release 3.9) includes a plug-in for</span>
 <span 
-class="cmr-12">In the most recent version of PSBLAS (release 3.7), a plug-in for GPU is</span>
-<span 
-class="cmr-12">included; it includes CUDA versions of main vector operations and of sparse</span>
+class="cmr-12">GPU; it contains CUDA versions of main vector operations and of sparse</span>
 <span 
 class="cmr-12">matrix-vector multiplication, so that Krylov methods coupled with AMG4PSBLAS</span>
 <span 
@ -279,17 +277,17 @@ class="cmr-12">preconditioners relying on Jacobi and block-Jacobi smoothers with
 class="cmr-12">approximate inverses on the blocks can be efficiently executed on clusters of</span>
 <span 
 class="cmr-12">GPUs.</span>
-<!--l. 85--><p class="indent" >   <span 
+<!--l. 90--><p class="indent" >   <span 
 class="cmr-12">AMG4PSBLAS has a layered and modular software architecture where three main</span>
 <span 
 class="cmr-12">layers can be identified. The lower layer consists of the PSBLAS kernels, the middle</span>
 <span 
 class="cmr-12">one implements the construction and application phases of the preconditioners, and the</span>
+<span 
+class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span>
                                                                               

                                                                               
-<span 
-class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span>
 <span 
 class="cmr-12">allows for different levels of use of the package: few black-box routines at the upper</span>
 <span 
@ -304,7 +302,7 @@ class="cmr-12">&#x00A0;</span><a
 href="userhtmlse6.html#x9-310006"><span 
 class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span 
 class="cmr-12">).</span>
-<!--l. 96--><p class="indent" >   <span 
+<!--l. 102--><p class="indent" >   <span 
 class="cmr-12">This guide is organized as follows. General information on the distribution of the</span>
 <span 
 class="cmr-12">source code is reported in Section</span><span 
--- a/docs/html/userhtmlse3.html
+++ b/docs/html/userhtmlse3.html
@ -58,9 +58,9 @@ class="cmr-12">. Most Fortran compilers provide this feature; in particular, thi
 <span 
 class="cmr-12">supported by the GNU Fortran compiler, for which we recommend to use at least</span>
 <span 
-class="cmr-12">version 4.8. The software defines data types and interfaces for real and complex data,</span>
+class="cmr-12">version 12. The software defines data types and interfaces for real and complex data, in</span>
 <span 
-class="cmr-12">in both single and double precision.</span>
+class="cmr-12">both single and double precision.</span>
 <!--l. 20--><p class="indent" >   <span 
 class="cmr-12">Building AMG4PSBLAS requires some base libraries (see Section</span><span 
 class="cmr-12">&#x00A0;</span><a 
@ -85,19 +85,19 @@ class="cmr-12">&#8220;developer&#8221; part; in order to build AMG4PSBLAS you ne
 class="cmr-12">the base and optional software used by AMG4PSBLAS is given in the next</span>
 <span 
 class="cmr-12">sections.</span>
-<!--l. 30--><p class="noindent" >
+<!--l. 31--><p class="noindent" >
   <h4 class="subsectionHead"><span class="titlemark"><span 
 class="cmr-12">3.1   </span></span> <a 
 id="x6-80003.1"></a><span 
 class="cmr-12">Prerequisites</span></h4>
-<!--l. 32--><p class="noindent" ><span 
+<!--l. 33--><p class="noindent" ><span 
 class="cmr-12">The following base libraries are needed:</span>
     <dl class="description"><dt class="description">
-     <!--l. 34--><p class="noindent" >
+     <!--l. 35--><p class="noindent" >
 <span 
 class="cmbx-12">BLAS</span> </dt><dd 
 class="description">
-     <!--l. 34--><p class="noindent" ><span class="cite"><span 
+     <!--l. 35--><p class="noindent" ><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#Xblas3"><span 
 class="cmr-12">18</span></a><span 
@ -152,11 +152,11 @@ class="cmr-12">for including -fPIC compilation option in the make.inc file of th
     <span 
 class="cmr-12">library.</span>
     </dd><dt class="description">
-     <!--l. 51--><p class="noindent" >
+     <!--l. 52--><p class="noindent" >
 <span 
 class="cmbx-12">MPI</span> </dt><dd 
 class="description">
-     <!--l. 51--><p class="noindent" ><span class="cite"><span 
+     <!--l. 52--><p class="noindent" ><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#XMPI2"><span 
 class="cmr-12">25</span></a><span 
@ -169,11 +169,11 @@ class="cmr-12">A version of MPI is available on most high-performance computing<
     <span 
 class="cmr-12">systems.</span>
     </dd><dt class="description">
-     <!--l. 53--><p class="noindent" >
+     <!--l. 54--><p class="noindent" >
 <span 
 class="cmbx-12">PSBLAS</span> </dt><dd 
 class="description">
-     <!--l. 53--><p class="noindent" ><span class="cite"><span 
+     <!--l. 54--><p class="noindent" ><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#XPSBLASGUIDE"><span 
 class="cmr-12">21</span></a><span 
@ -192,13 +192,13 @@ class="cmr-12">; version</span>
 class="cmr-12">3.9.0 (or later) is required. Indeed, all the prerequisites listed so far are also</span>
     <span 
 class="cmr-12">prerequisites of PSBLAS.</span></dd></dl>
-<!--l. 60--><p class="noindent" ><span 
+<!--l. 61--><p class="noindent" ><span 
 class="cmr-12">Please note that the four previous libraries must have Fortran interfaces compatible with</span>
 <span 
 class="cmr-12">AMG4PSBLAS; usually this means that they should all be built with the same</span>
 <span 
 class="cmr-12">compiler being used for AMG4PSBLAS.</span>
-<!--l. 64--><p class="indent" >   <span 
+<!--l. 65--><p class="indent" >   <span 
 class="cmr-12">If you want to use the PSBLAS support for NVIDIA GPUs, you will also</span>
 <span 
 class="cmr-12">need a working version of the CUDA Toolkit that is compatible with the</span>
@ -214,7 +214,7 @@ class="cmr-12">options:</span>
   <pre class="verbatim" id="verbatim-2">
 ./configure&#x00A0;--enable-cuda&#x00A0;--with-cudadir=${CUDA_HOME}&#x00A0;--with-cudacc=xx,yy,zz
 </pre>
-<!--l. 89--><p class="nopar" > <span 
+<!--l. 90--><p class="nopar" > <span 
 class="cmr-12">Previous versions required you to have the auxiliary libraries SPGPU and</span>
 <span 
 class="cmr-12">PSBLAS-EXT compiled; this is no longer necessary because they have been integrated</span>
@ -226,24 +226,24 @@ class="cmr-12">&#x00A0;</span><a
 href="userhtmlse4.html#x7-150004.2"><span 
 class="cmr-12">4.2</span><!--tex4ht:ref: sec:gpu-example --></a><span 
 class="cmr-12">.</span>
-<!--l. 96--><p class="noindent" >
+<!--l. 97--><p class="noindent" >
   <h4 class="subsectionHead"><span class="titlemark"><span 
 class="cmr-12">3.2   </span></span> <a 
 id="x6-90003.2"></a><span 
 class="cmr-12">Optional third party libraries</span></h4>
-<!--l. 98--><p class="noindent" ><span 
+<!--l. 99--><p class="noindent" ><span 
 class="cmr-12">We provide interfaces to the following third-party software libraries; note that these are</span>
 <span 
 class="cmr-12">optional, but if you enable them some defaults for multilevel preconditioners may</span>
 <span 
 class="cmr-12">change to reflect their presence.</span>
-<!--l. 102--><p class="indent" >
+<!--l. 103--><p class="indent" >
     <dl class="description"><dt class="description">
-     <!--l. 103--><p class="noindent" >
+     <!--l. 104--><p class="noindent" >
 <span 
 class="cmbx-12">UMFPACK</span> </dt><dd 
 class="description">
-     <!--l. 103--><p class="noindent" ><span class="cite"><span 
+     <!--l. 104--><p class="noindent" ><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#XUMFPACK"><span 
 class="cmr-12">16</span></a><span 
@ -266,11 +266,11 @@ class="cmr-12">provide  the  right  path  to  the  BLAS  and  LAPACK  libraries
 class="cmtt-12">SuiteSparse_config/SuiteSparse_config.mk</span></span></span> <span 
 class="cmr-12">file.</span>
     </dd><dt class="description">
-     <!--l. 110--><p class="noindent" >
+     <!--l. 111--><p class="noindent" >
 <span 
 class="cmbx-12">MUMPS</span> </dt><dd 
 class="description">
-     <!--l. 110--><p class="noindent" ><span class="cite"><span 
+     <!--l. 111--><p class="noindent" ><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#XMUMPS"><span 
 class="cmr-12">2</span></a><span 
@ -286,14 +286,14 @@ class="cmr-12">solution for single and double precision, real and complex data.
     <span 
 class="cmr-12">versions 4.10.0 and 5.0.1.</span>
     </dd><dt class="description">
-     <!--l. 115--><p class="noindent" >
+     <!--l. 116--><p class="noindent" >
 <span 
 class="cmbx-12">SuperLU</span> </dt><dd 
 class="description">
                                                                               

                                                                               
-     <!--l. 115--><p class="noindent" ><span class="cite"><span 
+     <!--l. 116--><p class="noindent" ><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#XSUPERLU"><span 
 class="cmr-12">17</span></a><span 
@ -312,12 +312,12 @@ class="cmr-12">data. We tested versions 4.3 and 5.0. If you installed BLAS from
     <span 
 class="cmr-12">remember to define the BLASLIB variable in the make.inc file.</span>
     </dd><dt class="description">
-     <!--l. 121--><p class="noindent" >
+     <!--l. 122--><p class="noindent" >
 <span 
 class="cmbx-12">SuperLU</span><span 
 class="cmbx-12">_Dist</span> </dt><dd 
 class="description">
-     <!--l. 121--><p class="noindent" ><span class="cite"><span 
+     <!--l. 122--><p class="noindent" ><span class="cite"><span 
 class="cmr-12">[</span><a 
 href="userhtmlli3.html#XSUPERLUDIST"><span 
 class="cmr-12">28</span></a><span 
@ -341,18 +341,18 @@ class="cmr-12">parallel graph partitioning and fill-reducing matrix ordering, av
 href="glaros.dtc.umn.edu/gkhome/metis/parmetis/overview" class="url" ><span 
 class="cmtt-12">glaros.dtc.umn.edu/gkhome/metis/parmetis/overview</span></a><span 
 class="cmr-12">.</span></dd></dl>
-<!--l. 133--><p class="noindent" >
+<!--l. 134--><p class="noindent" >
   <h4 class="subsectionHead"><span class="titlemark"><span 
 class="cmr-12">3.3   </span></span> <a 
 id="x6-100003.3"></a><span 
 class="cmr-12">Configuration options</span></h4>
-<!--l. 135--><p class="noindent" ><span 
+<!--l. 136--><p class="noindent" ><span 
 class="cmr-12">In order to build AMG4PSBLAS, the first step is to use the </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">configure</span></span></span> <span 
 class="cmr-12">script in the</span>
 <span 
 class="cmr-12">main directory to generate the necessary makefile.</span>
-<!--l. 139--><p class="indent" >   <span 
+<!--l. 140--><p class="indent" >   <span 
 class="cmr-12">As a minimal example consider the following:</span>
                                                                               

@ -360,7 +360,7 @@ class="cmr-12">As a minimal example consider the following:</span>
   <pre class="verbatim" id="verbatim-3">
 ./configure&#x00A0;--with-psblas=PSB-INSTALL-DIR
 </pre>
-<!--l. 147--><p class="nopar" > <span 
+<!--l. 148--><p class="nopar" > <span 
 class="cmr-12">which assumes that the various MPI compilers and support libraries are available in</span>
 <span 
 class="cmr-12">the standard directories on the system, and specifies only the PSBLAS install directory</span>
@ -374,7 +374,7 @@ class="cmtt-12">./configure</span><span
 class="cmtt-12">&#x00A0;--help</span></span></span><span 
 class="cmr-12">, which</span>
 <span 
-class="cmr-12">produces:  </span><!--l. 158--><pre class="lstinputlisting" id="listing-1"><span class="label"><a 
+class="cmr-12">produces:  </span><!--l. 159--><pre class="lstinputlisting" id="listing-1"><span class="label"><a 
 id="x6-10002r1"></a></span><span style="color:#000000"><span 
 class="cmtt-12">&#8216;</span></span><span style="color:#000000"><span 
 class="cmtt-12">configure</span></span><span style="color:#000000"><span 
@ -3910,8 +3910,8 @@ class="cmtt-12">/</span></span><span style="color:#000000"><span
 class="cmtt-12">issues</span></span><span style="color:#000000"><span 
 class="cmtt-12">&#x003E;.</span></span>
   </pre>
-<!--l. 160--><p class="noindent" ><span 
-class="cmr-12">For instance, if a user has built and installed PSBLAS 3.7 under the </span><span class="obeylines-h"><span class="verb"><span 
+<!--l. 161--><p class="noindent" ><span 
+class="cmr-12">For instance, if a user has built and installed PSBLAS 3.9 under the </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">/opt</span></span></span> <span 
 class="cmr-12">directory and is</span>
 <span 
@ -3922,10 +3922,10 @@ class="cmr-12">might be configured with:</span>

                                                                               
   <pre class="verbatim" id="verbatim-4">
-./configure&#x00A0;--with-psblas=/opt/psblas-3.7/&#x00A0;\
+./configure&#x00A0;--with-psblas=/opt/psblas-3.9/&#x00A0;\
 --with-umfpackincdir=/usr/include/suitesparse/
 </pre>
-<!--l. 172--><p class="nopar" > <span 
+<!--l. 173--><p class="nopar" > <span 
 class="cmr-12">Once the configure script has completed execution, it will have generated the file</span>
 <span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">Make.inc</span></span></span> <span 
@ -3934,7 +3934,7 @@ class="cmr-12">which will then be used by all Makefiles in the directory tree; t
 class="cmr-12">copied in the install directory under the name </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">Make.inc.AMG4PSBLAS</span></span></span><span 
 class="cmr-12">.</span>
-<!--l. 179--><p class="indent" >   <span 
+<!--l. 180--><p class="indent" >   <span 
 class="cmr-12">To use the MUMPS solver package, the user has to add the appropriate options to</span>
 <span 
 class="cmr-12">the configure script; by default we are looking for the libraries </span><span class="obeylines-h"><span class="verb"><span 
@ -3954,7 +3954,7 @@ class="cmtt-12">--with-extra-libs</span></span></span> <span
 class="cmr-12">configure</span>
 <span 
 class="cmr-12">option.</span>
-<!--l. 187--><p class="indent" >   <span 
+<!--l. 188--><p class="indent" >   <span 
 class="cmr-12">To build the library the user will now enter</span>
                                                                               

@ -3962,7 +3962,7 @@ class="cmr-12">To build the library the user will now enter</span>
   <pre class="verbatim" id="verbatim-5">
 make
 </pre>
-<!--l. 195--><p class="nopar" > <span 
+<!--l. 196--><p class="nopar" > <span 
 class="cmr-12">followed (optionally) by</span>
                                                                               

@ -3970,12 +3970,12 @@ class="cmr-12">followed (optionally) by</span>
   <pre class="verbatim" id="verbatim-6">
 make&#x00A0;install
 </pre>
-<!--l. 205--><p class="nopar" >
+<!--l. 206--><p class="nopar" >
   <h4 class="subsectionHead"><span class="titlemark"><span 
 class="cmr-12">3.4   </span></span> <a 
 id="x6-110003.4"></a><span 
 class="cmr-12">Bug reporting</span></h4>
-<!--l. 208--><p class="noindent" ><span 
+<!--l. 209--><p class="noindent" ><span 
 class="cmr-12">If you find any bugs in our codes, please report them through our issues page</span>
 <span 
 class="cmr-12">on</span><br 
@ -3983,18 +3983,18 @@ class="newline" /> <a
 href="https://github.com/psctoolkit/psctoolkit/issues" class="url" ><span 
 class="cmtt-12">https://github.com/psctoolkit/psctoolkit/issues</span></a><br 
 class="newline" />
-<!--l. 212--><p class="indent" >   <span 
+<!--l. 213--><p class="indent" >   <span 
 class="cmr-12">To enable us to track the bug, please provide a log from the failing application, the</span>
 <span 
 class="cmr-12">test conditions, and ideally a self-contained test program reproducing the</span>
 <span 
 class="cmr-12">issue.</span>
-<!--l. 216--><p class="noindent" >
+<!--l. 217--><p class="noindent" >
   <h4 class="subsectionHead"><span class="titlemark"><span 
 class="cmr-12">3.5   </span></span> <a 
 id="x6-120003.5"></a><span 
 class="cmr-12">Example and test programs</span></h4>
-<!--l. 217--><p class="noindent" ><span 
+<!--l. 218--><p class="noindent" ><span 
 class="cmr-12">The package contains a </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">samples</span></span></span> <span 
 class="cmr-12">directory, divided in two subdirs </span><span class="obeylines-h"><span class="verb"><span 
@ -4010,22 +4010,22 @@ class="cmr-12">subdirectories.</span>
 <span 
 class="cmr-12">Their purpose is as follows:</span>
     <dl class="description"><dt class="description">
-     <!--l. 222--><p class="noindent" >
+     <!--l. 223--><p class="noindent" >
 <span 
 class="cmtt-12">simple</span> </dt><dd 
 class="description">
-     <!--l. 222--><p class="noindent" ><span 
+     <!--l. 223--><p class="noindent" ><span 
 class="cmr-12">contains  a  set  of  simple  example  programs  with  a  predefined  choice  of</span>
     <span 
 class="cmr-12">preconditioners,  selectable  via  integer  values.  These  are  intended  to  get</span>
     <span 
 class="cmr-12">acquainted with the multilevel preconditioners available in AMG4PSBLAS.</span>
     </dd><dt class="description">
-     <!--l. 226--><p class="noindent" >
+     <!--l. 227--><p class="noindent" >
 <span 
 class="cmtt-12">advanced</span> </dt><dd 
 class="description">
-     <!--l. 226--><p class="noindent" ><span 
+     <!--l. 227--><p class="noindent" ><span 
 class="cmr-12">contains a set of more sophisticated examples that will allow the user, via</span>
     <span 
 class="cmr-12">the input files in the </span><span class="obeylines-h"><span class="verb"><span 
@ -4033,7 +4033,7 @@ class="cmtt-12">runs</span></span></span> <span
 class="cmr-12">subdirectories, to experiment with the full range</span>
     <span 
 class="cmr-12">of preconditioners implemented in the package.</span></dd></dl>
-<!--l. 231--><p class="noindent" ><span 
+<!--l. 232--><p class="noindent" ><span 
 class="cmr-12">The </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">fileread</span></span></span> <span 
 class="cmr-12">directories contain sample programs that read sparse matrices from files,</span>
--- a/docs/html/userhtmlse4.html
+++ b/docs/html/userhtmlse4.html
@ -351,7 +351,7 @@ class="content">Preconditioner types, corresponding strings and default choices.
                                                                               
   </div><hr class="endfloat" />
   </div>
-<!--l. 98--><p class="indent" >   <span 
+<!--l. 97--><p class="indent" >   <span 
 class="cmr-12">Note that the module </span><code class="lstinline"><span style="color:#000000">amg_prec_mod</span></code><span 
 class="cmr-12">, containing the definition of the preconditioner</span>
 <span 
@ -370,7 +370,7 @@ class="cmr-12">4.1</span><!--tex4ht:ref: sec:examples --></a><span
 class="cmr-12">).</span>
 <br 
 class="newline" />
-<!--l. 105--><p class="indent" >   <span 
+<!--l. 104--><p class="indent" >   <span 
 class="cmbx-12">Remark 1. </span><span 
 class="cmr-12">Coarsest-level solvers based on the LU factorization, such as those</span>
 <span 
@ -385,11 +385,24 @@ class="cmr-12">problems. However, this does not necessarily correspond to the sh
 <span 
 class="cmr-12">on parallel</span><span 
 class="cmr-12">&#x00A0;computers.</span>
+<!--l. 112--><p class="indent" >   <span 
+class="cmbx-12">Remark 2. </span><span 
+class="cmr-12">Memory allocation on GPUs is a costly operation that implies a</span>
+<span 
+class="cmr-12">synchronization; it is therefore advisable to preallocate the internal preconditioner</span>
+<span 
+class="cmr-12">workspace with the method </span><span class="obeylines-h"><span class="verb"><span 
+class="cmtt-12">prec%allocate_wrk(info)</span></span></span> <span 
+class="cmr-12">before invoking an iterative</span>
+<span 
+class="cmr-12">method, and release it upon exit with </span><span class="obeylines-h"><span class="verb"><span 
+class="cmtt-12">prec%deallocate_wrk(info)</span></span></span><span 
+class="cmr-12">.</span>
   <h4 class="subsectionHead"><span class="titlemark"><span 
 class="cmr-12">4.1   </span></span> <a 
 id="x7-140004.1"></a><span 
 class="cmr-12">Examples</span></h4>
-<!--l. 116--><p class="noindent" ><span 
+<!--l. 121--><p class="noindent" ><span 
 class="cmr-12">The code reported in Figure</span><span 
 class="cmr-12">&#x00A0;</span><a 
 href="#x7-14001r1"><span 
@ -418,7 +431,7 @@ class="cmr-12">and </span><code class="lstinline"><span style="color:#000000">ps
 class="cmr-12">must be used by the example</span>
 <span 
 class="cmr-12">program.</span>
-<!--l. 126--><p class="indent" >   <span 
+<!--l. 131--><p class="indent" >   <span 
 class="cmr-12">The part of the code dealing with reading and assembling the sparse matrix and the</span>
 <span 
 class="cmr-12">right-hand side vector and the deallocation of the relevant data structures, performed</span>
@ -451,7 +464,7 @@ href="userhtmlli3.html#XPSBLASGUIDE"><span
 class="cmr-12">21</span></a><span 
 class="cmr-12">]</span></span><span 
 class="cmr-12">.</span>
-<!--l. 138--><p class="indent" >   <span 
+<!--l. 143--><p class="indent" >   <span 
 class="cmr-12">The setup and application of the default multilevel preconditioner for the real single</span>
 <span 
 class="cmr-12">precision and the complex, single and double precision, versions are obtained</span>
@ -461,6 +474,9 @@ class="cmr-12">&#x00A0;</span><a
 href="userhtmlse5.html#x8-160005"><span 
 class="cmr-12">5</span><!--tex4ht:ref: sec:userinterface --></a> <span 
 class="cmr-12">for</span>
+                                                                               
+
+                                                                               
 <span 
 class="cmr-12">details). If these versions are installed, the corresponding codes are available in</span>
 <span class="obeylines-h"><span class="verb"><span 
@ -470,7 +486,7 @@ class="cmr-12">.</span>
                                                                               

                                                                               
-<!--l. 144--><p class="indent" >   <a 
+<!--l. 148--><p class="indent" >   <a 
 id="x7-14001r1"></a><hr class="float"><div class="float" 
 >
                                                                               
@ -478,7 +494,7 @@ class="cmr-12">.</span>
                                                                               
 <div class="center" 
 >
-<!--l. 145--><p class="noindent" >
+<!--l. 149--><p class="noindent" >
                                                                               

                                                                               
@ -535,7 +551,7 @@ class="cmr-12">.</span>
 &#x00A0;&#x00A0;call&#x00A0;psb_exit(ctxt)
 &#x00A0;&#x00A0;stop
 </pre>
-<!--l. 255--><p class="nopar" >                                                                       </div>
+<!--l. 259--><p class="nopar" >                                                                       </div>
                                                                               

                                                                               
@ -548,7 +564,7 @@ class="content">setup and application of the default multilevel preconditioner (

                                                                               
   </div><hr class="endfloat" />
-<!--l. 264--><p class="indent" >   <span 
+<!--l. 267--><p class="indent" >   <span 
 class="cmr-12">Different versions of the multilevel preconditioner can be obtained by changing the</span>
 <span 
 class="cmr-12">default values of the preconditioner parameters. The code reported in Figure</span><span 
@ -557,42 +573,40 @@ href="#x7-14002r2"><span
 class="cmr-12">2</span><!--tex4ht:ref: fig:ex2 --></a> <span 
 class="cmr-12">shows</span>
 <span 
-class="cmr-12">how to set a V-cycle preconditioner which applies 1 block-Jacobi sweep as pre-</span>
+class="cmr-12">how to set a V-cycle preconditioner which applies 1 block-Jacobi sweep as pre- and</span>
 <span 
-class="cmr-12">and post-smoother, and solves the coarsest-level system with 8 block-Jacobi</span>
+class="cmr-12">post-smoother, and solves the coarsest-level system with 8 block-Jacobi sweeps. Note</span>
 <span 
-class="cmr-12">sweeps. Note that the ILU(0) factorization (plus triangular solve) is used as</span>
+class="cmr-12">that the ILU(0) factorization (plus triangular solve) is used as the local solver for the</span>
 <span 
-class="cmr-12">local solver for the block-Jacobi sweeps, since this is the default associated</span>
+class="cmr-12">block-Jacobi sweeps, since this is the default associated with block-Jacobi and set</span>
 <span 
-class="cmr-12">with block-Jacobi and set by</span><span 
+class="cmr-12">by</span><span 
 class="cmr-12">&#x00A0;</span><code class="lstinline"><span style="color:#000000">P</span><span style="color:#000000">%</span><span style="color:#000000">init</span></code><span 
-class="cmr-12">. Furthermore, specifying block-Jacobi as</span>
-<span 
-class="cmr-12">coarsest-level solver implies that the coarsest-level matrix is distributed among</span>
+class="cmr-12">. Furthermore, specifying block-Jacobi as the coarsest-level solver implies that</span>
 <span 
-class="cmr-12">the processes. Figure</span><span 
+class="cmr-12">the coarsest-level matrix is distributed among the processes. Figure</span><span 
 class="cmr-12">&#x00A0;</span><a 
 href="#x7-14003r3"><span 
 class="cmr-12">3</span><!--tex4ht:ref: fig:ex3 --></a> <span 
-class="cmr-12">shows how to set a W-cycle preconditioner using the</span>
+class="cmr-12">shows how</span>
 <span 
-class="cmr-12">Coarsening based on Compatible Weighted Matching, aggregates of size at</span>
+class="cmr-12">to set a W-cycle preconditioner using the coarsening based on Compatible</span>
 <span 
-class="cmr-12">most 8 and smoothed prolongators. It applies 2 hybrid Gauss-Seidel sweeps as</span>
+class="cmr-12">Weighted Matching, aggregates of size at most 8 and smoothed prolongators. It</span>
 <span 
-class="cmr-12">pre- and post-smoother, and solves the coarsest-level system with the parallel</span>
+class="cmr-12">applies 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother, and solves the</span>
 <span 
-class="cmr-12">flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi</span>
+class="cmr-12">coarsest-level system with the parallel flexible Conjugate Gradient method (KRM)</span>
 <span 
-class="cmr-12">preconditioner having ILU(0) on the blocks. Default parameters are used for stopping</span>
+class="cmr-12">coupled with the block-Jacobi preconditioner having ILU(0) on the blocks, using</span>
 <span 
-class="cmr-12">criterion of the coarsest solver. Note that, also in this case, specifying KRM as</span>
+class="cmr-12">default parameters for the coarsest solver. Note that specifying KRM as the</span>
 <span 
 class="cmr-12">coarsest-level solver implies that the coarsest-level matrix is distributed among the</span>
 <span 
 class="cmr-12">processes.</span>
-<!--l. 291--><p class="indent" >   <span 
+<!--l. 299--><p class="indent" >   <span 
 class="cmr-12">The code fragments shown in Figures</span><span 
 class="cmr-12">&#x00A0;</span><a 
 href="#x7-14002r2"><span 
@ -605,7 +619,7 @@ class="cmr-12">are included in the example program</span>
 class="cmr-12">file </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">amg_dexample_ml.f90</span></span></span> <span 
 class="cmr-12">too.</span>
-<!--l. 294--><p class="indent" >   <span 
+<!--l. 302--><p class="indent" >   <span 
 class="cmr-12">Finally, Figure</span><span 
 class="cmr-12">&#x00A0;</span><a 
 href="#x7-14004r4"><span 
@ -620,7 +634,7 @@ class="cmr-12">nonsymmetric. The corresponding example program is available in t
 <span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">amg_dexample_1lev.f90</span></span></span><span 
 class="cmr-12">.</span>
-<!--l. 301--><p class="indent" >   <span 
+<!--l. 309--><p class="indent" >   <span 
 class="cmr-12">For all the previous preconditioners, example programs where the sparse matrix</span>
 <span 
 class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet</span>
@ -631,7 +645,7 @@ class="cmr-12">.</span>
                                                                               

                                                                               
-<!--l. 304--><p class="indent" >   <a 
+<!--l. 312--><p class="indent" >   <a 
 id="x7-14002r2"></a><hr class="float"><div class="float" 
 >
                                                                               
@ -639,7 +653,7 @@ class="cmr-12">.</span>
                                                                               
 <div class="center" 
 >
-<!--l. 318--><p class="noindent" >
+<!--l. 326--><p class="noindent" >
 <div class="minipage"><pre class="verbatim" id="verbatim-8">
 ...&#x00A0;...
 !&#x00A0;build&#x00A0;a&#x00A0;V-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;1&#x00A0;block-Jacobi&#x00A0;sweep&#x00A0;(with
@ -653,7 +667,7 @@ class="cmr-12">.</span>
 &#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info)
 ...&#x00A0;...
 </pre>
-<!--l. 333--><p class="nopar" >                                                                       </div></div>
+<!--l. 341--><p class="nopar" >                                                                       </div></div>
 <br /><div class="caption" 
 ><span class="id">Listing 2: </span><span  
 class="content">setup of a multilevel preconditioner based on the default decoupled coarsening</span></div><!--tex4ht:label?: x7-14002r2 -->
@ -664,7 +678,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
                                                                               

                                                                               
-<!--l. 340--><p class="indent" >   <a 
+<!--l. 348--><p class="indent" >   <a 
 id="x7-14003r3"></a><hr class="float"><div class="float" 
 >
                                                                               
@ -672,7 +686,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
                                                                               
 <div class="center" 
 >
-<!--l. 362--><p class="noindent" >
+<!--l. 370--><p class="noindent" >
 <div class="minipage"><pre class="verbatim" id="verbatim-9">
 ...&#x00A0;...
 !&#x00A0;build&#x00A0;a&#x00A0;W-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;2&#x00A0;hybrid&#x00A0;Gauss-Seidel&#x00A0;sweeps
@ -692,7 +706,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
 &#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info)
 ...&#x00A0;...
 </pre>
-<!--l. 383--><p class="nopar" >                                                                       </div></div>
+<!--l. 391--><p class="nopar" >                                                                       </div></div>
 <br /> <div class="caption" 
 ><span class="id">Listing 3: </span><span  
 class="content">setup of a multilevel preconditioner based on the coupled coarsening using
@ -704,7 +718,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
                                                                               

                                                                               
-<!--l. 390--><p class="indent" >   <a 
+<!--l. 398--><p class="indent" >   <a 
 id="x7-14004r4"></a><hr class="float"><div class="float" 
 >
                                                                               
@ -712,7 +726,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
                                                                               
 <div class="center" 
 >
-<!--l. 402--><p class="noindent" >
+<!--l. 410--><p class="noindent" >
 <div class="minipage"><pre class="verbatim" id="verbatim-10">
 ...&#x00A0;...
 !&#x00A0;set&#x00A0;RAS&#x00A0;with&#x00A0;overlap&#x00A0;2&#x00A0;and&#x00A0;ILU(0)&#x00A0;on&#x00A0;the&#x00A0;local&#x00A0;blocks
@ -723,7 +737,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
 !&#x00A0;solve&#x00A0;Ax=b&#x00A0;with&#x00A0;preconditioned&#x00A0;BiCGSTAB
 &#x00A0;&#x00A0;call&#x00A0;psb_krylov(&#8217;BICGSTAB&#8217;,A,P,b,x,tol,desc_A,info)
 </pre>
-<!--l. 414--><p class="nopar" >                                                                       </div></div>
+<!--l. 422--><p class="nopar" >                                                                       </div></div>
 <br /> <div class="caption" 
 ><span class="id">Listing 4: </span><span  
 class="content">setup of a one-level Schwarz preconditioner.</span></div><!--tex4ht:label?: x7-14004r4 -->
@ -735,7 +749,7 @@ class="content">setup of a one-level Schwarz preconditioner.</span></div><!--tex
 class="cmr-12">4.2   </span></span> <a 
 id="x7-150004.2"></a><span 
 class="cmr-12">GPU example</span></h4>
-<!--l. 426--><p class="noindent" ><span 
+<!--l. 434--><p class="noindent" ><span 
 class="cmr-12">The code discussed here shows how to set up a program exploiting the combined GPU</span>
 <span 
 class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the</span>
@ -743,14 +757,14 @@ class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is availa
 class="cmr-12">source distribution directory </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">amg4psblas/examples/gpu</span></span></span><span 
 class="cmr-12">.</span>
-<!--l. 431--><p class="indent" >   <span 
+<!--l. 439--><p class="indent" >   <span 
 class="cmr-12">First of all, we need to include the appropriate modules and declare some auxiliary</span>
 <span 
 class="cmr-12">variables:</span>
                                                                               

                                                                               
-<!--l. 433--><p class="indent" >   <a 
+<!--l. 441--><p class="indent" >   <a 
 id="x7-15001r5"></a><hr class="float"><div class="float" 
 >
                                                                               
@ -758,7 +772,7 @@ class="cmr-12">variables:</span>
                                                                               
 <div class="center" 
 >
-<!--l. 452--><p class="noindent" >
+<!--l. 460--><p class="noindent" >
 <div class="minipage"><pre class="verbatim" id="verbatim-11">
 program&#x00A0;amg_dexample_gpu
 &#x00A0;&#x00A0;use&#x00A0;psb_base_mod
@ -777,7 +791,7 @@ program&#x00A0;amg_dexample_gpu

 &#x00A0;
 </pre>
-<!--l. 471--><p class="nopar" >                                                                       </div></div>
+<!--l. 479--><p class="nopar" >                                                                       </div></div>
 <br /> <div class="caption" 
 ><span class="id">Listing 5: </span><span  
 class="content">setup of a GPU-enabled test program part one.</span></div><!--tex4ht:label?: x7-15001r5 -->
@ -785,7 +799,7 @@ class="content">setup of a GPU-enabled test program part one.</span></div><!--te

                                                                               
   </div><hr class="endfloat" />
-<!--l. 478--><p class="indent" >   <span 
+<!--l. 486--><p class="indent" >   <span 
 class="cmr-12">In this particular example we are choosing to employ an </span><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">HLG</span></span></span> <span 
 class="cmr-12">data structure for</span>
@ -793,14 +807,14 @@ class="cmr-12">data structure for</span>
 class="cmr-12">sparse matrices on GPUs; for more information please refer to the PSBLAS users&#8217;</span>
 <span 
 class="cmr-12">guide.</span>
-<!--l. 482--><p class="indent" >   <span 
+<!--l. 490--><p class="indent" >   <span 
 class="cmr-12">We then have to initialize the GPU environment, and pass the appropriate MOLD</span>
 <span 
 class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217; guide).</span>
                                                                               

                                                                               
-<!--l. 485--><p class="indent" >   <a 
+<!--l. 493--><p class="indent" >   <a 
 id="x7-15002r6"></a><hr class="float"><div class="float" 
 >
                                                                               
@ -808,7 +822,7 @@ class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217;
                                                                               
 <div class="center" 
 >
-<!--l. 501--><p class="noindent" >
+<!--l. 509--><p class="noindent" >
 <div class="minipage"><pre class="verbatim" id="verbatim-12">
 &#x00A0;&#x00A0;call&#x00A0;psb_init(ctxt)
 &#x00A0;&#x00A0;call&#x00A0;psb_info(ctxt,iam,np)
@ -823,7 +837,7 @@ class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217;

 &#x00A0;
 </pre>
-<!--l. 516--><p class="nopar" >                                                                       </div></div>
+<!--l. 524--><p class="nopar" >                                                                       </div></div>
 <br /> <div class="caption" 
 ><span class="id">Listing 6: </span><span  
 class="content">setup of a GPU-enabled test program part two.</span></div><!--tex4ht:label?: x7-15002r6 -->
@ -831,7 +845,7 @@ class="content">setup of a GPU-enabled test program part two.</span></div><!--te

                                                                               
   </div><hr class="endfloat" />
-<!--l. 523--><p class="indent" >   <span 
+<!--l. 531--><p class="indent" >   <span 
 class="cmr-12">Finally, we convert the input matrix, the descriptor and the vectors to use a</span>
 <span 
 class="cmr-12">GPU-enabled internal storage format. We then preallocate the preconditioner</span>
@ -842,7 +856,7 @@ class="cmr-12">GPU environment</span>
                                                                               

                                                                               
-<!--l. 527--><p class="indent" >   <a 
+<!--l. 535--><p class="indent" >   <a 
 id="x7-15003r7"></a><hr class="float"><div class="float" 
 >
                                                                               
@ -850,7 +864,7 @@ class="cmr-12">GPU environment</span>
                                                                               
 <div class="center" 
 >
-<!--l. 557--><p class="noindent" >
+<!--l. 565--><p class="noindent" >
 <div class="minipage"><pre class="verbatim" id="verbatim-13">
 &#x00A0;&#x00A0;call&#x00A0;desc_a%cnv(mold=igmold)
 &#x00A0;&#x00A0;call&#x00A0;a%cscnv(info,mold=agmold)
@ -877,7 +891,7 @@ class="cmr-12">GPU environment</span>

 &#x00A0;
 </pre>
-<!--l. 584--><p class="nopar" >                                                                       </div></div>
+<!--l. 592--><p class="nopar" >                                                                       </div></div>
 <br /> <div class="caption" 
 ><span class="id">Listing 7: </span><span  
 class="content">setup of a GPU-enabled test program part three.</span></div><!--tex4ht:label?: x7-15003r7 -->
@ -885,7 +899,7 @@ class="content">setup of a GPU-enabled test program part three.</span></div><!--

                                                                               
   </div><hr class="endfloat" />
-<!--l. 592--><p class="indent" >   <span 
+<!--l. 600--><p class="indent" >   <span 
 class="cmr-12">It is very important to employ smoothers and coarsest solvers that are suited to the</span>
 <span 
 class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that</span>
@ -893,30 +907,30 @@ class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kern
 class="cmr-12">satisfy this constraint include:</span>
     <ul class="itemize1">
     <li class="itemize">
-     <!--l. 596--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
+     <!--l. 604--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">JACOBI</span></span></span>
     </li>
     <li class="itemize">
-     <!--l. 597--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
+     <!--l. 605--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">BJAC</span></span></span> <span 
 class="cmr-12">with the following methods on the local blocks:</span>
          <ul class="itemize2">
          <li class="itemize">
-          <!--l. 599--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
+          <!--l. 607--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">INVK</span></span></span>
          </li>
          <li class="itemize">
-          <!--l. 600--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
+          <!--l. 608--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">INVT</span></span></span>
          </li>
          <li class="itemize">
-          <!--l. 601--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
+          <!--l. 609--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">AINV</span></span></span></li></ul>
     </li>
     <li class="itemize">
-     <!--l. 603--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
+     <!--l. 611--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span 
 class="cmtt-12">POLY</span></span></span></li></ul>
-<!--l. 605--><p class="noindent" ><span 
+<!--l. 613--><p class="noindent" ><span 
 class="cmr-12">and their </span><span 
 class="cmmi-12">&#x2113;</span><sub><span 
 class="cmr-8">1</span></sub> <span 
--- a/docs/html/userhtmlse5.html
+++ b/docs/html/userhtmlse5.html
--- a/docs/html/userhtmlse7.html
+++ b/docs/html/userhtmlse7.html
@ -64,6 +64,10 @@ class="cmr-12">.</span>
                                                                               

                                                                               
+<!--l. 148--><p class="indent" >
+                                                                               
+
+                                                                               
                                                                               

                                                                               
--- a/docs/html/userhtmlse8.html
+++ b/docs/html/userhtmlse8.html
@ -38,46 +38,47 @@ class="cmr-12">AMG4PSBLAS is freely distributable under the following copyright
                                                                               
   <pre class="verbatim" id="verbatim-15">

-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;AMG4PSBLAS&#x00A0;&#x00A0;version&#x00A0;1.0
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Algebraic&#x00A0;MultiGrid&#x00A0;Preconditioners&#x00A0;Package
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;based&#x00A0;on&#x00A0;PSBLAS&#x00A0;(Parallel&#x00A0;Sparse&#x00A0;BLAS&#x00A0;version&#x00A0;3.7)

-&#x00A0;&#x00A0;(C)&#x00A0;Copyright&#x00A0;2021
-
-&#x00A0;&#x00A0;Pasqua&#x00A0;D&#8217;Ambra&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;IAC-CNR,&#x00A0;IT
-&#x00A0;&#x00A0;Fabio&#x00A0;Durastante&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;University&#x00A0;of&#x00A0;Pisa&#x00A0;and&#x00A0;IAC-CNR,&#x00A0;IT
-&#x00A0;&#x00A0;Salvatore&#x00A0;Filippone&#x00A0;&#x00A0;&#x00A0;&#x00A0;University&#x00A0;of&#x00A0;Rome&#x00A0;Tor-Vergata&#x00A0;and&#x00A0;IAC-CNR,&#x00A0;IT
-
-&#x00A0;&#x00A0;Redistribution&#x00A0;and&#x00A0;use&#x00A0;in&#x00A0;source&#x00A0;and&#x00A0;binary&#x00A0;forms,&#x00A0;with&#x00A0;or&#x00A0;without
-&#x00A0;&#x00A0;modification,&#x00A0;are&#x00A0;permitted&#x00A0;provided&#x00A0;that&#x00A0;the&#x00A0;following&#x00A0;conditions
-&#x00A0;&#x00A0;are&#x00A0;met:
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;1.&#x00A0;Redistributions&#x00A0;of&#x00A0;source&#x00A0;code&#x00A0;must&#x00A0;retain&#x00A0;the&#x00A0;above&#x00A0;copyright
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer.
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;2.&#x00A0;Redistributions&#x00A0;in&#x00A0;binary&#x00A0;form&#x00A0;must&#x00A0;reproduce&#x00A0;the&#x00A0;above&#x00A0;copyright
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions,&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer&#x00A0;in&#x00A0;the
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;documentation&#x00A0;and/or&#x00A0;other&#x00A0;materials&#x00A0;provided&#x00A0;with&#x00A0;the&#x00A0;distribution.
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;3.&#x00A0;The&#x00A0;name&#x00A0;of&#x00A0;the&#x00A0;MLD2P4&#x00A0;group&#x00A0;or&#x00A0;the&#x00A0;names&#x00A0;of&#x00A0;its&#x00A0;contributors&#x00A0;may
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;not&#x00A0;be&#x00A0;used&#x00A0;to&#x00A0;endorse&#x00A0;or&#x00A0;promote&#x00A0;products&#x00A0;derived&#x00A0;from&#x00A0;this
-&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;software&#x00A0;without&#x00A0;specific&#x00A0;written&#x00A0;permission.
-
-&#x00A0;&#x00A0;THIS&#x00A0;SOFTWARE&#x00A0;IS&#x00A0;PROVIDED&#x00A0;BY&#x00A0;THE&#x00A0;COPYRIGHT&#x00A0;HOLDERS&#x00A0;AND&#x00A0;CONTRIBUTORS
-&#x00A0;&#x00A0;&#8216;&#8216;AS&#x00A0;IS&#8217;&#8217;&#x00A0;AND&#x00A0;ANY&#x00A0;EXPRESS&#x00A0;OR&#x00A0;IMPLIED&#x00A0;WARRANTIES,&#x00A0;INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED
-&#x00A0;&#x00A0;TO,&#x00A0;THE&#x00A0;IMPLIED&#x00A0;WARRANTIES&#x00A0;OF&#x00A0;MERCHANTABILITY&#x00A0;AND&#x00A0;FITNESS&#x00A0;FOR&#x00A0;A&#x00A0;PARTICULAR
-&#x00A0;&#x00A0;PURPOSE&#x00A0;ARE&#x00A0;DISCLAIMED.&#x00A0;IN&#x00A0;NO&#x00A0;EVENT&#x00A0;SHALL&#x00A0;THE&#x00A0;MLD2P4&#x00A0;GROUP&#x00A0;OR&#x00A0;ITS&#x00A0;CONTRIBUTORS
-&#x00A0;&#x00A0;BE&#x00A0;LIABLE&#x00A0;FOR&#x00A0;ANY&#x00A0;DIRECT,&#x00A0;INDIRECT,&#x00A0;INCIDENTAL,&#x00A0;SPECIAL,&#x00A0;EXEMPLARY,&#x00A0;OR
-&#x00A0;&#x00A0;CONSEQUENTIAL&#x00A0;DAMAGES&#x00A0;(INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED&#x00A0;TO,&#x00A0;PROCUREMENT&#x00A0;OF
-&#x00A0;&#x00A0;SUBSTITUTE&#x00A0;GOODS&#x00A0;OR&#x00A0;SERVICES;&#x00A0;LOSS&#x00A0;OF&#x00A0;USE,&#x00A0;DATA,&#x00A0;OR&#x00A0;PROFITS;&#x00A0;OR&#x00A0;BUSINESS
-&#x00A0;&#x00A0;INTERRUPTION)&#x00A0;HOWEVER&#x00A0;CAUSED&#x00A0;AND&#x00A0;ON&#x00A0;ANY&#x00A0;THEORY&#x00A0;OF&#x00A0;LIABILITY,&#x00A0;WHETHER&#x00A0;IN
-&#x00A0;&#x00A0;CONTRACT,&#x00A0;STRICT&#x00A0;LIABILITY,&#x00A0;OR&#x00A0;TORT&#x00A0;(INCLUDING&#x00A0;NEGLIGENCE&#x00A0;OR&#x00A0;OTHERWISE)
-&#x00A0;&#x00A0;ARISING&#x00A0;IN&#x00A0;ANY&#x00A0;WAY&#x00A0;OUT&#x00A0;OF&#x00A0;THE&#x00A0;USE&#x00A0;OF&#x00A0;THIS&#x00A0;SOFTWARE,&#x00A0;EVEN&#x00A0;IF&#x00A0;ADVISED&#x00A0;OF&#x00A0;THE
-&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;AMG4PSBLAS&#x00A0;version&#x00A0;1.2
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;Algebraic&#x00A0;Multigrid&#x00A0;Package
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;based&#x00A0;on&#x00A0;PSBLAS&#x00A0;(Parallel&#x00A0;Sparse&#x00A0;BLAS&#x00A0;version&#x00A0;3.9)
+
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;(C)&#x00A0;Copyright&#x00A0;2025
+
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Salvatore&#x00A0;Filippone
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Pasqua&#x00A0;D&#8217;Ambra
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Fabio&#x00A0;Durastante
+
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;Redistribution&#x00A0;and&#x00A0;use&#x00A0;in&#x00A0;source&#x00A0;and&#x00A0;binary&#x00A0;forms,&#x00A0;with&#x00A0;or&#x00A0;without
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;modification,&#x00A0;are&#x00A0;permitted&#x00A0;provided&#x00A0;that&#x00A0;the&#x00A0;following&#x00A0;conditions
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;are&#x00A0;met:
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;1.&#x00A0;Redistributions&#x00A0;of&#x00A0;source&#x00A0;code&#x00A0;must&#x00A0;retain&#x00A0;the&#x00A0;above&#x00A0;copyright
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer.
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;2.&#x00A0;Redistributions&#x00A0;in&#x00A0;binary&#x00A0;form&#x00A0;must&#x00A0;reproduce&#x00A0;the&#x00A0;above&#x00A0;copyright
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions,&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer&#x00A0;in&#x00A0;the
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;documentation&#x00A0;and/or&#x00A0;other&#x00A0;materials&#x00A0;provided&#x00A0;with&#x00A0;the&#x00A0;distribution.
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;3.&#x00A0;The&#x00A0;name&#x00A0;of&#x00A0;the&#x00A0;AMG4PSBLAS&#x00A0;group&#x00A0;or&#x00A0;the&#x00A0;names&#x00A0;of&#x00A0;its&#x00A0;contributors&#x00A0;may
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;not&#x00A0;be&#x00A0;used&#x00A0;to&#x00A0;endorse&#x00A0;or&#x00A0;promote&#x00A0;products&#x00A0;derived&#x00A0;from&#x00A0;this
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;software&#x00A0;without&#x00A0;specific&#x00A0;written&#x00A0;permission.
+
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;THIS&#x00A0;SOFTWARE&#x00A0;IS&#x00A0;PROVIDED&#x00A0;BY&#x00A0;THE&#x00A0;COPYRIGHT&#x00A0;HOLDERS&#x00A0;AND&#x00A0;CONTRIBUTORS
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#8216;&#8216;AS&#x00A0;IS&#8217;&#8217;&#x00A0;AND&#x00A0;ANY&#x00A0;EXPRESS&#x00A0;OR&#x00A0;IMPLIED&#x00A0;WARRANTIES,&#x00A0;INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;TO,&#x00A0;THE&#x00A0;IMPLIED&#x00A0;WARRANTIES&#x00A0;OF&#x00A0;MERCHANTABILITY&#x00A0;AND&#x00A0;FITNESS&#x00A0;FOR&#x00A0;A&#x00A0;PARTICULAR
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;PURPOSE&#x00A0;ARE&#x00A0;DISCLAIMED.&#x00A0;IN&#x00A0;NO&#x00A0;EVENT&#x00A0;SHALL&#x00A0;THE&#x00A0;AMG4PSBLAS&#x00A0;GROUP&#x00A0;OR&#x00A0;ITS&#x00A0;CONTRIBUTORS
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;BE&#x00A0;LIABLE&#x00A0;FOR&#x00A0;ANY&#x00A0;DIRECT,&#x00A0;INDIRECT,&#x00A0;INCIDENTAL,&#x00A0;SPECIAL,&#x00A0;EXEMPLARY,&#x00A0;OR
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;CONSEQUENTIAL&#x00A0;DAMAGES&#x00A0;(INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED&#x00A0;TO,&#x00A0;PROCUREMENT&#x00A0;OF
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;SUBSTITUTE&#x00A0;GOODS&#x00A0;OR&#x00A0;SERVICES;&#x00A0;LOSS&#x00A0;OF&#x00A0;USE,&#x00A0;DATA,&#x00A0;OR&#x00A0;PROFITS;&#x00A0;OR&#x00A0;BUSINESS
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;INTERRUPTION)&#x00A0;HOWEVER&#x00A0;CAUSED&#x00A0;AND&#x00A0;ON&#x00A0;ANY&#x00A0;THEORY&#x00A0;OF&#x00A0;LIABILITY,&#x00A0;WHETHER&#x00A0;IN
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;CONTRACT,&#x00A0;STRICT&#x00A0;LIABILITY,&#x00A0;OR&#x00A0;TORT&#x00A0;(INCLUDING&#x00A0;NEGLIGENCE&#x00A0;OR&#x00A0;OTHERWISE)
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;ARISING&#x00A0;IN&#x00A0;ANY&#x00A0;WAY&#x00A0;OUT&#x00A0;OF&#x00A0;THE&#x00A0;USE&#x00A0;OF&#x00A0;THIS&#x00A0;SOFTWARE,&#x00A0;EVEN&#x00A0;IF&#x00A0;ADVISED&#x00A0;OF&#x00A0;THE
+&#x00A0;&#x00A0;&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.

 </pre>
-<!--l. 44--><p class="nopar" >
+<!--l. 45--><p class="nopar" >
                                                                               

                                                                               
-<!--l. 47--><p class="indent" >   <span 
+<!--l. 48--><p class="indent" >   <span 
 class="cmr-12">AMG4PSBLAS is an evolution of MLD2P4, whose license we reproduce here to</span>
 <span 
 class="cmr-12">abide by its terms:</span>
@ -123,7 +124,7 @@ class="cmr-12">abide by its terms:</span>
 &#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.

 </pre>
-<!--l. 87--><p class="nopar" > <span 
+<!--l. 88--><p class="nopar" > <span 
 class="cmr-12">AMG4PSBLAS is distributed together with (a small part of) the graph-matching</span>
                                                                               

@ -183,7 +184,7 @@ class="cmr-12">here.</span>
 //
 //&#x00A0;************************************************************************
 </pre>
-<!--l. 135--><p class="nopar" >
+<!--l. 136--><p class="nopar" >
                                                                               

                                                                               
--- a/docs/src/abstract.tex
+++ b/docs/src/abstract.tex
@ -4,30 +4,42 @@
 \fi

 \textsc{AMG4PSBLAS (Algebraic MultiGrid Preconditioners Package
-based on PSBLAS}) is a package of parallel algebraic multilevel preconditioners included in the PSCToolkit (Parallel Sparse Computation Toolkit) software framework.
-It is a progress of a software development project started in 2007, named MLD2P4, which originally implemented a 
-multilevel version of some domain decomposition preconditioners of additive-Schwarz type, and was based on a parallel decoupled version of the well known smoothed
+based on PSBLAS}) is a package of parallel algebraic multilevel
+preconditioners included in the PSCToolkit (Parallel Sparse
+Computation Toolkit) software framework. 
+It is an evolution of a software development project started in 2007,
+named MLD2P4, which originally implemented a  
+multilevel version of some domain decomposition preconditioners of
+additive-Schwarz type, and was based on a parallel decoupled version
+of the well known smoothed 
 aggregation method to generate the multilevel hierarchy of coarser matrices. 
-In the last years, within the context of the EU-H2020 EoCoE project (Energy Oriented Center of Excellence), the package was extended for including new algorithms and 
-functionalities for the  setup and application new AMG preconditioners with the final aims of improving efficiency and scalability when tens of thousands cores are
-used, and of boosting reliability in dealing with general symmetric positive definite linear systems. 
-Due to the significant number of changes and the increase in scope, we decided to rename the package as AMG4PSBLAS.
+In the last few years the package was extended to
+include new algorithms and
+functionality for the setup and application of new AMG preconditioners,
+with the final aims of improving efficiency and scalability when tens
+of thousands of cores are used, and of boosting reliability in dealing
+with general symmetric positive definite linear systems; these
+developments have been supported in the context of the EU-H2020 EoCoE
+project (Energy Oriented Center of Excellence).  
+Due to the significant number of changes and the increase in scope, we
+decided to rename the package as AMG4PSBLAS. 

-AMG4PSBLAS has been designed to provide scalable and easy-to-use preconditioners
-in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms)
-computational framework and can be used in conjuction with the Krylov solvers
-available in this framework.
+AMG4PSBLAS has been designed to provide scalable and easy-to-use
+preconditioners in the context of the PSBLAS (Parallel Sparse Basic
+Linear Algebra Subprograms) computational framework and can be used in
+conjunction with the Krylov solvers available in this framework.
 Our package is based on a completely algebraic approach; therefore
 user-level interfaces assume that the system matrix and
 preconditioners are represented as PSBLAS distributed sparse matrices.
 AMG4PSBLAS enables the user to easily specify different
-features of an algebraic multilevel preconditioner, thus allowing to experiment
-with different preconditioners for the problem and parallel computers at hand.
+features of an algebraic multilevel preconditioner, thus allowing users to
+experiment with different preconditioners for the problem and parallel
+computers at hand.

 The package employs object-oriented design techniques in
-Fortran~2003, with interfaces to additional third party libraries
+Fortran~2003, with interfaces to additional third party libraries 
 such as MUMPS, UMFPACK, SuperLU, and SuperLU\_Dist, which
-can be exploited in building multilevel preconditioners. The parallel
+can be exploited in building multilevel preconditioners. The parallel 
 implementation is based on a Single Program Multiple Data (SPMD)
 paradigm; the inter-process communication is based on MPI and
 is managed mainly through PSBLAS.
--- a/docs/src/building.tex
+++ b/docs/src/building.tex
@ -13,19 +13,20 @@ must support the Fortran~2003 standard plus the extension \verb|MOLD=|
 feature, which enhances the usability of \verb|ALLOCATE|.
 Most Fortran compilers provide  this feature; in particular, this is
 supported by the GNU Fortran compiler, for which we
-recommend to use at least version 4.8.
+recommend using at least version 12.
 The software defines data types and interfaces for
 real and complex data, in both single and double precision.

 Building AMG4PSBLAS requires some base libraries (see
-Section~\ref{sec:prerequisites}); interfaces to optional third-party
+Section~\ref{sec:prerequisites}); interfaces to optional third-party 
 libraries, which extend the functionalities of AMG4PSBLAS (see
-Section~\ref{sec:third-party}), are also available.  A number of Linux
-distributions (e.g., Ubuntu, Fedora, CentOS) provide precompiled
+Section~\ref{sec:third-party}), are also available.  A number of Linux 
+distributions (e.g., Ubuntu, Fedora, CentOS) provide precompiled 
 packages for the prerequisite and optional software. In many cases
 these packages are split between a runtime part and a ``developer''
 part; in order to build AMG4PSBLAS you need both. A description of the
-base and optional software used by AMG4PSBLAS is given in the next sections.
+base and optional software used by AMG4PSBLAS is given in the next
+sections. 

 \subsection{Prerequisites\label{sec:prerequisites}}

@ -157,17 +158,17 @@ The full set of options may be looked at by issuing the command
 \else
 \lstinputlisting{../configureout.txt}
 \fi
-For instance, if a user has built and installed PSBLAS 3.7 under the
+For instance, if a user has built and installed PSBLAS 3.9 under the
 \verb|/opt| directory and is
 using the SuiteSparse package (which includes UMFPACK), then AMG4PSBLAS
 might be configured with:
 \ifpdf
 \begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{console}
-./configure --with-psblas=/opt/psblas-3.7/ --with-umfpackincdir=/usr/include/suitesparse/
+./configure --with-psblas=/opt/psblas-3.9/ --with-umfpackincdir=/usr/include/suitesparse/
 \end{minted}
 \else
 \begin{verbatim}
-./configure --with-psblas=/opt/psblas-3.7/ \
+./configure --with-psblas=/opt/psblas-3.9/ \
 --with-umfpackincdir=/usr/include/suitesparse/
 \end{verbatim}
 \fi
--- a/docs/src/gettingstarted.tex
+++ b/docs/src/gettingstarted.tex
@ -94,7 +94,6 @@ Multilevel        &\fortinline|'ML'|    & V-cycle with one hybrid forward Gauss-
 \label{tab:precinit}}
 \end{center}
 \end{table}
-
 Note that the module \fortinline|amg_prec_mod|, containing the definition of the
 preconditioner data type and the interfaces to the routines of AMG4PSBLAS,
 must be used in any program calling such routines.
@ -110,6 +109,12 @@ a standard discretization of basic scalar elliptic PDE problems. However,
 this does not necessarily correspond to the shortest execution time
 on parallel~computers.

+\textbf{Remark 2.} Memory allocation on GPUs is a costly operation
+that implies a synchronization; it is therefore advisable to preallocate
+the internal preconditioner workspace with the method
+\verb|prec%allocate_wrk(info)| before invoking an iterative method,
+and to release it upon exit with \verb|prec%deallocate_wrk(info)|.
+

 \subsection{Examples\label{sec:examples}}

@ -140,7 +145,6 @@ for the real single precision and the complex, single and double
 precision, versions are obtained with straightforward modifications of the previous
 example (see Section~\ref{sec:userinterface} for details). If these versions are installed,
 the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
-
 \begin{listing}[tbp]
 \begin{center}
 \begin{minipage}{.90\textwidth}
@ -260,7 +264,6 @@ stop
 \label{fig:ex1}}
 \end{center}
 \end{listing}
-
 Different versions of the multilevel preconditioner can be obtained by changing
 the default values of the preconditioner parameters. The code reported in
 Figure~\ref{fig:ex2} shows how to set a V-cycle preconditioner
@ -272,10 +275,15 @@ with block-Jacobi and set by~\fortinline|P%init|.
 Furthermore, specifying block-Jacobi as coarsest-level
 solver implies that the coarsest-level matrix is distributed
 among the processes.
-Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using the Coarsening based on Compatible Weighted Matching, aggregates of size at most $8$ and smoothed prolongators. It applies
+Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using
+the coarsening based on Compatible Weighted Matching, aggregates of
+size at most $8$, and smoothed prolongators. It applies
 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
-and solves the coarsest-level system with the parallel flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi preconditioner having ILU(0) on the blocks. Default parameters are used for stopping criterion of the coarsest solver.
-Note that, also in this case, specifying KRM as coarsest-level
+and solves the coarsest-level system with the parallel flexible
+Conjugate Gradient method (KRM) coupled with the block-Jacobi
+preconditioner having ILU(0) on the blocks, with  default parameters
+used for the coarsest solver. 
+Note that specifying KRM as coarsest-level
 solver implies that the coarsest-level matrix is distributed
 among the processes.
 %It is specified that the coarsest-level
--- a/docs/src/license.tex
+++ b/docs/src/license.tex
@ -6,40 +6,41 @@
 AMG4PSBLAS is freely distributable under the following copyright
 terms: {\small
 \begin{verbatim}
-
-                           AMG4PSBLAS  version 1.0
-              Algebraic MultiGrid Preconditioners Package
-             based on PSBLAS (Parallel Sparse BLAS version 3.7)
-
-  (C) Copyright 2021
-
-  Pasqua D'Ambra         IAC-CNR, IT
-  Fabio Durastante       University of Pisa and IAC-CNR, IT
-  Salvatore Filippone    University of Rome Tor-Vergata and IAC-CNR, IT
-
-  Redistribution and use in source and binary forms, with or without
-  modification, are permitted provided that the following conditions
-  are met:
-    1. Redistributions of source code must retain the above copyright
-       notice, this list of conditions and the following disclaimer.
-    2. Redistributions in binary form must reproduce the above copyright
-       notice, this list of conditions, and the following disclaimer in the
-       documentation and/or other materials provided with the distribution.
-    3. The name of the MLD2P4 group or the names of its contributors may
-       not be used to endorse or promote products derived from this
-       software without specific written permission.
-
-  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
-  TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-  PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS
-  BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-  POSSIBILITY OF SUCH DAMAGE.
+  
+   
+                             AMG4PSBLAS version 1.2
+    Algebraic Multigrid Package
+               based on PSBLAS (Parallel Sparse BLAS version 3.9)
+    
+    (C) Copyright 2025
+  
+        Salvatore Filippone  
+        Pasqua D'Ambra   
+        Fabio Durastante        
+   
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+      1. Redistributions of source code must retain the above copyright
+         notice, this list of conditions and the following disclaimer.
+      2. Redistributions in binary form must reproduce the above copyright
+         notice, this list of conditions, and the following disclaimer in the
+         documentation and/or other materials provided with the distribution.
+      3. The name of the AMG4PSBLAS group or the names of its contributors may
+         not be used to endorse or promote products derived from this
+         software without specific written permission.
+   
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+    TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+    PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS
+    BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+    CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+    SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+    INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+    CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+    ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+    POSSIBILITY OF SUCH DAMAGE.

 \end{verbatim}
 }
--- a/docs/src/overview.tex
+++ b/docs/src/overview.tex
@ -3,7 +3,9 @@
         {\textsc{\ref{sec:overview} General Overview}}

 The \textsc{Algebraic MultiGrid Preconditioners Package based on
-PSBLAS} (\textsc{AMG\-4\-PSBLAS}) provides parallel Algebraic MultiGrid (AMG) preconditioners (see, e.g., \cite{Briggs2000,Stuben_01}),
+PSBLAS} (\textsc{AMG\-4\-PSBLAS}) provides parallel Algebraic
+MultiGrid (AMG) preconditioners (see, e.g.,
+\cite{Briggs2000,Stuben_01}), 
 to be used in the iterative solution of  linear systems,
 \begin{equation}
 Ax=b,
@ -18,12 +20,14 @@ where $A$ is a square, real or complex, sparse symmetric positive definite (s.p.

 The preconditioners implemented in AMG4PSBLAS are obtained by combining
 3 different types of AMG cycles with smoothers and coarsest-level
-solvers. Available multigrid cycles include the V-, W-, and a version of a Krylov-type cycle
+solvers. We provide a number of  multigrid cycles, including the V-,
+W-, and a version of a Krylov-type cycle 
 (K-cycle)~\cite{Briggs2000,Notay2008}; they can  be
 combined with Jacobi, hybrid 
 %\footnote{see Note 2 in Table~\ref{tab:p_coarse}, p.~28.}
 forward/backward Gauss-Seidel, block-Jacobi and additive Schwarz
-smoothers with various versions of local incomplete factorizations and approximate inverses 
+smoothers with various versions of local incomplete factorizations and
+approximate inverses  
 on the blocks. The Jacobi, block-Jacobi and
 Gauss-Seidel smoothers are also available in the $\ell_1$ version~\cite{DDF2020}.

@ -41,7 +45,8 @@ two different coarsening strategies, based on aggregation, are available:
  and described in detail in~\cite{DDF2020};  
 \end{itemize}
 Either exact or approximate solvers can be used on the coarsest-level
-system. We provide interfaces to various parallel and sequential sparse LU factorizations from external 
+system. We provide interfaces to various parallel and sequential
+sparse LU factorizations from external  
 packages, sequential native incomplete LU and approximate inverse factorizations,
 parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solvers and
 calls to preconditioned Krylov methods;  all 
@ -74,36 +79,41 @@ important  revisions and extentions of the PSBLAS infrastructure.
 The inter-process communication required by AMG4PSBLAS is encapsulated
 in the PSBLAS routines;
 therefore, AMG4PSBLAS can be run on any parallel machine where PSBLAS 
-implementations are available. In the most recent version of PSBLAS
-(release 3.7), a plug-in for GPU is included; it includes CUDA
+implementations are available.  The most recent version of PSBLAS 
+(release 3.9) includes  a plug-in for GPU; it contains CUDA
 versions of main vector operations and of sparse matrix-vector
 multiplication, so that Krylov methods coupled with AMG4PSBLAS
-preconditioners    relying on Jacobi and block-Jacobi smoothers with
-sparse approximate inverses on the blocks can be efficiently executed
+preconditioners    relying on Jacobi and block-Jacobi smoothers with 
+sparse approximate inverses on the blocks can be efficiently executed 
 on clusters of GPUs.

-AMG4PSBLAS has a layered and modular software architecture where three main layers can be
-identified.  The lower layer consists of the PSBLAS kernels, the middle one implements
-the construction and application phases of the preconditioners, and the upper one
-provides a uniform interface to all the preconditioners.
-This architecture allows for different levels of use of the package:
-few black-box routines at the upper layer allow all users to easily
-build and apply any preconditioner available in AMG4PSBLAS;
-facilities are also available allowing expert users to extend the set of smoothers
-and solvers for building new versions of the preconditioners (see
-Section~\ref{sec:adding}).
+AMG4PSBLAS has a layered and modular software architecture where three
+main layers can be identified.  The lower layer consists of the PSBLAS
+kernels, the middle one implements the construction and application
+phases of the preconditioners, and the upper one provides a uniform
+interface to all the preconditioners. This architecture allows for
+different levels of use of the package: a few black-box routines at the
+upper layer allow all users to easily build and apply any
+preconditioner available in AMG4PSBLAS; facilities are also available
+allowing expert users to extend the set of smoothers and solvers for
+building new versions of the preconditioners (see
+Section~\ref{sec:adding}). 

-This guide is organized as follows. General information on the distribution of the source
-code is reported in Section~\ref{sec:distribution}, while details on the configuration
-and installation of the package are given in Section~\ref{sec:building}. The basics for building and applying the
-preconditioners with the Krylov solvers implemented in PSBLAS are reported
-in~Section~\ref{sec:started}, where the Fortran codes of a few sample programs
-are also shown. A reference guide for the user interface routines is provided
-in Section~\ref{sec:userinterface}. Information on the extension of the package
-through the addition of new smoothers and solvers is reported in Section~\ref{sec:adding}.
-The error handling mechanism used by the package
-is briefly described in Section~\ref{sec:errors}. The copyright terms concerning the
-distribution and modification of AMG4PSBLAS are reported in Appendix~\ref{sec:license}.
+This guide is organized as follows. General information on the
+distribution of the source code is reported in
+Section~\ref{sec:distribution}, while details on the configuration and
+installation of the package are given in
+Section~\ref{sec:building}. The basics for building and applying the
+preconditioners with the Krylov solvers implemented in PSBLAS are
+reported in~Section~\ref{sec:started}, where the Fortran codes of a
+few sample programs are also shown. A reference guide for the user
+interface routines is provided in
+Section~\ref{sec:userinterface}. Information on the extension of the
+package through the addition of new smoothers and solvers is reported
+in Section~\ref{sec:adding}. The error handling mechanism used by the
+package is briefly described in Section~\ref{sec:errors}. The
+copyright terms concerning the distribution and modification of
+AMG4PSBLAS are reported in Appendix~\ref{sec:license}. 

 %%% Local Variables:
 %%% mode: latex
--- a/docs/src/userguide.tex
+++ b/docs/src/userguide.tex
@ -154,7 +154,7 @@ Preconditioners Package based on PSBLAS}
 \flushright
 \large Software version: 1.2\\
 %\todaym
-\large December 31st, 2025
+\large December 23rd, 2025
 \end{minipage}}
 %\addtolength{\textwidth}{\centeroffset}
 \vspace{\stretch{2}}
--- a/docs/src/userhtml.tex
+++ b/docs/src/userhtml.tex
@ -114,7 +114,7 @@
 %\today
 Software version: 1.2\\
 %\today
- December 31st, 2025
+ December 23rd, 2025
 \clearpage
 \ \\
 \thispagestyle{empty}
--- a/docs/src/userinterface.tex
+++ b/docs/src/userinterface.tex
@ -160,7 +160,7 @@ the smoothers. However, for simplicity, shortcuts are
 provided to set all versions of point-Jacobi, hybrid (forward) Gauss-Seidel, and
 hybrid backward Gauss-Seidel, i.e., the previous smoothers can be defined
 just by setting \fortinline|'SMOOTHER_TYPE'| to certain specific
-values (see Tables~\ref{tab:p_smoother}), without the need to set
+values (see Table~\ref{tab:p_smoother}), without the need to set
 \fortinline|'SUB_SOLVE'| as well.

 The smoother and solver objects are arranged in a
@ -182,47 +182,50 @@ the polynomial used. Consequently, the \fortinline|'SMOOTHER_SWEEPS'| option is
 the \fortinline|'POLY_DEGREE'| option. This smoother is paired with a base smoother 
 object, whose iterations are accelerated using the specified polynomial smoothing technique. 
 By default, the $\ell_1$-Jacobi smoother serves as the base smoother, offering theoretical 
-guarantees on the resulting convergence factor~\cite{DDFMT2024,LOTTES}. Alternative combinations 
-are experimental and lack established guarantees.\\
+guarantees on the resulting convergence
+factor~\cite{DDFMT2024,LOTTES}. Alternative combinations are
+experimental.\\
+% and lack established guarantees.\\

 \textbf{Remark 4.} Many of the coarsest-level solvers apply to a
-specific coarsest-matrix layout; 
-therefore, setting the solver after the layout may change the layout
-to either distributed or replicated.
-Similarly, setting the layout after the solver may change the solver.
-
-More precisely, UMFPACK and SuperLU require the coarsest-level
-matrix to be replicated, while SuperLU\_Dist and KRM require it to be distributed.
-In these cases, setting the coarsest-level solver implies that
-the layout is redefined according to the solver, ovverriding any
+specific coarsest-matrix layout;  therefore, setting the solver after
+the layout may change the layout to either distributed or replicated,
+and similarly, setting the layout after the solver may change the
+solver. More specifically, UMFPACK and SuperLU require the coarsest-level
+matrix to be replicated, while SuperLU\_Dist and KRM require it to be
+distributed; therefore, setting the coarsest-level solver implies
+that the layout is redefined according to the solver, overriding any
 previous settings. MUMPS,  point-Jacobi,
 hybrid Gauss-Seidel and block-Jacobi can be applied to
 replicated and distributed matrices, thus their choice
 does not modify any previously specified layout.
 It is worth noting that, when the matrix is replicated,
-the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers and their $\ell_1-$ versions
-reduce to the corresponding local solver objects (see Remark~2).
-For the point-Jacobi and Gauss-Seidel solvers, these objects
-correspond to a \emph{single} point-Jacobi sweep and a \emph{single}
-Gauss-Seidel sweep, respectively, which are very poor solvers.
-
-On the other hand, the distributed layout can be used with any solver
-but UMFPACK and SuperLU; therefore, if any of these two solvers has already
-been selected, the coarsest-level solver is changed to block-Jacobi,
-with the previously chosen solver applied to the local blocks.
-Likewise, the replicated layout can be used with any solver but SuperLu\_Dist and KRM;
-therefore, if SuperLu\_Dist or KRM have been previously set, the coarsest-level
-solver is changed to the default sequential solver. 
-
-In a parallel setting with many cores, we suggest to the users to change the default
-coarsest solver for using the KRM choice, i.e. a parallel distributed iterative solution of the
-coarsest system based on Krylov methods.
-
-\textbf{Remark 4.}  The argument \fortinline|idx| can be used to allow finer
-control for those solvers; for instance, by specifying the keyword
-\fortinline|'MUMPS_IPAR_ENTRY'| and an appropriate value for \fortinline|idx|, it is
-possible to set any entry in the MUMPS integer control array.
-See also Sec.~\ref{sec:adding}.
+the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers and 
+their $\ell_1$ versions reduce to the corresponding local solver
+objects (see Remark~2). For the point-Jacobi and Gauss-Seidel solvers,
+these objects correspond to a \emph{single} point-Jacobi sweep and a
+\emph{single} Gauss-Seidel sweep, respectively, which are very poor
+solvers. 
+
+On the other hand, the distributed layout can be used with any solver 
+except UMFPACK and SuperLU; therefore, if either of these two solvers has
+already been selected, the coarsest-level solver is changed to
+block-Jacobi, with the previously chosen solver applied to the local
+blocks. Likewise, the replicated layout can be used with any solver
+but SuperLU\_Dist and KRM; therefore, if SuperLU\_Dist or KRM has
+been previously set, the coarsest-level solver is changed to the
+default sequential solver.  
+
+In a parallel setting with many cores, we suggest that users
+change the default coarsest solver to KRM, i.e., a
+parallel distributed iterative solution of the coarsest system based
+on Krylov methods.
+
+\textbf{Remark 5.}  The argument \fortinline|idx| can be used to allow
+finer control for those solvers; for instance, by specifying the
+keyword \fortinline|'MUMPS_IPAR_ENTRY'| and an appropriate value for
+\fortinline|idx|, it is possible to set any entry in the MUMPS integer
+control array. See also Sec.~\ref{sec:adding}.
 %The \verb|what,val| pairs described here are those of the predefined
 %moother/solver objects; newly developed solvers may define new pairs
 %according to their needs.
				`@ -64,6 +64,10 @@ class="cmr-12">.</span>`



				`<!--l. 148--><p class="indent" >`