diff --git a/docs/amg4psblas_1.2-guide.pdf b/docs/amg4psblas_1.2-guide.pdf
index 6cead832..faec7bbe 100644
Binary files a/docs/amg4psblas_1.2-guide.pdf and b/docs/amg4psblas_1.2-guide.pdf differ
diff --git a/docs/html/index.html b/docs/html/index.html
index 05e3ea7c..4ed148b0 100644
--- a/docs/html/index.html
+++ b/docs/html/index.html
@@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR
Software version: 1.2
December 31st, 2025
+class="cmr-12">December 23rd, 2025
diff --git a/docs/html/userhtml.html b/docs/html/userhtml.html
index 05e3ea7c..4ed148b0 100644
--- a/docs/html/userhtml.html
+++ b/docs/html/userhtml.html
@@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR
Software version: 1.2
December 31st, 2025
+class="cmr-12">December 23rd, 2025
diff --git a/docs/html/userhtmlli1.html b/docs/html/userhtmlli1.html
index 328c05e6..43007b4d 100644
--- a/docs/html/userhtmlli1.html
+++ b/docs/html/userhtmlli1.html
@@ -72,32 +72,34 @@ class="small-caps">n
class="cmcsc-10x-x-120">PSBLAS) is a package of parallel algebraic multilevel preconditioners included in the
PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It is a progress
+class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It
of a software development project started in 2007, named MLD2P4, which originally
+class="cmr-12">is an evolution of a software development project started in 2007, named
implemented a multilevel version of some domain decomposition preconditioners of
+class="cmr-12">MLD2P4, which originally implemented a multilevel version of some domain
additive-Schwarz type, and was based on a parallel decoupled version of the well known
+class="cmr-12">decomposition preconditioners of additive-Schwarz type, and was based on a parallel
smoothed aggregation method to generate the multilevel hierarchy of coarser
+class="cmr-12">decoupled version of the well known smoothed aggregation method to generate the
matrices. In the last years, within the context of the EU-H2020 EoCoE project
+class="cmr-12">multilevel hierarchy of coarser matrices. In the last few years the package
(Energy Oriented Center of Excellence), the package was extended for including
+class="cmr-12">was extended to include new algorithms and functionalities for the setup
new algorithms and functionalities for the setup and application new AMG
+class="cmr-12">and application of new AMG preconditioners with the final aims of improving
preconditioners with the final aims of improving efficiency and scalability when tens of
+class="cmr-12">efficiency and scalability when tens of thousands of cores are used, and of boosting
thousands cores are used, and of boosting reliability in dealing with general
+class="cmr-12">reliability in dealing with general symmetric positive definite linear systems; these
symmetric positive definite linear systems. Due to the significant number
+class="cmr-12">developments have been supported in the context of the EU-H2020 EoCoE
+project (Energy Oriented Center of Excellence). Due to the significant number
of changes and the increase in scope, we decided to rename the package as
AMG4PSBLAS.
-
AMG4PSBLAS has been designed to provide scalable and easy-to-use
preconditioners in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra
@@ -111,14 +113,14 @@ class="cmr-12">algebraic approach; therefore users level interfaces assume that
class="cmr-12">and preconditioners are represented as PSBLAS distributed sparse matrices.
The package employs object-oriented design techniques in Fortran 2003, with
parallel implementation is based on a Single Program Multiple Dat
class="cmr-12">paradigm; the inter-process communication is based on MPI and is managed mainly
through PSBLAS.
- This guide provides a brief description of the functionalities and the user interface
of AMG4PSBLAS.
diff --git a/docs/html/userhtmlse1.html b/docs/html/userhtmlse1.html
index ae661559..f42c44aa 100644
--- a/docs/html/userhtmlse1.html
+++ b/docs/html/userhtmlse1.html
@@ -100,18 +100,18 @@ src="userhtml0x.png" alt="Ax = b,
id="x4-3001r1">
where A is a square, real or complex, sparse symmetric positive definite (s.p.d)
matrix.
- The preconditioners implemented in AMG4PSBLAS are obtained by combining 3
different types of AMG cycles with smoothers and coarsest-level solvers. Available
+class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. We provide a An algebraic approach is used to generate a hierarchy of coarse-level matrices and
operators, without explicitly using any information on the geometry of the original
@@ -150,7 +150,7 @@ class="cmr-12">problem, e.g., the discretization of a PDE. To this end, two diff
class="cmr-12">strategies, based on aggregation, are available: a decoupled version of the smoothed aggregation procedure proposed in [;
a coupled, parallel implementation of the Coarsening based on Compatible
Weighted Matching introduced in14]; Either exact or approximate solvers can be used on the coarsest-level system. We provide
interfaces to various parallel and sequential sparse LU factorizations from external
@@ -210,7 +210,7 @@ class="cmr-12">parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solve
class="cmr-12">preconditioned Krylov methods; all smoothers can be also exploited as one-level AMG4PSBLAS is written in Fortran 2003, following an object-oriented design
Single and double precision implementations of AMG4PSBLAS are ava
class="cmr-12">for both the real and the complex case, which can be used through a single
interface.
- AMG4PSBLAS has been designed to implement scalable and easy-to-use
+ AMG4PSBLAS has been designed to implement scalable and easy-to-use multilevel
multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)
+class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse BLAS) computational
-
AMG4PSBLAS has a layered and modular software architecture where three main
layers can be identified. The lower layer consists of the PSBLAS kernels, the middle
one implements the construction and application phases of the preconditioners, and the
+upper one provides a uniform interface to all the preconditioners. This architecture
-upper one provides a uniform interface to all the preconditioners. This architecture
allows for different levels of use of the package: few black-box routines at the upper
6).
- This guide is organized as follows. General information on the distribution of the
source code is reported in Section. Most Fortran compilers provide this feature; in particular, thi
supported by the GNU Fortran compiler, for which we recommend using at least
version 4.8. The software defines data types and interfaces for real and complex data,
+class="cmr-12">version 12. The software defines data types and interfaces for real and complex data, in
in both single and double precision.
+class="cmr-12">both single and double precision. Building AMG4PSBLAS requires some base libraries (see Section “developer” part; in order to build AMG4PSBLAS you ne
class="cmr-12">the base and optional software used by AMG4PSBLAS is given in the next
+
The following base libraries are needed:
+
BLAS [18for including -fPIC compilation option in the make.inc file of th
library.
+
MPI [25A version of MPI is available on most high-performance computing<
systems.
+
PSBLAS [21; version
class="cmr-12">3.9.0 (or later) is required. Indeed, all the prerequisites listed so far are also
prerequisites of PSBLAS. Please note that the four previous libraries must have Fortran interfaces compatible with
AMG4PSBLAS; usually this means that they should all be built with the same
compiler being used for AMG4PSBLAS.
- If you want to use the PSBLAS support for NVIDIA GPUs, you will also
need a working version of the CUDA Toolkit that is compatible with the
@@ -214,7 +214,7 @@ class="cmr-12">options: Previous versions required you to have the auxiliary libraries SPGPU and
PSBLAS-EXT compiled; this is no longer necessary because they have been integrated
@@ -226,24 +226,24 @@ class="cmr-12">
+
We provide interfaces to the following third-party software libraries; note that these are
optional, but if you enable them some defaults for multilevel preconditioners may
change to reflect their presence.
-
+
+
UMFPACK [16provide the right path to the BLAS and LAPACK libraries
class="cmtt-12">SuiteSparse_config/SuiteSparse_config.mk
+
MUMPS [2solution for single and double precision, real and complex data.
versions 4.10.0 and 5.0.1.
+
SuperLU [17data. We tested versions 4.3 and 5.0. If you installed BLAS from
remember to define the BLASLIB variable in the make.inc file.
+
SuperLU_Dist [28parallel graph partitioning and fill-reducing matrix ordering, av
href="glaros.dtc.umn.edu/gkhome/metis/parmetis/overview" class="url" >glaros.dtc.umn.edu/gkhome/metis/parmetis/overview.
+
In order to build AMG4PSBLAS, the first step is to use the configure script in the
main directory to generate the necessary makefile.
- As a minimal example consider the following:
@@ -360,7 +360,7 @@ class="cmr-12">As a minimal example consider the following: which assumes that the various MPI compilers and support libraries are available in
the standard directories on the system, and specifies only the PSBLAS install directory
@@ -374,7 +374,7 @@ class="cmtt-12">./configure For instance, if a user has built and installed PSBLAS 3.9 under the /opt directory and is
might be configured with:
Once the configure script has completed execution, it will have generated the file
Make.inc which will then be used by all Makefiles in the directory tree; t
class="cmr-12">copied in the install directory under the name Make.inc.AMG4PSBLAS.
- To use the MUMPS solver package, the user has to add the appropriate options to
the configure script; by default we are looking for the libraries --with-extra-libs configure
option.
- To build the library the user will now enter
@@ -3962,7 +3962,7 @@ class="cmr-12">To build the library the user will now enter followed (optionally) by
@@ -3970,12 +3970,12 @@ class="cmr-12">followed (optionally) by
+
If you find any bugs in our codes, please report them through our issues page
on To enable us to track the bug, please provide a log from the failing application, the
test conditions, and ideally a self-contained test program reproducing the
issue.
-
+
The package contains a samples directory, divided into two subdirectories.
Their purpose is as follows:
+
simple contains a set of simple example programs with a predefined choice of
preconditioners, selectable via integer values. These are intended to get
acquainted with the multilevel preconditioners available in AMG4PSBLAS.
+
advanced contains a set of more sophisticated examples that will allow the user, via
the input files in the runs subdirectories, to experiment with the full range
of preconditioners implemented in the package. The fileread directories contain sample programs that read sparse matrices from files,
diff --git a/docs/html/userhtmlse4.html b/docs/html/userhtmlse4.html
index 286b0483..493821fa 100644
--- a/docs/html/userhtmlse4.html
+++ b/docs/html/userhtmlse4.html
@@ -351,7 +351,7 @@ class="content">Preconditioner types, corresponding strings and default choices.
Note that the module Remark 1. Coarsest-level solvers based on the LU factorization, such as those
problems. However, this does not necessarily correspond to the sh
on parallel computers.
+ Remark 2. Memory allocation on GPUs is a costly operation that implies a
+synchronization; it is therefore advisable to preallocate the internal preconditioner
+workspace with the method prec%allocate_wrk(info) before invoking an iterative
+method, and to release it upon exit with prec%deallocate_wrk(info).
The setup and application of the default multilevel preconditioner for the real single
precision and the complex, single and double precision, versions are obtained
@@ -461,6 +474,9 @@ class="cmr-12">
+
@@ -535,7 +551,7 @@ class="cmr-12">.
call psb_exit(ctxt)
stop
- Different versions of the multilevel preconditioner can be obtained by changing the
default values of the preconditioner parameters. The code reported in Figure2 The code fragments shown in Figures are included in the example program
class="cmr-12">file Finally, Figure nonsymmetric. The corresponding example program is available in t
amg_dexample_1lev.f90.
- For all the previous preconditioners, example programs where the sparse matrix
and the right-hand side are generated by discretizing a PDE with Dirichlet
@@ -631,7 +645,7 @@ class="cmr-12">.
+
+
The code discussed here shows how to set up a program exploiting the combined GPU
capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the
@@ -743,14 +757,14 @@ class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is availa
class="cmr-12">source distribution directory First of all, we need to include the appropriate modules and declare some auxiliary
variables:
- In this particular example we are choosing to employ a HLG data structure for
@@ -793,14 +807,14 @@ class="cmr-12">data structure for We then have to initialize the GPU environment, and pass the appropriate MOLD
variables to the build methods (see also the PSBLAS users’ guide).
- Finally, we convert the input matrix, the descriptor and the vectors to use a
GPU-enabled internal storage format. We then preallocate the preconditioner
@@ -842,7 +856,7 @@ class="cmr-12">GPU environment It is very important to employ smoothers and coarsest solvers that are suited to the
GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that
@@ -893,30 +907,30 @@ class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kern
class="cmr-12">satisfy this constraint include: JACOBI
BJAC with the following methods on the local blocks:
INVK
INVT
AINV POLY and their ℓ1 point-Jacobi, hybrid (forward) Gauss-Seidel, and hybrid backward
class="cmr-12">the previous smoothers can be defined just by setting Remark 4. Many of the coarsest-level solvers apply to a specific coarsest-matrix
layout; therefore, setting the solver after the layout may change the layout to either
distributed or replicated. Similarly, setting the layout after the solver may change the
+class="cmr-12">distributed or replicated, and similarly, setting the layout after the solver may More precisely, UMFPACK and SuperLU require the coarsest-level matrix to be
+class="cmr-12">change the solver. More specifically, UMFPACK and SuperLU require the
-
./configure --enable-cuda --with-cudadir=${CUDA_HOME} --with-cudacc=xx,yy,zz
-3.2 Optional third party libraries
-
-3.3 Configuration options
-
./configure --with-psblas=PSB-INSTALL-DIR
-produces:
solver, overriding any previous settings. MUMPS, point-Jacobi, hybrid Gauss-Seidel
+class="cmr-12">layout is redefined according to the solver, overriding any previous settings.‘configure/issues>.
-
-./configure --with-psblas=/opt/psblas-3.7/ \
+./configure --with-psblas=/opt/psblas-3.9/ \
--with-umfpackincdir=/usr/include/suitesparse/
-
make
-
make install
-3.4 Bug reporting
-
https://github.com/psctoolkit/psctoolkit/issues
-3.5 Example and test programs
-
-
-amg_prec_mod, containing the definition of the preconditioner
4.1).
-4.1 Examples
-
-P%init. Furthermore, specifying block-Jacobi as
-coarsest-level solver implies that the coarsest-level matrix is distributed among
+class="cmr-12">. Furthermore, specifying block-Jacobi as coarsest-level solver implies that
... ...
! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
@@ -653,7 +667,7 @@ class="cmr-12">.
call P%smoothers_build(A,desc_A,info)
... ...
-
... ...
! build a W-cycle preconditioner with 2 hybrid Gauss-Seidel sweeps
@@ -692,7 +706,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
call P%smoothers_build(A,desc_A,info)
... ...
-
-
’SMOOTHER_TYPE’ to
certain specific values (see Tablescertain specific values (see Table 7accelerated using the specified polynomial smoothing technique. B
ℓ1-Jacobi smoother serves as the base smoother, offering theoretical guarantees on the
+class="cmr-12">-Jacobi smoother serves as the base smoother, offering theoretical guarantees
resulting convergence factoron the resulting convergence factor [ 27]. Alternative combinations are experimental and
+class="cmr-12">. Alternative combinations are
experimental.
-
On the other hand, the distributed layout can be used with any solver except UMFPACK and SuperLU; therefore, if either of these two solvers has already been selected, the coarsest-level solver is changed to block-Jacobi, with the previously chosen solver applied to the local blocks. Likewise, the replicated layout can be used with any solver but SuperLU_Dist and KRM; therefore, if SuperLU_Dist or KRM has been previously set, the coarsest-level solver is changed to the default sequential solver.
In a parallel setting with many cores, we suggest that users change the default
coarsest solver to KRM, i.e., a parallel distributed iterative solution of
@@ -615,7 +614,7 @@ class="cmr-12">the coarsest system based on Krylov methods.
Remark 4. The argument idx can be used to allow finer control for those solvers;
@@ -635,7 +634,7 @@ class="cmr-12">.
+