where $A$ is a square, real or complex, sparse symmetric positive definite (s.p.d.) matrix.
%
The preconditioners implemented in AMG4PSBLAS are obtained by combining
three different types of AMG cycles with smoothers and coarsest-level
solvers. Available multigrid cycles include the V-cycle, the W-cycle,
and a version of the Krylov-type cycle
(K-cycle)~\cite{Briggs2000,Notay2008}; they can be
combined with Jacobi, hybrid
%\footnote{see Note 2 in Table~\ref{tab:p_coarse}, p.~28.}
forward/backward Gauss-Seidel, block-Jacobi, and additive Schwarz
smoothers. The Jacobi, block-Jacobi, and Gauss-Seidel smoothers are
also available in $\ell_1$ versions.
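To make the $\ell_1$ idea concrete, the following toy Python sketch (purely illustrative; AMG4PSBLAS itself is written in Fortran and operates on distributed sparse matrices) performs one $\ell_1$-Jacobi sweep on a dense s.p.d. matrix: the usual diagonal $a_{ii}$ is replaced by $a_{ii}+\sum_{j\neq i}|a_{ij}|$, which makes the sweep convergent for any s.p.d. matrix.

```python
import numpy as np

def l1_jacobi_sweep(A, b, x, omega=1.0):
    """One l1-Jacobi sweep: the diagonal a_ii is replaced by the l1
    diagonal a_ii + sum_{j != i} |a_ij| (= row-wise l1 norm when the
    diagonal is positive), guaranteeing convergence for s.p.d. A."""
    d_l1 = np.abs(A).sum(axis=1)  # l1 diagonal, one entry per row
    return x + omega * (b - A @ x) / d_l1

# Toy example: a few sweeps on a random s.p.d. system
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20.0 * np.eye(20)   # s.p.d. with a strong diagonal
b = rng.standard_normal(20)
x = np.zeros(20)
r0 = np.linalg.norm(b - A @ x)
for _ in range(10):
    x = l1_jacobi_sweep(A, b, x)
```

The same sweep, with the ordinary diagonal in place of `d_l1`, gives the classical weighted Jacobi smoother.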
An algebraic approach is used to generate a hierarchy of
coarse-level matrices and operators, without explicitly using any information on the
geometry of the original problem, e.g., the discretization of a PDE. To this end,
two different coarsening strategies, based on aggregation, are available:
\begin{itemize}
\item a decoupled version of the smoothed aggregation procedure
proposed in~\cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}, and already
included in the previous versions of the
package~\cite{BDDF2007,MLD2P4_TOMS};
\item a coupled, parallel implementation of the Coarsening based on
Compatible Weighted Matching introduced in~\cite{DV2013,DFV2018} and
described in detail in~\cite{DDF2020}.
\end{itemize}
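To illustrate the common structure behind both strategies, here is a toy Python sketch of aggregation-based coarsening (the grouping rule below is a hypothetical stand-in, not the package's decoupled or matching-based algorithms): fine unknowns are grouped into aggregates, a piecewise-constant tentative prolongator $P$ is built, optionally smoothed as in smoothed aggregation, and the coarse matrix is the Galerkin product $A_c = P^T A P$.

```python
import numpy as np

def aggregate_1d(n, size=2):
    """Toy aggregation: group consecutive unknowns of a 1D problem.
    Real coarsening (smoothed aggregation, compatible weighted
    matching) derives the aggregates from the matrix entries."""
    return np.arange(n) // size  # aggregate index of each fine unknown

def coarsen(A, agg, omega=None):
    n, nc = A.shape[0], agg.max() + 1
    P = np.zeros((n, nc))
    P[np.arange(n), agg] = 1.0      # tentative piecewise-constant prolongator
    if omega is not None:           # smoothed aggregation: P <- (I - w D^-1 A) P
        P = P - omega * (A / np.diag(A)[:, None]) @ P
    return P, P.T @ A @ P           # Galerkin coarse matrix Ac = P^T A P

# 1D Laplacian example: pairwise aggregation halves the problem size
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
agg = aggregate_1d(n)
P, Ac = coarsen(A, agg, omega=2.0 / 3.0)
print(Ac.shape)  # (4, 4)
```

Applying `coarsen` recursively to each $A_c$ yields the hierarchy of coarse-level matrices and operators used by the cycles.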
Either exact or approximate solvers can be used on the coarsest-level
system. We provide interfaces to various sparse LU factorizations from
external packages, native incomplete LU and approximate inverse
factorizations, weighted Jacobi, hybrid Gauss-Seidel, and block-Jacobi
solvers, as well as a recursive call to preconditioned Krylov methods;
all the smoothers can also be exploited as one-level preconditioners.
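The way smoothers and a coarsest-level solver combine into one cycle can be sketched as a toy two-level V-cycle in Python (illustrative only; an exact dense solve stands in for the sparse factorizations listed above, and a W- or K-cycle would recurse on $A_c$ instead of solving it exactly):

```python
import numpy as np

def two_level_vcycle(A, P, Ac, b, x, nsweeps=2, omega=2.0 / 3.0):
    """One two-level V-cycle: pre-smooth, coarse-grid correction with
    an exact coarsest-level solve, post-smooth."""
    d = np.diag(A)
    for _ in range(nsweeps):                  # pre-smoothing (weighted Jacobi)
        x = x + omega * (b - A @ x) / d
    r = b - A @ x
    x = x + P @ np.linalg.solve(Ac, P.T @ r)  # exact coarse-level correction
    for _ in range(nsweeps):                  # post-smoothing
        x = x + omega * (b - A @ x) / d
    return x

# 1D Laplacian with a smoothed-aggregation prolongator
n = 16
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Pt = np.zeros((n, n // 2))
Pt[np.arange(n), np.arange(n) // 2] = 1.0              # tentative prolongator
P = Pt - (2.0 / 3.0) * (A / np.diag(A)[:, None]) @ Pt  # smoothed prolongator
Ac = P.T @ A @ P
b = np.ones(n)
x = np.zeros(n)
for _ in range(20):
    x = two_level_vcycle(A, P, Ac, b, x)
```

Used as a stationary iteration here, the same cycle is normally applied once per iteration as a preconditioner inside a Krylov method.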
AMG4PSBLAS is written in Fortran~2003, following an
object-oriented design through the exploitation of features
AMG4PSBLAS has been designed to implement scalable and easy-to-use
multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)
computational framework~\cite{psblas_00,PSBLAS3}. PSBLAS provides basic linear algebra
operators and data management facilities for distributed sparse matrices,
kernels for sequential incomplete factorizations needed for the
parallel block-Jacobi and additive Schwarz smoothers, and
parallel Krylov solvers which can be used with the AMG4PSBLAS preconditioners.
The choice of PSBLAS has been mainly motivated by the need for
a portable and efficient software infrastructure implementing ``de
facto'' standard parallel sparse linear algebra kernels, in order to
pursue performance, portability, modularity, and extensibility in the
development of the preconditioner package. On the other hand, the
implementation of AMG4PSBLAS, which was driven by the need to face the
exascale challenge, has led to some important revisions and extensions
of the PSBLAS infrastructure.
The inter-process communication required by AMG4PSBLAS is encapsulated
in the PSBLAS routines;
therefore, AMG4PSBLAS can be run on any parallel machine where PSBLAS
implementations are available. The most recent version of PSBLAS
(release 3.7) includes a GPU plug-in providing CUDA versions of the
main vector operations and of the sparse matrix-vector multiplication,
so that Krylov methods coupled with AMG4PSBLAS preconditioners relying
on Jacobi and block-Jacobi smoothers with sparse approximate inverses
on the blocks can be efficiently executed on clusters of GPUs.
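The coupling between a Krylov solver and the preconditioner can be sketched with a textbook preconditioned conjugate gradient in Python (a generic stand-in, not the PSBLAS \verb|psb_krylov| interface): \verb|apply_prec| plays the role of one application of an AMG4PSBLAS preconditioner, replaced here by plain Jacobi for brevity.

```python
import numpy as np

def pcg(A, b, apply_prec, tol=1e-8, maxit=200):
    """Textbook preconditioned conjugate gradient. apply_prec(r)
    applies the preconditioner once per iteration, e.g. one AMG
    cycle in the setting described in the text."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1D Laplacian, one-level Jacobi as a stand-in preconditioner
n = 32
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, lambda r: r / np.diag(A))
```

In the parallel setting, each matrix-vector product, dot product, and preconditioner application above maps onto a PSBLAS kernel or an AMG4PSBLAS routine, which is what allows the GPU plug-in to accelerate the whole iteration.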
AMG4PSBLAS has a layered and modular software architecture where three main layers can be
identified. The lower layer consists of the PSBLAS kernels, the middle one implements