based on PSBLAS}) is a package of parallel algebraic multilevel
preconditioners included in the PSCToolkit (Parallel Sparse
Computation Toolkit) software framework.
It is an evolution of a software development project started in 2007,
named MLD2P4, which originally implemented a
multilevel version of some domain decomposition preconditioners of
additive-Schwarz type, and was based on a parallel decoupled version
of the well-known smoothed
aggregation method to generate the multilevel hierarchy of coarser matrices.
In the last few years the package has been extended to
include new algorithms and
functionalities for the setup and application of new AMG preconditioners,
with the ultimate aims of improving efficiency and scalability when tens
of thousands of cores are used, and of boosting reliability in dealing
with general symmetric positive definite linear systems; these
developments have been supported in the context of the EU-H2020 EoCoE
project (Energy Oriented Center of Excellence).
Due to the significant number of changes and the increase in scope, we
decided to rename the package as AMG4PSBLAS.
AMG4PSBLAS has been designed to provide scalable and easy-to-use
preconditioners in the context of the PSBLAS (Parallel Sparse Basic
Linear Algebra Subprograms) computational framework and can be used in
conjunction with the Krylov solvers available in this framework.
Our package is based on a completely algebraic approach; therefore
user-level interfaces assume that the system matrix and
preconditioners are represented as PSBLAS distributed sparse matrices.
AMG4PSBLAS enables the user to easily specify different
features of an algebraic multilevel preconditioner, thus making it
possible to experiment with different preconditioners for the problem
and parallel computers at hand.
The package employs object-oriented design techniques in
Fortran~2003, with interfaces to additional third-party libraries
such as MUMPS, UMFPACK, SuperLU, and SuperLU\_Dist, which
can be exploited in building multilevel preconditioners. The parallel
implementation is based on a Single Program Multiple Data (SPMD)
paradigm; the inter-process communication is based on MPI and
@ -94,7 +94,6 @@ Multilevel &\fortinline|'ML'| & V-cycle with one hybrid forward Gauss-
\label{tab:precinit}}
\end{center}
\end{table}
Note that the module \fortinline|amg_prec_mod|, containing the definition of the
preconditioner data type and the interfaces to the routines of AMG4PSBLAS,
must be used in any program calling such routines.
@ -110,6 +109,12 @@ a standard discretization of basic scalar elliptic PDE problems. However,
this does not necessarily correspond to the shortest execution time
on parallel~computers.
\textbf{Remark 2.} Memory allocation on GPUs is a costly operation
implying a synchronization; therefore, it is convenient to preallocate
internal preconditioner workspace with the method
\verb|prec%allocate_wrk(info)| before invoking an iterative method,
and release it upon exit with \verb|prec%deallocate_wrk(info)|.
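
As a minimal sketch of the pattern described in Remark~2, the calls
could be arranged as shown below. The sketch assumes that the
preconditioner \verb|prec| has already been built; the names \verb|A|,
\verb|desc_A|, \verb|b|, \verb|x| and \verb|tol| are illustrative
placeholders for the distributed matrix, its communication descriptor,
the right-hand side, the solution vector and the stopping tolerance.
The \verb|psb_krylov| argument list is the basic PSBLAS form and should
be checked against the installed version.
\begin{verbatim}
! Sketch only: A, desc_A, b, x and tol are illustrative names
! (assumptions); prec is assumed to be already built.
call prec%allocate_wrk(info)     ! preallocate the workspace once
call psb_krylov('CG',A,prec,b,x,tol,desc_A,info)
call prec%deallocate_wrk(info)   ! release the workspace on exit
\end{verbatim}
Keeping the allocation outside the solver call means that the
synchronization cost is paid once rather than at each preconditioner
application.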
\subsection{Examples\label{sec:examples}}
@ -140,7 +145,6 @@ for the real single precision and the complex, single and double
precision, versions are obtained with straightforward modifications of the previous
example (see Section~\ref{sec:userinterface} for details). If these versions are installed,
the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
\begin{listing}[tbp]
\begin{center}
\begin{minipage}{.90\textwidth}
@ -260,7 +264,6 @@ stop
\label{fig:ex1}}
\end{center}
\end{listing}
Different versions of the multilevel preconditioner can be obtained by changing
the default values of the preconditioner parameters. The code reported in
Figure~\ref{fig:ex2} shows how to set a V-cycle preconditioner
@ -272,10 +275,15 @@ with block-Jacobi and set by~\fortinline|P%init|.
Furthermore, specifying block-Jacobi as coarsest-level
solver implies that the coarsest-level matrix is distributed
among the processes.
Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using
the coarsening based on Compatible Weighted Matching, aggregates of
size at most $8$, and smoothed prolongators. It applies
2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
and solves the coarsest-level system with the parallel flexible
Conjugate Gradient method (KRM) coupled with the block-Jacobi
preconditioner having ILU(0) on the blocks, with default parameters
used for the coarsest solver.
Note that specifying KRM as coarsest-level
solver implies that the coarsest-level matrix is distributed