<H1>Multigrid Background</H1>
Multigrid preconditioners, coupled with Krylov iterative
solvers, are widely used in the parallel solution of large and sparse linear systems,
because of their optimality in the solution of linear systems arising from the
discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.
Optimality, also known as algorithmic scalability, is the property
of having a computational cost per iteration that depends linearly on
the problem size, and a convergence rate that is independent of the problem size.
Multigrid preconditioners are based on a recursive application of a two-grid process
consisting of smoother iterations and a coarse-space (or coarse-level) correction.
The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,
or more complex subspace-correction methods, such as the Schwarz ones.
The coarse-space correction consists of solving, in an appropriately chosen
coarse space, the residual equation associated with the approximate solution computed
by the smoother, and of using the solution of this equation to correct the
previous approximation. The transfer of information between the original
(fine) space and the coarse one is performed by using suitable restriction and
prolongation operators. The construction of the coarse space and the corresponding
transfer operators is carried out by applying a so-called coarsening algorithm to the system
matrix. Two main approaches can be used to perform coarsening: the geometric approach,
which exploits the knowledge of some physical grid associated with the matrix
and requires the user to define transfer operators from the fine
to the coarse level and vice versa, and the algebraic approach, which builds
the coarse-space correction and the associate transfer operators using only matrix
information. The first approach may be difficult when the system comes from
discretizations on complex geometries;
furthermore, ad hoc one-level smoothers may be required to get an efficient
interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients
are considered. The second approach performs a fully automatic coarsening and enforces the
interplay between fine and coarse level by suitably choosing the coarse space and
the coarse-to-fine interpolation (see, e.g., [3,23,21] for details).
MLD2P4 uses a pure algebraic approach, based on the smoothed
aggregation algorithm [2,25], for building the sequence of coarse matrices
and transfer operators, starting from the original one.
A decoupled version of this algorithm is implemented, where the smoothed
aggregation is applied locally to each submatrix [24].
A brief description of the AMG preconditioners implemented in MLD2P4 is given in
Sections 4.1-4.3. For further details the reader is referred to [4,5,7,8].
We note that optimal multigrid preconditioners do not necessarily correspond
to minimum execution times in a parallel setting. Indeed, to obtain effective parallel
multigrid preconditioners, a tradeoff between the optimality and the cost of building and
applying the smoothers and the coarse-space corrections must be achieved. Effective
parallel preconditioners require algorithmic scalability to be coupled with implementation
scalability, i.e., a computational cost per iteration which remains (almost) constant as
the number of parallel processors increases.
In the current implementation of MLD2P4 we have $R^k=(P^k)^T$.
A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solver
is set up at the coarsest level, so that they are ready for application
(for example, setting up a solver based on the $LU$ factorization means computing
and storing the $L$ and $U$ factors). The construction of the hierarchy of AMG components
described so far corresponds to the so-called build phase of the preconditioner.
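As a concrete sketch of the build phase, assuming a preconditioner object <code>P</code>, a matrix <code>A</code> and a descriptor <code>desc_A</code> declared as in the example listing later in this guide (the split into <code>hierarchy_build</code> and <code>smoothers_build</code> follows the MLD2P4 object-oriented interface; <code>hierarchy_build</code> is referenced again below):
<PRE>
! build phase, stage 1: create the hierarchy of coarse matrices
! and transfer operators by smoothed aggregation
call P%hierarchy_build(A,desc_A,info)
! build phase, stage 2: set up the smoothers at the levels k < nlev
! and the solver at the coarsest level (e.g., compute and store
! the L and U factors)
call P%smoothers_build(A,desc_A,info)
</PRE>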
The components produced in the build phase may be combined in several ways
to obtain different multilevel preconditioners;
this is done in the application phase, i.e., in the computation of a vector
of type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iteration
of a Krylov solver [20]. An example of such a combination, known as
V-cycle, is given in Figure 1. In this case, a single iteration
of the same smoother is used before and after the recursive call to the V-cycle (i.e.,
in the pre-smoothing and post-smoothing phases); however, different choices can be
made. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle
and W-cycle [3], and a version of the K-cycle described in [19].
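In the notation introduced above ($R^k=(P^k)^T$, a smoother at each level $k<nlev$, a solver at the coarsest level), one V-cycle application at level $k$ can be sketched as follows; this is a schematic rendition consistent with the text, not a verbatim transcription of Figure 1:

\begin{displaymath}
\begin{array}{ll}
\mbox{if } k = nlev: & x^k = (A^k)^{-1} b^k \\
\mbox{else:} & x^k = \mathrm{smooth}(A^k,x^k,b^k) \qquad \mbox{(pre-smoothing)} \\
& b^{k+1} = R^k (b^k - A^k x^k) \qquad \mbox{(restriction of the residual)} \\
& x^{k+1} = \mathrm{Vcycle}(k+1,0,b^{k+1}) \qquad \mbox{(coarse-level correction)} \\
& x^k = x^k + P^k x^{k+1} \qquad \mbox{(prolongation of the correction)} \\
& x^k = \mathrm{smooth}(A^k,x^k,b^k) \qquad \mbox{(post-smoothing)}
\end{array}
\end{displaymath}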
We describe the basics for building and applying MLD2P4 one-level and multi-level
(i.e., AMG) preconditioners with the Krylov solvers included in PSBLAS [13].
The following steps, detailed in the numbered list below, are required.
<H2>Smoothed Aggregation</H2>
In order to define the prolongator $P^k$, used to compute
the coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregation
algorithm described in [2,25].
The basic idea of this algorithm is to build a coarse set of indices
$\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjoint
subsets (aggregates), and to define the coarse-to-fine space transfer operator
$P^k$ by applying a suitable smoother to a simple piecewise constant
prolongation operator, with the aim of improving the quality of the coarse-space correction.
Three main steps can be identified in the smoothed aggregation procedure:
the aggregation of the indices of $\Omega^k$, the construction of the
piecewise constant tentative prolongator, and the smoothing of this
prolongator to obtain $P^k$.
<OL>
<LI><I>Declare the preconditioner data structure</I>. It is a derived data type,
<code>mld_</code><I>x</I><code>prec_type</code>, where <I>x</I> may be <code>s</code>, <code>d</code>, <code>c</code>
or <code>z</code>, according to the basic data type of the sparse matrix
(<code>s</code> = real single precision; <code>d</code> = real double precision;
<code>c</code> = complex single precision; <code>z</code> = complex double precision).
This data structure is accessed by the user only through the MLD2P4 routines,
following an object-oriented approach.
</LI>
<LI><I>Allocate and initialize the preconditioner data structure, according to
a preconditioner type chosen by the user</I>. This is performed by the routine
<code>init</code>, which also sets defaults for each preconditioner
type selected by the user. The preconditioner types and the defaults associated
with them are given in Table 1, where the strings used by
<code>init</code> to identify the preconditioner types are also given.
Note that these strings are case insensitive.
</LI>
<LI><I>Modify the selected preconditioner type, by properly setting
preconditioner parameters.</I> This is performed by the routine <code>set</code>.
This routine must be called only if the user wants to modify the default values
of the parameters associated with the selected preconditioner type, to obtain a variant
of that preconditioner. Examples of use of <code>set</code> are given in
Section 5.1; a complete list of all the preconditioner parameters, with their
allowed and default values, is also provided later in this guide.
</LI>
</OL>
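A minimal sketch of steps 1-3 for the real double precision case follows; the <code>'ML'</code> string is from Table 1, while the parameter name <code>'ML_CYCLE'</code> and the value <code>'WCYCLE'</code> are taken as an assumption from the MLD2P4 parameter tables and should be checked there:
<PRE>
use mld_prec_mod
! step 1: declare the preconditioner data structure
type(mld_dprec_type) :: P
integer :: info
! step 2: allocate and initialize it as a multi-level
! preconditioner with default parameters
call P%init('ML',info)
! step 3: optionally modify a default, e.g. select the W-cycle
call P%set('ML_CYCLE','WCYCLE',info)
</PRE>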
\begin{displaymath}
\Omega^k_j \subset \mathcal{N}_i^k(\theta) =
\left\{ r \in \Omega^k : \vert a_{ir}^k \vert > \theta
\sqrt{\vert a_{ii}^k a_{rr}^k \vert} \right\} \cup \left\{ i \right\},
\qquad (3)
\end{displaymath}
for a given threshold $\theta \in [0,1]$ (see [25] for the details).
Since this algorithm has a sequential nature, a decoupled
version of it is applied, where each processor independently executes
the algorithm on the set of indices assigned to it in the initial data
distribution. This version is embarrassingly parallel, since it does not require any data
communication. On the other hand, it may produce some nonuniform aggregates
and is strongly dependent on the number of processors and on the initial partitioning
of the matrix $A$. Nevertheless, this parallel algorithm has been chosen for
MLD2P4, since it has been shown to produce good results in practice [5,7,24].
The prolongator $P^k$ is built starting from a tentative prolongator
$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as

\begin{displaymath}
\bar{P}^k = (\bar{p}_{ij}^k), \qquad
\bar{p}_{ij}^k =
\left\{ \begin{array}{ll}
1 & \mbox{if } i \in \Omega^{k+1}_j, \\
0 & \mbox{otherwise,}
\end{array} \right.
\end{displaymath}

where $\Omega^{k+1}_j$ is the aggregate of $\Omega^k$ associated with the index
$j \in \Omega^{k+1}$. $P^k$ is obtained by applying to $\bar{P}^k$ a smoother
$S^k \in \mathbb{R}^{n_k \times n_k}$,

\begin{displaymath}
P^k = S^k \bar{P}^k,
\end{displaymath}

in order to remove nonsmooth components from the range of the prolongator,
and hence to improve the convergence properties of the multi-level
method [2,23].
A simple choice for $S^k$ is the damped Jacobi smoother:
\begin{displaymath}
S^k = I - \omega^k (D^k)^{-1} A^k_F ,
\end{displaymath}
where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$,
and $A^k_F = (\bar{a}_{ij}^k)$ is the filtered matrix, obtained from $A^k$ by
discarding the couplings outside the strong neighborhoods
$\mathcal{N}_i^k(\theta)$ and lumping them onto the diagonal.
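In formulas, a standard form of this filter, consistent with the strong-coupling neighborhoods in (3) and with the reference to equation (5) in the parameter tables (see [2,25]), is:

\begin{displaymath}
\bar{a}_{ij}^k =
\left\{ \begin{array}{ll}
a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\
0 & \mbox{otherwise,}
\end{array} \right.
\quad (j \neq i),
\qquad
\bar{a}_{ii}^k = a_{ii}^k - \sum_{j \neq i} \left( a_{ij}^k - \bar{a}_{ij}^k \right).
\end{displaymath}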
<H2>Smoothers and coarsest-level solvers</H2>
The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,
a hybrid version of the forward and backward Gauss-Seidel methods, and the
additive Schwarz (AS) ones (see, e.g., [20,21]).
The hybrid Gauss-Seidel
version is considered because the original Gauss-Seidel method is inherently sequential.
At each iteration of the hybrid version, each parallel process uses the most recent values
of its own local variables and the values of the non-local variables computed at the
previous iteration, obtained by exchanging data with other processes before
the beginning of the current iteration.
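For instance, the smoother can be selected through the <code>set</code> routine described earlier; the following is a sketch, where the parameter names <code>'SMOOTHER_TYPE'</code> and <code>'SMOOTHER_SWEEPS'</code> and the value <code>'FBGS'</code> are assumed from the MLD2P4 parameter tables:
<PRE>
! use the hybrid forward/backward Gauss-Seidel smoother ...
call P%set('SMOOTHER_TYPE','FBGS',info)
! ... with one sweep as pre-smoother and two as post-smoother
call P%set('SMOOTHER_SWEEPS',1,info,pos='PRE')
call P%set('SMOOTHER_SWEEPS',2,info,pos='POST')
</PRE>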
In the AS methods, the index space $\Omega^k$ is divided into $m_k$
subsets $\Omega^k_i$ of size $n_{k,i}$, possibly
overlapping. For each $i$ we consider the restriction
operator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$
that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$
with indices in $\Omega^k_i$, and the prolongation operator
$P^k_i = (R_i^k)^T$. These operators are then used to build
$A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the index
space $\Omega^k_i$.
The classical AS preconditioner $M^k_{AS}$ is defined as
\begin{displaymath}
M^k_{AS} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^k ,
\end{displaymath}

where $A_i^k$ is supposed to be nonsingular.

The following listing (example 1) shows the setup and application of the
default multi-level preconditioner:
<PRE>
use psb_base_mod
use mld_prec_mod
use psb_krylov_mod
... ...
!
! sparse matrix
type(psb_dspmat_type) :: A
! sparse matrix descriptor
type(psb_desc_type) :: desc_A
! preconditioner
type(mld_dprec_type) :: P
! right-hand side and solution vectors
type(psb_d_vect_type) :: b, x
... ...
!
! initialize the parallel environment
call psb_init(ictxt)
call psb_info(ictxt,iam,np)
... ...
!
! read and assemble the spd matrix A and the right-hand side b
! using PSBLAS routines for sparse matrix / vector management
... ...
!
! initialize the default multi-level preconditioner, i.e. V-cycle
! with basic smoothed aggregation, 1 hybrid forward/backward
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
! solver (the calls below are a plausible completion of this
! truncated listing, based on the MLD2P4/PSBLAS object-oriented
! interfaces used in this guide; check the reference sections
! for the exact names)
call P%init('ML',info)
!
! build the preconditioner hierarchy and the smoothers
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
!
! solve Ax=b with preconditioned CG, tolerance 1.e-6
call psb_krylov('CG',A,P,b,x,1.d-6,desc_A,info)
!
! free the preconditioner and exit the parallel environment
call P%free(info)
call psb_exit(ictxt)
</PRE>
We observe that an approximate inverse of $A_i^k$ is usually considered
instead of $(A_i^k)^{-1}$.
The setup of $M^k_{AS}$ during the multilevel build phase involves
<UL>
<LI>the definition of the index subspaces $\Omega_i^k$ and of the corresponding
operators $R_i^k$ (and $P_i^k$);
</LI>
<LI>the computation of the submatrices $A_i^k$;
</LI>
<LI>the computation of their inverses, usually approximated
through some form of incomplete factorization (a sketch of this
choice follows the list).
</LI>
</UL>
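The choice in the last item can be made, as in the previous sketches, through <code>set</code>; the parameter names <code>'SUB_SOLVE'</code> and <code>'SUB_FILLIN'</code> and the value <code>'ILU'</code> are assumed from the MLD2P4 parameter tables:
<PRE>
! approximate the inverses of the submatrices A_i^k by an
! incomplete LU factorization with fill-in level 1
call P%set('SUB_SOLVE','ILU',info)
call P%set('SUB_FILLIN',1,info)
</PRE>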
The computation of $z^k=M^k_{AS}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during the
multilevel application phase, requires
<UL>
<LI>the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$,
i.e. $w_i^k = R_i^{k} w^k$;
</LI>
<LI>the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$;
</LI>
<LI>the prolongation and the sum of the previous vectors,
i.e. $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$.
</LI>
</UL>
Variants of the classical AS method, which use modifications of the
restriction and prolongation operators, are also implemented in MLD2P4.
Among them, the Restricted AS (RAS) preconditioner usually
outperforms the classical AS preconditioner in terms of convergence
rate and of computation and communication time on parallel distributed-memory
computers, and is therefore the most widely used among the AS
preconditioners [6].
Direct solvers based on sparse LU factorizations, implemented in the
third-party libraries reported in Section 3.2, can be applied
as coarsest-level solvers by MLD2P4. Native inexact solvers based on
incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,
and block-Jacobi preconditioners, are also available. Direct solvers usually
lead to more effective preconditioners in terms of algorithmic scalability;
however, this does not guarantee parallel efficiency.
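As a sketch, the coarsest-level solver can be chosen through <code>set</code>; the parameter names <code>'COARSE_SOLVE'</code> and <code>'COARSE_MAT'</code> and the value strings are assumed from the MLD2P4 parameter tables:
<PRE>
! use a MUMPS direct solver on a coarsest matrix distributed
! among the processes (instead of replicated on each of them)
call P%set('COARSE_SOLVE','MUMPS',info)
call P%set('COARSE_MAT','DIST',info)
</PRE>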
<UL>
<LI>Note that the strings are case insensitive.</LI>
<LI>Error code. If no error, 0 is returned. See Section 8 for details.</LI>
<LI>The communication descriptor of <code>a</code>. See the PSBLAS User's Guide for details.</LI>
<LI>Whether the other arguments apply only to the pre-smoother (<code>'PRE'</code>)
or to the post-smoother (<code>'POST'</code>). If <code>pos</code> is not present,
the other arguments are applied to both smoothers.
If the preconditioner is one-level or the parameter identified by <code>what</code>
does not concern the smoothers, <code>pos</code> is ignored.</LI>
</UL>
Matrix used in computing the smoothed prolongator: filtered or unfiltered
(see (5) in Section 4.2).
<B>Note.</B> Different thresholds at different levels, such as
those used in [25, Section 5.1], can be easily set by
invoking the routine <code>set</code> with the parameter <code>ilev</code>.
<STRONG>Table 5:</STRONG> Parameters defining the coarse-space correction at the coarsest level.
hierarchy produced by a previous call to <code>hierarchy_build</code>.
<UL>
<LI>The communication descriptor associated with the matrix to be
preconditioned. See the PSBLAS User's Guide for details.</LI>
<LI>Workspace. Its size should be at least
<code>4 * psb_cd_get_local_cols(desc_a)</code> (see the PSBLAS User's Guide).
Note that <I>type</I> and <I>kind_parameter</I> must be chosen according
to the real/complex, single/double precision version of MLD2P4 in use.</LI>
<LI>The sparse matrix structure containing the local part of the
matrix to be preconditioned. Note that <I>x</I> must be chosen according
to the real/complex, single/double precision version of MLD2P4 in use.</LI>
<LI>The user-defined new solver to be employed in the preconditioner.</LI>
<LI>Error code. If no error, 0 is returned. See Section 8 for details.</LI>
</UL>
The Multi-Level Domain Decomposition Parallel Preconditioners Package based on
PSBLAS (MLD2P4) provides parallel Algebraic MultiGrid (AMG) and Domain
Decomposition preconditioners (see, e.g., [3,23,21]),
to be used in the iterative solution of linear systems; to this end it implements
multi-level cycles and smoothers widely used in multigrid methods.
The multi-level preconditioners implemented in MLD2P4 are obtained by combining
AMG cycles with smoothers and coarsest-level solvers. The V-, W-, and
K-cycles [3,19] are available, which make it possible to define
almost all the preconditioners in the package, including the multi-level hybrid
Schwarz ones; a specific cycle is implemented to obtain multi-level additive
Schwarz preconditioners. The Jacobi, hybrid forward/backward Gauss-Seidel, block-Jacobi, and additive Schwarz methods
are available as smoothers. An algebraic approach is used to generate a hierarchy of
coarse-level matrices and operators, without explicitly using any information on the
geometry of the original problem, e.g., the discretization of a PDE. To this end,
the smoothed aggregation technique [2,25]
is applied. Either exact or approximate solvers can be used on the coarsest-level
system. Specifically, different sparse LU factorizations from external
packages, and native incomplete LU factorizations and Jacobi, hybrid Gauss-Seidel,
and block-Jacobi solvers are available.
MLD2P4 has been designed to implement scalable and easy-to-use
multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)
computational framework [15,14]. PSBLAS provides basic linear algebra
operators and data management facilities for distributed sparse matrices,
as well as parallel Krylov solvers which can be used with the MLD2P4 preconditioners.
The choice of PSBLAS has been mainly motivated by the need for a portable and
efficient software infrastructure for parallel sparse computations.
A few black-box routines at the upper layer allow all users to easily
build and apply any preconditioner available in MLD2P4;
facilities are also available that allow expert users to extend the set of smoothers
and solvers for building new versions of the preconditioners (see