</H1>
Multigrid preconditioners, coupled with Krylov iterative
solvers, are widely used in the parallel solution of large and sparse linear systems,
because of their optimality in the solution of linear systems arising from the
discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.
Optimality, also known as algorithmic scalability, is the property
of having a computational cost per iteration that depends linearly on
the problem size, and a convergence rate that is independent of the problem size.
Multigrid preconditioners are based on a recursive application of a two-grid process
consisting of smoother iterations and a coarse-space (or coarse-level) correction.
The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel methods,
or more complex subspace-correction methods, such as the Schwarz methods.
The coarse-space correction consists of solving, in an appropriately chosen
coarse space, the residual equation associated with the approximate solution computed
by the smoother, and of using the solution of this equation to correct the
previous approximation. The transfer of information between the original
(fine) space and the coarse one is performed by using suitable restriction and
prolongation operators. The construction of the coarse space and the corresponding
transfer operators is carried out by applying a so-called coarsening algorithm to the system
matrix. Two main approaches can be used to perform coarsening: the geometric approach,
which exploits the knowledge of some physical grid associated with the matrix
and requires the user to define transfer operators from the fine
to the coarse level and vice versa, and the algebraic approach, which builds
the coarse-space correction and the associated transfer operators using only matrix
information. The first approach may be difficult to apply when the system comes from
discretizations on complex geometries;
furthermore, ad hoc one-level smoothers may be required to get an efficient
interplay between the fine and coarse levels, e.g., when matrices with highly varying coefficients
are considered. The second approach performs a fully automatic coarsening and enforces the
interplay between the fine and coarse levels by suitably choosing the coarse space and
the coarse-to-fine interpolation (see, e.g., [<A
HREF="node27.html#Briggs2000">3</A>,<A
HREF="node27.html#Stuben_01">23</A>,<A
HREF="node27.html#dd2_96">21</A>] for details).
MLD2P4 uses a pure algebraic approach, based on the smoothed
aggregation algorithm [<A
HREF="node27.html#para_04">4</A>,<A
HREF="node27.html#aaecc_07">5</A>,<A
HREF="node27.html#apnum_07">7</A>,<A
HREF="node27.html#MLD2P4_TOMS">8</A>].
<P>
We note that optimal multigrid preconditioners do not necessarily correspond
to minimum execution times in a parallel setting. Indeed, to obtain effective parallel
multigrid preconditioners, a tradeoff between the optimality and the cost of building and
applying the smoothers and the coarse-space corrections must be achieved. Effective
parallel preconditioners require algorithmic scalability to be coupled with implementation
scalability, i.e., a computational cost per iteration which remains (almost) constant as
the number of parallel processors increases.
</H2>
In order to describe the AMG preconditioners available in MLD2P4, we consider a
linear system
<BR>
<DIV ALIGN="CENTER">
<I>Ax</I> = <I>b</I>,
</DIV><P></P>
where <!-- MATH
$A=(a_{ij}) \in \mathbb{R}^{n \times n}$
-->
<IMG
 WIDTH="137" HEIGHT="38" ALIGN="MIDDLE" BORDER="0"
 SRC="img5.png"
 ALT="$A=(a_{ij}) \in \mathbb{R}^{n \times n}$"> is a nonsingular sparse matrix;
for ease of presentation we assume <IMG
 WIDTH="18" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
 SRC="img3.png"
 ALT="$A$"> is real, but the
results are valid for the complex case as well.
Let us assume as finest index space the set of row (column) indices of <IMG
 WIDTH="18" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
 SRC="img3.png"
 ALT="$A$">, i.e.,
<!-- MATH
$\Omega = \{1, 2, \ldots, n\}$
-->
<IMG
 WIDTH="132" HEIGHT="36" ALIGN="MIDDLE" BORDER="0"
 SRC="img6.png"
 ALT="$\Omega = \{1, 2, \ldots, n\}$">.
Each of the algebraic multilevel preconditioners implemented in MLD2P4 generates
a hierarchy of index spaces and a corresponding hierarchy of matrices,
<BR><P></P>
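The construction just mentioned — a hierarchy of index spaces with a corresponding hierarchy of matrices — can be sketched as a loop that repeatedly coarsens the current index space and forms the next-level matrix by the Galerkin triple product. The pairwise aggregation and all sizes below are illustrative assumptions, not the MLD2P4 coarsening algorithm.

```python
# Toy construction of a hierarchy of index spaces and matrices by
# repeated coarsening; the pairwise aggregation is an illustrative
# assumption, not the MLD2P4 coarsening algorithm.
import numpy as np

def poisson1d(n):
    # 1D Laplacian test matrix.
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def pairwise_prolongator(n):
    # Coarsen the index space {0,...,n-1} into {0,...,n//2-1} by
    # aggregating pairs of consecutive indices.
    P = np.zeros((n, n // 2))
    for i in range(n // 2):
        P[2 * i, i] = 1.0
        P[2 * i + 1, i] = 1.0
    return P

def build_hierarchy(A, min_size=4):
    # A^{k+1} = (P^k)^T A^k P^k: the Galerkin (triple-product)
    # coarse-level matrix at each step.
    hierarchy = [A]
    while hierarchy[-1].shape[0] // 2 >= min_size:
        P = pairwise_prolongator(hierarchy[-1].shape[0])
        hierarchy.append(P.T @ hierarchy[-1] @ P)
    return hierarchy

levels = build_hierarchy(poisson1d(32))
print([Ak.shape[0] for Ak in levels])  # each level halves the index space
```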
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH=340>Note that the strings are case insensitive.</TD>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH=340>Error code. If no error, 0 is returned. See Section <A HREF="node25.html#sec:errors">8</A> for details.</TD>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH=340>Whether the other arguments apply only to the pre-smoother (<code>'PRE'</code>)
or to the post-smoother (<code>'POST'</code>). If <code>pos</code> is not present,
the other arguments are applied to both smoothers.
If the preconditioner is one-level or the parameter identified by <code>what</code>
does not concern the smoothers, <code>pos</code> is ignored.</TD>
<TD ALIGN="LEFT" VALIGN="TOP" WIDTH=340>The communication descriptor of <code>a</code>. See the PSBLAS User's Guide for
details.</TD>
The M<SMALL>ULTI-</SMALL>L<SMALL>EVEL </SMALL>D<SMALL>OMAIN </SMALL>D<SMALL>ECOMPOSITION </SMALL>P<SMALL>ARALLEL </SMALL>P<SMALL>RECONDITIONERS </SMALL>P<SMALL>ACKAGE BASED ON
</SMALL>PSBLAS (MLD2P4) provides parallel Algebraic MultiGrid (AMG) and Domain
Decomposition preconditioners (see, e.g., [<A
HREF="node27.html#Briggs2000">3</A>,<A
HREF="node27.html#Stuben_01">23</A>,<A
HREF="node27.html#dd2_96">21</A>]),
to be used in the iterative solution of linear systems,
<P>
In order to build MLD2P4 it is necessary to set up a Makefile with appropriate
system-dependent variables; this is done by means of the <code>configure</code>
script. The distribution also includes the autoconf and automake
sources employed to generate the script, but usually this is not needed
to build the software.
<P>
MLD2P4 is implemented almost entirely in Fortran 2003, with some
interfaces to external libraries in C; the Fortran compiler
must support the Fortran 2003 standard plus the extension <code>MOLD=</code>
feature, which enhances the usability of <code>ALLOCATE</code>.
These features are
supported by the GNU Fortran compiler, for which we
recommend using at least version 4.8.
The software defines data types and interfaces for
real and complex data, in both single and double precision.
<P>
<P>
Building MLD2P4 requires some base libraries (see Section <AHREF="node6.html#sec:prerequisites">3.1</A>);
<FONTSIZE="+1"><FONTSIZE="+1"><FONTSIZE="+1">Building MLD2P4 requires some base libraries (see Section <AHREF="node6.html#sec:prerequisites">3.1</A>);
interfaces to optional third-party libraries, which extend the functionalities of MLD2P4
interfaces to optional third-party libraries, which extend the functionalities of MLD2P4
(see Section <AHREF="node7.html#sec:third-party">3.2</A>), are also available. Many Linux distributions
(see Section <AHREF="node7.html#sec:third-party">3.2</A>), are also available. Many Linux distributions
(e.g., Ubuntu, Fedora, CentOS) provide precompiled packages for the prerequisite and
(e.g., Ubuntu, Fedora, CentOS) provide precompiled packages for the prerequisite and
optional software. In many cases these packages are split between a runtime part and a
optional software. In many cases these packages are split between a runtime part and a
``developer'' part; in order to build MLD2P4 you need both. A description of the base and
``developer'' part; in order to build MLD2P4 you need both. A description of the base and
optional software used by MLD2P4 is given in the next sections.
optional software used by MLD2P4 is given in the next sections.