From 92e47008bd73e35a97f429e56fd9d6fc464838bb Mon Sep 17 00:00:00 2001
From: Salvatore Filippone <salvatore.filippone@cranfield.ac.uk>
Date: Tue, 25 Jul 2017 14:19:05 +0100
Subject: [PATCH] Fix from Mac coding to Unix coding.
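
The command used for the conversion is not recorded in this patch. As a
minimal sketch, assuming the affected sources used classic Mac (CR-only)
line endings, a normalization to Unix (LF) endings could look like the
following. The function name `to_unix_newlines` is hypothetical and is not
part of the repository.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Return data with CRLF and bare CR line endings replaced by LF."""
    # Handle CRLF first so that a Windows ending does not become two LFs.
    data = data.replace(b"\r\n", b"\n")
    # Any remaining bare CR is treated as a classic Mac line ending.
    return data.replace(b"\r", b"\n")


if __name__ == "__main__":
    sample = b"Multigrid preconditioners\rcoupled with Krylov\r"
    print(to_unix_newlines(sample))
```

Working on bytes rather than text avoids any assumption about the file
encoding; only the line terminators are touched.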

---
 docs/html/node11.html     |  68 ++++++++-
 docs/html/node12.html     | 127 +++++++++++++----
 docs/html/node13.html     | 154 +++++++++++++++------
 docs/html/node14.html     | 117 +++++++++++-----
 docs/html/node15.html     |   2 +-
 docs/html/node16.html     |   8 +-
 docs/html/node19.html     |  14 +-
 docs/mld2p4-2.1-guide.pdf |   6 +-
 docs/src/background.tex   | 282 +++++++++++++++++++++++++++++++++++++-
 9 files changed, 653 insertions(+), 125 deletions(-)
diff --git a/docs/html/node11.html b/docs/html/node11.html
index 6b586910..28beb567 100644
--- a/docs/html/node11.html
+++ b/docs/html/node11.html
@@ -54,17 +54,75 @@ original version by:  Nikos Drakos, CBLU, University of Leeds
 <H1><A NAME="SECTION00060000000000000000"></A><A NAME="sec:background"></A>
 <BR>
 Multigrid Background
-</H1><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">Multigrid preconditioners, coupled with Krylov iterativesolvers, are widely used in the parallel solution of large and sparse linear systems,because of their optimality in the solution of linear systems arising from thediscretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.Optimality, also known as algorithmic scalability, is the property of having a computational cost per iteration that depends linearly onthe problem size, and a convergence rate that is independent of the problem size.Multigrid preconditioners are based on a recursive application of a two-grid processconsisting of smoother iterations and a coarse-space (or coarse-level) correction.The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,or more complex subspace-correction methods, such as the Schwarz ones.The coarse-space correction consists of solving, in an appropriately chosencoarse space, the residual equation associated with the approximate solution computedby the smoother, and of using the solution of this equation to correct theprevious approximation. The transfer of information between the original(fine) space and the coarse one is performed by using suitable restriction andprolongation operators. The construction of the coarse space and the correspondingtransfer operators is carried out by applying a so-called coarsening algorithm to the systemmatrix. Two main approaches can be used to perform coarsening: the geometric approach,which exploits the knowledge of some physical grid associated with the matrixand requires the user to define transfer operators from the fineto the coarse level and vice versa, and the algebraic approach, which buildsthe coarse-space correction and the associate transfer operators using only matrixinformation. 
The first approach may be difficult when the system comes fromdiscretizations on complex geometries;furthermore, ad hoc one-level smoothers may be required to get an efficientinterplay between fine and coarse levels, e.g., when matrices with highly varying coefficientsare considered. The second approach performs a fully automatic coarsening and enforces theinterplay between fine and coarse level by suitably choosing the coarse space andthe coarse-to-fine interpolation (see, e.g., [<A
+</H1><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1"> 
+
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">Multigrid preconditioners, coupled with Krylov iterative
+solvers, are widely used in the parallel solution of large and sparse linear systems,
+because of their optimality in the solution of linear systems arising from the
+discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.
+Optimality, also known as algorithmic scalability, is the property 
+of having a computational cost per iteration that depends linearly on
+the problem size, and a convergence rate that is independent of the problem size.
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">Multigrid preconditioners are based on a recursive application of a two-grid process
+consisting of smoother iterations and a coarse-space (or coarse-level) correction.
+The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,
+or more complex subspace-correction methods, such as the Schwarz ones.
+The coarse-space correction consists of solving, in an appropriately chosen
+coarse space, the residual equation associated with the approximate solution computed
+by the smoother, and of using the solution of this equation to correct the
+previous approximation. The transfer of information between the original
+(fine) space and the coarse one is performed by using suitable restriction and
+prolongation operators. The construction of the coarse space and the corresponding
+transfer operators is carried out by applying a so-called coarsening algorithm to the system
+matrix. Two main approaches can be used to perform coarsening: the geometric approach,
+which exploits the knowledge of some physical grid associated with the matrix
+and requires the user to define transfer operators from the fine
+to the coarse level and vice versa, and the algebraic approach, which builds
+the coarse-space correction and the associated transfer operators using only matrix
+information. The first approach may be difficult to apply when the system comes from
+discretizations on complex geometries;
+furthermore, ad hoc one-level smoothers may be required to get an efficient
+interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients
+are considered. The second approach performs a fully automatic coarsening and enforces the
+interplay between fine and coarse level by suitably choosing the coarse space and
+the coarse-to-fine interpolation (see, e.g., [<A
  HREF="node29.html#Briggs2000">3</A>,<A
  HREF="node29.html#Stuben_01">23</A>,<A
- HREF="node29.html#dd2_96">21</A>] for details.)MLD2P4 uses a pure algebraic approach, based on the smoothed aggregation algorithm [<A
+ HREF="node29.html#dd2_96">21</A>] for details.)
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">MLD2P4 uses a pure algebraic approach, based on the smoothed 
+aggregation algorithm [<A
  HREF="node29.html#BREZINA_VANEK">2</A>,<A
- HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>],for building the sequence of coarse matrices and transfer operators,starting from the original one.A decoupled version of this algorithm is implemented, where the smoothedaggregation is applied locally to each submatrix [<A
- HREF="node29.html#TUMINARO_TONG">24</A>].A brief description of the AMG preconditioners implemented in MLD2P4 is given in Sections&nbsp;<A HREF="node12.html#sec:multilevel">4.1</A>-<A HREF="node14.html#sec:smoothers">4.3</A>. For further details the readeris referred to [<A
+ HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>],
+for building the sequence of coarse matrices and transfer operators,
+starting from the original one.
+A decoupled version of this algorithm is implemented, where the smoothed
+aggregation is applied locally to each submatrix [<A
+ HREF="node29.html#TUMINARO_TONG">24</A>].
+A brief description of the AMG preconditioners implemented in MLD2P4 is given in 
+Sections&nbsp;<A HREF="node12.html#sec:multilevel">4.1</A>-<A HREF="node14.html#sec:smoothers">4.3</A>. For further details the reader
+is referred to [<A
  HREF="node29.html#para_04">4</A>,<A
  HREF="node29.html#aaecc_07">5</A>,<A
  HREF="node29.html#apnum_07">7</A>,<A
- HREF="node29.html#MLD2P4_TOMS">8</A>].We note that optimal multigrid preconditioners do not necessarily correspondto minimum execution times in a parallel setting. Indeed, to obtain effective parallelmultigrid preconditioners, a tradeoff between the optimality and the cost of building andapplying the smoothers and the coarse-space corrections must be achieved. Effectiveparallel preconditioners require algorithmic scalability to be coupled with implementationscalability, i.e., a computational cost per iteration which remains (almost) constant asthe number of parallel processors increases.</FONT></FONT></FONT>
+ HREF="node29.html#MLD2P4_TOMS">8</A>].
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">We note that optimal multigrid preconditioners do not necessarily correspond
+to minimum execution times in a parallel setting. Indeed, to obtain effective parallel
+multigrid preconditioners, a tradeoff between the optimality and the cost of building and
+applying the smoothers and the coarse-space corrections must be achieved. Effective
+parallel preconditioners require algorithmic scalability to be coupled with implementation
+scalability, i.e., a computational cost per iteration which remains (almost) constant as
+the number of parallel processors increases.
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><HR>
 <!--Table of Child-Links-->
 <A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></A>
diff --git a/docs/html/node12.html b/docs/html/node12.html
index 42fbc14c..9de5d48c 100644
--- a/docs/html/node12.html
+++ b/docs/html/node12.html
@@ -54,7 +54,11 @@ original version by:  Nikos Drakos, CBLU, University of Leeds
 <H2><A NAME="SECTION00061000000000000000"></A><A NAME="sec:multilevel"></A>
 <BR>
 AMG preconditioners
-</H2><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">In order to describe the AMG preconditioners available in MLD2P4, we consider alinear system</FONT></FONT></FONT>
+</H2><FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">In order to describe the AMG preconditioners available in MLD2P4, we consider a
+linear system
+</FONT></FONT></FONT>
 <BR>
 <DIV ALIGN="RIGHT">
 
@@ -73,30 +77,41 @@ Ax=b,
 <TD WIDTH=10 ALIGN="RIGHT">
 (2)</TD></TR>
 </TABLE>
-<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">where <!-- MATH
+<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+where <!-- MATH
  $A=(a_{ij}) \in \mathbb{R}^{n \times n}$
  -->
 <IMG
  WIDTH="137" HEIGHT="38" ALIGN="MIDDLE" BORDER="0"
  SRC="img5.png"
- ALT="$A=(a_{ij}) \in \mathbb{R}^{n \times n}$"> is a nonsingular sparse matrix;for ease of presentation we assume <IMG
+ ALT="$A=(a_{ij}) \in \mathbb{R}^{n \times n}$"> is a nonsingular sparse matrix;
+for ease of presentation we assume <IMG
  WIDTH="18" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img3.png"
- ALT="$A$"> is real, but theresults are valid for the complex case as well. Let us assume as finest index space the set of row (column) indices of <IMG
+ ALT="$A$"> is real, but the
+results are valid for the complex case as well. 
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">Let us assume as finest index space the set of row (column) indices of <IMG
  WIDTH="18" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img3.png"
- ALT="$A$">, i.e.,<!-- MATH
+ ALT="$A$">, i.e.,
+<!-- MATH
  $\Omega = \{1, 2, \ldots, n\}$
  -->
 <IMG
  WIDTH="132" HEIGHT="36" ALIGN="MIDDLE" BORDER="0"
  SRC="img6.png"
- ALT="$\Omega = \{1, 2, \ldots, n\}$">. Any algebraic multilevel preconditioners implemented in MLD2P4 generatesa hierarchy of index spaces and a corresponding hierarchy of matrices,</FONT></FONT></FONT>
+ ALT="$\Omega = \{1, 2, \ldots, n\}$">. 
+Any algebraic multilevel preconditioner implemented in MLD2P4 generates
+a hierarchy of index spaces and a corresponding hierarchy of matrices,
+</FONT></FONT></FONT>
 <BR><P></P>
 <DIV ALIGN="CENTER">
 <!-- MATH
  \begin{displaymath}
-\Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},\quad A^1 \equiv A, A^2, \ldots, A^{nlev},
+\Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},
+\quad A^1 \equiv A, A^2, \ldots, A^{nlev},
 \end{displaymath}
  -->
 
@@ -106,13 +121,16 @@ Ax=b,
  ALT="\begin{displaymath}\Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},\quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \end{displaymath}">
 </DIV>
 <BR CLEAR="ALL">
-<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">by using the information contained in <IMG
+<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+by using the information contained in <IMG
  WIDTH="18" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img3.png"
- ALT="$A$">, without assuming anyknowledge of the geometry of the problem from which <IMG
+ ALT="$A$">, without assuming any
+knowledge of the geometry of the problem from which <IMG
  WIDTH="18" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img3.png"
- ALT="$A$"> originates.A vector space <!-- MATH
+ ALT="$A$"> originates.
+A vector space <!-- MATH
  $\mathbb{R}^{n_{k}}$
  -->
 <IMG
@@ -121,27 +139,32 @@ Ax=b,
  ALT="$\mathbb{R}^{n_{k}}$"> is associated with <IMG
  WIDTH="25" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img9.png"
- ALT="$\Omega^k$">,where <IMG
+ ALT="$\Omega^k$">,
+where <IMG
  WIDTH="23" HEIGHT="31" ALIGN="MIDDLE" BORDER="0"
  SRC="img10.png"
  ALT="$n_k$"> is the size of <IMG
  WIDTH="25" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img9.png"
- ALT="$\Omega^k$">.For all <IMG
+ ALT="$\Omega^k$">.
+For all <IMG
  WIDTH="71" HEIGHT="34" ALIGN="MIDDLE" BORDER="0"
  SRC="img11.png"
- ALT="$k &lt; nlev$">, a restriction operator and a prolongation one are built,which connect two levels <IMG
+ ALT="$k &lt; nlev$">, a restriction operator and a prolongation one are built,
+which connect two levels <IMG
  WIDTH="14" HEIGHT="16" ALIGN="BOTTOM" BORDER="0"
  SRC="img12.png"
  ALT="$k$"> and <IMG
  WIDTH="44" HEIGHT="34" ALIGN="MIDDLE" BORDER="0"
  SRC="img13.png"
- ALT="$k+1$">:</FONT></FONT></FONT>
+ ALT="$k+1$">:
+</FONT></FONT></FONT>
 <BR><P></P>
 <DIV ALIGN="CENTER">
 <!-- MATH
  \begin{displaymath}
-P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad     R^k \in \mathbb{R}^{n_{k+1}\times n_k};
+P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad 
+    R^k \in \mathbb{R}^{n_{k+1}\times n_k};
 \end{displaymath}
  -->
 
@@ -151,10 +174,13 @@ P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad     R^k \in \mathbb{R}^{n_{k+1}\
  ALT="\begin{displaymath} P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad  R^k \in \mathbb{R}^{n_{k+1}\times n_k};\end{displaymath}">
 </DIV>
 <BR CLEAR="ALL">
-<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">the matrix <IMG
+<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+the matrix <IMG
  WIDTH="43" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img15.png"
- ALT="$A^{k+1}$"> is computed by using the previous operators accordingto the Galerkin approach, i.e.,</FONT></FONT></FONT>
+ ALT="$A^{k+1}$"> is computed by using the previous operators according
+to the Galerkin approach, i.e.,
+</FONT></FONT></FONT>
 <BR><P></P>
 <DIV ALIGN="CENTER">
 <!-- MATH
@@ -169,25 +195,34 @@ A^{k+1}=R^kA^kP^k.
  ALT="\begin{displaymath} A^{k+1}=R^kA^kP^k.\end{displaymath}">
 </DIV>
 <BR CLEAR="ALL">
-<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">In the current implementation of MLD2P4 we have <IMG
+<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+In the current implementation of MLD2P4 we have <IMG
  WIDTH="95" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img17.png"
- ALT="$R^k=(P^k)^T$">A smoother with iteration matrix <IMG
+ ALT="$R^k=(P^k)^T$">.
+A smoother with iteration matrix <IMG
  WIDTH="32" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img18.png"
  ALT="$M^k$"> is set up at each level <IMG
  WIDTH="71" HEIGHT="34" ALIGN="MIDDLE" BORDER="0"
  SRC="img11.png"
- ALT="$k &lt; nlev$">, and a solveris set up at the coarsest level, so that they are ready for application (for example, setting up a solver based on the <IMG
+ ALT="$k &lt; nlev$">, and a solver
+is set up at the coarsest level, so that they are ready for application 
+(for example, setting up a solver based on the <IMG
  WIDTH="30" HEIGHT="16" ALIGN="BOTTOM" BORDER="0"
  SRC="img19.png"
- ALT="$LU$"> factorization means computingand storing the <IMG
+ ALT="$LU$"> factorization means computing
+and storing the <IMG
  WIDTH="17" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img20.png"
  ALT="$L$"> and <IMG
  WIDTH="18" HEIGHT="16" ALIGN="BOTTOM" BORDER="0"
  SRC="img21.png"
- ALT="$U$"> factors). The construction of the hierarchy of AMG componentsdescribed so far corresponds to the so-called build phase of the preconditioner.</FONT></FONT></FONT>
+ ALT="$U$"> factors). The construction of the hierarchy of AMG components
+described so far corresponds to the so-called build phase of the preconditioner.
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <DIV ALIGN="CENTER"><A NAME="fig:application_alg"></A><A NAME="517"></A>
 <TABLE>
 <CAPTION ALIGN="BOTTOM"><STRONG>Figure 1:</STRONG>
@@ -195,26 +230,60 @@ Application phase of a V-cycle preconditioner.</CAPTION>
 <TR><TD>
 <DIV ALIGN="CENTER">
 <!-- MATH
- $\framebox{\begin{minipage}{.85\textwidth}\begin{tabbing}\quad \=\quad \=\quad \=\quad \\[-3mm]procedure V-cycle$\left(k,A^k,b^k,u^k\right)$\  \\[2mm]\>if $\left(k \ne nlev \right)$\  then \\[1mm]\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$\  \\[1mm]\>\> $b^{k+1} = R^{k+1}\left(b^k - A^k u^k\right)$\  \\[1mm]\>\> $u^{k+1} =$\  V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$\  \\[1mm]\>\> $u^k = u^k + P^{k+1} u^{k+1}$\  \\[1mm]\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$\  \\[1mm]\>else \\[1mm]\>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm]\>endif \\[1mm]\>return $u^k$\  \\[1mm]end\end{tabbing}\end{minipage}}$
+ $\framebox{
+\begin{minipage}{.85\textwidth}
+\begin{tabbing}
+\quad \=\quad \=\quad \=\quad \\[-3mm]
+procedure V-cycle$\left(k,A^k,b^k,u^k\right)$\  \\[2mm]
+\>if $\left(k \ne nlev \right)$\  then \\[1mm]
+\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$\  \\[1mm]
+\>\> $b^{k+1} = R^k\left(b^k - A^k u^k\right)$\  \\[1mm]
+\>\> $u^{k+1} =$\  V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$\  \\[1mm]
+\>\> $u^k = u^k + P^k u^{k+1}$\  \\[1mm]
+\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$\  \\[1mm]
+\>else \\[1mm]
+\>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm]
+\>endif \\[1mm]
+\>return $u^k$\  \\[1mm]
+end
+\end{tabbing}
+\end{minipage}
+}$
  -->
 <IMG
  WIDTH="333" HEIGHT="336" ALIGN="BOTTOM" BORDER="0"
  SRC="img22.png"
  ALT="\framebox{\begin{minipage}{.85\textwidth}\begin{tabbing}\quad \=\quad \=\quad...
-...mm]\&gt;endif \\ [1mm]\&gt;return $u^k$\ \\ [1mm]end\end{tabbing}\end{minipage}}">
+...mm]\&gt;endif \ [1mm]\&gt;return $u^k$ \ [1mm]end\end{tabbing}\end{minipage}}">
+
 </DIV></TD></TR>
 </TABLE>
 </DIV>
-<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">The components produced in the build phase may be combined in several waysto obtain different multilevel preconditioners;this is  done in the application phase, i.e., in the computation of a vectorof type <IMG
+<FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">The components produced in the build phase may be combined in several ways
+to obtain different multilevel preconditioners;
+this is  done in the application phase, i.e., in the computation of a vector
+of type <IMG
  WIDTH="82" HEIGHT="21" ALIGN="BOTTOM" BORDER="0"
  SRC="img23.png"
  ALT="$w=B^{-1}v$">, where <IMG
  WIDTH="19" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img24.png"
- ALT="$B$"> denotes the preconditioner, usually within an iterationof a Krylov solver [<A
- HREF="node29.html#Saad_book">20</A>]. An example of such a combination, known asV-cycle, is given in Figure&nbsp;<A HREF="#fig:application_alg">1</A>. In this case, a single iterationof the same smoother is used before and after the the recursive call to the V-cycle (i.e.,in the pre-smoothing and post-smoothing phases); however, different choices can beperformed. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycleand W-cycle&nbsp;[<A
- HREF="node29.html#Briggs2000">3</A>], and a version of the K-cycle describedin&nbsp;[<A
- HREF="node29.html#Notay2008">19</A>].  </FONT></FONT></FONT><HR>
+ ALT="$B$"> denotes the preconditioner, usually within an iteration
+of a Krylov solver [<A
+ HREF="node29.html#Saad_book">20</A>]. An example of such a combination, known as
+V-cycle, is given in Figure&nbsp;<A HREF="#fig:application_alg">1</A>. In this case, a single iteration
+of the same smoother is used before and after the recursive call to the V-cycle (i.e.,
+in the pre-smoothing and post-smoothing phases); however, different choices can be
+made. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle
+and W-cycle&nbsp;[<A
+ HREF="node29.html#Briggs2000">3</A>], and a version of the K-cycle described
+in&nbsp;[<A
+ HREF="node29.html#Notay2008">19</A>].  
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT><HR>
 <!--Navigation Panel-->
 <A NAME="tex2html229"
   HREF="node13.html">
diff --git a/docs/html/node13.html b/docs/html/node13.html
index f871dd6c..2e779069 100644
--- a/docs/html/node13.html
+++ b/docs/html/node13.html
@@ -54,25 +54,37 @@ original version by:  Nikos Drakos, CBLU, University of Leeds
 <H2><A NAME="SECTION00062000000000000000"></A><A NAME="sec:aggregation"></A>
 <BR>
 Smoothed Aggregation
-</H2><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">In order to define the prolongator <IMG
+</H2><FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">In order to define the prolongator <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img25.png"
- ALT="$P^k$">, used to computethe coarse-level matrix <IMG
+ ALT="$P^k$">, used to compute
+the coarse-level matrix <IMG
  WIDTH="43" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img15.png"
- ALT="$A^{k+1}$">, MLD2P4 uses the smoothed aggregationalgorithm described in [<A
+ ALT="$A^{k+1}$">, MLD2P4 uses the smoothed aggregation
+algorithm described in [<A
  HREF="node29.html#BREZINA_VANEK">2</A>,<A
- HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>].The basic idea of this algorithm is to build a coarse set of indices<IMG
+ HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>].
+The basic idea of this algorithm is to build a coarse set of indices
+<IMG
  WIDTH="43" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img26.png"
  ALT="$\Omega^{k+1}$"> by suitably grouping the indices of <IMG
  WIDTH="25" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img9.png"
- ALT="$\Omega^k$"> into disjointsubsets (aggregates), and to define the coarse-to-fine space transfer operator<IMG
+ ALT="$\Omega^k$"> into disjoint
+subsets (aggregates), and to define the coarse-to-fine space transfer operator
+<IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img25.png"
- ALT="$P^k$"> by applying a suitable smoother to a simple piecewise constantprolongation operator, with the aim of improving the quality of the coarse-space correction.Three main steps can be identified in the smoothed aggregation procedure:</FONT></FONT></FONT>
-
+ ALT="$P^k$"> by applying a suitable smoother to a simple piecewise constant
+prolongation operator, with the aim of improving the quality of the coarse-space correction.
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">Three main steps can be identified in the smoothed aggregation procedure:
+</FONT></FONT></FONT>
 <OL>
 <LI>aggregation of the indices of <IMG
  WIDTH="25" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
@@ -80,12 +92,12 @@ Smoothed Aggregation
  ALT="$\Omega^k$"> to obtain <IMG
  WIDTH="43" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img26.png"
- ALT="$\Omega^{k+1}$">;
+ ALT="$\Omega^{k+1}$">;
 </LI>
 <LI>construction of the prolongator <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img25.png"
- ALT="$P^k$">;
+ ALT="$P^k$">;
 </LI>
 <LI>application of <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
@@ -96,10 +108,14 @@ Smoothed Aggregation
  ALT="$R^k=(P^k)^T$"> to build <IMG
  WIDTH="43" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img15.png"
- ALT="$A^{k+1}$">.
+ ALT="$A^{k+1}$">.
 </LI>
-</OL><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1"> In order to perform the coarsening step, the smoothed aggregation algorithmdescribed in&nbsp;[<A
- HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>] is used. In this algorithm,each index <!-- MATH
+</OL><FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">In order to perform the coarsening step, the smoothed aggregation algorithm
+described in&nbsp;[<A
+ HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>] is used. In this algorithm,
+each index <!-- MATH
  $j \in \Omega^{k+1}$
  -->
 <IMG
@@ -111,22 +127,26 @@ Smoothed Aggregation
  ALT="$\Omega^k_j$"> of <IMG
  WIDTH="25" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img9.png"
- ALT="$\Omega^k$">,consisting of a suitably chosen index <!-- MATH
+ ALT="$\Omega^k$">,
+consisting of a suitably chosen index <!-- MATH
  $i \in \Omega^k$
  -->
 <IMG
  WIDTH="52" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img29.png"
- ALT="$i \in \Omega^k$"> and indices that are (usually) contained in astrongly-coupled neighborood of <IMG
+ ALT="$i \in \Omega^k$"> and indices that are (usually) contained in a
+strongly-coupled neighborhood of <IMG
  WIDTH="11" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img30.png"
- ALT="$i$">, i.e.,</FONT></FONT></FONT>
+ ALT="$i$">, i.e.,
+</FONT></FONT></FONT>
 <BR>
 <DIV ALIGN="RIGHT">
 
 <!-- MATH
  \begin{equation}
-\Omega^k_j \subset \mathcal{N}_i^k(\theta) =    \left\{ r \in \Omega^k: |a_{ir}^k| > \theta \sqrt{|a_{ii}^ka_{rr}^k|} \right \} \cup \left\{ i \right\},
+\Omega^k_j \subset \mathcal{N}_i^k(\theta) = 
+   \left\{ r \in \Omega^k: |a_{ir}^k| > \theta \sqrt{|a_{ii}^ka_{rr}^k|} \right \} \cup \left\{ i \right\},
 \end{equation}
  -->
 <TABLE WIDTH="100%" ALIGN="CENTER">
@@ -139,35 +159,54 @@ Smoothed Aggregation
 <TD WIDTH=10 ALIGN="RIGHT">
 (3)</TD></TR>
 </TABLE>
-<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">for a given threshold <!-- MATH
+<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+for a given threshold <!-- MATH
  $\theta \in [0,1]$
  -->
 <IMG
  WIDTH="69" HEIGHT="36" ALIGN="MIDDLE" BORDER="0"
  SRC="img32.png"
  ALT="$\theta \in [0,1]$"> (see&nbsp;[<A
- HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>] for the details).Since this algorithm has a sequential nature, a decoupledversion of it is applied, where each processor independently executesthe algorithm on the set of indices assigned to it in the initial datadistribution. This version is embarrassingly parallel, since it does not require any data communication. On the other hand, it may produce some nonuniform aggregatesand is strongly dependent on the number of processors and on the initial partitioningof the matrix <IMG
+ HREF="node29.html#VANEK_MANDEL_BREZINA">25</A>] for the details).
+Since this algorithm has a sequential nature, a decoupled
+version of it is applied, where each processor independently executes
+the algorithm on the set of indices assigned to it in the initial data
+distribution. This version is embarrassingly parallel, since it does not require any data 
+communication. On the other hand, it may produce some nonuniform aggregates
+and is strongly dependent on the number of processors and on the initial partitioning
+of the matrix <IMG
  WIDTH="18" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img3.png"
- ALT="$A$">. Nevertheless, this parallel algorithm has been chosen forMLD2P4, since it has been shown to produce good results in practice[<A
+ ALT="$A$">. Nevertheless, this parallel algorithm has been chosen for
+MLD2P4, since it has been shown to produce good results in practice
+[<A
  HREF="node29.html#aaecc_07">5</A>,<A
  HREF="node29.html#apnum_07">7</A>,<A
- HREF="node29.html#TUMINARO_TONG">24</A>].The prolongator <IMG
+ HREF="node29.html#TUMINARO_TONG">24</A>].
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">The prolongator <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img25.png"
- ALT="$P^k$"> is built starting from a tentative prolongator<!-- MATH
+ ALT="$P^k$"> is built starting from a tentative prolongator
+<!-- MATH
  $\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$
  -->
 <IMG
  WIDTH="117" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img33.png"
- ALT="$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$">, defined as</FONT></FONT></FONT>
+ ALT="$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$">, defined as
+</FONT></FONT></FONT>
 <BR>
 <DIV ALIGN="RIGHT">
 
 <!-- MATH
  \begin{equation}
-\bar{P}^k =(\bar{p}_{ij}^k), \quad  \bar{p}_{ij}^k = \left\{ \begin{array}{ll}1 & \quad \mbox{if} \; i \in \Omega^k_j, \\0 & \quad \mbox{otherwise},\end{array} \right.
+\bar{P}^k =(\bar{p}_{ij}^k), \quad  \bar{p}_{ij}^k = 
+\left\{ \begin{array}{ll}
+1 & \quad \mbox{if} \; i \in \Omega^k_j, \\
+0 & \quad \mbox{otherwise},
+\end{array} \right.
 \end{equation}
  -->
 <TABLE WIDTH="100%" ALIGN="CENTER">
@@ -175,36 +214,41 @@ Smoothed Aggregation
  WIDTH="287" HEIGHT="51" BORDER="0"
  SRC="img34.png"
  ALT="\begin{displaymath}\bar{P}^k =(\bar{p}_{ij}^k), \quad \bar{p}_{ij}^k = \left\{...
-...ega^k_j, \\ 0 &amp; \quad \mbox{otherwise},\end{array} \right.
+...ega^k_j, \ 0 &amp; \quad \mbox{otherwise},\end{array} \right.
 \end{displaymath}"></TD>
 <TD WIDTH=10 ALIGN="RIGHT">
 (4)</TD></TR>
 </TABLE>
-<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">where <IMG
+<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+where <IMG
  WIDTH="25" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img28.png"
  ALT="$\Omega^k_j$"> is the aggregate of <IMG
  WIDTH="25" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img9.png"
- ALT="$\Omega^k$">corresponding to the index <!-- MATH
+ ALT="$\Omega^k$">
+corresponding to the index <!-- MATH
  $j \in \Omega^{k+1}$
  -->
 <IMG
  WIDTH="72" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img27.png"
- ALT="$j \in \Omega^{k+1}$">.<IMG
+ ALT="$j \in \Omega^{k+1}$">.
+<IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img25.png"
  ALT="$P^k$"> is obtained by applying to <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img35.png"
- ALT="$\bar{P}^k$"> a smoother<!-- MATH
+ ALT="$\bar{P}^k$"> a smoother
+<!-- MATH
  $S^k \in \mathbb{R}^{n_k \times n_k}$
  -->
 <IMG
  WIDTH="101" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img36.png"
- ALT="$S^k \in \mathbb{R}^{n_k \times n_k}$">:</FONT></FONT></FONT>
+ ALT="$S^k \in \mathbb{R}^{n_k \times n_k}$">:
+</FONT></FONT></FONT>
 <BR><P></P>
 <DIV ALIGN="CENTER">
 <!-- MATH
@@ -219,12 +263,17 @@ P^k = S^k \bar{P}^k,
  ALT="\begin{displaymath}P^k = S^k \bar{P}^k,\end{displaymath}">
 </DIV>
 <BR CLEAR="ALL">
-<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">in order to remove nonsmooth components from the range of the prolongator,and hence to improve the convergence properties of the multi-levelmethod&nbsp;[<A
+<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+in order to remove nonsmooth components from the range of the prolongator,
+and hence to improve the convergence properties of the multi-level
+method&nbsp;[<A
  HREF="node29.html#BREZINA_VANEK">2</A>,<A
- HREF="node29.html#Stuben_01">23</A>].A simple choice for <IMG
+ HREF="node29.html#Stuben_01">23</A>].
+A simple choice for <IMG
  WIDTH="25" HEIGHT="19" ALIGN="BOTTOM" BORDER="0"
  SRC="img38.png"
- ALT="$S^k$"> is the damped Jacobi smoother:</FONT></FONT></FONT>
+ ALT="$S^k$"> is the damped Jacobi smoother:
+</FONT></FONT></FONT>
 <BR><P></P>
 <DIV ALIGN="CENTER">
 <!-- MATH
@@ -239,25 +288,35 @@ S^k = I - \omega^k (D^k)^{-1} A^k_F ,
  ALT="\begin{displaymath}S^k = I - \omega^k (D^k)^{-1} A^k_F , \end{displaymath}">
 </DIV>
 <BR CLEAR="ALL">
-<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">where <IMG
+<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+where <IMG
  WIDTH="28" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img40.png"
  ALT="$D^k$"> is the diagonal matrix with the same diagonal entries as <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img41.png"
- ALT="$A^k$">,<!-- MATH
+ ALT="$A^k$">,
+<!-- MATH
  $A^k_F = (\bar{a}_{ij}^k)$
  -->
 <IMG
  WIDTH="87" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img42.png"
- ALT="$A^k_F = (\bar{a}_{ij}^k)$"> is the filtered matrix defined as</FONT></FONT></FONT>
+ ALT="$A^k_F = (\bar{a}_{ij}^k)$"> is the filtered matrix defined as
+</FONT></FONT></FONT>
 <BR>
 <DIV ALIGN="RIGHT">
 
 <!-- MATH
  \begin{equation}
-\bar{a}_{ij}^k =   \left \{ \begin{array}{ll}   a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\   0            & \mbox{otherwise},   \end{array} \right.   \; (j \ne i),   \qquad   \bar{a}_{ii}^k = a_{ii}^k - \sum_{j \ne i} (a_{ij}^k - \bar{a}_{ij}^k),
+\bar{a}_{ij}^k =
+   \left \{ \begin{array}{ll}
+   a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\
+   0            & \mbox{otherwise},
+   \end{array} \right.
+   \; (j \ne i),
+   \qquad
+   \bar{a}_{ii}^k = a_{ii}^k - \sum_{j \ne i} (a_{ij}^k - \bar{a}_{ij}^k),
 \end{equation}
  -->
 <TABLE WIDTH="100%" ALIGN="CENTER">
@@ -270,13 +329,15 @@ S^k = I - \omega^k (D^k)^{-1} A^k_F ,
 <TD WIDTH=10 ALIGN="RIGHT">
 (5)</TD></TR>
 </TABLE>
-<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">and <IMG
+<BR CLEAR="ALL"></DIV><P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+and <IMG
  WIDTH="24" HEIGHT="19" ALIGN="BOTTOM" BORDER="0"
  SRC="img44.png"
  ALT="$\omega^k$"> is an approximation of <IMG
  WIDTH="61" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img45.png"
- ALT="$4/(3\rho^k)$">, where<IMG
+ ALT="$4/(3\rho^k)$">, where
+<IMG
  WIDTH="22" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img46.png"
  ALT="$\rho^k$"> is the spectral radius of <!-- MATH
@@ -286,25 +347,32 @@ S^k = I - \omega^k (D^k)^{-1} A^k_F ,
  WIDTH="83" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img47.png"
  ALT="$(D^k)^{-1}A^k_F$"> [<A
- HREF="node29.html#BREZINA_VANEK">2</A>].In MLD2P4 this approximation is obtained by using <!-- MATH
+ HREF="node29.html#BREZINA_VANEK">2</A>].
+In MLD2P4 this approximation is obtained by using <!-- MATH
  $\| A^k_F \|_\infty$
  -->
 <IMG
  WIDTH="61" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img48.png"
- ALT="$\Vert A^k_F \Vert _\infty$"> as an estimateof <IMG
+ ALT="$\Vert A^k_F \Vert _\infty$"> as an estimate
+of <IMG
  WIDTH="22" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img46.png"
- ALT="$\rho^k$">. Note that for systems coming from uniformly ellipticproblems, filtering the matrix <IMG
+ ALT="$\rho^k$">. Note that for systems coming from uniformly elliptic
+problems, filtering the matrix <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img41.png"
- ALT="$A^k$"> has little or no effect, and<IMG
+ ALT="$A^k$"> has little or no effect, and
+<IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img41.png"
  ALT="$A^k$"> can be used instead of <IMG
  WIDTH="29" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img49.png"
- ALT="$A^k_F$">. The latter choice is the default in MLD2P4.</FONT></FONT></FONT><HR>
+ ALT="$A^k_F$">. The latter choice is the default in MLD2P4.
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT><HR>
 <!--Navigation Panel-->
 <A NAME="tex2html241"
   HREF="node14.html">
diff --git a/docs/html/node14.html b/docs/html/node14.html
index 71b7a474..25220388 100644
--- a/docs/html/node14.html
+++ b/docs/html/node14.html
@@ -53,30 +53,49 @@ original version by:  Nikos Drakos, CBLU, University of Leeds
 <H2><A NAME="SECTION00063000000000000000"></A><A NAME="sec:smoothers"></A>
 <BR>
 Smoothers and coarsest-level solvers
-</H2><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,a hybrid version of the forward and backward Gauss-Seidel methods, and theadditive Schwarz (AS) ones (see, e.g., [<A
+</H2><FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,
+a hybrid version of the forward and backward Gauss-Seidel methods, and the
+additive Schwarz (AS) ones (see, e.g., [<A
  HREF="node29.html#Saad_book">20</A>,<A
- HREF="node29.html#dd2_96">21</A>]). The hybrid Gauss-Seidelversion is considered because the original Gauss-Seidel method is inherently sequential.At each iteration of the hybrid version, each parallel process uses the most recent valuesof its own local variables and the values of the non-local variables computed at theprevious iteration, obtained by exchanging data with other processes beforethe beginning of the current iteration.In the AS methods, the index space <IMG
+ HREF="node29.html#dd2_96">21</A>]). 
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">The hybrid Gauss-Seidel
+version is considered because the original Gauss-Seidel method is inherently sequential.
+At each iteration of the hybrid version, each parallel process uses the most recent values
+of its own local variables and the values of the non-local variables computed at the
+previous iteration, obtained by exchanging data with other processes before
+the beginning of the current iteration.
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">In the AS methods, the index space <IMG
  WIDTH="25" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img9.png"
  ALT="$\Omega^k$"> is divided into <IMG
  WIDTH="28" HEIGHT="31" ALIGN="MIDDLE" BORDER="0"
  SRC="img50.png"
- ALT="$m_k$">subsets <IMG
+ ALT="$m_k$">
+subsets <IMG
  WIDTH="25" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img51.png"
  ALT="$\Omega^k_i$"> of size <IMG
  WIDTH="32" HEIGHT="31" ALIGN="MIDDLE" BORDER="0"
  SRC="img52.png"
- ALT="$n_{k,i}$">,  possiblyoverlapping. For each <IMG
+ ALT="$n_{k,i}$">, possibly
+overlapping. For each <IMG
  WIDTH="11" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img30.png"
- ALT="$i$"> we consider the restrictionoperator <!-- MATH
+ ALT="$i$"> we consider the restriction
+operator <!-- MATH
  $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$
  -->
 <IMG
  WIDTH="110" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img53.png"
- ALT="$R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$">that maps a vector <IMG
+ ALT="$R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$">
+that maps a vector <IMG
  WIDTH="23" HEIGHT="19" ALIGN="BOTTOM" BORDER="0"
  SRC="img54.png"
  ALT="$x^k$"> to the vector <IMG
@@ -85,16 +104,19 @@ Smoothers and coarsest-level solvers
  ALT="$x_i^k$"> made of the components of <IMG
  WIDTH="23" HEIGHT="19" ALIGN="BOTTOM" BORDER="0"
  SRC="img54.png"
- ALT="$x^k$">with indices in <IMG
+ ALT="$x^k$">
+with indices in <IMG
  WIDTH="25" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img51.png"
- ALT="$\Omega^k_i$">, and the prolongation operator<!-- MATH
+ ALT="$\Omega^k_i$">, and the prolongation operator
+<!-- MATH
  $P^k_i = (R_i^k)^T$
  -->
 <IMG
  WIDTH="95" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img56.png"
- ALT="$P^k_i = (R_i^k)^T$">. These operators are then  used to build<!-- MATH
+ ALT="$P^k_i = (R_i^k)^T$">. These operators are then used to build
+<!-- MATH
  $A_i^k=R_i^kA^kP_i^k$
  -->
 <IMG
@@ -103,13 +125,16 @@ Smoothers and coarsest-level solvers
  ALT="$A_i^k=R_i^kA^kP_i^k$">, which is the restriction of <IMG
  WIDTH="26" HEIGHT="18" ALIGN="BOTTOM" BORDER="0"
  SRC="img41.png"
- ALT="$A^k$"> to the indexspace <IMG
+ ALT="$A^k$"> to the index
+space <IMG
  WIDTH="25" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img51.png"
- ALT="$\Omega^k_i$">.The classical AS preconditioner <IMG
+ ALT="$\Omega^k_i$">.
+The classical AS preconditioner <IMG
  WIDTH="41" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img58.png"
- ALT="$M^k_{AS}$"> is defined as</FONT></FONT></FONT>
+ ALT="$M^k_{AS}$"> is defined as
+</FONT></FONT></FONT>
 <BR><P></P>
 <DIV ALIGN="CENTER">
 <!-- MATH
@@ -124,40 +149,47 @@ Smoothers and coarsest-level solvers
  ALT="\begin{displaymath} ( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k},\end{displaymath}">
 </DIV>
 <BR CLEAR="ALL">
-<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">where <IMG
+<P></P><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+where <IMG
  WIDTH="26" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img60.png"
- ALT="$A_i^k$"> is supposed to be nonsingular. We observe that an approximateinverse of <IMG
+ ALT="$A_i^k$"> is assumed to be nonsingular. We observe that an approximate
+inverse of <IMG
  WIDTH="26" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img60.png"
  ALT="$A_i^k$"> is usually considered instead of <IMG
  WIDTH="57" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img61.png"
- ALT="$(A_i^k)^{-1}$">.The setup of <IMG
+ ALT="$(A_i^k)^{-1}$">.
+The setup of <IMG
  WIDTH="41" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img58.png"
- ALT="$M^k_{AS}$"> during the multilevel build phaseinvolves</FONT></FONT></FONT>
-
+ ALT="$M^k_{AS}$"> during the multilevel build phase
+involves
+</FONT></FONT></FONT>
 <UL>
 <LI>the definition of the index subspaces <IMG
  WIDTH="25" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img62.png"
- ALT="$\Omega_i^k$"> and of the corresponding   operators <IMG
+ ALT="$\Omega_i^k$"> and of the corresponding 
+  operators <IMG
  WIDTH="26" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img63.png"
  ALT="$R_i^k$"> (and <IMG
  WIDTH="26" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img64.png"
- ALT="$P_i^k$">);
+ ALT="$P_i^k$">);
 </LI>
 <LI>the computation of the submatrices <IMG
  WIDTH="26" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img60.png"
- ALT="$A_i^k$">;
+ ALT="$A_i^k$">;
 </LI>
-<LI>the computation of their inverses (usually approximated    through  some form  of incomplete factorization).
+<LI>the computation of their inverses (usually approximated
+    through some form of incomplete factorization).
 </LI>
-</UL><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">The computation of <!-- MATH
+</UL><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+The computation of <!-- MATH
  $z^k=M^k_{AS}w^k$
  -->
 <IMG
@@ -169,8 +201,9 @@ Smoothers and coarsest-level solvers
 <IMG
  WIDTH="76" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img66.png"
- ALT="$w^k \in \mathbb{R}^{n_k}$">, during themultilevel application phase, requires</FONT></FONT></FONT>
-
+ ALT="$w^k \in \mathbb{R}^{n_k}$">, during the
+multilevel application phase, requires
+</FONT></FONT></FONT>
 <UL>
 <LI>the restriction of <IMG
  WIDTH="25" HEIGHT="19" ALIGN="BOTTOM" BORDER="0"
@@ -181,13 +214,14 @@ Smoothers and coarsest-level solvers
 <IMG
  WIDTH="41" HEIGHT="15" ALIGN="BOTTOM" BORDER="0"
  SRC="img68.png"
- ALT="$\mathbb{R}^{n_{k,i}}$">,	  i.e. <!-- MATH
+ ALT="$\mathbb{R}^{n_{k,i}}$">,
+	  i.e. <!-- MATH
  $w_i^k = R_i^{k} w^k$
  -->
 <IMG
  WIDTH="91" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img69.png"
- ALT="$w_i^k = R_i^{k} w^k$">;
+ ALT="$w_i^k = R_i^{k} w^k$">;
 </LI>
 <LI>the computation of the vectors <!-- MATH
  $z_i^k=(A_i^k)^{-1} w_i^k$
@@ -195,19 +229,38 @@ Smoothers and coarsest-level solvers
 <IMG
  WIDTH="119" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img70.png"
- ALT="$z_i^k=(A_i^k)^{-1} w_i^k$">;
+ ALT="$z_i^k=(A_i^k)^{-1} w_i^k$">;
 </LI>
-<LI>the prolongation and the sum of the previous vectors,    i.e. <!-- MATH
+<LI>the prolongation and the sum of the previous vectors,
+    i.e. <!-- MATH
  $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$
  -->
 <IMG
  WIDTH="127" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
  SRC="img71.png"
- ALT="$z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$">.
+ ALT="$z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$">.
 </LI>
-</UL><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">Variants of the classical AS method, which use modifications of therestriction and prolongation operators, are also implemented in MLD2P4.Among them, the Restricted AS (RAS) preconditioner usuallyoutperforms the classical AS preconditioner in terms of convergencerate and of computation and communication time on parallel distributed-memorycomputers, and is therefore the most widely used among the ASpreconditioners&nbsp;[<A
- HREF="node29.html#CAI_SARKIS">6</A>]. Direct solvers based on sparse LU factorizations, implemented in thethird-party libraries reported in Section&nbsp;<A HREF="node7.html#sec:third-party">3.2</A>, can be appliedas coarsest-level solvers by MLD2P4. Native inexact solvers based onincomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,and block Jacobi preconditioners are also available. Direct solvers usuallylead to more effective preconditioners in terms of algorithmic scalability;however, this does not guarantee parallel efficiency.
-</FONT></FONT></FONT><HR>
+</UL><FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">
+Variants of the classical AS method, which use modifications of the
+restriction and prolongation operators, are also implemented in MLD2P4.
+Among them, the Restricted AS (RAS) preconditioner usually
+outperforms the classical AS preconditioner in terms of convergence
+rate and of computation and communication time on parallel distributed-memory
+computers, and is therefore the most widely used among the AS
+preconditioners&nbsp;[<A
+ HREF="node29.html#CAI_SARKIS">6</A>]. 
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1">Direct solvers based on sparse LU factorizations, implemented in the
+third-party libraries reported in Section&nbsp;<A HREF="node7.html#sec:third-party">3.2</A>, can be applied
+as coarsest-level solvers by MLD2P4. Native inexact solvers based on
+incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,
+and block Jacobi preconditioners are also available. Direct solvers usually
+lead to more effective preconditioners in terms of algorithmic scalability;
+however, this does not guarantee parallel efficiency.
+</FONT></FONT></FONT>
+<P>
+<FONT SIZE="+1"><FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT></FONT><HR>
 <!--Navigation Panel-->
 <A NAME="tex2html251"
   HREF="node15.html">
diff --git a/docs/html/node15.html b/docs/html/node15.html
index ca695c21..c1007b17 100644
--- a/docs/html/node15.html
+++ b/docs/html/node15.html
@@ -122,7 +122,7 @@ Examples showing the basic use of MLD2P4 are reported in Section&nbsp;<A HREF="n
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="857"></A>
+<DIV ALIGN="CENTER"><A NAME="897"></A>
 <TABLE>
 <CAPTION><STRONG>Table 1:</STRONG>
 Preconditioner types, corresponding strings and default choices.
diff --git a/docs/html/node16.html b/docs/html/node16.html
index e4e69d4a..9d368277 100644
--- a/docs/html/node16.html
+++ b/docs/html/node16.html
@@ -88,7 +88,7 @@ the corresponding codes are available in <code>examples/fileread/</code>.
 </FONT></FONT></FONT>
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
-<DIV ALIGN="CENTER"><A NAME="fig:ex1"></A><A NAME="860"></A>
+<DIV ALIGN="CENTER"><A NAME="fig:ex1"></A><A NAME="900"></A>
 <TABLE>
 <CAPTION ALIGN="BOTTOM"><STRONG>Figure 2:</STRONG>
 setup and application of the default multi-level preconditioner (example 1).
@@ -194,7 +194,7 @@ boundary conditions are also available in the directory <code>examples/pdegen</c
 </FONT></FONT></FONT>
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
-<DIV ALIGN="CENTER"><A NAME="fig:ex2"></A><A NAME="862"></A>
+<DIV ALIGN="CENTER"><A NAME="fig:ex2"></A><A NAME="902"></A>
 <TABLE>
 <CAPTION ALIGN="BOTTOM"><STRONG>Figure 3:</STRONG>
 setup of a multi-level preconditioner</CAPTION>
@@ -227,7 +227,7 @@ setup of a multi-level preconditioner</CAPTION>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
-<DIV ALIGN="CENTER"><A NAME="fig:ex3"></A><A NAME="864"></A>
+<DIV ALIGN="CENTER"><A NAME="fig:ex3"></A><A NAME="904"></A>
 <TABLE>
 <CAPTION ALIGN="BOTTOM"><STRONG>Figure 4:</STRONG>
 setup of a multi-level preconditioner</CAPTION>
@@ -260,7 +260,7 @@ setup of a multi-level preconditioner</CAPTION>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
-<DIV ALIGN="CENTER"><A NAME="fig:ex4"></A><A NAME="866"></A>
+<DIV ALIGN="CENTER"><A NAME="fig:ex4"></A><A NAME="906"></A>
 <TABLE>
 <CAPTION ALIGN="BOTTOM"><STRONG>Figure 5:</STRONG>
 setup of a one-level Schwarz preconditioner.</CAPTION>
diff --git a/docs/html/node19.html b/docs/html/node19.html
index 950b875b..9f0ff2a6 100644
--- a/docs/html/node19.html
+++ b/docs/html/node19.html
@@ -249,7 +249,7 @@ solver is changed to the default sequential solver.
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="1290"></A>
+<DIV ALIGN="CENTER"><A NAME="1330"></A>
 <TABLE>
 <CAPTION><STRONG>Table 2:</STRONG>
 Parameters defining the multi-level cycle and the number of cycles to
@@ -302,7 +302,7 @@ number <IMG
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="1295"></A>
+<DIV ALIGN="CENTER"><A NAME="1335"></A>
 <TABLE>
 <CAPTION><STRONG>Table 3:</STRONG>
 Parameters defining the aggregation algorithm.
@@ -417,7 +417,7 @@ of levels.</TD>
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="1299"></A>
+<DIV ALIGN="CENTER"><A NAME="1339"></A>
 <TABLE>
 <CAPTION><STRONG>Table 4:</STRONG>
 Parameters defining the aggregation algorithm (continued).
@@ -484,7 +484,7 @@ the parameter <TT>ilev</TT>.</TD>
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="1304"></A>
+<DIV ALIGN="CENTER"><A NAME="1344"></A>
 <TABLE>
 <CAPTION><STRONG>Table 5:</STRONG>
 Parameters defining the coarse-space correction at the coarsest
@@ -592,7 +592,7 @@ Note that <TT>UMF</TT> and <TT>SLU</TT> require the coarsest
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="1306"></A>
+<DIV ALIGN="CENTER"><A NAME="1346"></A>
 <TABLE>
 <CAPTION><STRONG>Table 6:</STRONG>
 Parameters defining the coarse-space correction at the coarsest
@@ -658,7 +658,7 @@ number <IMG
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="1308"></A>
+<DIV ALIGN="CENTER"><A NAME="1348"></A>
 <TABLE>
 <CAPTION><STRONG>Table 7:</STRONG>
 Parameters defining the smoother or the details of the one-level preconditioner.
@@ -781,7 +781,7 @@ Parameters defining the smoother or the details of the one-level preconditioner.
 <P>
 <FONT SIZE="+1"><FONT SIZE="+1"></FONT></FONT>
 <BR><P></P>
-<DIV ALIGN="CENTER"><A NAME="1310"></A>
+<DIV ALIGN="CENTER"><A NAME="1350"></A>
 <TABLE>
 <CAPTION><STRONG>Table 8:</STRONG>
 Parameters defining the smoother or the details of the one-level preconditioner
diff --git a/docs/mld2p4-2.1-guide.pdf b/docs/mld2p4-2.1-guide.pdf
index 8bbe1121..70a33d88 100644
--- a/docs/mld2p4-2.1-guide.pdf
+++ b/docs/mld2p4-2.1-guide.pdf
@@ -9981,8 +9981,8 @@ endobj
 680 0 obj
 <<
  /Title (MultiLevel Domain Decomposition Parallel Preconditioners Package based on PSBLAS, V. 2.1) /Subject (MultiLevel Domain Decomposition Parallel Preconditioners Package) /Keywords (Parallel Numerical Software, Algebraic Multilevel Preconditioners, Sparse Iterative Solvers, PSBLAS, MPI) /Creator (pdfLaTeX) /Producer ($Id: userguide.tex 2008-04-08 Pasqua D'Ambra, Daniela di Serafino, Salvatore Filippone$) /Author()/Title()/Subject()/Creator(LaTeX with hyperref package)/Producer(pdfTeX-1.40.17)/Keywords()
-/CreationDate (D:20170725140354+01'00')
-/ModDate (D:20170725140354+01'00')
+/CreationDate (D:20170725141821+01'00')
+/ModDate (D:20170725141821+01'00')
 /Trapped /False
 /PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) kpathsea version 6.2.2)
 >>
@@ -10035,7 +10035,7 @@ endobj
 /W [1 3 1]
 /Root 679 0 R
 /Info 680 0 R
-/ID [<AD0F352769F0D26CB9AD6813D665082E> <AD0F352769F0D26CB9AD6813D665082E>]
+/ID [<80F11E5166324BBFF3FAE1715940D892> <80F11E5166324BBFF3FAE1715940D892>]
 /Length 3410      
 >>
 stream
diff --git a/docs/src/background.tex b/docs/src/background.tex
index eae73aee..f86b0348 100644
--- a/docs/src/background.tex
+++ b/docs/src/background.tex
@@ -1 +1,281 @@
-\section{Multigrid Background\label{sec:background}}\markboth{\textsc{MLD2P4 User's and Reference Guide}}         {\textsc{\ref{sec:background} Multigrid Background}}Multigrid preconditioners, coupled with Krylov iterativesolvers, are widely used in the parallel solution of large and sparse linear systems,because of their optimality in the solution of linear systems arising from thediscretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.Optimality, also known as algorithmic scalability, is the property of having a computational cost per iteration that depends linearly onthe problem size, and a convergence rate that is independent of the problem size.Multigrid preconditioners are based on a recursive application of a two-grid processconsisting of smoother iterations and a coarse-space (or coarse-level) correction.The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,or more complex subspace-correction methods, such as the Schwarz ones.The coarse-space correction consists of solving, in an appropriately chosencoarse space, the residual equation associated with the approximate solution computedby the smoother, and of using the solution of this equation to correct theprevious approximation. The transfer of information between the original(fine) space and the coarse one is performed by using suitable restriction andprolongation operators. The construction of the coarse space and the correspondingtransfer operators is carried out by applying a so-called coarsening algorithm to the systemmatrix. 
Two main approaches can be used to perform coarsening: the geometric approach,which exploits the knowledge of some physical grid associated with the matrixand requires the user to define transfer operators from the fineto the coarse level and vice versa, and the algebraic approach, which buildsthe coarse-space correction and the associate transfer operators using only matrixinformation. The first approach may be difficult when the system comes fromdiscretizations on complex geometries;furthermore, ad hoc one-level smoothers may be required to get an efficientinterplay between fine and coarse levels, e.g., when matrices with highly varying coefficientsare considered. The second approach performs a fully automatic coarsening and enforces theinterplay between fine and coarse level by suitably choosing the coarse space andthe coarse-to-fine interpolation (see, e.g., \cite{Briggs2000,Stuben_01,dd2_96} for details.)MLD2P4 uses a pure algebraic approach, based on the smoothed aggregation algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA},for building the sequence of coarse matrices and transfer operators,starting from the original one.A decoupled version of this algorithm is implemented, where the smoothedaggregation is applied locally to each submatrix \cite{TUMINARO_TONG}.A brief description of the AMG preconditioners implemented in MLD2P4 is given in Sections~\ref{sec:multilevel}-\ref{sec:smoothers}. For further details the readeris referred to \cite{para_04,aaecc_07,apnum_07,MLD2P4_TOMS}.We note that optimal multigrid preconditioners do not necessarily correspondto minimum execution times in a parallel setting. Indeed, to obtain effective parallelmultigrid preconditioners, a tradeoff between the optimality and the cost of building andapplying the smoothers and the coarse-space corrections must be achieved. 
Effectiveparallel preconditioners require algorithmic scalability to be coupled with implementationscalability, i.e., a computational cost per iteration which remains (almost) constant asthe number of parallel processors increases.\subsection{AMG preconditioners\label{sec:multilevel}}In order to describe the AMG preconditioners available in MLD2P4, we consider alinear system\begin{equation}Ax=b, \label{eq:system}\end{equation}where $A=(a_{ij}) \in \mathbb{R}^{n \times n}$ is a nonsingular sparse matrix;for ease of presentation we assume $A$ is real, but theresults are valid for the complex case as well. Let us assume as finest index space the set of row (column) indices of $A$, i.e.,$\Omega = \{1, 2, \ldots, n\}$. Any algebraic multilevel preconditioners implemented in MLD2P4 generatesa hierarchy of index spaces and a corresponding hierarchy of matrices,\[ \Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},\quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \]by using the information contained in $A$, without assuming anyknowledge of the geometry of the problem from which $A$ originates.A vector space $\mathbb{R}^{n_{k}}$ is associated with $\Omega^k$,where $n_k$ is the size of $\Omega^k$.For all $k < nlev$, a restriction operator and a prolongation one are built,which connect two levels $k$ and $k+1$:\[    P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad     R^k \in \mathbb{R}^{n_{k+1}\times n_k};\]the matrix $A^{k+1}$ is computed by using the previous operators accordingto the Galerkin approach, i.e.,\[  A^{k+1}=R^kA^kP^k.\]In the current implementation of MLD2P4 we have $R^k=(P^k)^T$A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solveris set up at the coarsest level, so that they are ready for application (for example, setting up a solver based on the $LU$ factorization means computingand storing the $L$ and $U$ factors). 
The construction of the hierarchy of AMG componentsdescribed so far corresponds to the so-called build phase of the preconditioner.\begin{figure}[t]\begin{center} \framebox{\begin{minipage}{.85\textwidth}\begin{tabbing}\quad \=\quad \=\quad \=\quad \\[-3mm]procedure V-cycle$\left(k,A^k,b^k,u^k\right)$ \\[2mm]\>if $\left(k \ne nlev \right)$ then \\[1mm]\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]\>\> $b^{k+1} = R^{k+1}\left(b^k - A^k u^k\right)$ \\[1mm]\>\> $u^{k+1} =$ V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$ \\[1mm]\>\> $u^k = u^k + P^{k+1} u^{k+1}$ \\[1mm]\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]\>else \\[1mm]\>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm]\>endif \\[1mm]\>return $u^k$ \\[1mm]end\end{tabbing}\end{minipage}}\caption{Application phase of a V-cycle preconditioner.\label{fig:application_alg}}\end{center}\end{figure}The components produced in the build phase may be combined in several waysto obtain different multilevel preconditioners;this is  done in the application phase, i.e., in the computation of a vectorof type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iterationof a Krylov solver \cite{Saad_book}. An example of such a combination, known asV-cycle, is given in Figure~\ref{fig:application_alg}. In this case, a single iterationof the same smoother is used before and after the the recursive call to the V-cycle (i.e.,in the pre-smoothing and post-smoothing phases); however, different choices can beperformed. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycleand W-cycle~\cite{Briggs2000}, and a version of the K-cycle describedin~\cite{Notay2008}.  \subsection{Smoothed Aggregation\label{sec:aggregation}}In order to define the prolongator $P^k$, used to computethe coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregationalgorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
The basic idea of this algorithm is to build a coarse set of indices$\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjointsubsets (aggregates), and to define the coarse-to-fine space transfer operator$P^k$ by applying a suitable smoother to a simple piecewise constantprolongation operator, with the aim of improving the quality of the coarse-space correction.Three main steps can be identified in the smoothed aggregation procedure:\begin{enumerate}        \item aggregation of the indices of $\Omega^k$ to obtain $\Omega^{k+1}$;        \item construction of the prolongator $P^k$;        \item application of $P^k$ and $R^k=(P^k)^T$ to build $A^{k+1}$.\end{enumerate} In order to perform the coarsening step, the smoothed aggregation algorithmdescribed in~\cite{VANEK_MANDEL_BREZINA} is used. In this algorithm,each index $j \in \Omega^{k+1}$ corresponds to an aggregate $\Omega^k_j$ of $\Omega^k$,consisting of a suitably chosen index $i \in \Omega^k$ and indices that are (usually) contained in astrongly-coupled neighborood of $i$, i.e.,\begin{equation}\label{eq:strongly_coup}   \Omega^k_j \subset \mathcal{N}_i^k(\theta) =    \left\{ r \in \Omega^k: |a_{ir}^k| > \theta \sqrt{|a_{ii}^ka_{rr}^k|} \right \} \cup \left\{ i \right\},\end{equation}for a given threshold $\theta \in [0,1]$ (see~\cite{VANEK_MANDEL_BREZINA} for the details).Since this algorithm has a sequential nature, a decoupledversion of it is applied, where each processor independently executesthe algorithm on the set of indices assigned to it in the initial datadistribution. This version is embarrassingly parallel, since it does not require any data communication. On the other hand, it may produce some nonuniform aggregatesand is strongly dependent on the number of processors and on the initial partitioningof the matrix $A$. 
Nevertheless, this parallel algorithm has been chosen forMLD2P4, since it has been shown to produce good results in practice\cite{aaecc_07,apnum_07,TUMINARO_TONG}.The prolongator $P^k$ is built starting from a tentative prolongator$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as\begin{equation}\bar{P}^k =(\bar{p}_{ij}^k), \quad  \bar{p}_{ij}^k = \left\{ \begin{array}{ll}1 & \quad \mbox{if} \; i \in \Omega^k_j, \\0 & \quad \mbox{otherwise},\end{array} \right.\label{eq:tent_prol}\end{equation}where $\Omega^k_j$ is the aggregate of $\Omega^k$corresponding to the index $j \in \Omega^{k+1}$.$P^k$ is obtained by applying to $\bar{P}^k$ a smoother$S^k \in \mathbb{R}^{n_k \times n_k}$:$$P^k = S^k \bar{P}^k,$$in order to remove nonsmooth components from the range of the prolongator,and hence to improve the convergence properties of the multi-levelmethod~\cite{BREZINA_VANEK,Stuben_01}.A simple choice for $S^k$ is the damped Jacobi smoother:\[S^k = I - \omega^k (D^k)^{-1} A^k_F , \]where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$,$A^k_F = (\bar{a}_{ij}^k)$ is the filtered matrix defined as\begin{equation}\label{eq:filtered}  \bar{a}_{ij}^k =   \left \{ \begin{array}{ll}   a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\   0            & \mbox{otherwise},   \end{array} \right.   \; (j \ne i),   \qquad   \bar{a}_{ii}^k = a_{ii}^k - \sum_{j \ne i} (a_{ij}^k - \bar{a}_{ij}^k),\end{equation}and $\omega^k$ is an approximation of $4/(3\rho^k)$, where$\rho^k$ is the spectral radius of $(D^k)^{-1}A^k_F$ \cite{BREZINA_VANEK}.In MLD2P4 this approximation is obtained by using $\| A^k_F \|_\infty$ as an estimateof $\rho^k$. Note that for systems coming from uniformly ellipticproblems, filtering the matrix $A^k$ has little or no effect, and$A^k$ can be used instead of $A^k_F$. The latter choice is the default in MLD2P4.
\subsection{Smoothers and coarsest-level solvers\label{sec:smoothers}}The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,a hybrid version of the forward and backward Gauss-Seidel methods, and theadditive Schwarz (AS) ones (see, e.g., \cite{Saad_book,dd2_96}). The hybrid Gauss-Seidelversion is considered because the original Gauss-Seidel method is inherently sequential.At each iteration of the hybrid version, each parallel process uses the most recent valuesof its own local variables and the values of the non-local variables computed at theprevious iteration, obtained by exchanging data with other processes beforethe beginning of the current iteration.In the AS methods, the index space $\Omega^k$ is divided into $m_k$subsets $\Omega^k_i$ of size $n_{k,i}$,  possiblyoverlapping. For each $i$ we consider the restrictionoperator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$with indices in $\Omega^k_i$, and the prolongation operator$P^k_i = (R_i^k)^T$. These operators are then  used to build$A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the indexspace $\Omega^k_i$.The classical AS preconditioner $M^k_{AS}$ is defined as\[    ( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k},\]where $A_i^k$ is supposed to be nonsingular. We observe that an approximateinverse of $A_i^k$ is usually considered instead of $(A_i^k)^{-1}$.The setup of $M^k_{AS}$ during the multilevel build phaseinvolves\begin{itemize}  \item the definition of the index subspaces $\Omega_i^k$ and of the corresponding   operators $R_i^k$ (and $P_i^k$);  \item the computation of the submatrices $A_i^k$;  \item the computation of their inverses (usually approximated    through  some form  of incomplete factorization).
\end{itemize}The computation of $z^k=M^k_{AS}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during themultilevel application phase, requires\begin{itemize}	\item the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$,	  i.e.\ $w_i^k = R_i^{k} w^k$;	\item the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$;	\item the prolongation and the sum of the previous vectors,    i.e.\ $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$.\end{itemize}Variants of the classical AS method, which use modifications of therestriction and prolongation operators, are also implemented in MLD2P4.Among them, the Restricted AS (RAS) preconditioner usuallyoutperforms the classical AS preconditioner in terms of convergencerate and of computation and communication time on parallel distributed-memorycomputers, and is therefore the most widely used among the ASpreconditioners~\cite{CAI_SARKIS}. Direct solvers based on sparse LU factorizations, implemented in thethird-party libraries reported in Section~\ref{sec:third-party}, can be appliedas coarsest-level solvers by MLD2P4. Native inexact solvers based onincomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,and block Jacobi preconditioners are also available. Direct solvers usuallylead to more effective preconditioners in terms of algorithmic scalability;however, this does not guarantee parallel efficiency.%%% Local Variables: %%% mode: latex%%% TeX-master: "userguide"%%% End: 
\ No newline at end of file
+\section{Multigrid Background\label{sec:background}} 
+\markboth{\textsc{MLD2P4 User's and Reference Guide}}
+         {\textsc{\ref{sec:background} Multigrid Background}}
+
+Multigrid preconditioners, coupled with Krylov iterative
+solvers, are widely used in the parallel solution of large and sparse linear systems,
+because of their optimality in the solution of linear systems arising from the
+discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.
+Optimality, also known as algorithmic scalability, is the property 
+of having a computational cost per iteration that depends linearly on
+the problem size, and a convergence rate that is independent of the problem size.
+
+Multigrid preconditioners are based on a recursive application of a two-grid process
+consisting of smoother iterations and a coarse-space (or coarse-level) correction.
+The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,
+or more complex subspace-correction methods, such as the Schwarz ones.
+The coarse-space correction consists of solving, in an appropriately chosen
+coarse space, the residual equation associated with the approximate solution computed
+by the smoother, and of using the solution of this equation to correct the
+previous approximation. The transfer of information between the original
+(fine) space and the coarse one is performed by using suitable restriction and
+prolongation operators. The construction of the coarse space and the corresponding
+transfer operators is carried out by applying a so-called coarsening algorithm to the system
+matrix. Two main approaches can be used to perform coarsening: the geometric approach,
+which exploits the knowledge of some physical grid associated with the matrix
+and requires the user to define transfer operators from the fine
+to the coarse level and vice versa, and the algebraic approach, which builds
+the coarse-space correction and the associated transfer operators using only matrix
+information. The first approach may be difficult when the system comes from
+discretizations on complex geometries;
+furthermore, ad hoc one-level smoothers may be required to get an efficient
+interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients
+are considered. The second approach performs a fully automatic coarsening and enforces the
+interplay between fine and coarse level by suitably choosing the coarse space and
+the coarse-to-fine interpolation (see, e.g., \cite{Briggs2000,Stuben_01,dd2_96} for details).
+
+MLD2P4 uses a pure algebraic approach, based on the smoothed 
+aggregation algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA},
+for building the sequence of coarse matrices and transfer operators,
+starting from the original one.
+A decoupled version of this algorithm is implemented, where the smoothed
+aggregation is applied locally to each submatrix \cite{TUMINARO_TONG}.
+A brief description of the AMG preconditioners implemented in MLD2P4 is given in 
+Sections~\ref{sec:multilevel}--\ref{sec:smoothers}. For further details the reader
+is referred to \cite{para_04,aaecc_07,apnum_07,MLD2P4_TOMS}.
+
+We note that optimal multigrid preconditioners do not necessarily correspond
+to minimum execution times in a parallel setting. Indeed, to obtain effective parallel
+multigrid preconditioners, a tradeoff between the optimality and the cost of building and
+applying the smoothers and the coarse-space corrections must be achieved. Effective
+parallel preconditioners require algorithmic scalability to be coupled with implementation
+scalability, i.e., a computational cost per iteration which remains (almost) constant as
+the number of parallel processors increases.
+
+
+\subsection{AMG preconditioners\label{sec:multilevel}}
+
+In order to describe the AMG preconditioners available in MLD2P4, we consider a
+linear system
+\begin{equation}
+Ax=b, \label{eq:system}
+\end{equation}
+where $A=(a_{ij}) \in \mathbb{R}^{n \times n}$ is a nonsingular sparse matrix;
+for ease of presentation we assume $A$ is real, but the
+results are valid for the complex case as well. 
+
+Let us assume as finest index space the set of row (column) indices of $A$, i.e.,
+$\Omega = \{1, 2, \ldots, n\}$. 
+Any algebraic multilevel preconditioner implemented in MLD2P4 generates
+a hierarchy of index spaces and a corresponding hierarchy of matrices,
+\[ \Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},
+\quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \]
+by using the information contained in $A$, without assuming any
+knowledge of the geometry of the problem from which $A$ originates.
+A vector space $\mathbb{R}^{n_{k}}$ is associated with $\Omega^k$,
+where $n_k$ is the size of $\Omega^k$.
+For all $k < nlev$, a restriction operator and a prolongation one are built,
+which connect two levels $k$ and $k+1$:
+\[
+    P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad 
+    R^k \in \mathbb{R}^{n_{k+1}\times n_k};
+\]
+the matrix $A^{k+1}$ is computed by using the previous operators according
+to the Galerkin approach, i.e.,
+\[
+  A^{k+1}=R^kA^kP^k.
+\]
+In the current implementation of MLD2P4 we have $R^k=(P^k)^T$.
+A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solver
+is set up at the coarsest level, so that they are ready for application 
+(for example, setting up a solver based on the $LU$ factorization means computing
+and storing the $L$ and $U$ factors). The construction of the hierarchy of AMG components
+described so far corresponds to the so-called build phase of the preconditioner.
+
+\begin{figure}[t]
+\begin{center} 
+\framebox{
+\begin{minipage}{.85\textwidth}
+\begin{tabbing}
+\quad \=\quad \=\quad \=\quad \\[-3mm]
+procedure V-cycle$\left(k,A^k,b^k,u^k\right)$ \\[2mm]
+\>if $\left(k \ne nlev \right)$ then \\[1mm]
+\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
+\>\> $b^{k+1} = R^{k}\left(b^k - A^k u^k\right)$ \\[1mm]
+\>\> $u^{k+1} =$ V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$ \\[1mm]
+\>\> $u^k = u^k + P^{k} u^{k+1}$ \\[1mm]
+\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
+\>else \\[1mm]
+\>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm]
+\>endif \\[1mm]
+\>return $u^k$ \\[1mm]
+end
+\end{tabbing}
+\end{minipage}
+}
+\caption{Application phase of a V-cycle preconditioner.\label{fig:application_alg}}
+\end{center}
+\end{figure}
+
+The components produced in the build phase may be combined in several ways
+to obtain different multilevel preconditioners;
+this is  done in the application phase, i.e., in the computation of a vector
+of type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iteration
+of a Krylov solver \cite{Saad_book}. An example of such a combination, known as
+V-cycle, is given in Figure~\ref{fig:application_alg}. In this case, a single iteration
+of the same smoother is used before and after the recursive call to the V-cycle (i.e.,
+in the pre-smoothing and post-smoothing phases); however, different choices can be
+made. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle
+and W-cycle~\cite{Briggs2000}, and a version of the K-cycle described
+in~\cite{Notay2008}.  
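+
+As a sketch of how the cycle acts as a linear operator, consider the two-level case
+($nlev=2$) with the notation of Section~\ref{sec:multilevel}, i.e., with
+$P^1 \in \mathbb{R}^{n_1 \times n_2}$ and $R^1 \in \mathbb{R}^{n_2 \times n_1}$
+transferring vectors between levels 1 and 2. Assuming a zero initial guess, one
+pre-smoothing and one post-smoothing step with the same smoother, and an exact
+solve at the coarsest level, the V-cycle applied to $b^1=v$ returns $w=B^{-1}v$ with
+\[
+  I - B^{-1}A^1 = \left(I - M^1 A^1\right)
+  \left(I - P^1 \left(A^2\right)^{-1} R^1 A^1\right)
+  \left(I - M^1 A^1\right).
+\]
+The two outer factors propagate the error through the post-smoothing and pre-smoothing
+steps, while the middle factor corresponds to the coarse-space correction. Here $M^1$
+denotes the operator applied to the residual by the smoother, as in the V-cycle procedure.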
+
+
+\subsection{Smoothed Aggregation\label{sec:aggregation}}
+
+In order to define the prolongator $P^k$, used to compute
+the coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregation
+algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
+The basic idea of this algorithm is to build a coarse set of indices
+$\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjoint
+subsets (aggregates), and to define the coarse-to-fine space transfer operator
+$P^k$ by applying a suitable smoother to a simple piecewise constant
+prolongation operator, with the aim of improving the quality of the coarse-space correction.
+
+Three main steps can be identified in the smoothed aggregation procedure:
+\begin{enumerate}
+        \item aggregation of the indices of $\Omega^k$ to obtain $\Omega^{k+1}$;
+        \item construction of the prolongator $P^k$;
+        \item application of $P^k$ and $R^k=(P^k)^T$ to build $A^{k+1}$.
+\end{enumerate}
+ 
+In order to perform the coarsening step, the smoothed aggregation algorithm
+described in~\cite{VANEK_MANDEL_BREZINA} is used. In this algorithm,
+each index $j \in \Omega^{k+1}$ corresponds to an aggregate $\Omega^k_j$ of $\Omega^k$,
+consisting of a suitably chosen index $i \in \Omega^k$ and indices that are (usually) contained in a
+strongly-coupled neighborhood of $i$, i.e.,
+\begin{equation}
+\label{eq:strongly_coup}
+   \Omega^k_j \subset \mathcal{N}_i^k(\theta) = 
+   \left\{ r \in \Omega^k: |a_{ir}^k| > \theta \sqrt{|a_{ii}^ka_{rr}^k|} \right \} \cup \left\{ i \right\},
+\end{equation}
+for a given threshold $\theta \in [0,1]$ (see~\cite{VANEK_MANDEL_BREZINA} for the details).
+Since this algorithm has a sequential nature, a decoupled
+version of it is applied, where each processor independently executes
+the algorithm on the set of indices assigned to it in the initial data
+distribution. This version is embarrassingly parallel, since it does not require any data 
+communication. On the other hand, it may produce some nonuniform aggregates
+and is strongly dependent on the number of processors and on the initial partitioning
+of the matrix $A$. Nevertheless, this parallel algorithm has been chosen for
+MLD2P4, since it has been shown to produce good results in practice
+\cite{aaecc_07,apnum_07,TUMINARO_TONG}.
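+
+As a small illustration of (\ref{eq:strongly_coup}), consider a tridiagonal matrix
+with $a_{ii}^k=2$ and $a_{i,i\pm 1}^k=-1$ (a one-dimensional Laplacian, chosen here
+only as an example). For $|i-r|=1$ we have
+\[
+  |a_{ir}^k| = 1, \qquad \sqrt{|a_{ii}^k a_{rr}^k|} = 2,
+\]
+so that $r \in \mathcal{N}_i^k(\theta)$ if and only if $1 > 2\theta$, i.e., $\theta < 1/2$.
+For any $\theta \ge 1/2$ the neighborhood reduces to $\mathcal{N}_i^k(\theta)=\{i\}$,
+and aggregates built only from strongly-coupled neighborhoods would consist of single
+indices; this suggests why small values of $\theta$ are needed for an effective coarsening
+in such a case.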
+
+The prolongator $P^k$ is built starting from a tentative prolongator
+$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as
+\begin{equation}
+\bar{P}^k =(\bar{p}_{ij}^k), \quad  \bar{p}_{ij}^k = 
+\left\{ \begin{array}{ll}
+1 & \quad \mbox{if} \; i \in \Omega^k_j, \\
+0 & \quad \mbox{otherwise},
+\end{array} \right.
+\label{eq:tent_prol}
+\end{equation}
+where $\Omega^k_j$ is the aggregate of $\Omega^k$
+corresponding to the index $j \in \Omega^{k+1}$.
+$P^k$ is obtained by applying to $\bar{P}^k$ a smoother
+$S^k \in \mathbb{R}^{n_k \times n_k}$:
+\[
+P^k = S^k \bar{P}^k,
+\]
+in order to remove nonsmooth components from the range of the prolongator,
+and hence to improve the convergence properties of the multi-level
+method~\cite{BREZINA_VANEK,Stuben_01}.
+A simple choice for $S^k$ is the damped Jacobi smoother:
+\[
+S^k = I - \omega^k (D^k)^{-1} A^k_F , 
+\]
+where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$,
+$A^k_F = (\bar{a}_{ij}^k)$ is the filtered matrix defined as
+\begin{equation}
+\label{eq:filtered}
+  \bar{a}_{ij}^k =
+   \left \{ \begin{array}{ll}
+   a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\
+   0            & \mbox{otherwise},
+   \end{array} \right.
+   \; (j \ne i),
+   \qquad
+   \bar{a}_{ii}^k = a_{ii}^k - \sum_{j \ne i} (a_{ij}^k - \bar{a}_{ij}^k),
+\end{equation}
+and $\omega^k$ is an approximation of $4/(3\rho^k)$, where
+$\rho^k$ is the spectral radius of $(D^k)^{-1}A^k_F$ \cite{BREZINA_VANEK}.
+In MLD2P4 this approximation is obtained by using $\| (D^k)^{-1} A^k_F \|_\infty$ as an estimate
+of $\rho^k$. Note that for systems coming from uniformly elliptic
+problems, filtering the matrix $A^k$ has little or no effect, and
+$A^k$ can be used instead of $A^k_F$. The latter choice is the default in MLD2P4.
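+
+The construction above can be followed on a small example, given here only as a sketch
+under stated assumptions. Let $A^1$ be the $4 \times 4$ tridiagonal matrix with diagonal
+entries equal to $2$ and off-diagonal entries equal to $-1$, and assume that the
+aggregates are $\Omega^1_1=\{1,2\}$ and $\Omega^1_2=\{3,4\}$ (a hypothetical choice, not
+necessarily the one produced by the aggregation algorithm). Without filtering, i.e.,
+with $A^1_F=A^1$, we have $D^1 = 2I$ and $\| (D^1)^{-1}A^1 \|_\infty = 2$; taking this
+norm as the estimate of $\rho^1$ gives $\omega^1 = 4/(3 \cdot 2) = 2/3$ and
+$S^1 = I - \frac{1}{3}A^1$. Then
+\[
+\bar{P}^1 = \left( \begin{array}{cc}
+1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1
+\end{array} \right),
+\qquad
+P^1 = S^1 \bar{P}^1 = \frac{1}{3} \left( \begin{array}{cc}
+2 & 0 \\ 2 & 1 \\ 1 & 2 \\ 0 & 2
+\end{array} \right),
+\qquad
+A^2 = (P^1)^T A^1 P^1 = \frac{1}{9} \left( \begin{array}{rr}
+6 & -1 \\ -1 & 6
+\end{array} \right).
+\]
+The smoothing step makes the columns of $P^1$ overlap across the boundary between the
+two aggregates, whereas the columns of $\bar{P}^1$ have disjoint supports.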
+
+\subsection{Smoothers and coarsest-level solvers\label{sec:smoothers}}
+
+The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,
+a hybrid version of the forward and backward Gauss-Seidel methods, and the
+additive Schwarz (AS) ones (see, e.g., \cite{Saad_book,dd2_96}). 
+
+The hybrid Gauss-Seidel
+version is considered because the original Gauss-Seidel method is inherently sequential.
+At each iteration of the hybrid version, each parallel process uses the most recent values
+of its own local variables and the values of the non-local variables computed at the
+previous iteration, obtained by exchanging data with other processes before
+the beginning of the current iteration.
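+
+To make the description concrete, one forward sweep of the hybrid version can be
+sketched as follows, where the level index $k$ is dropped and the notation is introduced
+here only for illustration: $I_p$ is the set of indices owned by process $p$, and
+$x^{(m)}$ is the approximation available after $m$ iterations of the smoother applied to
+a system $Ax=b$. For each $i \in I_p$, taken in increasing order,
+\[
+  x_i^{(m+1)} = \frac{1}{a_{ii}} \Big( b_i
+     - \sum_{j \in I_p,\, j<i} a_{ij} x_j^{(m+1)}
+     - \sum_{j \in I_p,\, j>i} a_{ij} x_j^{(m)}
+     - \sum_{j \notin I_p} a_{ij} x_j^{(m)} \Big).
+\]
+The first sum uses the values already updated by process $p$ in the current sweep, while
+the last sum uses only values received from the other processes before the sweep starts;
+with a single process the formula reduces to the classical forward Gauss-Seidel method.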
+
+In the AS methods, the index space $\Omega^k$ is divided into $m_k$
+subsets $\Omega^k_i$ of size $n_{k,i}$,  possibly
+overlapping. For each $i$ we consider the restriction
+operator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$
+that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$
+with indices in $\Omega^k_i$, and the prolongation operator
+$P^k_i = (R_i^k)^T$. These operators are then  used to build
+$A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the index
+space $\Omega^k_i$.
+The classical AS preconditioner $M^k_{AS}$ is defined as
+\[
+    ( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k},
+\]
+where $A_i^k$ is assumed to be nonsingular. We observe that an approximate
+inverse of $A_i^k$ is usually considered instead of $(A_i^k)^{-1}$.
+The setup of $M^k_{AS}$ during the multilevel build phase
+involves
+\begin{itemize}
+  \item the definition of the index subspaces $\Omega_i^k$ and of the corresponding 
+  operators $R_i^k$ (and $P_i^k$);
+  \item the computation of the submatrices $A_i^k$;
+  \item the computation of their inverses (usually approximated
+    through  some form  of incomplete factorization).
+\end{itemize}
+The computation of $z^k=(M^k_{AS})^{-1}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during the
+multilevel application phase, requires
+\begin{itemize}
+	\item the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$,
+	  i.e.\ $w_i^k = R_i^{k} w^k$;
+	\item the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$;
+	\item the prolongation and the sum of the previous vectors,
+    i.e.\ $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$.
+\end{itemize}
+Variants of the classical AS method, which use modifications of the
+restriction and prolongation operators, are also implemented in MLD2P4.
+Among them, the Restricted AS (RAS) preconditioner usually
+outperforms the classical AS preconditioner in terms of convergence
+rate and of computation and communication time on parallel distributed-memory
+computers, and is therefore the most widely used among the AS
+preconditioners~\cite{CAI_SARKIS}. 
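+
+As a simple illustration of the classical AS definition, consider again a $4 \times 4$
+tridiagonal matrix $A^k$ with diagonal entries equal to $2$ and off-diagonal entries
+equal to $-1$, and assume $m_k=2$ non-overlapping subsets $\Omega^k_1=\{1,2\}$ and
+$\Omega^k_2=\{3,4\}$ (an example chosen only for ease of computation). Then
+$R_1^k = (I \;\; 0)$, $R_2^k = (0 \;\; I)$, with $2 \times 2$ blocks, and
+\[
+A_1^k = A_2^k = \left( \begin{array}{rr} 2 & -1 \\ -1 & 2 \end{array} \right),
+\qquad
+( M^k_{AS} )^{-1} = \frac{1}{3} \left( \begin{array}{cccc}
+2 & 1 & 0 & 0 \\
+1 & 2 & 0 & 0 \\
+0 & 0 & 2 & 1 \\
+0 & 0 & 1 & 2
+\end{array} \right).
+\]
+In this non-overlapping case the AS preconditioner coincides with the block-Jacobi
+preconditioner associated with the two diagonal blocks of $A^k$; with overlapping
+subsets, the contributions $P_i^k z_i^k$ are summed on the shared indices.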
+
+Direct solvers based on sparse LU factorizations, implemented in the
+third-party libraries reported in Section~\ref{sec:third-party}, can be applied
+as coarsest-level solvers by MLD2P4. Native inexact solvers based on
+incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,
+and block-Jacobi preconditioners are also available. Direct solvers usually
+lead to more effective preconditioners in terms of algorithmic scalability;
+however, this does not guarantee parallel efficiency.
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "userguide"
+%%% End: