diff --git a/docs/src/background.tex b/docs/src/background.tex
index 8b600155..d496242f 100644
--- a/docs/src/background.tex
+++ b/docs/src/background.tex
@@ -1 +1 @@
-\section{Multigrid Background\label{sec:background}}
\markboth{\textsc{MLD2P4 User's and Reference Guide}}
{\textsc{\ref{sec:background} Multigrid Background}}
Multigrid preconditioners, coupled with Krylov iterative
solvers, are widely used in the parallel solution of large and sparse linear systems,
because of their optimality in the solution of linear systems arising from the
discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.
Optimality, also known as algorithmic scalability, is the property
of having a computational cost per iteration that depends linearly on
the problem size, and a convergence rate that is independent of the problem size.
Multigrid preconditioners are based on a recursive application of a two-grid process
consisting of smoother iterations and a coarse-space (or coarse-level) correction.
The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,
or more complex subspace-correction methods, such as the Schwarz ones.
The coarse-space correction consists of solving, in an appropriately chosen
coarse space, the residual equation associated with the approximate solution computed
by the smoother, and of using the solution of this equation to correct the
previous approximation. The transfer of information between the original
(fine) space and the coarse one is performed by using suitable restriction and
prolongation operators. The construction of the coarse space and the corresponding
transfer operators is carried out by applying a so-called coarsening algorithm to the system
matrix. Two main approaches can be used to perform coarsening: the geometric approach,
which exploits the knowledge of some physical grid associated with the matrix
and requires the user to define transfer operators from the fine
to the coarse level and vice versa, and the algebraic approach, which builds
the coarse-space correction and the associate transfer operators using only matrix
information. The first approach may be difficult when the system comes from
discretizations on complex geometries;
furthermore, ad hoc one-level smoothers may be required to get an efficient
interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients
are considered. The second approach performs a fully automatic coarsening and enforces the
interplay between fine and coarse level by suitably choosing the coarse space and
the coarse-to-fine interpolation (see, e.g., \cite{Briggs2000,Stuben_01,dd2_96} for details.)
MLD2P4 uses a pure algebraic approach, based on the smoothed
aggregation algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA},
for building the sequence of coarse matrices and transfer operators,
starting from the original one.
A decoupled version of this algorithm is implemented, where the smoothed
aggregation is applied locally to each submatrix \cite{TUMINARO_TONG}.
A brief description of the AMG preconditioners implemented in MLD2P4 is given in
Sections~\ref{sec:multilevel}-\ref{sec:smoothers}. For further details the reader
is referred to \cite{para_04,aaecc_07,apnum_07,MLD2P4_TOMS}.
We note that optimal multigrid preconditioners do not necessarily correspond
to minimum execution times in a parallel setting. Indeed, to obtain effective parallel
multigrid preconditioners, a tradeoff between the optimality and the cost of building and
applying the smoothers and the coarse-space corrections must be achieved. Effective
parallel preconditioners require algorithmic scalability to be coupled with implementation
scalability, i.e., a computational cost per iteration which remains (almost) constant as
the number of parallel processors increases.
\subsection{AMG preconditioners\label{sec:multilevel}}
In order to describe the AMG preconditioners available in MLD2P4, we consider a
linear system
\begin{equation}
Ax=b, \label{eq:system}
\end{equation}
where $A=(a_{ij}) \in \mathbb{R}^{n \times n}$ is a nonsingular sparse matrix;
for ease of presentation we assume $A$ is real, but the
results are valid for the complex case as well.
Let us assume as finest index space the set of row (column) indices of $A$, i.e.,
$\Omega = \{1, 2, \ldots, n\}$.
Any algebraic multilevel preconditioners implemented in MLD2P4 generates
a hierarchy of index spaces and a corresponding hierarchy of matrices,
\[ \Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},
\quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \]
by using the information contained in $A$, without assuming any
knowledge of the geometry of the problem from which $A$ originates.
A vector space $\mathbb{R}^{n_{k}}$ is associated with $\Omega^k$,
where $n_k$ is the size of $\Omega^k$.
For all $k < nlev$, a restriction operator and a prolongation one are built,
which connect two levels $k$ and $k+1$:
$$
P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad
R^k \in \mathbb{R}^{n_{k+1}\times n_k};
$$
%\[
% P^k: \mathbb{R}^{n_{k+1}} \longrightarrow \mathbb{R}^{n_k}, \quad
% R^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k+1}};
%\]
the matrix $A^{k+1}$ is computed by using the previous operators according
to the Galerkin approach, i.e.,
$$
A^{k+1}=R^kA^kP^k.
$$
$R^k=(P^k)^T$ in the current implementation of MLD2P4.
A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solver
is set up at the coarsest level, so that they are ready for application
(for example, setting up a solver based on the $LU$ factorization means computing
and storing the $L$ and $U$ factors). The construction of the hierachy of AMG components
described so far corresponds to the so-called build phase of the preconditioner.
\begin{figure}[t]
\begin{center}
\framebox{
\begin{minipage}{.85\textwidth}
\begin{tabbing}
\quad \=\quad \=\quad \=\quad \\[-3mm]
procedure V-cycle$\left(k,A^k,b^k,u^k\right)$ \\[2mm]
\>if $\left(k \ne nlev \right)$ then \\[1mm]
\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
\>\> $b^{k+1} = R^{k+1}\left(b^k - A^k u^k\right)$ \\[1mm]
\>\> $u^{k+1} =$ V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$ \\[1mm]
\>\> $u^k = u^k + P^{k+1} u^{k+1}$ \\[1mm]
\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
\>else \\[1mm]
\>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm]
\>endif \\[1mm]
\>return $u^k$ \\[1mm]
end
\end{tabbing}
\end{minipage}
}
\caption{Application phase of a V-cycle preconditioner.\label{fig:application_alg}}
\end{center}
\end{figure}
The components produced in the build phase may be combined in several ways
to obtain different multilevel preconditioners;
this is done in the application phase, i.e., in the computation of a vector
of type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iteration
of a Krylov solver \cite{Saad_book}. An example of such a combination, known as
V-cycle, is given in Figure~\ref{fig:application_alg}. In this case, a single iteration
of the same smoother is used before and after the the recursive call to the V-cycle (i.e.,
in the pre-smoothing and post-smoothing phases); however, different choices can be
performed. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle
and W-cycle~\cite{Briggs2000}, and a version of the K-cycle described in~\cite{Notay2008}.
\subsection{Smoothed Aggregation\label{sec:aggregation}}
In order to define the prolongator $P^k$, used to compute
the coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregation
algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
The basic idea of this algorithm is to build a coarse set of indices
$\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjoint
subsets (aggregates), and to define the coarse-to-fine space transfer operator
$P^k$ by applying a suitable smoother to a simple piecewise constant
prolongation operator, with the aim of improving the quality of the coarse-space correction.
Three main steps can be identified in the smoothed aggregation procedure:
\begin{enumerate}
\item aggregation of the indices set $\Omega^k$, to obtain $\Omega^{k+1}$;
\item construction of the prolongator $P^k$;
\item application of $P^k$ and $R^k=(P^k)^T$ to build $A^{k+1}$.
\end{enumerate}
In order to perform the coarsening step, the smoothed aggregation algorithm
described in~\cite{VANEK_MANDEL_BREZINA} is used. In this algorithm,
each index in $\Omega^{k+1}$ corresponds to an aggregate of $\Omega^k$,
consisting of a suitably chosen index $j$ and of the indices $i$ that are strongly
coupled to $j$, i.e.,
$$
|a_{ij}^k| > \theta \sqrt{|a_{ii}^ka_{jj}^k|},
$$
for a given $\theta \in [0,1]$. Since this algorithm has a sequential nature, a decoupled
version of it is applied, where each processor $i$ independently executes
the algorithm on the set of indices assigned to it in the initial data
distribution. This version is embarrassingly parallel, since it does not require any data
communication. On the other hand, it may produce some non-uniform aggregates
and is strongly dependent on the number of processors and on the initial partitioning
of the matrix $A$. Nevertheless, this parall algorithm has been chosen for
MLD2P4, since it has been shown to produce good results in practice
\cite{aaecc_07,apnum_07,TUMINARO_TONG}.
The prolongator $P^k$ is built starting from a tentative prolongator
$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as
$$
\bar{P}^k =(\bar{p}_{ij}^k), \quad \bar{p}_{ij}^k =
\left\{ \begin{array}{ll}
1 & \quad \mbox{if} \; i \in \Omega^k_j, \\
0 & \quad \mbox{otherwise},
\end{array} \right.
\label{eq:tent_prol}
$$
where $\Omega^k_j$ is the aggregate of $\Omega^k$
corresponding to the index $j \in \Omega^{k+1}$.
$P^k$ is obtained by applying to $\bar{P}^k$ a smoother
$S^k \in \mathbb{R}^{n_k \times n_k}$:
$$
P^k = S^k \bar{P}^k,
$$
in order to remove nonsmooth components from the range of the prolongator,
and hence to improve the convergence properties of the multi-level
method~\cite{BREZINA_VANEK,Stuben_01}.
A simple choice for $S^k$ is the damped Jacobi smoother:
$$
S^k = I - \omega^k (D^k)^{-1} A^k ,
$$
where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$,
and $\omega^k$ is an approximation of $4/(3\rho^k)$, where
$\rho^k$ is the spectral radius of $(D^k)^{-1}A^k$.
computed by using some estimate of the spectral radius of $(D^k)^{-1}A^k$
\cite{BREZINA_VANEK}.
\subsection{Smoothers and coarsest-level solvers\label{sec:smoothers}}
The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,
a hybrid version of the forward and backward Gauss-Seidel methods, and the
additive Schwarz (AS) ones (see, e.g., \cite{Saad_book,dd2_96}).
The hybrid Gauss-Seidel
version is considered because the original Gauss-Seidel method is inherently sequential.
At each iteration of the hybrid version, each parallel process uses the most recent values
of its own local variables and the values of the non-local variables computed at the
previous iteration, obtained by exchanging data with other processes before
the beginning of the current iteration.
In the AS methods, the index space $\Omega^k$ is divided into $m_k$
subsets $\Omega^k_i$ of size $n_{k,i}$, possibly
overlapping. For each $i$ we consider the restriction
operator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$
% $R_i^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k,i}}$
that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$
with indices in $\Omega^k_i$, and the prolongation operator
$P^k_i = (R_i^k)^T$. These operators are then used to build
$A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the index
space $\Omega^k_i$.
The classical AS preconditioner $M^k_{AS}$ is defined as
\[
( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k},
\]
where $A_i^k$ is supposed to be nonsingular. We observe that an approximate
inverse of $A_i^k$ is usually considered instead of $(A_i^k)^{-1}$.
The setup of $S^k_{AS}$ during the multilevel build phase
involves
\begin{itemize}
\item the definition of the index subspaces $\Omega_i^k$ and of the corresponding
operators $R_i^k$ (and $P_i^k$);
\item the computation of the submatrices $A_i^k$;
\item the computation of their inverses (usually approximated
through some form of incomplete factorization).
\end{itemize}
The computation of $z^k=M^k_{AS}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during the
multilevel application phase, requires
\begin{itemize}
\item the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$,
i.e.\ $w_i^k = R_i^{k} w^k$;
\item the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$;
\item the prolongation and the sum of the previous vectors,
i.e.\ $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$.
\end{itemize}
Variants of the classical AS method, which use modifications of the
restriction and prolongation operators, are also implemented in MLD2P4.
Among them, the Restricted AS (RAS) preconditioner usually
outperforms the classical AS preconditioner in terms of convergence
rate and of computation and communication time on parallel distributed-memory
computers, and is therefore the most widely used among the AS
preconditioners~\cite{CAI_SARKIS}.
Direct solvers based on sparse LU factorizations, implemented in the
third party libraries reported in Section~\ref{sec:third_party}, can be applied
as coarsest-level solvers by MLD2P4. Native inexact solvers based on
incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,
and block Jacobi preconditioners are also available. Direct solvers usually
lead to more effective preconditioners in terms of algorithmic scalability;
however, this does not guarantee parallel efficiency.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "userguide"
%%% End:
\ No newline at end of file
+\section{Multigrid Background\label{sec:background}}
\markboth{\textsc{MLD2P4 User's and Reference Guide}}
{\textsc{\ref{sec:background} Multigrid Background}}
Multigrid preconditioners, coupled with Krylov iterative
solvers, are widely used in the parallel solution of large and sparse linear systems,
because of their optimality in the solution of linear systems arising from the
discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.
Optimality, also known as algorithmic scalability, is the property
of having a computational cost per iteration that depends linearly on
the problem size, and a convergence rate that is independent of the problem size.
Multigrid preconditioners are based on a recursive application of a two-grid process
consisting of smoother iterations and a coarse-space (or coarse-level) correction.
The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,
or more complex subspace-correction methods, such as the Schwarz ones.
The coarse-space correction consists of solving, in an appropriately chosen
coarse space, the residual equation associated with the approximate solution computed
by the smoother, and of using the solution of this equation to correct the
previous approximation. The transfer of information between the original
(fine) space and the coarse one is performed by using suitable restriction and
prolongation operators. The construction of the coarse space and the corresponding
transfer operators is carried out by applying a so-called coarsening algorithm to the system
matrix. Two main approaches can be used to perform coarsening: the geometric approach,
which exploits the knowledge of some physical grid associated with the matrix
and requires the user to define transfer operators from the fine
to the coarse level and vice versa, and the algebraic approach, which builds
the coarse-space correction and the associated transfer operators using only matrix
information. The first approach may be difficult when the system comes from
discretizations on complex geometries;
furthermore, ad hoc one-level smoothers may be required to get an efficient
interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients
are considered. The second approach performs a fully automatic coarsening and enforces the
interplay between fine and coarse level by suitably choosing the coarse space and
the coarse-to-fine interpolation (see, e.g., \cite{Briggs2000,Stuben_01,dd2_96} for details).
MLD2P4 uses a pure algebraic approach, based on the smoothed
aggregation algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA},
for building the sequence of coarse matrices and transfer operators,
starting from the original one.
A decoupled version of this algorithm is implemented, where the smoothed
aggregation is applied locally to each submatrix \cite{TUMINARO_TONG}.
A brief description of the AMG preconditioners implemented in MLD2P4 is given in
Sections~\ref{sec:multilevel}--\ref{sec:smoothers}. For further details the reader
is referred to \cite{para_04,aaecc_07,apnum_07,MLD2P4_TOMS}.
We note that optimal multigrid preconditioners do not necessarily correspond
to minimum execution times in a parallel setting. Indeed, to obtain effective parallel
multigrid preconditioners, a tradeoff between the optimality and the cost of building and
applying the smoothers and the coarse-space corrections must be achieved. Effective
parallel preconditioners require algorithmic scalability to be coupled with implementation
scalability, i.e., a computational cost per iteration which remains (almost) constant as
the number of parallel processors increases.
\subsection{AMG preconditioners\label{sec:multilevel}}
In order to describe the AMG preconditioners available in MLD2P4, we consider a
linear system
\begin{equation}
Ax=b, \label{eq:system}
\end{equation}
where $A=(a_{ij}) \in \mathbb{R}^{n \times n}$ is a nonsingular sparse matrix;
for ease of presentation we assume $A$ is real, but the
results are valid for the complex case as well.
Let us assume as finest index space the set of row (column) indices of $A$, i.e.,
$\Omega = \{1, 2, \ldots, n\}$.
Each algebraic multilevel preconditioner implemented in MLD2P4 generates
a hierarchy of index spaces and a corresponding hierarchy of matrices,
\[ \Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},
\quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \]
by using the information contained in $A$, without assuming any
knowledge of the geometry of the problem from which $A$ originates.
A vector space $\mathbb{R}^{n_{k}}$ is associated with $\Omega^k$,
where $n_k$ is the size of $\Omega^k$.
For all $k < nlev$, a restriction operator and a prolongation operator are built,
which connect two levels $k$ and $k+1$:
$$
P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad
R^k \in \mathbb{R}^{n_{k+1}\times n_k};
$$
%\[
% P^k: \mathbb{R}^{n_{k+1}} \longrightarrow \mathbb{R}^{n_k}, \quad
% R^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k+1}};
%\]
the matrix $A^{k+1}$ is computed by using the previous operators according
to the Galerkin approach, i.e.,
$$
A^{k+1}=R^kA^kP^k.
$$
In the current implementation of MLD2P4, $R^k=(P^k)^T$.
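As a minimal illustration (plain SciPy, not MLD2P4 code; the matrices are a toy 1D Laplacian and a hypothetical piecewise-constant prolongator), the Galerkin triple product with $R^k=(P^k)^T$ can be computed as:

```python
import numpy as np
import scipy.sparse as sp

def galerkin_coarse(A_k, P_k):
    """Compute A^{k+1} = R^k A^k P^k with R^k = (P^k)^T."""
    return (P_k.T @ A_k @ P_k).tocsr()

# toy fine matrix: 1D Laplacian on 4 points; aggregates {0,1} and {2,3}
A = sp.csr_matrix(np.diag([2.0] * 4) + np.diag([-1.0] * 3, -1) + np.diag([-1.0] * 3, 1))
P = sp.csr_matrix([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A_coarse = galerkin_coarse(A, P)   # 2x2 coarse Laplacian [[2, -1], [-1, 2]]
```

Note that the coarse matrix is again a (scaled) Laplacian, as expected of a Galerkin projection onto piecewise-constant aggregates.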
A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solver
is set up at the coarsest level, so that they are ready for application
(for example, setting up a solver based on the $LU$ factorization means computing
and storing the $L$ and $U$ factors). The construction of the hierarchy of AMG components
described so far corresponds to the so-called build phase of the preconditioner.
\begin{figure}[t]
\begin{center}
\framebox{
\begin{minipage}{.85\textwidth}
\begin{tabbing}
\quad \=\quad \=\quad \=\quad \\[-3mm]
procedure V-cycle$\left(k,A^k,b^k,u^k\right)$ \\[2mm]
\>if $\left(k \ne nlev \right)$ then \\[1mm]
\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
\>\> $b^{k+1} = R^{k+1}\left(b^k - A^k u^k\right)$ \\[1mm]
\>\> $u^{k+1} =$ V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$ \\[1mm]
\>\> $u^k = u^k + P^{k+1} u^{k+1}$ \\[1mm]
\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
\>else \\[1mm]
\>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm]
\>endif \\[1mm]
\>return $u^k$ \\[1mm]
end
\end{tabbing}
\end{minipage}
}
\caption{Application phase of a V-cycle preconditioner.\label{fig:application_alg}}
\end{center}
\end{figure}
The components produced in the build phase may be combined in several ways
to obtain different multilevel preconditioners;
this is done in the application phase, i.e., in the computation of a vector
of type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iteration
of a Krylov solver \cite{Saad_book}. An example of such a combination, known as
V-cycle, is given in Figure~\ref{fig:application_alg}. In this case, a single iteration
of the same smoother is used before and after the recursive call to the V-cycle (i.e.,
in the pre-smoothing and post-smoothing phases); however, different choices can be
made. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle
and W-cycle~\cite{Briggs2000}, and a version of the K-cycle described in~\cite{Notay2008}.
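The V-cycle of Figure~\ref{fig:application_alg} can be sketched in Python; this is a schematic rendering of the pseudocode under simplifying assumptions (dense matrices, a damped-Jacobi smoother, an exact solve at the coarsest level; all names are hypothetical, not MLD2P4 identifiers):

```python
import numpy as np

def v_cycle(k, A, M, P, b, u, nlev):
    """Schematic V-cycle on 0-based levels; A[k] is the level-k matrix,
    M[k] applies the smoother, P[k] prolongates from level k+1 to k."""
    if k != nlev - 1:
        u = u + M[k] @ (b - A[k] @ u)                     # pre-smoothing
        b_c = P[k].T @ (b - A[k] @ u)                     # restrict residual, R = P^T
        u_c = v_cycle(k + 1, A, M, P, b_c, np.zeros_like(b_c), nlev)
        u = u + P[k] @ u_c                                # prolongate the correction
        u = u + M[k] @ (b - A[k] @ u)                     # post-smoothing
    else:
        u = np.linalg.solve(A[k], b)                      # coarsest-level solve
    return u

# two-level toy hierarchy: 1D Laplacian, piecewise-constant aggregates {0,1}, {2,3}
A0 = np.diag([2.0] * 4) + np.diag([-1.0] * 3, -1) + np.diag([-1.0] * 3, 1)
P0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A1 = P0.T @ A0 @ P0                                       # Galerkin coarse matrix
M0 = 0.5 * np.diag(1.0 / np.diag(A0))                     # damped Jacobi smoother
b = np.ones(4)
u = v_cycle(0, [A0, A1], [M0], [P0], b, np.zeros(4), nlev=2)
```

A single sweep already reduces the residual norm of this toy system well below that of the zero initial guess, which is the effect the cycle is designed to have when used as a preconditioner.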
\subsection{Smoothed Aggregation\label{sec:aggregation}}
In order to define the prolongator $P^k$, used to compute
the coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregation
algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
The basic idea of this algorithm is to build a coarse set of indices
$\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjoint
subsets (aggregates), and to define the coarse-to-fine space transfer operator
$P^k$ by applying a suitable smoother to a simple piecewise constant
prolongation operator, with the aim of improving the quality of the coarse-space correction.
Three main steps can be identified in the smoothed aggregation procedure:
\begin{enumerate}
\item aggregation of the indices of $\Omega^k$ to obtain $\Omega^{k+1}$;
\item construction of the prolongator $P^k$;
\item application of $P^k$ and $R^k=(P^k)^T$ to build $A^{k+1}$.
\end{enumerate}
In order to perform the coarsening step, the smoothed aggregation algorithm
described in~\cite{VANEK_MANDEL_BREZINA} is used. In this algorithm,
each index $j \in \Omega^{k+1}$ corresponds to an aggregate $\Omega^k_j$ of $\Omega^k$,
consisting of a suitably chosen index $i \in \Omega^k$ and indices that are (usually) contained in a
strongly coupled neighborhood of $i$, i.e.,
\begin{equation}
\label{eq:strongly_coup}
\Omega^k_j \subset \mathcal{N}_i^k(\theta) =
\left\{ r \in \Omega^k: |a_{ir}^k| > \theta \sqrt{|a_{ii}^ka_{rr}^k|} \right \} \cup \left\{ i \right\},
\end{equation}
for a given threshold $\theta \in [0,1]$ (see~\cite{VANEK_MANDEL_BREZINA} for details).
Since this algorithm has a sequential nature, a decoupled
version of it is applied, where each processor independently executes
the algorithm on the set of indices assigned to it in the initial data
distribution. This version is embarrassingly parallel, since it does not require any data
communication. On the other hand, it may produce some nonuniform aggregates
and is strongly dependent on the number of processors and on the initial partitioning
of the matrix $A$. Nevertheless, this parallel algorithm has been chosen for
MLD2P4, since it has been shown to produce good results in practice
\cite{aaecc_07,apnum_07,TUMINARO_TONG}.
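A minimal sketch of the aggregation performed locally by each process might look as follows; this is a simplified greedy pass over the strength criterion above, not the actual MLD2P4 routine (which, for instance, handles leftover unaggregated indices in further passes):

```python
import numpy as np

def decoupled_aggregate(A, theta):
    """Greedy aggregation: each unaggregated index i seeds an aggregate
    made of i and its still-unaggregated strongly coupled neighbours,
    |a_ir| > theta * sqrt(|a_ii| * |a_rr|)."""
    n = A.shape[0]
    agg = -np.ones(n, dtype=int)          # agg[i] = coarse index assigned to i
    n_agg = 0
    for i in range(n):
        if agg[i] != -1:
            continue                      # i already belongs to an aggregate
        agg[i] = n_agg
        for r in range(n):
            if agg[r] == -1 and abs(A[i, r]) > theta * np.sqrt(abs(A[i, i]) * abs(A[r, r])):
                agg[r] = n_agg            # r is strongly coupled to i
        n_agg += 1
    return agg, n_agg

# 1D Laplacian: off-diagonal entries satisfy |a_ir| = 1 > 0.25 * sqrt(2*2)
A = np.diag([2.0] * 4) + np.diag([-1.0] * 3, -1) + np.diag([-1.0] * 3, 1)
agg, n_agg = decoupled_aggregate(A, theta=0.25)
```

On one process this produces the aggregates $\{0,1\}$ and $\{2,3\}$; in the decoupled version each process would run the same loop on its own index set only.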
The prolongator $P^k$ is built starting from a tentative prolongator
$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as
$$
\bar{P}^k =(\bar{p}_{ij}^k), \quad \bar{p}_{ij}^k =
\left\{ \begin{array}{ll}
1 & \quad \mbox{if} \; i \in \Omega^k_j, \\
0 & \quad \mbox{otherwise},
\end{array} \right.
\label{eq:tent_prol}
$$
where $\Omega^k_j$ is the aggregate of $\Omega^k$
corresponding to the index $j \in \Omega^{k+1}$.
$P^k$ is obtained by applying to $\bar{P}^k$ a smoother
$S^k \in \mathbb{R}^{n_k \times n_k}$:
$$
P^k = S^k \bar{P}^k,
$$
in order to remove nonsmooth components from the range of the prolongator,
and hence to improve the convergence properties of the multilevel
method~\cite{BREZINA_VANEK,Stuben_01}.
A simple choice for $S^k$ is the damped Jacobi smoother:
$$
S^k = I - \omega^k (D^k)^{-1} A^k_F ,
$$
where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$,
$A^k_F = (\bar{a}_{ij}^k)$ is the filtered matrix defined as
\begin{equation}
\label{eq:filtered}
\bar{a}_{ij}^k =
\left \{ \begin{array}{ll}
a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\
0 & \mbox{otherwise},
\end{array} \right.
\; (j \ne i),
\qquad
\bar{a}_{ii}^k = a_{ii}^k - \sum_{j \ne i} (a_{ij}^k - \bar{a}_{ij}^k),
\end{equation}
and $\omega^k$ is an approximation of $4/(3\rho^k)$, where
$\rho^k$ is the spectral radius of $(D^k)^{-1}A^k_F$ \cite{BREZINA_VANEK}.
In MLD2P4 this approximation is obtained by using $\| A^k_F \|_\infty$ as an estimate
of $\rho^k$. Note that for systems coming from uniformly elliptic
problems, filtering the matrix $A^k$ has little or no effect, and
$A^k$ can be used instead of $A^k_F$. The latter choice is the default in MLD2P4.
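Combining the tentative prolongator with the damped Jacobi smoother, the construction can be sketched as follows under the default (unfiltered) choice $A^k_F = A^k$; this is an illustration with hypothetical names, using the infinity norm of the matrix as the estimate of $\rho^k$, as described above:

```python
import numpy as np

def smoothed_prolongator(A, agg, n_agg):
    """P = S * P_bar with S = I - omega * D^{-1} * A (A_F = A, no filtering)."""
    n = A.shape[0]
    P_bar = np.zeros((n, n_agg))
    P_bar[np.arange(n), agg] = 1.0             # piecewise-constant tentative prolongator
    d = np.diag(A)
    rho_est = np.max(np.abs(A).sum(axis=1))    # ||A||_inf as estimate of rho
    omega = 4.0 / (3.0 * rho_est)              # damping parameter ~ 4/(3 rho)
    S = np.eye(n) - omega * (A / d[:, None])   # damped Jacobi prolongator smoother
    return S @ P_bar

A = np.diag([2.0] * 4) + np.diag([-1.0] * 3, -1) + np.diag([-1.0] * 3, 1)
P = smoothed_prolongator(A, np.array([0, 0, 1, 1]), 2)
```

Compared with the piecewise-constant $\bar{P}^k$, the smoothed columns overlap slightly at the aggregate boundaries, which is precisely what improves the coarse-space correction.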
\subsection{Smoothers and coarsest-level solvers\label{sec:smoothers}}
The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,
a hybrid version of the forward and backward Gauss-Seidel methods, and the
additive Schwarz (AS) ones (see, e.g., \cite{Saad_book,dd2_96}).
The hybrid Gauss-Seidel
version is considered because the original Gauss-Seidel method is inherently sequential.
At each iteration of the hybrid version, each parallel process uses the most recent values
of its own local variables and the values of the non-local variables computed at the
previous iteration, obtained by exchanging data with other processes before
the beginning of the current iteration.
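A sketch of one hybrid forward Gauss-Seidel sweep, with the blocks playing the role of per-process index sets (a serial, dense illustration of the idea, not the parallel implementation):

```python
import numpy as np

def hybrid_gs_sweep(A, b, x, blocks):
    """One hybrid forward Gauss-Seidel sweep: inside each block (one block
    per 'process') the most recent local values are used, while non-local
    values are the ones exchanged before the sweep."""
    n = len(b)
    x_old = x.copy()                   # non-local values, frozen for the whole sweep
    x_new = x.copy()
    for blk in blocks:
        local = set(blk)
        for i in blk:
            s = sum(A[i, j] * (x_new[j] if j in local else x_old[j])
                    for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i, i]
    return x_new

# two "processes", each owning two indices of a 1D Laplacian
A = np.diag([2.0] * 4) + np.diag([-1.0] * 3, -1) + np.diag([-1.0] * 3, 1)
x1 = hybrid_gs_sweep(A, np.ones(4), np.zeros(4), [[0, 1], [2, 3]])
```

Unlike the sequential method, both blocks produce identical local iterates here, because each sees only the pre-sweep values of the other block.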
In the AS methods, the index space $\Omega^k$ is divided into $m_k$
subsets $\Omega^k_i$ of size $n_{k,i}$, possibly
overlapping. For each $i$ we consider the restriction
operator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$
% $R_i^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k,i}}$
that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$
with indices in $\Omega^k_i$, and the prolongation operator
$P^k_i = (R_i^k)^T$. These operators are then used to build
$A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the index
space $\Omega^k_i$.
The classical AS preconditioner $M^k_{AS}$ is defined as
\[
( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k},
\]
where each $A_i^k$ is assumed to be nonsingular. We observe that an approximate
inverse of $A_i^k$ is usually considered instead of $(A_i^k)^{-1}$.
The setup of $M^k_{AS}$ during the multilevel build phase
involves
\begin{itemize}
\item the definition of the index subspaces $\Omega_i^k$ and of the corresponding
operators $R_i^k$ (and $P_i^k$);
\item the computation of the submatrices $A_i^k$;
\item the computation of their inverses (usually approximated
through some form of incomplete factorization).
\end{itemize}
The computation of $z^k=M^k_{AS}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during the
multilevel application phase, requires
\begin{itemize}
\item the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$,
i.e.\ $w_i^k = R_i^{k} w^k$;
\item the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$;
\item the prolongation and the sum of the previous vectors,
i.e.\ $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$.
\end{itemize}
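The three application steps above can be sketched as follows (exact subdomain solves stand in for the approximate inverses used in practice; the subsets and names are illustrative):

```python
import numpy as np

def as_apply(A, subsets, w):
    """z = sum_i P_i (A_i)^{-1} R_i w, the classical AS application."""
    z = np.zeros_like(w)
    for idx in subsets:
        idx = np.asarray(idx)
        A_i = A[np.ix_(idx, idx)]             # A_i = R_i A P_i
        w_i = w[idx]                          # restriction: w_i = R_i w
        z[idx] += np.linalg.solve(A_i, w_i)   # local solve, then prolongate and sum
    return z

# two overlapping subdomains of a 1D Laplacian
A = np.diag([2.0] * 4) + np.diag([-1.0] * 3, -1) + np.diag([-1.0] * 3, 1)
z = as_apply(A, [[0, 1, 2], [1, 2, 3]], np.ones(4))
```

The additions on the overlap (indices 1 and 2 here) are what the restricted variants such as RAS modify, by dropping the overlapped contributions in the prolongation step.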
Variants of the classical AS method, which use modifications of the
restriction and prolongation operators, are also implemented in MLD2P4.
Among them, the Restricted AS (RAS) preconditioner usually
outperforms the classical AS preconditioner in terms of convergence
rate and of computation and communication time on parallel distributed-memory
computers, and is therefore the most widely used among the AS
preconditioners~\cite{CAI_SARKIS}.
Direct solvers based on sparse LU factorizations, implemented in the
third-party libraries reported in Section~\ref{sec:third-party}, can be applied
as coarsest-level solvers by MLD2P4. Native inexact solvers based on
incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,
and block Jacobi preconditioners are also available. Direct solvers usually
lead to more effective preconditioners in terms of algorithmic scalability;
however, this does not guarantee parallel efficiency.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "userguide"
%%% End:
\ No newline at end of file
diff --git a/docs/src/bibliography.tex b/docs/src/bibliography.tex
index 7d8290fa..98305063 100644
--- a/docs/src/bibliography.tex
+++ b/docs/src/bibliography.tex
@@ -6,13 +6,19 @@
%\let\refname\relax
%
+\bibitem{MUMPS}
+P.~R.~Amestoy, C.~Ashcraft, O.~Boiteau, A.~Buttari, J.~L'Excellent, C.~Weisbecker,
+{\em Improving multifrontal methods by means of block low-rank representations},
+SIAM Journal on Scientific Computing, 37 (3), 2015, A1452--A1474.
+See also {\tt http://mumps.enseeiht.fr}.
+%
\bibitem{BREZINA_VANEK}
-M.~Brezina, P.~Van{\v e}k,
+M.~Brezina, P.~Van\v{e}k,
{\em A Black-Box Iterative Solver Based on a Two-Level Schwarz Method},
Computing, 63, 1999, 233--263.
%
\bibitem{Briggs2000}
-W.~L.~Briggs, V.~E.~Henson, S.~F.~ McCormick,
+W.~L.~Briggs, V.~E.~Henson, S.~F.~McCormick,
{\em A Multigrid Tutorial, Second Edition},
SIAM, 2000.
%
@@ -32,28 +38,29 @@ Applicable Algebra in Engineering, Communications and Computing,
18 (3) 2007, 223--239.
%Published online: 13 February 2007, {\tt http://dx.doi.org/10.1007/s00200-007-0035-z}
%
-\bibitem{apnum_07} P.~D'Ambra, S.~Filippone, D.~di~Serafino,
-{\em On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners},
-Applied Numerical Mathematics, Elsevier Science,
-57 (11-12), 2007, 1181-1196.
-%published online 3 February 2007, {\tt
-% http://dx.doi.org/10.1016/j.apnum.2007.01.006}
-%
\bibitem{CAI_SARKIS}
X.~C.~Cai, M.~Sarkis,
{\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems},
SIAM Journal on Scientific Computing, 21 (2), 1999, 792--797.
%
-\bibitem{Cai_Widlund_92}
-X.~C.~Cai, O.~B.~Widlund,
-{\em Domain Decomposition Algorithms for Indefinite Elliptic Problems},
-SIAM Journal on Scientific and Statistical Computing, 13 (1), 1992, 243--258.
+%\bibitem{Cai_Widlund_92}
+%X.~C.~Cai, O.~B.~Widlund,
+%{\em Domain Decomposition Algorithms for Indefinite Elliptic Problems},
+%SIAM Journal on Scientific and Statistical Computing, 13 (1), 1992, 243--258.
+%
+%\bibitem{dd1_94}
+%T.~Chan and T.~Mathew,
+%{\em Domain Decomposition Algorithms},
+%in A.~Iserles, editor, Acta Numerica 1994, 61--143.
+%Cambridge University Press.
%
-\bibitem{dd1_94}
-T.~Chan and T.~Mathew,
-{\em Domain Decomposition Algorithms},
-in A.~Iserles, editor, Acta Numerica 1994, 61--143.
-Cambridge University Press.
+\bibitem{apnum_07}
+P.~D'Ambra, S.~Filippone, D.~di~Serafino,
+{\em On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners},
+Applied Numerical Mathematics, Elsevier Science,
+57 (11--12), 2007, 1181--1196.
+%published online 3 February 2007, {\tt
+% http://dx.doi.org/10.1016/j.apnum.2007.01.006}
%
\bibitem{MLD2P4_TOMS}
P.~D'Ambra, D.~di~Serafino, S.~Filippone,
@@ -62,20 +69,14 @@ Algebraic Domain Decomposition Preconditioners
in Fortran 95}, ACM Trans. Math. Softw., 37(3), 2010, art. 30.
%
\bibitem{UMFPACK}
-T.A.~Davis,
+T.~A.~Davis,
{\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal
Method with a Column Pre-ordering Strategy},
ACM Transactions on Mathematical Software, 30, 2004, 196--199.
(See also {\tt http://www.cise.ufl.edu/~davis/})
%
-\bibitem{MUMPS}
-P.R.~Amestoy, C.~Ashcraft, O.~Boiteau, A.~Buttari, J.~L'Excellent, C.~Weisbecker
-{\em Improving multifrontal methods by means of block low-rank representations},
-SIAM Journal on Scientific Computing, volume 37 (3), 2015, A1452--A1474.
-See also {\tt http://mumps.enseeiht.fr}.
-%
\bibitem{SUPERLU}
-J.W.~Demmel, S.C.~Eisenstat, J.R.~Gilbert, X.S.~Li and J.W.H.~Liu,
+J.~W.~Demmel, S.~C.~Eisenstat, J.~R.~Gilbert, X.~S.~Li, J.~W.~H.~Liu,
A supernodal approach to sparse partial pivoting,
SIAM Journal on Matrix Analysis and Applications, 20 (3), 1999, 720--755.
%
@@ -89,16 +90,16 @@ J.~J.~Dongarra, J.~Du Croz, S.~Hammarling, R.~J.~Hanson,
\emph{An extended set of FORTRAN Basic Linear Algebra Subprograms},
ACM Transactions on Mathematical Software, 14 (1) 1988, 1--17.
%
-\bibitem{BLACS}
-J.~J.~Dongarra and R.~C.~Whaley,
-{\em A User's Guide to the BLACS v.~1.1},
-Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
-Tennessee, March 1995 (updated May 1997).
+%\bibitem{BLACS}
+%J.~J.~Dongarra, R.~C.~Whaley,
+%{\em A User's Guide to the BLACS v.~1.1},
+%Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
+%Tennessee, March 1995 (updated May 1997).
%
-\bibitem{EFSTATHIOU}
-E.~Efstathiou, J.~G.~Gander,
-{\em Why Restricted Additive Schwarz Converges Faster than Additive Schwarz},
-BIT Numerical Mathematics, 43 (5), 2003, 945--959.
+%\bibitem{EFSTATHIOU}
+%E.~Efstathiou, J.~G.~Gander,
+%{\em Why Restricted Additive Schwarz Converges Faster than Additive Schwarz},
+%BIT Numerical Mathematics, 43 (5), 2003, 945--959.
%
\bibitem{PSBLASGUIDE}
S.~Filippone, A.~Buttari,
@@ -106,9 +107,9 @@ S.~Filippone, A.~Buttari,
available from \texttt{http://www.ce.uniroma2.it/psblas/}.
%
\bibitem{PSBLAS3}
-Salvatore Filippone and Alfredo Buttari.
+S.~Filippone, A.~Buttari,
{\em Object-Oriented Techniques for Sparse Matrix Computations in Fortran 2003}.
-ACM Transactions on on Mathematical Software, 38 (4), 2012, art. 23.
+ACM Transactions on Mathematical Software, 38 (4), 2012, art.~23.
%
\bibitem{psblas_00}
S.~Filippone, M.~Colajanni,
@@ -127,12 +128,14 @@ C.~L.~Lawson, R.~J.~Hanson, D.~Kincaid, F.~T.~Krogh,
ACM Transactions on Mathematical Software, 5 (3), 1979, 308--323.
%
\bibitem{SUPERLUDIST}
-X.~S.~Li, J.~W.~Demmel, {\em SuperLU\_DIST: A Scalable Distributed-memory
+X.~S.~Li, J.~W.~Demmel,
+{\em SuperLU\_DIST: A Scalable Distributed-memory
Sparse Direct Solver for Unsymmetric Linear Systems},
ACM Transactions on Mathematical Software, 29 (2), 2003, 110--140.
%
\bibitem{Notay2008}
-Y.~Notay, P.~S.~Vassilevski, {\em Recursive Krylov-based multigrid cycles},
+Y.~Notay, P.~S.~Vassilevski,
+{\em Recursive Krylov-based multigrid cycles},
Numerical Linear Algebra with Applications, 15 (5), 2008, 473--487.
%
\bibitem{Saad_book}
@@ -149,7 +152,7 @@ Cambridge University Press, 1996.
M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker, J.~Dongarra,
{\em MPI: The Complete Reference. Volume 1 - The MPI Core}, second edition,
MIT Press, 1998.
-%%
+%
\bibitem{Stuben_01}
K.~St\"{u}ben,
{\em An Introduction to Algebraic Multigrid},
@@ -161,9 +164,8 @@ R.~S.~Tuminaro, C.~Tong,
{\em Parallel Smoothed Aggregation Multigrid: Aggregation Strategies on Massively Parallel Machines}, in J. Donnelley, editor, Proceedings of SuperComputing 2000, Dallas, 2000.
%
\bibitem{VANEK_MANDEL_BREZINA}
-P.~Van{\v e}k, J.~Mandel and M.~Brezina,
+P.~Van\v{e}k, J.~Mandel, M.~Brezina,
{\em Algebraic Multigrid by Smoothed Aggregation for Second and Fourth Order Elliptic Problems},
Computing, 56 (3) 1996, 179--196.
-%
\end{thebibliography}
diff --git a/docs/src/building.tex b/docs/src/building.tex
index bc0e49e6..163d2dfe 100644
--- a/docs/src/building.tex
+++ b/docs/src/building.tex
@@ -17,7 +17,15 @@ recommend to use at least version 4.8.
The software defines data types and interfaces for
real and complex data, in both single and double precision.
-\subsection{Prerequisites}
+Building MLD2P4 requires some base libraries (see Section~\ref{sec:prerequisites});
+interfaces to optional third-party libraries, which extend the functionalities of MLD2P4
+(see Section~\ref{sec:third-party}), are also available. Many Linux distributions
+(e.g., Ubuntu, Fedora, CentOS) provide precompiled packages for the prerequisite and
+optional software. In many cases these packages are split between a runtime part and a
+``developer'' part; in order to build MLD2P4 you need both. A description of the base and
+optional software used by MLD2P4 is given in the next sections.
+
+\subsection{Prerequisites\label{sec:prerequisites}}
The following base libraries are needed:
\begin{description}
@@ -68,18 +76,18 @@ for multi-level preconditioners may change to reflect their presence.
A sparse LU factorization package available from \url{mumps.enseeiht.fr};
it provides sequential and parallel factorizations and triangular system solution
for single and double precision, real and complex data.
- We tested versions 4.10.0 and version 5.0.1.
+ We tested versions 4.10.0 and 5.0.1.
\item[SuperLU] \cite{SUPERLU}
A sparse LU factorization package available from
\url{crd.lbl.gov/~xiaoye/SuperLU/}; it provides sequential
factorization and triangular system solution for single and double precision,
- real and complex data. We tested version 4.3 and 5.0. If you installed BLAS from
+ real and complex data. We tested versions 4.3 and 5.0. If you installed BLAS from
ATLAS, remember to define the BLASLIB variable in the make.inc file.
\item[SuperLU\_Dist] \cite{SUPERLUDIST}
A sparse LU factorization package available
from the same site as SuperLU; it provides parallel factorization and
triangular system solution for double precision real and complex data.
- We tested version 3.3 and 4.2. If you installed BLAS from
+ We tested versions 3.3 and 4.2. If you installed BLAS from
ATLAS, remember to define the BLASLIB variable in the make.inc file and
to add the \verb|-std=c99| option to the C compiler options.
Note that this library requires the ParMETIS
@@ -278,7 +286,7 @@ install directory under the name \verb|Make.inc.MLD2P4|.
To use the MUMPS solver package,
the user has to add the appropriate options to the configure script;
by default we are looking for the libraries
-\verb|-ldmumps -lsmumps| \verb|-lzmumps -mumps_common -lpord|.
+\verb|-ldmumps -lsmumps| \verb|-lzmumps -lcmumps -lmumps_common -lpord|.
MUMPS often uses additional packages such as ScaLAPACK, ParMETIS,
SCOTCH, as well as enabling OpenMP; in such cases it is necessary to
add linker options with the \verb|--with-extra-libs| configure option.
@@ -291,23 +299,23 @@ followed (optionally) by
\begin{verbatim}
make install
\end{verbatim}
-Many Linux distributions (e.g. Ubuntu, Fedora, CentOS) provide
-precompiled packages for the prerequisite softwares; in many cases the
-software packages are split between a runtime part and a ``developer''
-part, to rebuild MLD2P4 you'll need both.
-
\subsection{Bug reporting}
-If you find any bugs in our codes, please let us know at
-\begin{rawhtml}
-
-\end{rawhtml}
-\texttt{bugreport@mld2p4.it}
-\begin{rawhtml}
-
-\end{rawhtml}
-; be aware that
-the amount of information needed to reproduce a problem in a parallel
-program may vary quite a lot.
+If you find any bugs in our codes, please send an email to \\[2mm]
+\texttt{pasqua.dambra@cnr.it} \\
+\texttt{daniela.diserafino@unicampania.it} \\
+\texttt{salvatore.filippone@cranfield.ac.uk} \\[2mm]
+% please let us know at
+%\begin{rawhtml}
+%
+%\end{rawhtml}
+%\texttt{bugreport@mld2p4.it}
+%\begin{rawhtml}
+%
+%\end{rawhtml}
+%;
+You should be aware that the amount of information needed to reproduce a problem
+in a parallel program may vary quite a lot.
+
\subsection{Example and test programs\label{sec:ex_and_test}}
The package contains the \verb|examples| and \verb|tests| directories;
both of them are further divided into \verb|fileread| and
diff --git a/docs/src/distribution.tex b/docs/src/distribution.tex
index 6fd6c7ae..bc16b84b 100644
--- a/docs/src/distribution.tex
+++ b/docs/src/distribution.tex
@@ -8,7 +8,6 @@ MLD2P4 is available from the web site
\texttt{http://www.mld2p4.it}
\end{quotation}
where contact points for further information can be also found.
-{\bf Passiamo subito a GitHub?}
The software is available under a modified BSD license, as specified
in Appendix~\ref{sec:license}; please note that some of the optional
diff --git a/docs/src/gettingstarted.tex b/docs/src/gettingstarted.tex
index 8f99a12c..758d28f1 100644
--- a/docs/src/gettingstarted.tex
+++ b/docs/src/gettingstarted.tex
@@ -70,7 +70,7 @@ No preconditioner &\verb|'NOPREC'|& Considered only to use the PSBLAS
Krylov solvers with no preconditioner. \\ \hline
Diagonal & \verb|'DIAG'| or \verb|'JACOBI'| & Diagonal preconditioner.
For any zero diagonal entry of the matrix to be preconditioned,
- the corresponding entry of he preconditioner is set to~1.\\ \hline
+ the corresponding entry of the preconditioner is set to~1.\\ \hline
Block Jacobi & \verb|'BJAC'| & Block-Jacobi with ILU(0) on the local blocks.\\ \hline
Additive Schwarz & \verb|'AS'| & Restricted Additive Schwarz (RAS),
with overlap~1 and ILU(0) on the local blocks. \\ \hline
@@ -176,11 +176,11 @@ the corresponding codes are available in \verb|examples/fileread/|.
! with basic smoothed aggregation, 1 hybrid forward/backward
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
! solver
- call P%init(P,'ML',info)
+ call P%init('ML',info)
!
! build the preconditioner
- call P%hierarchy_build(A,desc_A,P,info)
- call P%smoothers_build(A,desc_A,P,info)
+ call P%hierarchy_build(A,desc_A,info)
+ call P%smoothers_build(A,desc_A,info)
!
! set the solver parameters and the initial guess
@@ -191,7 +191,7 @@ the corresponding codes are available in \verb|examples/fileread/|.
... ...
!
! deallocate the preconditioner
- call P%free(P,info)
+ call P%free(info)
!
! deallocate other data structures
... ...
@@ -260,12 +260,12 @@ boundary conditions are also available in the directory \verb|examples/pdegen|.
! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
! ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi
! sweeps (with ILU(0) on the blocks) as coarsest-level solver
- call P%init(P,'ML',info)
- call_P%set(P,'SMOOTHER_TYPE','BJAC',info)
- call P%set(P,'COARSE_SOLVE','BJAC',info)
- call P%set(P,'COARSE_SWEEPS',8,info)
- call P%hierarchy_build(A,desc_A,P,info)
- call P%smoothers_build(A,desc_A,P,info)
+ call P%init('ML',info)
+ call P%set('SMOOTHER_TYPE','BJAC',info)
+ call P%set('COARSE_SOLVE','BJAC',info)
+ call P%set('COARSE_SWEEPS',8,info)
+ call P%hierarchy_build(A,desc_A,info)
+ call P%smoothers_build(A,desc_A,info)
... ...
\end{verbatim}
}
@@ -284,17 +284,17 @@ boundary conditions are also available in the directory \verb|examples/pdegen|.
! build a W-cycle preconditioner with 2 Gauss-Seidel sweeps as
! post-smoother (and no pre-smoother), a distributed coarsest
! matrix, and MUMPS as coarsest-level solver
- call P%init(P,'ML',info)
+ call P%init('ML',info)
call P%set('ML_TYPE','WCYCLE',info)
call P%set('SMOOTHER_TYPE','GS',info)
call P%set('SMOOTHER_SWEEPS',0,info,pos='PRE')
call P%set('SMOOTHER_SWEEPS',2,info,pos='POST')
call P%set('COARSE_SOLVE','MUMPS',info)
call P%set('COARSE_MAT','DIST',info)
- call P%hierarchy_build(A,desc_A,P,info)
- call P%smoothers_build(A,desc_A,P,info)
+ call P%hierarchy_build(A,desc_A,info)
+ call P%smoothers_build(A,desc_A,info)
... ...
-! solve Ax=b with preconditioned CG
+! solve Ax=b with preconditioned BiCGSTAB
call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info)
\end{verbatim}
}
@@ -310,9 +310,9 @@ boundary conditions are also available in the directory \verb|examples/pdegen|.
\begin{verbatim}
... ...
! set RAS with overlap 2 and ILU(0) on the local blocks
- call P%init(P,'AS',info)
- call P%set(P,'SUB_OVR',2,info)
- call P%bld(A,desc_A,P,info)
+ call P%init('AS',info)
+ call P%set('SUB_OVR',2,info)
+ call P%bld(A,desc_A,info)
... ...
\end{verbatim}
}
diff --git a/docs/src/intro.tex b/docs/src/intro.tex
deleted file mode 100644
index 864ab263..00000000
--- a/docs/src/intro.tex
+++ /dev/null
@@ -1,34 +0,0 @@
-\section{Introduction}\label{sec:intro}
-\markboth{\underline{MLD2P4 User's and Reference Guide}}
- {\underline{\ref{sec:overview} Introduction}}
-
-The MLD2P4 library provides ....
-
-
-\subsection{Programming model}
-
-The MLD2P4 librarary is based on the Single Program Multiple Data
-(SPMD) programming model: each process participating in the
-computation performs the same actions on a chunk of data. Parallelism
-is thus data-driven.
-
-Because of this structure, many subroutines coordinate their action
-across the various processes, thus providing an implicit
-synchronization point, and therefore \emph{must} be
-called simultaneously by all processes participating in the
-computation.
-However there are many cases where no synchronization, and indeed no
-communication among processes, is implied.
-
-Throughout this user's guide each subroutine will be clearly indicated
-as:
-\begin{description}
-\item[Synchronous:] must be called simultaneously by all the
- processes in the relevant communication context;
-\item[Asynchronous:] may be called in a totally independent manner.
-\end{description}
-
-%%% Local Variables:
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End:
diff --git a/docs/src/newobjects.tex b/docs/src/newobjects.tex
new file mode 100644
index 00000000..aa7ec330
--- /dev/null
+++ b/docs/src/newobjects.tex
@@ -0,0 +1,85 @@
+
+\clearpage
+
+\section{Adding new smoother and solver objects to MLD2P4\label{sec:adding}}
+
+Developers can add completely new smoother and/or solver classes
+derived from the base objects in the library (see Remark~2 in Section~\ref{sec:precset}),
+without recompiling the library itself.
+
+To do so, it is necessary first to select the base type to be extended.
+In our experience, it is quite likely that the new application needs
+only the definition of a ``solver'' object, which almost
+always acts only on the local part of the distributed matrix.
+The parallel actions required to connect the various solver objects
+are most often already provided by the block-Jacobi or the additive
+Schwarz smoothers. To define a new solver, the developer will then
+have to define its components and methods, perhaps taking one of the
+predefined solvers as a starting point, if possible.
+
+Once the new smoother/solver class has been developed, to use it in
+the context of the multilevel preconditioners it is necessary to:
+\begin{itemize}
+\item declare in the application program a variable of the new type;
+\item pass that variable as the argument to the \verb|set| routine as in the
+following:
+\begin{center}
+\verb|call p%set(smoother,info [,ilev,ilmax,pos])|\\
+\verb|call p%set(solver,info [,ilev,ilmax,pos])|
+\end{center}
+\item link the code implementing the various methods into the application executable.
+\end{itemize}
+The new solver object is then dynamically included in the
+preconditioner structure, and acts as a \emph{mold} to which the
+preconditioner will conform, even though the MLD2P4 library has not
+been modified to account for this new development.
+
+It is possible to define new values for the keyword \verb|WHAT| in the
+\verb|set| routine; if the library code does not recognize a keyword, it
+forwards it down the composition hierarchy (levels containing smoothers,
+which in turn contain solvers), so that it can eventually be caught by
+the new solver.
+
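+For instance, a parameter specific to the new solver can be set through the
+same interface; the keyword in the following sketch is invented purely for
+illustration, and is meaningful only to the (hypothetical) new solver, which
+receives it through the pass-down mechanism just described:
+\begin{verbatim}
+  ! 'TLU_DELTA' is a hypothetical keyword known only to the new solver;
+  ! the library forwards it down the hierarchy until the solver claims it
+  call p%set('TLU_DELTA',1.0d-3,info)
+\end{verbatim}
+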
+An example is provided in the source code distribution under the
+folder \verb|tests/newslv|. In this example we are implementing a new
+incomplete factorization variant (which is simply the ILU(0)
+factorization under a new name). Because of the specifics of this case, it is
+possible to reuse the basic structure of the ILU solver, with its
+L/D/U components and the methods needed to apply the solver; only a
+few methods, such as the description and, most importantly, the build,
+need to be overridden (rewritten).
+
+The arguments in the calls shown above are defined as follows:
+\begin{center}
+\begin{tabular}{p{1.4cm}p{12cm}}
+\verb|smoother| & \verb|class(mld_x_base_smoother_type)| \\
+ & The user-defined new smoother to be employed in the
+ preconditioner.\\
+\verb|solver| & \verb|class(mld_x_base_solver_type)| \\
+ & The user-defined new solver to be employed in the
+ preconditioner.
+\end{tabular}
+\end{center}
+The other arguments are defined in the way described in
+Sec.~\ref{sec:precset}. As an example, in the \verb|tests/newslv|
+code we define a new object of type \verb|mld_d_tlu_solver_type|, and
+we pass it as follows:
+\begin{verbatim}
+
+ ! sparse matrix and preconditioner
+ type(psb_dspmat_type) :: a
+ type(mld_dprec_type) :: prec
+ type(mld_d_tlu_solver_type) :: tlusv
+
+......
+ !
+ ! prepare the preconditioner: an ML with defaults, but with TLU solver at
+ ! intermediate levels. All other parameters are at default values.
+ !
+ call prec%init('ML', info)
+ call prec%hierarchy_build(a,desc_a,info)
+ nlv = prec%get_nlevs()
+ call prec%set(tlusv, info,ilev=1,ilmax=max(1,nlv-1))
+ call prec%smoothers_build(a,desc_a,info)
+
+\end{verbatim}
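+Once built, a preconditioner containing the new solver is used exactly like
+any predefined one; a minimal sketch, assuming the variables declared in the
+fragment above, plus a right-hand side \verb|b|, solution vector \verb|x|,
+tolerance \verb|tol| and descriptor \verb|desc_a| already set up:
+\begin{verbatim}
+  ! apply the preconditioner within a PSBLAS Krylov solver;
+  ! the new solver is invoked transparently at each application
+  call psb_krylov('BICGSTAB',a,prec,b,x,tol,desc_a,info)
+\end{verbatim}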
diff --git a/docs/src/overview.tex b/docs/src/overview.tex
index 30079b4b..6c88056f 100644
--- a/docs/src/overview.tex
+++ b/docs/src/overview.tex
@@ -25,7 +25,7 @@ The multi-level preconditioners implemented in MLD2P4 are obtained by combining
AMG cycles with smoothers and coarsest-level solvers. The V-, W-, and
K-cycles~\cite{Briggs2000,Notay2008} are available, which allow to define
almost all the preconditioners in the package, including the multi-level hybrid
-Schwarz ones; a specific cycle is implemented to obained multi-level additive
+Schwarz ones; a specific cycle is implemented to obtain multi-level additive
Schwarz preconditioners. The Jacobi, hybrid
%\footnote{see Note 2 in Table~\ref{tab:p_coarse}, p.~28.}
forward/backward Gauss-Seidel, block-Jacobi, and additive Schwarz methods
@@ -62,8 +62,8 @@ portability, modularity ed extensibility in the development of the preconditione
package. On the other hand, the implementation of MLD2P4 has led to some
revisions and extentions of the original PSBLAS kernels.
The inter-process comunication required by MLD2P4 is encapsulated
-in the PSBLAS routines;% , except few cases where MPI~\cite{MPI1} is explicitly called.
-% \textbf{E' ancora cos\'{i} o adesso \`e tutto incapsulato in PSBLAS?}
+in the PSBLAS routines;
+% , except few cases where MPI~\cite{MPI1} is explicitly called.
therefore, MLD2P4 can be run on any parallel machine where PSBLAS
implementations are available.
diff --git a/docs/src/precs.tex b/docs/src/precs.tex
deleted file mode 100644
index 582b2b12..00000000
--- a/docs/src/precs.tex
+++ /dev/null
@@ -1,384 +0,0 @@
-\section{Preconditioner routines}
-\label{sec:precs}
-\markboth{\underline{MLD2P4 User's and Reference Guide}}
- {\underline{\ref{sec:precs} Preconditioners}}
-
-% \section{Preconditioners}
-\label{sec:psprecs}
-The MLD2P4 library contains the implementation of many preconditioning
-techniques. The preconditioners may be applied as normal ``base''
-preconditioners; alternatively multiple ``base'' preconditioners may
-be combined in a multilevel framework.
-
-The base (one-level) preconditioners include:
-\begin{itemize}
-\item Diagonal Scaling
-\item Block Jacobi
-\item Hybrid Gauss-Seidel;
-\item Additive Schwarz, Restricted Additive Schwarz and
- Additive Schwarz with Harmonic extensions;
-\end{itemize}
-The Jacobi and Additive Schwarz preconditioners can make use of the
-following solvers:
-\begin{itemize}
-\item Level-$p$ Incomplete LU factorization ($ILU(p)$);
-\item Threshold Incomplete LU factorization ($ILU(\tau,p)$);
-\item Complete LU factorization by means of the following optional
- external packages:
-\begin{itemize}
-\item UMFPACK;
-\item SuperLU;
-\item SuperLU\_Dist.
-\end{itemize}
-\end{itemize}
-
-The supporting data type and subroutine interfaces are defined in the
-module \verb|mld_prec_mod|; the module also overrides the variables
-and tyep definitions of \verb|psb_prec_mod| so as to function as a
-drop-in replacement for the PSBLAS methods. Thus if the user does not
-wish to employ the additional MLD2P4 capabitlities, it is possible to
-migrate an existing PSBLAS program without any source code
-modifications, only a recompilation is needed.
-
-%% We also provide a companion package of multi-level Additive
-%% Schwarz preconditioners called MD2P4; this is actually a family of
-%% preconditioners since there is the possibility to choose between
-%% many variants, and is currently in an experimental stateIts
-%% documentation is planned to appear after stabilization of the
-%% package, which will characterize release 2.1 of our library.
-
-
-
-
-\subroutine{mld\_precinit}{Initialize a preconditioner}
-
-\syntax{call mld\_precinit}{prec, ptype, info}
-\syntax*{call mld\_precinit}{prec, ptype, info, nlev}
-
-\begin{description}
-\item[Type:] Asynchronous.
-\item[\bf On Entry]
-\item[ptype] the type of preconditioner.
-Scope: {\bf global} \\
-Type: {\bf required}\\
-Intent: {\bf in}.\\
-Specified as: a character string, see usage notes.
-\item[nlev] Number of levels in a multilevel precondtioner.
-Scope: {\bf global} \\
-Type: {\bf optional}\\
-Specified as: an integer value, see usage notes.
-%% \item[rs]
-%% Scope: {\bf global} \\
-%% Type: {\bf optional}\\
-%% Specified as: a long precision real number.
-\item[\bf On Exit]
-
-\item[prec]
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: a preconditioner data structure \precdata.
-\item[info]
-Scope: {\bf global} \\
-Type: {\bf required}\\
-Intent: {\bf out}.\\
-Error code: if no error, 0 is returned.
-\end{description}
-\subsection*{Usage Notes}
-%% The PSBLAS 2.0 contains a number of preconditioners, ranging from a
-%% simple diagonal scaling to 2-level domain decomposition. These
-%% preconditioners may use the SuperLU or the UMFPACK software, if
-%% installed; see~\cite{SUPERLU,UMFPACK}.
-Legal inputs to this subroutine are interpreted depending on the
-$ptype$ string as follows\footnote{The string is case-insensitive}:
-\begin{description}
-\item[NONE] No preconditioning, i.e. the preconditioner is just a copy
- operator.
-\item[DIAG] Diagonal scaling; each entry of the input vector is
- multiplied by the reciprocal of the sum of the absolute values of
- the coefficients in the corresponding row of matrix $A$;
-\item[BJAC] Precondition by a factorization of the
- block-diagonal of matrix $A$, where block boundaries are determined
- by the data allocation boundaries for each process; requires no
- communication.
-\item[AS] Additive Schwarz; default is to apply the Restricted
- Additive Schwarz variant, with an $ILU(0)$ factorization
-\item[ML] Multilevel preconditioner.
-\end{description}
-
-
-
-\subroutine{mld\_precset}{Set preconditioner features}
-
-\syntax{call mld\_precset}{prec, what, val, info, ilev, pos}
-\syntax{call prec\%set}{what, val, info, ilev, pos}
-
-
-\begin{description}
-\item[Type:] Asynchronous.
-\item[\bf On Entry]
-\item[prec] the preconditioner.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: an already initialized precondtioner data structure \precdata\\
-\item[what] The feature to be set. \\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}.\\
-Specified as: an integer constant or a string. Symbolic names are
-available in the library module, see usage notes for legal values.
-\item[val] The value to set the chosen feature to. \\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}.\\
-Specified as: an integer, double precision or character variable.
-Symbolic names for some choices are available in the library module,
-see usage notes for legal values.
-\item[ilev] The level of a multilevel preconditioner to which the
- feature choice should apply.\\
-Scope: {\bf global} \\
-Type: {\bf optional}\\
-Specified as: an integer value, see usage notes.
-\item[pos] The position of the smoother/solver to which the current
- setting applies.
- feature choice should apply.\\
-Scope: {\bf global} \\
-Type: {\bf optional}\\
-Specified as: a character variable, with values \verb|pre| or \verb|post|.
-\end{description}
-
-\begin{description}
-\item[\bf On Return]
-\item[prec] the preconditioner.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: a precondtioner data structure \precdata\\
-\item[info] Error code.\\
-Scope: {\bf local} \\
-Type: {\bf required} \\
-Intent: {\bf out}.\\
-An integer value; 0 means no error has been detected.
-\end{description}
-
-\subsection*{Usage Notes}
-Legal inputs to this subroutine are interpreted depending on the value
-of \verb|what| input as follows
-\begin{description}
-\item[mld\_coarse\_mat\_]
-\end{description}
-
-\subroutine{mld\_hierarchy\_bld}{Builds a matrix hierarchy}
-
-\syntax{call mld\_hierarchy\_bld}{a, desc\_a, prec, info}
-
-\begin{description}
-\item[Type:] Synchronous.
-\item[\bf On Entry]
-\item[a] the system sparse matrix.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}, target.\\
-Specified as: a sparse matrix data structure \spdata.
-\item[prec] the preconditioner.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: an already initialized precondtioner data structure \precdata\\
-\item[desc\_a] the problem communication descriptor.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}, target.\\
-Specified as: a communication descriptor data structure \descdata.
-%% \item[upd]
-%% Scope: {\bf global} \\
-%% Type: {\bf optional}\\
-%% Intent: {\bf in}.\\
-%% Specified as: a character.
-\end{description}
-
-\begin{description}
-\item[\bf On Return]
-\item[prec] the preconditioner, with all the matrices and transfer
- operators updated.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: a precondtioner data structure \precdata\\
-\item[info] Error code.\\
-Scope: {\bf local} \\
-Type: {\bf required} \\
-Intent: {\bf out}.\\
-An integer value; 0 means no error has been detected.
-\end{description}
-
-
-\subroutine{mld\_smoothers\_bld}{Builds the smoothers}
-
-\syntax{call mld\_smoothers\_bld}{a, desc\_a, prec, info}
-
-\begin{description}
-\item[Type:] Synchronous.
-\item[\bf On Entry]
-\item[a] the system sparse matrix.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}, target.\\
-Specified as: a sparse matrix data structure \spdata.
-\item[prec] the preconditioner.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: an already initialized precondtioner data structure
-\precdata\ with an already built matrix hierarchy \\
-\item[desc\_a] the problem communication descriptor.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}, target.\\
-Specified as: a communication descriptor data structure \descdata.
-%% \item[upd]
-%% Scope: {\bf global} \\
-%% Type: {\bf optional}\\
-%% Intent: {\bf in}.\\
-%% Specified as: a character.
-\end{description}
-
-\begin{description}
-\item[\bf On Return]
-\item[prec] the preconditioner, with all the smoothers and solvers
- updated.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: a precondtioner data structure \precdata\\
-\item[info] Error code.\\
-Scope: {\bf local} \\
-Type: {\bf required} \\
-Intent: {\bf out}.\\
-An integer value; 0 means no error has been detected.
-\end{description}
-
-
-
-
-\subroutine{mld\_precbld}{Builds a preconditioner}
-
-\syntax{call mld\_precbld}{a, desc\_a, prec, info}
-
-\begin{description}
-\item[Type:] Synchronous.
-\item[\bf On Entry]
-\item[a] the system sparse matrix.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}, target.\\
-Specified as: a sparse matrix data structure \spdata.
-\item[prec] the preconditioner.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: an already initialized precondtioner data structure \precdata\\
-\item[desc\_a] the problem communication descriptor.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}, target.\\
-Specified as: a communication descriptor data structure \descdata.
-%% \item[upd]
-%% Scope: {\bf global} \\
-%% Type: {\bf optional}\\
-%% Intent: {\bf in}.\\
-%% Specified as: a character.
-\end{description}
-
-\begin{description}
-\item[\bf On Return]
-\item[prec] the preconditioner.\\
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: a precondtioner data structure \precdata\\
-\item[info] Error code.\\
-Scope: {\bf local} \\
-Type: {\bf required} \\
-Intent: {\bf out}.\\
-An integer value; 0 means no error has been detected.
-\end{description}
-
-
-\subsection*{Usage Notes}
-A call to this routine is equivalent to a call to
-\verb|mld_hierarchy_bld| followed by a call to \verb|mld_smnoothers_bld|.
-
-
-\subroutine{mld\_precaply}{Preconditioner application routine}
-
-\syntax{call mld\_precaply}{prec,x,y,desc\_a,info,trans,work}
-\syntax*{call mld\_precaply}{prec,x,desc\_a,info,trans}
-
-\begin{description}
-\item[Type:] Synchronous.
-\item[\bf On Entry]
-\item[prec] the preconditioner.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}.\\
-Specified as: a preconditioner data structure \precdata.
-\item[x] the source vector.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: a double precision array.
-\item[desc\_a] the problem communication descriptor.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}.\\
-Specified as: a communication data structure \descdata.
-\item[trans]
-Scope: {\bf } \\
-Type: {\bf optional}\\
-Intent: {\bf in}.\\
-Specified as: a character.
-\item[work] an optional work space
-Scope: {\bf local} \\
-Type: {\bf optional}\\
-Intent: {\bf inout}.\\
-Specified as: a double precision array.
-\end{description}
-
-\begin{description}
-\item[\bf On Return]
-\item[y] the destination vector.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf inout}.\\
-Specified as: a double precision array.
-\item[info] Error code.\\
-Scope: {\bf local} \\
-Type: {\bf required} \\
-Intent: {\bf out}.\\
-An integer value; 0 means no error has been detected.
-\end{description}
-
-
-
-\subroutine{mld\_prec\_descr}{Prints a description of current preconditioner}
-
-\syntax{call mld\_prec\_descr}{prec}
-
-\begin{description}
-\item[Type:] Asynchronous.
-\item[\bf On Entry]
-\item[prec] the preconditioner.
-Scope: {\bf local} \\
-Type: {\bf required}\\
-Intent: {\bf in}.\\
-Specified as: a preconditioner data structure \precdata.
-\end{description}
-
-
-
-%%% Local Variables:
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End:
diff --git a/docs/src/userguide.tex b/docs/src/userguide.tex
index 07cb2688..224a8607 100644
--- a/docs/src/userguide.tex
+++ b/docs/src/userguide.tex
@@ -157,6 +157,7 @@ based on PSBLAS}
\include{background}
\include{gettingstarted}
\include{userinterface}
+\include{newobjects}
\include{errors}
\clearpage
\appendix
diff --git a/docs/src/userhtml.tex b/docs/src/userhtml.tex
index 6425c1c5..de80420e 100644
--- a/docs/src/userhtml.tex
+++ b/docs/src/userhtml.tex
@@ -133,9 +133,8 @@ Software version: 2.1\\
\include{background}
\include{gettingstarted}
\include{userinterface}
-%\include{advanced}
+\include{newobjects}
\include{errors}
-%\include{listofroutines}
\cleardoublepage
\appendix
\include{license}
diff --git a/docs/src/userinterface.tex b/docs/src/userinterface.tex
index ecc6f081..fad93c0a 100644
--- a/docs/src/userinterface.tex
+++ b/docs/src/userinterface.tex
@@ -27,7 +27,7 @@ i.e.,
\verb|s|, \verb|d|, \verb|c|, \verb|z|, according to the sparse
matrix data structure;
\item the arrays containing the vectors $v$ and $w$ involved in
- the preconditioner application $w=M^{-1}v$ must be of type
+ the preconditioner application $w=B^{-1}v$ must be of type
\verb|psb_|\emph{x}\verb|vect_type| with \emph{x} =
\verb|s|, \verb|d|, \verb|c|, \verb|z|, in a manner completely
analogous to the sparse matrix type;
@@ -100,9 +100,8 @@ contained in \verb|val|.
% be chosen according to the real/complex, single/double precision
% version of MLD2P4 under use.\\
\verb|what| & \verb|character(len=*)|. \\
- & The parameter to be set. It can be specified by
- a predefined constant, or through its name; the string
- is case-insensitive. See also
+ & The parameter to be set. It can be specified through its name;
+ the string is case-insensitive. See
Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}.\\
\verb|val | & \verb|integer| \emph{or} \verb|character(len=*)| \emph{or}
\verb|real(psb_spk_)| \emph{or} \verb|real(psb_dpk_)|,
@@ -240,7 +239,7 @@ solver is changed to the default sequential solver.
\textsc{comments} \\ \hline
%\multicolumn{5}{|c|}{\emph{type of the multi-level preconditioner}}\\ \hline
%\verb|mld_ml_cycle_| \par
-\verb|ML_CYCLE| & \verb|character(len=*)|
+\verb|'ML_CYCLE'| & \verb|character(len=*)|
& \texttt{'VCYCLE'} \par \texttt{'WCYCLE'} \par \texttt{'KCYCLE'} \par
\texttt{'MULT'} \par \texttt{'ADD'}
& \texttt{'VCYCLE'}
@@ -249,7 +248,7 @@ solver is changed to the default sequential solver.
Note that hybrid Multiplicative Schwarz is equivalent to V-cycle and
is included for compatibility with previous versions of MLD2P4. \\ \hline
%\verb|mld_outer_sweeps_| \par
- \verb|OUTER_SWEEPS| & \texttt{integer} &
+ \verb|'OUTER_SWEEPS'| & \texttt{integer} &
Any integer \par number $\ge 1$ & 1 &
Number of multi-level cycles. \\ \hline
%\verb|mld_smoother_type_| \par \verb|SMOOTHER_TYPE| & \verb|character(len=*)|
@@ -279,7 +278,7 @@ be applied.
\textsc{comments} \\ \hline
%\multicolumn{5}{|c|}{\emph{aggregation algorithm}} \\ \hline
%\verb|mld_min_coarse_size_| \par
-\verb|MIN_COARSE_SIZE| & \verb|integer|
+\verb|'MIN_COARSE_SIZE'| & \verb|integer|
& Any number \par $> 0$
& $\lfloor 40 \sqrt[3]{n} \rfloor$, where $n$ is the dimension
of the matrix at the finest level
@@ -295,7 +294,7 @@ be applied.
% (see \verb|mld_n_prec_levs_|).
\\ \hline
%\verb|mld_min_cr_ratio_| \par
-\verb|MIN_CR_RATIO| & \verb|real|
+\verb|'MIN_CR_RATIO'| & \verb|real|
& Any number \par $> 1$
& 1.5
& Minimum coarsening ratio. The aggregation stops
@@ -303,13 +302,13 @@ be applied.
at two consecutive levels is lower than or equal to this
threshold (see Note).\\ \hline
%\verb|mld_max_levs_| \par
-\verb|MAX_LEVS| & \verb|integer|
+\verb|'MAX_LEVS'| & \verb|integer|
& Any integer \par number $> 1$
& 20
& Maximum number of levels. The aggregation stops
if the number of levels reaches this value (see Note). \\ \hline
%\verb|mld_par_aggr_alg_| \par
-\verb|PAR_AGGR| & \verb|character(len=*)| \hspace*{-3mm}
+\verb|'PAR_AGGR'| & \verb|character(len=*)| \hspace*{-3mm}
& \texttt{'DEC'}, \texttt{'SYMDEC'}
& \texttt{'DEC'}
& Parallel aggregation algorithm. \par Currently, only the
@@ -318,13 +317,13 @@ be applied.
aggregation to the sparsity pattern
of $A+A^T$.\\ \hline
%\verb|mld_aggr_type_| \par
-\verb|AGGR_TYPE| & \verb|character(len=*)| \hspace*{-3mm}
+\verb|'AGGR_TYPE'| & \verb|character(len=*)| \hspace*{-3mm}
& \textbf{\texttt{'VMB'}} & \textbf{\texttt{'VMB'}}
& Type of aggregation algorithm: currently, the scalar aggregation
algorithm by Van\v{e}k, Mandel and Brezina is implemented
\cite{VANEK_MANDEL_BREZINA}. \\ \hline
%\verb|mld_aggr_prol_| \par
-\verb|AGGR_PROL| & \verb|character(len=*)| \hspace*{-3mm}
+\verb|'AGGR_PROL'| & \verb|character(len=*)| \hspace*{-3mm}
& \texttt{'SMOOTHED'}, \texttt{'UNSMOOTHED'} & \texttt{'SMOOTHED'}
& Prolongator used by the aggregation algorithm: smoothed or unsmoothed
(i.e., tentative prolongator). \\
@@ -351,7 +350,7 @@ of levels. } \\
\verb|what| & \textsc{data type} & \verb|val| & \textsc{default} &
\textsc{comments} \\ \hline
%\verb|mld_aggr_ord_| \par
-\verb|AGGR_ORD| & \verb|character(len=*)|
+\verb|'AGGR_ORD'| & \verb|character(len=*)|
& \texttt{'NATURAL'} \par \texttt{'DEGREE'}
& \texttt{'NATURAL'}
& Initial ordering of indices for the aggregation
@@ -361,11 +360,12 @@ of levels. } \\
%Since aggregation is
%heuristic, results will be different.
%\verb|mld_aggr_thresh_| \par
-\verb|AGGR_THRESH| & \verb|real(|\emph{kind\_parameter}\verb|)|
+\verb|'AGGR_THRESH'| & \verb|real(|\emph{kind\_parameter}\verb|)|
& Any~real \par number~$\in [0, 1]$
& 0.05
- & The threshold $\theta$ in the aggregation algorithm
- (see Note). \\ \hline
+ & The threshold $\theta$ in the aggregation algorithm,
+ see (\ref{eq:strongly_coup}) in Section~\ref{sec:aggregation}.
+ See also the note at the bottom of this table. \\ \hline
%%\verb|mld_aggr_scale_| \par
% \verb|AGGR_SCALE| & \verb|real(|\emph{kind\_parameter}\verb|)|
% & Any~real \par number~$\in [0, 1]$
@@ -373,40 +373,40 @@ of levels. } \\
% & Scale factor applied to the threshold in going
% from level $ilev$ to level $ilev+1$. \\ \hline
%\verb|mld_aggr_omega_alg_| \par
-\verb|AGGR_OMEGA_ALG|& \verb|character(len=*)|
- & \texttt{'EIG\_EST'} \par \texttt{'USER\_CHOICE'}
- & \texttt{'EIG\_EST'}
- & How the damping parameter $\omega$ in the
- smoothed aggregation is obtained:
- either via an estimate of the spectral radius of
- $D^{-1}A$, where $A$ is the matrix at the current
- level and $D$ is the diagonal matrix with
- the same diagonal entires as $A$, or explicily
- specified by the user. \\ \hline
+%\verb|'AGGR_OMEGA_ALG'|& \verb|character(len=*)|
+% & \texttt{'EIG\_EST'} \par \texttt{'USER\_CHOICE'}
+% & \texttt{'EIG\_EST'}
+% & How the damping parameter $\omega$ in the
+% smoothed aggregation is obtained:
+% either via an estimate of the spectral radius of
+% $D^{-1}A$, where $A$ is the matrix at the current
+% level and $D$ is the diagonal matrix with
+%    the same diagonal entries as $A$, or explicitly
+% specified by the user. \\ \hline
%\verb|mld_aggr_eig_| \par
-\verb|AGGR_EIG| & \verb|character(len=*)|
- & \texttt{'A\_NORMI'}
- & \texttt{'A\_NORMI'}
- & How to estimate the spectral radius of $D^{-1}A$.
- Currently only the infinity norm estimate
- is available. \\ \hline
+%\verb|'AGGR_EIG'| & \verb|character(len=*)|
+% & \texttt{'A\_NORMI'}
+% & \texttt{'A\_NORMI'}
+% & How to estimate the spectral radius of $D^{-1}A$.
+% Currently only the infinity norm estimate
+% is available. \\ \hline
%\verb|mld_aggr_omega_val_| \par
-\verb|AGGR_OMEGA_VAL| & \verb|real(|\emph{kind\_parameter}\verb|)|
- & Any real \par number $>0$
- & $4/(3\rho(D^{-1}A))$
- & Damping parameter $\omega$ in the smoothed aggregation algorithm.
- It must be set by the user if
- \verb|USER_CHOICE| was specified for
- \verb|mld_aggr_omega_alg_|,
- otherwise it is computed by the library, using the
- selected estimate of the spectral radius $\rho(D^{-1}A)$ of
- $D^{-1}A$.\\ \hline
+%\verb|'AGGR_OMEGA_VAL'| & \verb|real(|\emph{kind\_parameter}\verb|)|
+% & Any real \par number $>0$
+% & $4/(3\rho(D^{-1}A))$
+% & Damping parameter $\omega$ in the smoothed aggregation algorithm.
+% It must be set by the user if
+% \verb|USER_CHOICE| was specified for
+% \verb|mld_aggr_omega_alg_|,
+% otherwise it is computed by the library, using the
+% selected estimate of the spectral radius $\rho(D^{-1}A)$ of
+% $D^{-1}A$.\\ \hline
%\verb|mld_aggr_filter_| \par
-\verb|AGGR_FILTER|
+\verb|'AGGR_FILTER'|
& \verb|character(len=*)|
& \texttt{'FILTER'} \par \texttt{'NOFILTER'}
- & \texttt{'NOFILTER'} & Matrix used in computing the smoothed
- prolongator: filtered or unfiltered. \\
+ & \texttt{'NOFILTER'} & Matrix used in computing the smoothed
+ prolongator: filtered or unfiltered (see~(\ref{eq:filtered}) in Section~\ref{sec:aggregation}). \\
\hline
\multicolumn{5}{|l|}{{\bfseries Note.} Different thresholds at different levels, such as
those used in \cite[Section~5.1]{VANEK_MANDEL_BREZINA}, can be easily set by
@@ -428,13 +428,13 @@ the parameter \texttt{ilev}.} \\
\textsc{comments} \\ \hline
%\multicolumn{5}{|c|}{\emph{coarse-space correction at the coarsest level}}\\ \hline
%\verb|mld_coarse_mat_| \par
-\verb|COARSE_MAT| & \verb|character(len=*)|
+\verb|'COARSE_MAT'| & \verb|character(len=*)|
& \texttt{'DIST'} \par \texttt{'REPL'}
& \texttt{'REPL'}
& Coarsest matrix layout: distributed among the processes or
replicated on each of them. \\ \hline
%\verb|mld_coarse_solve_| \par
-\verb|COARSE_SOLVE| & \verb|character(len=*)|
+\verb|'COARSE_SOLVE'| & \verb|character(len=*)|
& \texttt{'MUMPS'} \par \texttt{'UMF'} \par
\texttt{'SLU'} \par \texttt{'SLUDIST'} \par
\texttt{'JACOBI'} \par \texttt{'GS'} \par \texttt{'BJAC'}
@@ -456,7 +456,7 @@ the parameter \texttt{ilev}.} \\
value UMFPACK and SuperLU\_Dist
are available only in double precision. \\ \hline
%\verb|mld_coarse_subsolve_| \par
-\verb|COARSE_SUBSOLVE| & \verb|character(len=*)|
+\verb|'COARSE_SUBSOLVE'| & \verb|character(len=*)|
& \texttt{'ILU'} \par \texttt{'ILUT'} \par \texttt{'MILU'} \par
\texttt{'MUMPS'} \par \texttt{'SLU'} \par \texttt{'UMF'}
& See~Note.
@@ -491,18 +491,18 @@ level.\label{tab:p_coarse}}
\textsc{comments} \\ \hline
%\multicolumn{5}{|c|}{\emph{coarse-space correction at the coarsest level}}\\ \hline
%\verb|mld_coarse_sweeps_| \par
-\verb|COARSE_SWEEPS| & \verb|integer|
+\verb|'COARSE_SWEEPS'| & \verb|integer|
& Any integer \par number $> 0$
& 10
& Number of sweeps when \verb|JACOBI|, \verb|GS| or \verb|BJAC|
is chosen as coarsest-level solver. \\ \hline
%\verb|mld_coarse_fillin_| \par
-\verb|COARSE_FILLIN| & \verb|integer|
+\verb|'COARSE_FILLIN'| & \verb|integer|
& Any integer \par number $\ge 0$
& 0
& Fill-in level $p$ of the ILU factorizations. \\ \hline
%\verb|mld_coarse_iluthrs_| \par
-\verb|COARSE_ILUTHRS|
+\verb|'COARSE_ILUTHRS'|
& \verb|real(|\emph{kind\_parameter}\verb|)|
& Any real \par number $\ge 0$
& 0
@@ -523,7 +523,7 @@ level (continued).\label{tab:p_coarse_1}}
\textsc{comments} \\ \hline
%\multicolumn{5}{|c|}{\emph{basic one-level preconditioner (smoother)}} \\ \hline
%\verb|mld_smoother_type_| \par
-\verb|SMOOTHER_TYPE| & \verb|character(len=*)|
+\verb|'SMOOTHER_TYPE'| & \verb|character(len=*)|
& \verb|'JACOBI'| \par \verb|'GS'| \par \verb|'BGS'| \par \verb|'BJAC'|
\par \verb|'AS'|
& \verb|'FBGS'|
@@ -533,7 +533,7 @@ level (continued).\label{tab:p_coarse_1}}
Additive Schwarz. \par
It is ignored by one-level preconditioners. \\ \hline
%\verb|mld_sub_solve_| \par
-\verb|SUB_SOLVE| & \verb|character(len=*)|
+\verb|'SUB_SOLVE'| & \verb|character(len=*)|
& \texttt{'JACOBI'} \par
\texttt{'GS'} \par \texttt{'BGS'} \par \texttt{'ILU'} \par
\texttt{'ILUT'} \par \texttt{'MILU'} \par
@@ -550,7 +550,7 @@ level (continued).\label{tab:p_coarse_1}}
(plus triangular solve). See Note for details on hybrid
Gauss-Seidel. \\ \hline
%\verb|mld_moother_sweeps_| \par
-\verb|SMOOTHER_SWEEPS| & \verb|integer|
+\verb|'SMOOTHER_SWEEPS'| & \verb|integer|
& Any integer \par number~$\ge 0$
& 1
& Number of sweeps of the smoother or one-level preconditioner.
@@ -559,7 +559,7 @@ level (continued).\label{tab:p_coarse_1}}
   together with \verb|pos='PRE'| or \verb|pos='POST'|,
respectively. \\ \hline
%\verb|mld_sub_ovr_| \par
-\verb|SUB_OVR| & \verb|integer|
+\verb|'SUB_OVR'| & \verb|integer|
& Any integer \par number~$\ge 0$
& 1
& Number of overlap layers, for Additive Schwarz only. \\
@@ -578,7 +578,7 @@ level (continued).\label{tab:p_coarse_1}}
\verb|what| & \textsc{data type} & \verb|val| & \textsc{default} &
\textsc{comments} \\ \hline
%\verb|mld_sub_restr_| \par
-\verb|SUB_RESTR| & \verb|character(len=*)|
+\verb|'SUB_RESTR'| & \verb|character(len=*)|
& \texttt{'HALO'} \par \texttt{'NONE'}
& \texttt{'HALO'}
& Type of restriction operator, for Additive Schwarz only:
@@ -587,7 +587,7 @@ level (continued).\label{tab:p_coarse_1}}
Note that \texttt{HALO} must be chosen for
                     the classical Additive Schwarz smoother and its RAS variant.\\ \hline
%\verb|mld_sub_prol_| \par
-\verb|SUB_PROL| & \verb|character(len=*)|
+\verb|'SUB_PROL'| & \verb|character(len=*)|
& \texttt{'SUM'} \par \texttt{'NONE'}
& \texttt{'NONE'}
& Type of prolongation operator, for Additive Schwarz only:
@@ -596,12 +596,12 @@ level (continued).\label{tab:p_coarse_1}}
Note that \texttt{SUM} must be chosen for the classical Additive
Schwarz smoother, and \texttt{NONE} for its RAS variant. \\ \hline
%\verb|mld_sub_fillin_| \par
-\verb|SUB_FILLIN| & \verb|integer|
+\verb|'SUB_FILLIN'| & \verb|integer|
& Any integer \par number~$\ge 0$
& 0
& Fill-in level $p$ of the incomplete LU factorizations. \\ \hline
%\verb|mld_sub_iluthrs_| \par
-\verb|SUB_ILUTHRS| & \verb|real(|\emph{kind\_parameter}\verb|)|
+\verb|'SUB_ILUTHRS'| & \verb|real(|\emph{kind\_parameter}\verb|)|
& Any real number~$\ge 0$
& 0
& Drop tolerance $t$ in the ILU($p,t$) factorization. \\ %\hline
@@ -741,7 +741,7 @@ hierarchy produced by a previous call to \verb|hierarchy_build|
\end{center}
\noindent
-This routine computes $y = op(M^{-1})\, x$, where $M$ is a previously built
+This routine computes $y = op(B^{-1})\, x$, where $B$ is a previously built
preconditioner, stored into \verb|p|, and $op$
denotes the preconditioner itself or its transpose, according to
the value of \verb|trans|.
@@ -770,10 +770,10 @@ and hence it is completely transparent to the user.
\verb|info| & \verb|integer, intent(out)|.\\
& Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\
\verb|trans| & \verb|character(len=1), optional, intent(in).|\\
- & If \verb|trans| = \verb|'N','n'| then $op(M^{-1}) = M^{-1}$;
- if \verb|trans| = \verb|'T','t'| then $op(M^{-1}) = M^{-T}$
- (transpose of $M^{-1})$; if \verb|trans| = \verb|'C','c'| then $op(M^{-1}) = M^{-C}$
- (conjugate transpose of $M^{-1})$.\\
+ & If \verb|trans| = \verb|'N','n'| then $op(B^{-1}) = B^{-1}$;
+ if \verb|trans| = \verb|'T','t'| then $op(B^{-1}) = B^{-T}$
+                   (transpose of $B^{-1}$); if \verb|trans| = \verb|'C','c'| then $op(B^{-1}) = B^{-C}$
+                   (conjugate transpose of $B^{-1}$).\\
\verb|work| & \emph{type}\verb|(|\emph{kind\_parameter}\verb|), dimension(:), optional, target|.\\
& Workspace. Its size should be at
least \verb|4 * psb_cd_get_local_| \verb|cols(desc_a)| (see the PSBLAS User's Guide).
@@ -854,66 +854,6 @@ as follows:
\end{center}
-\clearpage
-
-\section{Adding new smoothers and solvers to MLD2P4\label{sec:adding}}
-
-Developers can add completely new smoother and/or solver classes
-derived from the base objects in the library may be used without
-recompiling the library itself.
-
-To do so it is necessary first to select the base type to be extended;
-in our experience, it is quite likely that the new application needs
-only require the definition of a ``solver'' object, which is almost
-always acting only on the local part of the distributed matrix.
-
-The parallel actions required to connect the various solver objects
-are most often already provided by the Block Jacobi or the Additive
-Schwarz smoothers. To define a new solver, the developer will then
-have to define its components and methods, perhaps taking one of the
-predefined solvers as a starting point if possible.
-
-
-Once the new smoother/solver class has been developed, to use it in
-the context of the multilevel preconditioners it is necessary to:
-\begin{itemize}
-\item Declare in the application program a variable of the new type;
-\item Pass that variable as the argument to the se routine as in the
- following:
-\begin{center}
-\verb|call p%set(smoother,info [,ilev, ilmax,pos])|\\
-\verb|call p%set(solver,info [,ilev, ilmax,pos])|
-\end{center}
-\item Link into the application executable the code implementing the
- various methods.
-\end{itemize}
-The new solver object is then dynamically included in the
-preconditioner structure, and will act as a \emph{mold} to which the
-preconditioner will conform, even though the MLD2P4 library has not
-been modified to account for this new development.
-
-It is possible to define new values for the keyword \verb|WHAT| in the
-\verb|set| routines; if the library code does not recognize a keyword,
-it passes it down the composition hierarchy (levels containing
-smoothers containing solvers), so that it can be eventually caught by
-the new solver.
-
-An example is contained in the source code distribution under the
-folder \verb|tests/newslv|. This example solver is simply the ILU(0)
-solver under a new name, but it should give an idea of what needs to
-be done.
-
-\ \\
-
-\begin{tabular}{p{1.2cm}p{12cm}}
-\verb|smoother| & \verb|class(mld_x_base_smoother_type)| \\
- & The user-defined new smoother to be employed in the
- preconditioner.\\
-\verb|solver| & \verb|class(mld_x_base_solver_type)| \\
- & The user-defined new solver to be employed in the
- preconditioner.
-\end{tabular}
-
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "userguide"