You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
282 lines
14 KiB
TeX
282 lines
14 KiB
TeX
\section{Multigrid Background\label{sec:background}}
|
|
\markboth{\textsc{MLD2P4 User's and Reference Guide}}
|
|
{\textsc{\ref{sec:background} Multigrid Background}}
|
|
|
|
Multigrid preconditioners, coupled with Krylov iterative
|
|
solvers, are widely used in the parallel solution of large and sparse linear systems,
|
|
because of their optimality in the solution of linear systems arising from the
|
|
discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids.
|
|
Optimality, also known as algorithmic scalability, is the property
|
|
of having a computational cost per iteration that depends linearly on
|
|
the problem size, and a convergence rate that is independent of the problem size.
|
|
|
|
Multigrid preconditioners are based on a recursive application of a two-grid process
|
|
consisting of smoother iterations and a coarse-space (or coarse-level) correction.
|
|
The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones,
|
|
or more complex subspace-correction methods, such as the Schwarz ones.
|
|
The coarse-space correction consists of solving, in an appropriately chosen
|
|
coarse space, the residual equation associated with the approximate solution computed
|
|
by the smoother, and of using the solution of this equation to correct the
|
|
previous approximation. The transfer of information between the original
|
|
(fine) space and the coarse one is performed by using suitable restriction and
|
|
prolongation operators. The construction of the coarse space and the corresponding
|
|
transfer operators is carried out by applying a so-called coarsening algorithm to the system
|
|
matrix. Two main approaches can be used to perform coarsening: the geometric approach,
|
|
which exploits the knowledge of some physical grid associated with the matrix
|
|
and requires the user to define transfer operators from the fine
|
|
to the coarse level and vice versa, and the algebraic approach, which builds
|
|
the coarse-space correction and the associate transfer operators using only matrix
|
|
information. The first approach may be difficult when the system comes from
|
|
discretizations on complex geometries;
|
|
furthermore, ad hoc one-level smoothers may be required to get an efficient
|
|
interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients
|
|
are considered. The second approach performs a fully automatic coarsening and enforces the
|
|
interplay between fine and coarse level by suitably choosing the coarse space and
|
|
the coarse-to-fine interpolation (see, e.g., \cite{Briggs2000,Stuben_01,dd2_96} for details.)
|
|
|
|
MLD2P4 uses a pure algebraic approach, based on the smoothed
|
|
aggregation algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA},
|
|
for building the sequence of coarse matrices and transfer operators,
|
|
starting from the original one.
|
|
A decoupled version of this algorithm is implemented, where the smoothed
|
|
aggregation is applied locally to each submatrix \cite{TUMINARO_TONG}.
|
|
A brief description of the AMG preconditioners implemented in MLD2P4 is given in
|
|
Sections~\ref{sec:multilevel}-\ref{sec:smoothers}. For further details the reader
|
|
is referred to \cite{para_04,aaecc_07,apnum_07,MLD2P4_TOMS}.
|
|
|
|
We note that optimal multigrid preconditioners do not necessarily correspond
|
|
to minimum execution times in a parallel setting. Indeed, to obtain effective parallel
|
|
multigrid preconditioners, a tradeoff between the optimality and the cost of building and
|
|
applying the smoothers and the coarse-space corrections must be achieved. Effective
|
|
parallel preconditioners require algorithmic scalability to be coupled with implementation
|
|
scalability, i.e., a computational cost per iteration which remains (almost) constant as
|
|
the number of parallel processors increases.
|
|
|
|
|
|
\subsection{AMG preconditioners\label{sec:multilevel}}
|
|
|
|
In order to describe the AMG preconditioners available in MLD2P4, we consider a
|
|
linear system
|
|
\begin{equation}
|
|
Ax=b, \label{eq:system}
|
|
\end{equation}
|
|
where $A=(a_{ij}) \in \mathbb{R}^{n \times n}$ is a nonsingular sparse matrix;
|
|
for ease of presentation we assume $A$ has a symmetric sparsity
|
|
pattern.
|
|
|
|
Let us consider as finest index space the set of row (column) indices of $A$, i.e.,
|
|
$\Omega = \{1, 2, \ldots, n\}$.
|
|
Any algebraic multilevel preconditioners implemented in MLD2P4 generates
|
|
a hierarchy of index spaces and a corresponding hierarchy of matrices,
|
|
\[ \Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev},
|
|
\quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \]
|
|
by using the information contained in $A$, without assuming any
|
|
knowledge of the geometry of the problem from which $A$ originates.
|
|
A vector space $\mathbb{R}^{n_{k}}$ is associated with $\Omega^k$,
|
|
where $n_k$ is the size of $\Omega^k$.
|
|
For all $k < nlev$, a restriction operator and a prolongation one are built,
|
|
which connect two levels $k$ and $k+1$:
|
|
\[
|
|
P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad
|
|
R^k \in \mathbb{R}^{n_{k+1}\times n_k};
|
|
\]
|
|
the matrix $A^{k+1}$ is computed by using the previous operators according
|
|
to the Galerkin approach, i.e.,
|
|
\[
|
|
A^{k+1}=R^kA^kP^k.
|
|
\]
|
|
In the current implementation of MLD2P4 we have $R^k=(P^k)^T$
|
|
A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solver
|
|
is set up at the coarsest level, so that they are ready for application
|
|
(for example, setting up a solver based on the $LU$ factorization means computing
|
|
and storing the $L$ and $U$ factors). The construction of the hierarchy of AMG components
|
|
described so far corresponds to the so-called build phase of the preconditioner.
|
|
|
|
\begin{figure}[t]
|
|
\begin{center}
|
|
\framebox{
|
|
\begin{minipage}{.85\textwidth}
|
|
\begin{tabbing}
|
|
\quad \=\quad \=\quad \=\quad \\[-3mm]
|
|
procedure V-cycle$\left(k,A^k,b^k,u^k\right)$ \\[2mm]
|
|
\>if $\left(k \ne nlev \right)$ then \\[1mm]
|
|
\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
|
|
\>\> $b^{k+1} = R^{k+1}\left(b^k - A^k u^k\right)$ \\[1mm]
|
|
\>\> $u^{k+1} =$ V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$ \\[1mm]
|
|
\>\> $u^k = u^k + P^{k+1} u^{k+1}$ \\[1mm]
|
|
\>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm]
|
|
\>else \\[1mm]
|
|
\>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm]
|
|
\>endif \\[1mm]
|
|
\>return $u^k$ \\[1mm]
|
|
end
|
|
\end{tabbing}
|
|
\end{minipage}
|
|
}
|
|
\caption{Application phase of a V-cycle preconditioner.\label{fig:application_alg}}
|
|
\end{center}
|
|
\end{figure}
|
|
|
|
The components produced in the build phase may be combined in several ways
|
|
to obtain different multilevel preconditioners;
|
|
this is done in the application phase, i.e., in the computation of a vector
|
|
of type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iteration
|
|
of a Krylov solver \cite{Saad_book}. An example of such a combination, known as
|
|
V-cycle, is given in Figure~\ref{fig:application_alg}. In this case, a single iteration
|
|
of the same smoother is used before and after the the recursive call to the V-cycle (i.e.,
|
|
in the pre-smoothing and post-smoothing phases); however, different choices can be
|
|
performed. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle
|
|
and W-cycle~\cite{Briggs2000}, and a version of the K-cycle described
|
|
in~\cite{Notay2008}.
|
|
|
|
|
|
\subsection{Smoothed Aggregation\label{sec:aggregation}}
|
|
|
|
In order to define the prolongator $P^k$, used to compute
|
|
the coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregation
|
|
algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
|
|
The basic idea of this algorithm is to build a coarse set of indices
|
|
$\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjoint
|
|
subsets (aggregates), and to define the coarse-to-fine space transfer operator
|
|
$P^k$ by applying a suitable smoother to a simple piecewise constant
|
|
prolongation operator, with the aim of improving the quality of the coarse-space correction.
|
|
|
|
Three main steps can be identified in the smoothed aggregation procedure:
|
|
\begin{enumerate}
|
|
\item aggregation of the indices of $\Omega^k$ to obtain $\Omega^{k+1}$;
|
|
\item construction of the prolongator $P^k$;
|
|
\item application of $P^k$ and $R^k=(P^k)^T$ to build $A^{k+1}$.
|
|
\end{enumerate}
|
|
|
|
In order to perform the coarsening step, the smoothed aggregation algorithm
|
|
described in~\cite{VANEK_MANDEL_BREZINA} is used. In this algorithm,
|
|
each index $j \in \Omega^{k+1}$ corresponds to an aggregate $\Omega^k_j$ of $\Omega^k$,
|
|
consisting of a suitably chosen index $i \in \Omega^k$ and indices that are (usually) contained in a
|
|
strongly-coupled neighborood of $i$, i.e.,
|
|
\begin{equation}
|
|
\label{eq:strongly_coup}
|
|
\Omega^k_j \subset \mathcal{N}_i^k(\theta) =
|
|
\left\{ r \in \Omega^k: |a_{ir}^k| > \theta \sqrt{|a_{ii}^ka_{rr}^k|} \right \} \cup \left\{ i \right\},
|
|
\end{equation}
|
|
for a given threshold $\theta \in [0,1]$ (see~\cite{VANEK_MANDEL_BREZINA} for the details).
|
|
Since this algorithm has a sequential nature, a decoupled
|
|
version of it is applied, where each processor independently executes
|
|
the algorithm on the set of indices assigned to it in the initial data
|
|
distribution. This version is embarrassingly parallel, since it does not require any data
|
|
communication. On the other hand, it may produce some nonuniform aggregates
|
|
and is strongly dependent on the number of processors and on the initial partitioning
|
|
of the matrix $A$. Nevertheless, this parallel algorithm has been chosen for
|
|
MLD2P4, since it has been shown to produce good results in practice
|
|
\cite{aaecc_07,apnum_07,TUMINARO_TONG}.
|
|
|
|
The prolongator $P^k$ is built starting from a tentative prolongator
|
|
$\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as
|
|
\begin{equation}
|
|
\bar{P}^k =(\bar{p}_{ij}^k), \quad \bar{p}_{ij}^k =
|
|
\left\{ \begin{array}{ll}
|
|
1 & \quad \mbox{if} \; i \in \Omega^k_j, \\
|
|
0 & \quad \mbox{otherwise},
|
|
\end{array} \right.
|
|
\label{eq:tent_prol}
|
|
\end{equation}
|
|
where $\Omega^k_j$ is the aggregate of $\Omega^k$
|
|
corresponding to the index $j \in \Omega^{k+1}$.
|
|
$P^k$ is obtained by applying to $\bar{P}^k$ a smoother
|
|
$S^k \in \mathbb{R}^{n_k \times n_k}$:
|
|
$$
|
|
P^k = S^k \bar{P}^k,
|
|
$$
|
|
in order to remove nonsmooth components from the range of the prolongator,
|
|
and hence to improve the convergence properties of the multilevel
|
|
method~\cite{BREZINA_VANEK,Stuben_01}.
|
|
A simple choice for $S^k$ is the damped Jacobi smoother:
|
|
\[
|
|
S^k = I - \omega^k (D^k)^{-1} A^k_F ,
|
|
\]
|
|
where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$,
|
|
$A^k_F = (\bar{a}_{ij}^k)$ is the filtered matrix defined as
|
|
\begin{equation}
|
|
\label{eq:filtered}
|
|
\bar{a}_{ij}^k =
|
|
\left \{ \begin{array}{ll}
|
|
a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\
|
|
0 & \mbox{otherwise},
|
|
\end{array} \right.
|
|
\; (j \ne i),
|
|
\qquad
|
|
\bar{a}_{ii}^k = a_{ii}^k - \sum_{j \ne i} (a_{ij}^k - \bar{a}_{ij}^k),
|
|
\end{equation}
|
|
and $\omega^k$ is an approximation of $4/(3\rho^k)$, where
|
|
$\rho^k$ is the spectral radius of $(D^k)^{-1}A^k_F$ \cite{BREZINA_VANEK}.
|
|
In MLD2P4 this approximation is obtained by using $\| A^k_F \|_\infty$ as an estimate
|
|
of $\rho^k$. Note that for systems coming from uniformly elliptic
|
|
problems, filtering the matrix $A^k$ has little or no effect, and
|
|
$A^k$ can be used instead of $A^k_F$. The latter choice is the default in MLD2P4.
|
|
|
|
\subsection{Smoothers and coarsest-level solvers\label{sec:smoothers}}
|
|
|
|
The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods,
|
|
a hybrid version of the forward and backward Gauss-Seidel methods, and the
|
|
additive Schwarz (AS) ones (see, e.g., \cite{Saad_book,dd2_96}).
|
|
|
|
The hybrid Gauss-Seidel
|
|
version is considered because the original Gauss-Seidel method is inherently sequential.
|
|
At each iteration of the hybrid version, each parallel process uses the most recent values
|
|
of its own local variables and the values of the non-local variables computed at the
|
|
previous iteration, obtained by exchanging data with other processes before
|
|
the beginning of the current iteration.
|
|
|
|
In the AS methods, the index space $\Omega^k$ is divided into $m_k$
|
|
subsets $\Omega^k_i$ of size $n_{k,i}$, possibly
|
|
overlapping. For each $i$ we consider the restriction
|
|
operator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$
|
|
that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$
|
|
with indices in $\Omega^k_i$, and the prolongation operator
|
|
$P^k_i = (R_i^k)^T$. These operators are then used to build
|
|
$A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the index
|
|
space $\Omega^k_i$.
|
|
The classical AS preconditioner $M^k_{AS}$ is defined as
|
|
\[
|
|
( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k},
|
|
\]
|
|
where $A_i^k$ is supposed to be nonsingular. We observe that an approximate
|
|
inverse of $A_i^k$ is usually considered instead of $(A_i^k)^{-1}$.
|
|
The setup of $M^k_{AS}$ during the multilevel build phase
|
|
involves
|
|
\begin{itemize}
|
|
\item the definition of the index subspaces $\Omega_i^k$ and of the corresponding
|
|
operators $R_i^k$ (and $P_i^k$);
|
|
\item the computation of the submatrices $A_i^k$;
|
|
\item the computation of their inverses (usually approximated
|
|
through some form of incomplete factorization).
|
|
\end{itemize}
|
|
The computation of $z^k=M^k_{AS}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during the
|
|
multilevel application phase, requires
|
|
\begin{itemize}
|
|
\item the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$,
|
|
i.e.\ $w_i^k = R_i^{k} w^k$;
|
|
\item the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$;
|
|
\item the prolongation and the sum of the previous vectors,
|
|
i.e.\ $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$.
|
|
\end{itemize}
|
|
Variants of the classical AS method, which use modifications of the
|
|
restriction and prolongation operators, are also implemented in MLD2P4.
|
|
Among them, the Restricted AS (RAS) preconditioner usually
|
|
outperforms the classical AS preconditioner in terms of convergence
|
|
rate and of computation and communication time on parallel distributed-memory
|
|
computers, and is therefore the most widely used among the AS
|
|
preconditioners~\cite{CAI_SARKIS}.
|
|
|
|
Direct solvers based on sparse LU factorizations, implemented in the
|
|
third-party libraries reported in Section~\ref{sec:third-party}, can be applied
|
|
as coarsest-level solvers by MLD2P4. Native inexact solvers based on
|
|
incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel,
|
|
and block Jacobi preconditioners are also available. Direct solvers usually
|
|
lead to more effective preconditioners in terms of algorithmic scalability;
|
|
however, this does not guarantee parallel efficiency.
|
|
|
|
%%% Local Variables:
|
|
%%% mode: latex
|
|
%%% TeX-master: "userguide"
|
|
%%% End:
|