|
|
|
\section{Multi-level Domain Decomposition Background\label{sec:background}}
|
|
|
|
\markboth{\underline{MLD2P4 User's and Reference Guide}}
|
|
|
|
{\underline{\ref{sec:background} Background}}
|
|
|
|
|
|
|
|
\emph{Domain Decomposition} (DD) preconditioners, coupled with Krylov iterative
|
|
|
|
solvers, are widely used in the parallel solution of large and sparse linear systems.
|
|
|
|
These preconditioners are based on the divide and conquer technique: the matrix
|
|
|
|
to be preconditioned is divided into submatrices, a ``local linear system''
|
|
|
|
involving each submatrix is (approximately) solved, and the local solutions are used
|
|
|
|
to build a preconditioner for the whole original matrix. This process
|
|
|
|
often corresponds to dividing a physical domain associated to the original matrix
|
|
|
|
into subdomains, e.g. in a PDE discretization, to (approximately) solving the
|
|
|
|
subproblems corresponding to the subdomains and to building an approximate
|
|
|
|
solution of the original problem from the local solutions
|
|
|
|
\cite{Cai_Widlund_92,dd1_94,dd2_96}.
|
|
|
|
|
|
|
|
\emph{Additive Schwarz} preconditioners are DD preconditioners using overlapping
|
|
|
|
submatrices, i.e.\ with some common rows, to couple the local information
|
|
|
|
related to the submatrices (see, e.g., \cite{dd2_96}).
|
|
|
|
The main motivations for choosing Additive Schwarz preconditioners are their
|
|
|
|
intrinsic parallelism. A drawback of these
|
|
|
|
preconditioners is that the number of iterations of the preconditioned solvers
|
|
|
|
generally grows with the number of submatrices. This may be a serious limitation
|
|
|
|
on parallel computers, since the number of submatrices usually matches the number
|
|
|
|
of available processors. Optimal convergence rates, i.e.\ iteration numbers
|
|
|
|
independent of the number of submatrices, can be obtained by correcting the
|
|
|
|
preconditioner through a suitable approximation of the original linear system
|
|
|
|
in a coarse space, which globally couples the information related to the single
|
|
|
|
submatrices.
|
|
|
|
|
|
|
|
\emph{Two-level Schwarz} preconditioners are obtained
|
|
|
|
by combining basic (one-level) Schwarz preconditioners with coarse-level
|
|
|
|
corrections. In this context, the one-level preconditioner is often
|
|
|
|
called smoother. Different two-level preconditioners are obtained by varying the
|
|
|
|
choice of the smoother, of the coarse-level correction and the
|
|
|
|
way they are combined \cite{dd2_96}. The same reasoning can be applied starting
|
|
|
|
from the coarse-level system, i.e.\ a coarse-space correction can be built
|
|
|
|
from this system, thus obtaining \emph{multi-level} preconditioners.
|
|
|
|
|
|
|
|
It is worth noting that optimal preconditioners do not necessarily correspond
|
|
|
|
to minimum execution times. Indeed, to obtain effective multilevel preconditioners
|
|
|
|
a tradeoff between optimality of convergence and the cost of building and applying
|
|
|
|
the coarse-space corrections must be achieved. The choice of the number of levels,
|
|
|
|
i.e.\ of the coarse-space corrections, also affects the effectiveness of the
|
|
|
|
preconditioners. One more goal is to get convergence rates as less sensitive
|
|
|
|
as possible to variations in the matrix coefficients.
|
|
|
|
|
|
|
|
Two main approaches can be used to build coarse-space corrections. The geometric approach
|
|
|
|
applies coarsening strategies based on the knowledge of some physical grid associated
|
|
|
|
to the matrix and requires the user to define grid transfer operators from the fine
|
|
|
|
to the coarse levels and vice versa. This may result difficult for complex geometries;
|
|
|
|
furthermore, suitable one-level preconditioners may be required to get efficient
|
|
|
|
interplay between fine and coarse levels, e.g.\ when matrices with highly varying coefficients
|
|
|
|
are considered. The algebraic approach builds coarse-space corrections using only matrix
|
|
|
|
information. It performs a fully automatic coarsening and enforces the interplay between
|
|
|
|
the fine and coarse levels by suitably choosing the coarse space and the coarse-to-fine
|
|
|
|
interpolation \cite{StubenGMD69_99}.
|
|
|
|
|
|
|
|
MLD2P4 uses a pure algebraic approach for building the sequence of coarse matrices
|
|
|
|
starting from the original matrix. The algebraic approach is based on the \emph{smoothed
|
|
|
|
aggregation} algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}. A decoupled version
|
|
|
|
of this algorithm is implemented, where the smoothed aggregation is applied locally
|
|
|
|
to each submatrix \cite{TUMINARO_TONG}. In the next two subsections we provide
|
|
|
|
a brief description of the multi-level Schwarz preconditioners and on the smoothed
|
|
|
|
aggregation technique as implemented in MLD2P4. For further details the user
|
|
|
|
is referred to \cite{para_04,aaecc_07,apnum_07,dd2_96}.
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{Multi-level Schwarz Preconditioners\label{sec:multilevel}}
|
|
|
|
|
|
|
|
The Multilevel preconditioners implemented in MLD2P4 are obtained by combining
|
|
|
|
AS preconditioners with coarse-space corrections; therefore
|
|
|
|
we first provide a sketch of the AS preconditioners.
|
|
|
|
|
|
|
|
Given the linear system \Ref{system1},
|
|
|
|
where $A=(a_{ij}) \in \Re^{n \times n}$ is a
|
|
|
|
nonsingular sparse matrix with a symmetric non-zero pattern,
|
|
|
|
let $G=(W,E)$ be the adjacency graph of $A$, where $W=\{1, 2, \ldots, n\}$
|
|
|
|
and $E=\{(i,j) : a_{ij} \neq 0\}$ are the vertex set and the edge set of $G$,
|
|
|
|
respectively. Two vertices are called adjacent if there is an edge connecting
|
|
|
|
them. For any integer $\delta > 0$, a $\delta$-overlap
|
|
|
|
partition of $W$ can be defined recursively as follows.
|
|
|
|
Given a 0-overlap (or non-overlapping) partition of $W$,
|
|
|
|
i.e.\ a set of $m$ disjoint nonempty sets $W_i^0 \subset W$ such that
|
|
|
|
$\cup_{i=1}^m W_i^0 = W$, a $\delta$-overlap
|
|
|
|
partition of $W$ is obtained by considering the sets
|
|
|
|
$W_i^\delta \supset W_i^{\delta-1}$, obtained by including the vertices that
|
|
|
|
are adjacent to any vertex in $W_i^{\delta-1}$.
|
|
|
|
|
|
|
|
Let $n_i^\delta$ be the size of $W_i^\delta$ and $R_i^{\delta} \in
|
|
|
|
\Re^{n_i^\delta \times n}$ the restriction operator that maps
|
|
|
|
a vector $v \in \Re^n$ onto the vector $v_i^{\delta} \in \Re^{n_i^\delta}$
|
|
|
|
containing the components of $v$ corresponding to the vertices in
|
|
|
|
$W_i^\delta$. The transpose of $R_i^{\delta}$ is a
|
|
|
|
prolongation operator from $\Re^{n_i^\delta}$ to $\Re^n$.
|
|
|
|
The matrix $A_i^\delta=R_i^\delta A (R_i^\delta)^T \in
|
|
|
|
\Re^{n_i^\delta \times n_i^\delta}$ can be considered
|
|
|
|
as a restriction of $A$ corresponding to the set $W_i^{\delta}$.
|
|
|
|
|
|
|
|
The \emph{classical one-level AS} preconditioner is defined by
|
|
|
|
\[
|
|
|
|
M_{AS}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T
|
|
|
|
(A_i^\delta)^{-1} R_i^{\delta},
|
|
|
|
\]
|
|
|
|
where $A_i^\delta$ is assumed to be nonsingular. Its application
|
|
|
|
to a vector $v \in \Re^n$ within a Krylov solver requires the following
|
|
|
|
three steps:
|
|
|
|
\begin{enumerate}
|
|
|
|
\item restriction of $v$ as $v_i = R_i^{\delta} v$, $i=1,\ldots,m$;
|
|
|
|
\item (approximate) solution of the linear systems $A_i^\delta w_i = v_i$,
|
|
|
|
$i=1,\ldots,m$;
|
|
|
|
\item prolongation and sum of the $w_i$'s, i.e. $w = \sum_{i=1}^m (R_i^{\delta})^T w_i$.
|
|
|
|
\end{enumerate}
|
|
|
|
A variant of the classical AS preconditioner that outperforms it
|
|
|
|
in terms of both convergence rate and of computation and communication
|
|
|
|
time on parallel distributed-memory computers is the so-called \emph{Restricted AS
|
|
|
|
(RAS)} preconditioner~\cite{CAI_SARKIS,EFSTATHIOU}. It
|
|
|
|
is obtained by zeroing the components of $w_i$ corresponding to the
|
|
|
|
overlapping vertices when applying the prolongation. Therefore,
|
|
|
|
RAS differs from classical AS by the prolongation operators,
|
|
|
|
which are substituted by $(\tilde{R}_i^0)^T \in \Re^{n_i^\delta \times n}$,
|
|
|
|
where $\tilde{R}_i^0$ is obtained by zeroing the rows of $R_i^\delta$
|
|
|
|
corresponding to the vertices in $W_i^\delta \backslash W_i^0$:
|
|
|
|
\[
|
|
|
|
M_{RAS}^{-1}= \sum_{i=1}^m (\tilde{R}_i^0)^T
|
|
|
|
(A_i^\delta)^{-1} R_i^{\delta}.
|
|
|
|
\]
|
|
|
|
Analogously, the AS variant called \emph{AS with Harmonic extension (ASH)}
|
|
|
|
is defined by
|
|
|
|
\[ M_{ASH}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T
|
|
|
|
(A_i^\delta)^{-1} \tilde{R}_i^0.
|
|
|
|
\]
|
|
|
|
We note that for $\delta=0$ the three variants of the AS preconditioner are
|
|
|
|
all equal to the block-Jacobi preconditioner.
|
|
|
|
|
|
|
|
As already observed, the convergence rate of the one-level Schwarz
|
|
|
|
preconditioned iterative solvers deteriorates as the number $m$ of partitions
|
|
|
|
of $W$ increases \cite{dd1_94,dd2_96}. To reduce the dependency
|
|
|
|
of the number of iterations on the degree of parallelism we may
|
|
|
|
introduce a global coupling among the overlapping partitions by defining
|
|
|
|
a coarse-space approximation $A_C$ of the matrix $A$.
|
|
|
|
In a pure algebraic setting, $A_C$ is usually built with
|
|
|
|
a Galerkin approach. Given a set $W_C$ of \emph{coarse vertices},
|
|
|
|
with size $n_C$, and a suitable restriction operator
|
|
|
|
$R_C \in \Re^{n_C \times n}$, $A_C$ is defined as
|
|
|
|
\[
|
|
|
|
A_C=R_C A R_C^T
|
|
|
|
\]
|
|
|
|
and the coarse-level correction matrix to be combined with a generic
|
|
|
|
one-level AS preconditioner $M_{1L}$ is obtained as
|
|
|
|
\[
|
|
|
|
M_{C}^{-1}= R_C^T A_C^{-1} R_C,
|
|
|
|
\]
|
|
|
|
where $A_C$ is assumed to be nonsingular. The application of $M_{C}^{-1}$
|
|
|
|
to a vector $v$ corresponds to a restriction, a solution and
|
|
|
|
a prolongation step; the solution step, involving the matrix $A_C$,
|
|
|
|
may be carried out also approximately.
|
|
|
|
|
|
|
|
The combination of $M_{C}$ and $M_{1L}$ may be
|
|
|
|
performed in either an additive or a multiplicative framework.
|
|
|
|
In the former case, the \emph{two-level additive} Schwarz preconditioner
|
|
|
|
is obtained:
|
|
|
|
\[
|
|
|
|
M_{2LA}^{-1} = M_{C}^{-1} + M_{1L}^{-1}.
|
|
|
|
\]
|
|
|
|
Applying $M_{2L-A}^{-1}$ to a vector $v$ within a Krylov solver
|
|
|
|
corresponds to applying $M_{C}^{-1}$
|
|
|
|
and $M_{1L}^{-1}$ to $v$ independently and then summing up
|
|
|
|
the results.
|
|
|
|
|
|
|
|
In the multiplicative case, the combination can be
|
|
|
|
performed by first applying the smoother $M_{1L}^{-1}$ and then
|
|
|
|
the coarse-level correction operator $M_{C}^{-1}$:
|
|
|
|
\[
|
|
|
|
\begin{array}{l}
|
|
|
|
w = M_{1L}^{-1} v, \\
|
|
|
|
z = w + M_{C}^{-1} (v-Aw);
|
|
|
|
\end{array}
|
|
|
|
\]
|
|
|
|
this corresponds to the following \emph{two-level hybrid pre-smoothed}
|
|
|
|
Schwarz preconditioner:
|
|
|
|
\[
|
|
|
|
M_{2LH-PRE}^{-1} = M_{C}^{-1} + \left( I - M_{C}^{-1}A \right) M_{1L}^{-1}.
|
|
|
|
\]
|
|
|
|
On the other hand, by applying the smoother after the coarse-level correction,
|
|
|
|
i.e.\ by computing
|
|
|
|
\[
|
|
|
|
\begin{array}{l}
|
|
|
|
w = M_{C}^{-1} v , \\
|
|
|
|
z = w + M_{1L}^{-1} (v-Aw) ,
|
|
|
|
\end{array}
|
|
|
|
\]
|
|
|
|
the \emph{two-level hybrid post-smoothed}
|
|
|
|
Schwarz preconditioner is obtained:
|
|
|
|
\[
|
|
|
|
M_{2LH-POST}^{-1} = M_{1L}^{-1} + \left( I - M_{1L}^{-1}A \right) M_{C}^{-1}.
|
|
|
|
\]
|
|
|
|
One more variant of two-level hybrid preconditioner is obtained by applying
|
|
|
|
the smoother before and after the coarse-level correction. In this case, the
|
|
|
|
preconditioner is symmetric if $A$, $M_{1L}$ and $M_{C}$ are symmetric.
|
|
|
|
|
|
|
|
As previously noted, on parallel computers the number of sumatrices usually matches
|
|
|
|
the number of available processors. When the size of the system to be preconditioned
|
|
|
|
is very large, the use of many processors, i.e.\ of many small submatrices, often
|
|
|
|
leads to a large coarse-level system, whose solution may be computationally expensive.
|
|
|
|
On the other hand, the use of few processors often leads to local sumatrices that
|
|
|
|
are too expensive to be processed on single processors, because of memory and/or
|
|
|
|
computing requirements. Therefore, it seems natural to use a recursive approach,
|
|
|
|
in which the coarse-level correction is re-applied starting from the current
|
|
|
|
coarse-level system. The corresponding preconditioners are called \emph{multi-level}.
|
|
|
|
One more reason for the multi-level approach is that it may significantly
|
|
|
|
reduce the computational cost of preconditioning with respect to the two-level case
|
|
|
|
(see \cite[Chapter 3]{dd2_96}). Additive and hybrid multilevel preconditioners
|
|
|
|
are obtained as direct extensions of the two-level counterparts.
|
|
|
|
The algorithm for applying a multi-level version of the two-level hybrid
|
|
|
|
post-smoothed preconditioner is reported in Figure~\ref{fig:mlhpost_alg}.
|
|
|
|
Other combinations of the smoothers and coarse-level corrections are possible, leading to variants
|
|
|
|
of the previous algorithms. For a detailed descrition of them, the reader is
|
|
|
|
referred to \cite[Chapter 3]{dd2_96}.
|
|
|
|
%
|
|
|
|
\begin{figure}[t]
|
|
|
|
\begin{center}
|
|
|
|
\framebox{
|
|
|
|
\begin{minipage}{.85\textwidth} {\small
|
|
|
|
\begin{tabbing}
|
|
|
|
\quad \=\quad \=\quad \=\quad \\[-1mm]
|
|
|
|
! assigned the finest matrix\\
|
|
|
|
$A_1 \leftarrow A$;\\[1mm]
|
|
|
|
! defined the number of levels $nlev$ \\[1mm]
|
|
|
|
! defined $nlev-1$ prolongators\\
|
|
|
|
$R_l^T, l=2, \ldots, nlev$;\\[1mm]
|
|
|
|
! defined $nlev-1$ coarser matrices\\
|
|
|
|
$A_l \leftarrow R_lA_{l-1}R_l^T, \; l=2, \ldots, nlev$;\\[1mm]
|
|
|
|
! defined the $nlev-1$ basic Schwarz preconditioners\\
|
|
|
|
$M_l$, basic preconditioner for $A_l \; l=1, \ldots, nlev-1$;\\[1mm]
|
|
|
|
! assigned a vector $v$\\
|
|
|
|
$v_1 \leftarrow v$; \\[2mm]
|
|
|
|
\textbf{for $l=2, nlev$ do}\\[1mm]
|
|
|
|
\> ! transfer $v_{l-1}$ to the next coarser level\\
|
|
|
|
\> $v_l \leftarrow R_lv_{l-1}$; \\[1mm]
|
|
|
|
\textbf{endfor} \\[2mm]
|
|
|
|
! apply the coarsest-level correction\\[1mm]
|
|
|
|
$y_{nlev} \leftarrow A_{nlev}^{-1}*v_{nlev}$;\\[2mm]
|
|
|
|
\textbf{for $l=nlev -1 , 1, -1$ do}\\[1mm]
|
|
|
|
\> ! transfer $y_{l+1}$ to the next finer level\\
|
|
|
|
\> $y_l \leftarrow R_{l+1}^T*y_{l+1}$;\\[1mm]
|
|
|
|
\> ! compute the residual at the current level\\
|
|
|
|
\> $r_l \leftarrow v_l-A_l^{-1}*y_l$;\\[1mm]
|
|
|
|
\> ! apply the basic Schwarz preconditioner to $r_l$\\
|
|
|
|
\> $r_l \leftarrow M_l^{-1}*r_l$\\[1mm]
|
|
|
|
\> ! update $y_l$\\
|
|
|
|
\> $y_l \leftarrow y_l+r_l$\\
|
|
|
|
\textbf{endfor} \\[1mm]
|
|
|
|
! preconditioned vector
|
|
|
|
$w \leftarrow y_1$;
|
|
|
|
\end{tabbing}
|
|
|
|
}
|
|
|
|
\end{minipage}
|
|
|
|
}
|
|
|
|
\caption{Multi-level hybrid post-smoothed preconditioner.\label{fig:mlhpost_alg}}
|
|
|
|
\end{center}
|
|
|
|
\end{figure}
|
|
|
|
%
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{Smoothed Aggregation\label{sec:aggregation}}
|
|
|
|
|
|
|
|
To define the restriction operator $R_C$, which is used to compute
|
|
|
|
the coarse-level matrix $A_C$, MLD2P4 uses the \emph{smoothed aggregation}
|
|
|
|
algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
|
|
|
|
The basic idea of this algorithm is to build a coarse set of vertices
|
|
|
|
$W_C$ by suitably grouping the vertices of $W$ into disjoint subsets
|
|
|
|
(aggregates), and to define the coarse-to-fine space transfer operator $R_C^T$ by
|
|
|
|
applying a suitable smoother to a simple piecewise constant
|
|
|
|
prolongation operator, to improve the quality of the coarse-space correction.
|
|
|
|
|
|
|
|
Three main steps can be identified in the smoothed aggregation procedure:
|
|
|
|
\begin{enumerate}
|
|
|
|
\item coarsening of the vertex set $W$, to obtain $W_C$;
|
|
|
|
\item construction of the prolongator $R_C^T$;
|
|
|
|
\item application of $R_C$ and $R_C^T$ to build $A_C$.
|
|
|
|
\end{enumerate}
|
|
|
|
%\textbf{NOTA: Controllare cosa fa trilinos dopo il primo passo.}
|
|
|
|
|
|
|
|
To perform the coarsening step, we have implemented the aggregation algorithm sketched
|
|
|
|
in \cite{apnum_07}. According to \cite{BREZINA_VANEK}, a modification of this algorithm
|
|
|
|
has been actually considered,
|
|
|
|
in which each aggregate $N_r$ is made of vertices of $W$ that are \emph{strongly coupled}
|
|
|
|
to a certain root vertex $r \in W$, i.e.\
|
|
|
|
\[ N_r = \left\{s \in W: |a_{rs}| > \theta \sqrt{|a_{rr}a_{ss}|} \right\}
|
|
|
|
\cup \left\{ r \right\} ,
|
|
|
|
\]
|
|
|
|
for a given $\theta \in [0,1]$. \textbf{L'ALGORITMO USA IL MAGGIORE STRETTO? ALTRIMENTI
|
|
|
|
CON THETA=0 AGGREGHIAMO ANCHE I VERTICI CON COEFFICIENTE CORRISPONDENTE NULLO}
|
|
|
|
Since the previous algorithm has a sequential nature, a \emph{decoupled} version of
|
|
|
|
it has been chosen, where each processor $i$ independently applies the algorithm to
|
|
|
|
the set of vertices $W_i^0$ assigned to it in the initial data distribution. This
|
|
|
|
version is embarrassingly parallel, since it does not require any data communication.
|
|
|
|
On the other hand, it may produce non-uniform aggregates near boundary vertices,
|
|
|
|
i.e.\ near vertices adjacent to vertices in other processors, and is strongly
|
|
|
|
dependent on the number of processors and on the initial partitioning of the matrix $A$.
|
|
|
|
Nevertheless, this algorithm has been chosen for the implementation in MLD2P4,
|
|
|
|
since it has been shown to produce good results in practice
|
|
|
|
\cite{aaecc_07,apnum_07,TUMINARO_TONG}.
|
|
|
|
|
|
|
|
The prolongator $P_C=R_C^T$ is built starting from a \emph{tentative prolongator}
|
|
|
|
$P \in \Re^{n \times n_C}$, defined as
|
|
|
|
\begin{equation}
|
|
|
|
P=(p_{ij}), \quad p_{ij}=
|
|
|
|
\left\{ \begin{array}{ll}
|
|
|
|
1 & \quad \mbox{if} \; i \in V^j_C \\
|
|
|
|
0 & \quad \mbox{otherwise}
|
|
|
|
\end{array} \right. .
|
|
|
|
\label{eq:tent_prol}
|
|
|
|
\end{equation}
|
|
|
|
$P_C$ is obtained by
|
|
|
|
applying to $P$ a smoother $S \in \Re^{n \times n}$:
|
|
|
|
\begin{equation}
|
|
|
|
P_C = S P,
|
|
|
|
\label{eq:smoothed_prol}
|
|
|
|
\end{equation}
|
|
|
|
in order to remove oscillatory components from the range of the prolongator
|
|
|
|
and hence to improve the convergence properties of the multi-level
|
|
|
|
Schwarz method \cite{BREZINA_VANEK,StubenGMD69_99}.
|
|
|
|
A simple choice for $S$ is the damped Jacobi smoother:
|
|
|
|
\begin{equation}
|
|
|
|
S = I - \omega D^{-1} A ,
|
|
|
|
\label{eq:jac_smoother}
|
|
|
|
\end{equation}
|
|
|
|
where the value of $\omega$ can be chosen
|
|
|
|
using some estimate of the spectral radius of $D^{-1}A$ \cite{BREZINA_VANEK}.
|
|
|
|
%
|
|
|
|
%\textbf{NOTA: filtering di $A$ nello smoothing, da implementare?}
|
|
|
|
%
|
|
|
|
|
|
|
|
%%% Local Variables:
|
|
|
|
%%% mode: latex
|
|
|
|
%%% TeX-master: "userguide"
|
|
|
|
%%% End:
|