\section{Multi-level Domain Decomposition Background\label{sec:background}}
\markboth{\textsc{MLD2P4 User's and Reference Guide}}
{\textsc{\ref{sec:background} Multi-level Domain Decomposition Background}}
\emph{Domain Decomposition} (DD) preconditioners, coupled with Krylov iterative
solvers, are widely used in the parallel solution of large and sparse linear systems.
These preconditioners are based on the divide-and-conquer technique: the matrix
to be preconditioned is divided into submatrices, a ``local'' linear system
involving each submatrix is (approximately) solved, and the local solutions are used
to build a preconditioner for the whole original matrix. This process
often corresponds to dividing a physical domain associated with the original matrix
into subdomains, e.g.\ in a PDE discretization, to (approximately) solving the
subproblems corresponding to the subdomains, and to building an approximate
solution of the original problem from the local solutions
\cite{Cai_Widlund_92,dd1_94,dd2_96}.

\emph{Additive Schwarz} (AS) preconditioners are DD preconditioners using overlapping
submatrices, i.e.\ submatrices with some common rows, to couple the local information
related to the submatrices (see, e.g., \cite{dd2_96}).
The main motivation for choosing Additive Schwarz preconditioners is their
intrinsic parallelism. A drawback of these
preconditioners is that the number of iterations of the preconditioned solvers
generally grows with the number of submatrices. This may be a serious limitation
on parallel computers, since the number of submatrices usually matches the number
of available processors. Optimal convergence rates, i.e.\ iteration numbers
independent of the number of submatrices, can be obtained by correcting the
preconditioner through a suitable approximation of the original linear system
in a coarse space, which globally couples the information related to the single
submatrices.

\emph{Two-level Schwarz} preconditioners are obtained
by combining basic (one-level) Sch\-warz preconditioners with a coarse-level
correction. In this context, the one-level preconditioner is often
called `smoother'. Different two-level preconditioners are obtained by varying the
choice of the smoother and of the coarse-level correction, and the
way they are combined \cite{dd2_96}. The same reasoning can be applied starting
from the coarse-level system, i.e.\ a coarse-space correction can be built
from this system, thus obtaining \emph{multi-level} preconditioners.

It is worth noting that optimal preconditioners do not necessarily correspond
to minimum execution times. Indeed, to obtain effective multi-level preconditioners,
a tradeoff between the optimality of convergence and the cost of building and applying
the coarse-space corrections must be achieved. The choice of the number of levels,
i.e.\ of the coarse-space corrections, also affects the effectiveness of the
preconditioners. A further goal is to get convergence rates that are as insensitive
as possible to variations in the matrix coefficients.

Two main approaches can be used to build coarse-space corrections. The geometric approach
applies coarsening strategies based on the knowledge of some physical grid associated
with the matrix and requires the user to define grid transfer operators from the fine
to the coarse levels and vice versa. This may be difficult for complex geometries;
furthermore, suitable one-level preconditioners may be required to get an efficient
interplay between the fine and coarse levels, e.g.\ when matrices with highly varying
coefficients are considered. The algebraic approach builds coarse-space corrections
using only matrix information. It performs a fully automatic coarsening and enforces
the interplay between the fine and coarse levels by suitably choosing the coarse space
and the coarse-to-fine interpolation \cite{StubenGMD69_99}.

MLD2P4 uses a pure algebraic approach for building the sequence of coarse matrices
starting from the original matrix. The algebraic approach is based on the \emph{smoothed
aggregation} algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}. A decoupled version
of this algorithm is implemented, where the smoothed aggregation is applied locally
to each submatrix \cite{TUMINARO_TONG}. In the next two subsections we provide
a brief description of the multi-level Schwarz preconditioners and of the smoothed
aggregation technique as implemented in MLD2P4. For further details the user
is referred to \cite{para_04,aaecc_07,apnum_07,dd2_96}.

\subsection{Multi-level Schwarz Preconditioners\label{sec:multilevel}}
The multi-level preconditioners implemented in MLD2P4 are obtained by combining
AS preconditioners with coarse-space corrections; therefore
we first provide a sketch of the AS preconditioners.

Given the linear system \Ref{system1},
where $A=(a_{ij}) \in \Re^{n \times n}$ is a
nonsingular sparse matrix with a symmetric nonzero pattern,
let $G=(W,E)$ be the adjacency graph of $A$, where $W=\{1, 2, \ldots, n\}$
and $E=\{(i,j) : a_{ij} \neq 0\}$ are the vertex set and the edge set of $G$,
respectively. Two vertices are called adjacent if there is an edge connecting
them. For any integer $\delta > 0$, a $\delta$-overlap
partition of $W$ can be defined recursively as follows.
Given a 0-overlap (or non-overlapping) partition of $W$,
i.e.\ a set of $m$ disjoint nonempty sets $W_i^0 \subset W$ such that
$\cup_{i=1}^m W_i^0 = W$, a $\delta$-overlap
partition of $W$ is obtained by considering the sets
$W_i^\delta \supset W_i^{\delta-1}$, where each $W_i^\delta$ is obtained from
$W_i^{\delta-1}$ by including the vertices that
are adjacent to any vertex in $W_i^{\delta-1}$.

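As a small illustration of this definition, if $A$ is tridiagonal, so that $G$
is the chain $1 - 2 - \cdots - n$, the 0-overlap partition
$W_1^0=\{1,\ldots,k\}$, $W_2^0=\{k+1,\ldots,n\}$ yields the 1-overlap partition
$W_1^1=\{1,\ldots,k+1\}$ and $W_2^1=\{k,\ldots,n\}$.
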
Let $n_i^\delta$ be the size of $W_i^\delta$ and $R_i^{\delta} \in
\Re^{n_i^\delta \times n}$ the restriction operator that maps
a vector $v \in \Re^n$ onto the vector $v_i^{\delta} \in \Re^{n_i^\delta}$
containing the components of $v$ corresponding to the vertices in
$W_i^\delta$. The transpose of $R_i^{\delta}$ is a
prolongation operator from $\Re^{n_i^\delta}$ to $\Re^n$.
The matrix $A_i^\delta=R_i^\delta A (R_i^\delta)^T \in
\Re^{n_i^\delta \times n_i^\delta}$ can be considered
as a restriction of $A$ corresponding to the set $W_i^{\delta}$.

The \emph{classical one-level AS} preconditioner is defined by
\[
M_{AS}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T
(A_i^\delta)^{-1} R_i^{\delta},
\]
where $A_i^\delta$ is assumed to be nonsingular. Its application
to a vector $v \in \Re^n$ within a Krylov solver requires the following
three steps:
\begin{enumerate}
\item restriction of $v$ as $v_i = R_i^{\delta} v$, $i=1,\ldots,m$;
\item solution of the linear systems $A_i^\delta w_i = v_i$,
$i=1,\ldots,m$;
\item prolongation and sum of the $w_i$'s, i.e.\ $w = \sum_{i=1}^m (R_i^{\delta})^T w_i$.
\end{enumerate}
Note that the linear systems at step 2 are usually solved approximately,
e.g.\ using incomplete LU factorizations such as ILU($p$), MILU($p$) and
ILU($p,t$) \cite[Chapter 10]{Saad_book}.

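The following minimal sketch, in Python/NumPy with dense matrices and exact
local solves purely for illustration (it is not the MLD2P4 interface),
implements the three steps above:
\begin{verbatim}
import numpy as np

def as_apply(A, subsets, v):
    # One-level AS: w = sum_i (R_i)^T (A_i)^{-1} R_i v.
    # subsets is a list of index arrays, one per delta-overlap set W_i.
    w = np.zeros_like(v)
    for idx in subsets:
        A_i = A[np.ix_(idx, idx)]        # A_i = R_i A (R_i)^T
        v_i = v[idx]                     # step 1: restriction
        w_i = np.linalg.solve(A_i, v_i)  # step 2: local solve (exact here)
        w[idx] += w_i                    # step 3: prolongation and sum
    return w
\end{verbatim}
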
A variant of the classical AS preconditioner that outperforms it
in terms of convergence rate and of computation and communication
time on parallel distributed-memory computers is the so-called \emph{Restricted AS
(RAS)} preconditioner~\cite{CAI_SARKIS,EFSTATHIOU}. It
is obtained by zeroing the components of $w_i$ corresponding to the
overlapping vertices when applying the prolongation. Therefore,
RAS differs from classical AS by the prolongation operators,
which are substituted by $(\tilde{R}_i^0)^T$, where
$\tilde{R}_i^0 \in \Re^{n_i^\delta \times n}$ is obtained by zeroing the rows of
$R_i^\delta$ corresponding to the vertices in $W_i^\delta \setminus W_i^0$:
\[
M_{RAS}^{-1}= \sum_{i=1}^m (\tilde{R}_i^0)^T
(A_i^\delta)^{-1} R_i^{\delta}.
\]
Analogously, the AS variant called \emph{AS with Harmonic extension (ASH)}
is defined by
\[ M_{ASH}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T
(A_i^\delta)^{-1} \tilde{R}_i^0.
\]
We note that for $\delta=0$ the three variants of the AS preconditioner are
all equal to the block-Jacobi preconditioner.

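In the same illustrative setting as the sketch above, RAS scatters only the
components of each local solution that belong to the non-overlapping set
$W_i^0$ (the function and argument names below are hypothetical):
\begin{verbatim}
def ras_apply(A, subsets0, subsets, v):
    # RAS: w = sum_i (Rtilde_i^0)^T (A_i)^{-1} R_i v, where the
    # prolongation writes only the entries belonging to W_i^0.
    w = np.zeros_like(v)
    for idx0, idx in zip(subsets0, subsets):
        w_i = np.linalg.solve(A[np.ix_(idx, idx)], v[idx])
        keep = np.isin(idx, idx0)   # vertices of W_i^0 inside W_i^delta
        w[idx[keep]] += w_i[keep]
    return w
\end{verbatim}
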
As already observed, the convergence rate of the one-level Schwarz
preconditioned iterative solvers deteriorates as the number $m$ of partitions
of $W$ increases \cite{dd1_94,dd2_96}. To reduce the dependency
of the number of iterations on the degree of parallelism, we may
introduce a global coupling among the overlapping partitions by defining
a coarse-space approximation $A_C$ of the matrix $A$.
In a pure algebraic setting, $A_C$ is usually built with
a Galerkin approach. Given a set $W_C$ of \emph{coarse vertices},
with size $n_C$, and a suitable restriction operator
$R_C \in \Re^{n_C \times n}$, $A_C$ is defined as
\[
A_C=R_C A R_C^T
\]
and the coarse-level correction matrix to be combined with a generic
one-level AS preconditioner $M_{1L}$ is obtained as
\[
M_{C}^{-1}= R_C^T A_C^{-1} R_C,
\]
where $A_C$ is assumed to be nonsingular. The application of $M_{C}^{-1}$
to a vector $v$ corresponds to a restriction, a solution and
a prolongation step; the solution step, involving the matrix $A_C$,
may also be carried out approximately.

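In matrix terms, the application of $M_{C}^{-1}$ can be sketched as follows,
where \texttt{A\_C\_solve} stands for any (possibly approximate) solver for
the coarse system (a name introduced here only for illustration):
\begin{verbatim}
def coarse_correction(A_C_solve, R_C, v):
    # M_C^{-1} v = (R_C)^T A_C^{-1} R_C v:
    # restriction, (approximate) coarse solve, prolongation.
    return R_C.T @ A_C_solve(R_C @ v)
\end{verbatim}
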
The combination of $M_{C}$ and $M_{1L}$ may be
performed in either an additive or a multiplicative framework.
In the former case, the \emph{two-level additive} Schwarz preconditioner
is obtained:
\[
M_{2LA}^{-1} = M_{C}^{-1} + M_{1L}^{-1}.
\]
Applying $M_{2LA}^{-1}$ to a vector $v$ within a Krylov solver
corresponds to applying $M_{C}^{-1}$
and $M_{1L}^{-1}$ to $v$ independently and then summing up
the results.

In the multiplicative case, the combination can be
performed by first applying the smoother $M_{1L}^{-1}$ and then
the coarse-level correction operator $M_{C}^{-1}$:
\[
\begin{array}{l}
w = M_{1L}^{-1} v, \\
z = w + M_{C}^{-1} (v-Aw);
\end{array}
\]
this corresponds to the following \emph{two-level hybrid pre-smoothed}
Schwarz preconditioner:
\[
M_{2LH-PRE}^{-1} = M_{C}^{-1} + \left( I - M_{C}^{-1}A \right) M_{1L}^{-1}.
\]
On the other hand, by applying the smoother after the coarse-level correction,
i.e.\ by computing
\[
\begin{array}{l}
w = M_{C}^{-1} v , \\
z = w + M_{1L}^{-1} (v-Aw) ,
\end{array}
\]
the \emph{two-level hybrid post-smoothed}
Schwarz preconditioner is obtained:
\[
M_{2LH-POST}^{-1} = M_{1L}^{-1} + \left( I - M_{1L}^{-1}A \right) M_{C}^{-1}.
\]
One more variant of the two-level hybrid preconditioner is obtained by applying
the smoother both before and after the coarse-level correction. In this case, the
preconditioner is symmetric if $A$, $M_{1L}$ and $M_{C}$ are symmetric.

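As an illustration, the hybrid post-smoothed application corresponds to the
following sketch, where \texttt{M1L\_inv} and \texttt{MC\_inv} are hypothetical
callables applying $M_{1L}^{-1}$ and $M_{C}^{-1}$:
\begin{verbatim}
def two_level_post(A, M1L_inv, MC_inv, v):
    # w = M_C^{-1} v, then smooth the residual with M_1L^{-1}:
    # z = w + M_1L^{-1} (v - A w).
    w = MC_inv(v)               # coarse-level correction
    z = w + M1L_inv(v - A @ w)  # post-smoothing of the residual
    return z
\end{verbatim}
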
As previously noted, on parallel computers the number of submatrices usually matches
the number of available processors. When the size of the system to be preconditioned
is very large, the use of many processors, i.e.\ of many small submatrices, often
leads to a large coarse-level system, whose solution may be computationally expensive.
On the other hand, the use of few processors often leads to local submatrices that
are too expensive to be processed on single processors, because of memory and/or
computing requirements. Therefore, it seems natural to use a recursive approach,
in which the coarse-level correction is re-applied starting from the current
coarse-level system. The corresponding preconditioners, called \emph{multi-level}
preconditioners, can significantly reduce the computational cost of preconditioning
with respect to the two-level case (see \cite[Chapter 3]{dd2_96}).
Additive and hybrid multi-level preconditioners
are obtained as direct extensions of their two-level counterparts.
For a detailed description of them, the reader is
referred to \cite[Chapter 3]{dd2_96}.
The algorithm for the application of a multi-level hybrid
post-smoothed preconditioner $M$ to a vector $v$, i.e.\ for the
computation of $w=M^{-1}v$, is reported, for
example, in Figure~\ref{fig:mlhpost_alg}. Here the number of levels
is denoted by $nlev$ and the levels are numbered in increasing order starting
from the finest one, i.e.\ the finest level is level 1; the coarse matrix
and the corresponding basic preconditioner at each level $l$ are denoted by $A_l$ and
$M_l$, respectively, with $A_1=A$.
%
\begin{figure}[t]
\begin{center}
\framebox{
\begin{minipage}{.85\textwidth} {\small
\begin{tabbing}
\quad \=\quad \=\quad \=\quad \\[-1mm]
$v_1 = v$; \\[2mm]
\textbf{for $l=2, nlev$ do}\\[1mm]
\> ! transfer $v_{l-1}$ to the next coarser level\\
\> $v_l = R_lv_{l-1}$ \\[1mm]
\textbf{endfor} \\[2mm]
! apply the coarsest-level correction\\[1mm]
$y_{nlev} = A_{nlev}^{-1} v_{nlev}$\\[2mm]
\textbf{for $l=nlev -1 , 1, -1$ do}\\[1mm]
\> ! transfer $y_{l+1}$ to the next finer level\\
\> $y_l = R_{l+1}^T y_{l+1}$;\\[1mm]
\> ! compute the residual at the current level\\
\> $r_l = v_l-A_l y_l$;\\[1mm]
\> ! apply the basic Schwarz preconditioner to the residual\\
\> $r_l = M_l^{-1} r_l$\\[1mm]
\> ! update $y_l$\\
\> $y_l = y_l+r_l$\\
\textbf{endfor} \\[1mm]
$w = y_1$;
\end{tabbing}
}
\end{minipage}
}
\caption{Application of the multi-level hybrid post-smoothed preconditioner.\label{fig:mlhpost_alg}}
\end{center}
\end{figure}
%

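A compact sketch of the algorithm in Figure~\ref{fig:mlhpost_alg}, again in
illustrative Python with 0-based level indices (\texttt{A[0]} is the finest
matrix $A_1$; \texttt{R}, \texttt{M\_inv} and \texttt{coarsest\_solve} are
hypothetical names for the transfer operators, the level smoothers and the
coarsest-level solver), is the following:
\begin{verbatim}
def ml_post_apply(A, R, M_inv, coarsest_solve, v):
    nlev = len(A)
    vl = [None] * nlev
    vl[0] = v
    for l in range(1, nlev):           # transfer v to the coarser levels
        vl[l] = R[l] @ vl[l - 1]
    y = coarsest_solve(vl[-1])         # coarsest-level correction
    for l in range(nlev - 2, -1, -1):  # back towards the finest level
        y = R[l + 1].T @ y             # prolongation to the finer level
        r = vl[l] - A[l] @ y           # residual at the current level
        y = y + M_inv[l](r)            # smooth the residual and update
    return y                           # w = M^{-1} v
\end{verbatim}
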
\subsection{Smoothed Aggregation\label{sec:aggregation}}
In order to define the restriction operator $R_C$, which is used to compute
the coarse-level matrix $A_C$, MLD2P4 uses the \emph{smoothed aggregation}
algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
The basic idea of this algorithm is to build a coarse set of vertices
$W_C$ by suitably grouping the vertices of $W$ into disjoint subsets
(aggregates), and to define the coarse-to-fine space transfer operator $R_C^T$ by
applying a suitable smoother to a simple piecewise-constant
prolongation operator, in order to improve the quality of the coarse-space correction.

Three main steps can be identified in the smoothed aggregation procedure:
\begin{enumerate}
\item coarsening of the vertex set $W$, to obtain $W_C$;
\item construction of the prolongator $R_C^T$;
\item application of $R_C$ and $R_C^T$ to build $A_C$.
\end{enumerate}

To perform the coarsening step, we have implemented the aggregation algorithm sketched
in \cite{apnum_07}. Following \cite{VANEK_MANDEL_BREZINA}, a modified version of
this algorithm is actually considered,
in which each aggregate $N_r$ is made of vertices of $W$ that are \emph{strongly coupled}
to a certain root vertex $r \in W$, i.e.\
\[ N_r = \left\{s \in W: |a_{rs}| > \theta \sqrt{|a_{rr}a_{ss}|} \right\}
\cup \left\{ r \right\} ,
\]
for a given threshold $\theta \in [0,1]$.
Since this algorithm has a sequential nature, a \emph{decoupled} version of
it has been chosen, where each processor $i$ independently applies the algorithm to
the set of vertices $W_i^0$ assigned to it in the initial data distribution. This
version is embarrassingly parallel, since it does not require any data communication.
On the other hand, it may produce non-uniform aggregates near boundary vertices,
i.e.\ near vertices adjacent to vertices assigned to other processors, and it is strongly
dependent on the number of processors and on the initial partitioning of the matrix $A$.
Nevertheless, this algorithm has been chosen for the implementation in MLD2P4,
since it has been shown to produce good results in practice
\cite{aaecc_07,apnum_07,TUMINARO_TONG}.

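A greedy sequential sketch of this kind of aggregation (dense loops for
clarity; the algorithm actually implemented in MLD2P4 follows \cite{apnum_07}
and differs in the details of the root selection) is:
\begin{verbatim}
def aggregate(A, theta):
    # Each free vertex r in turn becomes a root; the free vertices
    # strongly coupled to r, i.e. |a_rs| > theta*sqrt(|a_rr*a_ss|),
    # join the aggregate N_r together with r itself.
    n = A.shape[0]
    agg = -np.ones(n, dtype=int)   # aggregate index of each vertex
    n_agg = 0
    for r in range(n):
        if agg[r] != -1:
            continue
        agg[r] = n_agg
        for s in range(n):
            if agg[s] == -1 and \
               abs(A[r, s]) > theta * np.sqrt(abs(A[r, r] * A[s, s])):
                agg[s] = n_agg
        n_agg += 1
    return agg, n_agg
\end{verbatim}
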
The prolongator $P_C=R_C^T$ is built starting from a \emph{tentative prolongator}
$P \in \Re^{n \times n_C}$, defined as
\begin{equation}
P=(p_{ij}), \quad p_{ij}=
\left\{ \begin{array}{ll}
1 & \quad \mbox{if} \; i \in V^j_C \\
0 & \quad \mbox{otherwise}
\end{array} \right. ,
\label{eq:tent_prol}
\end{equation}
where $V^j_C$ denotes the set of vertices of the $j$-th aggregate.
$P_C$ is obtained by
applying to $P$ a smoother $S \in \Re^{n \times n}$:
\begin{equation}
P_C = S P,
\label{eq:smoothed_prol}
\end{equation}
in order to remove oscillatory components from the range of the prolongator
and hence to improve the convergence properties of the multi-level
Schwarz method \cite{BREZINA_VANEK,StubenGMD69_99}.
A simple choice for $S$ is the damped Jacobi smoother:
\begin{equation}
S = I - \omega D^{-1} A ,
\label{eq:jac_smoother}
\end{equation}
where $D$ is the diagonal part of $A$ and the value of $\omega$ can be chosen
using some estimate of the spectral radius of $D^{-1}A$ \cite{BREZINA_VANEK}.

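Putting the pieces together, the construction of the smoothed prolongator
(\ref{eq:tent_prol})--(\ref{eq:smoothed_prol}) and of the Galerkin coarse
matrix can be sketched as follows (illustrative dense Python, reusing the
hypothetical aggregate map computed above):
\begin{verbatim}
def smoothed_prolongator(A, agg, n_agg, omega):
    n = A.shape[0]
    P = np.zeros((n, n_agg))
    P[np.arange(n), agg] = 1.0       # tentative prolongator
    D_inv_A = A / np.diag(A)[:, None]
    S = np.eye(n) - omega * D_inv_A  # damped Jacobi smoother
    P_C = S @ P                      # smoothed prolongator P_C = S P
    A_C = P_C.T @ A @ P_C            # Galerkin coarse matrix
    return P_C, A_C
\end{verbatim}
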
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "userguide"
%%% End: