
\emph{MLD2P4 (Multi-Level Domain Decomposition Parallel Preconditioners Package based on
PSBLAS)} is a package of parallel algebraic multi-level preconditioners.
It implements various versions of one-level additive and of multi-level additive
and hybrid Schwarz algorithms. In the multi-level case, a purely algebraic approach
is applied to generate coarse-level corrections, so that no geometric background is needed
concerning the matrix to be preconditioned. The matrix is required to be square,
real or complex, with a symmetric sparsity pattern.
MLD2P4 has been designed to provide scalable and easy-to-use preconditioners in the
context of the PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms)
computational framework and can be used in conjunction with the Krylov solvers
available in this framework. MLD2P4 enables the user to easily specify different aspects
of a generic algebraic multilevel Schwarz preconditioner, thus allowing a search
for the ``best'' preconditioner for the problem at hand. The package has been designed
employing object-oriented techniques, using Fortran 95 and MPI, with interfaces to
additional external libraries such as UMFPACK, SuperLU and SuperLU\_Dist, that
can be exploited in building multi-level preconditioners.

\section{Advanced Use}\label{sec:advanced}
- MLD2P4 software architecture \\
- preconditioner data structure (``detailed'' description) + possibility of setting
the individual levels one by one (a possibility only hinted at in the earlier description of precset) \\
- description of the medium-level routines (with an introduction on the potential for extension (?) offered
by this software layer) \\
\section{Multi-level Domain Decomposition Background\label{sec:background}}
\emph{Domain Decomposition} (DD) preconditioners, coupled with Krylov iterative
solvers, are widely used in the parallel solution of large and sparse linear systems.
These preconditioners are based on the divide and conquer technique: the matrix
to be preconditioned is divided into submatrices, a ``local linear system''
involving each submatrix is (approximately) solved, and the local solutions are used
to build a preconditioner for the whole original matrix. This process
often corresponds to dividing a physical domain associated with the original matrix
into subdomains, e.g.\ in a PDE discretization, to (approximately) solving the
subproblems corresponding to the subdomains, and to building an approximate
solution of the original problem from the local solutions.

\emph{Additive Schwarz} preconditioners are DD preconditioners using overlapping
submatrices, i.e.\ with some common rows, to couple the local information
related to the submatrices (see, e.g., \cite{dd2_96}).
The main motivations for choosing Additive Schwarz preconditioners are their
intrinsic parallelism and good \textbf{(saying ``good'' is a bit strong, given that
right afterwards we say that convergence depends on the number of submatrices)}
convergence properties. A drawback of these
preconditioners is that the number of iterations of the preconditioned solvers
generally grows with the number of submatrices. This may be a serious limitation
on parallel computers, since the number of submatrices usually matches the number
of available processors. Optimal convergence rates, i.e.\ iteration numbers
independent of the number of submatrices, can be obtained by correcting the
preconditioner through a suitable approximation of the original linear system
in a coarse space, which globally couples the information related to the single
submatrices.

\emph{Two-level Schwarz} preconditioners are obtained
by combining basic (one-level) Schwarz preconditioners with coarse-level
corrections. In this context, the one-level preconditioner is often
called smoother. Different two-level preconditioners are obtained by varying the
choice of the smoother, of the coarse-level correction and the
way they are combined \cite{dd2_96}. The same reasoning can be applied starting
from the coarse-level system, i.e.\ a coarse-space correction can be built
from this system, thus obtaining \emph{multi-level} preconditioners.
It is worth noting that optimal preconditioners do not necessarily correspond
to minimum execution times. Indeed, to obtain effective multilevel preconditioners
a tradeoff between optimality of convergence and the cost of building and applying
the coarse-space corrections must be achieved. The choice of the number of levels,
i.e.\ of the coarse-space corrections, also affects the effectiveness of the
preconditioners. One more goal is to get convergence rates that are as insensitive
as possible to variations in the matrix coefficients.

Two main approaches can be used to build coarse-space corrections. The geometric approach
applies coarsening strategies based on the knowledge of some physical grid associated
with the matrix and requires the user to define grid transfer operators from the fine
to the coarse levels and vice versa. This may be difficult for complex geometries;
furthermore, suitable one-level preconditioners may be required to get efficient
interplay between fine and coarse levels, e.g.\ when matrices with highly varying coefficients
are considered. The algebraic approach builds coarse-space corrections using only matrix
information. It performs a fully automatic coarsening and enforces the interplay between
the fine and coarse levels by suitably choosing the coarse space and the coarse-to-fine
interpolation \cite{StubenGMD69_99}.
MLD2P4 uses a pure algebraic approach for building the sequence of coarse matrices
starting from the original matrix. The algebraic approach is based on the \emph{smoothed
aggregation} algorithm \cite{Brezina_Vanek_,Vanek_Mandel_Brezina_}. A decoupled version
of this algorithm is implemented, where the smoothed aggregation is applied locally
to each submatrix \cite{Tuminaro_Tong_00}. In the next two subsections we provide
a brief description of the multi-level Schwarz preconditioners and of the smoothed
aggregation technique as implemented in MLD2P4. For further details the user
is referred to \cite{para_04,apnum_07,aaecc_07,dd2_96}.
\subsection{Multi-level Schwarz Preconditioners\label{sec:multilevel}}
The multi-level preconditioners implemented in MLD2P4 are obtained by combining
Additive Schwarz preconditioners with coarse-space corrections; therefore
we first provide a sketch of the Additive Schwarz preconditioners.
Given a linear system
\[ Ax=b, \]
where $A=(a_{ij}) \in \Re^{n \times n}$ is a
nonsingular sparse matrix with a symmetric non-zero pattern,
let $G=(W,E)$ be the adjacency graph of $A$, where $W=\{1, 2, \ldots, n\}$
and $E=\{(i,j) : a_{ij} \neq 0\}$ are the vertex set and the edge set of $G$,
respectively. Two vertices are called adjacent if there is an edge connecting
them. For any integer $\delta > 0$, a $\delta$-overlap
partition of $W$ can be defined recursively as follows.
Given a 0-overlap (or non-overlapping) partition of $W$,
i.e.\ a set of $m$ disjoint nonempty sets $W_i^0 \subset W$ such that
$\cup_{i=1}^m W_i^0 = W$, a $\delta$-overlap
partition of $W$ is obtained by considering the sets
$W_i^\delta \supset W_i^{\delta-1}$, obtained by including the vertices that
are adjacent to any vertex in $W_i^{\delta-1}$.
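
As a small illustration (the matrix and the initial partition are chosen here only
for the sake of example and are not taken from MLD2P4), let $A$ be a $6 \times 6$
tridiagonal matrix with nonzero diagonal, sub-diagonal and super-diagonal entries,
so that two vertices $i$ and $j$ are adjacent whenever $|i-j| \leq 1$, and let $m=2$.
Starting from a 0-overlap partition, the recursive definition gives
\[
\begin{array}{ll}
W_1^0=\{1,2,3\}, & W_2^0=\{4,5,6\}, \\
W_1^1=\{1,2,3,4\}, & W_2^1=\{3,4,5,6\}, \\
W_1^2=\{1,2,3,4,5\}, & W_2^2=\{2,3,4,5,6\}.
\end{array}
\]
Each increase of $\delta$ by one extends every subset by one layer of adjacent
vertices: here the subsets have 4 vertices for $\delta=1$ and 5 vertices for $\delta=2$.
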
Let $n_i^\delta$ be the size of $W_i^\delta$ and $R_i^{\delta} \in
\Re^{n_i^\delta \times n}$ the restriction operator that maps
a vector $v \in \Re^n$ onto the vector $v_i^{\delta} \in \Re^{n_i^\delta}$
containing the components of $v$ corresponding to the vertices in
$W_i^\delta$. The transpose of $R_i^{\delta}$ is a
prolongation operator from $\Re^{n_i^\delta}$ to $\Re^n$.
The matrix $A_i^\delta=R_i^\delta A (R_i^\delta)^T \in
\Re^{n_i^\delta \times n_i^\delta}$ can be considered
as a restriction of $A$ corresponding to the set $W_i^{\delta}$.
The \emph{classical one-level AS} preconditioner is defined by
\[
M_{AS}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T
(A_i^\delta)^{-1} R_i^{\delta},
\]
where $A_i^\delta$ is assumed to be nonsingular. Its application
to a vector $v \in \Re^n$ within a Krylov solver requires the following
three steps:
\begin{enumerate}
\item restriction of $v$ as $v_i = R_i^{\delta} v$, $i=1,\ldots,m$;
\item (approximate) solution of the linear systems $A_i^\delta w_i = v_i$,
$i=1,\ldots,m$;
\item prolongation and sum of the $w_i$'s, i.e.\ $w = \sum_{i=1}^m (R_i^{\delta})^T w_i$.
\end{enumerate}
A variant of the classical AS preconditioner that outperforms it
in terms of both convergence rate and of computation and communication
time on parallel distributed-memory computers is the so-called \emph{Restricted AS
(RAS)} preconditioner~\cite{Cai_Sarkis,Efstathiou_Gander}. It
is obtained by zeroing the components of $w_i$ corresponding to the
overlapping vertices when applying the prolongation. Therefore,
RAS differs from classical AS by the prolongation operator $(R_i^{\delta})^T$,
which is substituted by $(\tilde{R}_i^0)^T \in \Re^{n \times n_i^\delta}$,
where $\tilde{R}_i^0$ is obtained by zeroing the rows of $R_i^\delta$
corresponding to the vertices in $W_i^\delta \backslash W_i^0$:
\[
M_{RAS}^{-1}= \sum_{i=1}^m (\tilde{R}_i^0)^T
(A_i^\delta)^{-1} R_i^{\delta}.
\]
Analogously, the AS variant called \emph{AS with Harmonic extension (ASH)}
is defined by
\[
M_{ASH}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T
(A_i^\delta)^{-1} \tilde{R}_i^0.
\]
We note that for $\delta=0$ the three variants of the AS preconditioner are
all equal to the block-Jacobi preconditioner.
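
To make the difference among the three variants concrete, consider the following
sketch; the example is an assumption made here for illustration only. Let $n=6$,
$m=2$, $\delta=1$, $W_1^0=\{1,2,3\}$ and $W_1^1=\{1,2,3,4\}$, as happens for a
tridiagonal matrix split into two blocks of three rows. Then
\[
R_1^1 = \left( \begin{array}{cccccc}
1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0
\end{array} \right), \quad
\tilde{R}_1^0 = \left( \begin{array}{cccccc}
1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&0&0&0
\end{array} \right),
\]
since vertex 4 is the only vertex in $W_1^1 \backslash W_1^0$. Given a local
solution $w_1=(\xi_1,\xi_2,\xi_3,\xi_4)^T$, the AS prolongation yields
$(R_1^1)^T w_1 = (\xi_1,\xi_2,\xi_3,\xi_4,0,0)^T$, which also contributes to the
overlap vertex 4, whereas the RAS prolongation yields
$(\tilde{R}_1^0)^T w_1 = (\xi_1,\xi_2,\xi_3,0,0,0)^T$, so that each component of
the final vector is taken from a single subset. In ASH the roles are exchanged:
$\tilde{R}_1^0$ is applied in the restriction step and $(R_1^1)^T$ in the
prolongation step.
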
As already observed, the convergence rate of the one-level Schwarz
preconditioned iterative solvers deteriorates as the number $m$ of partitions
of $W$ increases \cite{dd1_94,dd2_96}. To reduce the dependency
of the number of iterations on the degree of parallelism we may
introduce a global coupling among the overlapping partitions by defining
a coarse-space approximation $A_C$ of the matrix $A$.
In a pure algebraic setting, $A_C$ is usually built with
a Galerkin approach. Given a set $W_C$ of \emph{coarse vertices},
with size $n_C$, and a suitable restriction operator
$R_C \in \Re^{n_C \times n}$, $A_C$ is defined as
\[
A_C=R_C A R_C^T,
\]
and the coarse-level correction matrix to be combined with a generic
one-level AS preconditioner $M_{1L}$ is obtained as
\[
M_{C}^{-1}= R_C^T A_C^{-1} R_C,
\]
where $A_C$ is assumed to be nonsingular. The application of $M_{C}^{-1}$
to a vector $v$ corresponds to a restriction, a solution and
a prolongation step; the solution step, involving the matrix $A_C$,
may be carried out also approximately.
The combination of $M_{C}$ and $M_{1L}$ may be
performed in either an additive or a multiplicative framework.
In the former case, the \emph{two-level additive} Schwarz preconditioner
is obtained:
\[
M_{2LA}^{-1} = M_{C}^{-1} + M_{1L}^{-1}.
\]
Applying $M_{2LA}^{-1}$ to a vector $v$ within a Krylov solver
corresponds to applying $M_{C}^{-1}$
and $M_{1L}^{-1}$ to $v$ independently and then summing up
the results.
In the multiplicative case, the combination can be
performed by first applying the smoother $M_{1L}^{-1}$ and then
the coarse-level correction operator $M_{C}^{-1}$:
\[
\begin{array}{l}
w = M_{1L}^{-1} v, \\
z = w + M_{C}^{-1} (v-Aw);
\end{array}
\]
this corresponds to the following \emph{two-level hybrid pre-smoothed}
Schwarz preconditioner:
\[
M_{2LH-PRE}^{-1} = M_{C}^{-1} + \left( I - M_{C}^{-1}A \right) M_{1L}^{-1}.
\]
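
The expression of $M_{2LH-PRE}^{-1}$ follows by substituting $w$ into the update
of $z$; the short derivation is spelled out here as a check:
\[
\begin{array}{rcl}
z & = & M_{1L}^{-1} v + M_{C}^{-1} \left( v - A M_{1L}^{-1} v \right) \\
  & = & \left[ M_{C}^{-1} + \left( I - M_{C}^{-1}A \right) M_{1L}^{-1} \right] v .
\end{array}
\]
Equivalently,
\[
I - M_{2LH-PRE}^{-1}A = \left( I - M_{C}^{-1}A \right) \left( I - M_{1L}^{-1}A \right),
\]
which shows the multiplicative nature of the combination: the error propagation
matrix of the hybrid preconditioner is the product of the error propagation
matrices of the coarse-level correction and of the smoother.
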
On the other hand, by applying the smoother after the coarse-level correction,
i.e.\ by computing
\[
\begin{array}{l}
w = M_{C}^{-1} v , \\
z = w + M_{1L}^{-1} (v-Aw) ,
\end{array}
\]
the \emph{two-level hybrid post-smoothed}
Schwarz preconditioner is obtained:
\[
M_{2LH-POST}^{-1} = M_{1L}^{-1} + \left( I - M_{1L}^{-1}A \right) M_{C}^{-1}.
\]
One more variant of two-level hybrid preconditioner is obtained by applying
the smoother before and after the coarse-level correction. In this case, the
preconditioner is symmetric if $A$, $M_{1L}$ and $M_{C}$ are symmetric.
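
A sketch of why symmetry is preserved is given below, under the assumption that
the same smoother $M_{1L}^{-1}$ is applied before and after the coarse-level
correction; the symbol $M_{2LH-SYM}$ is introduced here only for this illustration.
The three steps
\[
\begin{array}{l}
w = M_{1L}^{-1} v, \\
y = w + M_{C}^{-1} (v-Aw), \\
z = y + M_{1L}^{-1} (v-Ay)
\end{array}
\]
define a preconditioner such that
\[
I - M_{2LH-SYM}^{-1}A = \left( I - M_{1L}^{-1}A \right)
\left( I - M_{C}^{-1}A \right) \left( I - M_{1L}^{-1}A \right).
\]
Multiplying on the right by $A^{-1}$ and using
$\left( I - M_{C}^{-1}A \right) \left( I - M_{1L}^{-1}A \right) A^{-1} =
\left( A^{-1} - M_{C}^{-1} \right) \left( I - A M_{1L}^{-1} \right)$ gives
\[
M_{2LH-SYM}^{-1} = A^{-1} - \left( I - M_{1L}^{-1}A \right)
\left( A^{-1} - M_{C}^{-1} \right) \left( I - A M_{1L}^{-1} \right).
\]
If $A$, $M_{1L}$ and $M_{C}$ are symmetric, then
$I - A M_{1L}^{-1} = \left( I - M_{1L}^{-1}A \right)^T$ and $A^{-1} - M_{C}^{-1}$
is symmetric, hence the right-hand side is symmetric.
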
As previously noted, on parallel computers the number of submatrices usually matches
the number of available processors. When the size of the system to be preconditioned
is very large, the use of many processors, i.e.\ of many small submatrices, often
leads to a large coarse-level system, whose solution may be computationally expensive.
On the other hand, the use of few processors often leads to local submatrices that
are too expensive to be processed on single processors, because of memory and/or
computing requirements. Therefore, it seems natural to use a recursive approach,
in which the coarse-level correction is re-applied starting from the current
coarse-level system. The corresponding preconditioners are called \emph{multi-level}.
One more reason for the multi-level approach is that it may significantly
reduce the computational cost of preconditioning with respect to the two-level case
(see \cite[Chapter 3]{dd2_96}). Additive and hybrid multilevel preconditioners
are obtained as direct extensions of the two-level counterparts. Other combinations
of the smoothers and coarse-level corrections are possible, leading to variants
of the previous algorithms. For a detailed description of them, the reader is
referred to \cite[Chapter 3]{dd2_96}.
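
A possible recursive formulation of the multi-level hybrid pre-smoothed
preconditioner is sketched below. The notation is introduced here only for
illustration and does not correspond to MLD2P4 identifiers: superscripts denote
levels, $nlev$ is the number of levels, level 1 is the finest one, $A^1=A$,
$A^{k+1}=R^k A^k (R^k)^T$ is the coarse matrix built from level $k$ through the
restriction operator $R^k$, and $M^k$ is the one-level preconditioner (smoother)
at level $k$. Given a vector $v^k$ at level $k$, the vector
$z^k = \mathrm{ML}(k,v^k)$ is computed as follows:
\begin{enumerate}
\item if $k = nlev$, solve (possibly approximately) $A^{k} z^{k} = v^{k}$ and return $z^k$;
\item otherwise, apply the smoother: $w^k = (M^k)^{-1} v^k$;
\item restrict the residual: $v^{k+1} = R^k \left( v^k - A^k w^k \right)$;
\item apply the procedure recursively at the next level: $z^{k+1} = \mathrm{ML}(k+1,v^{k+1})$;
\item prolong and correct: $z^k = w^k + (R^k)^T z^{k+1}$.
\end{enumerate}
The preconditioner is applied to a vector $v$ as $z = \mathrm{ML}(1,v)$. For $nlev=2$
and an exact solve at the coarsest level, this reduces to
$z = w + M_{C}^{-1}(v-Aw)$ with $w = M_{1L}^{-1}v$, i.e.\ to the two-level hybrid
pre-smoothed Schwarz preconditioner.
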
\textbf{\ldots\ of a multi-level preconditioner, e.g.\ the hybrid one with pre-smoothing, along the lines
of the description in Figure~1 of the Trilinos ML 4.0 guide. WHAT DO YOU THINK?}
\subsection{Smoothed Aggregation\label{sec:aggregation}}
To define the restriction operator $R_C$, which is used to compute
the coarse-level matrix $A_C$, MLD2P4 uses the \emph{smoothed aggregation}
algorithm described in \cite{Brezina_Vanek_,Vanek_Mandel_Brezina_}.
The basic idea of this algorithm is to build a coarse set of vertices
$W_C$ by suitably grouping the vertices of $W$ into disjoint subsets
(aggregates), and to define the coarse-to-fine space transfer operator $R_C^T$ by
applying a suitable smoother to a simple piecewise constant
prolongation operator, to improve the quality of the coarse-space correction.
Three main steps can be identified in the smoothed aggregation procedure:
\begin{enumerate}
\item coarsening of the vertex set $W$, to obtain $W_C$;
\item construction of the prolongator $R_C^T$;
\item application of $R_C$ and $R_C^T$ to build $A_C$.
\end{enumerate}
To perform the coarsening step, we have implemented the aggregation algorithm sketched
in \cite{apnum_07}. Following \cite{brezina_vanek}, a modification of this algorithm
has actually been considered,
in which each aggregate $N_r$ is made of vertices of $W$ that are \emph{strongly coupled}
to a certain root vertex $r \in W$, i.e.\
\[ N_r = \left\{s \in W: |a_{rs}| \geq \theta \sqrt{|a_{rr}a_{ss}|} \right\} \]
for a given $\theta \in [0,1]$.
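
For instance (an illustrative case, not taken from the MLD2P4 test set), if $A$ is
the tridiagonal matrix with $a_{ii}=2$ and $a_{i,i\pm 1}=-1$, then for two adjacent
vertices $r$ and $s$
\[
\frac{|a_{rs}|}{\sqrt{|a_{rr}a_{ss}|}} = \frac{1}{\sqrt{2 \cdot 2}} = \frac{1}{2} ,
\]
so that $N_r=\{r-1,r,r+1\} \cap W$ for $0 < \theta \leq 1/2$, while $N_r=\{r\}$
for $1/2 < \theta \leq 1$. The root $r$ always belongs to $N_r$, since
$|a_{rr}| \geq \theta |a_{rr}|$ for any $\theta \in [0,1]$. Larger values of
$\theta$ therefore produce smaller aggregates.
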
Since the previous algorithm has a sequential nature, a \emph{decoupled} version of
it has been chosen, where each processor $i$ independently applies the algorithm to
the set of vertices $W_i^0$ assigned to it in the initial data distribution. This
version is embarrassingly parallel, since it does not require any data communication.
On the other hand, it may produce non-uniform aggregates near boundary vertices,
i.e.\ near vertices adjacent to vertices in other processors, and is strongly
dependent on the number of processors and on the initial partitioning of the matrix $A$.
Nevertheless, this algorithm has been chosen for the implementation in MLD2P4,
since it has been shown to produce good results in practice \cite{Tuminaro_Tong_00}.
The prolongator $P_C=R_C^T$ is built starting from a \emph{tentative prolongator}
$P \in \Re^{n \times n_C}$, defined as
\[
P=(p_{ij}), \quad p_{ij}=
\left\{ \begin{array}{ll}
1 & \quad \mbox{if} \; i \in V^j_C \\
0 & \quad \mbox{otherwise}
\end{array} \right. ,
\]
where $V^j_C$ denotes the $j$-th aggregate.
$P_C$ is obtained by
applying to $P$ a smoother $S \in \Re^{n \times n}$:
\[
P_C = S P,
\]
in order to remove oscillatory components from the range of the prolongator
and hence to improve the convergence properties of the multi-level
Schwarz method \cite{Brezina_Vanek_,StubenGMD69_99}.
A simple choice for $S$ is the damped Jacobi smoother:
\[
S = I - \omega D^{-1} A ,
\]
where $D$ is the diagonal of $A$ and the value of $\omega$ can be chosen
using some estimate of the spectral radius of $D^{-1}A$ \cite{Brezina_Vanek}.
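
The construction can be followed on a small case, worked out here only as an
illustration: the matrix, the aggregates and the value of $\omega$ are assumptions
chosen to keep the arithmetic simple. Let $A$ be the $4 \times 4$ tridiagonal matrix
with $a_{ii}=2$ and $a_{i,i\pm 1}=-1$, let the aggregates be $\{1,2\}$ and $\{3,4\}$,
so that $n_C=2$, and let $\omega=2/3$. With $D$ the diagonal of $A$, i.e.\ $D=2I$,
\[
P = \left( \begin{array}{cc} 1&0\\ 1&0\\ 0&1\\ 0&1 \end{array} \right), \quad
S = I - \frac{2}{3} D^{-1}A = \frac{1}{3}
\left( \begin{array}{cccc} 1&1&0&0\\ 1&1&1&0\\ 0&1&1&1\\ 0&0&1&1 \end{array} \right), \quad
P_C = SP = \frac{1}{3}
\left( \begin{array}{cc} 2&0\\ 2&1\\ 1&2\\ 0&2 \end{array} \right).
\]
The smoothing spreads each piecewise constant column of $P$ over the boundary of
the neighbouring aggregate. Since $P_C=R_C^T$, the corresponding coarse matrix is
\[
A_C = R_C A R_C^T = P_C^T A P_C = \frac{1}{9}
\left( \begin{array}{rr} 6 & -1 \\ -1 & 6 \end{array} \right).
\]
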
Bella, G., Filippone, S., De Maio, A., Testa, M.:
A Simulation Model for Forest Fires.
In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.):
Proceedings of PARA~04 Workshop on State of the Art
in Scientific Computing. Lecture Notes in Computer Science, 3732. Berlin:
Springer, 2005
\bibitem{aaecc_07} A. Buttari, D. di Serafino, P. D'Ambra, S. Filippone,\newblock
2LEV-D2P4: a package of high-performance preconditioners,\newblock
Applicable Algebra in Engineering, Communications and Computing,
Volume 18, Number 3, May, 2007, pp. 223-239
%Published online: 13 February 2007, {\tt}
\bibitem{apnum_07} P. D'Ambra, S. Filippone, D. di Serafino,\newblock
On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners,\newblock
Applied Numerical Mathematics, Elsevier Science,
Volume 57, Issues 11-12, November-December 2007, Pages 1181-1196.
%published online 3 February 2007, {\tt
%% \bibitem{DOUGLAS}
%% R.E.~Bank and C.C.~Douglas,
%% {\em SMMP: Sparse Matrix Multiplication Package},
%% Advances in Computational Mathematics, 1993, 1, 127-137.
%% (See also {\tt})
A.~Buttari, P.~D'Ambra, D.~di Serafino and S.~Filippone,
{\em Extending PSBLAS to Build Parallel Schwarz Preconditioners},
in J.~Dongarra, K.~Madsen, J.~Wasniewski, editors,
Proceedings of PARA~04 Workshop on State of the Art
in Scientific Computing, pp.~593--602, Lecture Notes in Computer Science,
Springer, 2005.
%% \bibitem{CAI_SAAD}
%% X.~C.~Cai and Y.~Saad,
%% {\em Overlapping Domain Decomposition Algorithms for General Sparse Matrices},
%% Numerical Linear Algebra with Applications, 3(3), pp.~221--237, 1996.
%% %
%% \bibitem{CAI_SARKIS}
%% X.C.~Cai and M.~Sarkis,
%% {\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems},
%% SIAM Journal on Scientific Computing, 21(2), pp.~792--797, 1999.
X.C.~Cai and O.~B.~Widlund,
{\em Domain Decomposition Algorithms for Indefinite Elliptic Problems},
SIAM Journal on Scientific and Statistical Computing, 13(1), pp.~243--258, 1992.
T.~Chan and T.~Mathew,
{\em Domain Decomposition Algorithms},
in A.~Iserles, editor, Acta Numerica 1994, pp.~61--143, 1994.
Cambridge University Press.
%% %
%% \bibitem{UMFPACK}
%% T.A.~Davis,
%% {\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal
%% Method with a Column Pre-ordering Strategy},
%% ACM Transactions on Mathematical Software, 30, pp.~196--199, 2004.
%% (See also {\tt})
%% %
%% \bibitem{SUPERLU}
%% J.W.~Demmel, S.C.~Eisenstat, J.R.~Gilbert, X.S.~Li and J.W.H.~Liu,
%% A supernodal approach to sparse partial pivoting,
%% SIAM Journal on Matrix Analysis and Applications, 20(3), pp.~720--755, 1999.
J.~J.~Dongarra and R.~C.~Whaley,
{\em A User's Guide to the BLACS v.~1.1},
Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
Tennessee, March 1995 (updated May 1997).
I.~Duff, M.~Marrone, G.~Radicati and C.~Vittoli,
{\em Level 3 Basic Linear Algebra Subprograms for Sparse Matrices:
a User Level Interface},
ACM Transactions on Mathematical Software, 23(3), pp.~379--401, 1997.
I.~Duff, M.~Heroux and R.~Pozo,
{\em An Overview of the Sparse Basic Linear
Algebra Subprograms: the New Standard from the BLAS Technical Forum},
ACM Transactions on Mathematical Software, 28(2), pp.~239--267, 2002.
S.~Filippone and M.~Colajanni,
{\em PSBLAS: A Library for Parallel Linear Algebra
Computation on Sparse Matrices},
ACM Transactions on Mathematical Software, 26(4), pp.~527--550, 2000.
S.~Filippone, P.~D'Ambra, M.~Colajanni,
{\em Using a Parallel Library of Sparse Linear Algebra in a Fluid Dynamics
Applications Code on Linux Clusters},
in G.~Joubert, A.~Murli, F.~Peters, M.~Vanneschi, editors,
Parallel Computing - Advances \& Current Issues,
pp.~441--448, Imperial College Press, 2002.
Karypis, G. and Kumar, V.,
{\em {METIS}: Unstructured Graph Partitioning and Sparse Matrix
Ordering System}.
Minneapolis, MN 55455: University of Minnesota, Department of
Computer Science, 1995.
Internet Address: {\verb||}.
Lawson, C., Hanson, R., Kincaid, D. and Krogh, F.,
Basic {L}inear {A}lgebra {S}ubprograms for {F}ortran usage,
{ACM Trans. Math. Softw.} vol.~{5}, 308--323, 1979.
{Machiels, L. and Deville, M.}
{\em Fortran 90: An entry to object-oriented programming for the solution
of partial differential equations.}
{ACM Trans. Math. Softw.} vol.~{23}, 32--49.
{Metcalf, M., Reid, J. and Cohen, M.}
{\em Fortran 95/2003 explained.}
{Oxford University Press}, 2004.
B.~Smith, P.~Bjorstad and W.~Gropp,
{\em Domain Decomposition: Parallel Multilevel Methods for Elliptic
Partial Differential Equations},
Cambridge University Press, 1996.
M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker and J.~Dongarra,
{\em MPI: The Complete Reference. Volume 1 - The MPI Core}, second edition,
MIT Press, 1998.
M.~Brezina and P.~Van{\v e}k,
{\em A Black-Box Iterative Solver Based on a Two-Level Schwarz Method},
Computing, 1999, 63, 233-263.
P.~Van{\v e}k, J.~Mandel and M.~Brezina,
{\em Algebraic Multigrid by Smoothed Aggregation for Second and Fourth Order Elliptic Problems},
Computing, 1996, 56, 179-196.

\section{Configuring and Building MLD2P4\label{sec:configuring}}
- use of GNU autoconf and automake \\
- required base software (MPI, BLACS, BLAS, PSBLAS; specify versions) \\
- optional software (UMFPACK, SuperLU, SuperLU\_Dist; specify versions and configure options) \\
- operating systems and compilers on which MLD2P4 has been successfully built \\
- are configuration options for debugging or profiling provided? \\
- directory tree \\

\section{Notational Conventions\label{sec:conventions}}
- typefaces used in the guide (see the recent ML guide and the Aztec guide) \\
- naming conventions for routines (difference between high-level and medium-level),
data structures,\\
modules, constants, etc. (see the PSBLAS guide) \\
- real and complex versions\\

\section{Code Distribution\label{sec:distribution}}
MLD2P4 is freely distributable under the following copyright terms:
\begin{verbatim}
                       MLD2P4  version 1.0
  MultiLevel Domain Decomposition Parallel Preconditioners Package
          based on PSBLAS (Parallel Sparse BLAS version 2.3)

  (C) Copyright 2008

      Salvatore Filippone    University of Rome Tor Vergata
      Alfredo Buttari        University of Rome Tor Vergata
      Pasqua D'Ambra         ICAR-CNR, Naples
      Daniela di Serafino    Second University of Naples

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions
  are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions, and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. The name of the MLD2P4 group or the names of its contributors may
       not be used to endorse or promote products derived from this
       software without specific written permission.
\end{verbatim}

\section{Error Handling}\label{sec:errors}
Error handling
- Brief description, with a pointer to the PSBLAS guide
\section{Getting Started\label{sec:started}}
We describe the basics for building and applying MLD2P4 one-level and multi-level
Schwarz preconditioners with the Krylov solvers included in PSBLAS \cite{}.
The following five steps are required:
\begin{enumerate}
\item \emph{Allocate and initialize the preconditioner data structure, according to
a preconditioner type chosen by the user}. This is performed by the routine
\verb|mld_precinit|, which also sets a default preconditioner for each preconditioner
type selected by the user. The default preconditioner associated with each preconditioner
type is listed in Table~\ref{tab:precinit}; the string used by \verb|mld_precinit|
to identify each preconditioner type is also given. The preconditioner data structure is
the derived data type \verb|mld_prec_type|, which is accessible to the user only
through the MLD2P4 routines.
\item \emph{Choose a specific variant of the selected preconditioner type, by setting
the preconditioner parameters.} This is performed by the routine \verb|mld_precset|.
A few examples concerning the use of \verb|mld_precset| are given in
Sections~\ref{sec:example1} and \ref{sec:example1}; a complete list of all the
preconditioner parameters and their allowed values is provided in
Section~\ref{sec:highlevel}.
\item \emph{Build the preconditioner for a given matrix.} This is performed by
the routine \verb|mld_precbld|.
\item \emph{Apply the preconditioner at each iteration of a Krylov solver.}
This is performed by the routine \verb|mld_precaply|. When using the PSBLAS Krylov solvers,
this step is completely transparent to the user, since \verb|mld_precaply| is called
by the PSBLAS routine implementing the Krylov solver (\verb|psb_krylov|).
\item \emph{Deallocate the preconditioner data structure}. This is performed by
the routine \verb|mld_precfree|. This step is complementary to step 1 and should
be performed when the preconditioner is no longer used.
\end{enumerate}
A detailed description of the above routines is given in Section~\ref{sec:highlevel}.
Note that the Fortran 95 module \verb|mld_prec_mod| must be used in the program
calling the MLD2P4 routines. Furthermore, to apply MLD2P4 with the Krylov solvers
from PSBLAS, the module \verb|psb_krylov_mod| must be used too.

Two simple example programs showing the (basic) use of MLD2P4 are reported in
the following.
\begin{table}[th]
\begin{center}
\begin{tabular}{|l|l|p{6.4cm}|}
\hline
Type & String & Default preconditioner \\ \hline
No preconditioner & 'NOPREC' & (Considered only to use the PSBLAS
                    Krylov solvers with no preconditioner.) \\ \hline
Diagonal & 'DIAG' & --- \\ \hline
Block Jacobi & 'BJAC' & ILU(0) on the local blocks. \\ \hline
Additive Schwarz & 'AS' & Restricted Additive Schwarz (RAS),
                    with overlap 1 and ILU(0) on the local blocks. \\ \hline
Multilevel & 'ML' & Multi-level hybrid preconditioner (additive on the
                    same level and multiplicative through the levels),
                    with post-smoothing only. Number of levels: 2;
                    post-smoother: block-Jacobi preconditioner, with ILU(0)
                    on the local blocks; coarsest matrix: distributed among the
                    processors; coarse-level solver: 4 sweeps of the
                    block-Jacobi solver, with ILU(0) on the blocks. \\ \hline
\end{tabular}
\end{center}
\caption{Preconditioner types and default choices.\label{tab:precinit}}
\end{table}
The simple code reported below shows how to set and apply the MLD2P4 default multi-level
preconditioner, i.e.\ the two-level hybrid post-smoothed Schwarz preconditioner, using
block-Jacobi with ILU(0) on the blocks as basic preconditioner,
a coarse matrix distributed among the processors, and four block-Jacobi sweeps with ILU(0)
on the blocks as approximate coarse-level solver. This preconditioner is selected
by simply specifying \verb|'ML'| as the second argument of \verb|mld_precinit|
(a call to \verb|mld_precset| is not needed).
The preconditioner is applied within the BiCGSTAB solver provided by PSBLAS.
The part of the code concerning the
reading and assembling of the sparse matrix and the right-hand side vector, performed
through the PSBLAS routines for sparse matrix and vector management, is not reported
here for brevity. Other statements concerning the use of PSBLAS are omitted too.
The complete code can be found in the example program file \verb|example_2lev_default.f90|
in the directory \textbf{XXXXXX (TO BE SPECIFIED).} Note that the modules \verb|psb_base_mod|
and \verb|psb_util_mod| at the beginning of the code are required by PSBLAS.
For details on the use of the PSBLAS routines, see the PSBLAS User's Guide \cite{}.
\begin{verbatim}
  use psb_base_mod
  use psb_util_mod
  use mld_prec_mod
  use psb_krylov_mod
... ...
! sparse matrix
  type(psb_dspmat_type) :: A
! sparse matrix descriptor
  type(psb_desc_type)   :: DESC_A
! preconditioner
  type(mld_prec_type)   :: PRE
... ...
! initialize the parallel environment
  call psb_init(ictxt)
  call psb_info(ictxt,iam,np)
... ...
! read and assemble the matrix A and the right-hand
! side b using PSBLAS routines for sparse matrix /
! vector management
... ...
! initialize the default multi-level preconditioner
! (two-level hybrid post-smoothed Schwarz)
  call mld_precinit(PRE,'ML',info)
! build the preconditioner
  call psb_precbld(A,PRE,DESC_A,info)
! set the solver parameters and the initial guess
... ...
! solve Ax=b with preconditioned BiCGSTAB
  call psb_krylov('BICGSTAB',A,PRE,b,x,tol,DESC_A,info)
... ...
! cleanup the preconditioner
  call mld_precfree(PRE,info)
! cleanup other data structures
... ...
! exit the parallel environment
  call psb_exit(ictxt)
\end{verbatim}
- only the statements that differ from the previous example (essentially the setting of the preconditioner, possibly with several calls to precset);\\
- keep the remark on the explicit specification of the number of levels;\\
- refer to the next section for an accurate description of all the parameters;\\
- keep the remark for legacy PSBLAS users.\\
In the following we describe the general procedure for setting and building one of the MLD2P4 preconditioners.
The user must first prepare the preconditioner data structure by using the routine \verb|mld_precinit|. The input parameters
of this routine include a string, which defines the preconditioner type, and an optional integer
specifying the number of levels in the case of a multi-level preconditioner.
If the optional parameter is not present and a multi-level preconditioner has been chosen,
a two-level preconditioner is set. The integer parameter is ignored if the preconditioner type is not multi-level.
Table~\ref{tab:precinit} reports the possible choices for the preconditioner type
and the related default preconditioners.

The user of MLD2P4 may set many parameters of the one-level and multi-level Schwarz preconditioners, in order
to define a preconditioner different from the default ones. The parameters
can be set through the routine \verb|mld_precset|. The APIs of \verb|mld_precinit| and \verb|mld_precset|, as well as the complete
list of the parameters that can be set, with the corresponding allowed values, are reported in Section~\ref{sec:highlevel}. In the following, a simple code
for a three-level hybrid post-smoothed Schwarz preconditioner is reported; it uses RAS with overlap 1 as local preconditioner,
with ILU(0) on the local blocks, a distributed coarse matrix, and four block-Jacobi sweeps with the UMFPACK LU
factorization on the blocks as coarse-matrix solver. Note that for the multi-level preconditioners the levels are numbered in increasing
order starting from the finest one, i.e.\ level 1 is the finest level.
For more details, see the test program \verb|example2.f90| in xxxx (test directory).
\begin{verbatim}
  use psb_base_mod
  use psb_util_mod
  use mld_prec_mod
  use psb_krylov_mod
... ...
! sparse matrix
  type(psb_dspmat_type) :: A
! sparse matrix descriptor
  type(psb_desc_type)   :: DESC_A
! preconditioner data
  type(mld_dprec_type)  :: PRE
... ...
! initialization of the parallel environment
  call psb_init(ictxt)
  call psb_info(ictxt,iam,np)
... ...
! read and assemble the matrix A and the right-hand
! side vector b using PSBLAS routines for sparse
! matrix/vector management
... ...
! prepare the three-level hybrid post-smoothed Schwarz
! using RAS with overlap 1 as local preconditioner
! (the overlap value is passed positionally: a positional
! argument cannot follow a keyword argument in Fortran)
  call mld_precinit(PRE,'ML',info,nlev=3)
  call mld_precset(PRE,mld_n_ovr_,1,info,ilev=1)
  call mld_precset(PRE,mld_sub_restr_,psb_halo_,info,ilev=1)
! build preconditioner
  call psb_precbld(A,PRE,DESC_A,info)
! set solver parameters and initial guess
... ...
! solve Ax=b with preconditioned BiCGSTAB
  call psb_krylov('BICGSTAB',A,PRE,b,x,tol,DESC_A,info)
... ...
! cleanup storage and exit
  call mld_precfree(PRE,info)
  call psb_gefree(b,DESC_A,info)
  call psb_gefree(x,DESC_A,info)
  call psb_spfree(A,DESC_A,info)
  call psb_cdfree(DESC_A,info)
  call psb_exit(ictxt)
\end{verbatim}
{\bf Remark for users with PSBLAS-based legacy codes:} when MLD2P4 is installed, a PSBLAS-based legacy code
calling the base preconditioners included in PSBLAS (NOPREC, DIAG and BJAC) can use the same preconditioners without any change to the code, provided that
the program uses the module \verb|psb_prec_mod|.
\section{High-Level User Interface\label{sec:highlevel}}
At the upper layer of MLD2P4, five black-box routines encapsulate all the functionalities for the construction
and the application of any of the multi-level preconditioners.
In the following we give the details of these routines. Note that four versions of each routine are available,
depending on the data type involved: real single/double precision and complex single/double precision.
\subsection{Preconditioner Setup and Building}\label{sec:setup}
An MLD2P4 preconditioner is set up by calling the \verb|mld_precinit| routine, which
allocates and initializes the preconditioner data structure.
The API of this routine and the description of its arguments are reported in Fig.~\ref{fig:prcinit}.
The allowed values for the \verb|ptype| argument are reported in Table~\ref{tab:precinit} (Section~\ref{sec:started}).
\begin{figure}[h]
\begin{verbatim}
p      - type(mld_dprec_type), input/output.
         The preconditioner data structure.
ptype  - character, input.
         The type of preconditioner.
info   - integer, output.
         Error code.
nlev   - integer, optional, input.
         The number of levels of the multilevel preconditioner.
         If nlev is not present and ptype='ML'/'ml',
         then nlev=2 is assumed.
         Otherwise, nlev is ignored.
\end{verbatim}
\caption{API of the routine for preconditioner allocation and initialization.\label{fig:prcinit}}
\end{figure}
\begin{figure}[h]
\begin{verbatim}
p      - type(mld_dprec_type), input/output.
         The preconditioner data structure to be deallocated.
info   - integer, output.
         Error code.
\end{verbatim}
\caption{API of the routine for preconditioner deallocation.\label{fig:prcfree}}
\end{figure}
A twin routine, \verb|mld_precfree|, deallocates the preconditioner data structure; its API is reported in
Fig.~\ref{fig:prcfree}.

As mentioned in Section~\ref{sec:multilevel}, a multi-level preconditioner is a combination
of coarse-level corrections and one-level preconditioners (or smoothers).
Different combinations of these components, together with different types of one-level preconditioners
and different algorithms to build and apply the coarse-level corrections, allow the user to define different multi-level
preconditioners.
The user of MLD2P4 may specify the type of multi-level framework (additive or multiplicative), details on the
aggregation algorithm, the type of one-level preconditioner and the way it is applied
(as pre-smoother, post-smoother or both), the storage of the coarsest matrix
(distributed or replicated), and the type of solver to be employed at the coarsest level
with the related details, by setting some parameters through the routine \verb|mld_precset| (see Section~\ref{sec:list}).
The API of this routine is reported in Fig.~\ref{fig:prcset}.
\begin{figure}[h]
\begin{verbatim}
p      - type(mld_dprec_type), input/output.
         The preconditioner data structure.
what   - integer, input.
         The number identifying the parameter to be set.
         A mnemonic constant has been associated to each of these
         numbers.
val    - integer/character, input.
         The value of the parameter to be set.
info   - integer, output.
         Error code.
ilev   - integer, optional, input.
         For the multilevel preconditioner, the level at which the
         preconditioner parameter has to be set.
         If ilev is not present, the parameter identified by 'what'
         is set at all the appropriate levels.
\end{verbatim}
\caption{API of the routine for preconditioner setup.\label{fig:prcset}}
\end{figure}
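
The effect of the optional argument \verb|ilev| can be illustrated by the following fragment. It is only a sketch, under the assumptions that \verb|PRE| is declared as \verb|type(mld_dprec_type)|, that the character values listed in Section~\ref{sec:list} are accepted as \verb|val|, and that a nonzero \verb|info| signals an error (the API only states that \verb|info| is an error code).
\begin{verbatim}
  ! allocate a three-level preconditioner
  call mld_precinit(PRE,'ML',info,nlev=3)
  ! ilev absent: the parameter is set at all the appropriate levels
  call mld_precset(PRE,mld_smooth_pos_,'POST',info)
  ! ilev present: the parameter is set at level 1 (the finest) only
  call mld_precset(PRE,mld_n_ovr_,1,info,ilev=1)
  ! assumption: info /= 0 denotes an error
  if (info /= 0) write(*,*) 'mld_precset returned info = ',info
\end{verbatim}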
Finally, to build a preconditioner according to the choices made through the routines \verb|mld_precinit| and \verb|mld_precset|,
the user of MLD2P4 has to call the \verb|mld_precbld| routine, whose API is reported in Figure~\ref{fig:prcbld}.
\begin{figure}[h]
\begin{verbatim}
a      - type(psb_dspmat_type), input.
         The sparse matrix structure containing the local part of the
         matrix to be preconditioned.
desc_a - type(psb_desc_type), input.
         The communication descriptor of a.
p      - type(mld_dprec_type), input/output.
         The preconditioner data structure containing the local part
         of the preconditioner to be built.
info   - integer, output.
         Error code.
\end{verbatim}
\caption{API of the routine for preconditioner building.\label{fig:prcbld}}
\end{figure}
\subsubsection{List of the preconditioner parameters\label{sec:list}}
In the following we report the list of possible parameters to be set through the \verb|mld_precset| routine,
in order to choose the type of multi-level preconditioner. The parameters are classified depending on their scope.
Note that for character data both uppercase and lowercase strings are allowed.
\begin{table}[h]
{\small
\begin{center}
\begin{tabular}{|l|p{9cm}|}
\hline
Parameter (\verb|what|) & Allowed values (\verb|val|)\\
\hline
\verb|mld_ml_type_| & 'ADD', 'MULT'\\
 & Define the type of multi-level preconditioner.\\
\hline
\verb|mld_prec_type_| & 'DIAG', 'BJAC', 'AS' \\
 & Define the smoother at a certain level.\\
\hline
\verb|mld_smooth_pos_| & 'PRE', 'POST', 'BOTH'\\
 & Define the way to apply the smoother.\\
\hline
\end{tabular}
\end{center}
}
\caption{Parameters for preconditioner type.\label{tab:prec_type}}
\end{table}
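
As a hedged illustration of the choices 'ADD'/'MULT' and 'PRE'/'POST'/'BOTH', consider the case of two levels only. The notation used here is an assumption and is not taken from this section: $M_1$ denotes the smoother at the fine level, $R_C$ the restriction to the coarse space and $A_C = R_C A R_C^T$ the coarse matrix. With this notation, the additive and the hybrid (multiplicative) post-smoothed two-level preconditioners can be written as
\[
M_{\rm add}^{-1} = M_1^{-1} + R_C^T A_C^{-1} R_C, \qquad
M_{\rm post}^{-1} = M_1^{-1} + \left( I - M_1^{-1} A \right) R_C^T A_C^{-1} R_C .
\]
The second formula corresponds to applying the coarse-level correction first, $w = R_C^T A_C^{-1} R_C\, v$, and then the smoother to the resulting residual, $z = w + M_1^{-1} (v - A w)$.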
In order to build a coarse matrix from a fine one, this version of MLD2P4 implements the
smoothed aggregation algorithm described in Section~\ref{sec:aggregation}. However, since for nonsymmetric problems the
application of a correct smoothing procedure is still an open problem~\cite{lin}, the user
may also choose a nonsmoothed aggregation technique, where the prolongator from
the coarse-space to the fine-space vertices is the simple piecewise constant interpolation
operator (the tentative prolongator) defined in Section~\ref{sec:aggregation}.
The coarsening scheme takes into account possible anisotropic features of the problem, by using
a threshold for dropping matrix coefficients during the aggregation process.
The parallel implementation of the coarsening algorithm is based on a decoupled approach, where each process applies the coarsening scheme
to its own local data. In the case of matrices with a nonsymmetric sparsity pattern, the decoupled scheme can be applied to the matrix $A+A^T$.
Table~\ref{tab:aggr_type} lists the parameters that the user can specify for the aggregation algorithm.
\begin{table}[h]
{\small
\begin{center}
\begin{tabular}{|l|p{9cm}|}
\hline
Parameter (\verb|what|) & Allowed values (\verb|val|)\\
\hline
\verb|mld_aggr_alg_| & 'DEC', 'SYMDEC'\\
 & Define the aggregation scheme.\\
 & Currently, only decoupled aggregation is available\\
 & (if 'SYMDEC' is set, the symmetric part of the matrix is considered).\\
\hline
\verb|mld_aggr_kind_| & 'SMOOTH', 'RAW'\\
 & Define the type of aggregation technique (smoothed or nonsmoothed).\\
\hline
\verb|mld_aggr_thresh_| & Dropping threshold in aggregation.\\
 & Default 0.0\\
\hline
\verb|mld_aggr_eig_| & (the string corresponding to mldmaxnorm is not defined yet)\\
 & Define the algorithm to evaluate the maximum eigenvalue\\
 & of $D^{-1}A$ for smoothed aggregation. Currently, only the A-norm of the\\
 & matrix is available.\\
\hline
\end{tabular}
\end{center}
}
\caption{Parameters for aggregation type.\label{tab:aggr_type}}
\end{table}
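
For reference, the role of the maximum eigenvalue of $D^{-1}A$ can be sketched as follows. The notation is an assumption: $\tilde{P}$ denotes the tentative prolongator, $D$ the diagonal of $A$ and $\omega$ a damping parameter. In the smoothed aggregation technique the prolongator is usually defined as
\[
P = \left( I - \omega D^{-1} A \right) \tilde{P}, \qquad \omega = \frac{4}{3\rho},
\]
where $\rho$ is an estimate of the spectral radius of $D^{-1}A$, and $4/(3\rho)$ is a common choice for the damping parameter rather than necessarily the one made in the package. Since $\rho(D^{-1}A) \le \| D^{-1}A \|$ for any induced matrix norm, a norm of $D^{-1}A$ provides an inexpensive upper bound. In the nonsmoothed case, $P = \tilde{P}$.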
Some options are available for the solution of the system involving the coarsest matrix.
This matrix can be either replicated or distributed among the processors.
In the former case, various versions of incomplete LU (ILU) factorizations of the
coarsest matrix are available in order to solve the coarsest system.
In the current version of MLD2P4, the following factorizations are available~\cite{saad}:
\begin{description}
\item[ILU(k):] ILU factorization with fill-in level $k$;
\item[MILU(k):] modified ILU factorization with fill-in level $k$;
\item[ILU(k,t):] ILU with threshold $t$ and $k$ additional entries in each row of the L and U factors with respect to the initial sparsity pattern.
\end{description}
Furthermore, interfaces to UMFPACK~\cite{UMFPACK}, version 4.4, and to SuperLU~\cite{SUPERLU}, version 3.0, are also available to deal
with the coarsest system, when the coarsest matrix is replicated among the processors.
On the other hand, to solve the coarsest-level system when the coarsest matrix is distributed,
a block-Jacobi routine has been developed. It uses the different versions of ILU or the LU
factorization on the diagonal blocks of the coarse matrix held by the processors. For a
distributed coarsest matrix, an interface to SuperLU\_Dist~\cite{SUPERLUDIST}, version 2.0, is also available for distributed
sparse factorization and solve.
See Table~\ref{tab:coarse_mat} for details.
\begin{table}[h]
{\small
\begin{center}
\begin{tabular}{|l|p{9cm}|}
\hline
Parameter (\verb|what|) & Allowed values (\verb|val|)\\
\hline
\verb|mld_coarse_mat_| & 'DISTR', 'REPL' \\
 & Coarse matrix: distributed or replicated.\\
\hline
\verb|mld_coarse_solve_| & 'ILU', 'MILU', 'ILUT', 'SLU', 'UMF', 'SLUDIST', 'BJAC' (????)\\
 & Available coarse solvers.\\
 & Only 'SLUDIST' and 'BJAC' can be used when the coarse matrix is distributed.\\
\hline
\verb|mld_coarse_BJAC_sweeps_| & (the name mldcoarsesweeps is not appropriate) Number of block-Jacobi sweeps when BJAC is used as coarsest solver.\\
\hline
\verb|mld_coarse_fill_in_| & Level of fill-in in the MILU and ILU factorizations.\\
\hline
\end{tabular}
\end{center}
}
\caption{Parameters for coarsest matrix solver.\label{tab:coarse_mat}}
\end{table}
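
The meaning of the number of block-Jacobi sweeps can be sketched as follows, under assumptions on the notation: $A_C$ denotes the distributed coarsest matrix, $M_{BJ}$ the block-diagonal matrix formed by the (possibly approximate) factorizations of the diagonal blocks of $A_C$, one block per process, and $b$ the coarsest-level right-hand side. Assuming a zero initial guess, $s$ sweeps compute
\[
x^{(0)} = 0, \qquad
x^{(k+1)} = x^{(k)} + M_{BJ}^{-1} \left( b - A_C\, x^{(k)} \right), \quad k = 0, \ldots, s-1 ,
\]
so that $s=1$ reduces to the single block-Jacobi application $x^{(1)} = M_{BJ}^{-1} b$.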
When a Schwarz algorithm is used as smoother at a certain level or as one-level preconditioner, the user may set several parameters
in order to choose the additive Schwarz variant (AS, RAS, ASH), the number of overlaps and the local solver.
All the parameters are reported in Table~\ref{tab:schwarz_type}.
\begin{table}[h]
{\small
\begin{center}
\begin{tabular}{|l|p{9cm}|}
\hline
Parameter (\verb|what|) & Allowed values (\verb|val|)\\
\hline
\verb|mld_n_ovr_| & Number of overlaps \\
\hline
\verb|mld_sub_restr_| & 'HALO', 'NONE'\\
\hline
\verb|mld_sub_prol_| & 'SUM', 'NONE'\\
\hline
\verb|mld_sub_solve_| & 'ILU', 'MILU', 'ILUT', 'SLU', 'UMF'\\
\hline
\verb|mld_sub_ren_| & (the strings are missing)\\
\hline
\verb|mld_sub_fill_in_| & Level of fill-in in the local diagonal blocks, when ILU-type factorizations are used.\\
\hline
\end{tabular}
\end{center}
}
\caption{Parameters for Schwarz smoother/preconditioner type.\label{tab:schwarz_type}}
\end{table}
It is worth noting that the classical AS method corresponds to the values 'HALO' and 'SUM' of the argument \verb|val|
for the values \verb|mld_sub_restr_| and \verb|mld_sub_prol_| of the argument \verb|what|, respectively, the RAS method corresponds to
the values 'HALO' and 'NONE', and the ASH method corresponds to the values 'NONE' and 'SUM'.
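
The three variants can be compared through their usual algebraic definitions. The notation is an assumption, introduced only for this sketch: $R_i$ is the restriction to the $i$-th overlapping subdomain, $\tilde{R}_i$ the restriction to the corresponding nonoverlapping subdomain (i.e.\ $R_i$ with the rows of the overlap vertices set to zero), $A_i = R_i A R_i^T$ the local matrix and $m$ the number of subdomains:
\[
M_{AS}^{-1} = \sum_{i=1}^{m} R_i^T A_i^{-1} R_i, \qquad
M_{RAS}^{-1} = \sum_{i=1}^{m} \tilde{R}_i^T A_i^{-1} R_i, \qquad
M_{ASH}^{-1} = \sum_{i=1}^{m} R_i^T A_i^{-1} \tilde{R}_i .
\]
The restriction and prolongation parameters select whether the overlap is taken into account on the input side (the right factor) and on the output side (the left factor), respectively.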
\subsection{Preconditioner Application} \label{sec:application}
Once the preconditioner has been built, it may be applied at each iteration
of a Krylov solver by calling the routine \verb|mld_precaply| (NOTE: the routine name should be changed in the software, avoiding the underscore),
whose API is shown in Figure~\ref{fig:prcaply}.
This routine computes $y = op(M^{-1})\, x$, where $M$ is the previously built
preconditioner, stored in the \verb|prec| data structure, and $op$
denotes the matrix itself or its transpose, according to the value of \verb|trans|.
Note that this routine is called within the Krylov solvers available in the PSBLAS library (see the PSBLAS User's Guide for details);
therefore, its use is generally transparent to the MLD2P4 user.
\begin{figure}[h]
\begin{verbatim}
prec      - type(mld_dprec_type), input.
            The preconditioner data structure containing the local part
            of the preconditioner to be applied.
x         - real(psb_dpk_), dimension(:), input.
            The local part of the vector X in Y := op(M^(-1)) * X.
y         - real(psb_dpk_), dimension(:), output.
            The local part of the vector Y in Y := op(M^(-1)) * X.
desc_data - type(psb_desc_type), input.
            The communication descriptor associated to the matrix to be
            preconditioned.
info      - integer, output.
            Error code.
trans     - character(len=1), optional.
            If trans='N','n' then op(M^(-1)) = M^(-1);
            if trans='T','t' then op(M^(-1)) = M^(-T) (transpose of M^(-1)).
work      - real(psb_dpk_), dimension (:), optional, target.
            Workspace. Its size must be at
            least 4*psb_cd_get_local_cols(desc_data).
\end{verbatim}
\caption{API of the routine for preconditioner application.\label{fig:prcaply}}
\end{figure}
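
Although the routine is normally called by the Krylov solvers, a direct call can be sketched as follows. The fragment assumes that the arguments are passed in the order in which they are listed in the API description, that \verb|PRE| has already been built, and that \verb|x| and \verb|y| are the local parts of the vectors, declared as \verb|real(psb_dpk_), dimension(:)|.
\begin{verbatim}
  ! y := M^(-1) * x   (trans absent)
  call mld_precaply(PRE,x,y,DESC_A,info)
  ! y := M^(-T) * x   (transpose of the preconditioner)
  call mld_precaply(PRE,x,y,DESC_A,info,trans='T')
\end{verbatim}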
\section{List of Routines}\label{sec:routines}
Alphabetical list of all the routines, with a reference (hyperlink and page number) to the description
of each routine in one of the previous sections
(a sort of index pointing to the routines described in the respective sections).
\section{General Overview\label{sec:overview}}
The \emph{Multi-Level Domain Decomposition Parallel Preconditioners Package based on
PSBLAS (MLD2P4)} provides various versions of multi-level Schwarz preconditioners~\cite{DD2},
to be used in the iterative solution of sparse linear systems $Ax=b$, where
$A$ is a square, real or complex, sparse matrix with a symmetric sparsity pattern.
\textbf{Do we work on $(A+A^T)/2$? Does this hold only for the aggregation? We should do
something consistent for one-level Schwarz as well.}
Both additive and hybrid preconditioners, i.e.\ multiplicative among the levels
and additive inside a level, are implemented; the basic additive Schwarz preconditioners
are obtained by considering only one level. A purely algebraic approach is used to
generate a sequence of coarse-level corrections to a basic preconditioner, without
explicitly using any information on the geometry of the original problem (e.g.\ the
discretization of a PDE). The smoothed aggregation technique is applied
as algebraic coarsening strategy~\cite{}.
MLD2P4 is written in Fortran~95, using object-oriented techniques,
and is based on a distributed-memory parallel programming paradigm. \textbf{SALVATORE,
could you add a couple of lines on the choice of Fortran~95 and on the easy interfacing
with legacy codes, without repeating what is said below about the choice of PSBLAS?}
Single and double precision implementations of MLD2P4 are available for both the
real and the complex case, and can be used through a single interface.
\textbf{SALVATORE, does everything work?}
MLD2P4 has been designed to implement scalable and easy-to-use multilevel preconditioners
in the context of the PSBLAS (Parallel Sparse BLAS) computational framework~\cite{}.
PSBLAS is a library originally developed to address the parallel implementation of
iterative solvers for sparse linear systems, by providing basic linear algebra
operators and data management facilities for distributed sparse matrices; it
also includes parallel Krylov solvers, built on top of the basic PSBLAS kernels.
The preconditioners available in MLD2P4 can be used with these Krylov solvers.
The choice of PSBLAS has been mainly motivated by the need for
a portable and efficient software infrastructure implementing ``de facto'' standard
parallel sparse linear algebra kernels, to pursue goals such as performance,
portability, modularity and extensibility in the development of the preconditioner
package. On the other hand, the implementation of MLD2P4 has led to some
revisions and extensions of the PSBLAS kernels, leading to the
recent PSBLAS 2.0 version~\cite{}. The inter-process communication required
by MLD2P4 is encapsulated into the PSBLAS routines, except in a few cases where
MPI~\cite{} is explicitly called. Therefore, MLD2P4 can be run on any parallel
machine where PSBLAS and MPI implementations are available.
MLD2P4 has a layered and modular software architecture where three main layers can be identified. The lower layer consists of the PSBLAS kernels, the middle one implements
the construction and application phases of the preconditioners, and the upper one
provides a uniform and easy-to-use interface to all the preconditioners.
This architecture allows for different levels of use of the package:
a few black-box routines at the upper layer allow non-expert users to easily
build any preconditioner available in MLD2P4 and to apply it within a PSBLAS Krylov solver.
On the other hand, the routines of the middle and lower layer can be used and extended
by expert users to build new versions of multi-level Schwarz preconditioners.\\
\textbf{Organization of the guide:\\
state that, for the time being, we do not
provide the documentation of the middle layer, but we will do so later.\\}
\textbf{Highlight the keywords that characterize our package.}
\pdfcompresslevel=0 %-- 0 = none, 9 = best
\pdfinfo{ %-- Info dictionary of PDF output
  /Author (PD, DdS, SF)
  /Title (MultiLevel Domain Decomposition Parallel Preconditioners Package
          based on PSBLAS, V. 1.0)
  /Subject (MultiLevel Domain Decomposition Parallel Preconditioners Package)
  /Keywords (Parallel Numerical Software, Algebraic Multilevel Preconditioners, Sparse Iterative Solvers, PSBLAS, MPI)
  /Creator (pdfLaTeX)
  /Producer ($Id: userguide.tex 2008-04-08 Pasqua D'Ambra, Daniela di Serafino,
             Salvatore Filippone$)
}
\pdfcatalog{ %-- Catalog dictionary of PDF output.
% /URI (
}
\ \\
\pagenumbering{roman} % Roman numbering
\setcounter{page}{1} % Abstract start on page i
%\pagenumbering{roman} % Roman numbering
%\setcounter{page}{1} % Abstract start on page ii
\pagenumbering{arabic} % Arabic numbering
\setcounter{page}{1} % Chapters start on page 1
G.~Bella, S.~Filippone, A.~De Maio and M.~Testa,
{\em A Simulation Model for Forest Fires},
in J.~Dongarra, K.~Madsen, J.~Wasniewski, editors,
Proceedings of PARA~04 Workshop on State of the Art
in Scientific Computing, pp.~546--553, Lecture Notes in Computer Science,
Springer, 2005.
\bibitem{2007d} A. Buttari, D. di Serafino, P. D'Ambra, S. Filippone,\newblock
2LEV-D2P4: a package of high-performance preconditioners,\newblock
Applicable Algebra in Engineering, Communications and Computing,
Volume 18, Number 3, May, 2007, pp. 223-239
%Published online: 13 February 2007, {\tt}
\bibitem{2007c} P. D'Ambra, S. Filippone, D. Di Serafino\newblock
On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners
Applied Numerical Mathematics, Elsevier Science,
Volume 57, Issues 11-12, November-December 2007, Pages 1181-1196.
%published online 3 February 2007, {\tt
Dongarra, J. J., DuCroz, J., Hammarling, S. and Hanson, R.,
An Extended Set of {F}ortran {B}asic {L}inear {A}lgebra {S}ubprograms,
{ACM Trans. Math. Softw.} vol.~{14}, 1--17, 1988.
Dongarra, J., DuCroz, J., Hammarling, S. and Duff, I.,
A Set of level 3 Basic Linear Algebra Subprograms,
{ACM Trans. Math. Softw.} vol.~{16}, 1--17, 1990.
%% \bibitem{DOUGLAS}
%% R.E.~Bank and C.C.~Douglas,
%% {\em SMMP: Sparse Matrix Multiplication Package},
%% Advances in Computational Mathematics, 1993, 1, 127-137.
%% (See also {\tt})
%% \bibitem{PARA04}
%% A.~Buttari, P.~D'Ambra, D.~di Serafino and S.~Filippone,
%% {\em Extending PSBLAS to Build Parallel Schwarz Preconditioners},
%% in , J.~Dongarra, K.~Madsen, J.~Wasniewski, editors,
%% Proceedings of PARA~04 Workshop on State of the Art
%% in Scientific Computing, pp.~593--602, Lecture Notes in Computer Science,
%% Springer, 2005.
%% \bibitem{CAI_SAAD}
%% X.~C.~Cai and Y.~Saad,
%% {\em Overlapping Domain Decomposition Algorithms for General Sparse Matrices},
%% Numerical Linear Algebra with Applications, 3(3), pp.~221--237, 1996.
%% %
%% \bibitem{CAI_SARKIS}
%% X.C.~Cai and M.~Sarkis,
%% {\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems},
%% SIAM Journal on Scientific Computing, 21(2), pp.~792--797, 1999.
%% \bibitem{CAI_WIDLUND}
%% X.C.~Cai and O.~B.~Widlund,
%% {\em Domain Decomposition Algorithms for Indefinite Elliptic Problems},
%% SIAM Journal on Scientific and Statistical Computing, 13(1), pp.~243--258, 1992.
%% \bibitem{DD1}
%% T.~Chan and T.~Mathew,
%% {\em Domain Decomposition Algorithms},
%% in A.~Iserles, editor, Acta Numerica 1994, pp.~61--143, 1994.
%% Cambridge University Press.
%% %
%% \bibitem{APNUM06}
%% P.~D'Ambra, D.~di Serafino and S.~Filippone,
%% On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners,
%% Applied Numerical Mathematics, to appear, 2007.
%% \bibitem{UMFPACK}
%% T.A.~Davis,
%% {\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal
%% Method with a Column Pre-ordering Strategy},
%% ACM Transactions on Mathematical Software, 30, pp.~196--199, 2004.
%% (See also {\tt})
%% %
%% \bibitem{SUPERLU}
%% J.W.~Demmel, S.C.~Eisenstat, J.R.~Gilbert, X.S.~Li and J.W.H.~Liu,
%% A supernodal approach to sparse partial pivoting,
%% SIAM Journal on Matrix Analysis and Applications, 20(3), pp.~720--755, 1999.
J.~J.~Dongarra and R.~C.~Whaley,
{\em A User's Guide to the BLACS v.~1.1},
Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
Tennessee, March 1995 (updated May 1997).
I.~Duff, M.~Marrone, G.~Radicati and C.~Vittoli,
{\em Level 3 Basic Linear Algebra Subprograms for Sparse Matrices:
a User Level Interface},
ACM Transactions on Mathematical Software, 23(3), pp.~379--401, 1997.
I.~Duff, M.~Heroux and R.~Pozo,
{\em An Overview of the Sparse Basic Linear
Algebra Subprograms: the New Standard from the BLAS Technical Forum},
ACM Transactions on Mathematical Software, 28(2), pp.~239--267, 2002.
S.~Filippone and M.~Colajanni,
{\em PSBLAS: A Library for Parallel Linear Algebra
Computation on Sparse Matrices},
ACM Transactions on Mathematical Software, 26(4), pp.~527--550, 2000.
S.~Filippone, P.~D'Ambra, M.~Colajanni,
{\em Using a Parallel Library of Sparse Linear Algebra in a Fluid Dynamics
Applications Code on Linux Clusters},
in G.~Joubert, A.~Murli, F.~Peters, M.~Vanneschi, editors,
Parallel Computing - Advances \& Current Issues,
pp.~441--448, Imperial College Press, 2002.
Karypis, G. and Kumar, V.,
{\em {METIS}: Unstructured Graph Partitioning and Sparse Matrix
Ordering System}.
Minneapolis, MN 55455: University of Minnesota, Department of
Computer Science, 1995.
Internet Address: {\verb||}.
Lawson, C., Hanson, R., Kincaid, D. and Krogh, F.,
Basic {L}inear {A}lgebra {S}ubprograms for {F}ortran usage,
{ACM Trans. Math. Softw.} vol.~{5}, 308--323, 1979.
{Machiels, L. and Deville, M.}
{\em Fortran 90: An entry to object-oriented programming for the solution
of partial differential equations.}
{ACM Trans. Math. Softw.} vol.~{23}, 32--49.
{Metcalf, M., Reid, J. and Cohen, M.}
{\em Fortran 95/2003 explained.}
{Oxford University Press}, 2004.
%% \bibitem{DD2}
%% B.~Smith, P.~Bjorstad and W.~Gropp,
%% {\em Domain Decomposition: Parallel Multilevel Methods for Elliptic
%% Partial Differential Equations},
%% Cambridge University Press, 1996.
M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker and J.~Dongarra,
{\em MPI: The Complete Reference. Volume 1 - The MPI Core}, second edition,
MIT Press, 1998.