mld2p4:

docs/pdf/Makefile docs/pdf/abstract.tex docs/pdf/advanced.tex docs/pdf/background.tex docs/pdf/bibliography.tex docs/pdf/building.tex docs/pdf/conventions.tex docs/pdf/distribution.tex docs/pdf/errors.tex docs/pdf/gettingstarted.tex docs/pdf/highlevelview.tex docs/pdf/listofroutines.tex docs/pdf/overview.tex docs/pdf/userguide.tex docs/userguide.pdf New documentation, partial fixes.
17 years ago · 001f6693b8
parent 9eeef87a3a
commit 001f6693b8
14 changed files with 3037 additions and 2816 deletions
--- a/docs/pdf/abstract.tex
+++ b/docs/pdf/abstract.tex
@@ -1,19 +1,19 @@
-\begin{abstract}
-\emph{MLD2P4 (Multi-Level Domain Decomposition Parallel Preconditioners Package based on
-PSBLAS}) is a package of parallel algebraic multi-level preconditioners.
-It implements various versions of one-level additive and of multi-level additive
-and hybrid Schwarz algorithms. In the multi-level case, a purely algebraic approach
-is applied to generate coarse-level corrections, so that no geometric background is needed
-concerning the matrix to be preconditioned. The matrix is required to be square, real or complex, with a symmetric sparsity pattern \textbf{Do we not also consider the nonsymmetric case,
-with $(A+A^T)/2$?}.
-
-MLD2P4 has been designed to provide scalable and easy-to-use preconditioners in the
-context of the PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms)
-computational framework and can be used in conjunction with the Krylov solvers
-available in this framework. MLD2P4 enables the user to easily specify different aspects
-of a generic algebraic multilevel Schwarz preconditioner, thus making it possible to search
-for the ``best'' preconditioner for the problem at hand. The package has been designed 
-employing object-oriented techniques, using Fortran 95 and MPI, with interfaces to
-additional external libraries such as UMFPACK, SuperLU and SuperLU\_Dist, that
-can be exploited in building multi-level preconditioners.
+\begin{abstract}
+\emph{MLD2P4 (Multi-Level Domain Decomposition Parallel Preconditioners Package based on
+PSBLAS}) is a package of parallel algebraic multi-level preconditioners.
+It implements various versions of one-level additive and of multi-level additive
+and hybrid Schwarz algorithms. In the multi-level case, a purely algebraic approach
+is applied to generate coarse-level corrections, so that no geometric background is needed
+concerning the matrix to be preconditioned. The matrix is required to be square, real or complex, with a symmetric sparsity pattern \textbf{Do we not also consider the nonsymmetric case,
+with $(A+A^T)/2$?}.
+
+MLD2P4 has been designed to provide scalable and easy-to-use preconditioners in the
+context of the PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms)
+computational framework and can be used in conjunction with the Krylov solvers
+available in this framework. MLD2P4 enables the user to easily specify different aspects
+of a generic algebraic multilevel Schwarz preconditioner, thus making it possible to search
+for the ``best'' preconditioner for the problem at hand. The package has been designed 
+employing object-oriented techniques, using Fortran 95 and MPI, with interfaces to
+additional external libraries such as UMFPACK, SuperLU and SuperLU\_Dist, that
+can be exploited in building multi-level preconditioners.
 \end{abstract}
--- a/docs/pdf/advanced.tex
+++ b/docs/pdf/advanced.tex
@@ -1,12 +1,12 @@
-\section{Advanced Use}\label{sec:advanced}
-
-    - MLD2P4 software architecture \\
-    - preconditioner data structure (``detailed'' description) + possibility of setting
-      the individual levels separately (a possibility only hinted at in the earlier description of precset) \\
-    - description of the medium-level routines (with an introduction to the potential for extension (?) offered
-      by this software layer) \\
-
-%%% Local Variables: 
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End: 
+\section{Advanced Use}\label{sec:advanced}
+
+    - MLD2P4 software architecture \\
+    - preconditioner data structure (``detailed'' description) + possibility of setting
+      the individual levels separately (a possibility only hinted at in the earlier description of precset) \\
+    - description of the medium-level routines (with an introduction to the potential for extension (?) offered
+      by this software layer) \\
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "userguide"
+%%% End: 
--- a/docs/pdf/background.tex
+++ b/docs/pdf/background.tex
@@ -1,291 +1,291 @@
-\section{Multi-level Domain Decomposition Background\label{sec:background}}
-
-\emph{Domain Decomposition} (DD) preconditioners, coupled with Krylov iterative
-solvers, are widely used in the parallel solution of large and sparse linear systems.
-These preconditioners are based on the divide and conquer technique: the matrix
-to be preconditioned is divided into submatrices, a ``local linear system''
-involving each submatrix is (approximately) solved, and the local solutions are used
-to build a preconditioner for the whole original matrix. This process
-often corresponds to dividing a physical domain associated with the original matrix
-into subdomains, e.g.\ in a PDE discretization, to (approximately) solving the
-subproblems corresponding to the subdomains and to building an approximate
-solution of the original problem from the local solutions 
-\cite{Cai_Widlund_92,dd1_94,dd2_96}. 
-
-\emph{Additive Schwarz} preconditioners are DD preconditioners using overlapping
-submatrices, i.e.\ with some common rows, to couple the local information
-related to the submatrices (see, e.g., \cite{dd2_96}).
-The main motivations for choosing Additive Schwarz preconditioners are their
-intrinsic parallelism and good \textbf{(calling them ``good'' is a bit strong, since
-right afterwards we say that convergence depends on the number of submatrices)}
-convergence properties. A drawback of these
-preconditioners is that the number of iterations of the preconditioned solvers
-generally grows with the number of submatrices. This may be a serious limitation
-on parallel computers, since the number of submatrices usually matches the number
-of available processors. Optimal convergence rates, i.e.\ iteration numbers
-independent of the number of submatrices, can be obtained by correcting the
-preconditioner through a suitable approximation of the original linear system
-in a coarse space, which globally couples the information related to the single
-submatrices. 
-
-\emph{Two-level Schwarz} preconditioners are obtained
-by combining basic (one-level) Schwarz preconditioners with coarse-level
-corrections. In this context, the one-level preconditioner is often
-called a smoother. Different two-level preconditioners are obtained by varying the
-choice of the smoother, of the coarse-level correction and the
-way they are combined \cite{dd2_96}. The same reasoning can be applied starting
-from the coarse-level system, i.e.\ a coarse-space correction can be built
-from this system, thus obtaining \emph{multi-level} preconditioners.
-
-It is worth noting that optimal preconditioners do not necessarily correspond
-to minimum execution times. Indeed, to obtain effective multilevel preconditioners
-a tradeoff between optimality of convergence and the cost of building and applying
-the coarse-space corrections must be achieved. The choice of the number of levels,
-i.e.\ of the coarse-space corrections, also affects the effectiveness of the
-preconditioners. A further goal is to obtain convergence rates that are as insensitive
-as possible to variations in the matrix coefficients.
-
-Two main approaches can be used to build coarse-space corrections. The geometric approach
-applies coarsening strategies based on the knowledge of some physical grid associated
-to the matrix and requires the user to define grid transfer operators from the fine
-to the coarse levels and vice versa. This may prove difficult for complex geometries;
-furthermore, suitable one-level preconditioners may be required to get efficient
-interplay between fine and coarse levels, e.g.\ when matrices with highly varying coefficients
-are considered. The algebraic approach builds coarse-space corrections using only matrix
-information. It performs a fully automatic coarsening and enforces the interplay between
-the fine and coarse levels by suitably choosing the coarse space and the coarse-to-fine
-interpolation \cite{StubenGMD69_99}.
-
-MLD2P4 uses a pure algebraic approach for building the sequence of coarse matrices
-starting from the original matrix. The algebraic approach is based on the \emph{smoothed 
-aggregation} algorithm \cite{Brezina_Vanek_,Vanek_Mandel_Brezina_}. A decoupled version
-of this algorithm is implemented, where the smoothed aggregation is applied locally
-to each submatrix \cite{Tuminaro_Tong_00}. In the next two subsections we provide
-a brief description of the multi-level Schwarz preconditioners and of the smoothed
-aggregation technique as implemented in MLD2P4. For further details the user
-is referred to \cite{para_04,apnum_07,aaecc_07,dd2_96}.
-
-
-\subsection{Multi-level Schwarz Preconditioners\label{sec:multilevel}}
-
-The multilevel preconditioners implemented in MLD2P4 are obtained by combining
-Additive Schwarz preconditioners with coarse-space corrections; therefore
-we first provide a sketch of the Additive Schwarz preconditioners.
-
-Given a linear system
-\[ Ax=b, \]
-where $A=(a_{ij}) \in \Re^{n \times n}$ is a
-nonsingular sparse matrix with a symmetric non-zero pattern,
-let $G=(W,E)$ be the adjacency graph of $A$, where $W=\{1, 2, \ldots, n\}$
-and $E=\{(i,j) : a_{ij} \neq 0\}$ are the vertex set and the edge set of $G$,
-respectively. Two vertices are called adjacent if there is an edge connecting
-them. For any integer $\delta > 0$, a $\delta$-overlap
-partition of $W$ can be defined recursively as follows.
-Given a 0-overlap (or non-overlapping) partition of $W$,
-i.e.\ a set of $m$ disjoint nonempty sets $W_i^0 \subset W$ such that
-$\cup_{i=1}^m W_i^0 = W$, a $\delta$-overlap
-partition of $W$ is obtained by considering the sets
-$W_i^\delta \supset W_i^{\delta-1}$, obtained by including the vertices that
-are adjacent to any vertex in $W_i^{\delta-1}$.
-
-Let $n_i^\delta$ be the size of $W_i^\delta$ and $R_i^{\delta} \in 
-\Re^{n_i^\delta \times n}$ the restriction operator that maps
-a vector $v \in \Re^n$ onto the vector $v_i^{\delta} \in \Re^{n_i^\delta}$
-containing the components of $v$ corresponding to the vertices in
-$W_i^\delta$. The transpose of $R_i^{\delta}$ is a
-prolongation operator from $\Re^{n_i^\delta}$ to $\Re^n$.
-The matrix $A_i^\delta=R_i^\delta A (R_i^\delta)^T \in
-\Re^{n_i^\delta \times n_i^\delta}$ can be considered
-as a restriction of $A$ corresponding to the set $W_i^{\delta}$.
-
-The \emph{classical one-level AS} preconditioner is defined by
-\[
-M_{AS}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T 
-(A_i^\delta)^{-1} R_i^{\delta},
-\]
-where $A_i^\delta$ is assumed to be nonsingular. Its application
-to a vector $v \in \Re^n$ within a Krylov solver requires the following
-three steps:
-\begin{enumerate}
-	\item restriction of $v$ as $v_i = R_i^{\delta} v$, $i=1,\ldots,m$;
-	\item (approximate) solution of the linear systems $A_i^\delta w_i = v_i$,
-	      $i=1,\ldots,m$;
-	\item prolongation and sum of the $w_i$'s, i.e. $w = \sum_{i=1}^m (R_i^{\delta})^T w_i$.
-\end{enumerate}
-A variant of the classical AS preconditioner that outperforms it
-in terms of both convergence rate and computation and communication
-time on parallel distributed-memory computers is the so-called \emph{Restricted AS
-(RAS)} preconditioner~\cite{Cai_Sarkis,Efstathiou_Gander}. It
-is obtained by zeroing the components of $w_i$ corresponding to the
-overlapping vertices when applying the prolongation. Therefore,
-RAS differs from classical AS by the prolongation operator $(R_i^{\delta})^T$,
-which is replaced by $(\tilde{R}_i^0)^T \in \Re^{n \times n_i^\delta}$,
-where $\tilde{R}_i^0$ is obtained by zeroing the rows of $R_i^\delta$
-corresponding to the vertices in $W_i^\delta \backslash W_i^0$:
-\[
-M_{RAS}^{-1}= \sum_{i=1}^m (\tilde{R}_i^0)^T 
-(A_i^\delta)^{-1} R_i^{\delta}.
-\]
-Analogously, the AS variant called \emph{AS with Harmonic extension (ASH)}
-is defined by
-\[ M_{ASH}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T 
-(A_i^\delta)^{-1} \tilde{R}_i^0.
-\]
-We note that for $\delta=0$ the three variants of the AS preconditioner are
-all equal to the block-Jacobi preconditioner.
-
-As already observed, the convergence rate of the one-level Schwarz
-preconditioned iterative solvers deteriorates as the number $m$ of partitions
-of $W$ increases \cite{dd1_94,dd2_96}. To reduce the dependency
-of the number of iterations on the degree of parallelism we may
-introduce a global coupling among the overlapping partitions by defining 
-a coarse-space approximation $A_C$ of the matrix $A$. 
-In a pure algebraic setting, $A_C$ is usually built with
-a Galerkin approach. Given a set $W_C$ of \emph{coarse vertices},
-with size $n_C$, and a suitable restriction operator
-$R_C \in \Re^{n_C \times n}$, $A_C$ is defined as
-\[
-A_C=R_C A R_C^T
-\]
-and the coarse-level correction matrix to be combined with a generic
-one-level AS preconditioner $M_{1L}$ is obtained as
-\[
-M_{C}^{-1}= R_C^T A_C^{-1} R_C,
-\]
-where $A_C$ is assumed to be nonsingular. The application of $M_{C}^{-1}$
-to a vector $v$ corresponds to a restriction, a solution and
-a prolongation step; the solution step, involving the matrix $A_C$,
-may also be carried out approximately.
-
-The combination of $M_{C}$ and $M_{1L}$ may be
-performed in either an additive or a multiplicative framework.
-In the former case, the \emph{two-level additive} Schwarz preconditioner
-is obtained:
-\[
-M_{2LA}^{-1} = M_{C}^{-1} + M_{1L}^{-1}. 
-\]
-Applying $M_{2LA}^{-1}$ to a vector $v$ within a Krylov solver
-corresponds to applying $M_{C}^{-1}$
-and $M_{1L}^{-1}$ to $v$ independently and then summing up
-the results.
-
-In the multiplicative case, the combination can be
-performed by first applying the smoother $M_{1L}^{-1}$ and then
-the coarse-level correction operator $M_{C}^{-1}$:
-\[
-\begin{array}{l}
-w = M_{1L}^{-1} v, \\
-z = w + M_{C}^{-1} (v-Aw);
-\end{array}
-\]
-this corresponds to the following \emph{two-level hybrid pre-smoothed}
-Schwarz preconditioner:
-\[
-M_{2LH-PRE}^{-1} = M_{C}^{-1} + \left( I - M_{C}^{-1}A \right) M_{1L}^{-1}. 
-\]
-On the other hand, by applying the smoother after the coarse-level correction,
-i.e.\ by computing
-\[
-\begin{array}{l}
-w = M_{C}^{-1} v , \\
-z = w + M_{1L}^{-1} (v-Aw) , 
-\end{array}
-\]
-the \emph{two-level hybrid post-smoothed}
-Schwarz preconditioner is obtained:
-\[
-M_{2LH-POST}^{-1} = M_{1L}^{-1} + \left( I - M_{1L}^{-1}A \right) M_{C}^{-1}. 
-\]
-One more variant of two-level hybrid preconditioner is obtained by applying
-the smoother before and after the coarse-level correction. In this case, the
-preconditioner is symmetric if $A$, $M_{1L}$ and $M_{C}$ are symmetric.
-
-As previously noted, on parallel computers the number of submatrices usually matches
-the number of available processors. When the size of the system to be preconditioned
-is very large, the use of many processors, i.e.\ of many small submatrices, often
-leads to a large coarse-level system, whose solution may be computationally expensive.
-On the other hand, the use of few processors often leads to local submatrices that
-are too expensive to be processed on single processors, because of memory and/or
-computing requirements. Therefore, it seems natural to use a recursive approach,
-in which the coarse-level correction is re-applied starting from the current
-coarse-level system. The corresponding preconditioners are called \emph{multi-level}.
-One more reason for the multi-level approach is that it may significantly
-reduce the computational cost of preconditioning with respect to the two-level case
-(see \cite[Chapter 3]{dd2_96}). Additive and hybrid multilevel preconditioners
-are obtained as direct extensions of the two-level counterparts. Other combinations
-of the smoothers and coarse-level corrections are possible, leading to variants
-of the previous algorithms. For a detailed description of them, the reader is
-referred to \cite[Chapter 3]{dd2_96}.
-\textbf{In my opinion an algorithmic description of a multilevel preconditioner would be
-useful here as an example, e.g.\ the hybrid one with pre-smoothing, along the lines of
-the description in Figure 1 of the Trilinos ML 4.0 guide. WHAT DO YOU THINK?}
-
-
-\subsection{Smoothed Aggregation\label{sec:aggregation}}
-
-To define the restriction operator $R_C$, which is used to compute
-the coarse-level matrix $A_C$, MLD2P4 uses the \emph{smoothed aggregation}
-algorithm described in \cite{Brezina_Vanek_,Vanek_Mandel_Brezina_}.
-The basic idea of this algorithm is to build a coarse set of vertices
-$W_C$ by suitably grouping the vertices of $W$ into disjoint subsets
-(aggregates), and to define the coarse-to-fine space transfer operator $R_C^T$ by
-applying a suitable smoother to a simple piecewise constant
-prolongation operator, to improve the quality of the coarse-space correction.
-
-Three main steps can be identified in the smoothed aggregation procedure:
-\begin{itemize}
-	\item coarsening of the vertex set $W$, to obtain $W_C$;
-	\item construction of the prolongator $R_C^T$;
-	\item application of $R_C$ and $R_C^T$ to build $A_C$.
-\end{itemize}
- 
-To perform the coarsening step, we have implemented the aggregation algorithm sketched
-in \cite{apnum_07}. According to \cite{brezina_vanek}, a modification of this algorithm
-has actually been considered,
-in which each aggregate $N_r$ is made of vertices of $W$ that are \emph{strongly coupled}
-to a certain root vertex $r \in W$, i.e.\
-\[  N_r = \left\{s \in W: |a_{rs}| \geq \theta \sqrt{|a_{rr}a_{ss}|} \right\} \]
-for a given $\theta \in [0,1]$.
-Since the previous algorithm has a sequential nature, a \emph{decoupled} version of
-it has been chosen, where each processor $i$ independently applies the algorithm to
-the set of vertices $W_i^0$ assigned to it in the initial data distribution. This
-version is embarrassingly parallel, since it does not require any data communication.
-On the other hand, it may produce non-uniform aggregates near boundary vertices,
-i.e.\ near vertices adjacent to vertices in other processors, and is strongly
-dependent on the number of processors and on the initial partitioning of the matrix $A$.
-Nevertheless, this algorithm has been chosen for the implementation in MLD2P4,
-since it has been shown to produce good results in practice \cite{Tuminaro_Tong_00}.
-
-The prolongator $P_C=R_C^T$ is built starting from a \emph{tentative prolongator}
-$P \in \Re^{n \times n_C}$, defined as
-\begin{equation} 
-P=(p_{ij}), \quad  p_{ij}= 
-\left\{ \begin{array}{ll}
-1 & \quad \mbox{if} \; i \in V^j_C \\
-0 & \quad \mbox{otherwise}
-\end{array} \right. .
-\label{eq:tent_prol}
-\end{equation}
-$P_C$ is obtained by
-applying to $P$ a smoother $S \in \Re^{n \times n}$:
-\begin{equation}
-P_C = S P,
-\label{eq:smoothed_prol}
-\end{equation}
-in order to remove oscillatory components from the range of the prolongator
-and hence to improve the convergence properties of the multi-level
-Schwarz method \cite{Brezina_Vanek_,StubenGMD69_99}.
-A simple choice for $S$ is the damped Jacobi smoother:
-\begin{equation}
-S = I - \omega D^{-1} A , 
-\label{eq:jac_smoother}
-\end{equation}
-where the value of $\omega$ can be chosen
-using some estimate of the spectral radius of $D^{-1}A$ \cite{Brezina_Vanek}.
-\textbf{Should we mention the filtering of $A$ in the smoothing, noting however that it
-has not been implemented?}
-
-%%% Local Variables: 
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End: 
+\section{Multi-level Domain Decomposition Background\label{sec:background}}
+
+\emph{Domain Decomposition} (DD) preconditioners, coupled with Krylov iterative
+solvers, are widely used in the parallel solution of large and sparse linear systems.
+These preconditioners are based on the divide and conquer technique: the matrix
+to be preconditioned is divided into submatrices, a ``local linear system''
+involving each submatrix is (approximately) solved, and the local solutions are used
+to build a preconditioner for the whole original matrix. This process
+often corresponds to dividing a physical domain associated with the original matrix
+into subdomains, e.g.\ in a PDE discretization, to (approximately) solving the
+subproblems corresponding to the subdomains and to building an approximate
+solution of the original problem from the local solutions 
+\cite{Cai_Widlund_92,dd1_94,dd2_96}. 
+
+\emph{Additive Schwarz} preconditioners are DD preconditioners using overlapping
+submatrices, i.e.\ with some common rows, to couple the local information
+related to the submatrices (see, e.g., \cite{dd2_96}).
+The main motivations for choosing Additive Schwarz preconditioners are their
+intrinsic parallelism and good \textbf{(calling them ``good'' is a bit strong, since
+right afterwards we say that convergence depends on the number of submatrices)}
+convergence properties. A drawback of these
+preconditioners is that the number of iterations of the preconditioned solvers
+generally grows with the number of submatrices. This may be a serious limitation
+on parallel computers, since the number of submatrices usually matches the number
+of available processors. Optimal convergence rates, i.e.\ iteration numbers
+independent of the number of submatrices, can be obtained by correcting the
+preconditioner through a suitable approximation of the original linear system
+in a coarse space, which globally couples the information related to the single
+submatrices. 
+
+\emph{Two-level Schwarz} preconditioners are obtained
+by combining basic (one-level) Schwarz preconditioners with coarse-level
+corrections. In this context, the one-level preconditioner is often
+called a smoother. Different two-level preconditioners are obtained by varying the
+choice of the smoother, of the coarse-level correction and the
+way they are combined \cite{dd2_96}. The same reasoning can be applied starting
+from the coarse-level system, i.e.\ a coarse-space correction can be built
+from this system, thus obtaining \emph{multi-level} preconditioners.
+
+It is worth noting that optimal preconditioners do not necessarily correspond
+to minimum execution times. Indeed, to obtain effective multilevel preconditioners
+a tradeoff between optimality of convergence and the cost of building and applying
+the coarse-space corrections must be achieved. The choice of the number of levels,
+i.e.\ of the coarse-space corrections, also affects the effectiveness of the
+preconditioners. A further goal is to obtain convergence rates that are as insensitive
+as possible to variations in the matrix coefficients.
+
+Two main approaches can be used to build coarse-space corrections. The geometric approach
+applies coarsening strategies based on the knowledge of some physical grid associated
+to the matrix and requires the user to define grid transfer operators from the fine
+to the coarse levels and vice versa. This may prove difficult for complex geometries;
+furthermore, suitable one-level preconditioners may be required to get efficient
+interplay between fine and coarse levels, e.g.\ when matrices with highly varying coefficients
+are considered. The algebraic approach builds coarse-space corrections using only matrix
+information. It performs a fully automatic coarsening and enforces the interplay between
+the fine and coarse levels by suitably choosing the coarse space and the coarse-to-fine
+interpolation \cite{StubenGMD69_99}.
+
+MLD2P4 uses a pure algebraic approach for building the sequence of coarse matrices
+starting from the original matrix. The algebraic approach is based on the \emph{smoothed 
+aggregation} algorithm \cite{Brezina_Vanek_,Vanek_Mandel_Brezina_}. A decoupled version
+of this algorithm is implemented, where the smoothed aggregation is applied locally
+to each submatrix \cite{Tuminaro_Tong_00}. In the next two subsections we provide
+a brief description of the multi-level Schwarz preconditioners and of the smoothed
+aggregation technique as implemented in MLD2P4. For further details the user
+is referred to \cite{para_04,apnum_07,aaecc_07,dd2_96}.
+
+
+\subsection{Multi-level Schwarz Preconditioners\label{sec:multilevel}}
+
+The multilevel preconditioners implemented in MLD2P4 are obtained by combining
+Additive Schwarz preconditioners with coarse-space corrections; therefore
+we first provide a sketch of the Additive Schwarz preconditioners.
+
+Given a linear system
+\[ Ax=b, \]
+where $A=(a_{ij}) \in \Re^{n \times n}$ is a
+nonsingular sparse matrix with a symmetric non-zero pattern,
+let $G=(W,E)$ be the adjacency graph of $A$, where $W=\{1, 2, \ldots, n\}$
+and $E=\{(i,j) : a_{ij} \neq 0\}$ are the vertex set and the edge set of $G$,
+respectively. Two vertices are called adjacent if there is an edge connecting
+them. For any integer $\delta > 0$, a $\delta$-overlap
+partition of $W$ can be defined recursively as follows.
+Given a 0-overlap (or non-overlapping) partition of $W$,
+i.e.\ a set of $m$ disjoint nonempty sets $W_i^0 \subset W$ such that
+$\cup_{i=1}^m W_i^0 = W$, a $\delta$-overlap
+partition of $W$ is obtained by considering the sets
+$W_i^\delta \supset W_i^{\delta-1}$, obtained by including the vertices that
+are adjacent to any vertex in $W_i^{\delta-1}$.
+
+Let $n_i^\delta$ be the size of $W_i^\delta$ and $R_i^{\delta} \in 
+\Re^{n_i^\delta \times n}$ the restriction operator that maps
+a vector $v \in \Re^n$ onto the vector $v_i^{\delta} \in \Re^{n_i^\delta}$
+containing the components of $v$ corresponding to the vertices in
+$W_i^\delta$. The transpose of $R_i^{\delta}$ is a
+prolongation operator from $\Re^{n_i^\delta}$ to $\Re^n$.
+The matrix $A_i^\delta=R_i^\delta A (R_i^\delta)^T \in
+\Re^{n_i^\delta \times n_i^\delta}$ can be considered
+as a restriction of $A$ corresponding to the set $W_i^{\delta}$.
+
+The \emph{classical one-level AS} preconditioner is defined by
+\[
+M_{AS}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T 
+(A_i^\delta)^{-1} R_i^{\delta},
+\]
+where $A_i^\delta$ is assumed to be nonsingular. Its application
+to a vector $v \in \Re^n$ within a Krylov solver requires the following
+three steps:
+\begin{enumerate}
+	\item restriction of $v$ as $v_i = R_i^{\delta} v$, $i=1,\ldots,m$;
+	\item (approximate) solution of the linear systems $A_i^\delta w_i = v_i$,
+	      $i=1,\ldots,m$;
+	\item prolongation and sum of the $w_i$'s, i.e. $w = \sum_{i=1}^m (R_i^{\delta})^T w_i$.
+\end{enumerate}
+A variant of the classical AS preconditioner that outperforms it
+in terms of both convergence rate and computation and communication
+time on parallel distributed-memory computers is the so-called \emph{Restricted AS
+(RAS)} preconditioner~\cite{Cai_Sarkis,Efstathiou_Gander}. It
+is obtained by zeroing the components of $w_i$ corresponding to the
+overlapping vertices when applying the prolongation. Therefore,
+RAS differs from classical AS by the prolongation operator $(R_i^{\delta})^T$,
+which is replaced by $(\tilde{R}_i^0)^T \in \Re^{n \times n_i^\delta}$,
+where $\tilde{R}_i^0$ is obtained by zeroing the rows of $R_i^\delta$
+corresponding to the vertices in $W_i^\delta \backslash W_i^0$:
+\[
+M_{RAS}^{-1}= \sum_{i=1}^m (\tilde{R}_i^0)^T 
+(A_i^\delta)^{-1} R_i^{\delta}.
+\]
+Analogously, the AS variant called \emph{AS with Harmonic extension (ASH)}
+is defined by
+\[ M_{ASH}^{-1}= \sum_{i=1}^m (R_i^{\delta})^T 
+(A_i^\delta)^{-1} \tilde{R}_i^0.
+\]
+We note that for $\delta=0$ the three variants of the AS preconditioner are
+all equal to the block-Jacobi preconditioner.
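+
+As a small illustration (a sketch with hypothetical data, not taken from the
+package), consider $n=4$, $A$ tridiagonal with $a_{ii}=2$ and
+$a_{i,i\pm 1}=-1$, and the 0-overlap partition $W_1^0=\{1,2\}$, $W_2^0=\{3,4\}$.
+For $\delta=1$ the overlapping sets are $W_1^1=\{1,2,3\}$ and $W_2^1=\{2,3,4\}$,
+and for the first subdomain
+\[
+R_1^1 = \left( \begin{array}{cccc}
+1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0
+\end{array} \right), \quad
+\tilde{R}_1^0 = \left( \begin{array}{cccc}
+1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0
+\end{array} \right), \quad
+A_1^1 = \left( \begin{array}{rrr}
+2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2
+\end{array} \right).
+\]
+With RAS, the third component of $w_1$ (vertex $3 \in W_1^1 \backslash W_1^0$)
+is discarded in the prolongation, whereas classical AS adds it to the
+contribution of the second subdomain.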
+
+As already observed, the convergence rate of the one-level Schwarz
+preconditioned iterative solvers deteriorates as the number $m$ of partitions
+of $W$ increases \cite{dd1_94,dd2_96}. To reduce the dependency
+of the number of iterations on the degree of parallelism we may
+introduce a global coupling among the overlapping partitions by defining 
+a coarse-space approximation $A_C$ of the matrix $A$. 
+In a pure algebraic setting, $A_C$ is usually built with
+a Galerkin approach. Given a set $W_C$ of \emph{coarse vertices},
+with size $n_C$, and a suitable restriction operator
+$R_C \in \Re^{n_C \times n}$, $A_C$ is defined as
+\[
+A_C=R_C A R_C^T
+\]
+and the coarse-level correction matrix to be combined with a generic
+one-level AS preconditioner $M_{1L}$ is obtained as
+\[
+M_{C}^{-1}= R_C^T A_C^{-1} R_C,
+\]
+where $A_C$ is assumed to be nonsingular. The application of $M_{C}^{-1}$
+to a vector $v$ corresponds to a restriction, a solution and
+a prolongation step; the solution step, involving the matrix $A_C$,
+may also be carried out approximately.
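+
+For instance (an illustrative sketch: the data and the unsmoothed piecewise
+constant $R_C$ are assumptions made here for simplicity), let $n=4$, let $A$
+be tridiagonal with $a_{ii}=2$ and $a_{i,i\pm 1}=-1$, and let the coarse
+vertices correspond to the groups $\{1,2\}$ and $\{3,4\}$. Then
+\[
+R_C = \left( \begin{array}{cccc}
+1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1
+\end{array} \right), \quad
+A_C = R_C A R_C^T = \left( \begin{array}{rr}
+2 & -1 \\ -1 & 2
+\end{array} \right),
+\]
+so that applying $M_C^{-1}$ requires the solution of a $2 \times 2$ system only.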
+
+The combination of $M_{C}$ and $M_{1L}$ may be
+performed in either an additive or a multiplicative framework.
+In the former case, the \emph{two-level additive} Schwarz preconditioner
+is obtained:
+\[
+M_{2LA}^{-1} = M_{C}^{-1} + M_{1L}^{-1}. 
+\]
+Applying $M_{2LA}^{-1}$ to a vector $v$ within a Krylov solver
+corresponds to applying $M_{C}^{-1}$
+and $M_{1L}^{-1}$ to $v$ independently and then summing up
+the results.
+
+In the multiplicative case, the combination can be
+performed by first applying the smoother $M_{1L}^{-1}$ and then
+the coarse-level correction operator $M_{C}^{-1}$:
+\[
+\begin{array}{l}
+w = M_{1L}^{-1} v, \\
+z = w + M_{C}^{-1} (v-Aw);
+\end{array}
+\]
+this corresponds to the following \emph{two-level hybrid pre-smoothed}
+Schwarz preconditioner:
+\[
+M_{2LH-PRE}^{-1} = M_{C}^{-1} + \left( I - M_{C}^{-1}A \right) M_{1L}^{-1}. 
+\]
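+Indeed, substituting $w = M_{1L}^{-1} v$ into the expression of $z$ gives
+\[
+z = M_{1L}^{-1} v + M_{C}^{-1} \left( v - A M_{1L}^{-1} v \right)
+  = \left[ M_{C}^{-1} + \left( I - M_{C}^{-1}A \right) M_{1L}^{-1} \right] v .
+\]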
+On the other hand, by applying the smoother after the coarse-level correction,
+i.e.\ by computing
+\[
+\begin{array}{l}
+w = M_{C}^{-1} v , \\
+z = w + M_{1L}^{-1} (v-Aw) , 
+\end{array}
+\]
+the \emph{two-level hybrid post-smoothed}
+Schwarz preconditioner is obtained:
+\[
+M_{2LH-POST}^{-1} = M_{1L}^{-1} + \left( I - M_{1L}^{-1}A \right) M_{C}^{-1}. 
+\]
+One more variant of two-level hybrid preconditioner is obtained by applying
+the smoother before and after the coarse-level correction. In this case, the
+preconditioner is symmetric if $A$, $M_{1L}$ and $M_{C}$ are symmetric.
+
+As previously noted, on parallel computers the number of submatrices usually matches
+the number of available processors. When the size of the system to be preconditioned
+is very large, the use of many processors, i.e.\ of many small submatrices, often
+leads to a large coarse-level system, whose solution may be computationally expensive.
+On the other hand, the use of few processors often leads to local submatrices that
+are too expensive to be processed on single processors, because of memory and/or
+computing requirements. Therefore, it seems natural to use a recursive approach,
+in which the coarse-level correction is re-applied starting from the current
+coarse-level system. The corresponding preconditioners are called \emph{multi-level}.
+One more reason for the multi-level approach is that it may significantly
+reduce the computational cost of preconditioning with respect to the two-level case
+(see \cite[Chapter 3]{dd2_96}). Additive and hybrid multilevel preconditioners
+are obtained as direct extensions of the two-level counterparts. Other combinations
+of the smoothers and coarse-level corrections are possible, leading to variants
+of the previous algorithms. For a detailed description of them, the reader is
+referred to \cite[Chapter 3]{dd2_96}.
+\textbf{In my opinion an algorithmic description of a multilevel preconditioner would be
+useful here as an example, e.g.\ the hybrid one with pre-smoothing, along the lines of
+the description in Figure 1 of the Trilinos ML 4.0 guide. WHAT DO YOU THINK?}
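+
+A possible sketch of such a description, for the hybrid pre-smoothed case, is
+given next. The notation is an assumption introduced here for illustration:
+$nlev$ is the number of levels, $A_1=A$, $M_l$ is the smoother at level $l$,
+$R_{l+1}$ is the restriction operator from level $l$ to level $l+1$, and
+$A_{l+1}=R_{l+1} A_l R_{l+1}^T$. The application of the preconditioner to a
+vector $v_l$ at level $l$, returning $z_l$, may be written recursively as:
+\begin{enumerate}
+\item if $l = nlev$, (approximately) solve $A_l z_l = v_l$ and return $z_l$;
+\item otherwise, apply the smoother: $w_l = M_l^{-1} v_l$;
+\item restrict the residual: $v_{l+1} = R_{l+1} (v_l - A_l w_l)$;
+\item apply the preconditioner at level $l+1$ to $v_{l+1}$, obtaining $z_{l+1}$;
+\item prolong and correct: $z_l = w_l + R_{l+1}^T z_{l+1}$.
+\end{enumerate}
+For $nlev=2$ this reduces to the two-level hybrid pre-smoothed preconditioner
+defined above.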
+
+
+\subsection{Smoothed Aggregation\label{sec:aggregation}}
+
+To define the restriction operator $R_C$, which is used to compute
+the coarse-level matrix $A_C$, MLD2P4 uses the \emph{smoothed aggregation}
+algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
+The basic idea of this algorithm is to build a coarse set of vertices
+$W_C$ by suitably grouping the vertices of $W$ into disjoint subsets
+(aggregates), and to define the coarse-to-fine space transfer operator $R_C^T$ by
+applying a suitable smoother to a simple piecewise constant
+prolongation operator, to improve the quality of the coarse-space correction.
+
+Three main steps can be identified in the smoothed aggregation procedure:
+\begin{itemize}
+	\item coarsening of the vertex set $W$, to obtain $W_C$;
+	\item construction of the prolongator $R_C^T$;
+	\item application of $R_C$ and $R_C^T$ to build $A_C$.
+\end{itemize}
+ 
+To perform the coarsening step, we have implemented the aggregation algorithm sketched
+in \cite{apnum_07}. Following \cite{BREZINA_VANEK}, a modification of this algorithm
+has actually been used,
+in which each aggregate $N_r$ is made of vertices of $W$ that are \emph{strongly coupled}
+to a certain root vertex $r \in W$, i.e.\
+\[  N_r = \left\{s \in W: |a_{rs}| \geq \theta \sqrt{|a_{rr}a_{ss}|} \right\} \]
+for a given $\theta \in [0,1]$.
+Since the previous algorithm has a sequential nature, a \emph{decoupled} version of
+it has been chosen, where each processor $i$ independently applies the algorithm to
+the set of vertices $W_i^0$ assigned to it in the initial data distribution. This
+version is embarrassingly parallel, since it does not require any data communication.
+On the other hand, it may produce non-uniform aggregates near boundary vertices,
+i.e.\ near vertices adjacent to vertices assigned to other processors, and it depends
+strongly on the number of processors and on the initial partitioning of the matrix $A$.
+Nevertheless, this algorithm has been chosen for the implementation in MLD2P4,
+since it has been shown to produce good results in practice \cite{Tuminaro_Tong_00}.
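+
+As a small numerical illustration of the coupling test (the values are chosen here
+only as an example), let $a_{rr} = a_{ss} = 4$ and $a_{rs} = -1$. Then
+\[ |a_{rs}| = 1, \qquad \theta \sqrt{|a_{rr}a_{ss}|} = 4\,\theta, \]
+so that $s \in N_r$ if and only if $\theta \leq 0.25$. For instance, $s$ is strongly
+coupled to $r$ for $\theta = 0.1$, but not for $\theta = 0.5$. Larger values of
+$\theta$ make the test more selective and hence tend to produce smaller aggregates.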
+
+The prolongator $P_C=R_C^T$ is built starting from a \emph{tentative prolongator}
+$P \in \Re^{n \times n_C}$, defined as
+\begin{equation} 
+P=(p_{ij}), \quad  p_{ij}= 
+\left\{ \begin{array}{ll}
+1 & \quad \mbox{if} \; i \in V^j_C \\
+0 & \quad \mbox{otherwise}
+\end{array} \right. .
+\label{eq:tent_prol}
+\end{equation}
+Here $V^j_C$ denotes the $j$-th aggregate. $P_C$ is obtained by
+applying to $P$ a smoother $S \in \Re^{n \times n}$:
+\begin{equation}
+P_C = S P,
+\label{eq:smoothed_prol}
+\end{equation}
+in order to remove oscillatory components from the range of the prolongator
+and hence to improve the convergence properties of the multi-level
+Schwarz method \cite{BREZINA_VANEK,StubenGMD69_99}.
+A simple choice for $S$ is the damped Jacobi smoother:
+\begin{equation}
+S = I - \omega D^{-1} A , 
+\label{eq:jac_smoother}
+\end{equation}
+where $D$ is the diagonal of $A$ and the value of $\omega$ can be chosen
+using some estimate of the spectral radius of $D^{-1}A$ \cite{BREZINA_VANEK}.
+\textbf{Mention the filtering of $A$ in the smoothing, saying however that it has not
+been implemented?}
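+
+As a small worked example of (\ref{eq:tent_prol})--(\ref{eq:jac_smoother}), with data
+chosen here only for illustration, let $n=4$, $n_C=2$, let $A$ be the tridiagonal
+matrix with $2$ on the diagonal and $-1$ on the sub- and super-diagonal, and let
+the aggregates be $\{1,2\}$ and $\{3,4\}$. With $\omega = 2/3$ (an arbitrary value
+for this example) we have $D = 2I$, $S = I - \frac{1}{3} A$ and
+\[
+P = \left( \begin{array}{cc}
+1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1
+\end{array} \right), \qquad
+P_C = S P = \frac{1}{3} \left( \begin{array}{cc}
+2 & 0 \\ 2 & 1 \\ 1 & 2 \\ 0 & 2
+\end{array} \right) .
+\]
+The smoothing makes the two columns overlap on the vertices adjacent to the other
+aggregate. Assuming that the coarse-level matrix has the Galerkin form
+$A_C = R_C A R_C^T = P_C^T A P_C$, one obtains
+\[
+A_C = \frac{1}{9} \left( \begin{array}{rr}
+6 & -1 \\ -1 & 6
+\end{array} \right) .
+\]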
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "userguide"
+%%% End: 
--- a/docs/pdf/bibliography.tex
+++ b/docs/pdf/bibliography.tex
@ -1,152 +1,152 @@
-\begin{thebibliography}{99}
-
-%
-\bibitem{PARA04FOREST}
-Bella, G., Filippone, S., De Maio, A., Testa, M.:
-A Simulation Model for Forest Fires.
-In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.):
-Proceedings of PARA~04 Workshop on State of the Art
-in Scientific Computing. Lecture Notes in Computer Science, 3732. Berlin:
-Springer, 2005
-%
-\bibitem{aaecc_07} A. Buttari, D. di Serafino, P. D'Ambra, S. Filippone,\newblock
-2LEV-D2P4: a package of high-performance preconditioners,\newblock
-Applicable Algebra in Engineering, Communications and Computing, 
-Volume 18, Number 3, May, 2007, pp.  223-239
-%Published online: 13 February 2007, {\tt http://dx.doi.org/10.1007/s00200-007-0035-z}
-%
-\bibitem{apnum_07}  P. D'Ambra, S. Filippone,  D. Di Serafino\newblock
-On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners
-\newblock
-Applied Numerical Mathematics, Elsevier Science, 
-Volume 57, Issues 11-12, November-December 2007, Pages 1181-1196.
-%published online 3 February 2007, {\tt
-%  http://dx.doi.org/10.1016/j.apnum.2007.01.006}
-
-%% \bibitem{DOUGLAS}
-%% R.E.~Bank and C.C.~Douglas,
-%% {\em SMMP: Sparse Matrix Multiplication Package}, 
-%% Advances in Computational Mathematics, 1993, 1, 127-137.
-%% (See also {\tt http://www.mgnet.org/~douglas/ccd-codes.html}) 
-%
-%
-\bibitem{para_04}
-A.~Buttari, P.~D'Ambra, D.~di Serafino and S.~Filippone,
-{\em Extending PSBLAS to Build Parallel Schwarz Preconditioners},
-in , J.~Dongarra, K.~Madsen, J.~Wasniewski, editors,
-Proceedings of PARA~04 Workshop on State of the Art
-in Scientific Computing, pp.~593--602, Lecture Notes in Computer Science,
-Springer, 2005.
-%
-%% \bibitem{CAI_SAAD}
-%% X.~C.~Cai and Y.~Saad,
-%% {\em Overlapping Domain Decomposition Algorithms for General Sparse Matrices},
-%% Numerical Linear Algebra with Applications, 3(3), pp.~221--237, 1996.
-%% %
-%% \bibitem{CAI_SARKIS}
-%% X.C.~Cai and M.~Sarkis,
-%% {\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems},
-%% SIAM Journal on Scientific Computing, 21(2), pp.~792--797, 1999.
-%
-\bibitem{Cai_Widlund_92}
-X.C.~Cai and O.~B.~Widlund,
-{\em Domain Decomposition Algorithms for Indefinite Elliptic Problems},
-SIAM Journal on Scientific and Statistical Computing, 13(1), pp.~243--258, 1992.
-%
-\bibitem{dd1_94}
-T.~Chan and T.~Mathew,
-{\em Domain Decomposition Algorithms},
-in A.~Iserles, editor, Acta Numerica 1994, pp.~61--143, 1994.
-Cambridge University Press.
-%% %
-%% \bibitem{UMFPACK}
-%% T.A.~Davis, 
-%% {\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal
-%% Method with a Column Pre-ordering Strategy},
-%% ACM Transactions on Mathematical Software, 30, pp.~196--199, 2004.
-%% (See also {\tt http://www.cise.ufl.edu/~davis/})
-%% %
-%% \bibitem{SUPERLU}
-%% J.W.~Demmel, S.C.~Eisenstat, J.R.~Gilbert, X.S.~Li and J.W.H.~Liu,
-%% A supernodal approach to sparse partial pivoting,
-%% SIAM Journal on Matrix Analysis and Applications, 20(3), pp.~720--755, 1999.
-%
-\bibitem{BLACS}
-J.~J.~Dongarra and R.~C.~Whaley,
-{\em A User's Guide to the BLACS v.~1.1},
-Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
-Tennessee, March 1995 (updated May 1997).
-%
-\bibitem{sblas_97}
-I.~Duff, M.~Marrone, G.~Radicati and C.~Vittoli,
-{\em Level 3 Basic Linear Algebra Subprograms for Sparse Matrices: 
-a User Level Interface},
-ACM Transactions on Mathematical Software, 23(3), pp.~379--401, 1997.
-%
-\bibitem{sblas_02}
-I.~Duff, M.~Heroux and R.~Pozo,
-{\em An Overview of the Sparse Basic Linear
-Algebra Subprograms: the New Standard from the BLAS Technical Forum},
-ACM Transactions on Mathematical Software, 28(2), pp.~239--267, 2002.
-%
-\bibitem{psblas_00}
-S.~Filippone and M.~Colajanni, 
-{\em PSBLAS: A Library for Parallel Linear Algebra
-Computation on Sparse Matrices},
-\newblock
-ACM Transactions on Mathematical Software, 26(4), pp.~527--550, 2000.
-%
-\bibitem{KIVA3PSBLAS}
-S.~Filippone, P.~D'Ambra, M.~Colajanni,
-{\em Using a Parallel Library of Sparse Linear Algebra in a Fluid Dynamics 
-Applications Code on Linux Clusters},
-in G.~Joubert, A.~Murli, F.~Peters, M.~Vanneschi, editors,
-Parallel Computing - Advances \& Current Issues,
-pp.~441--448, Imperial College Press, 2002. 
-%
-\bibitem{METIS}
-Karypis, G. and Kumar, V.,
-{\em {METIS}: Unstructured Graph Partitioning and Sparse Matrix
-  Ordering System}.
-Minneapolis, MN 55455: University of Minnesota, Department of
-  Computer Science, 1995. 
-Internet Address: {\verb|http://www.cs.umn.edu/~karypis|}.
-\bibitem{BLAS1}
-Lawson, C.,  Hanson, R., Kincaid, D. and Krogh, F.,
-   Basic {L}inear {A}lgebra {S}ubprograms for {F}ortran usage,
-{ACM Trans. Math. Softw.} vol.~{5}, 38--329, 1979.
-
-\bibitem{machiels}
-{Machiels, L. and Deville, M.}
-{\em Fortran 90: An entry to object-oriented programming for the solution
-  of partial differential equations.}
-{ACM Trans. Math. Softw.} vol.~{23}, 32--49.
-\bibitem{metcalf}
-{Metcalf, M., Reid, J. and Cohen, M.}
-{\em Fortran 95/2003 explained.}
-{Oxford University Press}, 2004.
-
-\bibitem{dd2_96}
-B.~Smith, P.~Bjorstad and W.~Gropp,
-{\em Domain Decomposition: Parallel Multilevel Methods for Elliptic
-Partial Differential Equations},
-Cambridge University Press, 1996.
-
-\bibitem{MPI1}
-M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker and J.~Dongarra,
-{\em MPI: The Complete Reference. Volume 1 - The MPI Core}, second edition,
-MIT Press, 1998.
-%
-\bibitem{BREZINA_VANEK}
-M.~Brezina and P.~Van{\v e}k,
-{\em A Black-Box Iterative Solver Based on a Two-Level Schwarz Method},
-Computing, 1999, 63, 233-263.
-%
-%
-\bibitem{VANEK_MANDEL_BREZINA}
-P.~Van{\v e}k, J.~Mandel and M.~Brezina,
-{\em Algebraic Multigrid by Smoothed Aggregation for Second and Fourth Order Elliptic Problems},
-Computing, 1996, 56, 179-196.
-%
-
+\begin{thebibliography}{99}
+
+%
+\bibitem{PARA04FOREST}
+Bella, G., Filippone, S., De Maio, A., Testa, M.:
+A Simulation Model for Forest Fires.
+In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.):
+Proceedings of PARA~04 Workshop on State of the Art
+in Scientific Computing. Lecture Notes in Computer Science, 3732. Berlin:
+Springer, 2005
+%
+\bibitem{aaecc_07} A. Buttari, D. di Serafino, P. D'Ambra, S. Filippone,\newblock
+2LEV-D2P4: a package of high-performance preconditioners,\newblock
+Applicable Algebra in Engineering, Communications and Computing, 
+Volume 18, Number 3, May, 2007, pp.  223-239
+%Published online: 13 February 2007, {\tt http://dx.doi.org/10.1007/s00200-007-0035-z}
+%
+\bibitem{apnum_07}  P. D'Ambra, S. Filippone,  D. Di Serafino\newblock
+On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners
+\newblock
+Applied Numerical Mathematics, Elsevier Science, 
+Volume 57, Issues 11-12, November-December 2007, Pages 1181-1196.
+%published online 3 February 2007, {\tt
+%  http://dx.doi.org/10.1016/j.apnum.2007.01.006}
+
+%% \bibitem{DOUGLAS}
+%% R.E.~Bank and C.C.~Douglas,
+%% {\em SMMP: Sparse Matrix Multiplication Package}, 
+%% Advances in Computational Mathematics, 1993, 1, 127-137.
+%% (See also {\tt http://www.mgnet.org/~douglas/ccd-codes.html}) 
+%
+%
+\bibitem{para_04}
+A.~Buttari, P.~D'Ambra, D.~di Serafino and S.~Filippone,
+{\em Extending PSBLAS to Build Parallel Schwarz Preconditioners},
+in J.~Dongarra, K.~Madsen, J.~Wasniewski, editors,
+Proceedings of PARA~04 Workshop on State of the Art
+in Scientific Computing, pp.~593--602, Lecture Notes in Computer Science,
+Springer, 2005.
+%
+%% \bibitem{CAI_SAAD}
+%% X.~C.~Cai and Y.~Saad,
+%% {\em Overlapping Domain Decomposition Algorithms for General Sparse Matrices},
+%% Numerical Linear Algebra with Applications, 3(3), pp.~221--237, 1996.
+%% %
+%% \bibitem{CAI_SARKIS}
+%% X.C.~Cai and M.~Sarkis,
+%% {\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems},
+%% SIAM Journal on Scientific Computing, 21(2), pp.~792--797, 1999.
+%
+\bibitem{Cai_Widlund_92}
+X.C.~Cai and O.~B.~Widlund,
+{\em Domain Decomposition Algorithms for Indefinite Elliptic Problems},
+SIAM Journal on Scientific and Statistical Computing, 13(1), pp.~243--258, 1992.
+%
+\bibitem{dd1_94}
+T.~Chan and T.~Mathew,
+{\em Domain Decomposition Algorithms},
+in A.~Iserles, editor, Acta Numerica 1994, pp.~61--143, 1994.
+Cambridge University Press.
+%% %
+%% \bibitem{UMFPACK}
+%% T.A.~Davis, 
+%% {\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal
+%% Method with a Column Pre-ordering Strategy},
+%% ACM Transactions on Mathematical Software, 30, pp.~196--199, 2004.
+%% (See also {\tt http://www.cise.ufl.edu/~davis/})
+%% %
+%% \bibitem{SUPERLU}
+%% J.W.~Demmel, S.C.~Eisenstat, J.R.~Gilbert, X.S.~Li and J.W.H.~Liu,
+%% A supernodal approach to sparse partial pivoting,
+%% SIAM Journal on Matrix Analysis and Applications, 20(3), pp.~720--755, 1999.
+%
+\bibitem{BLACS}
+J.~J.~Dongarra and R.~C.~Whaley,
+{\em A User's Guide to the BLACS v.~1.1},
+Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
+Tennessee, March 1995 (updated May 1997).
+%
+\bibitem{sblas_97}
+I.~Duff, M.~Marrone, G.~Radicati and C.~Vittoli,
+{\em Level 3 Basic Linear Algebra Subprograms for Sparse Matrices: 
+a User Level Interface},
+ACM Transactions on Mathematical Software, 23(3), pp.~379--401, 1997.
+%
+\bibitem{sblas_02}
+I.~Duff, M.~Heroux and R.~Pozo,
+{\em An Overview of the Sparse Basic Linear
+Algebra Subprograms: the New Standard from the BLAS Technical Forum},
+ACM Transactions on Mathematical Software, 28(2), pp.~239--267, 2002.
+%
+\bibitem{psblas_00}
+S.~Filippone and M.~Colajanni, 
+{\em PSBLAS: A Library for Parallel Linear Algebra
+Computation on Sparse Matrices},
+\newblock
+ACM Transactions on Mathematical Software, 26(4), pp.~527--550, 2000.
+%
+\bibitem{KIVA3PSBLAS}
+S.~Filippone, P.~D'Ambra, M.~Colajanni,
+{\em Using a Parallel Library of Sparse Linear Algebra in a Fluid Dynamics 
+Applications Code on Linux Clusters},
+in G.~Joubert, A.~Murli, F.~Peters, M.~Vanneschi, editors,
+Parallel Computing - Advances \& Current Issues,
+pp.~441--448, Imperial College Press, 2002. 
+%
+\bibitem{METIS}
+Karypis, G. and Kumar, V.,
+{\em {METIS}: Unstructured Graph Partitioning and Sparse Matrix
+  Ordering System}.
+Minneapolis, MN 55455: University of Minnesota, Department of
+  Computer Science, 1995. 
+Internet Address: {\verb|http://www.cs.umn.edu/~karypis|}.
+\bibitem{BLAS1}
+Lawson, C.,  Hanson, R., Kincaid, D. and Krogh, F.,
+   Basic {L}inear {A}lgebra {S}ubprograms for {F}ortran usage,
+{ACM Trans. Math. Softw.} vol.~{5}, 308--323, 1979.
+
+\bibitem{machiels}
+{Machiels, L. and Deville, M.}
+{\em Fortran 90: An entry to object-oriented programming for the solution
+  of partial differential equations.}
+{ACM Trans. Math. Softw.} vol.~{23}, 32--49, 1997.
+\bibitem{metcalf}
+{Metcalf, M., Reid, J. and Cohen, M.}
+{\em Fortran 95/2003 explained.}
+{Oxford University Press}, 2004.
+
+\bibitem{dd2_96}
+B.~Smith, P.~Bjorstad and W.~Gropp,
+{\em Domain Decomposition: Parallel Multilevel Methods for Elliptic
+Partial Differential Equations},
+Cambridge University Press, 1996.
+
+\bibitem{MPI1}
+M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker and J.~Dongarra,
+{\em MPI: The Complete Reference. Volume 1 - The MPI Core}, second edition,
+MIT Press, 1998.
+%
+\bibitem{BREZINA_VANEK}
+M.~Brezina and P.~Van{\v e}k,
+{\em A Black-Box Iterative Solver Based on a Two-Level Schwarz Method},
+Computing, 1999, 63, 233-263.
+%
+%
+\bibitem{VANEK_MANDEL_BREZINA}
+P.~Van{\v e}k, J.~Mandel and M.~Brezina,
+{\em Algebraic Multigrid by Smoothed Aggregation for Second and Fourth Order Elliptic Problems},
+Computing, 1996, 56, 179-196.
+%
+
 \end{thebibliography}
--- a/docs/pdf/building.tex
+++ b/docs/pdf/building.tex
@ -1,7 +1,7 @@
-\section{Configuring and Building MLD2P4\label{sec:configuring}}
-    - use of GNU autoconf and automake \\
-    - required base software (MPI, BLACS, BLAS, PSBLAS - specify versions) \\
-    - optional software (UMFPACK, SuperLU, SuperLUdist - specify versions and configure options) \\
-    - operating systems and compilers on which MLD2P4 has been successfully built \\
-    - are configuration options for debugging or profiling provided? \\
-    - directory tree \\
+\section{Configuring and Building MLD2P4\label{sec:configuring}}
+    - use of GNU autoconf and automake \\
+    - required base software (MPI, BLACS, BLAS, PSBLAS - specify versions) \\
+    - optional software (UMFPACK, SuperLU, SuperLUdist - specify versions and configure options) \\
+    - operating systems and compilers on which MLD2P4 has been successfully built \\
+    - are configuration options for debugging or profiling provided? \\
+    - directory tree \\
--- a/docs/pdf/conventions.tex
+++ b/docs/pdf/conventions.tex
@ -1,6 +1,6 @@
-\section{Notational Conventions\label{sec:conventions}}
-    - typefaces used in the guide (see the recent ML guide and the Aztec guide) \\
-    - naming conventions for routines (difference between high-level and medium-level),
-      data structures,\\
-      modules, constants, etc. (see the PSBLAS guide) \\
+\section{Notational Conventions\label{sec:conventions}}
+    - typefaces used in the guide (see the recent ML guide and the Aztec guide) \\
+    - naming conventions for routines (difference between high-level and medium-level),
+      data structures,\\
+      modules, constants, etc. (see the PSBLAS guide) \\
     - real and complex versions\\
--- a/docs/pdf/distribution.tex
+++ b/docs/pdf/distribution.tex
@ -1,41 +1,42 @@
-\section{Code Distribution\label{sec:distribution}}
-
-MLD2P4 is freely distributable under the following copyright
-terms:
-\begin{verbatim} 
-                         MLD2P4  version 1.0
-MultiLevel Domain Decomposition Parallel Preconditioners Package
-           based on PSBLAS (Parallel Sparse BLAS version 2.3)
-
-(C) Copyright 2008
-
-                    Salvatore Filippone  University of Rome Tor Vergata       
-                    Alfredo Buttari      University of Rome Tor Vergata
-                    Pasqua D'Ambra       ICAR-CNR, Naples
-                    Daniela di Serafino  Second University of Naples
-
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions
-are met:
-  1. Redistributions of source code must retain the above copyright
-     notice, this list of conditions and the following disclaimer.
-  2. Redistributions in binary form must reproduce the above copyright
-     notice, this list of conditions, and the following disclaimer in the
-     documentation and/or other materials provided with the distribution.
-  3. The name of the MLD2P4 group or the names of its contributors may
-     not be used to endorse or promote products derived from this
-     software without specific written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
-TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS
-BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGE.
-\end{verbatim}
+\section{Code Distribution\label{sec:distribution}}
+
+MLD2P4 is freely distributable under the following copyright
+terms: {\small
+\begin{verbatim} 
+                         MLD2P4  version 1.0
+MultiLevel Domain Decomposition Parallel Preconditioners Package
+           based on PSBLAS (Parallel Sparse BLAS version 2.3)
+
+(C) Copyright 2008
+
+                    Salvatore Filippone  University of Rome Tor Vergata       
+                    Alfredo Buttari      University of Rome Tor Vergata
+                    Pasqua D'Ambra       ICAR-CNR, Naples
+                    Daniela di Serafino  Second University of Naples
+
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+  1. Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+  2. Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions, and the following disclaimer in the
+     documentation and/or other materials provided with the distribution.
+  3. The name of the MLD2P4 group or the names of its contributors may
+     not be used to endorse or promote products derived from this
+     software without specific written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS
+BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+\end{verbatim}
+}
--- a/docs/pdf/errors.tex
+++ b/docs/pdf/errors.tex
@ -1,9 +1,9 @@
-\section{Error Handling}\label{sec:errors}
-
-Error handling
-    - Brief description, with a pointer to the PSBLAS guide
-
-%%% Local Variables: 
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End: 
+\section{Error Handling}\label{sec:errors}
+
+Error handling
+    - Brief description, with a pointer to the PSBLAS guide
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "userguide"
+%%% End: 
--- a/docs/pdf/gettingstarted.tex
+++ b/docs/pdf/gettingstarted.tex
@ -1,224 +1,231 @@
-\section{Getting Started\label{sec:started}}
-
-We describe the basics for building and applying MLD2P4 one-level and multi-level
-Schwarz preconditioners with the Krylov solvers included in PSBLAS \cite{}.
-The following five steps are required:
-\begin{enumerate}
-\item \emph{Allocate and initialize the preconditioner data structure, according to
-	a preconditioner type chosen by the user}. This is performed by the routine
-	\verb|mld_precinit|, which also sets a default preconditioner for each preconditioner
-	type selected by the user. The default preconditioner associated to each preconditioner
-	type is listed in Table~\ref{tab:precinit}; the string used by \verb|mld_precinit|
-	to identify each preconditioner type is also given. The preconditioner data structure is
-	the derived data type \verb|mld_prec_type|, which is accessed to the user only
-	through the MLD2P4 routines.
-\item \emph{Choose a specific variant of the selected preconditioner type, by setting
-  the preconditioner parameters.} This is performed by the routine \verb|mld_precset|.
-  A few examples concerning the use of \verb|mld_precset| are given in 
-  Section~\ref{sec:examples}; a complete list of all the
-  preconditioner parameters and their allowed values is provided in 
-  Section~\ref{sec:highlevel}. 
-\item \emph{Build the preconditioner for a given matrix.} This is performed by
-  the routine \verb|mld_precbld|.
-\item \emph{Apply the preconditioner at each iteration of a Krylov solver.}
-  This is performed by the routine \verb|mld_precaply|. When using the PSBLAS Krylov solvers,
-  this step is completely transparent to the user, since \verb|mld_precaply| is called
-  by the PSBLAS routine implementing the Krylov solver (\verb|psb_krylov|).
-\item \emph{Deallocate the preconditioner data structure}. This is performed by
-  the routine \verb|mld_precfree|. This step is complementary to step 1 and should
-  be performed when the preconditioner is no more used.
-\end{enumerate}
-A detailed description of the above routines is given in Section~\ref{sec:highlevel}.
-
-Note that the Fortran 95 module \verb|mld_prec_mod| must be used in the program
-calling the MLD2P4 routines. Furthermore, to apply MLD2P4 with the Krylov solvers
-from PSBLAS, the module \verb|psb_krylov_mod| must be used too.
-
-Two simple example programs showing the (basic) use of MLD2P4 are reported in
-Section~\ref{sec:examples}.
-
-\begin{table}[th]
-{
-\begin{center}
-\begin{tabular}{|l|l|p{6.7cm}|}
-\hline
-Type              & String & Default preconditioner \\ \hline
-No preconditioner &'NOPREC'& (Considered only to use the PSBLAS
-                             Krylov solvers with no preconditioner.) \\
-Diagonal          & 'DIAG' & --- \\
-Block Jacobi      & 'BJAC' & ILU(0) on the local blocks.\\ 
-Additive Schwarz  & 'AS'   & Restricted Additive Schwarz (RAS),
-                             with overlap 1 and ILU(0) on the local blocks. \\ 
-Multilevel        &'ML'    & Multi-level hybrid preconditioner (additive on the
-                             same level and multiplicative through the levels),
-                             with post-smoothing only. Number of levels: 2;
-                             post-smoother: block-Jacobi preconditioner, with ILU(0)
-                             on the local blocks; coarsest matrix: distributed among the
-                             processors; coarse-level solver: 4 sweeps of the
-                             block-Jacobi solver, with ILU(0) on the blocks. \\
-\hline
-\end{tabular}
-\end{center}
-}
-\caption{Preconditioner types and default choices.\label{tab:precinit}}
-\end{table}
-
-\subsection{Examples\label{sec:examples}}
-
-The simple code reported below shows how to set and apply the MLD2P4 default multi-level
-preconditioner, i.e.\ the two-level hybrid post-smoothed Schwarz preconditioner, using block-Jacobi with ILU(0) on the blocks as basic preconditioner,
-a coarse matrix distributed among the processors, and four block-Jacobi sweeps with ILU(0) on the blocks as approximate coarse-level solver. The choice of this preconditioner is made
-by simply specifying \verb|'ML'| as second argument of \verb|mld_precinit|
-(a call to \verb|mld_precset| is not needed).
-The preconditioner is applied within the BiCGSTAB solver provided by PSBLAS. 
-
-The part of the code concerning the
-reading and assembling of the sparse matrix and the right-hand side vector, performed
-through the PSBLAS routines for sparse matrix and vector management, is not reported
-here for brevity. Other statements concerning the use of PSBLAS are neglected too.
-The complete code can be found in the example program file \verb|example_2lev_default.f90|
-in the directory \textbf{XXXXXX (SPECIFICARE).} Note that the modules \verb|psb_base_mod|
-and \verb|psb_util_mod| at the beginning of the code are required by PSBLAS.
-For details on the use of the PSBLAS routines, see the PSBLAS User's Guide \cite{}.
-
-\begin{verbatim}
-  use psb_base_mod
-  use psb_util_mod 
-  use mld_prec_mod
-  use psb_krylov_mod
-... ...
-!
-! sparse matrix
-  type(psb_dspmat_type) :: A
-! sparse matrix descriptor
-  type(psb_desc_type)   :: DESC_A
-! preconditioner
-  type(mld_prec_type)  :: PRE
-... ...
-!
-! initialize the parallel environment
-  call psb_init(ictxt)
-  call psb_info(ictxt,iam,np)
-... ...
-!
-! read and assemble the matrix A and the right-hand
-! side b using PSBLAS routines for sparse matrix /
-! vector management
-... ...
-!
-! initialize the default multi-level preconditioner
-! (two-level hybrid post-smoothed Schwarz)
-  call mld_precinit(PRE,'ML',info)
-!
-! build the preconditioner
-  call psb_precbld(A,PRE,DESC_A,info)
-!
-! set the solver parameters and the initial guess
-  ... ...
-!
-! solve Ax=b with preconditioned BiCGSTAB
-  call psb_krylov('BICGSTAB',A,PRE,b,x,tol,DESC_A,info)
-  ... ...
-!
-! cleanup the preconditioner
-  call mld_precfree(PRE,info)
-!
-! cleanup other data structures
-  ... ...
-!
-! exit the parallel environment
-  call psb_exit(ictxt)
-  stop
-\end{verbatim}
-
-
-\textbf{REVISE THE WHOLE PART THAT FOLLOWS:\\
- only the statements that differ from the previous example (essentially the preconditioner setting, possibly with several calls to precset);\\
- keep the remark on the explicit specification of the number of levels;\\
- refer to the next section for an accurate description of all the parameters;\\
- keep the remark on legacy PSBLAS users.}\\
-
-In the following we describe the general procedure for setting and building one of the MLD2P4 preconditioners.
-The user has first to prepare the preconditioner data structure by using the routine \verb|mld_precinit|. Input parameters
-for this routine include a string parameter, needed to define the preconditioner type, and an optional integer parameter
-specifying the number of the levels in the case of a multi-level preconditioner.
-Note that if the optional parameter is not present and a multi-level preconditioner has been chosen,
-a two-level preconditioner is set. On the other hand, the integer parameter is ignored if the type of the preconditioner is not multilevel.
-In Table \ref{tab:precinit} we report both the possible choices for the preconditioner type
-and the related default preconditioners. 
-
-
-The user of MLD2P4 may set a lot of parameters for one-level and multi-level Schwarz, in order
-to define a different preconditioner than that of default choices. The parameters
-can be set through the routine \verb|mld_precset|. The APIs of \verb|mld_precinit| and  \verb|mld_precset| as well as the complete 
-list of the parameters that can be set with the corresponding allowed values are reported in Section \ref{sec:highlevel}. In the following a simple code
-for a three-level hybrid post-smoothed Schwarz preconditioner, using RAS with overlap 1 as local preconditioner,
-with ILU(0) on the local blocks, a distributed coarse matrix, four block-Jacobi sweeps with the UMFPACK LU
-factorization on the blocks as coarse-matrix solver, is reported. Note that for the multi-level preconditioners, the levels are numbered in increasing
-order starting from the finest one, i.e. level 1 is the finest level. 
-For more details, see the test program \verb|example2.f90| in xxxx (test directory).\\[0.5cm]
-
-\begin{verbatim}
-  use psb_base_mod
-  use psb_util_mod 
-  use mld_prec_mod
-  use psb_krylov_mod
-... ...
-!
-! sparse matrix
-  type(psb_dspmat_type) :: A
-! sparse matrix descriptor
-  type(psb_desc_type)   :: DESC_A
-! preconditioner data
-  type(mld_dprec_type)  :: PRE
-... ...
-!
-! initialization of the parallel environment
-
-  call psb_init(ictxt)
-  call psb_info(ictxt,iam,np)
-... ...
-! read and assemble the matrix A and the right-hand
-! side vector b using PSBLAS routines for sparse
-! matrix/vector management
-... ...
-! prepare the three-level hybrid post-smoothed Schwarz
-! using RAS with overlap 1 as local preconditioner
-!
-  call mld_precinit(PRE,'ML',info,nlev=3)
-  call mld_precset(PRE,mld_n_ovr_,1,info,ilev=1)
-  call mld_precset(PRE,mld_sub_restr_,psb_halo_,info,ilev=1)
-! NOTE: "PSB_HALO_" is really ugly; constants should have the MLD prefix!
-!
-! build preconditioner
-  call psb_precbld(A,PRE,DESC_A,info)
-!
-! set solver parameters and initial guess
-  ... ...
-! solve Ax=b with preconditioned BiCGSTAB
-
-  call psb_krylov('BICGSTAB',A,PRE,b,x,tol,DESC_A,info)
-  ... ...
-!  
-!  cleanup storage and exit
-!
-  call mld_precfree(PRE,info)
-!
-  call psb_gefree(b,DESC_A,info)
-  call psb_gefree(x,DESC_A,info)
-  call psb_spfree(A,DESC_A,info)
-  call psb_cdfree(DESC_A,info)
-!
-  call psb_exit(ictxt)
-  stop
-
-\end{verbatim}
-
-{\bf Remark for users with PSBLAS-based legacy codes:} when MLD2P4 is installed, a PSBLAS user, with a PSBLAS-based legacy code 
-calling base preconditioners included in PSBLAS (NOPREC, DIAG and BJAC), is able to use the same preconditioners without changes to the code, if she/he
-includes in her/his program the file \verb|psb_prec_mod|.
-
-%%% Local Variables: 
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End: 
+\section{Getting Started\label{sec:started}}
+
+We describe the basics for building and applying MLD2P4 one-level and multi-level
+Schwarz preconditioners with the Krylov solvers included in PSBLAS \cite{}.
+The following steps are required:
+\begin{enumerate} 
+\item \emph{Declare the preconditioner data structure}. It is a derived data type,
+  \verb|mld_|\emph{x}\verb|prec_type|, where \emph{x} may be \verb|s|, \verb|d|, \verb|c|
+	or \verb|z|, according to the basic data type of the sparse matrix
+	(\verb|s| = real single precision; \verb|d| = real double precision;
+	\verb|c| = complex single precision; \verb|z| = complex double precision).
+	This data structure is accessed by the user only through the MLD2P4 routines,
+	following an object-oriented approach.
+\item \emph{Allocate and initialize the preconditioner data structure, according to
+	a preconditioner type chosen by the user}. This is performed by the routine
+	\verb|mld_precinit|, which also sets a default preconditioner for each preconditioner
+	type selected by the user. The default preconditioner associated with each preconditioner
+	type is listed in Table~\ref{tab:precinit}; the string used by \verb|mld_precinit|
+	to identify each preconditioner type is also given.
+\item \emph{Choose a specific preconditioner within the selected preconditioner type, by setting
+  the preconditioner parameters.} This is performed by the routine \verb|mld_precset|.
+  A few examples concerning the use of \verb|mld_precset| are given in 
+  Section~\ref{sec:examples}; a complete list of all the
+  preconditioner parameters and their allowed values is provided in 
+  Section~\ref{sec:highlevel}. 
+\item \emph{Build the preconditioner for a given matrix.} This is performed by
+  the routine \verb|mld_precbld|.
+\item \emph{Apply the preconditioner at each iteration of a Krylov solver.}
+  This is performed by the routine \verb|mld_precaply|. When using the PSBLAS Krylov solvers,
+  this step is completely transparent to the user, since \verb|mld_precaply| is called
+  by the PSBLAS routine implementing the Krylov solver (\verb|psb_krylov|).
+\item \emph{Deallocate the preconditioner data structure}. This is performed by
+  the routine \verb|mld_precfree|. This step is complementary to step 1 and should
+  be performed when the preconditioner is no longer needed.
+\end{enumerate}
+A detailed description of the above routines is given in Section~\ref{sec:highlevel}.
+
+Note that the Fortran 95 module \verb|mld_prec_mod| must be used in the program
+calling the MLD2P4 routines. Furthermore, to apply MLD2P4 with the Krylov solvers
+from PSBLAS, the module \verb|psb_krylov_mod| must be used too.
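+
+As a minimal sketch of the declarations implied by the requirements above,
+the fragment below declares a double-precision preconditioner. The variable
+name \verb|PRE| is only illustrative, and the commented-out lines show the
+derived types obtained from the naming rule
+\verb|mld_|\emph{x}\verb|prec_type| for the other data types.
+
+\begin{verbatim}
+  use mld_prec_mod
+  use psb_krylov_mod
+... ...
+! real double precision preconditioner
+  type(mld_dprec_type)  :: PRE
+! other data types, following the naming rule:
+! type(mld_sprec_type)  :: PRE   ! real single precision
+! type(mld_cprec_type)  :: PRE   ! complex single precision
+! type(mld_zprec_type)  :: PRE   ! complex double precision
+\end{verbatim}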
+
+Examples showing the basic use of MLD2P4 are reported in Section~\ref{sec:examples}.
+
+\begin{table}[th]
+{
+\begin{center}
+\begin{tabular}{|l|l|p{6.7cm}|}
+\hline
+Type              & String        & Default preconditioner \\ \hline
+No preconditioner &\verb|'NOPREC'|& (Considered only to use the PSBLAS
+                                    Krylov solvers with no preconditioner.) \\
+Diagonal          & \verb|'DIAG'| & --- \\
+Block Jacobi      & \verb|'BJAC'| & Block Jacobi with ILU(0) on the local blocks.\\ 
+Additive Schwarz  & \verb|'AS'|   & Restricted Additive Schwarz (RAS),
+                                    with overlap 1 and ILU(0) on the local blocks. \\ 
+Multilevel        &\verb|'ML'|    & Multi-level hybrid preconditioner (additive on the
+                                    same level and multiplicative through the levels),
+                                    with post-smoothing only. Number of levels: 2;
+                                    post-smoother: block-Jacobi preconditioner with ILU(0)
+                                    on the local blocks; coarsest matrix: distributed among the
+                                    processors; coarse-level solver: 4 sweeps of the
+                                    block-Jacobi solver, with ILU(0) on the blocks. \\
+\hline
+\end{tabular}
+\end{center}
+}
+\caption{Preconditioner types and default choices.\label{tab:precinit}}
+\end{table}
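+
+For illustration, the calls below select each preconditioner type of
+Table~\ref{tab:precinit} through its string; they are alternatives, only one
+of which would appear in a program. The names \verb|PRE| and \verb|info|
+follow the examples in Section~\ref{sec:examples} and are otherwise arbitrary.
+
+\begin{verbatim}
+! no preconditioner
+  call mld_precinit(PRE,'NOPREC',info)
+! diagonal preconditioner
+  call mld_precinit(PRE,'DIAG',info)
+! block Jacobi with ILU(0) on the local blocks
+  call mld_precinit(PRE,'BJAC',info)
+! RAS with overlap 1 and ILU(0) on the local blocks
+  call mld_precinit(PRE,'AS',info)
+! two-level hybrid post-smoothed Schwarz
+  call mld_precinit(PRE,'ML',info)
+\end{verbatim}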
+
+\subsection{Examples\label{sec:examples}}
+
+The code reported below shows how to set and apply the MLD2P4 default multi-level
+preconditioner, i.e.\ the two-level hybrid post-smoothed Schwarz preconditioner,
+using block-Jacobi with ILU(0) on the blocks as basic preconditioner,
+a coarse matrix distributed among the processors, and four block-Jacobi
+sweeps with ILU(0) on the blocks as approximate coarse-level solver.
+The choice of this preconditioner is made
+by simply specifying \verb|'ML'| as second argument of \verb|mld_precinit|
+(a call to \verb|mld_precset| is not needed).
+The preconditioner is applied within the BiCGSTAB solver provided by PSBLAS. 
+
+The part of the code concerning the
+reading and assembling of the sparse matrix and the right-hand side vector, performed
+through the PSBLAS routines for sparse matrix and vector management, is not reported
+here for brevity. Other statements concerning the use of PSBLAS are omitted too.
+The complete code can be found in the example program file \verb|example_2lev_default.f90|
+in the directory \textbf{XXXXXX (TO BE SPECIFIED).} Note that the modules \verb|psb_base_mod|
+and \verb|psb_util_mod| at the beginning of the code are required by PSBLAS.
+For details on the use of the PSBLAS routines, see the PSBLAS User's Guide \cite{}.
+
+\begin{verbatim}
+  use psb_base_mod
+  use psb_util_mod 
+  use mld_prec_mod
+  use psb_krylov_mod
+... ...
+!
+! sparse matrix
+  type(psb_dspmat_type) :: A
+! sparse matrix descriptor
+  type(psb_desc_type)   :: DESC_A
+! preconditioner
+  type(mld_dprec_type)  :: PRE
+... ...
+!
+! initialize the parallel environment
+  call psb_init(ictxt)
+  call psb_info(ictxt,iam,np)
+... ...
+!
+! read and assemble the matrix A and the right-hand
+! side b using PSBLAS routines for sparse matrix /
+! vector management
+... ...
+!
+! initialize the default multi-level preconditioner
+! (two-level hybrid post-smoothed Schwarz)
+  call mld_precinit(PRE,'ML',info)
+!
+! build the preconditioner
+  call mld_precbld(A,PRE,DESC_A,info)
+!
+! set the solver parameters and the initial guess
+  ... ...
+!
+! solve Ax=b with preconditioned BiCGSTAB
+  call psb_krylov('BICGSTAB',A,PRE,b,x,tol,DESC_A,info)
+  ... ...
+!
+! cleanup the preconditioner
+  call mld_precfree(PRE,info)
+!
+! cleanup other data structures
+  ... ...
+!
+! exit the parallel environment
+  call psb_exit(ictxt)
+  stop
+\end{verbatim}
+
+
+\textbf{MODIFY THE WHOLE PART THAT FOLLOWS:\\
+- only the instructions that differ from the previous example (essentially the preconditioner setting, possibly with several calls to precset);\\
+- keep the remark on the explicit specification of the number of levels;\\
+- refer to the next section for an accurate description of all the parameters;\\
+- keep the remark for long-time PSBLAS users.}\\
+
+In the following we describe the general procedure for setting and building one of the MLD2P4 preconditioners.
+The user must first prepare the preconditioner data structure by calling the routine \verb|mld_precinit|. The input parameters
+of this routine include a string, which defines the preconditioner type, and an optional integer
+specifying the number of levels of a multi-level preconditioner.
+If the optional parameter is not present and a multi-level preconditioner has been chosen,
+a two-level preconditioner is set. The integer parameter is ignored if the preconditioner type is not multi-level.
+Table~\ref{tab:precinit} lists the available preconditioner types
+and the related default preconditioners.
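+
+A short sketch of how the optional argument behaves, using the keyword
+\verb|nlev| as it appears in the example later in this section. The three calls
+are alternatives, and the names \verb|PRE| and \verb|info| are only illustrative.
+
+\begin{verbatim}
+! 'ML' without nlev: a two-level preconditioner is set
+  call mld_precinit(PRE,'ML',info)
+! 'ML' with nlev=3: a three-level preconditioner is set
+  call mld_precinit(PRE,'ML',info,nlev=3)
+! type is not multi-level: nlev is ignored
+  call mld_precinit(PRE,'AS',info,nlev=3)
+\end{verbatim}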
+
+
+The user of MLD2P4 may set several parameters of the one-level and multi-level Schwarz preconditioners, in order
+to define a preconditioner different from the default one. The parameters
+are set through the routine \verb|mld_precset|. The interfaces of \verb|mld_precinit| and \verb|mld_precset|, as well as the complete
+list of the parameters that can be set and their allowed values, are reported in Section~\ref{sec:highlevel}. The code below
+refers to a three-level hybrid post-smoothed Schwarz preconditioner, using RAS with overlap 1 as local preconditioner,
+with ILU(0) on the local blocks, a distributed coarse matrix, and four block-Jacobi sweeps with the UMFPACK LU
+factorization on the blocks as coarse-matrix solver. Note that for the multi-level preconditioners the levels are numbered in increasing
+order starting from the finest one, i.e.\ level 1 is the finest level.
+For more details, see the test program \verb|example2.f90| in xxxx (test directory).\\[0.5cm]
+
+\begin{verbatim}
+  use psb_base_mod
+  use psb_util_mod 
+  use mld_prec_mod
+  use psb_krylov_mod
+... ...
+!
+! sparse matrix
+  type(psb_dspmat_type) :: A
+! sparse matrix descriptor
+  type(psb_desc_type)   :: DESC_A
+! preconditioner data
+  type(mld_dprec_type)  :: PRE
+... ...
+!
+! initialization of the parallel environment
+
+  call psb_init(ictxt)
+  call psb_info(ictxt,iam,np)
+... ...
+! read and assemble the matrix A and the right-hand
+! side vector b using PSBLAS routines for sparse
+! matrix/vector management
+... ...
+! prepare the three-level hybrid post-smoothed Schwarz
+! using RAS with overlap 1 as local preconditioner
+!
+  call mld_precinit(PRE,'ML',info,nlev=3)
+  call mld_precset(PRE,mld_n_ovr_,1,info,ilev=1)
+  call mld_precset(PRE,mld_sub_restr_,psb_halo_,info,ilev=1)
+! NOTE: "PSB_HALO_" IS REALLY UGLY; WE SHOULD HAVE CONSTANTS WITH THE MLD PREFIX!
+!
+! build preconditioner
+  call mld_precbld(A,PRE,DESC_A,info)
+!
+! set solver parameters and initial guess
+  ... ...
+! solve Ax=b with preconditioned BiCGSTAB
+
+  call psb_krylov('BICGSTAB',A,PRE,b,x,tol,DESC_A,info)
+  ... ...
+!  
+!  cleanup storage and exit
+!
+  call mld_precfree(PRE,info)
+!
+  call psb_gefree(b,DESC_A,info)
+  call psb_gefree(x,DESC_A,info)
+  call psb_spfree(A,DESC_A,info)
+  call psb_cdfree(DESC_A,info)
+!
+  call psb_exit(ictxt)
+  stop
+
+\end{verbatim}
+
+{\bf Remark for users with PSBLAS-based legacy codes:} when MLD2P4 is installed, a legacy PSBLAS-based code
+calling the base preconditioners included in PSBLAS (NOPREC, DIAG and BJAC) can use the same preconditioners without changes to the code, provided that
+the program includes the file \verb|psb_prec_mod|.
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "userguide"
+%%% End: 
--- a/docs/pdf/listofroutines.tex
+++ b/docs/pdf/listofroutines.tex
@ -1,10 +1,10 @@
-\section{List of Routines}\label{sec:routines}
-
-   Elenco (ordine alfabetico) di tutte le routine, con rinvio (ipertestuale e num. pag.) alla descrizione
-     di ciascuna in qualche paragrafo precedente
-     (una specie di indice analitico, che rimanda alle routine descritte precedentemente nei rispettivi paragrafi)
-
-%%% Local Variables: 
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End: 
+\section{List of Routines}\label{sec:routines}
+
+   Alphabetical list of all the routines, with a cross-reference (hyperlink and page number) to the description
+     of each one in an earlier section
+     (a sort of index pointing to the routines described earlier in their respective sections)
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "userguide"
+%%% End: 
--- a/docs/pdf/overview.tex
+++ b/docs/pdf/overview.tex
@ -1,62 +1,62 @@
-\section{General Overview\label{sec:overview}}
-
-The \emph{Multi-Level Domain Decomposition Parallel Preconditioners Package based on
-PSBLAS (MLD2P4}) provides various versions of multi-level Schwarz preconditioners~\cite{DD2},
-to be used in the iterative solutions of sparse linear systems $Ax=b$, where
-$A$ is a square, real or complex, sparse matrix with a symmetric sparsity pattern.
-\textbf{Ma non abbiamo detto che, se il pattern di sparista' non e' simmetrico,
-lavoriamo su $(A+A^T)/2$? Ma questo vale solo per l'aggregazione? Dovremmo fare
-qualcosa di consistente anche con 1-lev Schwarz.}
-Both additive and hybrid preconditioners, i.e.\ multiplicative among the levels
-and additive inside a level, are implemented; the basic additive Schwarz preconditioners
-are obtained by considering only one level. A purely algebraic approach is used to
-generate a sequence of coarse-level corrections to a basic preconditioner, without
-explicitly using any information on the geometry of the original problem (e.g.\ the
-discretization of a PDE). The smoothed aggregation technique is applied
-as algebraic coarsening strategy~\cite{}.
-
-The package is written in Fortran~95, using object-oriented techniques,
-and is based on a distributed-memory parallel programming paradigm. \textbf{SALVATORE,
-potresti aggiungere due righe sulla scelta del Fortran 95 e sul semplice interfacciamento
-con i legacy codes, senza ripetere quello che e' detto sotto sulla scelta di PSBLAS?}
-Single and double precision implementations of MLD2P4 are available for both the
-real and the complex case, that can be used through a single interface.
-\textbf{SALVATORE, funziona tutto?}
-
-MLD2P4 has been designed to implement scalable and easy-to-use multilevel preconditioners
-in the context of the PSBLAS (Parallel Sparse BLAS) computational framework~\cite{}.
-PSBLAS is a library originally developed to address the parallel implementation of
-iterative solvers for sparse linear system, by providing basic linear algebra
-operators and data management facilities for distributed sparse matrices; it
-also includes parallel Krylov solvers, built on the top of the basic PSBLAS kernels.
-The preconditioners available in MLD2P4 can be used with these Krylov solvers.
-The choice of PSBLAS has been mainly motivated by the need of having
-a portable and efficient software infrastructure implementing ``de facto'' standard
-parallel sparse linear algebra kernels, to pursue goals such as performance,
-portability, modularity ed extensibility in the development of the preconditioner
-package. On the other hand, the implementation of MLD2P4 has led to some
-revisions and extentions of the PSBLAS kernels, leading to the
-recent PSBLAS 2.0 version~\cite{}. The inter-process comunication required
-by MLD2P4 is encapsulated into the PSBLAS routines, except few cases where
-MPI~\cite{} is explicitly called. Therefore, MLD2P4 can be run on any parallel
-machine where PSBLAS and MPI implementations are available.
-
-MLD2P4 has a layered and modular software architecture where three main layers can be identified. The lower layer consists of the PSBLAS kernels, the middle one implements
-the construction and application phases of the preconditioners, and the upper one
-provides a uniform and easy-to-use interface to all the preconditioners. 
-This architecture allows for different levels of use of the package:
-few black-box routines at the upper level allow non-expert users to easily
-build any preconditioner available in MLD2P4 and to apply it within a PSBLAS Krylov solver.
-On the other hand, the routines of the middle and lower layer can be used and extended
-by expert users to build new versions of multi-level Schwarz preconditioners.\\
-
-\textbf{Organizzazione della guida:\\
-dire che per il momento non
-forniamo anche la documentazione del middle layer, ma lo faremo in seguito\\}
-
-\textbf{Evidenziare le parole chiave che caratterizzano il nostro package}
-
-%%% Local Variables: 
-%%% mode: latex
-%%% TeX-master: "userguide"
-%%% End: 
+\section{General Overview\label{sec:overview}}
+
+The \emph{Multi-Level Domain Decomposition Parallel Preconditioners Package based on
+PSBLAS} (MLD2P4) provides various versions of multi-level Schwarz preconditioners~\cite{DD2},
+to be used in the iterative solution of sparse linear systems $Ax=b$, where
+$A$ is a square, real or complex, sparse matrix with a symmetric sparsity pattern.
+\textbf{But didn't we say that, if the sparsity pattern is not symmetric,
+we work on $(A+A^T)/2$? Or does this hold only for the aggregation? We should do
+something consistent for 1-lev Schwarz too.}
+Both additive and hybrid preconditioners, i.e.\ multiplicative among the levels
+and additive inside a level, are implemented; the basic additive Schwarz preconditioners
+are obtained by considering only one level. A purely algebraic approach is used to
+generate a sequence of coarse-level corrections to a basic preconditioner, without
+explicitly using any information on the geometry of the original problem (e.g.\ the
+discretization of a PDE). The smoothed aggregation technique is applied
+as algebraic coarsening strategy~\cite{}.
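+
+As an illustration of the hybrid scheme, consider two levels with post-smoothing only.
+The notation is introduced only for this sketch: $M_{1L}^{-1}$ denotes a basic
+one-level preconditioner and $M_C^{-1}$ a coarse-level correction.
+Applying the coarse-level correction to a vector $v$ and then smoothing the result gives
+\begin{eqnarray*}
+w & = & M_C^{-1} v, \\
+z & = & w + M_{1L}^{-1} (v - A w),
+\end{eqnarray*}
+so that $z = M^{-1} v$ with
+\[
+M^{-1} = M_{1L}^{-1} + \left( I - M_{1L}^{-1} A \right) M_C^{-1}.
+\]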
+
+The package is written in Fortran~95, using object-oriented techniques,
+and is based on a distributed-memory parallel programming paradigm. \textbf{SALVATORE,
+could you add a couple of lines on the choice of Fortran 95 and on the easy interfacing
+with legacy codes, without repeating what is said below about the choice of PSBLAS?}
+Single and double precision implementations of MLD2P4 are available for both the
+real and the complex case, and can be used through a single interface.
+\textbf{SALVATORE, does everything work?}
+
+MLD2P4 has been designed to implement scalable and easy-to-use multilevel preconditioners
+in the context of the PSBLAS (Parallel Sparse BLAS) computational framework~\cite{}.
+PSBLAS is a library originally developed to address the parallel implementation of
+iterative solvers for sparse linear systems, by providing basic linear algebra
+operators and data management facilities for distributed sparse matrices; it
+also includes parallel Krylov solvers, built on top of the basic PSBLAS kernels.
+The preconditioners available in MLD2P4 can be used with these Krylov solvers.
+The choice of PSBLAS has been mainly motivated by the need for
+a portable and efficient software infrastructure implementing ``de facto'' standard
+parallel sparse linear algebra kernels, to pursue goals such as performance,
+portability, modularity and extensibility in the development of the preconditioner
+package. On the other hand, the implementation of MLD2P4 has led to some
+revisions and extensions of the PSBLAS kernels, leading to the
+recent PSBLAS 2.0 version~\cite{}. The inter-process communication required
+by MLD2P4 is encapsulated into the PSBLAS routines, except in a few cases where
+MPI~\cite{} is explicitly called. Therefore, MLD2P4 can be run on any parallel
+machine where PSBLAS and MPI implementations are available.
+
+MLD2P4 has a layered and modular software architecture where three main layers can be identified. The lower layer consists of the PSBLAS kernels, the middle one implements
+the construction and application phases of the preconditioners, and the upper one
+provides a uniform and easy-to-use interface to all the preconditioners.
+This architecture allows for different levels of use of the package:
+a few black-box routines at the upper level allow non-expert users to easily
+build any preconditioner available in MLD2P4 and to apply it within a PSBLAS Krylov solver.
+On the other hand, the routines of the middle and lower layers can be used and extended
+by expert users to build new versions of multi-level Schwarz preconditioners.\\
+
+\textbf{Organization of the guide:\\
+say that for the moment we do not
+provide the documentation of the middle layer as well, but we will do so later\\}
+
+\textbf{Highlight the keywords that characterize our package}
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "userguide"
+%%% End: 
--- a/docs/pdf/title.tex
+++ b/docs/pdf/title.tex
@ -5,37 +5,46 @@

 \ifx\pdfoutput\undefined % We're not running pdftex
 \else
-\pdfbookmark{MLD2P4-1.0 User's Guide}{title}
+\pdfbookmark{MLD2P4 User's and Reference Guide}{title}
 \fi
-\newlength{\centeroffset}
-\setlength{\centeroffset}{-0.5\oddsidemargin}
-\addtolength{\centeroffset}{0.5\evensidemargin}
+%\newlength{\centeroffset}
+%\setlength{\centeroffset}{-0.5\oddsidemargin}
+%\addtolength{\centeroffset}{0.5\evensidemargin}
 %\addtolength{\textwidth}{-\centeroffset}
 \thispagestyle{empty}
 \vspace*{\stretch{1}}
 \noindent\hspace*{\centeroffset}\makebox[0pt][l]{\begin{minipage}{\textwidth}
 \flushright
-{\Huge\bfseries MLD2P4-1.0 User's guide
+{\Huge\bfseries MLD2P4\\[.8ex] User's and Reference Guide
 }
 \noindent\rule[-1ex]{\textwidth}{5pt}\\[2.5ex]
-\hfill\emph{\Large A reference guide for the MultiLevel Domain
-  Decomposition Parallel Preconditioners Package based on Parallel Sparse BLAS}
+\hfill\emph{\Large A guide for the Multi-Level Domain Decomposition \\[.6ex]
+Parallel Preconditioners Package
+based on PSBLAS}
+\end{minipage}}
+
+
+\vspace{\stretch{1}}
+\noindent\hspace*{\centeroffset}\makebox[0pt][l]{\begin{minipage}{\textwidth}
+\flushright
+{\large\bfseries Pasqua D'Ambra}\\
+\large ICAR-CNR, Naples, Italy\\[3ex]
+{\large\bfseries Daniela di Serafino}\\
+\large Second University of Naples, Italy\\[3ex]
+{\large\bfseries Salvatore Filippone} \\
+\large University of Rome ``Tor Vergata'', Italy 
+%\\[10ex]
+%\today
 \end{minipage}}

 \vspace{\stretch{1}}
 \noindent\hspace*{\centeroffset}\makebox[0pt][l]{\begin{minipage}{\textwidth}
 \flushright
-{\bfseries 
-by Salvatore Filippone\\
-   Alfredo Buttari} \\
-University of Rome ``Tor Vergata'' \\[3ex]
-{\bfseries Daniela di Serafino }\\
-Second University of Naples\\[3ex]
-{\bfseries Pasqua D'Ambra}\\
-ICAR-CNR, Naples\\[3ex]
+\large Software version: 1.0\\
 \today
 \end{minipage}}

+
 %\addtolength{\textwidth}{\centeroffset}
 \vspace{\stretch{2}}

@ -48,4 +57,3 @@ ICAR-CNR, Naples\\[3ex]
 % mode: latex
 % mode: flyspell
 % End:
-
--- a/docs/pdf/userguide.tex
+++ b/docs/pdf/userguide.tex
@ -35,6 +35,14 @@
 %  /URI (http://ce.uniroma2.it/psblas)
 } 

+\setlength\oddsidemargin{.7in}
+\setlength\evensidemargin{.7in}
+\newlength{\centeroffset}
+\setlength{\centeroffset}{0.5\oddsidemargin}
+\addtolength{\centeroffset}{0.5\evensidemargin}
+\addtolength{\textwidth}{-\centeroffset}
+\pagestyle{myheadings}
+
 \newcounter{subroutine}[subsection]
 \newcounter{example}[subroutine]
 \makeatletter
--- a/docs/userguide.pdf
+++ b/docs/userguide.pdf