mld2p4:

docs/pdf/background.tex docs/pdf/bibliography.tex docs/pdf/building.tex docs/pdf/conventions.tex docs/pdf/distribution.tex docs/pdf/errors.tex docs/pdf/gettingstarted.tex docs/pdf/highlevelview.tex docs/pdf/intro.tex docs/pdf/methods.tex docs/pdf/overview.tex docs/pdf/precs.tex docs/pdf/userguide.tex docs/pdf/userinterface.tex docs/userguide.pdf further fixes to docs.
17 years ago · 0b62918e37
parent 70c2e5400e
commit 0b62918e37
15 changed files with 3542 additions and 2826 deletions
--- a/docs/pdf/background.tex
+++ b/docs/pdf/background.tex
@ -1,4 +1,6 @@
 \section{Multi-level Domain Decomposition Background\label{sec:background}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:background} Background}}

 \emph{Domain Decomposition} (DD) preconditioners, coupled with Krylov iterative
 solvers, are widely used in the parallel solution of large and sparse linear systems.
@ -56,12 +58,12 @@ interpolation \cite{StubenGMD69_99}.

 MLD2P4 uses a pure algebraic approach for building the sequence of coarse matrices
 starting from the original matrix. The algebraic approach is based on the \emph{smoothed 
-aggregation} algorithm \cite{Brezina_Vanek_,Vanek_Mandel_Brezina_}. A decoupled version
+aggregation} algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}. A decoupled version
 of this algorithm is implemented, where the smoothed aggregation is applied locally
-to each submatrix \cite{Tuminaro_Tong_00}. In the next two subsections we provide
+to each submatrix \cite{TUMINARO_TONG}. In the next two subsections we provide
 a brief description of the multi-level Schwarz preconditioners and of the smoothed
 aggregation technique as implemented in MLD2P4. For further details the user
-is referred to \cite{para_04,apnum_07,aaecc_07,dd2_96}.
+is referred to \cite{para_04,aaecc_07,apnum_07,dd2_96}.


 \subsection{Multi-level Schwarz Preconditioners\label{sec:multilevel}}
@ -112,7 +114,7 @@ three steps:
 A variant of the classical AS preconditioner that outperforms it
 in terms of both convergence rate and computation and communication
 time on parallel distributed-memory computers is the so-called \emph{Restricted AS
-(RAS)} preconditioner~\cite{Cai_Sarkis,Efstathiou_Gander}. It
+(RAS)} preconditioner~\cite{CAI_SARKIS,EFSTATHIOU}. It
 is obtained by zeroing the components of $w_i$ corresponding to the
 overlapping vertices when applying the prolongation. Therefore,
 RAS differs from classical AS by the prolongation operators,
@ -209,20 +211,63 @@ coarse-level system. The corresponding preconditioners are called \emph{multi-le
 One more reason for the multi-level approach is that it may significantly
 reduce the computational cost of preconditioning with respect to the two-level case
 (see \cite[Chapter 3]{dd2_96}). Additive and hybrid multilevel preconditioners
-are obtained as direct extensions of the two-level counterparts. Other combinations
-of the smoothers and coarse-level corrections are possible, leading to variants
+are obtained as direct extensions of the two-level counterparts. 
+The algorithm for applying a multi-level version of the two-level hybrid 
+post-smoothed preconditioner is reported in Figure~\ref{fig:mlhpost_alg}.
+Other combinations of the smoothers and coarse-level corrections are possible, leading to variants
 of the previous algorithms. For a detailed description of them, the reader is
 referred to \cite[Chapter 3]{dd2_96}.
-\textbf{ALGORITHMIC DESCRIPTION, as an example,
-of a multilevel preconditioner, e.g.\ the hybrid one with pre- or post-smoothing,
-along the lines of the description in Figure 1 of the Trilinos ML 4.0 guide.}
+% 
+\begin{figure}[t]
+\begin{center}
+\framebox{
+\begin{minipage}{.85\textwidth} {\small
+\begin{tabbing}
+\quad \=\quad \=\quad \=\quad \\[-1mm]
+! assigned the finest matrix\\
+$A_1 \leftarrow A$;\\[1mm]
+! defined the number of levels $nlev$ \\[1mm]
+! defined $nlev-1$ prolongators\\
+$R_l^T, l=2, \ldots, nlev$;\\[1mm]
+! defined $nlev-1$ coarser matrices\\
+$A_l \leftarrow R_lA_{l-1}R_l^T, \; l=2, \ldots, nlev$;\\[1mm]
+! defined the $nlev-1$ basic Schwarz preconditioners\\
+$M_l$, basic preconditioner for $A_l, \; l=1, \ldots, nlev-1$;\\[1mm]
+! assigned a vector $v$\\
+$v_1 \leftarrow v$; \\[2mm]
+\textbf{for $l=2, nlev$ do}\\[1mm]
+\> ! transfer $v_{l-1}$ to the next coarser level\\
+\>  $v_l \leftarrow R_lv_{l-1}$; \\[1mm]
+\textbf{endfor} \\[2mm]
+! apply the coarsest-level correction\\[1mm]
+$y_{nlev} \leftarrow A_{nlev}^{-1}*v_{nlev}$;\\[2mm]
+\textbf{for $l=nlev -1 , 1, -1$ do}\\[1mm]
+\> ! transfer $y_{l+1}$ to the next finer level\\
+\> $y_l \leftarrow R_{l+1}^T*y_{l+1}$;\\[1mm]
+\> ! compute the residual at the current level\\
+\> $r_l \leftarrow v_l-A_l*y_l$;\\[1mm]
+\> ! apply the basic Schwarz preconditioner to $r_l$\\
+\> $r_l \leftarrow M_l^{-1}*r_l$;\\[1mm]
+\> ! update $y_l$\\
+\> $y_l \leftarrow y_l+r_l$;\\
+\textbf{endfor} \\[1mm]
+! preconditioned vector\\
+$w \leftarrow y_1$;
+\end{tabbing}
+}
+\end{minipage}
+}
+\caption{Multi-level hybrid post-smoothed preconditioner.\label{fig:mlhpost_alg}}
+\end{center}
+\end{figure}
+%
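The loop structure of the multi-level hybrid post-smoothed algorithm in the figure above can be checked with a small Python sketch. It assumes NumPy, small dense matrices, Galerkin coarse matrices $A_l = R_l A_{l-1} R_l^T$, an exact dense solve at the coarsest level, and point Jacobi as a stand-in for the basic Schwarz preconditioners $M_l$; the residual at each level is computed as $r_l = v_l - A_l y_l$. The functions \verb|build_hierarchy|, \verb|ml_hybrid_post| and \verb|jacobi| are hypothetical illustrations, not MLD2P4 routines.

```python
import numpy as np


def build_hierarchy(A, R_list):
    """Galerkin coarse matrices A_l = R_l A_{l-1} R_l^T.

    R_list[k] plays the role of R_{k+1}; R_list[0] is unused (None).
    """
    A_list = [A]
    for R in R_list[1:]:
        A_list.append(R @ A_list[-1] @ R.T)
    return A_list


def ml_hybrid_post(A_list, R_list, Minv_list, v):
    """Apply the multi-level hybrid post-smoothed preconditioner to v.

    Python index k corresponds to level k+1 of the figure: A_list[k] = A_{k+1},
    R_list[k] = R_{k+1}, and Minv_list[k] applies M_{k+1}^{-1}.
    """
    nlev = len(A_list)
    # Transfer v down to the coarsest level: v_l <- R_l v_{l-1}.
    vs = [v]
    for k in range(1, nlev):
        vs.append(R_list[k] @ vs[k - 1])
    # Coarsest-level correction, here an exact dense solve.
    y = np.linalg.solve(A_list[-1], vs[-1])
    # Go back up, post-smoothing the residual at each finer level.
    for k in range(nlev - 2, -1, -1):
        y = R_list[k + 1].T @ y          # y_l <- R_{l+1}^T y_{l+1}
        r = vs[k] - A_list[k] @ y        # r_l <- v_l - A_l y_l
        y = y + Minv_list[k](r)          # y_l <- y_l + M_l^{-1} r_l
    return y                             # w <- y_1


def jacobi(A):
    """Point-Jacobi stand-in for the basic Schwarz preconditioner M_l."""
    d = np.diag(A).copy()
    return lambda r: r / d
```

For a three-level hierarchy, pass one more restriction in \verb|R_list| and one more $M_l^{-1}$ in \verb|Minv_list|; with a single level the sketch reduces to an exact solve with $A_1$.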


 \subsection{Smoothed Aggregation\label{sec:aggregation}}

 To define the restriction operator $R_C$, which is used to compute
 the coarse-level matrix $A_C$, MLD2P4 uses the \emph{smoothed aggregation}
-algorithm described in \cite{Brezina_Vanek_,Vanek_Mandel_Brezina_}.
+algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
 The basic idea of this algorithm is to build a coarse set of vertices
 $W_C$ by suitably grouping the vertices of $W$ into disjoint subsets
 (aggregates), and to define the coarse-to-fine space transfer operator $R_C^T$ by
@ -238,7 +283,7 @@ Three main steps can be identified in the smoothed aggregation procedure:
 %\textbf{NOTE: Check what Trilinos does after the first step.}
 
 To perform the coarsening step, we have implemented the aggregation algorithm sketched
-in \cite{apnum_07}. According to \cite{brezina_vanek}, a modification of this algorithm
+in \cite{apnum_07}. According to \cite{BREZINA_VANEK}, a modification of this algorithm
 has actually been considered,
 in which each aggregate $N_r$ is made of vertices of $W$ that are \emph{strongly coupled}
 to a certain root vertex $r \in W$, i.e.\
@ -256,7 +301,7 @@ i.e.\ near vertices adjacent to vertices in other processors, and is strongly
 dependent on the number of processors and on the initial partitioning of the matrix $A$.
 Nevertheless, this algorithm has been chosen for the implementation in MLD2P4,
 since it has been shown to produce good results in practice
-\cite{Tuminaro_Tong_00,apnum_07,aaecc_07}.
+\cite{aaecc_07,apnum_07,TUMINARO_TONG}.
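The strong-coupling criterion and the decoupled aggregation can be illustrated with a simplified greedy pass, sketched below in Python with NumPy. We assume the criterion commonly used in smoothed aggregation, $|a_{ij}| > \theta \sqrt{|a_{ii} a_{jj}|}$, with an arbitrary threshold $\theta = 0.25$; the second pass (attaching leftover vertices to a neighbouring aggregate) is a simplification of the published algorithms, and \verb|aggregate| and \verb|decoupled_aggregate| are hypothetical names. In the decoupled version each process aggregates only its own rows, so no aggregate crosses a process boundary, which is why the result depends on the number of processes and on the initial partitioning.

```python
import numpy as np


def strong_neighbors(A, i, theta):
    """Vertices j strongly coupled to i, |a_ij| > theta*sqrt(|a_ii a_jj|), plus i."""
    d = np.abs(np.diag(A))
    mask = np.abs(A[i]) > theta * np.sqrt(d[i] * d)
    mask[i] = True
    return np.flatnonzero(mask)


def aggregate(A, theta=0.25):
    """Simplified greedy aggregation of a (local) matrix.

    Returns (agg, nagg), where agg[i] is the aggregate containing vertex i.
    """
    n = A.shape[0]
    agg = np.full(n, -1)
    nagg = 0
    # Pass 1: a vertex whose whole strong neighbourhood is still free becomes
    # a root r, and that neighbourhood becomes a new aggregate N_r.
    for i in range(n):
        nbrs = strong_neighbors(A, i, theta)
        if np.all(agg[nbrs] < 0):
            agg[nbrs] = nagg
            nagg += 1
    # Pass 2: each leftover vertex joins an aggregate it is strongly coupled to;
    # a vertex with no such neighbour forms a singleton aggregate.
    for i in range(n):
        if agg[i] < 0:
            owners = [agg[j] for j in strong_neighbors(A, i, theta) if agg[j] >= 0]
            if owners:
                agg[i] = owners[0]
            else:
                agg[i] = nagg
                nagg += 1
    return agg, nagg


def decoupled_aggregate(A, parts):
    """Aggregate each process's diagonal block independently (decoupled scheme).

    parts[p] holds the global row indices owned by (simulated) process p;
    couplings between rows of different processes are ignored.
    """
    agg = np.empty(A.shape[0], dtype=int)
    total = 0
    for rows in parts:
        local_agg, local_n = aggregate(A[np.ix_(rows, rows)])
        agg[rows] = local_agg + total
        total += local_n
    return agg, total
```

On a 1D Laplacian of size 8 split between two simulated processes, vertices 3 and 4 (0-based) end up in different aggregates even though they are strongly coupled.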

 The prolongator $P_C=R_C^T$ is built starting from a \emph{tentative prolongator}
 $P \in \Re^{n \times n_C}$, defined as
@ -276,14 +321,14 @@ P_C = S P,
 \end{equation}
 in order to remove oscillatory components from the range of the prolongator
 and hence to improve the convergence properties of the multi-level
-Schwarz method \cite{Brezina_Vanek_,StubenGMD69_99}.
+Schwarz method \cite{BREZINA_VANEK,StubenGMD69_99}.
 A simple choice for $S$ is the damped Jacobi smoother:
 \begin{equation}
 S = I - \omega D^{-1} A , 
 \label{eq:jac_smoother}
 \end{equation}
 where the value of $\omega$ can be chosen
-using some estimate of the spectral radius of $D^{-1}A$ \cite{Brezina_Vanek}.
+using some estimate of the spectral radius of $D^{-1}A$ \cite{BREZINA_VANEK}.
 %
 %\textbf{NOTE: filtering of $A$ in the smoothing, to be implemented?}
 %
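A minimal sketch of the prolongator construction $P_C = S P$ with the damped Jacobi smoother, again in Python with NumPy on dense matrices. Assumptions: the tentative prolongator is piecewise constant on the aggregates ($P_{ij} = 1$ if vertex $i$ belongs to aggregate $j$, 0 otherwise); the spectral radius of $D^{-1}A$ is replaced by the cheap Gershgorin bound $\max_i \sum_j |a_{ij}|/|a_{ii}|$; and $\omega = 4/(3\rho)$ is one choice used in the smoothed aggregation literature, not necessarily the one in MLD2P4. All function names are hypothetical.

```python
import numpy as np


def tentative_prolongator(agg, nagg):
    """Piecewise-constant P: P[i, j] = 1 if vertex i is in aggregate j."""
    n = len(agg)
    P = np.zeros((n, nagg))
    P[np.arange(n), agg] = 1.0
    return P


def smoothed_prolongator(A, P, omega=None):
    """P_C = S P with the damped Jacobi smoother S = I - omega D^{-1} A."""
    DinvA = A / np.diag(A)[:, None]          # row i scaled by 1/a_ii
    if omega is None:
        # Gershgorin (infinity-norm) upper bound on rho(D^{-1} A).
        rho = np.max(np.sum(np.abs(DinvA), axis=1))
        omega = 4.0 / (3.0 * rho)
    S = np.eye(A.shape[0]) - omega * DinvA
    return S @ P, omega


def coarse_matrix(A, P_C):
    """Galerkin coarse matrix A_C = R_C A P_C with R_C = P_C^T."""
    return P_C.T @ A @ P_C
```

Because the Gershgorin bound overestimates $\rho(D^{-1}A)$, the resulting $\omega$ is conservative; an estimate from a few power iterations would give a value closer to $4/(3\rho)$.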
--- a/docs/pdf/bibliography.tex
+++ b/docs/pdf/bibliography.tex
@ -1,25 +1,41 @@
-\begin{thebibliography}{99}
+\section{Bibliography\label{sec:bib}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:bib} Bibliography}}
+\let\refname\relax

+\begin{thebibliography}{99}
+%
+%\bibitem{PARA04FOREST}
+%G.~Bella, S.~Filippone, A.~De Maio, M.~Testa,
+%A Simulation Model for Forest Fires.
+%In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.):
+%Proceedings of PARA~04 Workshop on State of the Art
+%in Scientific Computing. Lecture Notes in Computer Science, 3732. Berlin:
+%Springer, 2005
+%
+\bibitem{BREZINA_VANEK}
+M.~Brezina, P.~Van{\v e}k,
+{\em A Black-Box Iterative Solver Based on a Two-Level Schwarz Method},
+Computing, 63, 1999, 233--263.
 %
-\bibitem{PARA04FOREST}
-Bella, G., Filippone, S., De Maio, A., Testa, M.:
-A Simulation Model for Forest Fires.
-In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.):
+\bibitem{para_04}
+A.~Buttari, P.~D'Ambra, D.~di Serafino, S.~Filippone,
+{\em Extending PSBLAS to Build Parallel Schwarz Preconditioners},
+in J.~Dongarra, K.~Madsen, J.~Wasniewski, editors,
 Proceedings of PARA~04 Workshop on State of the Art
-in Scientific Computing. Lecture Notes in Computer Science, 3732. Berlin:
-Springer, 2005
+in Scientific Computing, Lecture Notes in Computer Science 3732,
+Springer, 2005, 593--602.
 %
-\bibitem{aaecc_07} A. Buttari, D. di Serafino, P. D'Ambra, S. Filippone,\newblock
-2LEV-D2P4: a package of high-performance preconditioners,\newblock
+\bibitem{aaecc_07} A.~Buttari, P.~D'Ambra, D.~di~Serafino, S.~Filippone,
+{\em 2LEV-D2P4: a package of high-performance preconditioners},
 Applicable Algebra in Engineering, Communication and Computing,
-Volume 18, Number 3, May, 2007, pp.  223-239
+18, 3, May, 2007, 223--239.
 %Published online: 13 February 2007, {\tt http://dx.doi.org/10.1007/s00200-007-0035-z}
 %
-\bibitem{apnum_07}  P. D'Ambra, S. Filippone,  D. Di Serafino\newblock
-On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners
-\newblock
+\bibitem{apnum_07} P.~D'Ambra, S.~Filippone, D.~di~Serafino,
+{\em On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners},
 Applied Numerical Mathematics, Elsevier Science, 
-Volume 57, Issues 11-12, November-December 2007, Pages 1181-1196.
+57, 11--12, 2007, 1181--1196.
 %published online 3 February 2007, {\tt
 %  http://dx.doi.org/10.1016/j.apnum.2007.01.006}

@ -30,118 +46,127 @@ Volume 57, Issues 11-12, November-December 2007, Pages 1181-1196.
 %% (See also {\tt http://www.mgnet.org/~douglas/ccd-codes.html}) 
 %
 %
-\bibitem{para_04}
-A.~Buttari, P.~D'Ambra, D.~di Serafino and S.~Filippone,
-{\em Extending PSBLAS to Build Parallel Schwarz Preconditioners},
-in , J.~Dongarra, K.~Madsen, J.~Wasniewski, editors,
-Proceedings of PARA~04 Workshop on State of the Art
-in Scientific Computing, pp.~593--602, Lecture Notes in Computer Science,
-Springer, 2005.
-%
 %% \bibitem{CAI_SAAD}
 %% X.~C.~Cai and Y.~Saad,
 %% {\em Overlapping Domain Decomposition Algorithms for General Sparse Matrices},
 %% Numerical Linear Algebra with Applications, 3(3), pp.~221--237, 1996.
-%% %
-%% \bibitem{CAI_SARKIS}
-%% X.C.~Cai and M.~Sarkis,
-%% {\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems},
-%% SIAM Journal on Scientific Computing, 21(2), pp.~792--797, 1999.
+%
+\bibitem{CAI_SARKIS}
+X.~C.~Cai, M.~Sarkis,
+{\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems},
+SIAM Journal on Scientific Computing, 21, 2, 1999, 792--797.
 %
 \bibitem{Cai_Widlund_92}
-X.C.~Cai and O.~B.~Widlund,
+X.~C.~Cai, O.~B.~Widlund,
 {\em Domain Decomposition Algorithms for Indefinite Elliptic Problems},
-SIAM Journal on Scientific and Statistical Computing, 13(1), pp.~243--258, 1992.
+SIAM Journal on Scientific and Statistical Computing, 13, 1, 1992, 243--258.
 %
 \bibitem{dd1_94}
 T.~Chan, T.~Mathew,
 {\em Domain Decomposition Algorithms},
-in A.~Iserles, editor, Acta Numerica 1994, pp.~61--143, 1994.
+in A.~Iserles, editor, Acta Numerica 1994,
 Cambridge University Press, 1994, 61--143.
-%% %
-%% \bibitem{UMFPACK}
-%% T.A.~Davis, 
-%% {\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal
-%% Method with a Column Pre-ordering Strategy},
-%% ACM Transactions on Mathematical Software, 30, pp.~196--199, 2004.
-%% (See also {\tt http://www.cise.ufl.edu/~davis/})
-%% %
-%% \bibitem{SUPERLU}
-%% J.W.~Demmel, S.C.~Eisenstat, J.R.~Gilbert, X.S.~Li and J.W.H.~Liu,
-%% A supernodal approach to sparse partial pivoting,
-%% SIAM Journal on Matrix Analysis and Applications, 20(3), pp.~720--755, 1999.
-%
-\bibitem{BLACS}
-J.~J.~Dongarra and R.~C.~Whaley,
-{\em A User's Guide to the BLACS v.~1.1},
-Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
-Tennessee, March 1995 (updated May 1997).
-%
-\bibitem{sblas_97}
-I.~Duff, M.~Marrone, G.~Radicati and C.~Vittoli,
-{\em Level 3 Basic Linear Algebra Subprograms for Sparse Matrices: 
-a User Level Interface},
-ACM Transactions on Mathematical Software, 23(3), pp.~379--401, 1997.
-%
-\bibitem{sblas_02}
-I.~Duff, M.~Heroux and R.~Pozo,
-{\em An Overview of the Sparse Basic Linear
-Algebra Subprograms: the New Standard from the BLAS Technical Forum},
-ACM Transactions on Mathematical Software, 28(2), pp.~239--267, 2002.
+% 
+\bibitem{UMFPACK}
+T.~A.~Davis,
+{\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal
+Method with a Column Pre-ordering Strategy},
+ACM Transactions on Mathematical Software, 30, 2004, 196--199.
+(See also {\tt http://www.cise.ufl.edu/~davis/})
+%
+\bibitem{SUPERLU}
+J.~W.~Demmel, S.~C.~Eisenstat, J.~R.~Gilbert, X.~S.~Li, J.~W.~H.~Liu,
+{\em A Supernodal Approach to Sparse Partial Pivoting},
+SIAM Journal on Matrix Analysis and Applications, 20, 3, 1999, 720--755.
+%
+%\bibitem{BLACS}
+%J.~J.~Dongarra and R.~C.~Whaley,
+%{\em A User's Guide to the BLACS v.~1.1},
+%Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of
+%Tennessee, March 1995 (updated May 1997).
+%
+%\bibitem{sblas_97}
+%I.~Duff, M.~Marrone, G.~Radicati and C.~Vittoli,
+%{\em Level 3 Basic Linear Algebra Subprograms for Sparse Matrices: 
+%a User Level Interface},
+%ACM Transactions on Mathematical Software, 23(3), pp.~379--401, 1997.
+%
+%\bibitem{sblas_02}
+%I.~Duff, M.~Heroux and R.~Pozo,
+%{\em An Overview of the Sparse Basic Linear
+%Algebra Subprograms: the New Standard from the BLAS Technical Forum},
+%ACM Transactions on Mathematical Software, 28(2), pp.~239--267, 2002.
+%
+\bibitem{EFSTATHIOU} 
+E.~Efstathiou, M.~J.~Gander,
+{\em Why Restricted Additive Schwarz Converges Faster than Additive Schwarz},
+BIT Numerical Mathematics, 43, 2003, 945--959.
+%
+\bibitem{PSBLASGUIDE}
+S.~Filippone, A.~Buttari, 
+{\em PSBLAS-2.1 User's Guide. A Reference Guide for the Parallel Sparse BLAS Library},
+xxxxx.
 %
 \bibitem{psblas_00}
-S.~Filippone and M.~Colajanni, 
+S.~Filippone, M.~Colajanni, 
 {\em PSBLAS: A Library for Parallel Linear Algebra
 Computation on Sparse Matrices},
-\newblock
-ACM Transactions on Mathematical Software, 26(4), pp.~527--550, 2000.
-%
-\bibitem{KIVA3PSBLAS}
-S.~Filippone, P.~D'Ambra, M.~Colajanni,
-{\em Using a Parallel Library of Sparse Linear Algebra in a Fluid Dynamics 
-Applications Code on Linux Clusters},
-in G.~Joubert, A.~Murli, F.~Peters, M.~Vanneschi, editors,
-Parallel Computing - Advances \& Current Issues,
-pp.~441--448, Imperial College Press, 2002. 
-%
-\bibitem{METIS}
-Karypis, G. and Kumar, V.,
-{\em {METIS}: Unstructured Graph Partitioning and Sparse Matrix
-  Ordering System}.
-Minneapolis, MN 55455: University of Minnesota, Department of
-  Computer Science, 1995. 
-Internet Address: {\verb|http://www.cs.umn.edu/~karypis|}.
-\bibitem{BLAS1}
-Lawson, C.,  Hanson, R., Kincaid, D. and Krogh, F.,
-   Basic {L}inear {A}lgebra {S}ubprograms for {F}ortran usage,
-{ACM Trans. Math. Softw.} vol.~{5}, 38--329, 1979.
-
-\bibitem{machiels}
-{Machiels, L. and Deville, M.}
-{\em Fortran 90: An entry to object-oriented programming for the solution
-  of partial differential equations.}
-{ACM Trans. Math. Softw.} vol.~{23}, 32--49.
-\bibitem{metcalf}
-{Metcalf, M., Reid, J. and Cohen, M.}
-{\em Fortran 95/2003 explained.}
-{Oxford University Press}, 2004.
-
+ACM Transactions on Mathematical Software, 26, 4, 2000, 527--550.
+%
+\bibitem{SUPERLUDIST}
+X.~S.~Li, J.~W.~Demmel, {\em SuperLU\_DIST: A Scalable Distributed-memory Sparse Direct Solver for Unsymmetric Linear Systems},
+ACM Transactions on Mathematical Software, 29, 2, 2003, 110--140.
+%
+%\bibitem{KIVA3PSBLAS}
+%S.~Filippone, P.~D'Ambra, M.~Colajanni,
+%{\em Using a Parallel Library of Sparse Linear Algebra in a Fluid Dynamics 
+%Applications Code on Linux Clusters},
+%in G.~Joubert, A.~Murli, F.~Peters, M.~Vanneschi, editors,
+%Parallel Computing - Advances \& Current Issues,
+%pp.~441--448, Imperial College Press, 2002. 
+%
+%\bibitem{METIS}
+%Karypis, G. and Kumar, V.,
+%{\em {METIS}: Unstructured Graph Partitioning and Sparse Matrix
+%  Ordering System}.
+%Minneapolis, MN 55455: University of Minnesota, Department of
+%  Computer Science, 1995. 
+%Internet Address: {\verb|http://www.cs.umn.edu/~karypis|}.
+%\bibitem{BLAS1}
+%Lawson, C.,  Hanson, R., Kincaid, D. and Krogh, F.,
+%   Basic {L}inear {A}lgebra {S}ubprograms for {F}ortran usage,
+%{ACM Trans. Math. Softw.} vol.~{5}, 38--329, 1979.
+%
+%\bibitem{machiels}
+%{Machiels, L. and Deville, M.}
+%{\em Fortran 90: An entry to object-oriented programming for the solution
+%  of partial differential equations.}
+%{ACM Trans. Math. Softw.} vol.~{23}, 32--49.
+%\bibitem{metcalf}
+%{Metcalf, M., Reid, J. and Cohen, M.}
+%{\em Fortran 95/2003 explained.}
+%{Oxford University Press}, 2004.
+%
 \bibitem{dd2_96}
-B.~Smith, P.~Bjorstad and W.~Gropp,
+B.~Smith, P.~Bj{\o}rstad, W.~Gropp,
 {\em Domain Decomposition: Parallel Multilevel Methods for Elliptic
 Partial Differential Equations},
 Cambridge University Press, 1996.
-
+%
 \bibitem{MPI1}
-M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker and J.~Dongarra,
+M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker, J.~Dongarra,
 {\em MPI: The Complete Reference. Volume 1 - The MPI Core}, second edition,
 MIT Press, 1998.
 %
-\bibitem{BREZINA_VANEK}
-M.~Brezina and P.~Van{\v e}k,
-{\em A Black-Box Iterative Solver Based on a Two-Level Schwarz Method},
-Computing, 1999, 63, 233-263.
+\bibitem{StubenGMD69_99}
+K.~St\"{u}ben,
+{\em Algebraic Multigrid (AMG): an Introduction with Applications},
+in A.~Sch\"{u}ller, U.~Trottenberg, C.~Oosterlee, editors, Multigrid,
+Academic Press, 2000.
 %
+\bibitem{TUMINARO_TONG}
+R.~S.~Tuminaro, C.~Tong,
+{\em Parallel Smoothed Aggregation Multigrid: Aggregation Strategies on Massively Parallel Machines},
+in J.~Donnelley, editor, Proceedings of SuperComputing 2000, Dallas, 2000.
 %
 \bibitem{VANEK_MANDEL_BREZINA}
 P.~Van{\v e}k, J.~Mandel, M.~Brezina,
@ -149,4 +174,4 @@ P.~Van{\v e}k, J.~Mandel and M.~Brezina,
 Computing, 56, 1996, 179--196.
 %

-\end{thebibliography}
+\end{thebibliography}
--- a/docs/pdf/building.tex
+++ b/docs/pdf/building.tex
@ -1,4 +1,6 @@
 \section{Configuring and Building MLD2P4\label{sec:configuring}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:configuring} Configuring and Building}}
   - use of GNU autoconf and automake \\
   - required basic software (MPI, BLACS, BLAS, PSBLAS, UMFPACK? - specify versions)\\
   - optional software (SuperLU, SuperLUdist - specify versions and configure options)\\
--- a/docs/pdf/conventions.tex
+++ b/docs/pdf/conventions.tex
@ -1,4 +1,7 @@
 \section{Notational Conventions\label{sec:conventions}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:conventions} Notational conventions}}
+
   - typefaces used in the guide (see the recent ML guide and the Aztec guide) \\
   - naming conventions for routines (difference in names between high-level and medium-level routines),
     data structures, modules, constants, etc. (see the PSBLAS guide) \\
--- a/docs/pdf/distribution.tex
+++ b/docs/pdf/distribution.tex
@ -1,4 +1,6 @@
-\section{Code Distribution\label{sec:distribution}}
+\section{License\label{sec:distribution}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:distribution} License}}

 MLD2P4 is freely distributable under the following copyright
 terms: {\small
--- a/docs/pdf/errors.tex
+++ b/docs/pdf/errors.tex
@ -1,4 +1,6 @@
 \section{Error Handling}\label{sec:errors}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:errors} Error handling}}

 Error handling
   - Brief description with a reference to the PSBLAS guide
--- a/docs/pdf/gettingstarted.tex
+++ b/docs/pdf/gettingstarted.tex
@ -1,7 +1,9 @@
 \section{Getting Started\label{sec:started}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:started} Getting started}}

 We describe the basics for building and applying MLD2P4 one-level and multi-level
-Schwarz preconditioners with the Krylov solvers included in PSBLAS \cite{}.
+Schwarz preconditioners with the Krylov solvers included in PSBLAS \cite{PSBLASGUIDE}.
 The following steps are required:
 \begin{enumerate} 
 \item \emph{Declare the preconditioner data structure}. It is a derived data type,
@ -50,11 +52,12 @@ preconditioner has been chosen by taking into account that, on parallel
 machines, it often leads to the smallest execution time when applied to
 linear systems coming from finite-difference discretizations of basic
 elliptic PDE problems, considered as standard tests for multi-level Schwarz
-preconditioners \cite{apnum_07,aaecc_07}. However, this solver does not correspond
-to the smallest number of iterations of the preconditioned Krylov method, which is
-usually obtained by applying a direct solver, e.g.\ based on the LU factorization, at
-the coarsest level (see Section~\ref{sec:userinterface} for coarsest-level
-solvers available in MLD2P4).
+preconditioners \cite{aaecc_07,apnum_07}. However, this solver does
+not necessarily lead to the smallest number of iterations of the
+preconditioned Krylov method, which is usually obtained by applying a
+direct solver, e.g.\ based on the LU factorization, on a matrix
+replicated at the coarsest level (see Section~\ref{sec:userinterface}
+for coarsest-level solvers available in MLD2P4). 

 \begin{table}[th]
 {
@ -108,7 +111,7 @@ in the directory \textbf{XXXXXX (TO BE COMPLETED. SAY THAT THERE ARE ACTUALLY TWO FILES
 THE MATRIX GENERATION AND ONE WITH THE MATRIX READING).} Note that the modules \verb|psb_base_mod|
 and \verb|psb_util_mod| at the beginning of the code are required by PSBLAS.
 \textbf{OR IS psb\_base\_mod} ALSO REQUIRED BY MLD2P4?)
-For details on the use of the PSBLAS routines, see the PSBLAS User's Guide \cite{}.
+For details on the use of the PSBLAS routines, see the PSBLAS User's Guide \cite{PSBLASGUIDE}.

 \textbf{THE FIGURES ARE OFF-CENTER, DESPITE THE CENTER ENVIRONMENT. IS A MINIPAGE NEEDED?}

@ -178,7 +181,7 @@ the default values of the preconditioner parameters. The code reported in
 Figure~\ref{fig:ex_3lh} shows how to set a three-level hybrid Schwarz
 preconditioner, which uses block Jacobi with ILU(0) on the
 local blocks as post-smoother, a coarsest matrix replicated on the processors,
-and the LU factorization from UMFPACK as coarse-level solver.
+and the LU factorization from UMFPACK~\cite{UMFPACK}, version 4.4, as coarse-level solver.
 The number of levels is specified by using \verb|mld_precinit|; the other
 preconditioner parameters are set by calling \verb|mld_precset|. Note that
 the type of multilevel framework (i.e.\ multiplicative among the levels
@ -189,13 +192,14 @@ which applies RAS, with overlap 1 and ILU(0) on the blocks,
 as pre- and post-smoother, and five block-Jacobi sweeps, with
 the UMFPACK LU factorization on the blocks, as distributed coarsest-level
 solver. Again, \verb|mld_precset| is used only to set
-non-default values of the parameters (see Tables~\ref{tab:ptype}-\ref{tab:pcoarse}).
+non-default values of the parameters (see Tables~\ref{tab:p_type}--\ref{tab:p_coarse}).
 In both cases, the construction and the application of the preconditioner
 are carried out as for the default multi-level preconditioner.
-The code fragments shown in in Figures~\ref{fig:ex_3lh}-\ref{fig:3la} are
+The code fragments shown in Figures~\ref{fig:ex_3lh}--\ref{fig:ex_3la} are
 included in the example program file \verb|example_ml.f90|.
 \textbf{DOES THE SAME PROGRAM CONTAIN THE THREE EXAMPLES, WITH A SWITCH BETWEEN THEM,
-OR DO WE WRITE 3 SEPARATE PROGRAMS? I DON'T REMEMBER WHAT WE DECIDED.}
+OR DO WE WRITE 3 SEPARATE PROGRAMS? I DON'T REMEMBER WHAT WE DECIDED.
+PASQUA: WE SAID THAT A SINGLE PROGRAM WITH A SWITCH WAS PREFERABLE.}

 Finally, Figure~\ref{fig:ex_1l} shows the setup of a one-level
 additive Schwarz preconditioner, i.e.\ RAS with overlap 2. The corresponding code,
--- a/docs/pdf/highlevelview.tex
+++ b/docs/pdf/highlevelview.tex
@ -1,4 +1,6 @@
 \section{User Interface\label{sec:highlevel}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:highlevel} User Interface}}

 The basic user interface of MLD2P4 consists of six routines. The four routines \verb|mld_precinit|,
 \verb|mld_precset|, \verb|mld_precbld| and \verb|mld_precaply| encapsulate all the functionalities
--- a/docs/pdf/intro.tex
+++ b/docs/pdf/intro.tex
@ -1,4 +1,6 @@
 \section{Introduction}\label{sec:intro}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:intro} Introduction}}

 The MLD2P4 library provides ....

--- a/docs/pdf/methods.tex
+++ b/docs/pdf/methods.tex
@ -1,5 +1,7 @@
 \section{Iterative Methods}
 \label{sec:methods}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:methods} Iterative Methods}}

 In this chapter we provide routines for preconditioners and iterative
 methods. The interfaces for Krylov subspace methods are available in
--- a/docs/pdf/overview.tex
+++ b/docs/pdf/overview.tex
@ -4,7 +4,7 @@
         {\underline{\ref{sec:overview} General Overview}}

 The \textsc{Multi-Level Domain Decomposition Parallel Preconditioners Package based on
-PSBLAS (MLD2P4}) provides \emph{multi-level Schwarz preconditioners}~\cite{DD2},
+PSBLAS (MLD2P4}) provides \emph{multi-level Schwarz preconditioners}~\cite{dd2_96},
 to be used in the iterative solutions of sparse linear systems:
 \begin{equation} 
 Ax=b, 
@ -18,24 +18,25 @@ where $A$ is a square, real or complex, sparse matrix with a symmetric sparsity
 %
 These preconditioners have the following general features:
 \begin{itemize}
-\item both \emph{additive and hybrid multilevel} variants, i.e.\ multiplicative among the levels
-and additive inside a level, are implemented; the basic additive Schwarz preconditioners
-are obtained by considering only one level;
+\item both \emph{additive and hybrid multilevel} variants are implemented,
+i.e.\ variants that are additive among the levels and inside each level, and variants
+that are multiplicative among the levels and additive inside each level; the basic Additive Schwarz (AS) preconditioners are obtained by considering only one level;
 \item a \emph{purely algebraic} approach is used to
-generate a sequence of coarse-level corrections to a basic preconditioner, without
+generate a sequence of coarse-level corrections to a basic AS preconditioner, without
 explicitly using any information on the geometry of the original problem (e.g.\ the
 discretization of a PDE). The \emph{smoothed aggregation} technique is applied
-as algebraic coarsening strategy~\cite{Vanek_Mandel_Brezina,Brezina_Vanek}.
+as algebraic coarsening strategy~\cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}.
 \end{itemize}
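The difference between the additive and the hybrid (multiplicative among the levels) variants can be seen in a two-level Python sketch with NumPy. The basic AS preconditioner is passed in as a callable \verb|Minv| (point Jacobi in the usage check), the coarse matrix is the Galerkin product $R A R^T$ solved exactly, and all names are hypothetical; MLD2P4 itself works on distributed sparse matrices. The additive variant sums the two corrections, while the hybrid one smooths the residual left by the coarse correction; dropping the coarse term gives the one-level AS preconditioner.

```python
import numpy as np


def coarse_correction(A, R, v):
    """R^T A_C^{-1} R v with the Galerkin coarse matrix A_C = R A R^T."""
    A_C = R @ A @ R.T
    return R.T @ np.linalg.solve(A_C, R @ v)


def two_level_additive(A, R, Minv, v):
    """Additive among the levels: smoother and coarse correction are summed."""
    return Minv(v) + coarse_correction(A, R, v)


def two_level_hybrid_post(A, R, Minv, v):
    """Multiplicative among the levels: coarse correction, then post-smoothing."""
    y = coarse_correction(A, R, v)
    return y + Minv(v - A @ y)
```

One consequence of the hybrid form: for a right-hand side of the form $v = A R^T c$ the coarse correction already solves the system exactly, so the post-smoother receives a zero residual.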

 The package is written in \emph{Fortran~95}, following an \emph{object-oriented approach}
-through the exploitation of features such as abstract data type creation, functional overloading and
-dynamic memory management, while providing a smooth path towards the integration in
-legacy application codes. The parallel implementation is based on a Single Program Multiple Data
-(SPMD) paradigm for distributed-memory architectures. 
+through the exploitation of features such as abstract data type creation, functional
+overloading and dynamic memory management; at the same time, it provides a smooth path towards integration into legacy application codes.
+\textbf{I DON'T LIKE THIS SENTENCE, IT IS TOO LONG. CAN YOU WRITE IT BETTER?}
+The parallel implementation is based
+on a Single Program Multiple Data (SPMD) paradigm for distributed-memory architectures. 
 Single and double precision implementations of MLD2P4 are available for both the
 real and the complex case, which can be used through a single interface.
-\textbf{SALVATORE, does everything work?}
+

 MLD2P4 has been designed to implement scalable and easy-to-use multilevel preconditioners
 in the context of the \emph{PSBLAS (Parallel Sparse BLAS) computational framework}~\cite{psblas_00}.
@ -67,7 +68,7 @@ by expert users to build new versions of multi-level Schwarz preconditioners.
 We provide here a description of the upper-layer routines, but not of the
 medium-layer ones.

-This guide is organized as follows:\textbf{organization of the guide}
+This guide is organized as follows: \textbf{ORGANIZATION OF THE GUIDE}

 %%% Local Variables: 
 %%% mode: latex
--- a/docs/pdf/precs.tex
+++ b/docs/pdf/precs.tex
@ -1,5 +1,7 @@
 \section{Preconditioner routines}
 \label{sec:precs}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:precs} Preconditioners}}

 % \section{Preconditioners}
 \label{sec:psprecs}
--- a/docs/pdf/userguide.tex
+++ b/docs/pdf/userguide.tex
@ -83,6 +83,7 @@
 \newcommand{\precdata}{\hyperlink{precdata}{{\tt mld\_prec\_type}}}
 \newcommand{\descdata}{\hyperlink{descdata}{{\tt psb\_desc\_type}}}
 \newcommand{\spdata}{\hyperlink{spdata}{{\tt psb\_spmat\_type}}}
+\newcommand{\Ref}[1]{\mbox{(\ref{#1})}}

 \begin{document}
 \include{title}
@ -111,7 +112,6 @@

 \include{overview}
 \include{conventions}
-\include{distribution}
 \include{building}
 \include{background}
 \include{gettingstarted}
@ -119,6 +119,9 @@
 %\include{advanced}
 \include{errors}
 %\include{listofroutines}
+\cleardoublepage
+\appendix
+\include{distribution}

 \cleardoublepage

--- a/docs/pdf/userinterface.tex
+++ b/docs/pdf/userinterface.tex
@ -1,4 +1,6 @@
 \section{User Interface\label{sec:userinterface}}
+\markboth{\underline{MLD2P4 User's and Reference Guide}}
+         {\underline{\ref{sec:userinterface} User Interface}}

 The basic user interface of MLD2P4 consists of six routines. The four routines \verb|mld_precinit|,
 \verb|mld_precset|, \verb|mld_precbld| and \verb|mld_precaply| encapsulate all the functionalities for the setup and application of any one-level and multi-level
@ -48,8 +50,7 @@ according to the preconditioner type chosen by the user.
 \subsubsection*{Arguments}

 \begin{tabular}{p{1.2cm}p{11.5cm}}
-\verb|p|      & \verb|type(mld_|\emph{x}\verb|prec_type), intent(inout)|.
-                \textbf{CHECK WHETHER IT MUST BE INOUT OR JUST OUT}     \\
+\verb|p|      & \verb|type(mld_|\emph{x}\verb|prec_type), intent(inout)|.\\
              & The preconditioner data structure. Note that \emph{x}
                must be chosen according to the real/complex, single/double
                precision version of MLD2P4 under use.\\
@ -147,10 +148,9 @@ ACCESSIBLE TO THE USER.}
                           MULTILEVEL.}                                 \\
 \verb|mld_smoother_pos_| & \verb|character(len=*)|
                         & 'PRE' \ \ \ 'POST' \ \ \ 'TWOSIDE'
-                         & 2,...,\verb|nlev|
+                         & 'POST'
                         & ``position'' of the smoother: pre-smoother, post-smoother, 
-                           pre-/post-smoother \textbf{I PREFER TWOSIDE TO BOTH
-                           BECAUSE IT IS DIFFERENT FROM TRILINOS} \\
+                           pre-/post-smoother \\
 \hline
 \end{tabular}
 \end{center}
@ -168,31 +168,33 @@ ACCESSIBLE TO THE USER.}
 \verb|mld_sub_ovr|       & \verb|integer|
                         & any number $\ge 0$
                         & 1
-                         & \textbf{CHANGE PARAMETER NAME IN THE SW}    \\
-\verb|mld_sub_restr_|    & 
-                         & 
-                         &  
-                         &     \\
-\verb|mld_sub_prol_|     & 
-                         & 
-                         &
-                         &     \\
-\verb|mld_sub_solve_|    & 
-                         & 
-                         &
-                         &     \\    
-\verb|mld_sub_fillin_|   &
-                         &
-                         &
-                         &     \textbf{CAMBIARE NOME PARAMETRO NEL SW} \\
-\verb|mld_sub_thresh_|   &
-                         &
-                         &
-                         &     \\
-\verb|mld_sub_ren_|      &
-                         &
+                         & \textbf{CHANGE PARAMETER NAME IN THE SOFTWARE} number of overlap layers in the basic Schwarz preconditioner \\
+\verb|mld_sub_restr_|    & \verb|character(len=*)|
+                         & 'HALO' \ \ \ 'NONE'
+                         & 'HALO'
+                         & type of restriction operator used in the basic Schwarz preconditioner: 'HALO' takes into account the contributions from the overlap, 'NONE' neglects them \\
+\verb|mld_sub_prol_|     & \verb|character(len=*)|
+                         & 'SUM' \ \ \ 'NONE'
+                         & 'NONE'
+                         & type of prolongation operator used in the basic Schwarz preconditioner: 'SUM' adds the contributions from the overlap, 'NONE' neglects them \\
+\verb|mld_sub_solve_|    & \verb|character(len=*)|
+                         & 'ILU' \ \ \ 'MILU' \ \ \ 'ILUT' \ \ \ 'UMF' \ \ \ 'SLU'
+                         & 'UMF'
+                         & local solver: 'ILU' for incomplete LU, 'MILU' for modified incomplete LU, 'ILUT' 
+for incomplete LU with threshold, 'UMF' for complete LU using UMFPACK~\cite{UMFPACK}, version 4.4, and 'SLU' for complete LU using SuperLU~\cite{SUPERLU}, version 3.0 \\
+\verb|mld_sub_fillin_|   & \verb|integer|
+                         & any number $\ge 0$
+                         & 0
+                         & \textbf{CHANGE PARAMETER NAME IN THE SOFTWARE} fill-in level of the 'ILU', 'MILU' and 'ILUT' factorizations of the local blocks\\
+\verb|mld_sub_thresh_|   & \verb|real| 
+                         & any number $\ge 0.$
+                         & 0.
+                         & drop tolerance for 'ILUT' 
+\textbf{THE INTERNAL DOCUMENTATION OF THE FACTORIZATION ROUTINE DECLARES THIS PARAMETER AS INTEGER; CHANGE IT!}\\
+\verb|mld_sub_ren_|      & \verb|character(len=*)|
+                         & \textbf{MISSING ASSOCIATED STRING CONSTANT}
                         &
-                         &  \textbf{MANCA COSTANTE STRINGA ASSOCIATA} \\
+                         & reordering algorithm for the local blocks \\
 \hline

 \end{tabular}
@ -208,22 +210,23 @@ ACCESSIBILE ALL'UTENTE.}
 \verb|what|              & \emph{data type}        &  \verb|val|      &  \emph{default}  &
 \emph{comments} \\ \hline
 %\multicolumn{5}{|c|}{\emph{aggregation algorithm}} \\ \hline
-\verb|mld_aggr_alg_|     &
-                         &
-                         &
-                         &    \\
-\verb|mld_aggr_kind_|    &
-                         &
-                         &
-                         &     \\
-\verb|mld_aggr_thresh_|  &
-                         &
-                         &
-                         &     \\
-\verb|mld_aggr_eig_|     &
-                         &
-                         &
-                         & \textbf{MANCA STRINGA CORRISPONDENTE a mld\_max\_norm} \\
+\verb|mld_aggr_alg_|     & \verb|character(len=*)|
+                         & 'DEC'
+                         & 'DEC'
+                         & aggregation scheme; currently, only decoupled aggregation is available\\
+\verb|mld_aggr_kind_|    & \verb|character(len=*)|
+                         & 'SMOOTH' \ \ \ 'RAW'
+                         & 'SMOOTH'
+                         & type of aggregation: smoothed ('SMOOTH') or nonsmoothed ('RAW') \\
+\verb|mld_aggr_thresh_|  & \verb|real|
+                         & any number $\in [0, 1]$
+                         & 0.
+                         & dropping threshold in aggregation    \\
+\verb|mld_aggr_eig_|     & \verb|character(len=*)|
+                         & \textbf{MISSING STRING CORRESPONDING TO mld\_max\_norm}
+                         & 'ANORM'???
+                         & algorithm used to estimate the maximum eigenvalue of $D^{-1}A$ in smoothed
+aggregation; currently, only the A-norm of the matrix is available\\
 \hline
 \end{tabular}
 \end{center}
@ -238,30 +241,32 @@ ACCESSIBILE ALL'UTENTE.}
 \verb|what|              & \emph{data type}        &  \verb|val|      &  \emph{default}  &
 \emph{comments} \\ \hline
 %\multicolumn{5}{|c|}{\emph{coarse-space correction at the coarsest level}}\\ \hline
-\verb|mld_coarse_mat_|   &
-                         &
-                         &
-                         &     \\
-\verb|mld_coarse_solve_| &
-                         &
-                         &
-                         & \textbf{VEDI OSSERVAZIONI EMAIL 15-16/06/08}\\
-\verb|mld_coarse_subsolve_| &
-                         &
-                         &
-                         & \textbf{VEDI OSSERVAZIONI EMAIL 15-16/06/08}\\
-\verb|mld_coarse_sweeps_|&                         
-                         &
-                         &
-                         &     \\
-\verb|mld_coarse_fillin_| &
-                         &
-                         &
-                         &     \textbf{MODIFICA NOME PARAM. NEL SW} \\
-\verb|mld_coarse_thresh_| &
-                         &
-                         &
-                         &    \\ \hline
+\verb|mld_coarse_mat_|   & \verb|character(len=*)|
+                         & 'DISTR' \ \ \ 'REPL'
+                         & 'DISTR'
+                         & coarsest-level matrix: distributed among the processes or replicated on each of them \\
+\verb|mld_coarse_solve_| & \verb|character(len=*)|
+                         & 'BJAC' \ \ \ 'UMF' \ \ \ 'SLUDIST'
+                         & 'BJAC'
+                         & \textbf{SEE E-MAIL REMARKS OF 15-16/06/08} solver for the coarsest-level system. 
+Only 'BJAC' and 'SLUDIST' can be used with a distributed coarse matrix. 'BJAC' performs a number of block-Jacobi sweeps, while 'SLUDIST' uses
+the external package SuperLU\_Dist~\cite{SUPERLUDIST}, version 2.0, for distributed sparse factorization and solve.  \\
+\verb|mld_coarse_subsolve_| & \verb|character(len=*)|
+                         & 'ILU' \ \ \ 'MILU' \ \ \ 'ILUT' \ \ \ 'UMF' \ \ \ 'SLU'
+                         & 'UMF'
+                         & \textbf{SEE E-MAIL REMARKS OF 15-16/06/08} solver for the local diagonal blocks of the coarse matrix when 'BJAC' is used as coarse solver\\
+\verb|mld_coarse_sweeps_|& \verb|integer|                         
+                         & any number $> 0$
+                         & 4
+                         & number of block-Jacobi sweeps when 'BJAC' is used as coarse solver    \\
+\verb|mld_coarse_fillin_| & \verb|integer|
+                         & any number $\ge 0$
+                         & 0
+                         & fill-in level in the incomplete factorization of the local diagonal blocks of the coarse matrix, when 'BJAC' is used as coarse solver and 'ILU' or 'MILU' is used as local solver  \textbf{CHANGE PARAM. NAME IN THE SOFTWARE} \\
+\verb|mld_coarse_thresh_| & \verb|real|
+                         & any number $\ge 0.$
+                         & 0.
+                         & drop tolerance in the incomplete factorization of the local diagonal blocks of the coarse matrix, when 'BJAC' is used as coarse solver and 'ILUT' is used as local solver   \\ \hline
 \end{tabular}
 \end{center}
 \caption{Parameters defining the coarse-space correction at the coarsest
@ -287,10 +292,10 @@ the user through the routines \verb|mld_precinit| and \verb|mld_precset|.
              & The sparse matrix structure containing the local part of the
                matrix to be preconditioned. Note that \emph{x} must be chosen according
                to the real/complex, single/double precision version of MLD2P4 under use.
-                See the PSBLAS User's Guide for details \cite{ }.\\
+                See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}.\\
 \verb|desc_a| & \verb|type(psb_desc_type), intent(in)|. \\
              & The communication descriptor of a. See the PSBLAS User's Guide for
-                details \cite{ }.\\
+                details \cite{PSBLASGUIDE}.\\
 \verb|p|      & \verb|type(mld_|\emph{x}\verb|prec_type), intent(inout)|.\\
              & The preconditioner data structure. Note that \emph{x} must be chosen according
                to the real/complex, single/double precision version of MLD2P4 under use.\\
--- a/docs/userguide.pdf
+++ b/docs/userguide.pdf