From 6097a8fbb61439e0310cbcd6e36c9c7693fe8bd2 Mon Sep 17 00:00:00 2001 From: Salvatore Filippone Date: Mon, 17 Jul 2017 15:24:32 +0100 Subject: [PATCH] New docs for release: merged Daniela's comments. --- docs/src/background.tex | 2 +- docs/src/bibliography.tex | 86 ++++---- docs/src/building.tex | 50 +++-- docs/src/distribution.tex | 1 - docs/src/gettingstarted.tex | 36 ++-- docs/src/intro.tex | 34 ---- docs/src/newobjects.tex | 85 ++++++++ docs/src/overview.tex | 6 +- docs/src/precs.tex | 384 ------------------------------------ docs/src/userguide.tex | 1 + docs/src/userhtml.tex | 3 +- docs/src/userinterface.tex | 188 ++++++------------ 12 files changed, 246 insertions(+), 630 deletions(-) delete mode 100644 docs/src/intro.tex create mode 100644 docs/src/newobjects.tex delete mode 100644 docs/src/precs.tex diff --git a/docs/src/background.tex b/docs/src/background.tex index 8b600155..d496242f 100644 --- a/docs/src/background.tex +++ b/docs/src/background.tex @@ -1 +1 @@ -\section{Multigrid Background\label{sec:background}} \markboth{\textsc{MLD2P4 User's and Reference Guide}} {\textsc{\ref{sec:background} Multigrid Background}} Multigrid preconditioners, coupled with Krylov iterative solvers, are widely used in the parallel solution of large and sparse linear systems, because of their optimality in the solution of linear systems arising from the discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids. Optimality, also known as algorithmic scalability, is the property of having a computational cost per iteration that depends linearly on the problem size, and a convergence rate that is independent of the problem size. Multigrid preconditioners are based on a recursive application of a two-grid process consisting of smoother iterations and a coarse-space (or coarse-level) correction. 
The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones, or more complex subspace-correction methods, such as the Schwarz ones. The coarse-space correction consists of solving, in an appropriately chosen coarse space, the residual equation associated with the approximate solution computed by the smoother, and of using the solution of this equation to correct the previous approximation. The transfer of information between the original (fine) space and the coarse one is performed by using suitable restriction and prolongation operators. The construction of the coarse space and the corresponding transfer operators is carried out by applying a so-called coarsening algorithm to the system matrix. Two main approaches can be used to perform coarsening: the geometric approach, which exploits the knowledge of some physical grid associated with the matrix and requires the user to define transfer operators from the fine to the coarse level and vice versa, and the algebraic approach, which builds the coarse-space correction and the associate transfer operators using only matrix information. The first approach may be difficult when the system comes from discretizations on complex geometries; furthermore, ad hoc one-level smoothers may be required to get an efficient interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients are considered. The second approach performs a fully automatic coarsening and enforces the interplay between fine and coarse level by suitably choosing the coarse space and the coarse-to-fine interpolation (see, e.g., \cite{Briggs2000,Stuben_01,dd2_96} for details.) MLD2P4 uses a pure algebraic approach, based on the smoothed aggregation algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}, for building the sequence of coarse matrices and transfer operators, starting from the original one. 
A decoupled version of this algorithm is implemented, where the smoothed aggregation is applied locally to each submatrix \cite{TUMINARO_TONG}. A brief description of the AMG preconditioners implemented in MLD2P4 is given in Sections~\ref{sec:multilevel}-\ref{sec:smoothers}. For further details the reader is referred to \cite{para_04,aaecc_07,apnum_07,MLD2P4_TOMS}. We note that optimal multigrid preconditioners do not necessarily correspond to minimum execution times in a parallel setting. Indeed, to obtain effective parallel multigrid preconditioners, a tradeoff between the optimality and the cost of building and applying the smoothers and the coarse-space corrections must be achieved. Effective parallel preconditioners require algorithmic scalability to be coupled with implementation scalability, i.e., a computational cost per iteration which remains (almost) constant as the number of parallel processors increases. \subsection{AMG preconditioners\label{sec:multilevel}} In order to describe the AMG preconditioners available in MLD2P4, we consider a linear system \begin{equation} Ax=b, \label{eq:system} \end{equation} where $A=(a_{ij}) \in \mathbb{R}^{n \times n}$ is a nonsingular sparse matrix; for ease of presentation we assume $A$ is real, but the results are valid for the complex case as well. Let us assume as finest index space the set of row (column) indices of $A$, i.e., $\Omega = \{1, 2, \ldots, n\}$. Any algebraic multilevel preconditioners implemented in MLD2P4 generates a hierarchy of index spaces and a corresponding hierarchy of matrices, \[ \Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev}, \quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \] by using the information contained in $A$, without assuming any knowledge of the geometry of the problem from which $A$ originates. A vector space $\mathbb{R}^{n_{k}}$ is associated with $\Omega^k$, where $n_k$ is the size of $\Omega^k$. 
For all $k < nlev$, a restriction operator and a prolongation one are built, which connect two levels $k$ and $k+1$: $$ P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad R^k \in \mathbb{R}^{n_{k+1}\times n_k}; $$ %\[ % P^k: \mathbb{R}^{n_{k+1}} \longrightarrow \mathbb{R}^{n_k}, \quad % R^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k+1}}; %\] the matrix $A^{k+1}$ is computed by using the previous operators according to the Galerkin approach, i.e., $$ A^{k+1}=R^kA^kP^k. $$ $R^k=(P^k)^T$ in the current implementation of MLD2P4. A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solver is set up at the coarsest level, so that they are ready for application (for example, setting up a solver based on the $LU$ factorization means computing and storing the $L$ and $U$ factors). The construction of the hierachy of AMG components described so far corresponds to the so-called build phase of the preconditioner. \begin{figure}[t] \begin{center} \framebox{ \begin{minipage}{.85\textwidth} \begin{tabbing} \quad \=\quad \=\quad \=\quad \\[-3mm] procedure V-cycle$\left(k,A^k,b^k,u^k\right)$ \\[2mm] \>if $\left(k \ne nlev \right)$ then \\[1mm] \>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm] \>\> $b^{k+1} = R^{k+1}\left(b^k - A^k u^k\right)$ \\[1mm] \>\> $u^{k+1} =$ V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$ \\[1mm] \>\> $u^k = u^k + P^{k+1} u^{k+1}$ \\[1mm] \>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm] \>else \\[1mm] \>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm] \>endif \\[1mm] \>return $u^k$ \\[1mm] end \end{tabbing} \end{minipage} } \caption{Application phase of a V-cycle preconditioner.\label{fig:application_alg}} \end{center} \end{figure} The components produced in the build phase may be combined in several ways to obtain different multilevel preconditioners; this is done in the application phase, i.e., in the computation of a vector of type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iteration of a 
Krylov solver \cite{Saad_book}. An example of such a combination, known as V-cycle, is given in Figure~\ref{fig:application_alg}. In this case, a single iteration of the same smoother is used before and after the the recursive call to the V-cycle (i.e., in the pre-smoothing and post-smoothing phases); however, different choices can be performed. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle and W-cycle~\cite{Briggs2000}, and a version of the K-cycle described in~\cite{Notay2008}. \subsection{Smoothed Aggregation\label{sec:aggregation}} In order to define the prolongator $P^k$, used to compute the coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregation algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}. The basic idea of this algorithm is to build a coarse set of indices $\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjoint subsets (aggregates), and to define the coarse-to-fine space transfer operator $P^k$ by applying a suitable smoother to a simple piecewise constant prolongation operator, with the aim of improving the quality of the coarse-space correction. Three main steps can be identified in the smoothed aggregation procedure: \begin{enumerate} \item aggregation of the indices set $\Omega^k$, to obtain $\Omega^{k+1}$; \item construction of the prolongator $P^k$; \item application of $P^k$ and $R^k=(P^k)^T$ to build $A^{k+1}$. \end{enumerate} In order to perform the coarsening step, the smoothed aggregation algorithm described in~\cite{VANEK_MANDEL_BREZINA} is used. In this algorithm, each index in $\Omega^{k+1}$ corresponds to an aggregate of $\Omega^k$, consisting of a suitably chosen index $j$ and of the indices $i$ that are strongly coupled to $j$, i.e., $$ |a_{ij}^k| > \theta \sqrt{|a_{ii}^ka_{jj}^k|}, $$ for a given $\theta \in [0,1]$. 
Since this algorithm has a sequential nature, a decoupled version of it is applied, where each processor $i$ independently executes the algorithm on the set of indices assigned to it in the initial data distribution. This version is embarrassingly parallel, since it does not require any data communication. On the other hand, it may produce some non-uniform aggregates and is strongly dependent on the number of processors and on the initial partitioning of the matrix $A$. Nevertheless, this parall algorithm has been chosen for MLD2P4, since it has been shown to produce good results in practice \cite{aaecc_07,apnum_07,TUMINARO_TONG}. The prolongator $P^k$ is built starting from a tentative prolongator $\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as $$ \bar{P}^k =(\bar{p}_{ij}^k), \quad \bar{p}_{ij}^k = \left\{ \begin{array}{ll} 1 & \quad \mbox{if} \; i \in \Omega^k_j, \\ 0 & \quad \mbox{otherwise}, \end{array} \right. \label{eq:tent_prol} $$ where $\Omega^k_j$ is the aggregate of $\Omega^k$ corresponding to the index $j \in \Omega^{k+1}$. $P^k$ is obtained by applying to $\bar{P}^k$ a smoother $S^k \in \mathbb{R}^{n_k \times n_k}$: $$ P^k = S^k \bar{P}^k, $$ in order to remove nonsmooth components from the range of the prolongator, and hence to improve the convergence properties of the multi-level method~\cite{BREZINA_VANEK,Stuben_01}. A simple choice for $S^k$ is the damped Jacobi smoother: $$ S^k = I - \omega^k (D^k)^{-1} A^k , $$ where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$, and $\omega^k$ is an approximation of $4/(3\rho^k)$, where $\rho^k$ is the spectral radius of $(D^k)^{-1}A^k$. computed by using some estimate of the spectral radius of $(D^k)^{-1}A^k$ \cite{BREZINA_VANEK}. 
\subsection{Smoothers and coarsest-level solvers\label{sec:smoothers}} The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods, a hybrid version of the forward and backward Gauss-Seidel methods, and the additive Schwarz (AS) ones (see, e.g., \cite{Saad_book,dd2_96}). The hybrid Gauss-Seidel version is considered because the original Gauss-Seidel method is inherently sequential. At each iteration of the hybrid version, each parallel process uses the most recent values of its own local variables and the values of the non-local variables computed at the previous iteration, obtained by exchanging data with other processes before the beginning of the current iteration. In the AS methods, the index space $\Omega^k$ is divided into $m_k$ subsets $\Omega^k_i$ of size $n_{k,i}$, possibly overlapping. For each $i$ we consider the restriction operator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$ % $R_i^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k,i}}$ that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$ with indices in $\Omega^k_i$, and the prolongation operator $P^k_i = (R_i^k)^T$. These operators are then used to build $A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the index space $\Omega^k_i$. The classical AS preconditioner $M^k_{AS}$ is defined as \[ ( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k}, \] where $A_i^k$ is supposed to be nonsingular. We observe that an approximate inverse of $A_i^k$ is usually considered instead of $(A_i^k)^{-1}$. The setup of $S^k_{AS}$ during the multilevel build phase involves \begin{itemize} \item the definition of the index subspaces $\Omega_i^k$ and of the corresponding operators $R_i^k$ (and $P_i^k$); \item the computation of the submatrices $A_i^k$; \item the computation of their inverses (usually approximated through some form of incomplete factorization). 
\end{itemize} The computation of $z^k=M^k_{AS}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during the multilevel application phase, requires \begin{itemize} \item the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$, i.e.\ $w_i^k = R_i^{k} w^k$; \item the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$; \item the prolongation and the sum of the previous vectors, i.e.\ $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$. \end{itemize} Variants of the classical AS method, which use modifications of the restriction and prolongation operators, are also implemented in MLD2P4. Among them, the Restricted AS (RAS) preconditioner usually outperforms the classical AS preconditioner in terms of convergence rate and of computation and communication time on parallel distributed-memory computers, and is therefore the most widely used among the AS preconditioners~\cite{CAI_SARKIS}. Direct solvers based on sparse LU factorizations, implemented in the third party libraries reported in Section~\ref{sec:third_party}, can be applied as coarsest-level solvers by MLD2P4. Native inexact solvers based on incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel, and block Jacobi preconditioners are also available. Direct solvers usually lead to more effective preconditioners in terms of algorithmic scalability; however, this does not guarantee parallel efficiency. %%% Local Variables: %%% mode: latex %%% TeX-master: "userguide" %%% End: \ No newline at end of file +\section{Multigrid Background\label{sec:background}} \markboth{\textsc{MLD2P4 User's and Reference Guide}} {\textsc{\ref{sec:background} Multigrid Background}} Multigrid preconditioners, coupled with Krylov iterative solvers, are widely used in the parallel solution of large and sparse linear systems, because of their optimality in the solution of linear systems arising from the discretization of scalar elliptic Partial Differential Equations (PDEs) on regular grids. 
Optimality, also known as algorithmic scalability, is the property of having a computational cost per iteration that depends linearly on the problem size, and a convergence rate that is independent of the problem size. Multigrid preconditioners are based on a recursive application of a two-grid process consisting of smoother iterations and a coarse-space (or coarse-level) correction. The smoothers may be either basic iterative methods, such as the Jacobi and Gauss-Seidel ones, or more complex subspace-correction methods, such as the Schwarz ones. The coarse-space correction consists of solving, in an appropriately chosen coarse space, the residual equation associated with the approximate solution computed by the smoother, and of using the solution of this equation to correct the previous approximation. The transfer of information between the original (fine) space and the coarse one is performed by using suitable restriction and prolongation operators. The construction of the coarse space and the corresponding transfer operators is carried out by applying a so-called coarsening algorithm to the system matrix. Two main approaches can be used to perform coarsening: the geometric approach, which exploits the knowledge of some physical grid associated with the matrix and requires the user to define transfer operators from the fine to the coarse level and vice versa, and the algebraic approach, which builds the coarse-space correction and the associated transfer operators using only matrix information. The first approach may be difficult when the system comes from discretizations on complex geometries; furthermore, ad hoc one-level smoothers may be required to get an efficient interplay between fine and coarse levels, e.g., when matrices with highly varying coefficients are considered. 
The second approach performs a fully automatic coarsening and enforces the interplay between fine and coarse level by suitably choosing the coarse space and the coarse-to-fine interpolation (see, e.g., \cite{Briggs2000,Stuben_01,dd2_96} for details). MLD2P4 uses a pure algebraic approach, based on the smoothed aggregation algorithm \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}, for building the sequence of coarse matrices and transfer operators, starting from the original one. A decoupled version of this algorithm is implemented, where the smoothed aggregation is applied locally to each submatrix \cite{TUMINARO_TONG}. A brief description of the AMG preconditioners implemented in MLD2P4 is given in Sections~\ref{sec:multilevel}--\ref{sec:smoothers}. For further details the reader is referred to \cite{para_04,aaecc_07,apnum_07,MLD2P4_TOMS}. We note that optimal multigrid preconditioners do not necessarily correspond to minimum execution times in a parallel setting. Indeed, to obtain effective parallel multigrid preconditioners, a tradeoff between the optimality and the cost of building and applying the smoothers and the coarse-space corrections must be achieved. Effective parallel preconditioners require algorithmic scalability to be coupled with implementation scalability, i.e., a computational cost per iteration which remains (almost) constant as the number of parallel processors increases. \subsection{AMG preconditioners\label{sec:multilevel}} In order to describe the AMG preconditioners available in MLD2P4, we consider a linear system \begin{equation} Ax=b, \label{eq:system} \end{equation} where $A=(a_{ij}) \in \mathbb{R}^{n \times n}$ is a nonsingular sparse matrix; for ease of presentation we assume $A$ is real, but the results are valid for the complex case as well. Let us assume as finest index space the set of row (column) indices of $A$, i.e., $\Omega = \{1, 2, \ldots, n\}$. 
Each algebraic multilevel preconditioner implemented in MLD2P4 generates a hierarchy of index spaces and a corresponding hierarchy of matrices, \[ \Omega^1 \equiv \Omega \supset \Omega^2 \supset \ldots \supset \Omega^{nlev}, \quad A^1 \equiv A, A^2, \ldots, A^{nlev}, \] by using the information contained in $A$, without assuming any knowledge of the geometry of the problem from which $A$ originates. A vector space $\mathbb{R}^{n_{k}}$ is associated with $\Omega^k$, where $n_k$ is the size of $\Omega^k$. For all $k < nlev$, a restriction operator and a prolongation one are built, which connect two levels $k$ and $k+1$: $$ P^k \in \mathbb{R}^{n_k \times n_{k+1}}, \quad R^k \in \mathbb{R}^{n_{k+1}\times n_k}; $$ %\[ % P^k: \mathbb{R}^{n_{k+1}} \longrightarrow \mathbb{R}^{n_k}, \quad % R^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k+1}}; %\] the matrix $A^{k+1}$ is computed by using the previous operators according to the Galerkin approach, i.e., $$ A^{k+1}=R^kA^kP^k. $$ $R^k=(P^k)^T$ in the current implementation of MLD2P4. A smoother with iteration matrix $M^k$ is set up at each level $k < nlev$, and a solver is set up at the coarsest level, so that they are ready for application (for example, setting up a solver based on the $LU$ factorization means computing and storing the $L$ and $U$ factors). The construction of the hierarchy of AMG components described so far corresponds to the so-called build phase of the preconditioner. 
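As a concrete illustration of the Galerkin construction above, the following Python sketch computes $A^{k+1}=R^kA^kP^k$ with $R^k=(P^k)^T$ for a small dense example (a 1D Laplacian with pairwise aggregation). This is an illustrative sketch only, not MLD2P4 code: the library operates on distributed sparse matrices, and the matrix and prolongator below are assumptions chosen for the example.

```python
# Sketch of the Galerkin construction A^{k+1} = R^k A^k P^k with R^k = (P^k)^T,
# on small dense matrices (illustration only; MLD2P4 works on sparse data).

def matmul(X, Y):
    # naive dense matrix product
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def galerkin_coarse(A, P):
    """Compute R A P with R = P^T (the Galerkin approach)."""
    R = transpose(P)
    return matmul(matmul(R, A), P)

# 1D Laplacian on 4 points; indices aggregated pairwise,
# with a piecewise-constant prolongator
A = [[ 2, -1,  0,  0],
     [-1,  2, -1,  0],
     [ 0, -1,  2, -1],
     [ 0,  0, -1,  2]]
P = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]
A_coarse = galerkin_coarse(A, P)   # -> [[2, -1], [-1, 2]]
```

Note how the coarse matrix inherits the structure of the fine one; on this example the 4-point Laplacian is reduced to a 2-point Laplacian.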
\begin{figure}[t] \begin{center} \framebox{ \begin{minipage}{.85\textwidth} \begin{tabbing} \quad \=\quad \=\quad \=\quad \\[-3mm] procedure V-cycle$\left(k,A^k,b^k,u^k\right)$ \\[2mm] \>if $\left(k \ne nlev \right)$ then \\[1mm] \>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm] \>\> $b^{k+1} = R^{k+1}\left(b^k - A^k u^k\right)$ \\[1mm] \>\> $u^{k+1} =$ V-cycle$\left(k+1,A^{k+1},b^{k+1},0\right)$ \\[1mm] \>\> $u^k = u^k + P^{k+1} u^{k+1}$ \\[1mm] \>\> $u^k = u^k + M^k \left(b^k - A^k u^k\right)$ \\[1mm] \>else \\[1mm] \>\> $u^k = \left(A^k\right)^{-1} b^k$\\[1mm] \>endif \\[1mm] \>return $u^k$ \\[1mm] end \end{tabbing} \end{minipage} } \caption{Application phase of a V-cycle preconditioner.\label{fig:application_alg}} \end{center} \end{figure} The components produced in the build phase may be combined in several ways to obtain different multilevel preconditioners; this is done in the application phase, i.e., in the computation of a vector of type $w=B^{-1}v$, where $B$ denotes the preconditioner, usually within an iteration of a Krylov solver \cite{Saad_book}. An example of such a combination, known as V-cycle, is given in Figure~\ref{fig:application_alg}. In this case, a single iteration of the same smoother is used before and after the recursive call to the V-cycle (i.e., in the pre-smoothing and post-smoothing phases); however, different choices can be made. Other cycles can be defined; in MLD2P4, we implemented the standard V-cycle and W-cycle~\cite{Briggs2000}, and a version of the K-cycle described in~\cite{Notay2008}. \subsection{Smoothed Aggregation\label{sec:aggregation}} In order to define the prolongator $P^k$, used to compute the coarse-level matrix $A^{k+1}$, MLD2P4 uses the smoothed aggregation algorithm described in \cite{BREZINA_VANEK,VANEK_MANDEL_BREZINA}. 
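The recursive V-cycle of Figure~\ref{fig:application_alg} can be sketched in Python as follows. The two-level hierarchy, the damped Jacobi smoother, the exact $2 \times 2$ coarsest-level solve, and the dense data structures are illustrative assumptions for this sketch, not the MLD2P4 implementation.

```python
# Illustrative sketch of the V-cycle application phase of
# Figure "fig:application_alg", on a tiny two-level dense example.

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def jacobi(A, b, u, omega=2.0 / 3.0):
    # one sweep of damped Jacobi: u <- u + omega * D^{-1} (b - A u)
    r = [bi - ri for bi, ri in zip(b, matvec(A, u))]
    return [ui + omega * ri / A[i][i] for i, (ui, ri) in enumerate(zip(u, r))]

def v_cycle(k, levels, b, u):
    # mirrors the pseudocode: pre-smoothing, restriction, recursion,
    # prolongation of the coarse correction, post-smoothing
    lev = levels[k]
    A = lev["A"]
    if k < len(levels) - 1:
        u = jacobi(A, b, u)                                # pre-smoothing
        r = [bi - ri for bi, ri in zip(b, matvec(A, u))]   # b^k - A^k u^k
        bc = matvec(lev["R"], r)                           # restriction
        uc = v_cycle(k + 1, levels, bc, [0.0] * len(bc))
        corr = matvec(lev["P"], uc)                        # prolongation
        u = [ui + ci for ui, ci in zip(u, corr)]
        u = jacobi(A, b, u)                                # post-smoothing
    else:
        u = lev["solve"](b)                                # coarsest-level solve
    return u

# two-level hierarchy for a 1D Laplacian with pairwise aggregation
A1 = [[2.0, -1.0, 0.0, 0.0], [-1.0, 2.0, -1.0, 0.0],
      [0.0, -1.0, 2.0, -1.0], [0.0, 0.0, -1.0, 2.0]]
P1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
R1 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
A2 = [[2.0, -1.0], [-1.0, 2.0]]          # = R1 A1 P1 (Galerkin)
levels = [
    {"A": A1, "R": R1, "P": P1},
    {"A": A2, "solve": lambda b: [(2.0 * b[0] + b[1]) / 3.0,
                                  (b[0] + 2.0 * b[1]) / 3.0]},  # exact 2x2 inverse
]

b = [1.0, 0.0, 0.0, 1.0]                 # exact solution is [1, 1, 1, 1]
u = [0.0] * 4
for _ in range(30):                      # V-cycle used as a stationary iteration
    u = v_cycle(0, levels, b, u)
```

In practice the cycle would be applied once per Krylov iteration as the preconditioner $B^{-1}$, rather than iterated to convergence as in this standalone loop.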
The basic idea of this algorithm is to build a coarse set of indices $\Omega^{k+1}$ by suitably grouping the indices of $\Omega^k$ into disjoint subsets (aggregates), and to define the coarse-to-fine space transfer operator $P^k$ by applying a suitable smoother to a simple piecewise constant prolongation operator, with the aim of improving the quality of the coarse-space correction. Three main steps can be identified in the smoothed aggregation procedure: \begin{enumerate} \item aggregation of the indices of $\Omega^k$ to obtain $\Omega^{k+1}$; \item construction of the prolongator $P^k$; \item application of $P^k$ and $R^k=(P^k)^T$ to build $A^{k+1}$. \end{enumerate} In order to perform the coarsening step, the smoothed aggregation algorithm described in~\cite{VANEK_MANDEL_BREZINA} is used. In this algorithm, each index $j \in \Omega^{k+1}$ corresponds to an aggregate $\Omega^k_j$ of $\Omega^k$, consisting of a suitably chosen index $i \in \Omega^k$ and indices that are (usually) contained in a strongly-coupled neighborhood of $i$, i.e., \begin{equation} \label{eq:strongly_coup} \Omega^k_j \subset \mathcal{N}_i^k(\theta) = \left\{ r \in \Omega^k: |a_{ir}^k| > \theta \sqrt{|a_{ii}^ka_{rr}^k|} \right \} \cup \left\{ i \right\}, \end{equation} for a given threshold $\theta \in [0,1]$ (see~\cite{VANEK_MANDEL_BREZINA} for the details). Since this algorithm has a sequential nature, a decoupled version of it is applied, where each processor independently executes the algorithm on the set of indices assigned to it in the initial data distribution. This version is embarrassingly parallel, since it does not require any data communication. On the other hand, it may produce some nonuniform aggregates and is strongly dependent on the number of processors and on the initial partitioning of the matrix $A$. Nevertheless, this parallel algorithm has been chosen for MLD2P4, since it has been shown to produce good results in practice \cite{aaecc_07,apnum_07,TUMINARO_TONG}. 
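The strong-coupling test and a greedy aggregation sweep can be sketched in Python as follows. This is a deliberately simplified illustration (greedy seeding in index order, dense matrix), not the exact algorithm of \cite{VANEK_MANDEL_BREZINA} nor MLD2P4 code; the threshold value and the test matrix are assumptions chosen for the example.

```python
# Sketch of the strong-coupling neighborhood N_i(theta) and a greedy
# aggregation pass (simplified illustration of smoothed-aggregation coarsening).
import math

def strong_neighborhood(A, i, theta):
    """Indices r with |a_ir| > theta * sqrt(|a_ii| |a_rr|), plus i itself."""
    n = len(A)
    nbrs = {r for r in range(n)
            if r != i and abs(A[i][r]) > theta * math.sqrt(abs(A[i][i]) * abs(A[r][r]))}
    nbrs.add(i)
    return nbrs

def greedy_aggregate(A, theta):
    """One greedy sweep: each still-unassigned index seeds an aggregate made
    of its unassigned strongly-coupled neighbours (a simplification of the
    scheme in Vanek-Mandel-Brezina)."""
    n = len(A)
    agg = [-1] * n          # agg[i] = coarse index of the aggregate containing i
    n_agg = 0
    for i in range(n):
        if agg[i] == -1:
            for r in strong_neighborhood(A, i, theta):
                if agg[r] == -1:
                    agg[r] = n_agg
            n_agg += 1
    return agg, n_agg

# 1D Laplacian: with theta = 0.25 every off-diagonal neighbour is strongly
# coupled, and the greedy sweep produces the pairwise aggregates {0,1}, {2,3}
A = [[ 2, -1,  0,  0],
     [-1,  2, -1,  0],
     [ 0, -1,  2, -1],
     [ 0,  0, -1,  2]]
agg, n_agg = greedy_aggregate(A, theta=0.25)   # -> [0, 0, 1, 1], 2
```

In the decoupled version described above, each process would run such a sweep independently on its locally owned indices, which is what makes the coarsening embarrassingly parallel.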
The prolongator $P^k$ is built starting from a tentative prolongator $\bar{P}^k \in \mathbb{R}^{n_k \times n_{k+1}}$, defined as $$ \bar{P}^k =(\bar{p}_{ij}^k), \quad \bar{p}_{ij}^k = \left\{ \begin{array}{ll} 1 & \quad \mbox{if} \; i \in \Omega^k_j, \\ 0 & \quad \mbox{otherwise}, \end{array} \right. \label{eq:tent_prol} $$ where $\Omega^k_j$ is the aggregate of $\Omega^k$ corresponding to the index $j \in \Omega^{k+1}$. $P^k$ is obtained by applying to $\bar{P}^k$ a smoother $S^k \in \mathbb{R}^{n_k \times n_k}$: $$ P^k = S^k \bar{P}^k, $$ in order to remove nonsmooth components from the range of the prolongator, and hence to improve the convergence properties of the multi-level method~\cite{BREZINA_VANEK,Stuben_01}. A simple choice for $S^k$ is the damped Jacobi smoother: $$ S^k = I - \omega^k (D^k)^{-1} A^k_F , $$ where $D^k$ is the diagonal matrix with the same diagonal entries as $A^k$, $A^k_F = (\bar{a}_{ij}^k)$ is the filtered matrix defined as \begin{equation} \label{eq:filtered} \bar{a}_{ij}^k = \left \{ \begin{array}{ll} a_{ij}^k & \mbox{if } j \in \mathcal{N}_i^k(\theta), \\ 0 & \mbox{otherwise}, \end{array} \right. \; (j \ne i), \qquad \bar{a}_{ii}^k = a_{ii}^k - \sum_{j \ne i} (a_{ij}^k - \bar{a}_{ij}^k), \end{equation} and $\omega^k$ is an approximation of $4/(3\rho^k)$, where $\rho^k$ is the spectral radius of $(D^k)^{-1}A^k_F$ \cite{BREZINA_VANEK}. In MLD2P4 this approximation is obtained by using $\| A^k_F \|_\infty$ as an estimate of $\rho^k$. Note that for systems coming from uniformly elliptic problems, filtering the matrix $A^k$ has little or no effect, and $A^k$ can be used instead of $A^k_F$. The latter choice is the default in MLD2P4. \subsection{Smoothers and coarsest-level solvers\label{sec:smoothers}} The smoothers implemented in MLD2P4 include the Jacobi and block-Jacobi methods, a hybrid version of the forward and backward Gauss-Seidel methods, and the additive Schwarz (AS) ones (see, e.g., \cite{Saad_book,dd2_96}). 
The hybrid Gauss-Seidel version is considered because the original Gauss-Seidel method is inherently sequential. At each iteration of the hybrid version, each parallel process uses the most recent values of its own local variables and the values of the non-local variables computed at the previous iteration, obtained by exchanging data with other processes before the beginning of the current iteration. In the AS methods, the index space $\Omega^k$ is divided into $m_k$ subsets $\Omega^k_i$ of size $n_{k,i}$, possibly overlapping. For each $i$ we consider the restriction operator $R_i^k \in \mathbb{R}^{n_{k,i} \times n_k}$ % $R_i^k: \mathbb{R}^{n_k} \longrightarrow \mathbb{R}^{n_{k,i}}$ that maps a vector $x^k$ to the vector $x_i^k$ made of the components of $x^k$ with indices in $\Omega^k_i$, and the prolongation operator $P^k_i = (R_i^k)^T$. These operators are then used to build $A_i^k=R_i^kA^kP_i^k$, which is the restriction of $A^k$ to the index space $\Omega^k_i$. The classical AS preconditioner $M^k_{AS}$ is defined as \[ ( M^k_{AS} )^{-1} = \sum_{i=1}^{m_k} P_i^k (A_i^k)^{-1} R_i^{k}, \] where $A_i^k$ is supposed to be nonsingular. We observe that an approximate inverse of $A_i^k$ is usually considered instead of $(A_i^k)^{-1}$. The setup of $M^k_{AS}$ during the multilevel build phase involves \begin{itemize} \item the definition of the index subspaces $\Omega_i^k$ and of the corresponding operators $R_i^k$ (and $P_i^k$); \item the computation of the submatrices $A_i^k$; \item the computation of their inverses (usually approximated through some form of incomplete factorization). 
\end{itemize} The computation of $z^k=M^k_{AS}w^k$, with $w^k \in \mathbb{R}^{n_k}$, during the multilevel application phase, requires \begin{itemize} \item the restriction of $w^k$ to the subspaces $\mathbb{R}^{n_{k,i}}$, i.e.\ $w_i^k = R_i^{k} w^k$; \item the computation of the vectors $z_i^k=(A_i^k)^{-1} w_i^k$; \item the prolongation and the sum of the previous vectors, i.e.\ $z^k = \sum_{i=1}^{m_k} P_i^k z_i^k$. \end{itemize} Variants of the classical AS method, which use modifications of the restriction and prolongation operators, are also implemented in MLD2P4. Among them, the Restricted AS (RAS) preconditioner usually outperforms the classical AS preconditioner in terms of convergence rate and of computation and communication time on parallel distributed-memory computers, and is therefore the most widely used among the AS preconditioners~\cite{CAI_SARKIS}. Direct solvers based on sparse LU factorizations, implemented in the third-party libraries reported in Section~\ref{sec:third-party}, can be applied as coarsest-level solvers by MLD2P4. Native inexact solvers based on incomplete LU factorizations, as well as Jacobi, hybrid (forward) Gauss-Seidel, and block Jacobi preconditioners are also available. Direct solvers usually lead to more effective preconditioners in terms of algorithmic scalability; however, this does not guarantee parallel efficiency. %%% Local Variables: %%% mode: latex %%% TeX-master: "userguide" %%% End: \ No newline at end of file diff --git a/docs/src/bibliography.tex b/docs/src/bibliography.tex index 7d8290fa..98305063 100644 --- a/docs/src/bibliography.tex +++ b/docs/src/bibliography.tex @@ -6,13 +6,19 @@ %\let\refname\relax % +\bibitem{MUMPS} +P.~R.~Amestoy, C.~Ashcraft, O.~Boiteau, A.~Buttari, J.~L'Excellent, C.~Weisbecker, +{\em Improving multifrontal methods by means of block low-rank representations}, +SIAM Journal on Scientific Computing, volume 37 (3), 2015, A1452--A1474. +See also {\tt http://mumps.enseeiht.fr}. 
+% \bibitem{BREZINA_VANEK} -M.~Brezina, P.~Van{\v e}k, +M.~Brezina, P.~Van\v{e}k, {\em A Black-Box Iterative Solver Based on a Two-Level Schwarz Method}, Computing, 63, 1999, 233--263. % \bibitem{Briggs2000} -W.~L.~Briggs, V.~E.~Henson, S.~F.~ McCormick, +W.~L.~Briggs, V.~E.~Henson, S.~F.~McCormick, {\em A Multigrid Tutorial, Second Edition}, SIAM, 2000. % @@ -32,28 +38,29 @@ Applicable Algebra in Engineering, Communications and Computing, 18 (3) 2007, 223--239. %Published online: 13 February 2007, {\tt http://dx.doi.org/10.1007/s00200-007-0035-z} % -\bibitem{apnum_07} P.~D'Ambra, S.~Filippone, D.~di~Serafino, -{\em On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners}, -Applied Numerical Mathematics, Elsevier Science, -57 (11-12), 2007, 1181-1196. -%published online 3 February 2007, {\tt -% http://dx.doi.org/10.1016/j.apnum.2007.01.006} -% \bibitem{CAI_SARKIS} X.~C.~Cai, M.~Sarkis, {\em A Restricted Additive Schwarz Preconditioner for General Sparse Linear Systems}, SIAM Journal on Scientific Computing, 21 (2), 1999, 792--797. % -\bibitem{Cai_Widlund_92} -X.~C.~Cai, O.~B.~Widlund, -{\em Domain Decomposition Algorithms for Indefinite Elliptic Problems}, -SIAM Journal on Scientific and Statistical Computing, 13 (1), 1992, 243--258. +%\bibitem{Cai_Widlund_92} +%X.~C.~Cai, O.~B.~Widlund, +%{\em Domain Decomposition Algorithms for Indefinite Elliptic Problems}, +%SIAM Journal on Scientific and Statistical Computing, 13 (1), 1992, 243--258. +% +%\bibitem{dd1_94} +%T.~Chan and T.~Mathew, +%{\em Domain Decomposition Algorithms}, +%in A.~Iserles, editor, Acta Numerica 1994, 61--143. +%Cambridge University Press. % -\bibitem{dd1_94} -T.~Chan and T.~Mathew, -{\em Domain Decomposition Algorithms}, -in A.~Iserles, editor, Acta Numerica 1994, 61--143. -Cambridge University Press. 
+\bibitem{apnum_07} +P.~D'Ambra, S.~Filippone, D.~di~Serafino, +{\em On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners}, +Applied Numerical Mathematics, Elsevier Science, +57 (11-12), 2007, 1181-1196. +%published online 3 February 2007, {\tt +% http://dx.doi.org/10.1016/j.apnum.2007.01.006} % \bibitem{MLD2P4_TOMS} P.~D'Ambra, D.~di~Serafino, S.~Filippone, @@ -62,20 +69,14 @@ Algebraic Domain Decomposition Preconditioners in Fortran 95}, ACM Trans. Math. Softw., 37(3), 2010, art. 30. % \bibitem{UMFPACK} -T.A.~Davis, +T.~A.~Davis, {\em Algorithm 832: UMFPACK - an Unsymmetric-pattern Multifrontal Method with a Column Pre-ordering Strategy}, ACM Transactions on Mathematical Software, 30, 2004, 196--199. (See also {\tt http://www.cise.ufl.edu/~davis/}) % -\bibitem{MUMPS} -P.R.~Amestoy, C.~Ashcraft, O.~Boiteau, A.~Buttari, J.~L'Excellent, C.~Weisbecker -{\em Improving multifrontal methods by means of block low-rank representations}, -SIAM Journal on Scientific Computing, volume 37 (3), 2015, A1452--A1474. -See also {\tt http://mumps.enseeiht.fr}. -% \bibitem{SUPERLU} -J.W.~Demmel, S.C.~Eisenstat, J.R.~Gilbert, X.S.~Li and J.W.H.~Liu, +J.~W.~Demmel, S.~C.~Eisenstat, J.~R.~Gilbert, X.~S.~Li, J.~W.~H.~Liu, A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications, 20 (3), 1999, 720--755. % @@ -89,16 +90,16 @@ J.~J.~Dongarra, J.~Du Croz, S.~Hammarling, R.~J.~Hanson, \emph{An extended set of FORTRAN Basic Linear Algebra Subprograms}, ACM Transactions on Mathematical Software, 14 (1) 1988, 1--17. % -\bibitem{BLACS} -J.~J.~Dongarra and R.~C.~Whaley, -{\em A User's Guide to the BLACS v.~1.1}, -Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of -Tennessee, March 1995 (updated May 1997). +%\bibitem{BLACS} +%J.~J.~Dongarra, R.~C.~Whaley, +%{\em A User's Guide to the BLACS v.~1.1}, +%Lapack Working Note 94, Tech.\ Rep.\ UT-CS-95-281, University of +%Tennessee, March 1995 (updated May 1997). 
% -\bibitem{EFSTATHIOU} -E.~Efstathiou, J.~G.~Gander, -{\em Why Restricted Additive Schwarz Converges Faster than Additive Schwarz}, -BIT Numerical Mathematics, 43 (5), 2003, 945--959. +%\bibitem{EFSTATHIOU} +%E.~Efstathiou, J.~G.~Gander, +%{\em Why Restricted Additive Schwarz Converges Faster than Additive Schwarz}, +%BIT Numerical Mathematics, 43 (5), 2003, 945--959. % \bibitem{PSBLASGUIDE} S.~Filippone, A.~Buttari, @@ -106,9 +107,9 @@ S.~Filippone, A.~Buttari, available from \texttt{http://www.ce.uniroma2.it/psblas/}. % \bibitem{PSBLAS3} -Salvatore Filippone and Alfredo Buttari. +S.~Filippone, A.~Buttari, {\em Object-Oriented Techniques for Sparse Matrix Computations in Fortran 2003}. -ACM Transactions on on Mathematical Software, 38 (4), 2012, art. 23. +ACM Transactions on Mathematical Software, 38 (4), 2012, art.~23. % \bibitem{psblas_00} S.~Filippone, M.~Colajanni, @@ -127,12 +128,14 @@ C.~L.~Lawson, R.~J.~Hanson, D.~Kincaid, F.~T.~Krogh, ACM Transactions on Mathematical Software, 5 (3), 1979, 308--323. % \bibitem{SUPERLUDIST} -X.~S.~Li, J.~W.~Demmel, {\em SuperLU\_DIST: A Scalable Distributed-memory +X.~S.~Li, J.~W.~Demmel, +{\em SuperLU\_DIST: A Scalable Distributed-memory Sparse Direct Solver for Unsymmetric Linear Systems}, ACM Transactions on Mathematical Software, 29 (2), 2003, 110--140. % \bibitem{Notay2008} -Y.~Notay, P.~S.~Vassilevski, {\em Recursive Krylov-based multigrid cycles}, +Y.~Notay, P.~S.~Vassilevski, +{\em Recursive Krylov-based multigrid cycles}, Numerical Linear Algebra with Applications, 15 (5), 2008, 473--487. % \bibitem{Saad_book} @@ -149,7 +152,7 @@ Cambridge University Press, 1996. M.~Snir, S.~Otto, S.~Huss-Lederman, D.~Walker, J.~Dongarra, {\em MPI: The Complete Reference. Volume 1 - The MPI Core}, second edition, MIT Press, 1998. 
-%% +% \bibitem{Stuben_01} K.~St\"{u}ben, {\em An Introduction to Algebraic Multigrid}, @@ -161,9 +164,8 @@ R.~S.~Tuminaro, C.~Tong, {\em Parallel Smoothed Aggregation Multigrid: Aggregation Strategies on Massively Parallel Machines}, in J. Donnelley, editor, Proceedings of SuperComputing 2000, Dallas, 2000. % \bibitem{VANEK_MANDEL_BREZINA} -P.~Van{\v e}k, J.~Mandel and M.~Brezina, +P.~Van\v{e}k, J.~Mandel, M.~Brezina, {\em Algebraic Multigrid by Smoothed Aggregation for Second and Fourth Order Elliptic Problems}, Computing, 56 (3) 1996, 179--196. -% \end{thebibliography} diff --git a/docs/src/building.tex b/docs/src/building.tex index bc0e49e6..163d2dfe 100644 --- a/docs/src/building.tex +++ b/docs/src/building.tex @@ -17,7 +17,15 @@ recommend to use at least version 4.8. The software defines data types and interfaces for real and complex data, in both single and double precision. -\subsection{Prerequisites} +Building MLD2P4 requires some base libraries (see Section~\ref{sec:prerequisites}); +interfaces to optional third-party libraries, which extend the functionalities of MLD2P4 +(see Section~\ref{sec:third-party}), are also available. Many Linux distributions +(e.g., Ubuntu, Fedora, CentOS) provide precompiled packages for the prerequisite and +optional software. In many cases these packages are split between a runtime part and a +``developer'' part; in order to build MLD2P4 you need both. A description of the base and +optional software used by MLD2P4 is given in the next sections. + +\subsection{Prerequisites\label{sec:prerequisites}} The following base libraries are needed: \begin{description} @@ -68,18 +76,18 @@ for multi-level preconditioners may change to reflect their presence. A sparse LU factorization package available from \url{mumps.enseeiht.fr}; it provides sequential and parallel factorizations and triangular system solution for single and double precision, real and complex data. - We tested versions 4.10.0 and version 5.0.1. 
+ We tested versions 4.10.0 and 5.0.1. \item[SuperLU] \cite{SUPERLU} A sparse LU factorization package available from \url{crd.lbl.gov/~xiaoye/SuperLU/}; it provides sequential factorization and triangular system solution for single and double precision, - real and complex data. We tested version 4.3 and 5.0. If you installed BLAS from + real and complex data. We tested versions 4.3 and 5.0. If you installed BLAS from ATLAS, remember to define the BLASLIB variable in the make.inc file. \item[SuperLU\_Dist] \cite{SUPERLUDIST} A sparse LU factorization package available from the same site as SuperLU; it provides parallel factorization and triangular system solution for double precision real and complex data. - We tested version 3.3 and 4.2. If you installed BLAS from + We tested versions 3.3 and 4.2. If you installed BLAS from ATLAS, remember to define the BLASLIB variable in the make.inc file and to add the \verb|-std=c99| option to the C compiler options. Note that this library requires the ParMETIS @@ -278,7 +286,7 @@ install directory under the name \verb|Make.inc.MLD2P4|. To use the MUMPS solver package, the user has to add the appropriate options to the configure script; by default we are looking for the libraries -\verb|-ldmumps -lsmumps| \verb|-lzmumps -mumps_common -lpord|. +\verb|-ldmumps -lsmumps| \verb| -lzmumps -lcmumps -lmumps_common -lpord|. MUMPS often uses additional packages such as ScaLAPACK, ParMETIS, SCOTCH, as well as enabling OpenMP; in such cases it is necessary to add linker options with the \verb|--with-extra-libs| configure option. @@ -291,23 +299,23 @@ followed (optionally) by \begin{verbatim} make install \end{verbatim} -Many Linux distributions (e.g. Ubuntu, Fedora, CentOS) provide -precompiled packages for the prerequisite softwares; in many cases the -software packages are split between a runtime part and a ``developer'' -part, to rebuild MLD2P4 you'll need both. 
- \subsection{Bug reporting} -If you find any bugs in our codes, please let us know at -\begin{rawhtml} - -\end{rawhtml} -\texttt{bugreport@mld2p4.it} -\begin{rawhtml} - -\end{rawhtml} -; be aware that -the amount of information needed to reproduce a problem in a parallel -program may vary quite a lot. +If you find any bugs in our codes, please send an email to \\[2mm] +\texttt{pasqua.dambra@cnr.it} \\ +\texttt{daniela.diserafino@unicampania.it} \\ +\texttt{salvatore.filippone@cranfield.ac.uk} \\[2mm] +% please let us know at +%\begin{rawhtml} +% +%\end{rawhtml} +%\texttt{bugreport@mld2p4.it} +%\begin{rawhtml} +% +%\end{rawhtml} +%; +You should be aware that the amount of information needed to reproduce a problem +in a parallel program may vary quite a lot. + \subsection{Example and test programs\label{sec:ex_and_test}} The package contains the \verb|examples| and \verb|tests| directories; both of them are further divided into \verb|fileread| and diff --git a/docs/src/distribution.tex b/docs/src/distribution.tex index 6fd6c7ae..bc16b84b 100644 --- a/docs/src/distribution.tex +++ b/docs/src/distribution.tex @@ -8,7 +8,6 @@ MLD2P4 is available from the web site \texttt{http://www.mld2p4.it} \end{quotation} where contact points for further information can be also found. -{\bf Passiamo subito a GitHub?} The software is available under a modified BSD license, as specified in Appendix~\ref{sec:license}; please note that some of the optional diff --git a/docs/src/gettingstarted.tex b/docs/src/gettingstarted.tex index 8f99a12c..758d28f1 100644 --- a/docs/src/gettingstarted.tex +++ b/docs/src/gettingstarted.tex @@ -70,7 +70,7 @@ No preconditioner &\verb|'NOPREC'|& Considered only to use the PSBLAS Krylov solvers with no preconditioner. \\ \hline Diagonal & \verb|'DIAG'| or \verb|'JACOBI'| & Diagonal preconditioner. 
For any zero diagonal entry of the matrix to be preconditioned, - the corresponding entry of he preconditioner is set to~1.\\ \hline + the corresponding entry of the preconditioner is set to~1.\\ \hline Block Jacobi & \verb|'BJAC'| & Block-Jacobi with ILU(0) on the local blocks.\\ \hline Additive Schwarz & \verb|'AS'| & Restricted Additive Schwarz (RAS), with overlap~1 and ILU(0) on the local blocks. \\ \hline @@ -176,11 +176,11 @@ the corresponding codes are available in \verb|examples/fileread/|. ! with basic smoothed aggregation, 1 hybrid forward/backward ! GS sweep as pre/post-smoother and UMFPACK as coarsest-level ! solver - call P%init(P,'ML',info) + call P%init('ML',info) ! ! build the preconditioner - call P%hierarchy_build(A,desc_A,P,info) - call P%smoothers_build(A,desc_A,P,info) + call P%hierarchy_build(A,desc_A,info) + call P%smoothers_build(A,desc_A,info) ! ! set the solver parameters and the initial guess @@ -191,7 +191,7 @@ the corresponding codes are available in \verb|examples/fileread/|. ... ... ! ! deallocate the preconditioner - call P%free(P,info) + call P%free(info) ! ! deallocate other data structures ... ... @@ -260,12 +260,12 @@ boundary conditions are also available in the directory \verb|examples/pdegen|. ! build a V-cycle preconditioner with 1 block-Jacobi sweep (with ! ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi ! sweeps (with ILU(0) on the blocks) as coarsest-level solver - call P%init(P,'ML',info) - call_P%set(P,'SMOOTHER_TYPE','BJAC',info) - call P%set(P,'COARSE_SOLVE','BJAC',info) - call P%set(P,'COARSE_SWEEPS',8,info) - call P%hierarchy_build(A,desc_A,P,info) - call P%smoothers_build(A,desc_A,P,info) + call P%init('ML',info) + call P%set('SMOOTHER_TYPE','BJAC',info) + call P%set('COARSE_SOLVE','BJAC',info) + call P%set('COARSE_SWEEPS',8,info) + call P%hierarchy_build(A,desc_A,info) + call P%smoothers_build(A,desc_A,info) ... ... 
\end{verbatim} } @@ -284,17 +284,17 @@ boundary conditions are also available in the directory \verb|examples/pdegen|. ! build a W-cycle preconditioner with 2 Gauss-Seidel sweeps as ! post-smoother (and no pre-smoother), a distributed coarsest ! matrix, and MUMPS as coarsest-level solver - call P%init(P,'ML',info) + call P%init('ML',info) call P%set('ML_TYPE','WCYCLE',info) call P%set('SMOOTHER_TYPE','GS',info) call P%set('SMOOTHER_SWEEPS',0,info,pos='PRE') call P%set('SMOOTHER_SWEEPS',2,info,pos='POST') call P%set('COARSE_SOLVE','MUMPS',info) call P%set('COARSE_MAT','DIST',info) - call P%hierarchy_build(A,desc_A,P,info) - call P%smoothers_build(A,desc_A,P,info) + call P%hierarchy_build(A,desc_A,info) + call P%smoothers_build(A,desc_A,info) ... ... -! solve Ax=b with preconditioned CG +! solve Ax=b with preconditioned BiCGSTAB call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info) \end{verbatim} } @@ -310,9 +310,9 @@ boundary conditions are also available in the directory \verb|examples/pdegen|. \begin{verbatim} ... ... ! set RAS with overlap 2 and ILU(0) on the local blocks - call P%init(P,'AS',info) - call P%set(P,'SUB_OVR',2,info) - call P%bld(A,desc_A,P,info) + call P%init('AS',info) + call P%set('SUB_OVR',2,info) + call P%bld(A,desc_A,info) ... ... \end{verbatim} } diff --git a/docs/src/intro.tex b/docs/src/intro.tex deleted file mode 100644 index 864ab263..00000000 --- a/docs/src/intro.tex +++ /dev/null @@ -1,34 +0,0 @@ -\section{Introduction}\label{sec:intro} -\markboth{\underline{MLD2P4 User's and Reference Guide}} - {\underline{\ref{sec:overview} Introduction}} - -The MLD2P4 library provides .... - - -\subsection{Programming model} - -The MLD2P4 librarary is based on the Single Program Multiple Data -(SPMD) programming model: each process participating in the -computation performs the same actions on a chunk of data. Parallelism -is thus data-driven. 
- -Because of this structure, many subroutines coordinate their action -across the various processes, thus providing an implicit -synchronization point, and therefore \emph{must} be -called simultaneously by all processes participating in the -computation. -However there are many cases where no synchronization, and indeed no -communication among processes, is implied. - -Throughout this user's guide each subroutine will be clearly indicated -as: -\begin{description} -\item[Synchronous:] must be called simultaneously by all the - processes in the relevant communication context; -\item[Asynchronous:] may be called in a totally independent manner. -\end{description} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "userguide" -%%% End: diff --git a/docs/src/newobjects.tex b/docs/src/newobjects.tex new file mode 100644 index 00000000..aa7ec330 --- /dev/null +++ b/docs/src/newobjects.tex @@ -0,0 +1,85 @@ + +\clearpage + +\section{Adding new smoother and solver objects to MLD2P4\label{sec:adding}} + +Developers can add completely new smoother and/or solver classes +derived from the base objects in the library (see Remark~2 in Section~\ref{sec:precset}), +without recompiling the library itself. + +To do so, it is necessary first to select the base type to be extended. +In our experience, it is quite likely that the new application needs +only the definition of a ``solver'' object, which is almost +always acting only on the local part of the distributed matrix. +The parallel actions required to connect the various solver objects +are most often already provided by the block-Jacobi or the additive +Schwarz smoothers. To define a new solver, the developer will then +have to define its components and methods, perhaps taking one of the +predefined solvers as a starting point, if possible. 
+ +Once the new smoother/solver class has been developed, to use it in +the context of the multilevel preconditioners it is necessary to: +\begin{itemize} +\item declare in the application program a variable of the new type; +\item pass that variable as the argument to the \verb|set| routine as in the +following: +\begin{center} +\verb|call p%set(smoother,info [,ilev,ilmax,pos])|\\ +\verb|call p%set(solver,info [,ilev,ilmax,pos])| +\end{center} +\item link the code implementing the various methods into the application executable. +\end{itemize} +The new solver object is then dynamically included in the +preconditioner structure, and acts as a \emph{mold} to which the +preconditioner will conform, even though the MLD2P4 library has not +been modified to account for this new development. + +It is possible to define new values for the keyword \verb|WHAT| in the +\verb|set| routine; if the library code does not recognize a keyword, +it passes it down the composition hierarchy (levels containing +smoothers containing in turn solvers), so that it can be eventually caught by +the new solver. + +An example is provided in the source code distribution under the +folder \verb|tests/newslv|. In this example we are implementing a new +incomplete factorization variant (which is simply the ILU(0) +factorization under a new name). Because of the specifics of this case, it is +possible to reuse the basic structure of the ILU solver, with its +L/D/U components and the methods needed to apply the solver; only a +few methods, such as the description and most importantly the build, +need to be overridden (rewritten). 
+ +The interfaces for the calls shown above are defined using +\begin{center} +\begin{tabular}{p{1.4cm}p{12cm}} +\verb|smoother| & \verb|class(mld_x_base_smoother_type)| \\ + & The user-defined new smoother to be employed in the + preconditioner.\\ +\verb|solver| & \verb|class(mld_x_base_solver_type)| \\ + & The user-defined new solver to be employed in the + preconditioner. +\end{tabular} +\end{center} +The other arguments are defined in the way described in +Sec.~\ref{sec:precset}. As an example, in the \verb|tests/newslv| +code we define a new object of type \verb|mld_d_tlu_solver_type|, and +we pass it as follows: +\begin{verbatim} + + ! sparse matrix and preconditioner + type(psb_dspmat_type) :: a + type(mld_dprec_type) :: prec + type(mld_d_tlu_solver_type) :: tlusv + +...... + ! + ! prepare the preconditioner: an ML with defaults, but with TLU solver at + ! intermediate levels. All other parameters are at default values. + ! + call prec%init('ML', info) + call prec%hierarchy_build(a,desc_a,info) + nlv = prec%get_nlevs() + call prec%set(tlusv, info,ilev=1,ilmax=max(1,nlv-1)) + call prec%smoothers_build(a,desc_a,info) + +\end{verbatim} diff --git a/docs/src/overview.tex b/docs/src/overview.tex index 30079b4b..6c88056f 100644 --- a/docs/src/overview.tex +++ b/docs/src/overview.tex @@ -25,7 +25,7 @@ The multi-level preconditioners implemented in MLD2P4 are obtained by combining AMG cycles with smoothers and coarsest-level solvers. The V-, W-, and K-cycles~\cite{Briggs2000,Notay2008} are available, which allow to define almost all the preconditioners in the package, including the multi-level hybrid -Schwarz ones; a specific cycle is implemented to obained multi-level additive +Schwarz ones; a specific cycle is implemented to obtain multi-level additive Schwarz preconditioners. 
The Jacobi, hybrid %\footnote{see Note 2 in Table~\ref{tab:p_coarse}, p.~28.} forward/backward Gauss-Seidel, block-Jacobi, and additive Schwarz methods @@ -62,8 +62,8 @@ portability, modularity ed extensibility in the development of the preconditione package. On the other hand, the implementation of MLD2P4 has led to some revisions and extentions of the original PSBLAS kernels. The inter-process comunication required by MLD2P4 is encapsulated -in the PSBLAS routines;% , except few cases where MPI~\cite{MPI1} is explicitly called. -% \textbf{E' ancora cos\'{i} o adesso \`e tutto incapsulato in PSBLAS?} +in the PSBLAS routines; +% , except few cases where MPI~\cite{MPI1} is explicitly called. therefore, MLD2P4 can be run on any parallel machine where PSBLAS implementations are available. diff --git a/docs/src/precs.tex b/docs/src/precs.tex deleted file mode 100644 index 582b2b12..00000000 --- a/docs/src/precs.tex +++ /dev/null @@ -1,384 +0,0 @@ -\section{Preconditioner routines} -\label{sec:precs} -\markboth{\underline{MLD2P4 User's and Reference Guide}} - {\underline{\ref{sec:precs} Preconditioners}} - -% \section{Preconditioners} -\label{sec:psprecs} -The MLD2P4 library contains the implementation of many preconditioning -techniques. The preconditioners may be applied as normal ``base'' -preconditioners; alternatively multiple ``base'' preconditioners may -be combined in a multilevel framework. 
- -The base (one-level) preconditioners include: -\begin{itemize} -\item Diagonal Scaling -\item Block Jacobi -\item Hybrid Gauss-Seidel; -\item Additive Schwarz, Restricted Additive Schwarz and - Additive Schwarz with Harmonic extensions; -\end{itemize} -The Jacobi and Additive Schwarz preconditioners can make use of the -following solvers: -\begin{itemize} -\item Level-$p$ Incomplete LU factorization ($ILU(p)$); -\item Threshold Incomplete LU factorization ($ILU(\tau,p)$); -\item Complete LU factorization by means of the following optional - external packages: -\begin{itemize} -\item UMFPACK; -\item SuperLU; -\item SuperLU\_Dist. -\end{itemize} -\end{itemize} - -The supporting data type and subroutine interfaces are defined in the -module \verb|mld_prec_mod|; the module also overrides the variables -and tyep definitions of \verb|psb_prec_mod| so as to function as a -drop-in replacement for the PSBLAS methods. Thus if the user does not -wish to employ the additional MLD2P4 capabitlities, it is possible to -migrate an existing PSBLAS program without any source code -modifications, only a recompilation is needed. - -%% We also provide a companion package of multi-level Additive -%% Schwarz preconditioners called MD2P4; this is actually a family of -%% preconditioners since there is the possibility to choose between -%% many variants, and is currently in an experimental stateIts -%% documentation is planned to appear after stabilization of the -%% package, which will characterize release 2.1 of our library. - - - - -\subroutine{mld\_precinit}{Initialize a preconditioner} - -\syntax{call mld\_precinit}{prec, ptype, info} -\syntax*{call mld\_precinit}{prec, ptype, info, nlev} - -\begin{description} -\item[Type:] Asynchronous. -\item[\bf On Entry] -\item[ptype] the type of preconditioner. -Scope: {\bf global} \\ -Type: {\bf required}\\ -Intent: {\bf in}.\\ -Specified as: a character string, see usage notes. -\item[nlev] Number of levels in a multilevel precondtioner. 
-Scope: {\bf global} \\ -Type: {\bf optional}\\ -Specified as: an integer value, see usage notes. -%% \item[rs] -%% Scope: {\bf global} \\ -%% Type: {\bf optional}\\ -%% Specified as: a long precision real number. -\item[\bf On Exit] - -\item[prec] -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: a preconditioner data structure \precdata. -\item[info] -Scope: {\bf global} \\ -Type: {\bf required}\\ -Intent: {\bf out}.\\ -Error code: if no error, 0 is returned. -\end{description} -\subsection*{Usage Notes} -%% The PSBLAS 2.0 contains a number of preconditioners, ranging from a -%% simple diagonal scaling to 2-level domain decomposition. These -%% preconditioners may use the SuperLU or the UMFPACK software, if -%% installed; see~\cite{SUPERLU,UMFPACK}. -Legal inputs to this subroutine are interpreted depending on the -$ptype$ string as follows\footnote{The string is case-insensitive}: -\begin{description} -\item[NONE] No preconditioning, i.e. the preconditioner is just a copy - operator. -\item[DIAG] Diagonal scaling; each entry of the input vector is - multiplied by the reciprocal of the sum of the absolute values of - the coefficients in the corresponding row of matrix $A$; -\item[BJAC] Precondition by a factorization of the - block-diagonal of matrix $A$, where block boundaries are determined - by the data allocation boundaries for each process; requires no - communication. -\item[AS] Additive Schwarz; default is to apply the Restricted - Additive Schwarz variant, with an $ILU(0)$ factorization -\item[ML] Multilevel preconditioner. -\end{description} - - - -\subroutine{mld\_precset}{Set preconditioner features} - -\syntax{call mld\_precset}{prec, what, val, info, ilev, pos} -\syntax{call prec\%set}{what, val, info, ilev, pos} - - -\begin{description} -\item[Type:] Asynchronous. 
-\item[\bf On Entry] -\item[prec] the preconditioner.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: an already initialized precondtioner data structure \precdata\\ -\item[what] The feature to be set. \\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}.\\ -Specified as: an integer constant or a string. Symbolic names are -available in the library module, see usage notes for legal values. -\item[val] The value to set the chosen feature to. \\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}.\\ -Specified as: an integer, double precision or character variable. -Symbolic names for some choices are available in the library module, -see usage notes for legal values. -\item[ilev] The level of a multilevel preconditioner to which the - feature choice should apply.\\ -Scope: {\bf global} \\ -Type: {\bf optional}\\ -Specified as: an integer value, see usage notes. -\item[pos] The position of the smoother/solver to which the current - setting applies. - feature choice should apply.\\ -Scope: {\bf global} \\ -Type: {\bf optional}\\ -Specified as: a character variable, with values \verb|pre| or \verb|post|. -\end{description} - -\begin{description} -\item[\bf On Return] -\item[prec] the preconditioner.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: a precondtioner data structure \precdata\\ -\item[info] Error code.\\ -Scope: {\bf local} \\ -Type: {\bf required} \\ -Intent: {\bf out}.\\ -An integer value; 0 means no error has been detected. -\end{description} - -\subsection*{Usage Notes} -Legal inputs to this subroutine are interpreted depending on the value -of \verb|what| input as follows -\begin{description} -\item[mld\_coarse\_mat\_] -\end{description} - -\subroutine{mld\_hierarchy\_bld}{Builds a matrix hierarchy} - -\syntax{call mld\_hierarchy\_bld}{a, desc\_a, prec, info} - -\begin{description} -\item[Type:] Synchronous. 
-\item[\bf On Entry] -\item[a] the system sparse matrix. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}, target.\\ -Specified as: a sparse matrix data structure \spdata. -\item[prec] the preconditioner.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: an already initialized precondtioner data structure \precdata\\ -\item[desc\_a] the problem communication descriptor. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}, target.\\ -Specified as: a communication descriptor data structure \descdata. -%% \item[upd] -%% Scope: {\bf global} \\ -%% Type: {\bf optional}\\ -%% Intent: {\bf in}.\\ -%% Specified as: a character. -\end{description} - -\begin{description} -\item[\bf On Return] -\item[prec] the preconditioner, with all the matrices and transfer - operators updated.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: a precondtioner data structure \precdata\\ -\item[info] Error code.\\ -Scope: {\bf local} \\ -Type: {\bf required} \\ -Intent: {\bf out}.\\ -An integer value; 0 means no error has been detected. -\end{description} - - -\subroutine{mld\_smoothers\_bld}{Builds the smoothers} - -\syntax{call mld\_smoothers\_bld}{a, desc\_a, prec, info} - -\begin{description} -\item[Type:] Synchronous. -\item[\bf On Entry] -\item[a] the system sparse matrix. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}, target.\\ -Specified as: a sparse matrix data structure \spdata. -\item[prec] the preconditioner.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: an already initialized precondtioner data structure -\precdata\ with an already built matrix hierarchy \\ -\item[desc\_a] the problem communication descriptor. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}, target.\\ -Specified as: a communication descriptor data structure \descdata. 
-%% \item[upd] -%% Scope: {\bf global} \\ -%% Type: {\bf optional}\\ -%% Intent: {\bf in}.\\ -%% Specified as: a character. -\end{description} - -\begin{description} -\item[\bf On Return] -\item[prec] the preconditioner, with all the smoothers and solvers - updated.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: a precondtioner data structure \precdata\\ -\item[info] Error code.\\ -Scope: {\bf local} \\ -Type: {\bf required} \\ -Intent: {\bf out}.\\ -An integer value; 0 means no error has been detected. -\end{description} - - - - -\subroutine{mld\_precbld}{Builds a preconditioner} - -\syntax{call mld\_precbld}{a, desc\_a, prec, info} - -\begin{description} -\item[Type:] Synchronous. -\item[\bf On Entry] -\item[a] the system sparse matrix. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}, target.\\ -Specified as: a sparse matrix data structure \spdata. -\item[prec] the preconditioner.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: an already initialized precondtioner data structure \precdata\\ -\item[desc\_a] the problem communication descriptor. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}, target.\\ -Specified as: a communication descriptor data structure \descdata. -%% \item[upd] -%% Scope: {\bf global} \\ -%% Type: {\bf optional}\\ -%% Intent: {\bf in}.\\ -%% Specified as: a character. -\end{description} - -\begin{description} -\item[\bf On Return] -\item[prec] the preconditioner.\\ -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: a precondtioner data structure \precdata\\ -\item[info] Error code.\\ -Scope: {\bf local} \\ -Type: {\bf required} \\ -Intent: {\bf out}.\\ -An integer value; 0 means no error has been detected. -\end{description} - - -\subsection*{Usage Notes} -A call to this routine is equivalent to a call to -\verb|mld_hierarchy_bld| followed by a call to \verb|mld_smnoothers_bld|. 
- - -\subroutine{mld\_precaply}{Preconditioner application routine} - -\syntax{call mld\_precaply}{prec,x,y,desc\_a,info,trans,work} -\syntax*{call mld\_precaply}{prec,x,desc\_a,info,trans} - -\begin{description} -\item[Type:] Synchronous. -\item[\bf On Entry] -\item[prec] the preconditioner. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}.\\ -Specified as: a preconditioner data structure \precdata. -\item[x] the source vector. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: a double precision array. -\item[desc\_a] the problem communication descriptor. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}.\\ -Specified as: a communication data structure \descdata. -\item[trans] -Scope: {\bf } \\ -Type: {\bf optional}\\ -Intent: {\bf in}.\\ -Specified as: a character. -\item[work] an optional work space -Scope: {\bf local} \\ -Type: {\bf optional}\\ -Intent: {\bf inout}.\\ -Specified as: a double precision array. -\end{description} - -\begin{description} -\item[\bf On Return] -\item[y] the destination vector. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf inout}.\\ -Specified as: a double precision array. -\item[info] Error code.\\ -Scope: {\bf local} \\ -Type: {\bf required} \\ -Intent: {\bf out}.\\ -An integer value; 0 means no error has been detected. -\end{description} - - - -\subroutine{mld\_prec\_descr}{Prints a description of current preconditioner} - -\syntax{call mld\_prec\_descr}{prec} - -\begin{description} -\item[Type:] Asynchronous. -\item[\bf On Entry] -\item[prec] the preconditioner. -Scope: {\bf local} \\ -Type: {\bf required}\\ -Intent: {\bf in}.\\ -Specified as: a preconditioner data structure \precdata. 
-\end{description} - - - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "userguide" -%%% End: diff --git a/docs/src/userguide.tex b/docs/src/userguide.tex index 07cb2688..224a8607 100644 --- a/docs/src/userguide.tex +++ b/docs/src/userguide.tex @@ -157,6 +157,7 @@ based on PSBLAS} \include{background} \include{gettingstarted} \include{userinterface} +\include{newobjects} \include{errors} \clearpage \appendix diff --git a/docs/src/userhtml.tex b/docs/src/userhtml.tex index 6425c1c5..de80420e 100644 --- a/docs/src/userhtml.tex +++ b/docs/src/userhtml.tex @@ -133,9 +133,8 @@ Software version: 2.1\\ \include{background} \include{gettingstarted} \include{userinterface} -%\include{advanced} +\include{newobjects} \include{errors} -%\include{listofroutines} \cleardoublepage \appendix \include{license} diff --git a/docs/src/userinterface.tex b/docs/src/userinterface.tex index ecc6f081..fad93c0a 100644 --- a/docs/src/userinterface.tex +++ b/docs/src/userinterface.tex @@ -27,7 +27,7 @@ i.e., \verb|s|, \verb|d|, \verb|c|, \verb|z|, according to the sparse matrix data structure; \item the arrays containing the vectors $v$ and $w$ involved in - the preconditioner application $w=M^{-1}v$ must be of type + the preconditioner application $w=B^{-1}v$ must be of type \verb|psb_|\emph{x}\verb|vect_type| with \emph{x} = \verb|s|, \verb|d|, \verb|c|, \verb|z|, in a manner completely analogous to the sparse matrix type; @@ -100,9 +100,8 @@ contained in \verb|val|. % be chosen according to the real/complex, single/double precision % version of MLD2P4 under use.\\ \verb|what| & \verb|character(len=*)|. \\ - & The parameter to be set. It can be specified by - a predefined constant, or through its name; the string - is case-insensitive. See also + & The parameter to be set. It can be specified through its name; + the string is case-insensitive. 
See Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}.\\ \verb|val | & \verb|integer| \emph{or} \verb|character(len=*)| \emph{or} \verb|real(psb_spk_)| \emph{or} \verb|real(psb_dpk_)|, @@ -240,7 +239,7 @@ solver is changed to the default sequential solver. \textsc{comments} \\ \hline %\multicolumn{5}{|c|}{\emph{type of the multi-level preconditioner}}\\ \hline %\verb|mld_ml_cycle_| \par -\verb|ML_CYCLE| & \verb|character(len=*)| +\verb|'ML_CYCLE'| & \verb|character(len=*)| & \texttt{'VCYCLE'} \par \texttt{'WCYCLE'} \par \texttt{'KCYCLE'} \par \texttt{'MULT'} \par \texttt{'ADD'} & \texttt{'VCYCLE'} @@ -249,7 +248,7 @@ solver is changed to the default sequential solver. Note that hybrid Multiplicative Schwarz is equivalent to V-cycle and is included for compatibility with previous versions of MLD2P4. \\ \hline %\verb|mld_outer_sweeps_| \par - \verb|OUTER_SWEEPS| & \texttt{integer} & + \verb|'OUTER_SWEEPS'| & \texttt{integer} & Any integer \par number $\ge 1$ & 1 & Number of multi-level cycles. \\ \hline %\verb|mld_smoother_type_| \par \verb|SMOOTHER_TYPE| & \verb|character(len=*)| @@ -279,7 +278,7 @@ be applied. \textsc{comments} \\ \hline %\multicolumn{5}{|c|}{\emph{aggregation algorithm}} \\ \hline %\verb|mld_min_coarse_size_| \par -\verb|MIN_COARSE_SIZE| & \verb|integer| +\verb|'MIN_COARSE_SIZE'| & \verb|integer| & Any number \par $> 0$ & $\lfloor 40 \sqrt[3]{n} \rfloor$, where $n$ is the dimension of the matrix at the finest level @@ -295,7 +294,7 @@ be applied. % (see \verb|mld_n_prec_levs_|). \\ \hline %\verb|mld_min_cr_ratio_| \par -\verb|MIN_CR_RATIO| & \verb|real| +\verb|'MIN_CR_RATIO'| & \verb|real| & Any number \par $> 1$ & 1.5 & Minimum coarsening ratio. The aggregation stops @@ -303,13 +302,13 @@ be applied. at two consecutive levels is lower than or equal to this threshold (see Note).\\ \hline %\verb|mld_max_levs_| \par -\verb|MAX_LEVS| & \verb|integer| +\verb|'MAX_LEVS'| & \verb|integer| & Any integer \par number $> 1$ & 20 & Maximum number of levels. 
The aggregation stops if the number of levels reaches this value (see Note). \\ \hline %\verb|mld_par_aggr_alg_| \par -\verb|PAR_AGGR| & \verb|character(len=*)| \hspace*{-3mm} +\verb|'PAR_AGGR'| & \verb|character(len=*)| \hspace*{-3mm} & \texttt{'DEC'}, \texttt{'SYMDEC'} & \texttt{'DEC'} & Parallel aggregation algorithm. \par Currently, only the @@ -318,13 +317,13 @@ be applied. aggregation to the sparsity pattern of $A+A^T$.\\ \hline %\verb|mld_aggr_type_| \par -\verb|AGGR_TYPE| & \verb|character(len=*)| \hspace*{-3mm} +\verb|'AGGR_TYPE'| & \verb|character(len=*)| \hspace*{-3mm} & \textbf{\texttt{'VMB'}} & \textbf{\texttt{'VMB'}} & Type of aggregation algorithm: currently, the scalar aggregation algorithm by Van\v{e}k, Mandel and Brezina is implemented \cite{VANEK_MANDEL_BREZINA}. \\ \hline %\verb|mld_aggr_prol_| \par -\verb|AGGR_PROL| & \verb|character(len=*)| \hspace*{-3mm} +\verb|'AGGR_PROL'| & \verb|character(len=*)| \hspace*{-3mm} & \texttt{'SMOOTHED'}, \texttt{'UNSMOOTHED'} & \texttt{'SMOOTHED'} & Prolongator used by the aggregation algorithm: smoothed or unsmoothed (i.e., tentative prolongator). \\ @@ -351,7 +350,7 @@ of levels. } \\ \verb|what| & \textsc{data type} & \verb|val| & \textsc{default} & \textsc{comments} \\ \hline %\verb|mld_aggr_ord_| \par -\verb|AGGR_ORD| & \verb|character(len=*)| +\verb|'AGGR_ORD'| & \verb|character(len=*)| & \texttt{'NATURAL'} \par \texttt{'DEGREE'} & \texttt{'NATURAL'} & Initial ordering of indices for the aggregation @@ -361,11 +360,12 @@ of levels. } \\ %Since aggregation is %heuristic, results will be different. %\verb|mld_aggr_thresh_| \par -\verb|AGGR_THRESH| & \verb|real(|\emph{kind\_parameter}\verb|)| +\verb|'AGGR_THRESH'| & \verb|real(|\emph{kind\_parameter}\verb|)| & Any~real \par number~$\in [0, 1]$ & 0.05 - & The threshold $\theta$ in the aggregation algorithm - (see Note). \\ \hline + & The threshold $\theta$ in the aggregation algorithm, + see (\ref{eq:strongly_coup}) in Section~\ref{sec:aggregation}. 
+ See also the note at the bottom of this table. \\ \hline %%\verb|mld_aggr_scale_| \par % \verb|AGGR_SCALE| & \verb|real(|\emph{kind\_parameter}\verb|)| % & Any~real \par number~$\in [0, 1]$ @@ -373,40 +373,40 @@ of levels. } \\ % & Scale factor applied to the threshold in going % from level $ilev$ to level $ilev+1$. \\ \hline %\verb|mld_aggr_omega_alg_| \par -\verb|AGGR_OMEGA_ALG|& \verb|character(len=*)| - & \texttt{'EIG\_EST'} \par \texttt{'USER\_CHOICE'} - & \texttt{'EIG\_EST'} - & How the damping parameter $\omega$ in the - smoothed aggregation is obtained: - either via an estimate of the spectral radius of - $D^{-1}A$, where $A$ is the matrix at the current - level and $D$ is the diagonal matrix with - the same diagonal entires as $A$, or explicily - specified by the user. \\ \hline +%\verb|'AGGR_OMEGA_ALG'|& \verb|character(len=*)| +% & \texttt{'EIG\_EST'} \par \texttt{'USER\_CHOICE'} +% & \texttt{'EIG\_EST'} +% & How the damping parameter $\omega$ in the +% smoothed aggregation is obtained: +% either via an estimate of the spectral radius of +% $D^{-1}A$, where $A$ is the matrix at the current +% level and $D$ is the diagonal matrix with +% the same diagonal entries as $A$, or explicitly +% specified by the user. \\ \hline %\verb|mld_aggr_eig_| \par -\verb|AGGR_EIG| & \verb|character(len=*)| - & \texttt{'A\_NORMI'} - & \texttt{'A\_NORMI'} - & How to estimate the spectral radius of $D^{-1}A$. - Currently only the infinity norm estimate - is available. \\ \hline +%\verb|'AGGR_EIG'| & \verb|character(len=*)| +% & \texttt{'A\_NORMI'} +% & \texttt{'A\_NORMI'} +% & How to estimate the spectral radius of $D^{-1}A$. +% Currently only the infinity norm estimate +% is available. \\ \hline %\verb|mld_aggr_omega_val_| \par -\verb|AGGR_OMEGA_VAL| & \verb|real(|\emph{kind\_parameter}\verb|)| - & Any real \par number $>0$ - & $4/(3\rho(D^{-1}A))$ - & Damping parameter $\omega$ in the smoothed aggregation algorithm.
- It must be set by the user if - \verb|USER_CHOICE| was specified for - \verb|mld_aggr_omega_alg_|, - otherwise it is computed by the library, using the - selected estimate of the spectral radius $\rho(D^{-1}A)$ of - $D^{-1}A$.\\ \hline +%\verb|'AGGR_OMEGA_VAL'| & \verb|real(|\emph{kind\_parameter}\verb|)| +% & Any real \par number $>0$ +% & $4/(3\rho(D^{-1}A))$ +% & Damping parameter $\omega$ in the smoothed aggregation algorithm. +% It must be set by the user if +% \verb|USER_CHOICE| was specified for +% \verb|mld_aggr_omega_alg_|, +% otherwise it is computed by the library, using the +% selected estimate of the spectral radius $\rho(D^{-1}A)$ of +% $D^{-1}A$.\\ \hline %\verb|mld_aggr_filter_| \par -\verb|AGGR_FILTER| +\verb|'AGGR_FILTER'| & \verb|character(len=*)| & \texttt{'FILTER'} \par \texttt{'NOFILTER'} - & \texttt{'NOFILTER'} & Matrix used in computing the smoothed - prolongator: filtered or unfiltered. \\ + & \texttt{'NOFILTER'} & Matrix used in computing the smoothed + prolongator: filtered or unfiltered (see~(\ref{eq:filtered}) in Section~\ref{sec:aggregation}). \\ \hline \multicolumn{5}{|l|}{{\bfseries Note.} Different thresholds at different levels, such as those used in \cite[Section~5.1]{VANEK_MANDEL_BREZINA}, can be easily set by @@ -428,13 +428,13 @@ the parameter \texttt{ilev}.} \\ \textsc{comments} \\ \hline %\multicolumn{5}{|c|}{\emph{coarse-space correction at the coarsest level}}\\ \hline %\verb|mld_coarse_mat_| \par -\verb|COARSE_MAT| & \verb|character(len=*)| +\verb|'COARSE_MAT'| & \verb|character(len=*)| & \texttt{'DIST'} \par \texttt{'REPL'} & \texttt{'REPL'} & Coarsest matrix layout: distributed among the processes or replicated on each of them. 
\\ \hline %\verb|mld_coarse_solve_| \par -\verb|COARSE_SOLVE| & \verb|character(len=*)| +\verb|'COARSE_SOLVE'| & \verb|character(len=*)| & \texttt{'MUMPS'} \par \texttt{'UMF'} \par \texttt{'SLU'} \par \texttt{'SLUDIST'} \par \texttt{'JACOBI'} \par \texttt{'GS'} \par \texttt{'BJAC'} @@ -456,7 +456,7 @@ the parameter \texttt{ilev}.} \\ value UMFPACK and SuperLU\_Dist are available only in double precision. \\ \hline %\verb|mld_coarse_subsolve_| \par -\verb|COARSE_SUBSOLVE| & \verb|character(len=*)| +\verb|'COARSE_SUBSOLVE'| & \verb|character(len=*)| & \texttt{'ILU'} \par \texttt{'ILUT'} \par \texttt{'MILU'} \par \texttt{'MUMPS'} \par \texttt{'SLU'} \par \texttt{'UMF'} & See~Note. @@ -491,18 +491,18 @@ level.\label{tab:p_coarse}} \textsc{comments} \\ \hline %\multicolumn{5}{|c|}{\emph{coarse-space correction at the coarsest level}}\\ \hline %\verb|mld_coarse_sweeps_| \par -\verb|COARSE_SWEEPS| & \verb|integer| +\verb|'COARSE_SWEEPS'| & \verb|integer| & Any integer \par number $> 0$ & 10 & Number of sweeps when \verb|JACOBI|, \verb|GS| or \verb|BJAC| is chosen as coarsest-level solver. \\ \hline %\verb|mld_coarse_fillin_| \par -\verb|COARSE_FILLIN| & \verb|integer| +\verb|'COARSE_FILLIN'| & \verb|integer| & Any integer \par number $\ge 0$ & 0 & Fill-in level $p$ of the ILU factorizations. \\ \hline %\verb|mld_coarse_iluthrs_| \par -\verb|COARSE_ILUTHRS| +\verb|'COARSE_ILUTHRS'| & \verb|real(|\emph{kind\_parameter}\verb|)| & Any real \par number $\ge 0$ & 0 @@ -523,7 +523,7 @@ level (continued).\label{tab:p_coarse_1}} \textsc{comments} \\ \hline %\multicolumn{5}{|c|}{\emph{basic one-level preconditioner (smoother)}} \\ \hline %\verb|mld_smoother_type_| \par -\verb|SMOOTHER_TYPE| & \verb|character(len=*)| +\verb|'SMOOTHER_TYPE'| & \verb|character(len=*)| & \verb|'JACOBI'| \par \verb|'GS'| \par \verb|'BGS'| \par \verb|'BJAC'| \par \verb|'AS'| & \verb|'FBGS'| @@ -533,7 +533,7 @@ level (continued).\label{tab:p_coarse_1}} Additive Schwarz. 
\par It is ignored by one-level preconditioners. \\ \hline %\verb|mld_sub_solve_| \par -\verb|SUB_SOLVE| & \verb|character(len=*)| +\verb|'SUB_SOLVE'| & \verb|character(len=*)| & \texttt{'JACOBI'} \par \texttt{'GS'} \par \texttt{'BGS'} \par \texttt{'ILU'} \par \texttt{'ILUT'} \par \texttt{'MILU'} \par @@ -550,7 +550,7 @@ level (continued).\label{tab:p_coarse_1}} (plus triangular solve). See Note for details on hybrid Gauss-Seidel. \\ \hline %\verb|mld_smoother_sweeps_| \par -\verb|SMOOTHER_SWEEPS| & \verb|integer| +\verb|'SMOOTHER_SWEEPS'| & \verb|integer| & Any integer \par number~$\ge 0$ & 1 & Number of sweeps of the smoother or one-level preconditioner. @@ -559,7 +559,7 @@ level (continued).\label{tab:p_coarse_1}} together with \verb|pos='PRE'| or \verb|pos='POST'|, respectively. \\ \hline %\verb|mld_sub_ovr_| \par -\verb|SUB_OVR| & \verb|integer| +\verb|'SUB_OVR'| & \verb|integer| & Any integer \par number~$\ge 0$ & 1 & Number of overlap layers, for Additive Schwarz only. \\ @@ -578,7 +578,7 @@ level (continued).\label{tab:p_coarse_1}} \verb|what| & \textsc{data type} & \verb|val| & \textsc{default} & \textsc{comments} \\ \hline %\verb|mld_sub_restr_| \par -\verb|SUB_RESTR| & \verb|character(len=*)| +\verb|'SUB_RESTR'| & \verb|character(len=*)| & \texttt{'HALO'} \par \texttt{'NONE'} & \texttt{'HALO'} & Type of restriction operator, for Additive Schwarz only: @@ -587,7 +587,7 @@ level (continued).\label{tab:p_coarse_1}} Note that \texttt{HALO} must be chosen for the classical Additive Schwarz smoother and its RAS variant.\\ \hline %\verb|mld_sub_prol_| \par -\verb|SUB_PROL| & \verb|character(len=*)| +\verb|'SUB_PROL'| & \verb|character(len=*)| & \texttt{'SUM'} \par \texttt{'NONE'} & \texttt{'NONE'} & Type of prolongation operator, for Additive Schwarz only: @@ -596,12 +596,12 @@ level (continued).\label{tab:p_coarse_1}} Note that \texttt{SUM} must be chosen for the classical Additive Schwarz smoother, and \texttt{NONE} for its RAS variant.
\\ \hline %\verb|mld_sub_fillin_| \par -\verb|SUB_FILLIN| & \verb|integer| +\verb|'SUB_FILLIN'| & \verb|integer| & Any integer \par number~$\ge 0$ & 0 & Fill-in level $p$ of the incomplete LU factorizations. \\ \hline %\verb|mld_sub_iluthrs_| \par -\verb|SUB_ILUTHRS| & \verb|real(|\emph{kind\_parameter}\verb|)| +\verb|'SUB_ILUTHRS'| & \verb|real(|\emph{kind\_parameter}\verb|)| & Any real number~$\ge 0$ & 0 & Drop tolerance $t$ in the ILU($p,t$) factorization. \\ %\hline @@ -741,7 +741,7 @@ hierarchy produced by a previous call to \verb|hierarchy_build| \end{center} \noindent -This routine computes $y = op(M^{-1})\, x$, where $M$ is a previously built +This routine computes $y = op(B^{-1})\, x$, where $B$ is a previously built preconditioner, stored into \verb|p|, and $op$ denotes the preconditioner itself or its transpose, according to the value of \verb|trans|. @@ -770,10 +770,10 @@ and hence it is completely transparent to the user. \verb|info| & \verb|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \verb|trans| & \verb|character(len=1), optional, intent(in).|\\ - & If \verb|trans| = \verb|'N','n'| then $op(M^{-1}) = M^{-1}$; - if \verb|trans| = \verb|'T','t'| then $op(M^{-1}) = M^{-T}$ - (transpose of $M^{-1})$; if \verb|trans| = \verb|'C','c'| then $op(M^{-1}) = M^{-C}$ - (conjugate transpose of $M^{-1})$.\\ + & If \verb|trans| = \verb|'N','n'| then $op(B^{-1}) = B^{-1}$; + if \verb|trans| = \verb|'T','t'| then $op(B^{-1}) = B^{-T}$ + (transpose of $B^{-1})$; if \verb|trans| = \verb|'C','c'| then $op(B^{-1}) = B^{-C}$ + (conjugate transpose of $B^{-1})$.\\ \verb|work| & \emph{type}\verb|(|\emph{kind\_parameter}\verb|), dimension(:), optional, target|.\\ & Workspace. Its size should be at least \verb|4 * psb_cd_get_local_| \verb|cols(desc_a)| (see the PSBLAS User's Guide). 
@@ -854,66 +854,6 @@ as follows: \end{center} -\clearpage - -\section{Adding new smoothers and solvers to MLD2P4\label{sec:adding}} - -Developers can add completely new smoother and/or solver classes -derived from the base objects in the library may be used without -recompiling the library itself. - -To do so it is necessary first to select the base type to be extended; -in our experience, it is quite likely that the new application needs -only require the definition of a ``solver'' object, which is almost -always acting only on the local part of the distributed matrix. - -The parallel actions required to connect the various solver objects -are most often already provided by the Block Jacobi or the Additive -Schwarz smoothers. To define a new solver, the developer will then -have to define its components and methods, perhaps taking one of the -predefined solvers as a starting point if possible. - - -Once the new smoother/solver class has been developed, to use it in -the context of the multilevel preconditioners it is necessary to: -\begin{itemize} -\item Declare in the application program a variable of the new type; -\item Pass that variable as the argument to the se routine as in the - following: -\begin{center} -\verb|call p%set(smoother,info [,ilev, ilmax,pos])|\\ -\verb|call p%set(solver,info [,ilev, ilmax,pos])| -\end{center} -\item Link into the application executable the code implementing the - various methods. -\end{itemize} -The new solver object is then dynamically included in the -preconditioner structure, and will act as a \emph{mold} to which the -preconditioner will conform, even though the MLD2P4 library has not -been modified to account for this new development. 
- -It is possible to define new values for the keyword \verb|WHAT| in the -\verb|set| routines; if the library code does not recognize a keyword, -it passes it down the composition hierarchy (levels containing -smoothers containing solvers), so that it can be eventually caught by -the new solver. - -An example is contained in the source code distribution under the -folder \verb|tests/newslv|. This example solver is simply the ILU(0) -solver under a new name, but it should give an idea of what needs to -be done. - -\ \\ - -\begin{tabular}{p{1.2cm}p{12cm}} -\verb|smoother| & \verb|class(mld_x_base_smoother_type)| \\ - & The user-defined new smoother to be employed in the - preconditioner.\\ -\verb|solver| & \verb|class(mld_x_base_solver_type)| \\ - & The user-defined new solver to be employed in the - preconditioner. -\end{tabular} - %%% Local Variables: %%% mode: latex %%% TeX-master: "userguide"
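The quoted keyword strings introduced by this patch (\verb|'ML_CYCLE'|, \verb|'SMOOTHER_TYPE'|, \ldots) are easiest to read next to a concrete call sequence. The following is a minimal sketch only, not part of the patched sources: it assumes the double-precision preconditioner type \verb|mld_dprec_type| and its \verb|init|/\verb|set| methods as described in the Getting Started section, and keyword values taken from the tables above.

\begin{verbatim}
  use psb_base_mod
  use mld_prec_mod
  type(mld_dprec_type) :: P
  integer(psb_ipk_)    :: info

! Multilevel preconditioner: W-cycle, two sweeps of
! forward/backward Gauss-Seidel as smoother, block-Jacobi
! sweeps at the coarsest level. Keyword strings are
! case-insensitive, as noted in the tables.
  call P%init('ML',info)
  call P%set('ML_CYCLE','WCYCLE',info)
  call P%set('SMOOTHER_TYPE','FBGS',info)
  call P%set('SMOOTHER_SWEEPS',2,info)
  call P%set('COARSE_SOLVE','BJAC',info)
  call P%set('COARSE_SWEEPS',8,info)
\end{verbatim}

After these calls, the preconditioner would be built against the system matrix and descriptor (\verb|hierarchy_build|/\verb|smoothers_build|) and applied inside a PSBLAS Krylov solver as $w = B^{-1} v$, per the \verb|apply| interface documented above.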