\section{User Interface\label{sec:userinterface}} \markboth{\textsc{AMG4PSBLAS User's and Reference Guide}} {\textsc{\ref{sec:userinterface} User Interface}} The basic user interface of AMG4PBLAS consists of eight methods. The six methods \fortinline|init|, \fortinline|set|, \fortinline|build|, \fortinline|hierarchy_build|, \fortinline|smoothers_build| and \fortinline|apply| encapsulate all the functionalities for the setup and the application of any multilevel and one-level preconditioner implemented in the package. The method \fortinline|free| deallocates the preconditioner data structure, while \fortinline|descr| prints a description of the preconditioner setup by the user. For backward compatibility, methods are also accessible as stand-alone subroutines. For each method, the same user interface is overloaded with respect to the real/complex and single/double precision data; arguments with appropriate data types must be passed to the method, i.e., \begin{itemize} \item the sparse matrix data structure, containing the matrix to be preconditioned, must be of type \verb|psb_|\emph{x}\verb|spmat_type| with \emph{x} = \verb|s| for real single precision, \emph{x} = \verb|d| for real double precision, \emph{x} = \verb|c| for complex single precision, \emph{x} = \verb|z| for complex double precision; \item the preconditioner data structure must be of type \verb|amg_|\emph{x}\verb|prec_type|, with \emph{x} = \verb|s|, \verb|d|, \verb|c|, \verb|z|, according to the sparse matrix data structure; \item the arrays containing the vectors $v$ and $w$ involved in the preconditioner application $w=B^{-1}v$ must be of type \verb|psb_|\emph{x}\verb|vect_type| with \emph{x} = \verb|s|, \verb|d|, \verb|c|, \verb|z|, in a manner completely analogous to the sparse matrix type; \item real parameters defining the preconditioner must be declared according to the precision of the sparse matrix and preconditioner data structures (see Section~\ref{sec:precset}). \end{itemize} A description of each method is given in the remainder of this section. \clearpage \subsection{Method init\label{sec:precinit}} \begin{center} \fortinline|call p%init(contxt,ptype,info)| \end{center} \noindent This method allocates and initializes the preconditioner \fortinline|p|, according to the preconditioner type chosen by the user. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|contxt| & \fortinline|type(psb_ctxt_type), intent(in)|.\\ & The communication context.\\ \fortinline|ptype| & \fortinline|character(len=*), intent(in)|.\\ & The type of preconditioner. Its values are specified in Table~\ref{tab:precinit}.\\ & Note that strings are case insensitive.\\ \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \end{tabular} \clearpage \subsection{Method set\label{sec:precset}} \begin{center} \fortinline|call p%set(what,val,info [,ilev, ilmax, pos, idx])| \end{center} \noindent This method sets the parameters defining the preconditioner \fortinline|p|. More precisely, the parameter identified by \fortinline|what| is assigned the value contained in \fortinline|val|. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|what| & \fortinline|character(len=*)|. \\ & The parameter to be set. It can be specified through its name; the string is case-insensitive. See Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}.\\ \fortinline|val | & \fortinline|integer| \emph{or} \fortinline|character(len=*)| \emph{or} \fortinline|real(psb_spk_)| \emph{or} \fortinline|real(psb_dpk_)|, \fortinline|intent(in)|.\\ & The value of the parameter to be set. The list of allowed values and the corresponding data types is given in Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}. When the value is of type \fortinline|character(len=*)|, it is also treated as case insensitive.\\ \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \fortinline|ilev| & \fortinline|integer, optional, intent(in)|.\\ & For the multilevel preconditioner, the level at which the preconditioner parameter has to be set. The levels are numbered in increasing order starting from the finest one, i.e., level 1 is the finest level. If \fortinline|ilev| is not present, the parameter identified by \fortinline|what| is set at all levels that are appropriate (see Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}).\\ \fortinline|ilmax| & \fortinline|integer, optional, intent(in)|.\\ & For the multilevel preconditioner, when both \fortinline|ilev| and \fortinline|ilmax| are present, the settings are applied at all levels \fortinline|ilev:ilmax|. When \fortinline|ilev| is present but \fortinline|ilmax| is not, then the default is \fortinline|ilmax=ilev|. The levels are numbered in increasing order starting from the finest one, i.e., level 1 is the finest level. \\ \fortinline|pos| & \fortinline|character(len=*), optional, intent(in)|.\\ & Whether the other arguments apply only to the pre-smoother (\fortinline|'PRE'|) or to the post-smoother (\fortinline|'POST'|). If \fortinline|pos| is not present, the other arguments are applied to both smoothers. If the preconditioner is one-level or the parameter identified by \fortinline|what| does not concern the smoothers, \fortinline|pos| is ignored.\\ \fortinline|idx| & \fortinline|integer, optional, intent(in)|.\\ & An auxiliary input argument that can be passed to the underlying objects. \end{tabular} \noindent %However, in this case the optional arguments \fortinline|ilev|, %\fortinline|ilmax|, \fortinline|pos| and \fortinline|idx| %cannot be used. \\ A variety of preconditioners can be obtained by setting the appropriate preconditioner parameters. These parameters can be logically divided into four groups, i.e., parameters defining \begin{enumerate} \item the type of multilevel cycle and how many cycles must be applied; \item the coarsening algorithm; \item the coarse-space correction at the coarsest level (for multilevel preconditioners only); \item the smoother of the multilevel preconditioners, or the one-level preconditioner. \end{enumerate} A list of the parameters that can be set, along with their allowed and default values, is given in Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}.\\ \textbf{Remark 2.} A smoother is usually obtained by combining two objects: a smoother (\fortinline|'SMOOTHER_TYPE'|) and a local solver (\fortinline|'SUB_SOLVE'|), as specified in Tables~\ref{tab:p_smoother}-\ref{tab:p_smoother_1}. For example, the block-Jacobi smoother using ILU(0) on the blocks is obtained by combining the block-Jacobi smoother object with the ILU(0) solver object. Similarly, the hybrid Gauss-Seidel smoother (see Note in Table~\ref{tab:p_smoother}) is obtained by combining the block-Jacobi smoother object with a single sweep of the Gauss-Seidel solver object, while the point-Jacobi smoother is the result of combining the block-Jacobi smoother object with a single sweep of the point-Jacobi solver object. However, for simplicity, shortcuts are provided to set point-Jacobi, hybrid (forward) Gauss-Seidel, and hybrid backward Gauss-Seidel, i.e., the previous smoothers can be defined just by setting \fortinline|'SMOOTHER_TYPE'| to certain specific values (see Tables~\ref{tab:p_smoother}), without the need to set \fortinline|'SUB_SOLVE'| as well. The smoother and solver objects are arranged in a hierarchical manner. When specifying a smoother object, its parameters, including the local solver, are set to their default values, and when a solver object is specified, its defaults are also set, overriding in both cases any previous settings even if explicitly specified. Therefore if the user sets a smoother, and wishes to use a solver different from the default one, the call to set the solver must come \emph{after} the call to set the smoother. Similar considerations apply to the point-Jacobi, Gauss-Seidel and block-Jacobi coarsest-level solvers, and shortcuts are available in this case too (see Table~\ref{tab:p_coarse_1}). \\ \textbf{Remark 3.} Many of the coarsest-level solvers cannot be used with both the replicated and distributed coarsest-matrix layout; therefore, setting the solver after the layout may change the layout. Similarly, setting the layout after the solver may change the solver. More precisely, UMFPACK and SuperLU require the coarsest-level matrix to be replicated, while SuperLU\_Dist requires it to be distributed. In these cases, setting the coarsest-level solver implies that the layout is redefined according to the solver, ovverriding any previous settings. MUMPS, point-Jacobi, hybrid Gauss-Seidel and block-Jacobi can be applied to replicated and distributed matrices, thus their choice does not modify any previously specified layout. It is worth noting that, when the matrix is replicated, the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers reduce to the corresponding local solver objects (see Remark~2). For the point-Jacobi and Gauss-Seidel solvers, these objects correspond to a \emph{single} point-Jacobi sweep and a \emph{single} Gauss-Seidel sweep, respectively, which are very poor solvers. On the other hand, the distributed layout can be used with any solver but UMFPACK and SuperLU; therefore, if any of these two solvers has already been selected, the coarsest-level solver is changed to block-Jacobi, with the previously chosen solver applied to the local blocks. Likewise, the replicated layout can be used with any solver but SuperLu\_Dist; therefore, if SuperLu\_Dist has been previously set, the coarsest-level solver is changed to the default sequential solver. \textbf{Remark 4.} The argument \fortinline|idx| can be used to allow finer control for those solvers; for instance, by specifying the keyword \fortinline|'MUMPS_IPAR_ENTRY'| and an appropriate value for \fortinline|idx|, it is possible to set any entry in the MUMPS integer control array. See also Sec.~\ref{sec:adding}. %The \verb|what,val| pairs described here are those of the predefined %moother/solver objects; newly developed solvers may define new pairs %according to their needs. \bsideways \begin{center} %\begin{tabular}{|p{5cm}|l|p{2.4cm}|p{2.5cm}|p{5cm}|} \begin{tabular}{|p{3.6cm}|l|p{2.4cm}|p{2.4cm}|p{7.2cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'ML_CYCLE'| & \fortinline|character(len=*)| & \fortinline|'VCYCLE'| \par \fortinline|'WCYCLE'| \par \fortinline|'KCYCLE'| \par \fortinline|'ADD'| & \fortinline|'VCYCLE'| &Multilevel cycle: V-cycle, W-cycle, K-cycle, and additive composition. \\ \hline \fortinline|'CYCLE_SWEEPS'| & \fortinline|integer| & Any integer \par number $\ge 1$ & 1 & Number of multilevel cycles. \\ \hline \end{tabular} \end{center} \caption{Parameters defining the multilevel cycle and the number of cycles to be applied. \label{tab:p_cycle}} \esideways \bsideways \begin{center} %\begin{tabular}{|p{5cm}|l|p{2.4cm}|p{2.5cm}|p{5cm}|} \begin{tabular}{|p{5.7cm}|l|p{2.3cm}|p{2.5cm}|p{6.9cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'MIN_COARSE_SIZE_PER_PROCESS'| & \fortinline|integer| & Any number \par $> 0$ & $200$ & Coarse size threshold per process. The aggregation stops if the global number of variables of the computed coarsest matrix is lower than or equal to this threshold multiplied by the number of processes (see Note). \\ \hline \fortinline|'MIN_COARSE_SIZE'| & \fortinline|integer| & Any number \par $> 0$ & -1 & Coarse size threshold. The aggregation stops if the global number of variables of the computed coarsest matrix is lower than or equal to this threshold (see Note). If negative, it is ignored in favour of the default for \fortinline|'MIN_COARSE_SIZE_PER_PROCESS'|. \\ \hline \fortinline|'MIN_CR_RATIO'| & \fortinline|real| & Any number \par $> 1$ & 1.5 & Minimum coarsening ratio. The aggregation stops if the ratio between the global matrix dimensions at two consecutive levels is lower than or equal to this threshold (see Note).\\ \hline \fortinline|'MAX_LEVS'| & \fortinline|integer| & Any integer \par number $> 1$ & 20 & Maximum number of levels. The aggregation stops if the number of levels reaches this value (see Note). \\ \hline \fortinline|'PAR_AGGR_ALG'| & \fortinline|character(len=*)| \hspace*{-3mm} & \texttt{'DEC'}, \texttt{'SYMDEC'}, \texttt{'COUPLED'} & \texttt{'DEC'} & Parallel aggregation algorithm. \par the \fortinline|SYMDEC| option applies decoupled aggregation to the sparsity pattern of $A+A^T$.\\ \hline \ifpdf \phantomcaption \end{tabular} \end{center} \esideways \bsideways \ContinuedFloat \begin{center} \begin{tabular}{|p{5.7cm}|l|p{2.3cm}|p{2.5cm}|p{5.9cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fi \fortinline|'AGGR_TYPE'| & \fortinline|character(len=*)| \hspace*{-3mm} & \fortinline|'SOC1'| & \fortinline|'SOC1'|, \fortinline|'SOC2'|, \fortinline|'MATCHBOXP'| & Type of aggregation algorithm: currently, for the decoupled aggregation we implement two measures of strength of connection, the one by Van\v{e}k, Mandel and Brezina~\cite{VANEK_MANDEL_BREZINA}, and the one by Gratton et al~\cite{GrHeJi:16}. The coupled aggregation is based on a parallel version of the half-approximate matching implemented in the MatchBox-P software package~\cite{MatchBoxP}.\\ \hline \fortinline|'AGGR_SIZE'| & \fortinline|integer| \hspace*{-3mm} & Any integer \par power of $2$, with $\texttt{aggr\_size} \ge 2$ & 4 & Maximum size of aggregates when the coupled aggregation based on matching is applied. For aggressive coarsening with size of aggregate larger than $8$ we recommend the use of smoothed prolongators. Used only with \texttt{'COUPLED'} and \texttt{'MATCHBOXP'}\\ \hline \fortinline|'AGGR_PROL'| & \fortinline|character(len=*)| \hspace*{-3mm} & \fortinline|'SMOOTHED'|, \fortinline|'UNSMOOTHED'| & \fortinline|'SMOOTHED'| & Prolongator used by the aggregation algorithm: smoothed or unsmoothed (i.e., tentative prolongator). \\ \hline \multicolumn{5}{|l|}{{\bfseries Note.} The aggregation algorithm stops when at least one of the following criteria is met: the coarse size threshold, } \\ \multicolumn{5}{|l|}{the minimum coarsening ratio, or the maximum number of levels is reached.} \\ \multicolumn{5}{|l|}{Therefore, the actual number of levels may be smaller than the specified maximum number of levels. } \\ \hline \end{tabular} \caption{Parameters defining the aggregation algorithm. \label{tab:p_aggregation}} \end{center} \esideways \bsideways \begin{center} %\begin{tabular}{|p{5cm}|l|p{2.4cm}|p{2.5cm}|p{5cm}|} \begin{tabular}{|p{3.8cm}|l|p{2.5cm}|p{2.3cm}|p{6.6cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'AGGR_ORD'| & \fortinline|character(len=*)| & \texttt{'NATURAL'} \par \texttt{'DEGREE'} & \texttt{'NATURAL'} & Initial ordering of indices for the decoupled aggregation algorithm: either natural ordering or sorted by descending degrees of the nodes in the matrix graph. \\ \hline %Since aggregation is %heuristic, results will be different. \fortinline|'AGGR_THRESH'| & \fortinline|real(kind_parameter)| & Any~real \par number~$\in [0, 1]$ & 0.01 & The threshold $\theta$ in the strength of connection algorithm. See also the note at the bottom of this table. \\ \hline \fortinline|'AGGR_FILTER'| & \fortinline|character(len=*)| & \texttt{'FILTER'} \par \texttt{'NOFILTER'} & \texttt{'NOFILTER'} & Matrix used in computing the smoothed prolongator: filtered or unfiltered. % (see~(\ref{eq:filtered}) in Section~\ref{sec:aggregation}) \\ \hline \multicolumn{5}{|l|}{{\bfseries Note.} Different thresholds at different levels, such as those used in \cite[Section~5.1]{VANEK_MANDEL_BREZINA}, can be easily set by invoking the rou-} \\ \multicolumn{5}{|l|}{tine \texttt{set} with the parameter \texttt{ilev}.} \\ \hline \end{tabular} \end{center} \caption{Parameters defining the aggregation algorithm (continued). \label{tab:p_aggregation_1}} \esideways \bsideways \begin{center} \begin{tabular}{|p{3.9cm}|l|p{1.7cm}|p{1.7cm}|p{8.6cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'COARSE_MAT'| & \fortinline|character(len=*)| & \fortinline|'DIST'| \par \fortinline|'REPL'| & \fortinline|'REPL'| & Coarsest matrix layout: distributed among the processes or replicated on each of them. \\ \hline \fortinline|'COARSE_SOLVE'| & \fortinline|character(len=*)| & \fortinline|'MUMPS'| \par \fortinline|'UMF'| \par \fortinline|'SLU'| \par \fortinline|'SLUDIST'| \par \fortinline|'JACOBI'| \par \fortinline|'GS'| \par \fortinline|'BJAC'| \par \fortinline|'KRM'| & See~Note. & Solver used at the coarsest level: sequential LU from MUMPS, UMFPACK, or SuperLU (plus tri\-an\-gular solve); distributed LU from MUMPS or SuperLU\_Dist (plus triangular solve); point-Jacobi, hybrid Gauss-Seidel or block-Jacobi and related $\ell_1$-versions; Krylov Method (Flexible Conjugate Gradient) coupled with the block-Jacobi preconditioner with ILU(0) on the blocks. Note that \texttt{UMF} and \texttt{SLU} require the coarsest matrix to be replicated, \texttt{SLUDIST}, \texttt{JACOBI}, \texttt{GS}, \texttt{BJAC} and \texttt{KRM} require it to be distributed, \texttt{MUMPS} can be used with either a replicated or a distributed matrix. When any of the previous solvers is specified, the matrix layout is set to a default value which allows the use of the solver (see Remark 3, p.~24). Note also that UMFPACK and SuperLU\_Dist are available only in double precision. \\ \hline \ifpdf \phantomcaption \end{tabular} \end{center} \esideways \bsideways \ContinuedFloat \begin{center} \begin{tabular}{|p{3.9cm}|l|p{1.7cm}|p{1.7cm}|p{8.6cm}|} \hline \fi \fortinline|'COARSE_SUBSOLVE'| & \fortinline|character(len=*)| & \fortinline|'ILU'| \par \fortinline|'ILUT'| \par \fortinline|'MILU'| \par \fortinline|'MUMPS'| \par \fortinline|'SLU'| \par \fortinline|'UMF'| \par \fortinline|'INVT'| \par \fortinline|'INVK'| \par \fortinline|'AINV'| & See~Note. & Solver for the diagonal blocks of the coarse matrix, in case the block Jacobi solver is chosen as coarsest-level solver: ILU($p$), ILU($p,t$), MILU($p$), LU from MUMPS, SuperLU or UMFPACK (plus triangular solve), Approximate Inverses INVK($p,q$), INVT($p_1,p2,t_1,t_2$) and AINV($t$); note that approximate inverses are specifically suited for GPUs since they do not employ triangular system solve kernels, see~\cite{BERTACCINIFILIPPONE}. Note that UMFPACK and SuperLU\_Dist are available only in double precision. \\ \hline \multicolumn{5}{|l|}{{\bfseries Note.} Defaults for \texttt{COARSE\_SOLVE} and \texttt{COARSE\_SUBSOLVE} are chosen in the following order:} \\ \multicolumn{5}{|l|}{single precision version -- \texttt{MUMPS} if installed, then \texttt{SLU} if installed, \texttt{ILU} otherwise;}\\ \multicolumn{5}{|l|}{double precision version -- \texttt{UMF} if installed, then \texttt{MUMPS} if installed, then \texttt{SLU} if installed, \texttt{ILU} otherwise.}\\ \hline %\end{tabular} %\end{center} %\caption{Parameters defining the coarse-space correction at the coarsest %level.\label{tab:p_coarse}} %\esideways % %\bsideways %\begin{center} %\begin{tabular}{|p{3.9cm}|l|p{2cm}|p{1.5cm}|p{7.5cm}|} %\hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'COARSE_SWEEPS'| & \fortinline|integer| & Any integer \par number $> 0$ & 10 & Number of sweeps when \fortinline|JACOBI|, \fortinline|GS| or \fortinline|BJAC| is chosen as coarsest-level solver.\\ \hline \fortinline|'COARSE_FILLIN'| & \fortinline|integer| & Any integer \par number $\ge 0$ & 0 & Fill-in level $p$ of the ILU factorizations and first fill-in for the approximate inverses. \\ \hline \fortinline|'COARSE_ILUTHRS'| & \fortinline|real(kind_parameter)| & Any real \par number $\ge 0$ & 0 & Drop tolerance $t$ in the ILU($p,t$) factorization and first drop-tolerance for the approximate inverses. \\ \hline \multicolumn{5}{|l|}{{\bfseries Note.} Further options for coarse solvers are contained in Table~\ref{tab:p_coarse_2}.} \\ \multicolumn{5}{|l|}{For a first use it is suggested to use the default options obtained by simply selecting the solver type.} \\ \hline \end{tabular} \end{center} \caption{Parameters defining the coarse-space correction at the coarsest level (continued).\label{tab:p_coarse_1}} \esideways \bsideways \begin{center} \begin{tabular}{|p{3.9cm}|l|p{1.7cm}|p{1.7cm}|p{8cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'BJAC_STOP'| & \fortinline|character(len=*)| & \fortinline|'FALSE'| \par \fortinline|'TRUE'| & \fortinline|'FALSE'| & Select whether to use a stopping criterion for the Block-Jacobi method used as a coarse solver. \\ \hline \fortinline|'BJAC_TRACE'| & \fortinline|character(len=*)| & \fortinline|'FALSE'| \par \fortinline|'TRUE'| & \fortinline|'FALSE'| & Select whether to print a trace for the calculated residual for the Block-Jacobi method used as a coarse solver. \\ \hline \fortinline|'BJAC_ITRACE'| & \fortinline|integer| & Any integer $>0$ & -1 & Number of iterations after which a trace is to be printed. \\ \hline \fortinline|'BJAC_RESCHECK'|& \fortinline|integer| & Any integer $>0$ & -1 & Number of iterations after which a residual is to be calculated. \\ \hline \fortinline|'BJAC_STOPTOL'| & \fortinline|real(kind_parameter)| & Any real $<1$ & 0 & Tolerance for the stopping criterion on the residual. \\ \hline \hline \end{tabular} \end{center} \caption{Additional parameters defining the coarse-space correction at the coarsest level.\label{tab:p_coarse_2}} \esideways \bsideways \begin{center} \small \begin{tabular}{|p{3.6cm}|l|p{1.9cm}|p{3.6cm}|p{6.5cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'SMOOTHER_TYPE'| & \fortinline|character(len=*)| & \fortinline|'JACOBI'| \par \fortinline|'GS'| \par \fortinline|'BGS'| \par \fortinline|'BJAC'| \par \fortinline|'AS'| \par \fortinline|'L1-JACOBI'| \par \fortinline|'L1-BJAC'| \par \fortinline|'L1-FBGS'| & \fortinline|'FBGS'| & Type of smoother used in the multilevel preconditioner: point-Jacobi, hybrid (forward) Gauss-Seidel, hybrid backward Gauss-Seidel, block-Jacobi, $\ell_1$-Jacobi, $\ell_1$--hybrid (forward) Gauss-Seidel, $\ell_1$-point-Jacobi and Additive Schwarz. \par It is ignored by one-level preconditioners. \\ \hline \fortinline|'SUB_SOLVE'| & \fortinline|character(len=*)| & \fortinline|'JACOBI'| \fortinline|'GS'| \par \fortinline|'BGS'| \par \fortinline|'ILU'| \par \fortinline|'ILUT'| \par \fortinline|'MILU'| \par \par \fortinline|'MUMPS'| \par \fortinline|'SLU'| \par \fortinline|'UMF'| \par \fortinline|'INVT'| \par \fortinline|'INVK'| \par \fortinline|'AINV'| & \texttt{GS} and \texttt{BGS} for pre- and post-smoothers of multilevel preconditioners, respectively \par \texttt{ILU} for block-Jacobi and Additive Schwarz one-level preconditioners & The local solver to be used with the smoother or one-level preconditioner (see Remark~2, page~24): point-Jacobi, hybrid (forward) Gauss-Seidel, hybrid backward Gauss-Seidel, ILU($p$), ILU($p,t$), MILU($p$), LU from MUMPS, SuperLU or UMFPACK (plus triangular solve), Approximate Inverses INVK($p,q$), INVT($p_1,p2,t_1,t_2$) and AINV($t$); note that approximate inverses are specifically suited for GPUs since they do not employ triangular system solve kernels, see~\cite{BERTACCINIFILIPPONE}. See Note for details on hybrid Gauss-Seidel. \\ \hline \fortinline|'SMOOTHER_SWEEPS'| & \fortinline|integer| & Any integer \par number~$\ge 0$ & 1 & Number of sweeps of the smoother or one-level preconditioner. In the multilevel case, no pre-smother or post-smoother is used if this parameter is set to 0 together with \fortinline|pos='PRE'| or \fortinline|pos='POST'|, respectively. \\ \hline \fortinline|'SUB_OVR'| & \fortinline|integer| & Any integer \par number~$\ge 0$ & 1 & Number of overlap layers, for Additive Schwarz only. \\ \hline \end{tabular} \end{center} \caption{Parameters defining the smoother or the details of the one-level preconditioner. \label{tab:p_smoother}} \esideways \bsideways \begin{center} \small \begin{tabular}{|p{3.2cm}|l|p{2.6cm}|p{2.6cm}|p{6.7cm}|} \hline \fortinline|what| & \textsc{data type} & \fortinline|val| & \textsc{default} & \textsc{comments} \\ \hline \fortinline|'SUB_RESTR'| & \fortinline|character(len=*)| & \fortinline|'HALO'| \par \fortinline|'NONE'| & \fortinline|'HALO'| & Type of restriction operator, for Additive Schwarz only: \texttt{HALO} for taking into account the overlap, \fortinline|'NONE'| for neglecting it. \par Note that \texttt{HALO} must be chosen for the classical Addditive Schwarz smoother and its RAS variant.\\ \hline \fortinline|'SUB_PROL'| & \fortinline|character(len=*)| & \fortinline|'SUM'| \par \fortinline|'NONE'| & \fortinline|'NONE'| & Type of prolongation operator, for Additive Schwarz only: \fortinline|'SUM'| for adding the contributions from the overlap, \fortinline|'NONE'| for neglecting them. \par Note that \fortinline|'SUM'| must be chosen for the classical Additive Schwarz smoother, and \fortinline|'NONE'| for its RAS variant. \\ \hline \fortinline|'SUB_FILLIN'| & \fortinline|integer| & Any integer \par number~$\ge 0$ & 0 & Fill-in level $p$ of the incomplete LU factorizations. \\ \hline \fortinline|'SUB_ILUTHRS'| & \fortinline|real(kind_parameter)| & Any real number~$\ge 0$ & 0 & Drop tolerance $t$ in the ILU($p,t$) factorization. \\ \hline \fortinline|'MUMPS_LOC_GLOB'| & \fortinline|character(len=*)| & \fortinline|'LOCAL_SOLVER'| \par \fortinline|'GLOBAL_SOLVER'| & \fortinline|'GLOBAL_SOLVER'| & Whether MUMPS should be used as a distributed solver, or as a serial solver acting only on the part of the matrix local to each process. \\ \hline \fortinline|'MUMPS_IPAR_ENTRY'| & \fortinline|integer| & Any integer number & 0 & Set an entry in the MUMPS integer control array, as chosen via the \fortinline|idx| optional argument. \\ \hline \fortinline|'MUMPS_RPAR_ENTRY'| & \fortinline|real| & Any real number & 0 & Set an entry in the MUMPS real control array, as chosen via the \fortinline|idx| optional argument. \\ %\hline \hline \end{tabular} \end{center} \caption{Parameters defining the smoother or the details of the one-level preconditioner (continued).\label{tab:p_smoother_1}} \esideways \clearpage \subsection{Method hierarchy\_build\label{sec:hier_bld}} \begin{center} \fortinline|call p%hierarchy_build(a,desc_a,info)|\\ \end{center} \noindent This method builds the hierarchy of matrices and restriction/prolongation operators for the multilevel preconditioner \fortinline|p|, according to the requirements made by the user through the methods \fortinline|init| and \fortinline|set|. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|a| & \fortinline|type(psb_|\emph{x}\fortinline|spmat_type), intent(in)|. \\ & The sparse matrix structure containing the local part of the matrix to be preconditioned. Note that \emph{x} must be chosen according to the \fortinline|real|/\fortinline|complex|, single/double precision version of AMG4PSBLAS under use. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}.\\ \fortinline|desc_a| & \fortinline|type(psb_desc_type), intent(in)|. \\ & The communication descriptor of \fortinline|a|. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}.\\ \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \end{tabular} \clearpage \subsection{Method smoothers\_build\label{sec:smooth_bld}} \begin{center} \fortinline|call p%smoothers_build(a,desc_a,p,info[,amold,vmold,imold])|\\ \end{center} \noindent This method builds the smoothers and the coarsest-level solvers for the multilevel preconditioner \fortinline|p|, according to the requirements made by the user through the methods \fortinline|init| and \fortinline|set|, and based on the aggregation hierarchy produced by a previous call to \fortinline|hierarchy_build| (see Section~\ref{sec:hier_bld}). {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|a| & \fortinline|type(psb_|\emph{x}\fortinline|spmat_type), intent(in)|. \\ & The sparse matrix structure containing the local part of the matrix to be preconditioned. Note that \emph{x} must be chosen according to the \fortinline|real|/\fortinline|complex|, single/double precision version of AMG4PSBLAS under use. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}.\\ \fortinline|desc_a| & \fortinline|type(psb_desc_type), intent(in)|. \\ & The communication descriptor of \fortinline|a|. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}.\\ \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \fortinline|amold| & \fortinline|class(psb_|\emph{x}\fortinline|_base_sparse_mat), intent(in), optional|. \\ & The desired dynamic type for internal matrix components; this allows e.g. running on GPUs; it needs not be the same on all processes. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}. \\ \fortinline|vmold| & \fortinline|class(psb_|\emph{x}\fortinline|_base_vect_type), intent(in), optional|. \\ & The desired dynamic type for internal vector components; this allows e.g. running on GPUs. \\ \fortinline|imold| & \fortinline|class(psb_i_base_vect_type), intent(in), optional|. \\ & The desired dynamic type for internal integer vector components; this allows e.g. running on GPUs. \\ \end{tabular} \clearpage \subsection{Method build\label{sec:precbld}} \begin{center} \fortinline|call p%build(a,desc_a,info[,amold,vmold,imold])|\\ \end{center} \noindent This method builds the preconditioner \fortinline|p| according to the requirements made by the user through the methods \fortinline|init| and \fortinline|set| (see Sections~\ref{sec:hier_bld} and~\ref{sec:smooth_bld} for multilevel preconditioners). It is mostly provided for backward compatibility; indeed, it is internally implemented by invoking the two previous methods \fortinline|hierarchy_build| and \fortinline|smoothers_build|, whose nomenclature would however be somewhat unnatural when dealing with simple one-level preconditioners. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|a| & \fortinline|type(psb_|\emph{x}\fortinline|spmat_type), intent(in)|. \\ & The sparse matrix structure containing the local part of the matrix to be preconditioned. Note that \emph{x} must be chosen according to the \fortinline|real|/\fortinline|complex|, single/double precision version of AMG4PSBLAS under use. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}.\\ \fortinline|desc_a| & \fortinline|type(psb_desc_type), intent(in)|. \\ & The communication descriptor of \fortinline|a|. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}.\\ \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \fortinline|amold| & \fortinline|class(psb_|\emph{x}\fortinline|_base_sparse_mat), intent(in), optional|. \\ & The desired dynamic type for internal matrix components; this allows e.g. running on GPUs; it needs not be the same on all processes. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}. \\ \fortinline|vmold| & \fortinline|class(psb_|\emph{x}\fortinline|_base_vect_type), intent(in), optional|. \\ & The desired dynamic type for internal vector components; this allows e.g. running on GPUs. \\ \fortinline|imold| & \fortinline|class(psb_i_base_vect_type), intent(in), optional|. \\ & The desired dynamic type for internal integer vector components; this allows e.g. running on GPUs. \\ \end{tabular} \noindent The method can be used to build multilevel preconditioners too. \clearpage \subsection{Method apply\label{sec:precapply}} \begin{center} \fortinline|call p%apply(x,y,desc_a,info [,trans,work])|\\ \end{center} \noindent This method computes $y = op(B^{-1})\, x$, where $B$ is a previously built preconditioner, stored into \fortinline|p|, and $op$ denotes the preconditioner itself or its transpose, according to the value of \fortinline|trans|. Note that, when AMG4PSBLAS is used with a Krylov solver from PSBLAS, \fortinline|p%apply| is called within the PSBLAS method \fortinline|psb_krylov| and hence it is completely transparent to the user. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|x| & \emph{type}\fortinline|(kind_parameter)|, dimension(:), intent(in)|.\\ & The local part of the vector $x$. Note that \emph{type} and \emph{kind\_parameter} must be chosen according to the \fortinline|real|/\fortinline|complex|, single/double precision version of AMG4PSBLAS under use.\\ \fortinline|y| & \emph{type}\fortinline|(kind_parameter)|, dimension(:), intent(out)|.\\ & The local part of the vector $y$. Note that \emph{type} and \emph{kind\_parameter} must be chosen according to the \fortinline|real|/\fortinline|complex|, single/double precision version of AMG4PSBLAS under use.\\ \fortinline|desc_a| & \fortinline|type(psb_desc_type), intent(in)|. \\ & The communication descriptor associated to the matrix to be preconditioned.\\ \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \fortinline|trans| & \fortinline|character(len=1), optional, intent(in).|\\ & If \fortinline|trans| = \fortinline|'N','n'| then $op(B^{-1}) = B^{-1}$; if \fortinline|trans| = \fortinline|'T','t'| then $op(B^{-1}) = B^{-T}$ (transpose of $B^{-1})$; if \fortinline|trans| = \fortinline|'C','c'| then $op(B^{-1}) = B^{-C}$ (conjugate transpose of $B^{-1})$.\\ \fortinline|work| & \emph{type}\fortinline|(kind_parameter)|, dimension(:), optional, target|.\\ & Workspace. Its size should be at least \fortinline|4 * psb_cd_get_local_| \fortinline|cols(desc_a)| (see the PSBLAS User's Guide). Note that \emph{type} and \emph{kind\_parameter} must be chosen according to the \fortinline|real|/\fortinline|complex|, single/double precision version of AMG4PSBLAS under use.\\ \end{tabular} \clearpage \subsection{Method free\label{sec:precfree}} \begin{center} \fortinline|call p%free(p,info)|\\ \end{center} \noindent This method deallocates the preconditioner data structure \fortinline|p|. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{10.5cm}} \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \end{tabular} \clearpage \subsection{Method descr\label{sec:precdescr}} \begin{center} \fortinline|call p%descr(info, [iout, root, verbosity])|\\ \end{center} \noindent This method prints a description of the preconditioner \fortinline|p| to the standard output or to a file. It must be called after \fortinline|hierachy_build| and \fortinline|smoothers_build|, or \fortinline|build|, have been called. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{2.2cm}p{11cm}} \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \fortinline|iout| & \fortinline|integer, intent(in), optional|.\\ & The id of the file where the preconditioner description will be printed; the default is the standard output.\\ \fortinline|root| & \fortinline|integer, intent(in), optional|.\\ & The id of the process where the preconditioner description will be printed; the default is \fortinline|psb_root_|.\\ \fortinline|verbosity| & \fortinline|integer, intent(in), optional|.\\ & The verbosity level of the description. Default value is 0. For values higher than 0, it prints out further information, e.g., for a distributed multilevel preconditioner the size of the coarse matrices on every process.\\ \end{tabular} \subsection{Auxiliary Methods\label{sec:auxil}} Various functionalities are implemented as additional methods of the preconditioner object. \subsubsection{Method: dump} \begin{center} \fortinline|call p%dump(info[,istart,iend,prefix,head,ac,rp,smoother,solver,global_num])|\\ \end{center} \noindent Dump on file. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \fortinline|amold| & \fortinline|class(psb_|\emph{x}\fortinline|_base_sparse_mat), intent(in), optional|. \\ & The desired dynamic type for internal matrix components; this allows e.g. running on GPUs; it needs not be the same on all processes. See the PSBLAS User's Guide for details \cite{PSBLASGUIDE}. \\ \end{tabular} \subsubsection{Method: clone} \begin{center} \fortinline|call p%clone(pout,info)|\\ \end{center} \noindent Create a (deep) copy of the preconditioner object. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|pout| & \fortinline|type(amg_|\emph{x}\fortinline|prec_type), intent(out)|.\\ & The copy of the preconditioner data structure. Note that \emph{x} must be chosen according to the \fortinline|real|/\fortinline|complex|, single/double precision version of AMG4PSBLAS under use.\\ \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \end{tabular} \subsubsection{Method: sizeof} \begin{center} \fortinline|sz = p%sizeof([global])|\\ \end{center} \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|global| & \fortinline|logical, optional|.\\ & Whether the global or local preconditioner memory occupatio is desired. Default: \fortinline|.false.|.\\ \end{tabular} \noindent Return memory footprint in bytes. \subsubsection{Method: allocate\_wrk} \begin{center} \fortinline|call p%allocate_wrk(info[, vmold])|\\ \end{center} n \noindent Allocate internal work vectors. Each application of the preconditioner uses a number of work vectors which are allocated internally as necessary; therefore allocation and deallocation of memory occurs multiple times during the execution of a Krylov method. In most cases this strategy is perfectly acceptable, but on some platforms, most notably GPUs, memory allocation is a slow operation, and the default behaviour would lead to a slowdown. This method allows to trade space for time by preallocating the internal workspace outside of the invocation of a Krylov method. When using GPUs or other specialized devices, the \fortinline|vmold| argument is also necessary to ensure the internal work vectors are of the appropriate dynamic type to exploit the accelerator hardware; when allocation occurs internally this is taken care of based on the dynamic type of the \fortinline|x| argument to the \fortinline|apply| method. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \fortinline|vmold| & \fortinline|class(psb_|\emph{x}\fortinline|_base_vect_type), intent(in), optional|. \\ & The desired dynamic type for internal vector components; this allows e.g. running on GPUs. \\ \end{tabular} \subsubsection{Method: free\_wrk} \begin{center} \fortinline|call p%free_wrk(info)|\\ \end{center} \noindent Deallocate internal work vectors. {\vskip1.5\baselineskip\noindent\large\bfseries Arguments} \smallskip \begin{tabular}{p{1.2cm}p{12cm}} \fortinline|info| & \fortinline|integer, intent(out)|.\\ & Error code. If no error, 0 is returned. See Section~\ref{sec:errors} for details.\\ \end{tabular} %%% Local Variables: %%% mode: latex %%% TeX-master: "userguide" %%% End: