\section { Getting Started\label { sec:started} }
\markboth { \textsc { AMG4PSBLAS User's and Reference Guide} }
{ \textsc { \ref { sec:started} Getting Started} }
This section describes the basics for building and applying
AMG4PSBLAS one-level and multilevel (i.e., AMG) preconditioners with
the Krylov solvers included in PSBLAS~\cite { PSBLASGUIDE} .
The following steps are required:
\begin { enumerate}
\item \emph { Declare the preconditioner data structure} . It is a derived data type,
\verb|amg_|\-\emph{x}\verb|prec_|\verb|type|, where \emph{x} may be
\verb |s|, \verb |d|, \verb |c| or \verb |z|, according to the basic
data type of the sparse matrix (\verb |s| = real single precision;
\verb |d| = real double precision; \verb |c| = complex single
precision; \verb |z| = complex double precision). This data
structure is accessed by the user only through the AMG4PSBLAS
routines, following an object-oriented approach.
\item \emph { Allocate and initialize the preconditioner data structure,
according to a preconditioner type chosen by the user} . This is
performed by the routine \fortinline |init|, which also sets
defaults for each preconditioner type selected by the user. The
preconditioner types and the defaults associated with them are
given in Table~\ref { tab:precinit} , where the strings used by
\fortinline |init| to identify the preconditioner types are also
given. These strings are case-insensitive, i.e., they remain valid if uppercase
letters are replaced by the corresponding lowercase ones.
\item \emph { Modify the selected preconditioner type, by properly
setting preconditioner parameters.} This is performed by the
routine \fortinline |set|. This routine must be called if the
user wants to modify the default values of the parameters
associated with the selected preconditioner type, to obtain a
variant of that preconditioner. Examples of use of
\fortinline |set| are given in Section~\ref { sec:examples} ; a
complete list of all the preconditioner parameters and their
allowed and default values is provided in
Section~\ref { sec:userinterface} ,
Tables~\ref { tab:p_ cycle} -\ref { tab:p_ smoother_ 1} .
\item \emph { Build the preconditioner for a given matrix} . If the selected preconditioner
is multilevel, then two steps must be performed, as specified next.
\begin { enumerate}
\item [4.1] \emph { Build the AMG hierarchy for a given matrix.} This is
performed by the routine \fortinline|hierarchy_build|.
\item[4.2] \emph{Build the smoothers and the coarsest-level solver on that hierarchy.} This is performed
by the routine \fortinline|smoothers_build|.
\end { enumerate}
If the selected preconditioner is one-level, it is built in a single step,
performed by the routine \fortinline|build|.
\item \emph { Apply the preconditioner at each iteration of a Krylov solver.}
This is performed by the method \fortinline |apply|. When using the PSBLAS Krylov solvers,
this step is completely transparent to the user, since \fortinline |apply| is called
by the PSBLAS routine implementing the Krylov solver (\fortinline|psb_krylov|).
\item \emph { Free the preconditioner data structure} . This is performed by
the routine \fortinline|free|. This step is complementary to step 1 and should
be performed when the preconditioner is no longer needed.
\end { enumerate}
All the previous routines are available as methods of the preconditioner object.
A detailed description of them is given in Section~\ref { sec:userinterface} .
Examples showing the basic use of AMG4PSBLAS are reported in Section~\ref { sec:examples} .
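
As a minimal sketch of how the steps above fit together in the multilevel case,
the fragment below wraps them in a subroutine that stops at the first call
returning an error. The subroutine name \verb|solve_ml| and the local variable
\verb|ifree| are hypothetical. The use of \verb|psb_ctxt_type| for the context
and of \verb|psb_success_| as the success code are assumptions about the
installed PSBLAS version, to be checked against the PSBLAS User's
Guide~\cite { PSBLASGUIDE}.
{\small
\begin{verbatim}
subroutine solve_ml(ctxt,A,desc_A,b,x,tol,info)
  use psb_base_mod
  use amg_prec_mod
  use psb_krylov_mod
  implicit none
  type(psb_ctxt_type), intent(in)      :: ctxt   ! assumed context type
  type(psb_dspmat_type), intent(inout) :: A
  type(psb_desc_type), intent(inout)   :: desc_A
  type(psb_d_vect_type), intent(inout) :: b, x
  real(psb_dpk_), intent(in)           :: tol
  integer(psb_ipk_), intent(out)       :: info
  type(amg_dprec_type) :: P     ! step 1: declare the preconditioner
  integer(psb_ipk_)    :: ifree ! hypothetical: return code of P%free

  ! step 2: allocate and initialize with the multilevel defaults
  call P%init(ctxt,'ML',info)
  ! step 3: modify a default (here, the number of smoother sweeps)
  if (info == psb_success_) &
       & call P%set('SMOOTHER_SWEEPS',2,info)
  ! steps 4.1 and 4.2: build the hierarchy, then the smoothers
  if (info == psb_success_) &
       & call P%hierarchy_build(A,desc_A,info)
  if (info == psb_success_) &
       & call P%smoothers_build(A,desc_A,info)
  ! step 5: the Krylov solver applies the preconditioner internally
  if (info == psb_success_) &
       & call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)
  ! step 6: release the preconditioner in any case, and report the
  ! first error encountered, if any
  call P%free(ifree)
  if (info == psb_success_) info = ifree
end subroutine solve_ml
\end{verbatim}
}
Chaining the calls on \verb|info| keeps the first nonzero error code, so the
caller can tell which phase failed.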
\begin { table} [h!]
\begin { center}
%{\small
\begin { tabular} { |l|p{ 2cm} |p{ 6.8cm} |}
\hline
\textsc { type} & \textsc { string} & \textsc { default preconditioner} \\ \hline
No preconditioner & \fortinline|'NONE'| & Allows the PSBLAS
Krylov solvers to be used with no preconditioner. \\ \hline
Diagonal & \fortinline |'DIAG'|, \fortinline |'JACOBI'|, \fortinline |'L1-JACOBI'| & Diagonal preconditioner.
For any zero diagonal entry of the matrix to be preconditioned,
the corresponding entry of the preconditioner is set to~1.\\ \hline
Gauss-Seidel & \fortinline |'GS'|, \fortinline |'L1-GS'| & Hybrid Gauss-Seidel (forward), that is,
global block Jacobi with
Gauss-Seidel as local solver.\\ \hline
Symmetrized Gauss-Seidel & \fortinline |'FBGS'|, \fortinline |'L1-FBGS'| & Symmetrized hybrid Gauss-Seidel, that is,
forward Gauss-Seidel followed by
backward Gauss-Seidel.\\ \hline
Block Jacobi & \fortinline |'BJAC'|, \fortinline |'L1-BJAC'| & Block-Jacobi with ILU(0) on the local blocks.\\ \hline
Additive Schwarz & \fortinline |'AS'| & Additive Schwarz (AS),
with overlap~1 and ILU(0) on the local blocks. \\ \hline
Multilevel & \fortinline |'ML'| & V-cycle with one hybrid forward Gauss-Seidel
(GS) sweep as pre-smoother and one hybrid backward
GS sweep as post-smoother, decoupled smoothed aggregation
as coarsening algorithm, and LU (plus triangular solve)
as coarsest-level solver. See the default values in
Tables~\ref { tab:p_ cycle} -\ref { tab:p_ smoother_ 1}
for further details of the preconditioner. \\
\hline
\end { tabular}
%}
\caption { Preconditioner types, corresponding strings and default choices.
\label { tab:precinit} }
\end { center}
\end { table}
Note that the module \fortinline|amg_prec_mod|, containing the definition of the
preconditioner data type and the interfaces to the routines of AMG4PSBLAS,
must be used in any program calling such routines.
The modules \fortinline|psb_base_mod|, for the sparse matrix and communication descriptor
data types, and \fortinline|psb_krylov_mod|, for interfacing with the
Krylov solvers, must also be used (see Section~\ref { sec:examples}). \\
\textbf { Remark 1.} Coarsest-level solvers based on the LU factorization,
such as those implemented in UMFPACK, MUMPS, SuperLU, and SuperLU\_Dist,
usually lead to smaller numbers of preconditioned Krylov
iterations than inexact solvers, when the linear system comes from
a standard discretization of basic scalar elliptic PDE problems. However,
this does not necessarily correspond to the shortest execution time
on parallel~computers.
\subsection { Examples\label { sec:examples} }
The code reported in Figure~\ref { fig:ex1} shows how to set and apply the default
multilevel preconditioner available in the real double precision version
of AMG4PSBLAS (see Table~\ref { tab:precinit} ). This preconditioner is chosen
by simply specifying \fortinline |'ML'| as the second argument of \fortinline |P%init|
(a call to \fortinline|P%set| is not needed) and is applied with the flexible CG (FCG)
solver provided by PSBLAS (the matrix of the system to be solved is
assumed to be symmetric positive definite). As previously observed, the modules
\fortinline|psb_base_mod|, \fortinline|amg_prec_mod| and \fortinline|psb_krylov_mod|
must be used by the example program.
The parts of the code that read and assemble the sparse
matrix and the right-hand side vector, and that deallocate the
relevant data structures, rely on
the PSBLAS routines for sparse matrix and vector management and are omitted
here for the sake of conciseness.
The complete code can be found in the example program file \verb|amg_dexample_ml.f90|,
in the directory \verb|samples/simple/file|\-\verb|read| of the AMG4PSBLAS implementation (see
Section~\ref { sec:ex_ and_ test} ). A sample test problem along with the relevant
input data is available in \verb |samples/simple/fileread/runs|.
For details on the use of the PSBLAS routines, see the PSBLAS User's
Guide~\cite { PSBLASGUIDE} .
The setup and application of the default multilevel preconditioner
in the real single precision version and in the complex single and double
precision versions are obtained with straightforward modifications of the previous
example (see Section~\ref { sec:userinterface} for details). If these versions are installed,
the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
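
As a sketch of these modifications, the declarations of example 1 would change
as follows for the real single precision version. The type names are assumed to
follow the \emph{x} naming rule of step 1 and its PSBLAS counterpart, and
\verb|psb_spk_| is assumed to be the PSBLAS kind for single precision reals.
The calls to the preconditioner methods and to \verb|psb_krylov| keep the same form.
{\small
\begin{verbatim}
use psb_base_mod
use amg_prec_mod
use psb_krylov_mod
! real single precision: "d" becomes "s" in the data type names
type(psb_sspmat_type) :: A       ! sparse matrix
type(psb_desc_type)   :: desc_A  ! descriptor, same type in all versions
type(amg_sprec_type)  :: P       ! preconditioner
type(psb_s_vect_type) :: b, x    ! right-hand side and solution vectors
real(psb_spk_)        :: tol     ! tolerance, in the working precision
\end{verbatim}
}
Under the same assumption, the complex double precision version would use
\verb|psb_zspmat_type|, \verb|amg_zprec_type| and \verb|psb_z_vect_type|,
with a real tolerance.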
\begin { listing} [tbp]
\begin { center}
\begin { minipage} { .90\textwidth }
\ifpdf
\begin { minted} [breaklines=true,bgcolor=bg,fontsize=\small ]{ fortran}
use psb_base_mod
use amg_prec_mod
use psb_krylov_mod
... ...
!
! sparse matrix
type(psb_dspmat_type) :: A
! sparse matrix descriptor
type(psb_desc_type) :: desc_A
! preconditioner
type(amg_dprec_type) :: P
! right-hand side and solution vectors
type(psb_d_vect_type) :: b, x
... ...
!
! initialize the parallel environment
call psb_init(ctxt)
call psb_info(ctxt,iam,np)
... ...
!
! read and assemble the spd matrix A and the right-hand side b
! using PSBLAS routines for sparse matrix / vector management
... ...
!
! initialize the default multilevel preconditioner, i.e. V-cycle
! with basic smoothed aggregation, 1 hybrid forward/backward
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
! solver
call P%init(ctxt,'ML',info)
!
! build the preconditioner
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
!
! set the solver parameters and the initial guess
... ...
!
! solve Ax=b with preconditioned FCG
call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)
... ...
!
! deallocate the preconditioner
call P%free(info)
!
! deallocate other data structures
... ...
!
! exit the parallel environment
call psb_exit(ctxt)
stop
\end { minted}
\else
{ \small
\begin { verbatim}
use psb_base_mod
use amg_prec_mod
use psb_krylov_mod
... ...
!
! sparse matrix
type(psb_dspmat_type) :: A
! sparse matrix descriptor
type(psb_desc_type) :: desc_A
! preconditioner
type(amg_dprec_type) :: P
! right-hand side and solution vectors
type(psb_d_vect_type) :: b, x
... ...
!
! initialize the parallel environment
call psb_init(ctxt)
call psb_info(ctxt,iam,np)
... ...
!
! read and assemble the spd matrix A and the right-hand side b
! using PSBLAS routines for sparse matrix / vector management
... ...
!
! initialize the default multilevel preconditioner, i.e. V-cycle
! with basic smoothed aggregation, 1 hybrid forward/backward
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
! solver
call P%init(ctxt,'ML',info)
!
! build the preconditioner
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
!
! set the solver parameters and the initial guess
... ...
!
! solve Ax=b with preconditioned FCG
call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)
... ...
!
! deallocate the preconditioner
call P%free(info)
!
! deallocate other data structures
... ...
!
! exit the parallel environment
call psb_exit(ctxt)
stop
\end { verbatim}
}
\fi
\end { minipage}
\caption { setup and application of the default multilevel preconditioner (example 1).
\label { fig:ex1} }
\end { center}
\end { listing}
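
The step ``set the solver parameters and the initial guess'' is elided in
example 1. One possible way to fill it in, given here only as a sketch, is to
start from a zero initial guess and to pass an iteration limit to
\verb|psb_krylov| through its optional arguments. The values assigned to
\verb|tol| and \verb|itmax| are arbitrary illustrative choices, and the use of
\verb|psb_geall| and of the \verb|zero| method of the vector is an assumption
to be checked against the PSBLAS User's Guide~\cite { PSBLASGUIDE}.
{\small
\begin{verbatim}
integer(psb_ipk_) :: itmax, iter
real(psb_dpk_)    :: tol, err
... ...
! zero initial guess: allocate x on desc_A, set it to zero, assemble
call psb_geall(x,desc_A,info)
call x%zero()
call psb_geasb(x,desc_A,info)
! illustrative stopping tolerance and iteration limit
tol   = 1.d-6
itmax = 500
! on exit, iter is the number of iterations performed and err the
! final value of the convergence estimate
call psb_krylov('FCG',A,P,b,x,tol,desc_A,info,&
     & itmax=itmax,iter=iter,err=err)
\end{verbatim}
}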
Different versions of the multilevel preconditioner can be obtained by changing
the default values of the preconditioner parameters. The code reported in
Figure~\ref { fig:ex2} shows how to set a V-cycle preconditioner
which applies 1 block-Jacobi sweep as pre- and post-smoother,
and solves the coarsest-level system with 8 block-Jacobi sweeps.
Note that the ILU(0) factorization (plus triangular solve) is used as
local solver for the block-Jacobi sweeps, since this is the default associated
with block-Jacobi and set by~\fortinline |P%init|.
Furthermore, specifying block-Jacobi as coarsest-level
solver implies that the coarsest-level matrix is distributed
among the processes.
Figure~\ref { fig:ex3} shows how to set a W-cycle preconditioner that uses the
coupled coarsening based on compatible weighted matching, with aggregates of
size at most $8$ and smoothed prolongators. It applies
2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
and solves the coarsest-level system with the parallel flexible Conjugate Gradient
method (KRM) coupled with the block-Jacobi preconditioner having ILU(0) on the blocks.
Default parameters are used for the stopping criterion of the coarsest-level solver.
Note that, also in this case, specifying KRM as coarsest-level
solver implies that the coarsest-level matrix is distributed
among the processes.
%It is specified that the coarsest-level
%matrix is distributed, since MUMPS can be used on both
%replicated and distributed matrices, and by default
%it is used on replicated ones.
%Note the use of the parameter \fortinline|pos|
%to specify a property only for the pre-smoother or the post-smoother
%(see Section~\ref{sec:precset} for more details).
The code fragments shown in Figures~\ref { fig:ex2} and \ref { fig:ex3} are
included in the example program file \verb|amg_dexample_ml.f90| too.
Finally, Figure~\ref { fig:ex4} shows the setup of a one-level
additive Schwarz preconditioner, i.e., RAS with overlap 2.
Note also that a Krylov method other than CG must be used to solve
the preconditioned system, since the preconditioner is nonsymmetric.
The corresponding example program is available in the file
\verb|amg_dexample_1lev.f90|.
For all the previous preconditioners, example programs where the sparse matrix and
the right-hand side are generated by discretizing a PDE with Dirichlet
boundary conditions are also available in the directory \verb |samples/simple/pdegen|.
\vspace { -1em} \begin { listing} [tbh]
\ifpdf %
\begin { minted} [breaklines=true,bgcolor=bg,fontsize=\small ]{ fortran}
! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
! ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi
! sweeps (with ILU(0) on the blocks) as coarsest-level solver
call P%init(ctxt,'ML',info)
call P%set('SMOOTHER_TYPE','BJAC',info)
call P%set('COARSE_SOLVE','BJAC',info)
call P%set('COARSE_SWEEPS',8,info)
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
\end { minted}
\else %
\begin { center}
\begin { minipage} { .90\textwidth }
{ \small
\begin { verbatim}
... ...
! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
! ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi
! sweeps (with ILU(0) on the blocks) as coarsest-level solver
call P%init(ctxt,'ML',info)
call P%set('SMOOTHER_TYPE','BJAC',info)
call P%set('COARSE_SOLVE','BJAC',info)
call P%set('COARSE_SWEEPS',8,info)
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
... ...
\end { verbatim}
}
\end { minipage}
\end { center}
\fi \vspace { -2em} %
\caption { setup of a multilevel preconditioner based on the default decoupled coarsening\label { fig:ex2} }
\end { listing} \vspace * { -2em}
\begin { listing} [h!]
\ifpdf
\begin { minted} [breaklines=true,bgcolor=bg,fontsize=\small ]{ fortran}
!build a W-cycle using the coupled coarsening based on weighted matching,
!aggregates of size at most 8 and smoothed prolongators,
!2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
!and parallel flexible Conjugate Gradient coupled with the block-Jacobi
!preconditioner having ILU(0) on the blocks as coarsest solver.
call P%init(ctxt,'ML',info)
call P%set('PAR_AGGR_ALG','COUPLED',info)
call P%set('AGGR_TYPE','MATCHBOXP',info)
call P%set('AGGR_SIZE',8,info)
call P%set('ML_CYCLE','WCYCLE',info)
call P%set('SMOOTHER_TYPE','FBGS',info)
call P%set('SMOOTHER_SWEEPS',2,info)
call P%set('COARSE_SOLVE','KRM',info)
call P%set('COARSE_MAT','DIST',info)
call P%set('KRM_METHOD','FCG',info)
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
\end { minted}
\else
\begin { center}
\begin { minipage} { .90\textwidth }
{ \small
\begin { verbatim}
... ...
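! build a W-cycle using the coupled coarsening based on weighted matching,
! aggregates of size at most 8 and smoothed prolongators,
! 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
! and parallel flexible Conjugate Gradient coupled with the block-Jacobi
! preconditioner having ILU(0) on the blocks as coarsest solver.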
call P%init(ctxt,'ML',info)
call P%set('PAR_AGGR_ALG','COUPLED',info)
call P%set('AGGR_TYPE','MATCHBOXP',info)
call P%set('AGGR_SIZE',8,info)
call P%set('ML_CYCLE','WCYCLE',info)
call P%set('SMOOTHER_TYPE','FBGS',info)
call P%set('SMOOTHER_SWEEPS',2,info)
call P%set('COARSE_SOLVE','KRM',info)
call P%set('COARSE_MAT','DIST',info)
call P%set('KRM_METHOD','FCG',info)
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
... ...
\end { verbatim}
}
\end { minipage}
\end { center}
\fi \vspace { -2em} %
\caption { setup of a multilevel preconditioner based on the coupled coarsening using weighted matching\label { fig:ex3} }
\end { listing} \vspace * { -2em}
\begin { listing} [h!]
\ifpdf
\begin { minted} [breaklines=true,bgcolor=bg,fontsize=\small ]{ fortran}
! build a one-level RAS with overlap 2 and ILU(0) on the local blocks.
call P%init(ctxt,'AS',info)
call P%set('SUB_OVR',2,info)
call P%build(A,desc_A,info)
... ...
! solve Ax=b with preconditioned BiCGSTAB
call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info)
\end { minted}
\else
\begin { center}
\begin { minipage} { .90\textwidth }
{ \small
\begin { verbatim}
... ...
! set RAS with overlap 2 and ILU(0) on the local blocks
call P%init(ctxt,'AS',info)
call P%set('SUB_OVR',2,info)
call P%build(A,desc_A,info)
... ...
! solve Ax=b with preconditioned BiCGSTAB
call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info)
\end { verbatim}
}
\end { minipage}
\end { center}
\fi \vspace { -2em} %
\caption { setup of a one-level Schwarz preconditioner.\label { fig:ex4} }
\end { listing}
\subsection { GPU example\label { sec:gpu-example} }
The code discussed here shows how to set up a
program exploiting the combined GPU capabilities of PSBLAS and
AMG4PSBLAS. The code example is available in the source distribution
directory \verb |amg4psblas/examples/gpu|.
First of all, we need to include the appropriate modules and
declare some auxiliary variables:
\begin { listing} [h!]
\ifpdf
\begin { minted} [breaklines=true,bgcolor=bg,fontsize=\small ]{ fortran}
program amg_dexample_gpu
use psb_base_mod
use amg_prec_mod
use psb_krylov_mod
use psb_util_mod
use psb_gpu_mod
use data_input
use amg_d_pde_mod
implicit none
.......
! GPU variables
type(psb_d_hlg_sparse_mat) :: agmold
type(psb_d_vect_gpu) :: vgmold
type(psb_i_vect_gpu) :: igmold
\end { minted}
\else
\begin { center}
\begin { minipage} { .90\textwidth }
{ \small
\begin { verbatim}
program amg_dexample_gpu
use psb_base_mod
use amg_prec_mod
use psb_krylov_mod
use psb_util_mod
use psb_gpu_mod
use data_input
use amg_d_pde_mod
implicit none
.......
! GPU variables
type(psb_d_hlg_sparse_mat) :: agmold
type(psb_d_vect_gpu) :: vgmold
type(psb_i_vect_gpu) :: igmold
\end { verbatim}
}
\end { minipage}
\end { center}
\fi
\caption { setup of a GPU-enabled test program part one.\label { fig:gpu-ex1} }
\end { listing}
In this particular example we are choosing to employ a \verb |HLG| data
structure for sparse matrices on GPUs; for more information please
refer to the PSBLAS-EXT users' guide.
We then have to initialize the GPU environment, and pass the
appropriate MOLD variables to the build methods (see also the PSBLAS
and PSBLAS-EXT users' guides).
\begin { listing} [h!]
\ifpdf
\begin { minted} [breaklines=true,bgcolor=bg,fontsize=\small ]{ fortran}
call psb_init(ctxt)
call psb_info(ctxt,iam,np)
!
! BEWARE: if you have NGPUS per node, the default is to
! attach to mod(IAM,NGPUS)
!
call psb_gpu_init(ctxt)
......
t1 = psb_wtime()
call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
\end { minted}
\else
\begin { center}
\begin { minipage} { .90\textwidth }
{ \small
\begin { verbatim}
call psb_init(ctxt)
call psb_info(ctxt,iam,np)
!
! BEWARE: if you have NGPUS per node, the default is to
! attach to mod(IAM,NGPUS)
!
call psb_gpu_init(ctxt)
......
t1 = psb_wtime()
call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
\end { verbatim}
}
\end { minipage}
\end { center}
\fi
\caption { setup of a GPU-enabled test program part two.\label { fig:gpu-ex2} }
\end { listing}
Finally, we convert the input matrix, the descriptor and the vectors
to use a GPU-enabled internal storage format.
We then preallocate the preconditioner workspace before entering the
Krylov method. At the end of the code, we close the GPU environment.
\begin { listing} [h!]
\ifpdf
\begin { minted} [breaklines=true,bgcolor=bg,fontsize=\small ]{ fortran}
call desc_a%cnv(mold=igmold)
call a%cscnv(info,mold=agmold)
call psb_geasb(x,desc_a,info,mold=vgmold)
call psb_geasb(b,desc_a,info,mold=vgmold)
!
! iterative method parameters
!
call psb_barrier(ctxt)
call prec%allocate_wrk(info)
t1 = psb_wtime()
call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
& desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,&
& itrace=s_choice%itrace,&
& istop=s_choice%istopc,irst=s_choice%irst)
call prec%deallocate_wrk(info)
call psb_barrier(ctxt)
tslv = psb_wtime() - t1
......
call psb_gpu_exit()
call psb_exit(ctxt)
stop
\end { minted}
\else
\begin { center}
\begin { minipage} { .90\textwidth }
{ \small
\begin { verbatim}
call desc_a%cnv(mold=igmold)
call a%cscnv(info,mold=agmold)
call psb_geasb(x,desc_a,info,mold=vgmold)
call psb_geasb(b,desc_a,info,mold=vgmold)
!
! iterative method parameters
!
call psb_barrier(ctxt)
call prec%allocate_wrk(info)
t1 = psb_wtime()
call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
& desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,&
& istop=s_choice%istopc,irst=s_choice%irst)
call prec%deallocate_wrk(info)
call psb_barrier(ctxt)
tslv = psb_wtime() - t1
......
call psb_gpu_exit()
call psb_exit(ctxt)
stop
\end { verbatim}
}
\end { minipage}
\end { center}
\fi
\caption { setup of a GPU-enabled test program part three.\label { fig:gpu-ex3} }
\end { listing}
It is very important to employ smoothers and coarsest solvers that are suited
to the GPU, i.e. methods that do NOT employ triangular
system solve kernels. Methods that satisfy this constraint include:
\begin { itemize}
\item \verb |JACOBI|
\item \verb |BJAC| with the following methods on the local blocks:
\begin { itemize}
\item \verb |INVK|
\item \verb |INVT|
\item \verb |AINV|
\end { itemize}
\end { itemize}
and their $ \ell _ 1 $ variants.
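
As a sketch, a multilevel preconditioner satisfying this constraint could be
selected as shown below. The parameter names \verb|'SUB_SOLVE'| and
\verb|'COARSE_SUBSOLVE'| are assumed here to be the ones that select the local
solver of the smoother and of the coarsest-level solver, respectively. They do
not appear in the examples of Section~\ref { sec:examples} and should be checked
against Section~\ref { sec:userinterface}. The number of coarsest-level sweeps
is an arbitrary illustrative value.
{\small
\begin{verbatim}
call prec%init(ctxt,'ML',info)
! block-Jacobi smoother with INVK on the local blocks, one of the
! choices listed above that avoid triangular system solves
call prec%set('SMOOTHER_TYPE','BJAC',info)
call prec%set('SUB_SOLVE','INVK',info)
! same choice for the coarsest-level solver
call prec%set('COARSE_SOLVE','BJAC',info)
call prec%set('COARSE_SUBSOLVE','INVK',info)
call prec%set('COARSE_SWEEPS',8,info)
! build the hierarchy, then the smoothers with the GPU MOLD variables
call prec%hierarchy_build(a,desc_a,info)
call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
\end{verbatim}
}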
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "userguide"
%%% End: