You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
610 lines
22 KiB
TeX
610 lines
22 KiB
TeX
\section{Getting Started\label{sec:started}}
|
|
\markboth{\textsc{AMG4PSBLAS User's and Reference Guide}}
|
|
{\textsc{\ref{sec:started} Getting Started}}
|
|
|
|
This section describes the basics for building and applying
|
|
AMG4PSBLAS one-level and multilevel (i.e., AMG) preconditioners with
|
|
the Krylov solvers included in PSBLAS~\cite{PSBLASGUIDE}.
|
|
|
|
The following steps are required:
|
|
\begin{enumerate}
|
|
\item \emph{Declare the preconditioner data structure}. It is a derived data type,
|
|
\verb|amg_|\-\emph{x}\verb|prec_| \verb|type|, where \emph{x} may be
|
|
\verb|s|, \verb|d|, \verb|c| or \verb|z|, according to the basic
|
|
data type of the sparse matrix (\verb|s| = real single precision;
|
|
\verb|d| = real double precision; \verb|c| = complex single
|
|
precision; \verb|z| = complex double precision). This data
|
|
structure is accessed by the user only through the AMG4PSBLAS
|
|
routines, following an object-oriented approach.
|
|
\item \emph{Allocate and initialize the preconditioner data structure,
|
|
according to a preconditioner type chosen by the user}. This is
|
|
performed by the routine \fortinline|init|, which also sets
|
|
defaults for each preconditioner type selected by the user. The
|
|
preconditioner types and the defaults associated with them are
|
|
given in Table~\ref{tab:precinit}, where the strings used by
|
|
\fortinline|init| to identify the preconditioner types are also
|
|
given. Note that these strings are valid also if uppercase
|
|
letters are substituted by corresponding lowercase ones.
|
|
|
|
\item \emph{Modify the selected preconditioner type, by properly
|
|
setting preconditioner parameters.} This is performed by the
|
|
routine \fortinline|set|. This routine must be called if the
|
|
user wants to modify the default values of the parameters
|
|
associated with the selected preconditioner type, to obtain a
|
|
variant of that preconditioner. Examples of use of
|
|
\fortinline|set| are given in Section~\ref{sec:examples}; a
|
|
complete list of all the preconditioner parameters and their
|
|
allowed and default values is provided in
|
|
Section~\ref{sec:userinterface},
|
|
Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}.
|
|
\item \emph{Build the preconditioner for a given matrix}. If the selected preconditioner
|
|
is multilevel, then two steps must be performed, as specified next.
|
|
\begin{enumerate}
|
|
\item[4.1] \emph{Build the AMG hierarchy for a given matrix.} This is
|
|
performed by the routine \fortinline|hierarchy_build|.
|
|
\item[4.2] \emph{Build the preconditioner for a given matrix.} This is performed
|
|
by the routine \fortinline|smoothers_build|.
|
|
\end{enumerate}
|
|
If the selected preconditioner is one-level, it is built in a single step,
|
|
performed by the routine \fortinline|bld|.
|
|
\item \emph{Apply the preconditioner at each iteration of a Krylov solver.}
|
|
This is performed by the method \fortinline|apply|. When using the PSBLAS Krylov solvers,
|
|
this step is completely transparent to the user, since \fortinline|apply| is called
|
|
by the PSBLAS routine implementing the Krylov solver (\fortinline|psb_krylov|).
|
|
\item \emph{Free the preconditioner data structure}. This is performed by
|
|
the routine \fortinline|free|. This step is complementary to step 1 and should
|
|
be performed when the preconditioner is no more used.
|
|
\end{enumerate}
|
|
|
|
All the previous routines are available as methods of the preconditioner object.
|
|
A detailed description of them is given in Section~\ref{sec:userinterface}.
|
|
Examples showing the basic use of AMG4PSBLAS are reported in Section~\ref{sec:examples}.
|
|
|
|
\begin{table}[h!]
|
|
\begin{center}
|
|
%{\small
|
|
\begin{tabular}{|l|p{2cm}|p{6.8cm}|}
|
|
\hline
|
|
\textsc{type} & \textsc{string} & \textsc{default preconditioner} \\ \hline
|
|
No preconditioner &\fortinline|'NONE'|& Considered to use the PSBLAS
|
|
Krylov solvers with no preconditioner. \\ \hline
|
|
Diagonal & \fortinline|'DIAG'|, \fortinline|'JACOBI'|, \fortinline|'L1-JACOBI'| & Diagonal preconditioner.
|
|
For any zero diagonal entry of the matrix to be preconditioned,
|
|
the corresponding entry of the preconditioner is set to~1.\\ \hline
|
|
Gauss-Seidel & \fortinline|'GS'|, \fortinline|'L1-GS'| & Hybrid Gauss-Seidel (forward), that is,
|
|
global block Jacobi with
|
|
Gauss-Seidel as local solver.\\ \hline
|
|
Symmetrized Gauss-Seidel & \fortinline|'FBGS'|, \fortinline|'L1-FBGS'| & Symmetrized hybrid Gauss-Seidel, that is,
|
|
forward Gauss-Seidel followed by
|
|
backward Gauss-Seidel.\\ \hline
|
|
Block Jacobi & \fortinline|'BJAC'|, \fortinline|'L1-BJAC'| & Block-Jacobi with ILU(0) on the local blocks.\\ \hline
|
|
Additive Schwarz & \fortinline|'AS'| & Additive Schwarz (AS),
|
|
with overlap~1 and ILU(0) on the local blocks. \\ \hline
|
|
Multilevel &\fortinline|'ML'| & V-cycle with one hybrid forward Gauss-Seidel
|
|
(GS) sweep as pre-smoother and one hybrid backward
|
|
GS sweep as post-smoother, decoupled smoothed aggregation
|
|
as coarsening algorithm, and LU (plus triangular solve)
|
|
as coarsest-level solver. See the default values in
|
|
Tables~\ref{tab:p_cycle}-\ref{tab:p_smoother_1}
|
|
for further details of the preconditioner. \\
|
|
\hline
|
|
\end{tabular}
|
|
%}
|
|
\caption{Preconditioner types, corresponding strings and default choices.
|
|
\label{tab:precinit}}
|
|
\end{center}
|
|
\end{table}
|
|
|
|
Note that the module \fortinline|amg_prec_mod|, containing the definition of the
|
|
preconditioner data type and the interfaces to the routines of AMG4PSBLAS,
|
|
must be used in any program calling such routines.
|
|
The modules \fortinline|psb_base_mod|, for the sparse matrix and communication descriptor
|
|
data types, and \fortinline|psb_krylov_mod|, for interfacing with the
|
|
Krylov solvers, must be also used (see Section~\ref{sec:examples}). \\
|
|
|
|
\textbf{Remark 1.} Coarsest-level solvers based on the LU factorization,
|
|
such as those implemented in UMFPACK, MUMPS, SuperLU, and SuperLU\_Dist,
|
|
usually lead to smaller numbers of preconditioned Krylov
|
|
iterations than inexact solvers, when the linear system comes from
|
|
a standard discretization of basic scalar elliptic PDE problems. However,
|
|
this does not necessarily correspond to the shortest execution time
|
|
on parallel~computers.
|
|
|
|
|
|
\subsection{Examples\label{sec:examples}}
|
|
|
|
The code reported in Figure~\ref{fig:ex1} shows how to set and apply the default
|
|
multilevel preconditioner available in the real double precision version
|
|
of AMG4PSBLAS (see Table~\ref{tab:precinit}). This preconditioner is chosen
|
|
by simply specifying \fortinline|'ML'| as the second argument of \fortinline|P%init|
|
|
(a call to \fortinline|P%set| is not needed) and is applied with the CG
|
|
solver provided by PSBLAS (the matrix of the system to be solved is
|
|
assumed to be positive definite). As previously observed, the modules
|
|
\fortinline|psb_base_mod|, \fortinline|amg_prec_mod| and \fortinline|psb_krylov_mod|
|
|
must be used by the example program.
|
|
|
|
The part of the code dealing with reading and assembling the sparse
|
|
matrix and the right-hand side vector and the deallocation of the
|
|
relevant data structures, performed
|
|
through the PSBLAS routines for sparse matrix and vector management, is not reported
|
|
here for the sake of conciseness.
|
|
The complete code can be found in the example program file \verb|amg_dexample_ml.f90|,
|
|
in the directory \verb|samples/simple/file|\-\verb|read| of the AMG4PSBLAS implementation (see
|
|
Section~\ref{sec:ex_and_test}). A sample test problem along with the relevant
|
|
input data is available in \verb|samples/simple/fileread/runs|.
|
|
For details on the use of the PSBLAS routines, see the PSBLAS User's
|
|
Guide~\cite{PSBLASGUIDE}.
|
|
|
|
The setup and application of the default multilevel preconditioner
|
|
for the real single precision and the complex, single and double
|
|
precision, versions are obtained with straightforward modifications of the previous
|
|
example (see Section~\ref{sec:userinterface} for details). If these versions are installed,
|
|
the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
|
|
|
|
\begin{listing}[tbp]
|
|
\begin{center}
|
|
\begin{minipage}{.90\textwidth}
|
|
\ifpdf
|
|
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
|
|
use psb_base_mod
|
|
use amg_prec_mod
|
|
use psb_krylov_mod
|
|
... ...
|
|
!
|
|
! sparse matrix
|
|
type(psb_dspmat_type) :: A
|
|
! sparse matrix descriptor
|
|
type(psb_desc_type) :: desc_A
|
|
! preconditioner
|
|
type(amg_dprec_type) :: P
|
|
! right-hand side and solution vectors
|
|
type(psb_d_vect_type) :: b, x
|
|
... ...
|
|
!
|
|
! initialize the parallel environment
|
|
call psb_init(ctxt)
|
|
call psb_info(ctxt,iam,np)
|
|
... ...
|
|
!
|
|
! read and assemble the spd matrix A and the right-hand side b
|
|
! using PSBLAS routines for sparse matrix / vector management
|
|
... ...
|
|
!
|
|
! initialize the default multilevel preconditioner, i.e. V-cycle
|
|
! with basic smoothed aggregation, 1 hybrid forward/backward
|
|
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
|
|
! solver
|
|
call P%init(ctxt,'ML',info)
|
|
!
|
|
! build the preconditioner
|
|
call P%hierarchy_build(A,desc_A,info)
|
|
call P%smoothers_build(A,desc_A,info)
|
|
|
|
!
|
|
! set the solver parameters and the initial guess
|
|
... ...
|
|
!
|
|
! solve Ax=b with preconditioned FCG
|
|
call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)
|
|
... ...
|
|
!
|
|
! deallocate the preconditioner
|
|
call P%free(info)
|
|
!
|
|
! deallocate other data structures
|
|
... ...
|
|
!
|
|
! exit the parallel environment
|
|
call psb_exit(ctxt)
|
|
stop
|
|
\end{minted}
|
|
\else
|
|
{\small
|
|
\begin{verbatim}
|
|
use psb_base_mod
|
|
use amg_prec_mod
|
|
use psb_krylov_mod
|
|
... ...
|
|
!
|
|
! sparse matrix
|
|
type(psb_dspmat_type) :: A
|
|
! sparse matrix descriptor
|
|
type(psb_desc_type) :: desc_A
|
|
! preconditioner
|
|
type(amg_dprec_type) :: P
|
|
! right-hand side and solution vectors
|
|
type(psb_d_vect_type) :: b, x
|
|
... ...
|
|
!
|
|
! initialize the parallel environment
|
|
call psb_init(ctxt)
|
|
call psb_info(ctxt,iam,np)
|
|
... ...
|
|
!
|
|
! read and assemble the spd matrix A and the right-hand side b
|
|
! using PSBLAS routines for sparse matrix / vector management
|
|
... ...
|
|
!
|
|
! initialize the default multilevel preconditioner, i.e. V-cycle
|
|
! with basic smoothed aggregation, 1 hybrid forward/backward
|
|
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
|
|
! solver
|
|
call P%init(ctxt,'ML',info)
|
|
!
|
|
! build the preconditioner
|
|
call P%hierarchy_build(A,desc_A,info)
|
|
call P%smoothers_build(A,desc_A,info)
|
|
|
|
!
|
|
! set the solver parameters and the initial guess
|
|
... ...
|
|
!
|
|
! solve Ax=b with preconditioned FCG
|
|
call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)
|
|
... ...
|
|
!
|
|
! deallocate the preconditioner
|
|
call P%free(info)
|
|
!
|
|
! deallocate other data structures
|
|
... ...
|
|
!
|
|
! exit the parallel environment
|
|
call psb_exit(ctxt)
|
|
stop
|
|
\end{verbatim}
|
|
}
|
|
\fi
|
|
\end{minipage}
|
|
\caption{setup and application of the default multilevel preconditioner (example 1).
|
|
\label{fig:ex1}}
|
|
\end{center}
|
|
\end{listing}
|
|
|
|
Different versions of the multilevel preconditioner can be obtained by changing
|
|
the default values of the preconditioner parameters. The code reported in
|
|
Figure~\ref{fig:ex2} shows how to set a V-cycle preconditioner
|
|
which applies 1 block-Jacobi sweep as pre- and post-smoother,
|
|
and solves the coarsest-level system with 8 block-Jacobi sweeps.
|
|
Note that the ILU(0) factorization (plus triangular solve) is used as
|
|
local solver for the block-Jacobi sweeps, since this is the default associated
|
|
with block-Jacobi and set by~\fortinline|P%init|.
|
|
Furthermore, specifying block-Jacobi as coarsest-level
|
|
solver implies that the coarsest-level matrix is distributed
|
|
among the processes.
|
|
Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using the Coarsening based on Compatible Weighted Matching, aggregates of size at most $8$ and smoothed prolongators. It applies
|
|
2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
|
|
and solves the coarsest-level system with the parallel flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi preconditioner having ILU(0) on the blocks. Default parameters are used for stopping criterion of the coarsest solver.
|
|
Note that, also in this case, specifying KRM as coarsest-level
|
|
solver implies that the coarsest-level matrix is distributed
|
|
among the processes.
|
|
%It is specified that the coarsest-level
|
|
|
|
%matrix is distributed, since MUMPS can be used on both
|
|
|
|
%replicated and distributed matrices, and by default
|
|
|
|
%it is used on replicated ones.
|
|
%Note the use of the parameter \fortinline|pos|
|
|
%to specify a property only for the pre-smoother or the post-smoother
|
|
%(see Section~\ref{sec:precset} for more details).
|
|
The code fragments shown in Figures~\ref{fig:ex2} and \ref{fig:ex3} are
|
|
included in the example program file \verb|amg_dexample_ml.f90| too.
|
|
|
|
Finally, Figure~\ref{fig:ex4} shows the setup of a one-level
|
|
additive Schwarz preconditioner, i.e., RAS with overlap 2.
|
|
Note also that a Krylov method different from CG must be used to solve
|
|
the preconditioned system, since the preconditione in nonsymmetric.
|
|
The corresponding example program is available in the file
|
|
\verb|amg_dexample_1lev.f90|.
|
|
|
|
For all the previous preconditioners, example programs where the sparse matrix and
|
|
the right-hand side are generated by discretizing a PDE with Dirichlet
|
|
boundary conditions are also available in the directory \verb|samples/simple/pdegen|.
|
|
\vspace{-1em}\begin{listing}[tbh]
|
|
\ifpdf%
|
|
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
|
|
! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
|
|
! ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi
|
|
! sweeps (with ILU(0) on the blocks) as coarsest-level solver
|
|
call P%init(ctxt,'ML',info)
|
|
call P%set('SMOOTHER_TYPE','BJAC',info)
|
|
call P%set('COARSE_SOLVE','BJAC',info)
|
|
call P%set('COARSE_SWEEPS',8,info)
|
|
call P%hierarchy_build(A,desc_A,info)
|
|
call P%smoothers_build(A,desc_A,info)
|
|
\end{minted}
|
|
\else%
|
|
\begin{center}
|
|
\begin{minipage}{.90\textwidth}
|
|
{\small
|
|
\begin{verbatim}
|
|
... ...
|
|
! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
|
|
! ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi
|
|
! sweeps (with ILU(0) on the blocks) as coarsest-level solver
|
|
call P%init(ctxt,'ML',info)
|
|
call P%set('SMOOTHER_TYPE','BJAC',info)
|
|
call P%set('COARSE_SOLVE','BJAC',info)
|
|
call P%set('COARSE_SWEEPS',8,info)
|
|
call P%hierarchy_build(A,desc_A,info)
|
|
call P%smoothers_build(A,desc_A,info)
|
|
... ...
|
|
\end{verbatim}
|
|
}
|
|
\end{minipage}
|
|
\end{center}
|
|
\fi\vspace{-2em}%
|
|
\caption{setup of a multilevel preconditioner based on the default decoupled coarsening\label{fig:ex2}}
|
|
\end{listing}\vspace*{-2em}
|
|
\begin{listing}[h!]
|
|
\ifpdf
|
|
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
|
|
!build a W-cycle using the coupled coarsening based on weighted matching,
|
|
!aggregates of size at most 8 and smoothed prolongators,
|
|
!2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
|
|
!and parallel flexible Conjugate Gradient coupled with the block-Jacobi
|
|
!preconditioner having ILU(0) on the blocks as coarsest solver.
|
|
call P%init(ctxt,'ML',info)
|
|
call P%set('PAR_AGGR_ALG','COUPLED',info)
|
|
call P%set('AGGR_TYPE','MATCHBOXP',info)
|
|
call P%set('AGGR_SIZE',8,info)
|
|
call P%set('ML_CYCLE','WCYCLE',info)
|
|
call P%set('SMOOTHER_TYPE','FBGS',info)
|
|
call P%set('SMOOTHER_SWEEPS',2,info)
|
|
call P%set('COARSE_SOLVE','KRM',info)
|
|
call P%set('COARSE_MAT','DIST',info)
|
|
call P%set('KRM_METHOD','FCG',info)
|
|
call P%hierarchy_build(A,desc_A,info)
|
|
call P%smoothers_build(A,desc_A,info)
|
|
\end{minted}
|
|
\else
|
|
\begin{center}
|
|
\begin{minipage}{.90\textwidth}
|
|
{\small
|
|
\begin{verbatim}
|
|
... ...
|
|
! build a W-cycle preconditioner with 2 hybrid Gauss-Seidel sweeps
|
|
! as pre- and post-smoother, a distributed coarsest
|
|
! matrix, and MUMPS as coarsest-level solver
|
|
call P%init(ctxt,'ML',info)
|
|
call P%set('PAR_AGGR_ALG','COUPLED',info)
|
|
call P%set('AGGR_TYPE','MATCHBOXP',info)
|
|
call P%set('AGGR_SIZE',8,info)
|
|
call P%set('ML_CYCLE','WCYCLE',info)
|
|
call P%set('SMOOTHER_TYPE','FBGS',info)
|
|
call P%set('SMOOTHER_SWEEPS',2,info)
|
|
call P%set('COARSE_SOLVE','KRM',info)
|
|
call P%set('COARSE_MAT','DIST',info)
|
|
call P%set('KRM_METHOD','FCG',info)
|
|
call P%hierarchy_build(A,desc_A,info)
|
|
call P%smoothers_build(A,desc_A,info)
|
|
... ...
|
|
\end{verbatim}
|
|
}
|
|
\end{minipage}
|
|
\end{center}
|
|
\fi\vspace{-2em}%
|
|
\caption{setup of a multilevel preconditioner based on the coupled coarsening using weighted matching\label{fig:ex3}}
|
|
\end{listing}\vspace*{-2em}
|
|
\begin{listing}[h!]
|
|
\ifpdf
|
|
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
|
|
! build a one-level RAS with overlap 2 and ILU(0) on the local blocks.
|
|
call P%init(ctxt,'AS',info)
|
|
call P%set('SUB_OVR',2,info)
|
|
call P%build(A,desc_A,info)
|
|
... ...
|
|
! solve Ax=b with preconditioned BiCGSTAB
|
|
call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info)
|
|
\end{minted}
|
|
\else
|
|
\begin{center}
|
|
\begin{minipage}{.90\textwidth}
|
|
{\small
|
|
\begin{verbatim}
|
|
... ...
|
|
! set RAS with overlap 2 and ILU(0) on the local blocks
|
|
call P%init(ctxt,'AS',info)
|
|
call P%set('SUB_OVR',2,info)
|
|
call P%bld(A,desc_A,info)
|
|
... ...
|
|
! solve Ax=b with preconditioned BiCGSTAB
|
|
call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info)
|
|
\end{verbatim}
|
|
}
|
|
\end{minipage}
|
|
\end{center}
|
|
\fi\vspace{-2em}%
|
|
\caption{setup of a one-level Schwarz preconditioner.\label{fig:ex4}}
|
|
\end{listing}
|
|
|
|
|
|
|
|
\subsection{GPU example\label{sec:gpu-example}}
|
|
|
|
The code discussed here shows how to set up a
|
|
program exploiting the combined GPU capabilities of PSBLAS and
|
|
AMG4PSBLAS. The code example is available in the source distribution
|
|
directory \verb|amg4psblas/examples/gpu|.
|
|
|
|
First of all, we need to include the appropriate modules and
|
|
declare some auxiliary variables:
|
|
\begin{listing}[h!]
|
|
\ifpdf
|
|
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
|
|
program amg_dexample_gpu
|
|
use psb_base_mod
|
|
use amg_prec_mod
|
|
use psb_krylov_mod
|
|
use psb_util_mod
|
|
use psb_gpu_mod
|
|
use data_input
|
|
use amg_d_pde_mod
|
|
implicit none
|
|
.......
|
|
! GPU variables
|
|
type(psb_d_hlg_sparse_mat) :: agmold
|
|
type(psb_d_vect_gpu) :: vgmold
|
|
type(psb_i_vect_gpu) :: igmold
|
|
\end{minted}
|
|
\else
|
|
\begin{center}
|
|
\begin{minipage}{.90\textwidth}
|
|
{\small
|
|
\begin{verbatim}
|
|
program amg_dexample_gpu
|
|
use psb_base_mod
|
|
use amg_prec_mod
|
|
use psb_krylov_mod
|
|
use psb_util_mod
|
|
use psb_gpu_mod
|
|
use data_input
|
|
use amg_d_pde_mod
|
|
implicit none
|
|
.......
|
|
! GPU variables
|
|
type(psb_d_hlg_sparse_mat) :: agmold
|
|
type(psb_d_vect_gpu) :: vgmold
|
|
type(psb_i_vect_gpu) :: igmold
|
|
|
|
\end{verbatim}
|
|
}
|
|
\end{minipage}
|
|
\end{center}
|
|
\fi
|
|
\caption{setup of a GPU-enabled test program part one.\label{fig:gpu-ex1}}
|
|
\end{listing}
|
|
In this particular example we are choosing to employ a \verb|HLG| data
|
|
structure for sparse matrices on GPUs; for more information please
|
|
refer to the PSBLAS-EXT users' guide.
|
|
|
|
We then have to initialize the GPU environment, and pass the
|
|
appropriate MOLD variables to the build methods (see also the PSBLAS
|
|
and PSBLAS-EXT users' guides).
|
|
\begin{listing}[h!]
|
|
\ifpdf
|
|
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
|
|
call psb_init(ctxt)
|
|
call psb_info(ctxt,iam,np)
|
|
!
|
|
! BEWARE: if you have NGPUS per node, the default is to
|
|
! attach to mod(IAM,NGPUS)
|
|
!
|
|
call psb_gpu_init(ictxt)
|
|
......
|
|
t1 = psb_wtime()
|
|
call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
|
|
|
|
\end{minted}
|
|
\else
|
|
\begin{center}
|
|
\begin{minipage}{.90\textwidth}
|
|
{\small
|
|
\begin{verbatim}
|
|
call psb_init(ctxt)
|
|
call psb_info(ctxt,iam,np)
|
|
!
|
|
! BEWARE: if you have NGPUS per node, the default is to
|
|
! attach to mod(IAM,NGPUS)
|
|
!
|
|
call psb_gpu_init(ictxt)
|
|
......
|
|
t1 = psb_wtime()
|
|
call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
|
|
|
|
\end{verbatim}
|
|
}
|
|
\end{minipage}
|
|
\end{center}
|
|
\fi
|
|
\caption{setup of a GPU-enabled test program part two.\label{fig:gpu-ex2}}
|
|
\end{listing}
|
|
Finally, we convert the input matrix, the descriptor and the vectors
|
|
to use a GPU-enabled internal storage format.
|
|
We then preallocate the preconditioner workspace before entering the
|
|
Krylov method. At the end of the code, we close the GPU environment
|
|
\begin{listing}[h!]
|
|
\ifpdf
|
|
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
|
|
call desc_a%cnv(mold=igmold)
|
|
call a%cscnv(info,mold=agmold)
|
|
call psb_geasb(x,desc_a,info,mold=vgmold)
|
|
call psb_geasb(b,desc_a,info,mold=vgmold)
|
|
|
|
!
|
|
! iterative method parameters
|
|
!
|
|
call psb_barrier(ctxt)
|
|
call prec%allocate_wrk(info)
|
|
t1 = psb_wtime()
|
|
call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
|
|
& desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,&
|
|
& itrace=s_choice%itrace,&
|
|
& istop=s_choice%istopc,irst=s_choice%irst)
|
|
call prec%deallocate_wrk(info)
|
|
call psb_barrier(ctxt)
|
|
tslv = psb_wtime() - t1
|
|
|
|
......
|
|
call psb_gpu_exit()
|
|
call psb_exit(ctxt)
|
|
stop
|
|
|
|
|
|
\end{minted}
|
|
\else
|
|
\begin{center}
|
|
\begin{minipage}{.90\textwidth}
|
|
{\small
|
|
\begin{verbatim}
|
|
call desc_a%cnv(mold=igmold)
|
|
call a%cscnv(info,mold=agmold)
|
|
call psb_geasb(x,desc_a,info,mold=vgmold)
|
|
call psb_geasb(b,desc_a,info,mold=vgmold)
|
|
|
|
!
|
|
! iterative method parameters
|
|
!
|
|
call psb_barrier(ctxt)
|
|
call prec%allocate_wrk(info)
|
|
t1 = psb_wtime()
|
|
call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
|
|
& desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,&
|
|
& istop=s_choice%istopc,irst=s_choice%irst)
|
|
call prec%deallocate_wrk(info)
|
|
call psb_barrier(ctxt)
|
|
tslv = psb_wtime() - t1
|
|
|
|
......
|
|
call psb_gpu_exit()
|
|
call psb_exit(ctxt)
|
|
stop
|
|
|
|
\end{verbatim}
|
|
}
|
|
\end{minipage}
|
|
\end{center}
|
|
\fi
|
|
\caption{setup of a GPU-enabled test program part three.\label{fig:gpu-ex3}}
|
|
\end{listing}
|
|
|
|
It is very important to employ smoothers and coarsest solvers that are suited
|
|
to the GPU, i.e. methods that do NOT employ triangular
|
|
system solve kernels. Methods that satisfy this constraint include:
|
|
\begin{itemize}
|
|
\item \verb|JACOBI|
|
|
\item \verb|BJAC| with the following methods on the local blocks:
|
|
\begin{itemize}
|
|
\item \verb|INVK|
|
|
\item \verb|INVT|
|
|
\item \verb|AINV|
|
|
\end{itemize}
|
|
\end{itemize}
|
|
and their $\ell_1$ variants.
|
|
|
|
%%% Local Variables:
|
|
%%% mode: latex
|
|
%%% TeX-master: "userguide"
|
|
%%% End:
|