This section describes the basics for building and applying AMG4PSBLAS one-level and multilevel (i.e., AMG) preconditioners with the Krylov solvers included in PSBLAS [21].
The following steps are required; a condensed sketch of the resulting call sequence is shown after the list.
1. Declare the preconditioner data structure. It is a derived data type, amg_xprec_type, where x may be s, d, c or z, according to the basic data type of the sparse matrix (s = real single precision; d = real double precision; c = complex single precision; z = complex double precision). This data structure is accessed by the user only through the AMG4PSBLAS routines, following an object-oriented approach.
2. Allocate and initialize the preconditioner data structure, according to a preconditioner type chosen by the user. This is performed by the routine init, which also sets defaults for each preconditioner type selected by the user. The preconditioner types and the defaults associated with them are given in Table 1, together with the strings used by init to identify the preconditioner types. These strings are case-insensitive.
3. Modify the selected preconditioner type, by properly setting preconditioner parameters. This is performed by the routine set. This routine must be called only if the user wants to modify the default values of the parameters associated with the selected preconditioner type, to obtain a variant of that preconditioner. Examples of the use of set are given in Section 4.1; a complete list of all the preconditioner parameters and their allowed and default values is provided in Section 5, Tables 2-8.
4. Build the preconditioner for a given matrix. If the selected preconditioner is multilevel, two steps must be performed, as specified next.
(a) Build the AMG hierarchy for a given matrix. This is performed by the routine hierarchy_build.
(b) Build the smoothers and the coarsest-level solver for a given matrix. This is performed by the routine smoothers_build.
If the selected preconditioner is one-level, it is built in a single step, performed by the routine bld.
5. Apply the preconditioner at each iteration of a Krylov solver. This is performed by the method apply. When using the PSBLAS Krylov solvers, this step is completely transparent to the user, since apply is called by the PSBLAS routine implementing the Krylov solver (psb_krylov).
6. Free the preconditioner data structure. This is performed by the routine free. This step is complementary to step 1 and should be performed when the preconditioner is no longer used.
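As a quick reference, the following condensed sketch, distilled from the complete example of Figure 1 (with an illustrative set call borrowed from the variants of Section 4.1), summarizes the call sequence for the real double precision version; declarations other than the preconditioner and all error checks are omitted.

type(amg_dprec_type) :: P
... ...
call P%init(ctxt,'ML',info)               ! steps 1-2: declare and initialize
call P%set('SMOOTHER_SWEEPS',2,info)      ! step 3 (optional): change a default value
call P%hierarchy_build(A,desc_A,info)     ! step 4(a): build the AMG hierarchy
call P%smoothers_build(A,desc_A,info)     ! step 4(b): build smoothers and solvers
call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)  ! step 5: apply is called inside
call P%free(info)                         ! step 6: free the data structure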
All the previous routines are available as methods of the preconditioner object. A detailed description of them is given in Section 5. Examples showing the basic use of AMG4PSBLAS are reported in Section 4.1.
Table 1: Preconditioner types, corresponding strings and default choices.

type | string | default preconditioner
No preconditioner | 'NOPREC', 'NONE' | Used to run the PSBLAS Krylov solvers with no preconditioning.
Diagonal | 'DIAG', 'JACOBI' | Diagonal preconditioner. For any zero diagonal entry of the matrix to be preconditioned, the corresponding entry of the preconditioner is set to 1.
Gauss-Seidel | 'GS' | Hybrid Gauss-Seidel (forward), that is, global block Jacobi with Gauss-Seidel as local solver.
Symmetrized Gauss-Seidel | 'FBGS' | Symmetrized hybrid Gauss-Seidel, that is, forward Gauss-Seidel followed by backward Gauss-Seidel.
Block Jacobi | 'BJAC' | Block-Jacobi with ILU(0) on the local blocks.
Additive Schwarz | 'AS' | Additive Schwarz (AS), with overlap 1 and ILU(0) on the local blocks.
Multilevel | 'ML' | V-cycle with one hybrid forward Gauss-Seidel (GS) sweep as pre-smoother and one hybrid backward GS sweep as post-smoother, decoupled smoothed aggregation as coarsening algorithm, and LU (plus triangular solve) as coarsest-level solver. See the default values in Tables 2-8 for further details of the preconditioner.
Note that the module amg_prec_mod, containing the definition of the preconditioner data type and the interfaces to the routines of AMG4PSBLAS, must be used in any program calling such routines. The modules psb_base_mod, for the sparse matrix and communication descriptor data types, and psb_krylov_mod, for interfacing with the Krylov solvers, must also be used (see Section 4.1).
Remark 1. Coarsest-level solvers based on the LU factorization, such as those implemented in UMFPACK, MUMPS, SuperLU, and SuperLU_Dist, usually lead to smaller numbers of preconditioned Krylov iterations than inexact solvers, when the linear system comes from a standard discretization of basic scalar elliptic PDE problems. However, this does not necessarily correspond to the shortest execution time on parallel computers.
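For illustration, the following hedged sketch shows how one might switch between a direct coarsest-level solver and the inexact alternative of Figure 2; the 'MUMPS' value is an assumption based on the solvers named in Remark 1, requires the corresponding library to be installed, and should be checked against the allowed values in Section 5, Tables 2-8.

! direct coarsest-level solver: sparse LU via MUMPS, if installed
call P%set('COARSE_SOLVE','MUMPS',info)
! inexact alternative: 8 block-Jacobi sweeps on a distributed
! coarsest-level matrix (cf. Figure 2)
call P%set('COARSE_SOLVE','BJAC',info)
call P%set('COARSE_SWEEPS',8,info)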
The code reported in Figure 1 shows how to set and apply the default multilevel preconditioner available in the real double precision version of AMG4PSBLAS (see Table 1). This preconditioner is chosen by simply specifying 'ML' as the second argument of P%init (a call to P%set is not needed) and is applied with the flexible CG (FCG) solver provided by PSBLAS (the matrix of the system to be solved is assumed to be positive definite). As previously observed, the modules psb_base_mod, amg_prec_mod and psb_krylov_mod must be used by the example program.
The part of the code dealing with reading and assembling the sparse matrix and the right-hand side vector and the deallocation of the relevant data structures, performed through the PSBLAS routines for sparse matrix and vector management, is not reported here for the sake of conciseness. The complete code can be found in the example program file amg_dexample_ml.f90, in the directory samples/simple/fileread of the AMG4PSBLAS implementation (see Section 3.5). A sample test problem along with the relevant input data is available in samples/simple/fileread/runs. For details on the use of the PSBLAS routines, see the PSBLAS User’s Guide [21].
The setup and application of the default multilevel preconditioner for the real single precision and the complex, single and double precision, versions are obtained with straightforward modifications of the previous example (see Section 5 for details). If these versions are installed, the corresponding codes are available in samples/simple/fileread.
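For instance, assuming the complex double precision version is installed, the adaptation amounts to changing the kind-specific declarations of Figure 1 along the following lines:

! complex double precision variants of the declarations in Figure 1
type(psb_zspmat_type) :: A
type(amg_zprec_type)  :: P
type(psb_z_vect_type) :: b, x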
use psb_base_mod
use amg_prec_mod
use psb_krylov_mod
... ...
!
! sparse matrix
type(psb_dspmat_type) :: A
! sparse matrix descriptor
type(psb_desc_type)   :: desc_A
! preconditioner
type(amg_dprec_type)  :: P
! right-hand side and solution vectors
type(psb_d_vect_type) :: b, x
... ...
!
! initialize the parallel environment
call psb_init(ctxt)
call psb_info(ctxt,iam,np)
... ...
!
! read and assemble the spd matrix A and the right-hand side b
! using PSBLAS routines for sparse matrix / vector management
... ...
!
! initialize the default multilevel preconditioner, i.e. V-cycle
! with basic smoothed aggregation, 1 hybrid forward/backward
! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
! solver
call P%init(ctxt,'ML',info)
!
! build the preconditioner
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
!
! set the solver parameters and the initial guess
... ...
!
! solve Ax=b with preconditioned FCG
call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)
... ...
!
! deallocate the preconditioner
call P%free(info)
!
! deallocate other data structures
... ...
!
! exit the parallel environment
call psb_exit(ctxt)
stop

Figure 1: setup and application of the default multilevel preconditioner.
Different versions of the multilevel preconditioner can be obtained by changing the default values of the preconditioner parameters. The code reported in Figure 2 shows how to set a V-cycle preconditioner that applies 1 block-Jacobi sweep as pre- and post-smoother, and solves the coarsest-level system with 8 block-Jacobi sweeps. Note that the ILU(0) factorization (plus triangular solve) is used as the local solver for the block-Jacobi sweeps, since this is the default associated with block-Jacobi and set by P%init. Furthermore, specifying block-Jacobi as coarsest-level solver implies that the coarsest-level matrix is distributed among the processes. Figure 3 shows how to set a W-cycle preconditioner that uses the coarsening based on compatible weighted matching, with aggregates of size at most 8 and smoothed prolongators. It applies 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother, and solves the coarsest-level system with the parallel flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi preconditioner having ILU(0) on the blocks. Default parameters are used for the stopping criterion of the coarsest-level solver. Note that, also in this case, specifying KRM as coarsest-level solver implies that the coarsest-level matrix is distributed among the processes.
The code fragments shown in Figures 2 and 3 are also included in the example program file amg_dexample_ml.f90.
Finally, Figure 4 shows the setup of a one-level additive Schwarz preconditioner, i.e., RAS with overlap 2. Note that a Krylov method different from CG must be used to solve the preconditioned system, since the preconditioner is nonsymmetric. The corresponding example program is available in the file amg_dexample_1lev.f90.
For all the previous preconditioners, example programs where the sparse matrix and the right-hand side are generated by discretizing a PDE with Dirichlet boundary conditions are also available in the directory samples/simple/pdegen.
... ...
!
! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
! ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi
! sweeps (with ILU(0) on the blocks) as coarsest-level solver
call P%init(ctxt,'ML',info)
call P%set('SMOOTHER_TYPE','BJAC',info)
call P%set('COARSE_SOLVE','BJAC',info)
call P%set('COARSE_SWEEPS',8,info)
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
... ...

Figure 2: setup of a V-cycle preconditioner with block-Jacobi pre/post-smoother and coarsest-level solver.
... ...
!
! build a W-cycle preconditioner with 2 hybrid Gauss-Seidel sweeps
! as pre- and post-smoother, a distributed coarsest matrix, and
! preconditioned FCG (KRM) as coarsest-level solver
call P%init(ctxt,'ML',info)
call P%set('PAR_AGGR_ALG','COUPLED',info)
call P%set('AGGR_TYPE','MATCHBOXP',info)
call P%set('AGGR_SIZE',8,info)
call P%set('ML_CYCLE','WCYCLE',info)
call P%set('SMOOTHER_TYPE','FBGS',info)
call P%set('SMOOTHER_SWEEPS',2,info)
call P%set('COARSE_SOLVE','KRM',info)
call P%set('COARSE_MAT','DIST',info)
call P%set('KRM_METHOD','FCG',info)
call P%hierarchy_build(A,desc_A,info)
call P%smoothers_build(A,desc_A,info)
... ...

Figure 3: setup of a W-cycle preconditioner based on coarsening by compatible weighted matching.
... ...
!
! set RAS with overlap 2 and ILU(0) on the local blocks
call P%init(ctxt,'AS',info)
call P%set('SUB_OVR',2,info)
call P%bld(A,desc_A,info)
... ...
!
! solve Ax=b with preconditioned BiCGSTAB
call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info)

Figure 4: setup of a one-level additive Schwarz (RAS) preconditioner.
The code discussed here shows how to set up a program exploiting the combined GPU capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the source distribution directory amg4psblas/examples/gpu.
First of all, we need to include the appropriate modules and declare some auxiliary variables:
program amg_dexample_gpu
  use psb_base_mod
  use amg_prec_mod
  use psb_krylov_mod
  use psb_util_mod
  use psb_gpu_mod
  use data_input
  use amg_d_pde_mod
  implicit none
  .......
  ! GPU variables
  type(psb_d_hlg_sparse_mat) :: agmold
  type(psb_d_vect_gpu)       :: vgmold
  type(psb_i_vect_gpu)       :: igmold
In this particular example we choose to employ an HLG data structure for sparse matrices on GPUs; for more information, please refer to the PSBLAS-EXT users' guide.
We then have to initialize the GPU environment, and pass the appropriate MOLD variables to the build methods (see also the PSBLAS and PSBLAS-EXT users’ guides).
call psb_init(ctxt)
call psb_info(ctxt,iam,np)
!
! BEWARE: if you have NGPUS per node, the default is to
! attach to mod(IAM,NGPUS)
!
call psb_gpu_init(ctxt)
......
t1 = psb_wtime()
call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
Finally, we convert the input matrix, the descriptor and the vectors to a GPU-enabled internal storage format. We then preallocate the preconditioner workspace before entering the Krylov method. At the end of the code, we close the GPU environment.
call desc_a%cnv(mold=igmold)
call a%cscnv(info,mold=agmold)
call psb_geasb(x,desc_a,info,mold=vgmold)
call psb_geasb(b,desc_a,info,mold=vgmold)
!
! iterative method parameters
!
call psb_barrier(ctxt)
call prec%allocate_wrk(info)
t1 = psb_wtime()
call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
     & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,&
     & istop=s_choice%istopc,irst=s_choice%irst)
call prec%deallocate_wrk(info)
call psb_barrier(ctxt)
tslv = psb_wtime() - t1
......
call psb_gpu_exit()
call psb_exit(ctxt)
stop
It is very important to employ smoothers and coarsest-level solvers that are suited to the GPU, i.e., methods that do NOT employ triangular system solve kernels. Methods that satisfy this constraint include:
- JACOBI
- BJAC with one of the following methods on the local blocks:
  - INVK
  - INVT
  - AINV
as well as their ℓ1 variants. A minimal configuration sketch is given after the list.
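As a minimal sketch, assuming SUB_SOLVE is the parameter selecting the local solver on the blocks (see the parameter tables in Section 5), a GPU-friendly smoother could be set up as follows:

! block-Jacobi smoother with approximate inverse (AINV) on the local
! blocks, so that no triangular solve kernels are invoked on the GPU
call P%set('SMOOTHER_TYPE','BJAC',info)
call P%set('SUB_SOLVE','AINV',info)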