4.2 GPU example

The code discussed here shows how to set up a program exploiting the combined GPU capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the source distribution directory amg4psblas/examples/gpu.

First of all, we need to use the appropriate modules and declare some auxiliary variables:


program amg_dexample_gpu  
  use psb_base_mod  
  use amg_prec_mod  
  use psb_krylov_mod  
  use psb_util_mod  
  use psb_gpu_mod  
  use data_input  
  use amg_d_pde_mod  
  implicit none  
  .......  
  ! GPU variables  
  type(psb_d_hlg_sparse_mat) :: agmold  
  type(psb_d_vect_gpu)       :: vgmold  
  type(psb_i_vect_gpu)       :: igmold  
 
 


Listing 5: setup of a GPU-enabled test program part one.

In this example we employ the HLG data structure for sparse matrices on GPUs; for more information please refer to the PSBLAS-EXT users’ guide.
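The variables declared in Listing 5 are used only as type templates: what matters is their dynamic type, not their contents. The following is a minimal, self-contained sketch of that general Fortran idiom (`allocate(..., mold=...)`), not the PSBLAS implementation. All names in it (`toy_storage_mod`, `base_mat`, `host_mat`, `device_mat`, `convert`) are hypothetical stand-ins, and the toy types carry no data, so only the type selection is shown; a real conversion would also copy the matrix entries.

```fortran
module toy_storage_mod
  implicit none

  ! Hypothetical stand-ins for a family of storage formats.
  type, abstract :: base_mat
  contains
    procedure(name_if), deferred :: name
  end type base_mat

  abstract interface
    function name_if(m) result(res)
      import :: base_mat
      class(base_mat), intent(in) :: m
      character(len=6) :: res
    end function name_if
  end interface

  type, extends(base_mat) :: host_mat
  contains
    procedure :: name => host_name
  end type host_mat

  type, extends(base_mat) :: device_mat
  contains
    procedure :: name => device_name
  end type device_mat

contains

  function host_name(m) result(res)
    class(host_mat), intent(in) :: m
    character(len=6) :: res
    res = 'host'
  end function host_name

  function device_name(m) result(res)
    class(device_mat), intent(in) :: m
    character(len=6) :: res
    res = 'device'
  end function device_name

  ! Replace the object held in "a" with a fresh one whose dynamic type
  ! is copied from "mold"; the contents of "mold" are never read.
  subroutine convert(a, mold)
    class(base_mat), allocatable, intent(inout) :: a
    class(base_mat), intent(in) :: mold
    if (allocated(a)) deallocate(a)
    allocate(a, mold=mold)
  end subroutine convert

end module toy_storage_mod

program mold_demo
  use toy_storage_mod
  implicit none
  class(base_mat), allocatable :: a
  type(device_mat) :: dmold   ! plays the role that agmold plays in Listing 5

  allocate(host_mat :: a)
  print '(a)', trim(a%name())
  call convert(a, dmold)
  print '(a)', trim(a%name())
end program mold_demo
```

Running the sketch prints `host` and then `device`: the caller selects the target format by passing a variable of that type, without the conversion routine having to know the list of available formats.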

We then have to initialize the GPU environment, and pass the appropriate MOLD variables to the build methods (see also the PSBLAS and PSBLAS-EXT users’ guides).


  call psb_init(ctxt)  
  call psb_info(ctxt,iam,np)  
  !  
  ! BEWARE: if you have NGPUS  per node, the default is to  
  ! attach to mod(IAM,NGPUS)  
  !  
  call psb_gpu_init(ctxt)  
  ......  
  t1 = psb_wtime()  
  call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)  
 
 


Listing 6: setup of a GPU-enabled test program part two.
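The comment in Listing 6 states that, by default, each process attaches to device mod(IAM,NGPUS). The short sketch below only evaluates that arithmetic rule to make its consequences visible; it does not call PSBLAS. The process count and the number of GPUs per node are hypothetical values chosen for illustration, as are the names `device_map_mod` and `default_device`.

```fortran
module device_map_mod
  implicit none
contains
  ! Device index for process "iam" under the default rule quoted in the
  ! listing, mod(IAM,NGPUS). Both ranks and device indices start at 0.
  pure integer function default_device(iam, ngpus) result(dev)
    integer, intent(in) :: iam, ngpus
    dev = mod(iam, ngpus)
  end function default_device
end module device_map_mod

program device_map_demo
  use device_map_mod
  implicit none
  ! Hypothetical configuration: 8 processes, 4 GPUs per node.
  integer, parameter :: np = 8, ngpus = 4
  integer :: iam

  do iam = 0, np - 1
    print '(a,i0,a,i0)', 'process ', iam, ' -> device ', &
         & default_device(iam, ngpus)
  end do
end program device_map_demo
```

With these values, processes 0 and 4 both map to device 0, processes 1 and 5 to device 1, and so on: when more processes than GPUs are started on a node, several processes share a device under the default rule.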

Finally, we convert the input matrix, the descriptor, and the vectors to a GPU-enabled internal storage format, and preallocate the preconditioner workspace before entering the Krylov method. At the end of the code, we close the GPU environment.


  call desc_a%cnv(mold=igmold)  
  call a%cscnv(info,mold=agmold)  
  call psb_geasb(x,desc_a,info,mold=vgmold)  
  call psb_geasb(b,desc_a,info,mold=vgmold)  
 
  !  
  ! iterative method parameters  
  !  
  call psb_barrier(ctxt)  
  call prec%allocate_wrk(info)  
  t1 = psb_wtime()  
  call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&  
       & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,&  
       & istop=s_choice%istopc,irst=s_choice%irst)  
  call prec%deallocate_wrk(info)  
  call psb_barrier(ctxt)  
  tslv = psb_wtime() - t1  
 
  ......  
  call psb_gpu_exit()  
  call psb_exit(ctxt)  
  stop  
 
 


Listing 7: setup of a GPU-enabled test program part three.

It is very important to employ smoothers and coarsest solvers that are suited to the GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that satisfy this constraint include:

and their ℓ1 variants.