The code discussed here shows how to set up a program exploiting the combined GPU capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the source distribution directory amg4psblas/examples/gpu.
First of all, we need to include the appropriate modules and declare some auxiliary variables:
program amg_dexample_gpu use psb_base_mod use amg_prec_mod use psb_krylov_mod use psb_util_mod use psb_gpu_mod use data_input use amg_d_pde_mod implicit none ....... ! GPU variables type(psb_d_hlg_sparse_mat) :: agmold type(psb_d_vect_gpu) :: vgmold type(psb_i_vect_gpu) :: igmold
In this particular example we are choosing to employ a HLG data structure for sparse matrices on GPUs; for more information please refer to the PSBLAS-EXT users’ guide.
We then have to initialize the GPU environment, and pass the appropriate MOLD variables to the build methods (see also the PSBLAS and PSBLAS-EXT users’ guides).
call psb_init(ctxt) call psb_info(ctxt,iam,np) ! ! BEWARE: if you have NGPUS per node, the default is to ! attach to mod(IAM,NGPUS) ! call psb_gpu_init(ictxt) ...... t1 = psb_wtime() call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
Finally, we convert the input matrix, the descriptor and the vectors to use a GPU-enabled internal storage format. We then preallocate the preconditioner workspace before entering the Krylov method. At the end of the code, we close the GPU environment
call desc_a%cnv(mold=igmold) call a%cscnv(info,mold=agmold) call psb_geasb(x,desc_a,info,mold=vgmold) call psb_geasb(b,desc_a,info,mold=vgmold) ! ! iterative method parameters ! call psb_barrier(ctxt) call prec%allocate_wrk(info) t1 = psb_wtime() call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,& & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,& & istop=s_choice%istopc,irst=s_choice%irst) call prec%deallocate_wrk(info) call psb_barrier(ctxt) tslv = psb_wtime() - t1 ...... call psb_gpu_exit() call psb_exit(ctxt) stop
It is very important to employ solvers that are suited to the GPU, i.e. solvers that do NOT employ triangular system solve kernels. Solvers that satisfy this constraint include:
and their ℓ1 variants.