The code discussed here shows how to set up a program exploiting the combined GPU capabilities of PSBLAS and AMG4PSBLAS. The code example is availabile in the source distribution directory amg4psblas/tests/gpu.
First of all, we need to include the appropriate modules and declare some auxiliary variables:
program amg_d_pde3d use psb_base_mod use amg_prec_mod use psb_krylov_mod use psb_util_mod use psb_gpu_mod use data_input use amg_d_pde3d_base_mod use amg_d_pde3d_exp_mod use amg_d_pde3d_gauss_mod use amg_d_genpde_mod implicit none ....... ! GPU variables type(psb_d_hlg_sparse_mat) :: agmold type(psb_d_vect_gpu) :: vgmold type(psb_i_vect_gpu) :: igmold
We then have to initialize the GPU environment, and pass the appropriate MOLD variables to the build methods
call psb_init(ctxt) call psb_info(ctxt,iam,np) ! ! BEWARE: if you have NGPUS per node, the default is to ! attach to mod(IAM,NGPUS) ! call psb_gpu_init(ictxt) ...... t1 = psb_wtime() call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
Finally, we convert the input matrix, the descriptor and the vectors, then preallocate the preconditioner workspace before entering the Krylov method. At the end of the code, we close the GPU environment
call desc_a%cnv(mold=igmold) call a%cscnv(info,mold=agmold) call psb_geasb(x,desc_a,info,mold=vgmold) call psb_geasb(b,desc_a,info,mold=vgmold) ! ! iterative method parameters ! call psb_barrier(ctxt) call prec%allocate_wrk(info) t1 = psb_wtime() call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,& & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,& & istop=s_choice%istopc,irst=s_choice%irst) call prec%deallocate_wrk(info) call psb_barrier(ctxt) tslv = psb_wtime() - t1 ...... call psb_gpu_exit() call psb_exit(ctxt) stop
It is very important to employ solvers that are suited to the GPU, i.e. solvers that do NOT employ triangular system solve kernels. Solvers that satisfy this constraint include:
and their L1 variants.