<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html > <head><title>General Overview</title> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="generator" content="TeX4ht (https://tug.org/tex4ht/)"> <meta name="originator" content="TeX4ht (https://tug.org/tex4ht/)"> <!-- html,3 --> <meta name="src" content="userhtml.tex"> <link rel="stylesheet" type="text/css" href="userhtml.css"> </head><body > <!--l. 1--><div class="crosslinks"><p class="noindent"><span class="cmr-12">[</span><a href="userhtmlse2.html" ><span class="cmr-12">next</span></a><span class="cmr-12">] [</span><a href="userhtmlli2.html" ><span class="cmr-12">prev</span></a><span class="cmr-12">] [</span><a href="userhtmlli2.html#tailuserhtmlli2.html" ><span class="cmr-12">prev-tail</span></a><span class="cmr-12">] [</span><a href="#tailuserhtmlse1.html"><span class="cmr-12">tail</span></a><span class="cmr-12">] [</span><a href="userhtml.html#userhtmlse1.html" ><span class="cmr-12">up</span></a><span class="cmr-12">] </span></p></div> <h3 class="sectionHead"><span class="titlemark"><span class="cmr-12">1 </span></span> <a id="x4-30001"></a><span class="cmr-12">General Overview</span></h3> <!--l. 
5--><p class="noindent" ><span class="cmr-12">The </span><span class="cmcsc-10x-x-120">A<span class="small-caps">lgebraic</span> M<span class="small-caps">ulti</span>G<span class="small-caps">rid</span> P<span class="small-caps">reconditioners</span> P<span class="small-caps">ackage</span> <span class="small-caps">based</span> <span class="small-caps">on</span> PSBLAS</span> <span class="cmr-12">(</span><span class="cmcsc-10x-x-120">AMG4PSBLAS</span><span class="cmr-12">) provides parallel Algebraic MultiGrid (AMG) preconditioners (see,</span> <span class="cmr-12">e.g., </span><span class="cite"><span class="cmr-12">[</span><a href="userhtmlli5.html#XBriggs2000"><span class="cmr-12">5</span></a><span class="cmr-12">,</span><span class="cmr-12"> </span><a href="userhtmlli5.html#XStuben_01"><span class="cmr-12">31</span></a><span class="cmr-12">]</span></span><span 
class="cmr-12">), to be used in the iterative solution of linear systems,</span> <table class="equation"><tr><td> <center class="math-display" > <img src="userhtml0x.png" alt="Ax = b, " class="math-display" ><a id="x4-3001r1"></a></center></td><td class="equation-label"><span class="cmr-12">(1)</span></td></tr></table> <!--l. 11--><p class="nopar" > <span class="cmr-12">where </span><span class="cmmi-12">A </span><span class="cmr-12">is a square, real or complex, sparse symmetric positive definite (s.p.d.)</span> <span class="cmr-12">matrix.</span> <!--l. 19--><p class="indent" > <span class="cmr-12">The preconditioners implemented in AMG4PSBLAS are obtained by combining three</span> <span class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. The available</span> <span class="cmr-12">multigrid cycles are the V-cycle, the W-cycle, and a version of the Krylov-type cycle</span> <span class="cmr-12">(K-cycle)</span><span class="cmr-12"> </span><span class="cite"><span class="cmr-12">[</span><a href="userhtmlli5.html#XBriggs2000"><span class="cmr-12">5</span></a><span class="cmr-12">,</span><span class="cmr-12"> </span><a href="userhtmlli5.html#XNotay2008"><span class="cmr-12">27</span></a><span class="cmr-12">]</span></span><span class="cmr-12">; they can be combined with Jacobi, hybrid forward/backward</span> <span class="cmr-12">Gauss-Seidel, block-Jacobi and additive Schwarz smoothers, using various local</span> <span class="cmr-12">incomplete factorizations and approximate inverses on the blocks. The</span> <span class="cmr-12">Jacobi, block-Jacobi and Gauss-Seidel smoothers are also available in their </span><span class="cmmi-12">ℓ</span><sub><span class="cmr-8">1</span></sub> <span class="cmr-12">versions</span><span class="cmr-12"> </span><span class="cite"><span class="cmr-12">[</span><a href="userhtmlli5.html#XDDF2020"><span class="cmr-12">14</span></a><span class="cmr-12">]</span></span><span class="cmr-12">.</span> <!--l. 
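-->
<p class="indent" ><span class="cmr-12">As a toy illustration of how a cycle, a smoother and a coarsest-level solver fit together, the following Python sketch (not AMG4PSBLAS code, which is written in Fortran) applies a V-cycle to the 1D Poisson matrix tridiag(-1, 2, -1). It assumes dense list-of-lists matrices; prolongators <span class="cmmi-12">P</span> built by the simplest pairwise unsmoothed aggregation; Galerkin coarse matrices <span class="cmmi-12">P</span><sup>T</sup><span class="cmmi-12">AP</span>; one sweep of pointwise <span class="cmmi-12">ℓ</span><sub><span class="cmr-8">1</span></sub>-Jacobi before and after each coarse correction, using the common definition d<sub>i</sub> = sum<sub>j</sub> |a<sub>ij</sub>|, which may differ from the variant used in the package; and an exact solve on the coarsest level. All function names are hypothetical.</span>

```python
# Toy AMG V-cycle (pure Python, dense matrices); NOT AMG4PSBLAS code.
# Assumptions: pairwise unsmoothed aggregation, Galerkin coarse matrices,
# pointwise l1-Jacobi smoother with d_i = sum_j |a_ij|, exact coarsest solve.

def matvec(A, x):
    """Dense matrix-vector product y = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Dense matrix-matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def poisson1d(n):
    """Model s.p.d. matrix tridiag(-1, 2, -1) of order n."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

def pairwise_prolongator(n):
    """Unsmoothed aggregation: unknowns 2k and 2k+1 form aggregate k."""
    nc = (n + 1) // 2
    return [[1.0 if i // 2 == k else 0.0 for k in range(nc)] for i in range(n)]

def l1_jacobi(A, x, b):
    """One pointwise l1-Jacobi sweep, assuming d_i = sum_j |a_ij|."""
    d = [sum(abs(a) for a in row) for row in A]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    return [xi + ri / di for xi, ri, di in zip(x, r, d)]

def solve_dense(A, b):
    """Exact coarsest-level solve: Gaussian elimination without pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def build_hierarchy(A, min_size=4):
    """List of (A_l, P_l) pairs; the coarsest level stores P = None."""
    levels = []
    while len(A) > min_size:
        P = pairwise_prolongator(len(A))
        levels.append((A, P))
        A = matmul(transpose(P), matmul(A, P))  # Galerkin product P^T A P
    levels.append((A, None))
    return levels

def v_cycle(levels, b, lev=0):
    """Approximate A_lev^{-1} b with one V-cycle, starting from x = 0."""
    A, P = levels[lev]
    if P is None:
        return solve_dense(A, b)
    x = l1_jacobi(A, [0.0] * len(b), b)                     # pre-smoothing
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    ec = v_cycle(levels, matvec(transpose(P), r), lev + 1)  # restrict, recurse
    x = [xi + ei for xi, ei in zip(x, matvec(P, ec))]       # prolongate, correct
    return l1_jacobi(A, x, b)                               # post-smoothing

def energy_history(n=16, iters=20):
    """Squared A-norm of the error of x := x + V(b - A x), exact solution = ones."""
    A = poisson1d(n)
    x_true = [1.0] * n
    b = matvec(A, x_true)
    levels = build_hierarchy(A)

    def energy(x):
        e = [xi - ti for xi, ti in zip(x, x_true)]
        return sum(ei * yi for ei, yi in zip(e, matvec(A, e)))

    x = [0.0] * n
    hist = [energy(x)]
    for _ in range(iters):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        x = [xi + ci for xi, ci in zip(x, v_cycle(levels, r))]
        hist.append(energy(x))
    return hist
```

<p class="indent" ><span class="cmr-12">Here the V-cycle preconditions a simple stationary iteration rather than a Krylov method. Because <span class="cmmi-12">A</span> is s.p.d., each <span class="cmmi-12">ℓ</span><sub><span class="cmr-8">1</span></sub>-Jacobi sweep contracts the error in the <span class="cmmi-12">A</span>-norm and each coarse correction does not increase it, so the values returned by <span class="cmtt-12">energy_history()</span> decrease at every iteration.</span>
<!--l. 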
30--><p class="indent" > <span class="cmr-12">An algebraic approach is used to generate a hierarchy of coarse-level matrices and</span> <span class="cmr-12">operators, without explicitly using any information on the geometry of the original</span> <span class="cmr-12">problem, e.g., the discretization of a PDE. To this end, two different coarsening</span> <span class="cmr-12">strategies, based on aggregation, are available:</span> <ul class="itemize1"> <li class="itemize"><span class="cmr-12">a decoupled version of the smoothed aggregation procedure proposed in</span><span class="cmr-12"> </span><span class="cite"><span class="cmr-12">[</span><a href="userhtmlli5.html#XBREZINA_VANEK"><span class="cmr-12">4</span></a><span class="cmr-12">,</span> <span class="cmr-12"> </span><a href="userhtmlli5.html#XVANEK_MANDEL_BREZINA"><span class="cmr-12">33</span></a><span class="cmr-12">]</span></span><span class="cmr-12">, and already included in the previous versions of the package</span><span class="cmr-12"> </span><span class="cite"><span class="cmr-12">[</span><a href="userhtmlli5.html#Xaaecc_07"><span class="cmr-12">7</span></a><span class="cmr-12">,</span><span class="cmr-12"> </span><a href="userhtmlli5.html#XMLD2P4_TOMS"><span class="cmr-12">11</span></a><span class="cmr-12">]</span></span><span class="cmr-12">;</span> </li> <li class="itemize"><span class="cmr-12">a coupled, parallel implementation of the Coarsening based on Compatible</span> <span class="cmr-12">Weighted Matching introduced in</span><span class="cmr-12"> </span><span class="cite"><span class="cmr-12">[</span><a href="userhtmlli5.html#XDV2013"><span class="cmr-12">12</span></a><span class="cmr-12">,</span><span class="cmr-12"> </span><a href="userhtmlli5.html#XDFV2018"><span class="cmr-12">13</span></a><span class="cmr-12">]</span></span> <span class="cmr-12">and described in detail in</span><span class="cmr-12"> </span><span class="cite"><span class="cmr-12">[</span><a 
href="userhtmlli5.html#XDDF2020"><span class="cmr-12">14</span></a><span class="cmr-12">]</span></span><span class="cmr-12">.</span></li></ul> <!--l. 43--><p class="noindent" ><span class="cmr-12">Either exact or approximate solvers can be used on the coarsest-level system. We provide</span> <span class="cmr-12">interfaces to various parallel and sequential sparse LU factorizations from external</span> <span class="cmr-12">packages, sequential native incomplete LU and approximate inverse factorizations,</span> <span class="cmr-12">parallel weighted Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers, and calls to</span> <span class="cmr-12">preconditioned Krylov methods. All smoothers can also be used as one-level</span> <span class="cmr-12">preconditioners.</span> <!--l. 50--><p class="indent" > <span class="cmr-12">AMG4PSBLAS is written in Fortran</span><span class="cmr-12"> 2003, following an object-oriented design</span> <span class="cmr-12">that exploits features such as abstract data type creation, type</span> <span class="cmr-12">extension, function overloading, and dynamic memory management. The parallel</span> <span class="cmr-12">implementation is based on the Single Program Multiple Data (SPMD) paradigm.</span> <span class="cmr-12">Single and double precision implementations of AMG4PSBLAS are available</span> <span class="cmr-12">for both real and complex data, and all of them can be used through a single</span> <span class="cmr-12">interface.</span> <!--l. 
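-->
<p class="indent" ><span class="cmr-12">The smoothed aggregation coarsening listed above forms a tentative prolongator from the aggregates, smooths it with one damped Jacobi step, <span class="cmmi-12">P</span> = (<span class="cmmi-12">I</span> - &omega;<span class="cmmi-12">D</span><sup>-1</sup><span class="cmmi-12">A</span>)<span class="cmmi-12">P</span><sub>tent</sub>, and takes the Galerkin product <span class="cmmi-12">P</span><sup>T</sup><span class="cmmi-12">AP</span> as the coarse matrix. The Python sketch below is only a model of this idea: it assumes a hypothetical contiguous block-row distribution of the rows among "processes" and, for brevity, forms aggregates as pairs of consecutive rows inside each block, so that no aggregate crosses a process boundary (the decoupled property). Practical aggregation algorithms select aggregates through strength-of-connection tests rather than fixed pairs. The damping &omega; = 2/3 is the classical choice 4/(3&rho;(<span class="cmmi-12">D</span><sup>-1</sup><span class="cmmi-12">A</span>)) combined with the bound &rho;(<span class="cmmi-12">D</span><sup>-1</sup><span class="cmmi-12">A</span>) &le; 2, which holds for the model matrix used here. Function names are hypothetical.</span>

```python
# Decoupled smoothed-aggregation sketch (pure Python, dense); NOT AMG4PSBLAS code.

def poisson1d(n):
    """Model s.p.d. matrix tridiag(-1, 2, -1) of order n."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Dense matrix-matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def decoupled_aggregates(n, nprocs):
    """Pair consecutive rows, never across a (hypothetical) process boundary.

    Process p owns the contiguous rows [p*n//nprocs, (p+1)*n//nprocs).
    Returns agg (the aggregate index of each row) and the number of aggregates.
    """
    bounds = [p * n // nprocs for p in range(nprocs + 1)]
    agg, nagg = [0] * n, 0
    for p in range(nprocs):
        lo, hi = bounds[p], bounds[p + 1]
        for i in range(lo, hi, 2):
            agg[i] = nagg
            if hi > i + 1:           # otherwise row i stays a singleton
                agg[i + 1] = nagg
            nagg += 1
    return agg, nagg

def smoothed_prolongator(A, agg, nagg, omega):
    """P = (I - omega * D^{-1} A) P_tent, where P_tent[i][agg[i]] = 1."""
    n = len(A)
    p_tent = [[1.0 if agg[i] == k else 0.0 for k in range(nagg)] for i in range(n)]
    ap = matmul(A, p_tent)
    return [[p_tent[i][k] - omega * ap[i][k] / A[i][i] for k in range(nagg)]
            for i in range(n)]

def galerkin(A, P):
    """Coarse-level matrix A_c = P^T A P."""
    return matmul(transpose(P), matmul(A, P))
```

<p class="indent" ><span class="cmr-12">For example, <span class="cmtt-12">decoupled_aggregates(10, 2)</span> returns <span class="cmtt-12">[0, 0, 1, 1, 2, 3, 3, 4, 4, 5]</span>: rows 4 and 5 are neighbours in the matrix but belong to different processes, so they end up in different aggregates, whereas a coupled scheme would be free to join them.</span>
<!--l. 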
60--><p class="indent" > <span class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use</span> <span class="cmr-12">multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)</span> <span class="cmr-12">computational framework</span><span class="cmr-12"> </span><span class="cite"><span class="cmr-12">[</span><a href="userhtmlli5.html#Xpsblas_00"><span class="cmr-12">22</span></a><span class="cmr-12">,</span><span class="cmr-12"> </span><a href="userhtmlli5.html#XPSBLAS3"><span class="cmr-12">21</span></a><span class="cmr-12">]</span></span><span class="cmr-12">. PSBLAS provides basic linear algebra operators</span> <span class="cmr-12">and data management facilities for distributed sparse matrices, kernels for</span> <span class="cmr-12">sequential incomplete factorizations needed by the parallel block-Jacobi and</span> <span class="cmr-12">additive Schwarz smoothers, and parallel Krylov solvers which can be used with</span> <span class="cmr-12">the AMG4PSBLAS preconditioners. The choice of PSBLAS has been mainly</span> <span class="cmr-12">motivated by the need for a portable and efficient software infrastructure</span> <span class="cmr-12">implementing “de facto” standard parallel sparse linear algebra kernels, in order to</span> <span class="cmr-12">pursue goals such as performance, portability, modularity and extensibility</span> <span class="cmr-12">in the development of the preconditioner package. Conversely, the</span> <span class="cmr-12">implementation of AMG4PSBLAS, which was driven by the need to face the exascale</span> <span class="cmr-12">challenge, has led to some important revisions and extensions of the PSBLAS</span> <span class="cmr-12">infrastructure. 
The inter-process communication required by AMG4PSBLAS</span> <span class="cmr-12">is encapsulated in the PSBLAS routines; therefore, AMG4PSBLAS can be</span> <span class="cmr-12">run on any parallel machine where PSBLAS implementations are available.</span> <span class="cmr-12">The most recent version of PSBLAS (release 3.7) includes a GPU plug-in</span> <span class="cmr-12">providing CUDA versions of the main vector operations and of the sparse</span> <span class="cmr-12">matrix-vector product, so that Krylov methods coupled with AMG4PSBLAS</span> <span class="cmr-12">preconditioners relying on Jacobi and block-Jacobi smoothers with sparse</span> <span class="cmr-12">approximate inverses on the blocks can be efficiently executed on clusters of</span> <span class="cmr-12">GPUs.</span> <!--l. 85--><p class="indent" > <span class="cmr-12">AMG4PSBLAS has a layered and modular software architecture consisting of three main</span> <span class="cmr-12">layers. The lower layer consists of the PSBLAS kernels, the middle</span> <span class="cmr-12">one implements the construction and application phases of the preconditioners, and the</span> <span class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span> <span class="cmr-12">supports different levels of use of the package: a few black-box routines at the upper</span> <span class="cmr-12">layer allow all users to easily build and apply any preconditioner available in</span> <span class="cmr-12">AMG4PSBLAS, while additional facilities allow expert users to extend the set of</span> <span class="cmr-12">smoothers and solvers for building new versions of the preconditioners (see</span> <span class="cmr-12">Section</span><span class="cmr-12"> </span><a href="userhtmlse6.html#x27-310006"><span class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span class="cmr-12">).</span> <!--l. 
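-->
<p class="indent" ><span class="cmr-12">The three-layer split can be pictured, very loosely, with the Python sketch below: a stand-in lower layer supplies basic kernels, a middle layer defines an abstract smoother contract with concrete implementations, and an upper layer exposes a single set/build/apply interface. The class names, the parameter keys <span class="cmtt-12">"smoother"</span> and <span class="cmtt-12">"sweeps"</span>, and the <span class="cmtt-12">register_smoother</span> hook are all hypothetical; they model the design idea only and are not the Fortran interfaces of the package.</span>

```python
from abc import ABC, abstractmethod

# Lower layer (hypothetical stand-in for the PSBLAS kernels): dense operations.
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def residual(A, x, b):
    return [bi - yi for bi, yi in zip(b, matvec(A, x))]

# Middle layer: the smoother contract plus a built-in implementation.
class Smoother(ABC):
    @abstractmethod
    def apply(self, A, x, b):
        """Return an improved approximation to the solution of A x = b."""

class Jacobi(Smoother):
    def __init__(self, omega=1.0):
        self.omega = omega

    def apply(self, A, x, b):
        r = residual(A, x, b)
        return [xi + self.omega * ri / A[i][i]
                for i, (xi, ri) in enumerate(zip(x, r))]

_SMOOTHERS = {"jacobi": Jacobi}

def register_smoother(name, cls):
    """Extension hook: make a user-defined Smoother selectable by name."""
    _SMOOTHERS[name] = cls

# Upper layer: one uniform set/build/apply interface for every configuration.
class Preconditioner:
    def __init__(self):
        self.params = {"smoother": "jacobi", "sweeps": 1}

    def set(self, key, value):
        if key not in self.params:
            raise KeyError(f"unknown parameter {key!r}")
        self.params[key] = value

    def build(self, A):
        self.A = A
        self.smoother = _SMOOTHERS[self.params["smoother"]]()

    def apply(self, r):
        """Approximate A^{-1} r with the configured number of sweeps from x = 0."""
        x = [0.0] * len(r)
        for _ in range(self.params["sweeps"]):
            x = self.smoother.apply(self.A, x, r)
        return x

# An "expert user" extension: forward Gauss-Seidel, added without
# modifying the upper layer.
class ForwardGaussSeidel(Smoother):
    def apply(self, A, x, b):
        x = list(x)
        for i, row in enumerate(A):
            s = sum(row[j] * x[j] for j in range(len(x)) if j != i)
            x[i] = (b[i] - s) / row[i]
        return x

register_smoother("fgs", ForwardGaussSeidel)
```

<p class="indent" ><span class="cmr-12">The point of the split is that the upper-layer class never changes when a smoother is added: the new class only has to honour the <span class="cmtt-12">apply</span> contract and be registered under a name, after which <span class="cmtt-12">set("smoother", "fgs")</span> selects it.</span>
<!--l. 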
96--><p class="indent" > <span class="cmr-12">This guide is organized as follows. General information on the distribution of the</span> <span class="cmr-12">source code is reported in Section</span><span class="cmr-12"> </span><a href="userhtmlse2.html#x5-40002"><span class="cmr-12">2</span><!--tex4ht:ref: sec:distribution --></a><span class="cmr-12">, while details on the configuration and installation</span> <span class="cmr-12">of the package are given in Section</span><span class="cmr-12"> </span><a href="userhtmlse3.html#x8-70003"><span class="cmr-12">3</span><!--tex4ht:ref: sec:building --></a><span class="cmr-12">. The basics for building and applying the</span> <span class="cmr-12">preconditioners with the Krylov solvers implemented in PSBLAS are reported</span> <span class="cmr-12">in</span><span class="cmr-12"> Section</span><span class="cmr-12"> </span><a href="userhtmlse4.html#x14-130004"><span class="cmr-12">4</span><!--tex4ht:ref: sec:started --></a><span class="cmr-12">, where the Fortran source code of a few sample programs is also shown.</span> <span class="cmr-12">A reference guide for the user interface routines is provided in Section</span><span class="cmr-12"> </span><a href="userhtmlse5.html#x17-160005"><span class="cmr-12">5</span><!--tex4ht:ref: sec:userinterface --></a><span class="cmr-12">.</span> <span class="cmr-12">Information on extending the package with new</span> <span class="cmr-12">smoothers and solvers is reported in Section</span><span class="cmr-12"> </span><a href="userhtmlse6.html#x27-310006"><span class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span class="cmr-12">. The error handling mechanism</span> <span class="cmr-12">used by the package is briefly described in Section</span><span class="cmr-12"> </span><a href="userhtmlse7.html#x28-320007"><span class="cmr-12">7</span><!--tex4ht:ref: sec:errors --></a><span class="cmr-12">. 
The copyright terms</span> <span class="cmr-12">concerning the distribution and modification of AMG4PSBLAS are reported in</span> <span class="cmr-12">Appendix</span><span class="cmr-12"> </span><a href="userhtmlse8.html#x29-33000A"><span class="cmr-12">A</span><!--tex4ht:ref: sec:license --></a><span class="cmr-12">.</span> <!--l. 1--><p class="indent" > <a id="tailuserhtmlse1.html"></a> </body></html>