|
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
|
|
|
|
"http://www.w3.org/TR/html4/loose.dtd">
|
|
|
|
<html >
|
|
|
|
<head><title>General Overview</title>
|
|
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
|
|
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)">
|
|
|
|
<meta name="originator" content="TeX4ht (http://www.tug.org/tex4ht/)">
|
|
|
|
<!-- html,3 -->
|
|
|
|
<meta name="src" content="userhtml.tex">
|
|
|
|
<link rel="stylesheet" type="text/css" href="userhtml.css">
|
|
|
|
</head><body
|
|
|
|
>
|
|
|
|
<!--l. 1--><div class="crosslinks"><p class="noindent"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlse2.html" ><span
|
|
|
|
class="cmr-12">next</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="userhtmlli2.html" ><span
|
|
|
|
class="cmr-12">prev</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="userhtmlli2.html#tailuserhtmlli2.html" ><span
|
|
|
|
class="cmr-12">prev-tail</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="#tailuserhtmlse1.html"><span
|
|
|
|
class="cmr-12">tail</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="userhtml.html#userhtmlse1.html" ><span
|
|
|
|
class="cmr-12">up</span></a><span
|
|
|
|
class="cmr-12">] </span></p></div>
|
|
|
|
<h3 class="sectionHead"><span class="titlemark"><span
|
|
|
|
class="cmr-12">1 </span></span> <a
|
|
|
|
id="x4-30001"></a><span
|
|
|
|
class="cmr-12">General Overview</span></h3>
|
|
|
|
<!--l. 5--><p class="noindent" ><span
|
|
|
|
class="cmr-12">The </span><span
|
|
|
|
class="cmcsc-10x-x-120">A<span
|
|
|
|
class="small-caps">l</span><span
|
|
|
|
class="small-caps">g</span><span
|
|
|
|
class="small-caps">e</span><span
|
|
|
|
class="small-caps">b</span><span
|
|
|
|
class="small-caps">r</span><span
|
|
|
|
class="small-caps">a</span><span
|
|
|
|
class="small-caps">i</span><span
|
|
|
|
class="small-caps">c</span> M<span
|
|
|
|
class="small-caps">u</span><span
|
|
|
|
class="small-caps">l</span><span
|
|
|
|
class="small-caps">t</span><span
|
|
|
|
class="small-caps">i</span>G<span
|
|
|
|
class="small-caps">r</span><span
|
|
|
|
class="small-caps">i</span><span
|
|
|
|
class="small-caps">d</span> P<span
|
|
|
|
class="small-caps">r</span><span
|
|
|
|
class="small-caps">e</span><span
|
|
|
|
class="small-caps">c</span><span
|
|
|
|
class="small-caps">o</span><span
|
|
|
|
class="small-caps">n</span><span
|
|
|
|
class="small-caps">d</span><span
|
|
|
|
class="small-caps">i</span><span
|
|
|
|
class="small-caps">t</span><span
|
|
|
|
class="small-caps">i</span><span
|
|
|
|
class="small-caps">o</span><span
|
|
|
|
class="small-caps">n</span><span
|
|
|
|
class="small-caps">e</span><span
|
|
|
|
class="small-caps">r</span><span
|
|
|
|
class="small-caps">s</span> P<span
|
|
|
|
class="small-caps">a</span><span
|
|
|
|
class="small-caps">c</span><span
|
|
|
|
class="small-caps">k</span><span
|
|
|
|
class="small-caps">a</span><span
|
|
|
|
class="small-caps">g</span><span
|
|
|
|
class="small-caps">e</span> <span
|
|
|
|
class="small-caps">b</span><span
|
|
|
|
class="small-caps">a</span><span
|
|
|
|
class="small-caps">s</span><span
|
|
|
|
class="small-caps">e</span><span
|
|
|
|
class="small-caps">d</span> <span
|
|
|
|
class="small-caps">o</span><span
|
|
|
|
class="small-caps">n</span> PSBLAS</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">(</span><span
|
|
|
|
class="cmcsc-10x-x-120">AMG4PSBLAS</span><span
|
|
|
|
class="cmr-12">) provides parallel Algebraic MultiGrid (AMG) preconditioners (see,</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">e.g., </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#XBriggs2000"><span
|
|
|
|
class="cmr-12">4</span></a><span
|
|
|
|
class="cmr-12">,</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlli5.html#XStuben_01"><span
|
|
|
|
class="cmr-12">29</span></a><span
|
|
|
|
class="cmr-12">]</span></span><span
|
|
|
|
class="cmr-12">), to be used in the iterative solution of linear systems,</span>
|
|
|
|
<table
|
|
|
|
class="equation"><tr><td>
|
|
|
|
<center class="math-display" >
|
|
|
|
<img
|
|
|
|
src="userhtml0x.png" alt="Ax = b,
|
|
|
|
" class="math-display" ><a
|
|
|
|
id="x4-3001r1"></a></center></td><td class="equation-label"><span
|
|
|
|
class="cmr-12">(1)</span></td></tr></table>
|
|
|
|
<!--l. 11--><p class="nopar" >
|
|
|
|
<span
|
|
|
|
class="cmr-12">where </span><span
|
|
|
|
class="cmmi-12">A </span><span
|
|
|
|
class="cmr-12">is a square, real or complex, sparse symmetric positive definite (s.p.d.)</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">matrix.</span>
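<p class="indent" > <span class="cmr-12">As a minimal illustration of how a preconditioner enters the iterative solution of (1), the following Python sketch runs a preconditioned conjugate gradient iteration on a small dense s.p.d. system. A plain Jacobi (diagonal) preconditioner stands in for an AMG one. This is not AMG4PSBLAS code: the names pcg, apply_prec and jacobi_prec are hypothetical and chosen only for this example, and the matrix is stored densely for brevity.</span></p>
<pre class="verbatim">
```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for an s.p.d. matrix A.

    apply_prec(r) must return an approximation of A^{-1} r and must
    itself act as an s.p.d. operator.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual of the zero initial guess
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if tol * b_norm >= dot(r, r) ** 0.5:
            return x, it
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small s.p.d. test system whose exact solution is [1, 1, 1].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]


def jacobi_prec(r):
    """Hypothetical stand-in preconditioner: divide by the diagonal of A."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


x, iters = pcg(A, b, jacobi_prec)
print(x, iters)
```
</pre>
<p class="indent" > <span class="cmr-12">Any routine that maps a residual to an approximate correction, such as one multigrid cycle, could be passed in place of jacobi_prec, provided it acts as an s.p.d. operator.</span></p>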
|
|
|
|
<!--l. 19--><p class="indent" > <span
|
|
|
|
class="cmr-12">The preconditioners implemented in AMG4PSBLAS are obtained by combining three</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. Available</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">multigrid cycles include the V-, W-, and a version of a Krylov-type cycle</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">(K-cycle)</span><span
|
|
|
|
class="cmr-12"> </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#XBriggs2000"><span
|
|
|
|
class="cmr-12">4</span></a><span
|
|
|
|
class="cmr-12">,</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlli5.html#XNotay2008"><span
|
|
|
|
class="cmr-12">25</span></a><span
|
|
|
|
class="cmr-12">]</span></span><span
|
|
|
|
class="cmr-12">; they can be combined with Jacobi, hybrid forward/backward</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">Gauss-Seidel, block-Jacobi and additive Schwarz smoothers with various versions of</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">local incomplete factorizations and approximate inverses on the blocks. The</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">Jacobi, block-Jacobi and Gauss-Seidel smoothers are also available in the </span><span
|
|
|
|
class="cmmi-12">&#x2113;</span><sub><span
|
|
|
|
class="cmr-8">1</span></sub>
|
|
|
|
<span
|
|
|
|
class="cmr-12">version</span><span
|
|
|
|
class="cmr-12"> </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#XDDF2020"><span
|
|
|
|
class="cmr-12">13</span></a><span
|
|
|
|
class="cmr-12">]</span></span><span
|
|
|
|
class="cmr-12">.</span>
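<p class="indent" > <span class="cmr-12">The way a cycle combines a smoother with a coarsest-level solver can be sketched as follows. The Python code below applies a recursive V-cycle with weighted Jacobi pre- and post-smoothing and a direct solve on the coarsest level. It is a sequential, dense-matrix sketch under stated assumptions, not the package implementation: the hierarchy is built with a hypothetical fixed pairing of consecutive unknowns (i // 2) and an unsmoothed piecewise-constant prolongator, and all function names are illustrative.</span></p>
<pre class="verbatim">
```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def residual(A, x, b):
    return [bi - axi for bi, axi in zip(b, matvec(A, x))]


def jacobi(A, x, b, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi smoother: x := x + omega * D^{-1} (b - A x)."""
    for _ in range(sweeps):
        r = residual(A, x, b)
        x = [xi + omega * ri / A[i][i] for i, (xi, ri) in enumerate(zip(x, r))]
    return x


def galerkin(A, agg, nc):
    """Coarse matrix P^T A P for a piecewise-constant prolongator P."""
    Ac = [[0.0] * nc for _ in range(nc)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            Ac[agg[i]][agg[j]] += a
    return Ac


def build_hierarchy(A, min_size=2):
    """List of (matrix, aggregate map) pairs; the coarsest map is None."""
    levels = []
    while len(A) > min_size:
        agg = [i // 2 for i in range(len(A))]   # hypothetical fixed pairing
        levels.append((A, agg))
        A = galerkin(A, agg, agg[-1] + 1)
    levels.append((A, None))
    return levels


def solve_dense(A, b):
    """Gaussian elimination without pivoting (fine for a small s.p.d. A)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def v_cycle(levels, b, lev=0):
    """One V-cycle for A x = b on level lev, starting from a zero guess."""
    A, agg = levels[lev]
    if agg is None:
        return solve_dense(A, b)                     # coarsest-level solver
    x = jacobi(A, [0.0] * len(b), b)                 # pre-smoothing
    r = residual(A, x, b)
    rc = [0.0] * (agg[-1] + 1)
    for i, ri in enumerate(r):                       # restriction: P^T r
        rc[agg[i]] += ri
    ec = v_cycle(levels, rc, lev + 1)                # coarse-level correction
    x = [xi + ec[agg[i]] for i, xi in enumerate(x)]  # prolongation: x + P ec
    return jacobi(A, x, b)                           # post-smoothing


def energy_norm(A, e):
    return max(sum(ei * aei for ei, aei in zip(e, matvec(A, e))), 0.0) ** 0.5


# 1D Laplacian test matrix tridiag(-1, 2, -1) with a known solution.
n = 16
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
levels = build_hierarchy(A)
x_true = [1.0] * n
b = matvec(A, x_true)
x = [0.0] * n
errs = [energy_norm(A, x_true)]
for _ in range(20):                                  # stationary iteration
    x = [xi + ei for xi, ei in zip(x, v_cycle(levels, residual(A, x, b)))]
    errs.append(energy_norm(A, [xt - xi for xt, xi in zip(x_true, x)]))
print(errs[0], errs[-1])
```
</pre>
<p class="indent" > <span class="cmr-12">The cycle is used here as a stationary iteration only to keep the example short; in practice it would normally be applied as a preconditioner inside a Krylov method.</span></p>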
|
|
|
|
<!--l. 30--><p class="indent" > <span
|
|
|
|
class="cmr-12">An algebraic approach is used to generate a hierarchy of coarse-level matrices and</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">operators, without explicitly using any information on the geometry of the original</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">problem, e.g., the discretization of a PDE. To this end, two different coarsening</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">strategies, based on aggregation, are available:</span>
|
|
|
|
<ul class="itemize1">
|
|
|
|
<li class="itemize"><span
|
|
|
|
class="cmr-12">a decoupled version of the smoothed aggregation procedure proposed in</span><span
|
|
|
|
class="cmr-12"> </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#XBREZINA_VANEK"><span
|
|
|
|
class="cmr-12">3</span></a><span
|
|
|
|
class="cmr-12">,</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlli5.html#XVANEK_MANDEL_BREZINA"><span
|
|
|
|
class="cmr-12">31</span></a><span
|
|
|
|
class="cmr-12">]</span></span><span
|
|
|
|
class="cmr-12">, and already included in the previous versions of the package</span><span
|
|
|
|
class="cmr-12"> </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#Xaaecc_07"><span
|
|
|
|
class="cmr-12">6</span></a><span
|
|
|
|
class="cmr-12">,</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlli5.html#XMLD2P4_TOMS"><span
|
|
|
|
class="cmr-12">10</span></a><span
|
|
|
|
class="cmr-12">]</span></span><span
|
|
|
|
class="cmr-12">;</span>
</li>
|
|
|
|
<li class="itemize"><span
|
|
|
|
class="cmr-12">a coupled, parallel implementation of the Coarsening based on Compatible</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">Weighted Matching introduced in</span><span
|
|
|
|
class="cmr-12"> </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#XDV2013"><span
|
|
|
|
class="cmr-12">11</span></a><span
|
|
|
|
class="cmr-12">,</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlli5.html#XDFV2018"><span
|
|
|
|
class="cmr-12">12</span></a><span
|
|
|
|
class="cmr-12">]</span></span> <span
|
|
|
|
class="cmr-12">and described in detail in</span><span
|
|
|
|
class="cmr-12"> </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#XDDF2020"><span
|
|
|
|
class="cmr-12">13</span></a><span
|
|
|
|
class="cmr-12">]</span></span><span
|
|
|
|
class="cmr-12">;</span></li></ul>
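<p class="indent" > <span class="cmr-12">To make the idea of aggregation-based coarsening concrete, the Python sketch below pairs unknowns by a greedy heaviest-edge matching, builds a tentative prolongator from the resulting aggregates, and forms the Galerkin coarse matrix. Everything here is an assumption made for illustration: the edge-weight formula, the vector w, the greedy matching (a simple stand-in, not the matching algorithm used by the package), the absence of any smoothing or normalization of the prolongator, and all function names.</span></p>
<pre class="verbatim">
```python
def edge_weights(A, w):
    """Illustrative weights for the off-diagonal couplings (assumed formula)."""
    weights = {}
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] != 0.0:
                den = A[i][i] * w[i] ** 2 + A[j][j] * w[j] ** 2
                weights[(i, j)] = 1.0 - 2.0 * A[i][j] * w[i] * w[j] / den
    return weights


def greedy_matching(n, weights):
    """Greedy heaviest-edge matching; returns each row's aggregate index."""
    agg = [-1] * n
    nc = 0
    # Visit edges by decreasing weight; ties are broken by the index pair.
    for (i, j), _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if agg[i] == -1 and agg[j] == -1:
            agg[i] = agg[j] = nc
            nc += 1
    for i in range(n):               # unmatched rows become singletons
        if agg[i] == -1:
            agg[i] = nc
            nc += 1
    return agg, nc


def prolongator(agg, nc, w):
    """Tentative prolongator: column k holds w restricted to aggregate k."""
    P = [[0.0] * nc for _ in agg]
    for i, k in enumerate(agg):
        P[i][k] = w[i]
    return P


def galerkin(A, P):
    """Coarse matrix Ac = P^T A P as a dense triple product."""
    n, nc = len(P), len(P[0])
    AP = [[sum(A[i][k] * P[k][j] for k in range(n)) for j in range(nc)]
          for i in range(n)]
    return [[sum(P[k][i] * AP[k][j] for k in range(n)) for j in range(nc)]
            for i in range(nc)]


# Strong couplings inside the pairs (0, 1) and (2, 3), a weak one between them.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -0.1, 0.0],
     [0.0, -0.1, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
w = [1.0, 1.0, 1.0, 1.0]

agg, nc = greedy_matching(len(A), edge_weights(A, w))
P = prolongator(agg, nc, w)
Ac = galerkin(A, P)
print(agg, Ac)
```
</pre>
<p class="indent" > <span class="cmr-12">Applying the same steps to the coarse matrix again yields the next level of the hierarchy; no geometric information is used at any point.</span></p>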
|
|
|
|
<!--l. 43--><p class="noindent" ><span
|
|
|
|
class="cmr-12">Either exact or approximate solvers can be used on the coarsest-level system. We provide</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">interfaces to various parallel and sequential sparse LU factorizations from external</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">packages, sequential native incomplete LU and approximate inverse factorizations,</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solvers and calls to</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">preconditioned Krylov methods; all smoothers can also be used as one-level</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">preconditioners.</span>
|
|
|
|
<!--l. 50--><p class="indent" > <span
|
|
|
|
class="cmr-12">AMG4PSBLAS is written in Fortran</span><span
|
|
|
|
class="cmr-12"> 2003, following an object-oriented design</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">through the exploitation of features such as abstract data type creation, type</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">extension, function overloading, and dynamic memory management. The parallel</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">implementation is based on a Single Program Multiple Data (SPMD) paradigm.</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">Single and double precision implementations of AMG4PSBLAS are available</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">for both the real and the complex case, which can be used through a single</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">interface.</span>
|
|
|
|
<!--l. 60--><p class="indent" > <span
|
|
|
|
class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">computational framework</span><span
|
|
|
|
class="cmr-12"> </span><span class="cite"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlli5.html#Xpsblas_00"><span
|
|
|
|
class="cmr-12">20</span></a><span
|
|
|
|
class="cmr-12">,</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlli5.html#XPSBLAS3"><span
|
|
|
|
class="cmr-12">19</span></a><span
|
|
|
|
class="cmr-12">]</span></span><span
|
|
|
|
class="cmr-12">. PSBLAS provides basic linear algebra operators</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">and data management facilities for distributed sparse matrices, kernels for</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">sequential incomplete factorizations needed for the parallel block-Jacobi and</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">additive Schwarz smoothers, and parallel Krylov solvers which can be used with</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">the AMG4PSBLAS preconditioners. The choice of PSBLAS has been mainly</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">motivated by the need for a portable and efficient software infrastructure</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">implementing “de facto” standard parallel sparse linear algebra kernels, to</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">pursue goals such as performance, portability, modularity, and extensibility</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">in the development of the preconditioner package. On the other hand, the</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">implementation of AMG4PSBLAS, which was driven by the need to face the exascale</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">challenge, has led to some important revisions and extensions of the PSBLAS</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">infrastructure. The inter-process communication required by AMG4PSBLAS</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">is encapsulated in the PSBLAS routines; therefore, AMG4PSBLAS can be</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">run on any parallel machine where PSBLAS implementations are available.</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">In the most recent version of PSBLAS (release 3.7), a plug-in for GPUs is</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">included; it provides CUDA versions of the main vector operations and of sparse</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">matrix-vector multiplication, so that Krylov methods coupled with AMG4PSBLAS</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">preconditioners relying on Jacobi and block-Jacobi smoothers with sparse</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">approximate inverses on the blocks can be efficiently executed on clusters of</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">GPUs.</span>
|
|
|
|
<!--l. 85--><p class="indent" > <span
|
|
|
|
class="cmr-12">AMG4PSBLAS has a layered and modular software architecture where three main</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">layers can be identified. The lower layer consists of the PSBLAS kernels, the middle</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">one implements the construction and application phases of the preconditioners, and the</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span>
<span
|
|
|
|
class="cmr-12">allows for different levels of use of the package: a few black-box routines at the upper</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">layer allow all users to easily build and apply any preconditioner available in</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">AMG4PSBLAS; facilities are also available allowing expert users to extend the set of</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">smoothers and solvers for building new versions of the preconditioners (see</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">Section</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse6.html#x26-300006"><span
|
|
|
|
class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span
|
|
|
|
class="cmr-12">).</span>
|
|
|
|
<!--l. 96--><p class="indent" > <span
|
|
|
|
class="cmr-12">This guide is organized as follows. General information on the distribution of the</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">source code is reported in Section</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse2.html#x5-40002"><span
|
|
|
|
class="cmr-12">2</span><!--tex4ht:ref: sec:distribution --></a><span
|
|
|
|
class="cmr-12">, while details on the configuration and installation</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">of the package are given in Section</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse3.html#x8-70003"><span
|
|
|
|
class="cmr-12">3</span><!--tex4ht:ref: sec:building --></a><span
|
|
|
|
class="cmr-12">. The basics for building and applying the</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">preconditioners with the Krylov solvers implemented in PSBLAS are reported</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">in</span><span
|
|
|
|
class="cmr-12"> Section</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse4.html#x14-130004"><span
|
|
|
|
class="cmr-12">4</span><!--tex4ht:ref: sec:started --></a><span
|
|
|
|
class="cmr-12">, where the Fortran codes of a few sample programs are also shown.</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">A reference guide for the user interface routines is provided in Section</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse5.html#x16-150005"><span
|
|
|
|
class="cmr-12">5</span><!--tex4ht:ref: sec:userinterface --></a><span
|
|
|
|
class="cmr-12">.</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">Information on the extension of the package through the addition of new</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">smoothers and solvers is reported in Section</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse6.html#x26-300006"><span
|
|
|
|
class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span
|
|
|
|
class="cmr-12">. The error handling mechanism</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">used by the package is briefly described in Section</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse7.html#x27-310007"><span
|
|
|
|
class="cmr-12">7</span><!--tex4ht:ref: sec:errors --></a><span
|
|
|
|
class="cmr-12">. The copyright terms</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">concerning the distribution and modification of AMG4PSBLAS are reported in</span>
|
|
|
|
<span
|
|
|
|
class="cmr-12">Appendix</span><span
|
|
|
|
class="cmr-12"> </span><a
|
|
|
|
href="userhtmlse8.html#x28-32000A"><span
|
|
|
|
class="cmr-12">A</span><!--tex4ht:ref: sec:license --></a><span
|
|
|
|
class="cmr-12">.</span>
<!--l. 1--><div class="crosslinks"><p class="noindent"><span
|
|
|
|
class="cmr-12">[</span><a
|
|
|
|
href="userhtmlse2.html" ><span
|
|
|
|
class="cmr-12">next</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="userhtmlli2.html" ><span
|
|
|
|
class="cmr-12">prev</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="userhtmlli2.html#tailuserhtmlli2.html" ><span
|
|
|
|
class="cmr-12">prev-tail</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="userhtmlse1.html" ><span
|
|
|
|
class="cmr-12">front</span></a><span
|
|
|
|
class="cmr-12">] [</span><a
|
|
|
|
href="userhtml.html#userhtmlse1.html" ><span
|
|
|
|
class="cmr-12">up</span></a><span
|
|
|
|
class="cmr-12">] </span></p></div>
|
|
|
|
<!--l. 1--><p class="indent" > <a
|
|
|
|
id="tailuserhtmlse1.html"></a>
|
|
|
|
</body></html>
|