You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
126 lines
6.9 KiB
HTML
126 lines
6.9 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
|
|
"http://www.w3.org/TR/html4/loose.dtd">
|
|
<html >
|
|
<head><title>General overview</title>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<meta name="generator" content="TeX4ht (https://tug.org/tex4ht/)">
|
|
<meta name="originator" content="TeX4ht (https://tug.org/tex4ht/)">
|
|
<!-- html,3 -->
|
|
<meta name="src" content="userhtml.tex">
|
|
<link rel="stylesheet" type="text/css" href="userhtml.css">
|
|
</head><body
|
|
>
|
|
<!--l. 72--><div class="crosslinks"><p class="noindent">[<a
|
|
href="userhtmlsu7.html" >next</a>] [<a
|
|
href="userhtmlse1.html" >prev</a>] [<a
|
|
href="userhtmlse1.html#tailuserhtmlse1.html" >prev-tail</a>] [<a
|
|
href="#tailuserhtmlse2.html">tail</a>] [<a
|
|
href="userhtml.html#userhtmlse2.html" >up</a>] </p></div>
|
|
<h3 class="sectionHead"><span class="titlemark">2 </span> <a
|
|
id="x4-30002"></a>General overview</h3>
|
|
<!--l. 74--><p class="noindent" >The PSBLAS library is designed to handle the implementation of iterative solvers for
|
|
sparse linear systems on distributed memory parallel computers. The system
|
|
coefficient matrix <span
|
|
class="cmmi-10">A </span>must be square; it may be real or complex, nonsymmetric, and
|
|
its sparsity pattern needs not to be symmetric. The serial computation parts are
|
|
based on the serial sparse BLAS, so that any extension made to the data structures
|
|
of the serial kernels is available to the parallel version. The overall design and
|
|
parallelization strategy have been influenced by the structure of the ScaLAPACK
|
|
parallel library. The layered structure of the PSBLAS library is shown in figure <a
|
|
href="#x4-3001r1">1<!--tex4ht:ref: fig:psblas --></a>;
|
|
lower layers of the library indicate an encapsulation relationship with upper
|
|
layers. The ongoing discussion focuses on the Fortran 2003 layer immediately
|
|
below the application layer. The serial parts of the computation on each
|
|
process are executed through calls to the serial sparse BLAS subroutines. In a
|
|
similar way, the inter-process message exchanges are encapsulated in an
|
|
applicaiton layer that has been strongly inspired by the Basic Linear Algebra
|
|
Communication Subroutines (BLACS) library <span class="cite">[<a
|
|
href="userhtmlli2.html#XBLACS">7</a>]</span>. Usually there is no need to deal
|
|
directly with MPI; however, in some cases, MPI routines are used directly
|
|
to improve efficiency. For further details on our communication layer see
|
|
Sec. <a
|
|
href="userhtmlse7.html#x68-1050007">7<!--tex4ht:ref: sec:parenv --></a>.
|
|
<!--l. 101--><p class="indent" > <hr class="figure"><div class="figure"
|
|
>
|
|
|
|
|
|
|
|
<a
|
|
id="x4-3001r1"></a>
|
|
|
|
|
|
|
|
<div class="center"
|
|
>
|
|
<!--l. 102--><p class="noindent" >
|
|
<!--l. 104--><p class="noindent" ><img
|
|
src="psblas.png" alt="PIC"
|
|
width="46" height="46" ></div>
|
|
<br /> <div class="caption"
|
|
><span class="id">Figure 1: </span><span
|
|
class="content">PSBLAS library components hierarchy.</span></div><!--tex4ht:label?: x4-3001r1 -->
|
|
|
|
|
|
|
|
<!--l. 110--><p class="indent" > </div><hr class="endfigure">
|
|
<!--l. 113--><p class="indent" > The type of linear system matrices that we address typically arise in
|
|
the numerical solution of PDEs; in such a context, it is necessary to pay
|
|
special attention to the structure of the problem from which the application
|
|
originates. The nonzero pattern of a matrix arising from the discretization of a
|
|
PDE is influenced by various factors, such as the shape of the domain, the
|
|
discretization strategy, and the equation/unknown ordering. The matrix itself can be
|
|
interpreted as the adjacency matrix of the graph associated with the discretization
|
|
mesh.
|
|
<!--l. 124--><p class="indent" > The distribution of the coefficient matrix for the linear system is based on the
|
|
“owner computes” rule: the variable associated to each mesh point is assigned to a
|
|
process that will own the corresponding row in the coefficient matrix and will
|
|
carry out all related computations. This allocation strategy is equivalent to a
|
|
partition of the discretization mesh into <span
|
|
class="cmti-10">sub-domains</span>. Our library supports any
|
|
distribution that keeps together the coefficients of each matrix row; there are no
|
|
other constraints on the variable assignment. This choice is consistent with
|
|
simple data distributions such as <span class="obeylines-h"><span class="verb"><span
|
|
class="cmtt-10">CYCLIC(N)</span></span></span> and <span class="obeylines-h"><span class="verb"><span
|
|
class="cmtt-10">BLOCK</span></span></span>, as well as completely
|
|
arbitrary assignments of equation indices to processes. In particular it is
|
|
consistent with the usage of graph partitioning tools commonly available in
|
|
the literature, e.g. METIS <span class="cite">[<a
|
|
href="userhtmlli2.html#XMETIS">14</a>]</span>. Dense vectors conform to sparse matrices,
|
|
that is, the entries of a vector follow the same distribution of the matrix
|
|
rows.
|
|
<!--l. 146--><p class="indent" > We assume that the sparse matrix is built in parallel, where each process generates
|
|
its own portion. We never require that the entire matrix be available on a single
|
|
node. However, it is possible to hold the entire matrix in one process and distribute it
|
|
explicitly<span class="footnote-mark"><a
|
|
href="userhtml5.html#fn1x0"><sup class="textsuperscript">1</sup></a></span><a
|
|
id="x4-3002f1"></a> ,
|
|
even though the resulting memory bottleneck would make this option unattractive in
|
|
most cases.
|
|
<div class="subsectionTOCS">
|
|
 <span class="subsectionToc" >2.1 <a
|
|
href="userhtmlsu1.html#x6-40002.1">Basic Nomenclature</a></span>
|
|
<br />  <span class="subsectionToc" >2.2 <a
|
|
href="userhtmlsu2.html#x8-50002.2">Library contents</a></span>
|
|
<br />  <span class="subsectionToc" >2.3 <a
|
|
href="userhtmlsu3.html#x9-60002.3">Application structure</a></span>
|
|
<br />   <span class="subsubsectionToc" >2.3.1 <a
|
|
href="userhtmlsu3.html#x9-70002.3.1">User-defined index mappings</a></span>
|
|
<br />  <span class="subsectionToc" >2.4 <a
|
|
href="userhtmlsu4.html#x11-80002.4">Programming model</a></span>
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!--l. 1--><div class="crosslinks"><p class="noindent">[<a
|
|
href="userhtmlsu7.html" >next</a>] [<a
|
|
href="userhtmlse1.html" >prev</a>] [<a
|
|
href="userhtmlse1.html#tailuserhtmlse1.html" >prev-tail</a>] [<a
|
|
href="userhtmlse2.html" >front</a>] [<a
|
|
href="userhtml.html#userhtmlse2.html" >up</a>] </p></div>
|
|
<!--l. 1--><p class="indent" > <a
|
|
id="tailuserhtmlse2.html"></a>
|
|
</body></html>
|