<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<!--Converted with LaTeX2HTML 2008 (1.71)
original version by: Nikos Drakos, CBLU, University of Leeds
* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan
* with significant contributions from:
Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
<HTML>
<HEAD>
<TITLE>Application structure</TITLE>
<META NAME="description" CONTENT="Application structure">
<META NAME="keywords" CONTENT="userhtml">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">

<META NAME="Generator" CONTENT="LaTeX2HTML v2008">
<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">

<LINK REL="STYLESHEET" HREF="userhtml.css">

<LINK REL="next" HREF="node7.html">
<LINK REL="previous" HREF="node5.html">
<LINK REL="up" HREF="node3.html">
</HEAD>

<BODY >
<!--Navigation Panel-->
<A NAME="tex2html234"
 HREF="node7.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next" SRC="next.png"></A>
<A NAME="tex2html230"
 HREF="node3.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" SRC="up.png"></A>
<A NAME="tex2html224"
 HREF="node5.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" SRC="prev.png"></A>
<A NAME="tex2html232"
 HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents" SRC="contents.png"></A>
<BR>
<B> Next:</B> <A NAME="tex2html235"
 HREF="node7.html">Programming model</A>
<B> Up:</B> <A NAME="tex2html231"
 HREF="node3.html">General overview</A>
<B> Previous:</B> <A NAME="tex2html225"
 HREF="node5.html">Library contents</A>
<B> <A NAME="tex2html233"
 HREF="node1.html">Contents</A></B>
<BR>
<BR>
<!--End of Navigation Panel-->

<H2><A NAME="SECTION00033000000000000000">
Application structure</A>
</H2>

<P>
The main underlying principle of the PSBLAS library is that the
library objects are created and exist with reference to a discretized
space to which there corresponds an index space and a matrix sparsity
pattern. As an example, consider a cell-centered finite-volume
discretization of the Navier-Stokes equations on a simulation domain;
the index space <IMG
 WIDTH="46" HEIGHT="14" ALIGN="BOTTOM" BORDER="0"
 SRC="img15.png"
 ALT="$1\dots n$"> is isomorphic to the set of cell centers,
whereas the pattern of the associated linear system matrix is
isomorphic to the adjacency graph imposed on the discretization mesh
by the discretization stencil.

<P>
Thus the first order of business is to establish an index space, and
this is done with a call to <code>psb_cdall</code> in which we specify the
size of the index space <IMG
 WIDTH="13" HEIGHT="14" ALIGN="BOTTOM" BORDER="0"
 SRC="img16.png"
 ALT="$n$"> and the allocation of the elements of the
index space to the various processes making up the MPI (virtual)
parallel machine.

<P>
The index space is partitioned among processes, and this creates a
mapping from the "global" numbering <IMG
 WIDTH="46" HEIGHT="14" ALIGN="BOTTOM" BORDER="0"
 SRC="img15.png"
 ALT="$1\dots n$"> to a numbering
"local" to each process; each process <IMG
 WIDTH="9" HEIGHT="17" ALIGN="BOTTOM" BORDER="0"
 SRC="img4.png"
 ALT="$i$"> will own a certain subset
<!-- MATH
 $1\dots n_{\hbox{row}_i}$
 -->
<IMG
 WIDTH="77" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
 SRC="img17.png"
 ALT="$1\dots n_{\hbox{row}_i}$">, each element of which corresponds to a certain
element of <IMG
 WIDTH="46" HEIGHT="14" ALIGN="BOTTOM" BORDER="0"
 SRC="img15.png"
 ALT="$1\dots n$">. The user does not set this mapping explicitly;
when the application needs to indicate to which element of the index
space a certain item is related, such as the row and column index of a
matrix coefficient, it does so in the "global" numbering, and the
library translates it into the appropriate "local" numbering.
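
<P>
The global-to-local mapping described above can be illustrated with a
minimal Python sketch. It is not PSBLAS code: the contiguous block
partition and the helper names <code>block_partition</code> and
<code>global_to_local</code> are assumptions made for this example only,
and the library may distribute and number indices differently.

```python
def block_partition(n, nproc):
    """Split the global indices 1..n into nproc contiguous blocks.

    Assumption of this sketch: a simple block distribution; the real
    allocation is whatever the application specifies to the library.
    """
    base, extra = divmod(n, nproc)
    owned = []
    start = 1
    for p in range(nproc):
        size = base + (1 if p < extra else 0)   # first `extra` blocks get one more
        owned.append(list(range(start, start + size)))
        start += size
    return owned


def global_to_local(owned_by_process):
    """For each process, map every owned global index to a local index 1..n_row."""
    return [
        {g: l for l, g in enumerate(owned, start=1)}
        for owned in owned_by_process
    ]


owned = block_partition(10, 3)
g2l = global_to_local(owned)
print(owned[1])      # global indices owned by process 1
print(g2l[1][5])     # local index of global index 5 on process 1
```

<P>
In this toy setting the application would keep working with global
indices such as 5, and only the lookup table knows that this index is
the first local entry on process 1.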

<P>
For a given index space <IMG
 WIDTH="46" HEIGHT="14" ALIGN="BOTTOM" BORDER="0"
 SRC="img15.png"
 ALT="$1\dots n$"> there are many possible associated
topologies, i.e. many different discretization stencils; thus the
description of the index space is not complete until the user has
defined a sparsity pattern, either explicitly through <code>psb_cdins</code>
or implicitly through <code>psb_spins</code>. The descriptor is finalized
with a call to <code>psb_cdasb</code>, and a sparse matrix with a call to
<code>psb_spasb</code>. After <code>psb_cdasb</code> each process <IMG
 WIDTH="9" HEIGHT="17" ALIGN="BOTTOM" BORDER="0"
 SRC="img4.png"
 ALT="$i$"> will have
defined a set of "halo" (or "ghost") indices
<!-- MATH
 $n_{\hbox{row}_i}+1\dots n_{\hbox{col}_i}$
 -->
<IMG
 WIDTH="130" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
 SRC="img18.png"
 ALT="$n_{\hbox{row}_i}+1\dots n_{\hbox{col}_i}$">, denoting elements of the index
space that are <I>not</I> assigned to process <IMG
 WIDTH="9" HEIGHT="17" ALIGN="BOTTOM" BORDER="0"
 SRC="img4.png"
 ALT="$i$">; however, the
variables associated with them are needed to complete computations
associated with the sparse matrix <IMG
 WIDTH="16" HEIGHT="14" ALIGN="BOTTOM" BORDER="0"
 SRC="img1.png"
 ALT="$A$">, and thus they have to be
fetched from (neighbouring) processes. The descriptor of the index
space is built exactly for the purpose of properly sequencing the
communication steps required to achieve this objective.
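
<P>
To make the notion of halo indices concrete, the following Python sketch
numbers the indices seen by one process for a one-dimensional
three-point stencil. It is an illustration under stated assumptions, not
the library's algorithm: the stencil, the helper names
<code>stencil_neighbours</code> and <code>local_numbering</code>, and the
order in which halo entries are numbered are all chosen for this example.

```python
def stencil_neighbours(g, n):
    """Global indices coupled to cell g by a 1D three-point stencil on 1..n.

    Assumption of this sketch: the stencil is (g-1, g, g+1).
    """
    return [j for j in (g - 1, g, g + 1) if 1 <= j <= n]


def local_numbering(owned, n):
    """Number owned indices 1..n_row, then halo indices n_row+1..n_col."""
    loc = {g: l for l, g in enumerate(owned, start=1)}
    n_row = len(owned)
    for g in owned:
        for j in stencil_neighbours(g, n):
            if j not in loc:
                # j is referenced by an owned row but owned elsewhere:
                # it becomes a halo entry with the next free local index
                loc[j] = len(loc) + 1
    n_col = len(loc)
    return loc, n_row, n_col


# A process owning global indices 5, 6, 7 out of 1..10
loc, n_row, n_col = local_numbering([5, 6, 7], n=10)
halo = sorted(g for g, l in loc.items() if l > n_row)
print(n_row, n_col, halo)
```

<P>
Here the process owns three rows, and the two extra local indices stand
for global indices 4 and 8, whose values would have to be fetched from
the neighbouring processes before a matrix-vector product.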

<P>
A simple application structure will walk through the index space
allocation, matrix/vector creation and linear system solution as
follows:

<OL>
<LI>Initialize the parallel environment with <code>psb_init</code>
</LI>
<LI>Initialize the index space with <code>psb_cdall</code>
</LI>
<LI>Allocate the sparse matrix and dense vectors with <code>psb_spall</code>
and <code>psb_geall</code>
</LI>
<LI>Loop over all local rows, generate matrix and vector entries,
and insert them with <code>psb_spins</code> and <code>psb_geins</code>
</LI>
<LI>Assemble the various entities:

<OL>
<LI><code>psb_cdasb</code>
</LI>
<LI><code>psb_spasb</code>
</LI>
<LI><code>psb_geasb</code>
</LI>
</OL>
</LI>
<LI>Choose the preconditioner to be used with <code>psb_precset</code> and
build it with <code>psb_precbld</code>
</LI>
<LI>Call the iterative method of choice, e.g. <code>psb_bicgstab</code>
</LI>
</OL>
This is the structure of the sample program
<code>test/pargen/ppde90.f90</code>.

<P>
For a simulation in which the same discretization mesh is used over
multiple time steps, the following structure may be more appropriate:

<OL>
<LI>Initialize the parallel environment with <code>psb_init</code>
</LI>
<LI>Initialize the index space with <code>psb_cdall</code>
</LI>
<LI>Loop over the topology of the discretization mesh and build the
descriptor with <code>psb_cdins</code>
</LI>
<LI>Assemble the descriptor with <code>psb_cdasb</code>
</LI>
<LI>Allocate the sparse matrices and dense vectors with
<code>psb_spall</code> and <code>psb_geall</code>
</LI>
<LI>Loop over the time steps:

<OL>
<LI>If after the first time step,
reinitialize the sparse matrix with <code>psb_sprn</code>; also zero out
the dense vectors
</LI>
<LI>Loop over the mesh, generate the coefficients and insert/update
them with <code>psb_spins</code> and <code>psb_geins</code>
</LI>
<LI>Assemble with <code>psb_spasb</code> and <code>psb_geasb</code>
</LI>
<LI>Choose and build the preconditioner with <code>psb_precset</code> and
<code>psb_precbld</code>
</LI>
<LI>Call the iterative method of choice, e.g. <code>psb_bicgstab</code>

</LI>
</OL>
</LI>
</OL>
The insertion routines will be called as many times as needed;
they only need to be called on the data that is actually
allocated to the current process, i.e. each process generates its own
data.
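
<P>
The time-stepping structure above relies on building the sparsity
pattern once and refilling the coefficients at every step. The Python
sketch below mimics that idea with a toy class. The class
<code>PatternMatrix</code> and its methods are hypothetical and exist only
in this example; they are not PSBLAS routines and make no claim about
how the library stores or reinitializes a matrix.

```python
class PatternMatrix:
    """Toy matrix: the sparsity pattern is fixed once, values are refilled."""

    def __init__(self, pattern):
        # The set of (row, col) positions is decided here and never changes
        self.values = {entry: 0.0 for entry in pattern}

    def reset(self):
        """Zero all coefficients while keeping the pattern (hypothetical helper)."""
        for entry in self.values:
            self.values[entry] = 0.0

    def insert(self, row, col, val):
        """Add a contribution to an entry that must already be in the pattern."""
        if (row, col) not in self.values:
            raise KeyError("entry outside the assembled pattern")
        self.values[(row, col)] += val


# Pattern of a 3x3 tridiagonal matrix, built once before the time loop
pattern = [(i, j) for i in range(1, 4) for j in (i - 1, i, i + 1) if 1 <= j <= 3]
a = PatternMatrix(pattern)

history = []
for step in range(1, 4):
    if step > 1:
        a.reset()                         # reuse the pattern, discard old values
    for (i, j) in pattern:
        a.insert(i, j, float(step) if i == j else -0.5 * step)
    history.append(a.values[(2, 2)])
print(history)
```

<P>
Only the coefficient values change between steps in this sketch; the
expensive part, deciding which entries exist, is done a single time.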

<P>
In principle, the calls to <code>psb_spins</code> need not follow any
specific order, nor is there a requirement to build a matrix row in
its entirety before calling the routine; this allows the application
programmer to walk through the discretization mesh element by element,
generating the main part of a given matrix row but also contributions
to the rows corresponding to neighbouring elements.
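
<P>
The element-by-element walk described above can be sketched in Python as
follows. This is an illustration only: the one-dimensional mesh, the
helper names <code>element_contributions</code> and <code>assemble</code>,
and the choice to sum repeated entries are assumptions of the example;
the text does not state how the library treats repeated entries.

```python
from collections import defaultdict


def element_contributions(k):
    """Triplets (row, col, value) produced by the 1D element joining nodes k, k+1.

    Each element touches two rows: its own and that of its neighbour.
    """
    return [(k, k, 1.0), (k, k + 1, -1.0), (k + 1, k, -1.0), (k + 1, k + 1, 1.0)]


def assemble(triplets):
    """Sum entries sharing the same (row, col), whatever order they arrive in.

    Assumption of this sketch: duplicates are added together.
    """
    matrix = defaultdict(float)
    for row, col, val in triplets:
        matrix[(row, col)] += val
    return dict(matrix)


n_nodes = 4
inserted = []
for k in range(1, n_nodes):               # walk the mesh element by element
    inserted.extend(element_contributions(k))

a = assemble(inserted)
a_reversed = assemble(reversed(inserted))  # same data, opposite insertion order
print(a[(2, 2)], a == a_reversed)
```

<P>
Row 2 is never produced in one piece here: it receives one contribution
from the element on its left and one from the element on its right, and
the assembled result does not depend on the order of insertion.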

<P>
From a functional point of view it is even possible to execute one
call for each nonzero coefficient; however, this would incur a
substantial computational overhead. It is therefore advisable to pack
a certain amount of data into each call to the insertion routine, say
touching on a few tens of rows; the best-performing value depends
on both the architecture of the computer being used and the problem
structure.
At the opposite extreme, it would be possible to generate the entire
part of a coefficient matrix residing on a process and pass it in a
single call to <code>psb_spins</code>; this, however, would entail a
doubling of memory occupation, and thus would almost always be far
from optimal.
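
<P>
One way to follow this advice is to buffer a few tens of rows and flush
them in a single call. The Python sketch below shows the buffering logic
only. The class <code>BatchedInserter</code>, the callable passed as
<code>insert</code>, and the default of 20 rows per call are hypothetical
choices for this example; the callable stands in for whatever insertion
routine the application actually uses.

```python
class BatchedInserter:
    """Collect coefficients and hand them to `insert` a few rows at a time."""

    def __init__(self, insert, rows_per_call=20):
        self.insert = insert              # callable taking (rows, cols, vals)
        self.rows_per_call = rows_per_call
        self.rows, self.cols, self.vals = [], [], []
        self.buffered_rows = 0
        self.calls = 0                    # number of calls made to `insert`

    def add_row(self, row, cols, vals):
        """Buffer one matrix row; flush when enough rows have accumulated."""
        self.rows.extend([row] * len(cols))
        self.cols.extend(cols)
        self.vals.extend(vals)
        self.buffered_rows += 1
        if self.buffered_rows >= self.rows_per_call:
            self.flush()

    def flush(self):
        """Pass any buffered coefficients on and start with fresh buffers."""
        if self.rows:
            self.insert(self.rows, self.cols, self.vals)
            self.calls += 1
        self.rows, self.cols, self.vals = [], [], []
        self.buffered_rows = 0


received = []
batcher = BatchedInserter(
    lambda r, c, v: received.extend(zip(r, c, v)), rows_per_call=20
)
for i in range(1, 101):                   # 100 rows of a diagonal matrix
    batcher.add_row(i, [i], [2.0])
batcher.flush()                           # push out any remainder
print(batcher.calls, len(received))
```

<P>
The buffer never holds more than <code>rows_per_call</code> rows, so the
extra memory stays small compared with generating the whole local matrix
before a single insertion call; the best batch size remains something to
tune on the target machine.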

<P>
<HR>
</BODY>
</HTML>