You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
184 lines
9.3 KiB
TeX
184 lines
9.3 KiB
TeX
\section{Introduction}
|
|
|
|
The PSBLAS library, developed with the aim to facilitate the
|
|
parallelization of computationally intensive scientific applications,
|
|
is designed to address parallel implementation of iterative solvers
|
|
for sparse linear systems through the distributed memory paradigm. It
|
|
includes routines for multiplying sparse matrices by dense matrices,
|
|
solving block diagonal systems with triangular diagonal entries,
|
|
preprocessing sparse matrices, and contains additional routines for
|
|
dense matrix operations. The current implementation of PSBLAS
|
|
addresses a distributed memory execution model operating with message
|
|
passing. However, the overall design does not preclude different
|
|
implementation paradigms, such as those based on a shared memory
|
|
model.
|
|
|
|
The PSBLAS library is internally implemented in a mixture of
|
|
Fortran~77 and Fortran~95~\cite{metcalf} programming languages. A
|
|
similar approach has been advocated by a number of authors,
|
|
e.g.~\cite{machiels}. Moreover, the Fortran~95 facilities for dynamic
|
|
memory management and interface overloading greatly enhance the usability of the PSBLAS
|
|
subroutines. In this way, the library can take care of runtime memory
|
|
requirements that are quite difficult or even impossible to predict at
|
|
implementation or compilation time. The following presentation of the
|
|
PSBLAS library follows the general structure of the proposal for
|
|
serial Sparse BLAS~\cite{sblas97}, which in its turn is based on the
|
|
proposal for BLAS on dense matrices~\cite{BLAS1,BLAS2,BLAS3}.
|
|
|
|
The applicability of sparse iterative solvers to many different areas
|
|
causes some terminology problems because the same concept may be
|
|
denoted through different names depending on the application area. The
|
|
PSBLAS features presented in this section will be discussed mainly in terms of finite
|
|
difference discretizations of Partial Differential Equations (PDEs).
|
|
However, the scope of the library is wider than that: for example, it
|
|
can be applied to finite element discretizations of PDEs, and even to
|
|
different classes of problems such as nonlinear optimization, for
|
|
example in optimal control problems.
|
|
|
|
The design of a solver for sparse linear systems is driven by many
|
|
conflicting objectives, such as limiting occupation of storage
|
|
resources, exploiting regularities in the input data, exploiting
|
|
hardware characteristics of the parallel platform. To achieve an
|
|
optimal communication to computation ratio on distributed memory
|
|
machines it is essential to keep the {\em data locality} as high as
|
|
possible; this can be done through an appropriate data allocation
|
|
strategy. The choice of the preconditioner is another very important
|
|
factor that affects efficiency of the implemented application. Optimal
|
|
data distribution requirements for a given preconditioner may conflict
|
|
with distribution requirements of the rest of the solver. Finding the
|
|
optimal trade-off may be very difficult because it is application
|
|
dependent. Possible solution to these problems and other important
|
|
inputs to the development of the PSBLAS software package has come from
|
|
an established experience in applying the PSBLAS solvers to
|
|
computational fluid dynamics applications.
|
|
|
|
\section{General overview}
|
|
\label{sec:overview}
|
|
The PSBLAS library is designed to handle the implementation of
|
|
iterative solvers for sparse linear systems on distributed memory
|
|
parallel computers. The system coefficient matrix $A$ must be square;
|
|
it may be real or complex, nonsymmetric, and its sparsity pattern
|
|
needs not to be symmetric. The serial computation parts are based on
|
|
the serial sparse BLAS, so that any extension made to the data
|
|
structures of the serial kernels is available to the parallel
|
|
version. The overall design and parallelization strategy have been
|
|
influenced by the structure of the ScaLAPACK parallel
|
|
library~\cite{scalapack}. The layered structure of the PSBLAS library
|
|
is shown in figure~\ref{fig:psblas} ; lower layers of the library
|
|
indicate an encapsulation relationship with upper layers. The ongoing
|
|
discussion focuses on the Fortran~95 layer immediately below the
|
|
application layer.
|
|
The serial parts of the computation on each process are executed through
|
|
calls to the serial sparse BLAS subroutines. In a similar way, the
|
|
inter-process message exchanges are implemented through the Basic
|
|
Linear Algebra Communication Subroutines (BLACS) library~\cite{BLACS}
|
|
that guarantees a portable and efficient communication layer. The
|
|
Message Passing Interface code is encapsulated within the BLACS
|
|
layer. However, in some cases, MPI routines are directly used either
|
|
to improve efficiency or to implement communication patterns for which
|
|
the BLACS package doesn't provide any method.
|
|
|
|
\begin{figure}[h] \begin{center}
|
|
\includegraphics[scale=0.45]{figures/psblas}
|
|
\end{center}
|
|
\caption{PSBLAS library components hierarchy.\label{fig:psblas}}
|
|
\end{figure}
|
|
|
|
The PSBLAS library consists of two classes of subroutines that is, the
|
|
{\em computational routines} and the {\em auxiliary routines}. The
|
|
computational routine set includes:
|
|
|
|
\begin{itemize}
|
|
\item Sparse matrix by dense matrix product; \item Sparse triangular
|
|
systems solution for block diagonal matrices;
|
|
\item Vector and matrix norms;
|
|
\item Dense matrix sums;
|
|
\item Dot products.
|
|
\end{itemize}
|
|
The auxiliary routine set includes:
|
|
\begin{itemize}
|
|
\item Communication descriptors allocation;
|
|
\item Dense and sparse matrix allocation;
|
|
\item Dense and sparse matrix build and update;
|
|
\item Sparse matrix and data distribution preprocessing.
|
|
\end{itemize}
|
|
|
|
The following naming scheme has been adopted for all the symbols
|
|
internally defined in the PSBLAS software package:
|
|
\begin{itemize}
|
|
\item all the symbols (i.e. subroutine names, data types...) are
|
|
prefixed by \verb|psb_|
|
|
\item all the data type names are suffixed by \verb|_type|
|
|
\item all the constant values are suffixed by \verb|_|
|
|
\item all the subroutine names follow the rule \verb|psb_xxname| where
|
|
\verb|xx| can be either:
|
|
\begin{itemize}
|
|
\item \verb|ge|: the routine is related to dense data,
|
|
\item \verb|sp|: the routine is related to sparse data,
|
|
\item \verb|cd|: the routine is related to communication descriptor (see~\ref{sec:datastruct}).
|
|
\end{itemize}
|
|
For example the \verb|psb_geins|, \verb|psb_spins| and
|
|
\verb|psb_cdins| perform the same action (see~\ref{sec:toolsrout}) on
|
|
dense matrices, sparse matrices and communication descriptors
|
|
respectively.
|
|
Interface overloading allows the usage of the same subroutine
|
|
interfaces for both real and complex data.
|
|
\end{itemize}
|
|
In the description of the subroutines, arguments or argument entries
|
|
are classified as:
|
|
\begin{description}
|
|
\item[global] For input arguments, the value must be the same on all processes
|
|
participating in the subroutine call; for output arguments the value
|
|
is guaranteed to be the same.
|
|
\item[local] Each process has its own value(s) independently.
|
|
\end{description}
|
|
|
|
\subsection{Application structure}
|
|
|
|
The main underlying principle of the PSBLAS library is that the
|
|
library objects are created and exist with reference to a discretized
|
|
space to which there corresponds and index space and a matrix sparsity
|
|
pattern. As an example, consider a cell-centered finite-volume
|
|
discretization of the Navier-Stokes equations on a simulation domain;
|
|
the index space $1\dots n$ is isomorphic to the set of cell centers,
|
|
whereas the pattern of the associated linear system matrix is
|
|
isomorphic to the adjacency graph imposed on the discretization mesh
|
|
by the discretization stencil.
|
|
|
|
Thus the first order of business is to establish an index space, and
|
|
this is done with a call to \verb|psb_cdall| in which we specify the
|
|
size of the index space $n$ and the allocation of the elements of the
|
|
index space to the various processes making up the MPI (virtual)
|
|
parallel machine.
|
|
|
|
The index space is partitioned among processes, and this creates a
|
|
mapping from the ``global'' numbering $1\dots n$ to a numbering
|
|
``local'' to each process; each process $i$ will own a certain subset
|
|
$1\dots n_{\hbox{row}_i}$, each element of which corresponds to a certain
|
|
element of $1\dots n$. The user does not set explicitly this mapping;
|
|
when the application needs to indicate to which element of the index
|
|
space a certain item is related, such as the row and column index of a
|
|
matrix coefficient, it does so in the ``global'' numbering, and the
|
|
library will translate into the appropriate ``local'' numbering.
|
|
|
|
For a given index space $1\dots n$ there are many possible associated
|
|
topologies, i.e. many different discretization stencils; thus the
|
|
description of the index space is not completed until the user has
|
|
defined a sparsity pattern, either explicitly through \verb|psb_cdins|
|
|
or implicitly through \verb|psb_spins|. The descriptor is finalized
|
|
with a call to \verb|psb_cdasb| and a sparse matrix with a call to
|
|
\verb|psb_spasb|. After \verb|psb_cdasb| each process $i$ will have
|
|
defined a set of ``halo'' (or ``ghost'') indices
|
|
$n_{\hbox{row}_i}+1\dots n_{\hbox{col}_i}$, denoting elements of the index
|
|
space that are \emph{not} assigned to process $i$; however the
|
|
variables associated with them are needed to complete computations
|
|
associated with the sparse matrix $A$, and thus they have to be
|
|
fetched from (neighbouring) processes. The descriptor of the index
|
|
space is built exactly for the purpose of properly sequencing the
|
|
communication steps required to achieve this objective.
|
|
|
|
%%% Local Variables:
|
|
%%% mode: latex
|
|
%%% TeX-master: "userguide"
|
|
%%% End:
|