PSBLAS library, version 3.9

The architecture of the Fortran 2003 sparse BLAS is described in:

S. Filippone, A. Buttari. Object-Oriented Techniques for Sparse Matrix Computations in Fortran 2003, ACM Trans. on Math. Software, vol. 38, No. 4, 2012.

The ideas are explored further with the paper:

V. Cardellini, S. Filippone and D. Rouson. Design Patterns for sparse-matrix computations on hybrid CPU/GPU platforms, Scientific Programming, vol. 22, 2014, pp. 1-19.

Version 1.0 of the library is described in:

S. Filippone, M. Colajanni. PSBLAS: A library for parallel linear algebra computation on sparse matrices, ACM Trans. on Math. Software, 26(4), Dec. 2000, pp. 527-550.

UTILITIES

The test/util directory contains some utilities to convert to/from Harwell-Boeing and MatrixMarket file formats.

DOCUMENTATION

See docs/psblas-3.8.pdf; an HTML version of the same document is available in docs/html. Please consult the sample programs, especially test/pargen/psb_[sd]_pde[23]d.f90.
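Once the library and tests are built, a run of one of the pargen samples might look like the following sketch; the executable name is an assumption inferred from the source file name, and the number of ranks is arbitrary:

```shell
# Sketch: run the double-precision 3D PDE example on 4 MPI ranks.
# The executable name is assumed to match test/pargen/psb_d_pde3d.f90;
# check the test/pargen directory after building.
cd test/pargen
mpirun -np 4 ./psb_d_pde3d
```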

OTHER SOFTWARE CREDITS

We originally included a modified implementation of some of the Sparker (serial sparse BLAS) material; it has since been completely rewritten, far beyond the intentions and responsibility of the original developers. The main reference for the serial sparse BLAS is:

Duff, I., Marrone, M., Radicati, G., and Vittoli, C. Level 3 basic linear algebra subprograms for sparse matrices: a user level interface, ACM Trans. Math. Softw., 23(3), 379-401, 1997.

CUDA and GPU support

This version of PSBLAS incorporates into a single package three entities that were previously separate:

  1. PSBLAS -- the base library
  2. PSBLAS-EXT -- a library providing additional storage formats
  3. SPGPU -- a package of kernels for NVIDIA GPUs originally written by Davide Barbieri and Salvatore Filippone; see the license file cuda/License-spgpu.md

INSTALLING

To compile and run our software you will need the following prerequisites (see also SERIAL below):

  1. A working version of MPI

  2. A version of the BLAS; if you don't have a specific version for your platform you may try ATLAS available from http://math-atlas.sourceforge.net/

  3. We have had good results with the METIS library, available from http://www-users.cs.umn.edu/~karypis/metis/metis/main.html. This is optional; it is used in the util and test/fileread directories, but only if you specify --with-metis.

  4. If you have the AMD package of Davis, Duff and Amestoy, you can specify --with-amd (see ./configure --help for more details). We use the C interface to AMD.

  5. If you have CUDA available, use --with-cuda= to specify the CUDA toolkit location, and --with-cudacc=XX,YY,ZZ to specify a list of target CCs (compute capabilities) to compile the CUDA code for.
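Putting the options above together, a configure invocation might look like the following sketch. Every path and library name below is a placeholder for your local installation, the exact argument forms may vary (see ./configure --help), and each optional package can be omitted:

```shell
# Sketch only: all paths are placeholders; METIS, AMD and CUDA are optional.
./configure --prefix=/opt/psblas \
            --with-blas=-lopenblas \
            --with-metis=/opt/metis \
            --with-amd=/opt/suitesparse \
            --with-cuda=/usr/local/cuda --with-cudacc=70,80
```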

The configure script will generate a Make.inc file suitable for building the library. The script can recognize the needed libraries under their default names; if they are in unusual places, consider adding the paths with --with-libs, or explicitly specifying the names with --with-blas, etc.

Please note that a common way for the configure script to fail is to specify inconsistent MPI and plain compilers, either directly or indirectly via environment variables; e.g. specifying the Intel compiler with FC=ifort while at the same time having MPIFC=mpif90 point to GNU Fortran. The best way to avoid this situation is, in our opinion, to use the environment modules package (see http://modules.sourceforge.net/) and load the relevant variables with, e.g.,

module load gnu46 openmpi

This will delegate to the modules setup to make sure that the version of openmpi in use is the one compiled with the gnu46 compilers. After the configure script has completed you can always tweak the Make.inc file yourself.

After you have Make.inc fixed, run

make

to compile the library; then go to the test directory and its subdirectories to build the test programs. If you specify --prefix=/path, you can run make install and the libraries will be installed under /path/lib, the module files under /path/modules, and the regular and experimental C interface header files under /path/include.
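The whole build-and-install sequence can be sketched as below; the prefix is a placeholder, and the -j value is just a parallel-make suggestion:

```shell
# Sketch: configure, build and install under a chosen prefix (placeholder path).
./configure --prefix=/opt/psblas
make -j4          # compile the library
make install      # populates lib/, modules/ and include/ under the prefix
```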

SERIAL

Configuring with --enable-serial will provide a fake MPI stub library that enables running in pure serial mode; no MPI installation is needed in this case. Note, however, that the fake MPI stubs are only guaranteed to cover what we use internally; they are not a complete MPI replacement.
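A serial build therefore reduces to the following sketch, with no MPI compilers or libraries involved:

```shell
# Sketch: build in pure serial mode using the internal MPI stubs;
# no MPI installation is required.
./configure --enable-serial
make
```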

INTEGER SIZES

We have two kinds of integers: IPK for local indices, and LPK for global indices. Their sizes can be chosen independently at configure time, e.g. --with-ipk=4 --with-lpk=8, which requests 4-byte local indices and 8-byte global indices (this is the default).

TODO

Fix all remaining bugs. Bugs? We don't have any! ;-)

The PSBLAS team.

Project lead: Salvatore Filippone

Contributors (in roughly reverse chronological order):

Dimitri Walther
Andea Di Iorio
Stefano Petrilli
Soren Rasmussen
Zaak Beekman
Ambra Abdullahi Hassan
Pasqua D'Ambra
Alfredo Buttari
Daniela di Serafino
Michele Martone
Michele Colajanni
Fabio Cerioni
Stefano Maiolatesi
Dario Pascucci

If you are looking for more sophisticated preconditioners, you may be interested in the AMG4PSBLAS package, available from http://github.com/sfilippone/amg4psblas

Contact: https://github.com/sfilippone/psblas3