Merge branch 'development' of github.com:sfilippone/amg4psblas into development

gpucinterfaces
sfilippone 4 months ago
commit 28ccefa4bb

@@ -129,6 +129,21 @@ available for AMG4PSBLAS either and the operation will be purely on CPU/MPI. See
to production, storage and management of clean, decarbonized energy.
Among them you have the possibility of running PSBLAS+AMG4PSBLAS on some test problems
to become familiar with using the software.
## MPI and Compilers
The library has been successfully compiled and tested with the same compilers
and MPI implementations as PSBLAS 3.9, which include:
- MPICH 4.2.3, 4.3.0, 4.3.2
- OpenMPI 4.1.8, 5.0.7, 5.0.8, 5.0.9
combined with
- GNU compilers 10.5.0, 11.5.0, 12.5.0, 13.3.0, 14.2.0, 14.3.0, 15.2.0
- LLVM 20.1.0 and 21.1.0 (except with OpenMPI 4.1.8, which does not build with LLVM)
Moreover, it has been tested with the Intel OneAPI toolchain versions 2025.2 and 2025.3.
As of this release, the NVIDIA compiler 25.7 fails to handle our code.
Cray, IBM and NAG compilers have been used for testing in the past, but not on this version.
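To compare a local toolchain against the versions listed above, a shell sketch along the following lines may help. It assumes the MPI wrappers are named `mpifort`, `mpicc` and `mpicxx` (common for recent MPICH and OpenMPI installations, but site-dependent) and that each tool accepts `--version`. `report_version` is a hypothetical helper, not part of AMG4PSBLAS.

```shell
# Hypothetical helper: print the first line of `<tool> --version`,
# or say that the tool is not in PATH.
report_version() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$1" "$("$1" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$1"
    fi
}

# Assumed tool names; adjust to the wrappers and compilers on your system.
for tool in mpifort mpicc mpicxx gfortran flang ifx; do
    report_version "$tool"
done
```

MPI compiler wrappers typically forward `--version` to the underlying compiler, so the wrapper lines usually show which compiler the MPI installation was built around.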
## TODO and bugs

12
configure vendored

@@ -4250,7 +4250,7 @@ ac_compile='$FC -c $FCFLAGS $ac_fcflags_srcext conftest.$ac_ext >&5'
ac_link='$FC -o conftest$ac_exeext $FCFLAGS $LDFLAGS $ac_fcflags_srcext conftest.$ac_ext $LIBS >&5'
ac_compiler_gnu=$ac_cv_fc_compiler_gnu
if test -n "$ac_tool_prefix"; then
-for ac_prog in ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 ifx ifort ifc nagfor gfortran
+for ac_prog in ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 flang ifx ifort ifc nagfor gfortran
do
# Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args.
set dummy $ac_tool_prefix$ac_prog; ac_word=$2
@@ -4300,7 +4300,7 @@ fi
fi
if test -z "$FC"; then
ac_ct_FC=$FC
-for ac_prog in ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 ifx ifort ifc nagfor gfortran
+for ac_prog in ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 flang ifx ifort ifc nagfor gfortran
do
# Extract the first word of "$ac_prog", so it can be a program name with args.
set dummy $ac_prog; ac_word=$2
@@ -4738,7 +4738,7 @@ ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
ac_compiler_gnu=$ac_cv_c_compiler_gnu
if test -n "$ac_tool_prefix"; then
-for ac_prog in cc xlc pgcc icx icc gcc
+for ac_prog in cc xlc pgcc clang icx icc gcc
do
# Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args.
set dummy $ac_tool_prefix$ac_prog; ac_word=$2
@@ -4788,7 +4788,7 @@ fi
fi
if test -z "$CC"; then
ac_ct_CC=$CC
-for ac_prog in cc xlc pgcc icx icc gcc
+for ac_prog in cc xlc pgcc clang icx icc gcc
do
# Extract the first word of "$ac_prog", so it can be a program name with args.
set dummy $ac_prog; ac_word=$2
@@ -5442,7 +5442,7 @@ if test -z "$CXX"; then
CXX=$CCC
else
if test -n "$ac_tool_prefix"; then
-for ac_prog in CC xlc++ icpx icpc g++
+for ac_prog in CC xlc++ clang++ icpx icpc g++
do
# Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args.
set dummy $ac_tool_prefix$ac_prog; ac_word=$2
@@ -5492,7 +5492,7 @@ fi
fi
if test -z "$CXX"; then
ac_ct_CXX=$CXX
-for ac_prog in CC xlc++ icpx icpc g++
+for ac_prog in CC xlc++ clang++ icpx icpc g++
do
# Extract the first word of "$ac_prog", so it can be a program name with args.
set dummy $ac_prog; ac_word=$2

@@ -129,10 +129,10 @@ dnl We set our own FC flags, ignore those from AC_PROG_FC but not those from the
dnl environment variable. Same for C
dnl
save_FCFLAGS="$FCFLAGS";
-AC_PROG_FC([ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 ifx ifort ifc nagfor gfortran])
+AC_PROG_FC([ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 flang ifx ifort ifc nagfor gfortran])
FCFLAGS="$save_FCFLAGS";
save_CFLAGS="$CFLAGS";
-AC_PROG_CC([cc xlc pgcc icx icc gcc ])
+AC_PROG_CC([cc xlc pgcc clang icx icc gcc ])
if test "x$ac_cv_prog_cc_stdc" == "xno" ; then
AC_MSG_ERROR([Problem : Need a C99 compiler ! ])
else
@@ -140,7 +140,7 @@ else
fi
CFLAGS="$save_CFLAGS";
save_CXXFLAGS="$CXXFLAGS";
-AC_PROG_CXX([CC xlc++ icpx icpc g++])
+AC_PROG_CXX([CC xlc++ clang++ icpx icpc g++])
CXXFLAGS="$save_CXXFLAGS";
dnl AC_PROG_CXX
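
The position of the new entries in these lists matters. `AC_PROG_FC`, `AC_PROG_CC` and `AC_PROG_CXX` try the candidate names from left to right and keep the first one found in `PATH`, unless `FC`, `CC` or `CXX` is already set in the environment. With this change `flang` is therefore preferred over `ifx`, `ifort`, `nagfor` and `gfortran` when several are installed, and `clang`/`clang++` over the Intel and GNU C/C++ compilers. The following is a rough sketch of that search, not the code Autoconf generates; `first_available` is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: print the first candidate found in PATH,
# scanning the arguments left to right.
first_available() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            printf '%s\n' "$prog"
            return 0
        fi
    done
    return 1
}

# Candidate list copied from the updated AC_PROG_FC call above.
FC_CANDIDATES="ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 flang ifx ifort ifc nagfor gfortran"

# Word splitting of $FC_CANDIDATES is intentional here.
fc=$(first_available $FC_CANDIDATES) || fc="(none found)"
echo "First Fortran compiler in search order: $fc"
```

To bypass the search order, the compiler can be named explicitly when configuring, for instance `FC=gfortran CC=gcc CXX=g++ ./configure --with-psblas=/path/to/psblas`.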

File diff suppressed because it is too large

@@ -1,976 +0,0 @@
dnl $Id$
dnl Process this file with autoconf to produce a configure script.
dnl
dnl usage : aclocal -I config/ && autoconf && ./configure && make
dnl then : VAR=VAL ./configure
dnl In some configurations (AIX) the next line is needed:
dnl MPIFC=mpxlf95 ./configure
dnl then : ./configure VAR=VAL
dnl then : ./configure --help=short
dnl then : ./configure --help
dnl the PSBLAS modules get this task difficult to accomplish!
dnl SEE : --module-path --include-path
dnl NOTE : There is no cross compilation support.
dnl NOTE : missing ifort and kl* library handling..
dnl NOTE : odd configurations like ifc + gcc still await in the mist of the unknown
###############################################################################
###############################################################################
#
# This script is used by the PSBLAS to determine the compilers, linkers, and
# libraries to build its libraries executable code.
# Its behaviour is driven on the compiler it finds or it is dictated to work
# with.
#
###############################################################################
###############################################################################
# NOTE: the literal for version (the second argument to AC_INIT should be a literal!)
AC_INIT([AMG4PSBLAS],1.1.0, [https://github.com/sfilippone/amg4psblas/issues])
# VERSION is the file containing the PSBLAS version code
# FIXME
amg4psblas_cv_version="1.1.0"
# A sample source file
AC_CONFIG_SRCDIR([amgprec/amg_prec_type.f90])
# Our custom M4 macros are in the 'config' directory
AC_CONFIG_MACRO_DIR([config])
AC_MSG_NOTICE([
--------------------------------------------------------------------------------
Welcome to the $PACKAGE_NAME $amg4psblas_cv_version configure Script.
This creates Make.inc, but if you read carefully the
documentation, you can make your own by hand for your needs.
./configure --with-psblas=/path/to/psblas
See ./configure --help=short fore more info.
--------------------------------------------------------------------------------
])
###############################################################################
# FLAGS and LIBS user customization
###############################################################################
dnl NOTE : no spaces before the comma, and no brackets before the second argument!
PAC_ARG_WITH_PSBLAS
PSBLAS_DIR="$pac_cv_psblas_dir";
PSBLAS_INCDIR="$pac_cv_psblas_incdir";
PSBLAS_MODDIR="$pac_cv_psblas_moddir";
PSBLAS_LIBDIR="$pac_cv_psblas_libdir";
AC_MSG_CHECKING([for PSBLAS install dir])
if test "X$PSBLAS_DIR" != "X" ; then
case $PSBLAS_DIR in
/*) ;;
*) AC_MSG_ERROR([The PSBLAS installation dir must be an absolute pathname
specified with --with-psblas=/path/to/psblas])
esac
if test ! -d "$PSBLAS_DIR" ; then
AC_MSG_ERROR([Could not find PSBLAS build dir $PSBLAS_DIR!])
fi
AC_MSG_RESULT([$PSBLAS_DIR])
fi
AM_INIT_AUTOMAKE
dnl Specify required version of autoconf.
AC_PREREQ(2.59)
#
# Installation.
#
#
AC_PROG_INSTALL
AC_MSG_CHECKING([where to install])
case $prefix in
\/* ) eval "INSTALL_DIR=$prefix";;
* ) eval "INSTALL_DIR=/usr/local/amg4psblas";;
esac
case $libdir in
\/* ) eval "INSTALL_LIBDIR=$libdir";;
* ) eval "INSTALL_LIBDIR=$INSTALL_DIR/lib";;
esac
case $includedir in
\/* ) eval "INSTALL_INCLUDEDIR=$includedir";;
* ) eval "INSTALL_INCLUDEDIR=$INSTALL_DIR/include";;
esac
INSTALL_MODULESDIR=$INSTALL_DIR/modules
case $docsdir in
\/* ) eval "INSTALL_DOCSDIR=$docsdir";;
* ) eval "INSTALL_DOCSDIR=$INSTALL_DIR/docs";;
esac
case $samplesdir in
\/* ) eval "INSTALL_SAMPLESDIR=$samplesdir";;
* ) eval "INSTALL_SAMPLESDIR=$INSTALL_DIR/samples";;
esac
AC_MSG_RESULT([$INSTALL_DIR $INSTALL_INCLUDEDIR $INSTALL_MODULESDIR $INSTALL_LIBDIR $INSTALL_DOCSDIR $INSTALL_SAMPLESDIR])
dnl
dnl We set our own FC flags, ignore those from AC_PROG_FC but not those from the
dnl environment variable. Same for C
dnl
save_FCFLAGS="$FCFLAGS";
AC_PROG_FC([ftn xlf2003_r xlf2003 xlf95_r xlf95 xlf90 xlf pgf95 pgf90 ifort ifc nagfor gfortran])
FCFLAGS="$save_FCFLAGS";
save_CFLAGS="$CFLAGS";
AC_PROG_CC([cc xlc pgcc icc gcc ])
CFLAGS="$save_CFLAGS";
save_CXXFLAGS="$CXXFLAGS";
AC_PROG_CXX([CC xlc++ icpc g++])
CXXFLAGS="$save_CXXFLAGS";
dnl AC_PROG_CXX
dnl AC_PROG_F90 doesn't exist, at the time of writing this !
dnl AC_PROG_F90
# Sanity checks, although redundant (useful when debugging this configure.ac)!
if test "X$FC" == "X" ; then
AC_MSG_ERROR([Problem : No Fortran compiler specified nor found!])
fi
if eval "$FC -qversion 2>&1 | grep XL 2>/dev/null" ; then
# Some configurations of the XLF want "-WF," prepended to -D.. flags.
# TODO : discover the exact conditions when the usage of -WF is needed.
amg_cv_define_prepend="-WF,"
if eval "$MPIFC -qversion 2>&1 | grep -e\"Version: 10\.\" 2>/dev/null"; then
FDEFINES="$amg_cv_define_prepend-DXLF_10 $FDEFINES"
fi
# Note : there could be problems with old xlf compiler versions ( <10.1 )
# since (as far as it is known to us) -WF, is not used in earlier versions.
# More problems could be undocumented yet.
fi
if test "X$CC" == "X" ; then
AC_MSG_ERROR([Problem : No C compiler specified nor found!])
fi
AC_PROG_CC_STDC()
if test "x$ac_cv_prog_cc_stdc" == "xno" ; then
AC_MSG_ERROR([Problem : Need a C99 compiler ! ])
else
C99OPT="$ac_cv_prog_cc_stdc";
fi
###############################################################################
# Suitable MPI compilers detection
###############################################################################
# Note: Someday we will contemplate a fake MPI - configured version of PSBLAS
###############################################################################
# First check whether the user required our serial (fake) mpi.
PAC_ARG_SERIAL_MPI
#Note : we miss the name of the Intel C compiler
if test x"$pac_cv_serial_mpi" == x"yes" ; then
FAKEMPI="fakempi.o";
MPIFC="$FC";
MPICC="$CC";
MPICXX="$CXX";
CXXDEFINES="-DSERIAL_MPI $CXXDEFINES";
else
AC_LANG([C])
if test "X$MPICC" = "X" ; then
# This is our MPICC compiler preference: it will override ACX_MPI's first try.
AC_CHECK_PROGS([MPICC],[mpxlc mpiicc mpcc mpicc cc])
fi
ACX_MPI([], [AC_MSG_ERROR([[Cannot find any suitable MPI implementation for C]])])
AC_LANG([Fortran])
if test "X$MPIFC" = "X" ; then
# This is our MPIFC compiler preference: it will override ACX_MPI's first try.
AC_CHECK_PROGS([MPIFC],[mpxlf2003_r mpxlf2003 mpxlf95_r mpxlf90 mpiifort mpf95 mpf90 mpifort mpif95 mpif90 ftn ])
fi
ACX_MPI([], [AC_MSG_ERROR([[Cannot find any suitable MPI implementation for Fortran]])])
AC_LANG([C++])
if test "X$MPICXX" = "X" ; then
# This is our MPICC compiler preference: it will override ACX_MPI's first try.
AC_CHECK_PROGS([MPICXX],[mpxlc++ mpiicpc mpicxx])
fi
ACX_MPI([], [AC_MSG_ERROR([[Cannot find any suitable MPI implementation for C++]])])
AC_LANG([Fortran])
FC="$MPIFC" ;
CC="$MPICC";
CXX="$MPICXX";
fi
AC_LANG([C])
dnl Now on, MPIFC should be set, and MPICC
###############################################################################
# Sanity checks, although redundant (useful when debugging this configure.ac)!
###############################################################################
if test "X$MPIFC" == "X" ; then
AC_MSG_ERROR([Problem : No MPI Fortran compiler specified nor found!])
fi
if test "X$MPICC" == "X" ; then
AC_MSG_ERROR([Problem : No MPI C compiler specified nor found!])
fi
###############################################################################
# FLAGS and LIBS user customization
###############################################################################
dnl NOTE : no spaces before the comma, and no brackets before the second argument!
PAC_ARG_WITH_FLAGS(ccopt,CCOPT)
PAC_ARG_WITH_FLAGS(cxxopt,CXXOPT)
PAC_ARG_WITH_FLAGS(fcopt,FCOPT)
PAC_ARG_WITH_LIBS
PAC_ARG_WITH_FLAGS(clibs,CLIBS)
PAC_ARG_WITH_FLAGS(flibs,FLIBS)
PAC_ARG_WITH_FLAGS(cxxlibs,CXXLIBS)
dnl candidates for removal:
PAC_ARG_WITH_FLAGS(library-path,LIBRARYPATH)
PAC_ARG_WITH_FLAGS(include-path,INCLUDEPATH)
PAC_ARG_WITH_FLAGS(module-path,MODULE_PATH)
# we just gave the user the chance to append values to these variables
PAC_ARG_WITH_EXTRA_LIBS
###############################################################################
# Sanity checks, although redundant (useful when debugging this configure.ac)!
###############################################################################
###############################################################################
# Compiler identification (sadly, it is necessary)
###############################################################################
psblas_cv_fc=""
dnl Do we use gfortran & co ? Compiler identification.
dnl NOTE : in /autoconf/autoconf/fortran.m4 there are plenty of better tests!
PAC_CHECK_HAVE_GFORTRAN(
[psblas_cv_fc="gcc"],
)
PAC_CHECK_HAVE_CRAYFTN(
[psblas_cv_fc="cray"],
)
if test x"$psblas_cv_fc" == "x" ; then
if eval "$MPIFC -qversion 2>&1 | grep XL 2>/dev/null" ; then
psblas_cv_fc="xlf"
# Some configurations of the XLF want "-WF," prepended to -D.. flags.
# TODO : discover the exact conditions when the usage of -WF is needed.
psblas_cv_define_prepend="-WF,"
if eval "$MPIFC -qversion 2>&1 | grep -e\"Version: 10\.\" 2>/dev/null"; then
FDEFINES="$psblas_cv_define_prepend-DXLF_10 $FDEFINES"
fi
# Note : there could be problems with old xlf compiler versions ( <10.1 )
# since (as far as it is known to us) -WF, is not used in earlier versions.
# More problems could be undocumented yet.
elif eval "$MPIFC -V 2>&1 | grep Sun 2>/dev/null" ; then
# Sun compiler detection
psblas_cv_fc="sun"
elif eval "$MPIFC -V 2>&1 | grep Portland 2>/dev/null" ; then
# Portland group compiler detection
psblas_cv_fc="pg"
elif eval "$MPIFC -V 2>&1 | grep Intel.*Fortran.*Compiler 2>/dev/null" ; then
# Intel compiler identification
psblas_cv_fc="ifc"
elif eval "$MPIFC -v 2>&1 | grep NAG 2>/dev/null" ; then
psblas_cv_fc="nag"
FC="$MPIFC"
else
psblas_cv_fc=""
# unsupported MPI Fortran compiler
AC_MSG_NOTICE([[Unknown Fortran compiler, proceeding with fingers crossed !]])
fi
fi
if test "X$psblas_cv_fc" == "Xgcc" ; then
PAC_HAVE_MODERN_GFORTRAN(
[],
[AC_MSG_ERROR([Bailing out.])]
)
fi
###############################################################################
# Linking, symbol mangling, and misc tests
###############################################################################
# Note : This is functional to Make.inc rules and structure (see below).
AC_LANG([C])
AC_CHECK_SIZEOF(void *)
# Define for platforms with 64 bit (void * ) pointers
if test X"$ac_cv_sizeof_void_p" == X"8" ; then
CDEFINES="-DPtr64Bits $CDEFINES"
fi
AC_LANG([Fortran])
__AC_FC_NAME_MANGLING
if test "X$psblas_cv_fc" == X"pg" ; then
FC=$save_FC
fi
AC_LANG([C])
dnl AC_MSG_NOTICE([Fortran name mangling: $ac_cv_fc_mangling])
[pac_fc_case=${ac_cv_fc_mangling%%,*}]
[pac_fc_under=${ac_cv_fc_mangling#*,}]
[pac_fc_sec_under=${pac_fc_under#*,}]
[pac_fc_sec_under=${pac_fc_sec_under# }]
[pac_fc_under=${pac_fc_under%%,*}]
[pac_fc_under=${pac_fc_under# }]
AC_MSG_CHECKING([defines for C/Fortran name interfaces])
if test "x$pac_fc_case" == "xlower case"; then
if test "x$pac_fc_under" == "xunderscore"; then
if test "x$pac_fc_sec_under" == "xno extra underscore"; then
pac_f_c_names="-DLowerUnderscore"
elif test "x$pac_fc_sec_under" == "xextra underscore"; then
pac_f_c_names="-DLowerDoubleUnderscore"
else
pac_f_c_names="-DUNKNOWN"
dnl AC_MSG_NOTICE([Fortran name mangling extra underscore unknown case])
fi
elif test "x$pac_fc_under" == "xno underscore"; then
pac_f_c_names="-DLowerCase"
else
pac_f_c_names="-DUNKNOWN"
dnl AC_MSG_NOTICE([Fortran name mangling underscore unknown case])
fi
elif test "x$pac_fc_case" == "xupper case"; then
if test "x$pac_fc_under" == "xunderscore"; then
if test "x$pac_fc_sec_under" == "xno extra underscore"; then
pac_f_c_names="-DUpperUnderscore"
elif test "x$pac_fc_sec_under" == "xextra underscore"; then
pac_f_c_names="-DUpperDoubleUnderscore"
else
pac_f_c_names="-DUNKNOWN"
dnl AC_MSG_NOTICE([Fortran name mangling extra underscore unknown case])
fi
elif test "x$pac_fc_under" == "xno underscore"; then
pac_f_c_names="-DUpperCase"
else
pac_f_c_names="-DUNKNOWN"
dnl AC_MSG_NOTICE([Fortran name mangling underscore unknown case])
fi
dnl AC_MSG_NOTICE([Fortran name mangling UPPERCASE not handled])
else
pac_f_c_names="-DUNKNOWN"
dnl AC_MSG_NOTICE([Fortran name mangling unknown case])
fi
CDEFINES="$pac_f_c_names $CDEFINES"
AC_MSG_RESULT([ $pac_f_c_names ])
###############################################################################
# Make.inc generation logic
###############################################################################
# Honor CFLAGS if they were specified explicitly, but --with-ccopt take precedence
if test "X$CCOPT" == "X" ; then
CCOPT="$CFLAGS";
fi
if test "X$CCOPT" == "X" ; then
if test "X$psblas_cv_fc" == "Xgcc" ; then
# note that no space should be placed around the equality symbol in assignements
# Note : 'native' is valid _only_ on GCC/x86 (32/64 bits)
CCOPT="-O3 $CCOPT"
elif test "X$psblas_cv_fc" == X"xlf" ; then
# XL compiler : consider using -qarch=auto
CCOPT="-O3 -qarch=auto $CCOPT"
elif test "X$psblas_cv_fc" == X"ifc" ; then
# other compilers ..
CCOPT="-O3 $CCOPT"
elif test "X$psblas_cv_fc" == X"pg" ; then
# other compilers ..
CCOPT="-fast $CCOPT"
# NOTE : PG & Sun use -fast instead -O3
elif test "X$psblas_cv_fc" == X"sun" ; then
# other compilers ..
CCOPT="-fast $CCOPT"
elif test "X$psblas_cv_fc" == X"cray" ; then
CCOPT="-O3 $CCOPT"
MPICC="cc"
elif test "X$psblas_cv_fc" == X"nag" ; then
# using GCC in conjunction with NAG.
CCOPT="-O2"
else
CCOPT="-O2 $CCOPT"
fi
fi
#CFLAGS="${CCOPT}"
if test "X$CXXOPT" == "X" ; then
CXXOPT="$CXXFLAGS";
fi
if test "X$CXXOPT" == "X" ; then
if test "X$psblas_cv_fc" == "Xgcc" ; then
# note that no space should be placed around the equality symbol in assignements
# Note : 'native' is valid _only_ on GCC/x86 (32/64 bits)
CXXOPT="-g -O3 $CXXOPT"
elif test "X$psblas_cv_fc" == X"xlf" ; then
# XL compiler : consider using -qarch=auto
CXXOPT="-O3 -qarch=auto $CXXOPT"
elif test "X$psblas_cv_fc" == X"ifc" ; then
# other compilers ..
CXXOPT="-O3 $CXXOPT"
elif test "X$psblas_cv_fc" == X"pg" ; then
# other compilers ..
CXXCOPT="-fast $CXXOPT"
# NOTE : PG & Sun use -fast instead -O3
elif test "X$psblas_cv_fc" == X"sun" ; then
# other compilers ..
CXXOPT="-fast $CXXOPT"
elif test "X$psblas_cv_fc" == X"cray" ; then
CXXOPT="-O3 $CXXOPT"
MPICXX="CC"
else
CXXOPT="-g -O3 $CXXOPT"
fi
fi
# Honor FCFLAGS if they were specified explicitly, but --with-fcopt take precedence
if test "X$FCOPT" == "X" ; then
FCOPT="$FCFLAGS";
fi
if test "X$FCOPT" == "X" ; then
if test "X$psblas_cv_fc" == "Xgcc" ; then
# note that no space should be placed around the equality symbol in assignations
# Note : 'native' is valid _only_ on GCC/x86 (32/64 bits)
FCOPT="-O3 $FCOPT"
elif test "X$psblas_cv_fc" == X"xlf" ; then
# XL compiler : consider using -qarch=auto
FCOPT="-O3 -qarch=auto -qlanglvl=extended -qxlf2003=polymorphic:autorealloc $FCOPT"
FCFLAGS="-qhalt=e -qlanglvl=extended -qxlf2003=polymorphic:autorealloc $FCFLAGS"
elif test "X$psblas_cv_fc" == X"ifc" ; then
# other compilers ..
FCOPT="-O3 $FCOPT"
elif test "X$psblas_cv_fc" == X"pg" ; then
# other compilers ..
FCOPT="-fast $FCOPT"
# NOTE : PG & Sun use -fast instead -O3
elif test "X$psblas_cv_fc" == X"sun" ; then
# other compilers ..
FCOPT="-fast $FCOPT"
elif test "X$psblas_cv_fc" == X"cray" ; then
FCOPT="-O3 -em $FCOPT"
elif test "X$psblas_cv_fc" == X"nag" ; then
# NAG compiler ..
FCOPT="-O2 "
# NOTE : PG & Sun use -fast instead -O3
else
FCOPT="-O2 $FCOPT"
fi
fi
if test "X$psblas_cv_fc" == X"nag" ; then
# Add needed options
FCOPT="$FCOPT -dcfuns -f2003 -wmismatch=mpi_scatterv,mpi_alltoallv,mpi_gatherv,mpi_allgatherv"
EXTRA_OPT="-mismatch_all"
fi
# COPT,FCOPT are aliases for CFLAGS,FCFLAGS .
##############################################################################
# Compilers variables selection
##############################################################################
FC=${FC}
CC=${CC}
CXX=${CXX}
CCOPT="$CCOPT $C99OPT"
##############################################################################
# Choice of our compilers, needed by Make.inc
##############################################################################
if test "X$psblas_cv_fc" == X"cray"
then
MODEXT=".mod"
FMFLAG="-I"
FIFLAG="-I"
BASEMODNAME=PSB_BASE_MOD
PRECMODNAME=PSB_PREC_MOD
METHDMODNAME=PSB_KRYLOV_MOD
UTILMODNAME=PSB_UTIL_MOD
else
AX_F90_MODULE_EXTENSION
AX_F90_MODULE_FLAG
MODEXT=".$ax_cv_f90_modext"
FMFLAG="${ax_cv_f90_modflag%%[ ]*}"
FIFLAG=-I
BASEMODNAME=psb_base_mod
PRECMODNAME=psb_prec_mod
METHDMODNAME=psb_krylov_mod
UTILMODNAME=psb_util_mod
fi
##############################################################################
# Choice of our compilers, needed by Make.inc
##############################################################################
if test "X$FLINK" == "X" ; then
FLINK=${MPF90}
fi
# Custom test : do we have a module or include for MPI Fortran interface?
if test x"$pac_cv_serial_mpi" == x"yes" ; then
FDEFINES="$psblas_cv_define_prepend-DSERIAL_MPI $psblas_cv_define_prepend-DMPI_MOD $FDEFINES";
else
PAC_FORTRAN_CHECK_HAVE_MPI_MOD_F08()
if test x"$pac_cv_mpi_f08" == x"yes" ; then
dnl FDEFINES="$psblas_cv_define_prepend-DMPI_MOD_F08 $FDEFINES";
FDEFINES="$psblas_cv_define_prepend-DMPI_MOD $FDEFINES";
else
PAC_FORTRAN_CHECK_HAVE_MPI_MOD(
[FDEFINES="$psblas_cv_define_prepend-DMPI_MOD $FDEFINES"],
[FDEFINES="$psblas_cv_define_prepend-DMPI_H $FDEFINES"])
fi
fi
FLINK="$MPIFC"
PAC_ARG_OPENMP()
if test x"$pac_cv_openmp" == x"yes" ; then
FDEFINES="$psblas_cv_define_prepend-DOPENMP $FDEFINES";
CDEFINES="-DOPENMP $CDEFINES";
FCOPT="$FCOPT $pac_cv_openmp_fcopt";
CCOPT="$CCOPT $pac_cv_openmp_ccopt";
FLINK="$FLINK $pac_cv_openmp_fcopt";
fi
PAC_FORTRAN_HAVE_PSBLAS([AC_MSG_RESULT([yes.])],
[AC_MSG_ERROR([no. Could not find working version of PSBLAS.])])
PAC_FORTRAN_PSBLAS_VERSION()
if test "x$pac_cv_psblas_major" == "xunknown"; then
AC_MSG_ERROR([PSBLAS version major "$pac_cv_psblas_major".])
fi
if test "x$pac_cv_psblas_minor" == "xunknown"; then
AC_MSG_ERROR([PSBLAS version minor "$pac_cv_psblas_minor".])
fi
if test "x$pac_cv_psblas_patchlevel" == "xunknown"; then
AC_MSG_ERROR([PSBLAS patchlevel "$pac_cv_psblas_patchlevel".])
fi
if (( $pac_cv_psblas_major < 3 )) ||
( (( $pac_cv_psblas_major == 3 )) && (( $pac_cv_psblas_minor < 8 ))) ; then
AC_MSG_ERROR([I need at least PSBLAS version 3.8.0])
else
AC_MSG_NOTICE([Am configuring with PSBLAS version $pac_cv_psblas_major.$pac_cv_psblas_minor.$pac_cv_psblas_patchlevel.])
fi
PAC_FORTRAN_PSBLAS_INTEGER_SIZES()
AC_MSG_NOTICE([PSBLAS size of LPK "$pac_cv_psblas_lpk".])
if test x"$pac_cv_psblas_lpk" == x8"" ; then
CXXDEFINES="-DBIT64 $CXXDEFINES";
fi
###############################################################################
# Parachute rules for ar and ranlib ... (could cause problems)
###############################################################################
if test "X$AR" == "X" ; then
AR="ar"
fi
if test "X$RANLIB" == "X" ; then
RANLIB="ranlib"
fi
# This should be portable
AR="${AR} -cur"
###############################################################################
# NOTE :
# Missing stuff :
# In the case the detected fortran compiler is ifort, icc or gcc
# should be valid options.
# The same for pg (Portland Group compilers).
###############################################################################
#
# Tests for support of various Fortran features; some of them are critical,
# some optional
#
#
# Critical features
#
PAC_FORTRAN_TEST_EXTENDS(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for EXTENDS.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_CLASS_TBP(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for CLASS and type bound procedures.
Please get a Fortran compiler that supports them, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_SOURCE(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for SOURCE= allocation.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_HAVE_MOVE_ALLOC(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for MOVE_ALLOC.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_ISO_C_BIND(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for ISO_C_BINDING.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_SAME_TYPE(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for SAME_TYPE_AS.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_EXTENDS_TYPE(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for EXTENDS_TYPE_OF.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_MOLD(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for MOLD= allocation.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_VOLATILE(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for VOLATILE])]
)
PAC_FORTRAN_TEST_ISO_C_BIND(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for ISO_C_BINDING.
Please get a Fortran compiler that supports it, e.g. GNU Fortran 4.8.])]
)
PAC_FORTRAN_TEST_ISO_FORTRAN_ENV(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for ISO_FORTRAN_ENV])]
)
PAC_FORTRAN_TEST_FINAL(
[],
[AC_MSG_ERROR([Sorry, cannot build PSBLAS without support for FINAL])]
)
#
# Optional features
#
PAC_FORTRAN_TEST_GENERICS(
[],
[FDEFINES="$psblas_cv_define_prepend-DHAVE_BUGGY_GENERICS $FDEFINES"]
)
PAC_FORTRAN_TEST_FLUSH(
[FDEFINES="$psblas_cv_define_prepend-DHAVE_FLUSH_STMT $FDEFINES"],
)
###############################################################################
# Additional pathname stuff (yes, it is redundant and confusing...)
###############################################################################
# -I
if test x"$INCLUDEPATH" != "x" ; then
FINCLUDES="$FINCLUDES $INCLUDEPATH"
CINCLUDES="$CINCLUDES $INCLUDEPATH"
fi
# -L
if test x"$LIBRARYPATH" != "x" ; then
FINCLUDES="$FINCLUDES $LIBRARYPATH"
fi
# -I
if test x"$MODULE_PATH" != "x" ; then
FINCLUDES="$FINCLUDES $MODULE_PATH"
fi
###############################################################################
# BLAS library presence checks
###############################################################################
# Note : The libmkl.a (Intel Math Kernel Library) library could be used, too.
# It is sufficient to specify it as -lmkl in the CLIBS or FLIBS or LIBS
# and specify its path adjusting -L/path in CFLAGS.
# Right now it is a matter of user's taste when linking custom applications.
# But PSBLAS examples could take advantage of these libraries, too.
AC_LANG([Fortran])
###############################################################################
# BLAS library presence checks
###############################################################################
# Note : The libmkl.a (Intel Math Kernel Library) library could be used, too.
# It is sufficient to specify it as -lmkl in the CLIBS or FLIBS or LIBS
# and specify its path adjusting -L/path in CFLAGS.
# Right now it is a matter of user's taste when linking custom applications.
# But PSBLAS examples could take advantage of these libraries, too.
PAC_BLAS([], [AC_MSG_ERROR([[Cannot find BLAS library, specify a path using --with-blas=DIR/LIB (for example --with-blas=/usr/path/lib/libcxml.a)]])])
PAC_LAPACK(
[FDEFINES="$psblas_cv_define_prepend-DHAVE_LAPACK $FDEFINES"],
)
AC_LANG([C])
###############################################################################
# BLACS library presence checks
###############################################################################
#AC_LANG([C])
#if test x"$pac_cv_serial_mpi" == x"no" ; then
#save_FC="$FC";
#save_CC="$CC";
#FC="$MPIFC";
#CC="$MPICC";
#PAC_CHECK_BLACS
#FC="$save_FC";
#CC="$save_CC";
#fi
PAC_MAKE_IS_GNUMAKE
###############################################################################
# Auxiliary packages
###############################################################################
PAC_CHECK_METIS
AC_MSG_CHECKING([Compatibility between metis and LPK])
if test "x$pac_cv_lpk_size" == "x4" ; then
if test "x$pac_cv_metis_idx" == "x64" ; then
dnl mismatch between metis size and PSBLAS LPK
psblas_cv_have_metis="no";
dnl
fi
fi
if test "x$pac_cv_lpk_size" == "x8" ; then
if test "x$pac_cv_metis_idx" == "x32" ; then
dnl mismatch between metis size and PSBLAS LPK
psblas_cv_have_metis="no";
fi
fi
AC_MSG_RESULT([$psblas_cv_have_metis])
if test "x$pac_cv_metis_idx" == "xunknown" ; then
dnl mismatch between metis size and PSBLAS LPK
AC_MSG_NOTICE([Unknown METIS bitsize.])
$psblas_cv_have_metis = "no";
fi
if test "x$pac_cv_metis_real" == "xunknown" ; then
dnl mismatch between metis size and PSBLAS LPK
AC_MSG_NOTICE([Unknown METIS REAL bitsize.])
$psblas_cv_have_metis = "no";
fi
if test "x$psblas_cv_have_metis" == "xyes" ; then
FDEFINES="$psblas_cv_define_prepend-DHAVE_METIS $psblas_cv_define_prepend-DMETIS_$pac_cv_metis_idx $psblas_cv_define_prepend-DMETIS_REAL_$pac_cv_metis_real $FDEFINES"
CDEFINES="-DHAVE_METIS_ $psblas_cv_metis_includes $CDEFINES -DMETIS_$pac_cv_metis_idx -DMETIS_REAL_$pac_cv_metis_real"
METISINCFILE=$psblas_cv_metisincfile
fi
PAC_CHECK_MUMPS
#
# 1. Enable even with LPK=8, internally it will check if
# the problem size fits into 4 bytes, very likely since we
# are mostly using MUMPS at coarse level.
#
dnl if test "x$amg4psblas_cv_have_mumps" == "xyes" ; then
dnl if test "x$pac_cv_psblas_ipk" == "x8" ; then
dnl AC_MSG_NOTICE([PSBLAS defines PSB_IPK_ as $pac_cv_psblas_ipk. MUMPS interfacing disabled. ])
dnl MUMPS_FLAGS="";
dnl MUMPS_LIBS="";
dnl amg4psblas_cv_have_mumps=no;
dnl fi
dnl fi
if test "x$amg4psblas_cv_have_mumps" == "xyes" ; then
if test "x$pac_cv_psblas_lpk" == "x8" ; then
AC_MSG_NOTICE([PSBLAS defines PSB_LPK_ as $pac_cv_psblas_lpk. MUMPS interfacing will fail when called in global mode on very large matrices. ])
fi
if test "x$pac_mumps_fmods_ok" == "xyes" ; then
FDEFINES="$amg_cv_define_prepend-DHAVE_MUMPS_ $amg_cv_define_prepend-DHAVE_MUMPS_MODULES_ $MUMPS_MODULES $FDEFINES"
MUMPS_FLAGS="-DHave_MUMPS_ $MUMPS_MODULES"
elif test "x$pac_mumps_fincs_ok" == "xyes" ; then
FDEFINES="$amg_cv_define_prepend-DHAVE_MUMPS_ $amg_cv_define_prepend-DHAVE_MUMPS_INCLUDES_ $MUMPS_FINCLUDES $FDEFINES"
MUMPS_FLAGS="-DHave_MUMPS_ $MUMPS_INCLUDES"
else
# This should not happen
MUMPS_FLAGS=""
MUMPS_LIBS=""
fi
else
MUMPS_FLAGS=""
MUMPS_LIBS=""
fi
PAC_CHECK_UMFPACK
if test "x$amg4psblas_cv_have_umfpack" == "xyes" ; then
UMF_FLAGS="-DHave_UMF_ $UMF_INCLUDES"
FDEFINES="$amg_cv_define_prepend-DHAVE_UMF_ $FDEFINES"
else
UMF_FLAGS=""
fi
PAC_CHECK_SUPERLU
if test "x$amg4psblas_cv_have_superlu" == "xyes" ; then
SLU_FLAGS="-DHave_SLU_ -DSLU_VERSION_$pac_slu_version $SLU_INCLUDES"
FDEFINES="$amg_cv_define_prepend-DHAVE_SLU_ $FDEFINES"
else
SLU_FLAGS=""
fi
PAC_CHECK_SUPERLUDIST()
if test "x$amg4psblas_cv_have_superludist" == "xyes" ; then
pac_sludist_version="$amg4psblas_cv_superludist_major$amg4psblas_cv_superludist_minor";
AC_MSG_NOTICE([Configuring with SuperLU_DIST version flag $pac_sludist_version])
SLUDIST_FLAGS="-DHave_SLUDist_ -DSLUD_VERSION_=$pac_sludist_version $SLUDIST_INCLUDES"
FDEFINES="$amg_cv_define_prepend-DHAVE_SLUDIST_ $FDEFINES"
else
SLUDIST_FLAGS=""
fi
##############################################
FINCLUDES="$PSBLAS_INCLUDES"
AMGFDEFINES="$FDEFINES"
AMGCDEFINES="$CDEFINES"
AMGCXXDEFINES="$CXXDEFINES"
LIBDIR=lib
BASELIBNAME=libpsb_base.a
PRECLIBNAME=libpsb_prec.a
METHDLIBNAME=libpsb_krylov.a
UTILLIBNAME=libpsb_util.a
AMGLIBNAME=libamg_prec.a
COMPILERULES='
PSBLDLIBS=$(LAPACK) $(BLAS) $(METIS_LIB) $(AMD_LIB) $(LIBS)
CXXDEFINES=$(PSBCXXDEFINES)
CDEFINES=$(PSBCDEFINES)
FDEFINES=$(PSBFDEFINES)
# These should be portable rules, arent they?
.c.o:
$(CC) $(CCOPT) $(CINCLUDES) $(CDEFINES) -c $< -o $@
.f90.o:
$(FC) $(FCOPT) $(FINCLUDES) -c $< -o $@
.F90.o:
$(FC) $(FCOPT) $(FINCLUDES) $(FDEFINES) -c $< -o $@
.cpp.o:
$(CXX) $(CXXOPT) $(CXXINCLUDES) $(CXXDEFINES) -c $< -o $@'
###############################################################################
# Variable substitutions : the Make.inc.in will have these @VARIABLES@
# substituted.
AC_SUBST(PSBLAS_DIR)
AC_SUBST(PSBLAS_INCDIR)
AC_SUBST(PSBLAS_MODDIR)
AC_SUBST(PSBLAS_LIBDIR)
AC_SUBST(PSBLAS_INCLUDES)
dnl AC_SUBST(PSBLAS_INSTALL_MAKEINC)
AC_SUBST(PSBLAS_LIBS)
AC_SUBST(PSBLAS_RULES)
AC_SUBST(INSTALL)
AC_SUBST(INSTALL_DATA)
AC_SUBST(INSTALL_DIR)
AC_SUBST(INSTALL_LIBDIR)
AC_SUBST(INSTALL_INCLUDEDIR)
AC_SUBST(INSTALL_MODULESDIR)
AC_SUBST(INSTALL_DOCSDIR)
AC_SUBST(INSTALL_SAMPLESDIR)
AC_SUBST(EXTRA_LIBS)
AC_SUBST(BLAS_LIBS)
AC_SUBST(LAPACK_LIBS)
AC_SUBST(METIS_LIBS)
AC_SUBST(MUMPS_FLAGS)
AC_SUBST(MUMPS_LIBS)
AC_SUBST(SLU_FLAGS)
AC_SUBST(SLU_LIBS)
AC_SUBST(UMF_FLAGS)
AC_SUBST(UMF_LIBS)
AC_SUBST(SLUDIST_FLAGS)
AC_SUBST(SLUDIST_LIBS)
AC_SUBST(AMGFDEFINES)
AC_SUBST(AMGCDEFINES)
AC_SUBST(AMGCXXDEFINES)
AC_SUBST(MODEXT)
AC_SUBST(COMPILERULES)
AC_SUBST(FDEFINES)
AC_SUBST(CDEFINES)
AC_SUBST(BASEMODNAME)
AC_SUBST(PRECMODNAME)
AC_SUBST(METHDMODNAME)
AC_SUBST(UTILMODNAME)
AC_SUBST(AMGLIBNAME)
AC_SUBST(MPIFC)
AC_SUBST(MPICC)
AC_SUBST(MPICXX)
AC_SUBST(FCOPT)
AC_SUBST(CCOPT)
AC_SUBST(CXXOPT)
AC_SUBST(EXTRA_OPT)
AC_SUBST(FAKEMPI)
AC_SUBST(FIFLAG)
AC_SUBST(FMFLAG)
AC_SUBST(MODEXT)
AC_SUBST(FLINK)
AC_SUBST(LIBS)
AC_SUBST(AR)
AC_SUBST(RANLIB)
AC_SUBST(MPIFC)
AC_SUBST(MPIFCC)
###############################################################################
# the following files will be created by configure (via config.status)
AC_CONFIG_FILES([Make_n.inc])
AC_OUTPUT()
###############################################################################
dnl Please note that brackets around variable identifiers are absolutely needed for compatibility.
AC_MSG_NOTICE([
${PACKAGE_NAME} ${amg4psblas_cv_version} has been configured as follows:
PSBLAS library : ${PSBLAS_DIR}
MUMPS detected : ${amg4psblas_cv_have_mumps}
SuperLU detected : ${amg4psblas_cv_have_superlu}
SuperLU_Dist detected : ${amg4psblas_cv_have_superludist}
UMFPack detected : ${amg4psblas_cv_have_umfpack}
If you are satisfied, run 'make' to build ${PACKAGE_NAME} and its documentation; otherwise
type ./configure --help=short for a complete list of configure options specific to ${PACKAGE_NAME}.
dnl To install the program and its documentation, run 'make install' if you are root,
dnl or run 'su -c "make install"' if you are not root.
])
###############################################################################

Binary file not shown.

@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR</span><br
class="newline" /> <span class="newline" /> <span
class="cmr-12">Software version: 1.2</span><br class="cmr-12">Software version: 1.2</span><br
class="newline" /><span class="newline" /><span
class="cmr-12">December 31st, 2025</span> class="cmr-12">December 23rd, 2025</span>

@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR</span><br
class="newline" /> <span class="newline" /> <span
class="cmr-12">Software version: 1.2</span><br class="cmr-12">Software version: 1.2</span><br
class="newline" /><span class="newline" /><span
class="cmr-12">December 31st, 2025</span> class="cmr-12">December 23rd, 2025</span>

@ -72,32 +72,34 @@ class="small-caps">n</span></span>
class="cmcsc-10x-x-120">PSBLAS</span><span class="cmcsc-10x-x-120">PSBLAS</span><span
class="cmr-12">) is a package of parallel algebraic multilevel preconditioners included in the</span> class="cmr-12">) is a package of parallel algebraic multilevel preconditioners included in the</span>
<span <span
class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It is a progress</span> class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framework. It</span>
<span <span
class="cmr-12">of a software development project started in 2007, named MLD2P4, which originally</span> class="cmr-12">is an evolution of a software development project started in 2007, named</span>
<span <span
class="cmr-12">implemented a multilevel version of some domain decomposition preconditioners of</span> class="cmr-12">MLD2P4, which originally implemented a multilevel version of some domain</span>
<span <span
class="cmr-12">additive-Schwarz type, and was based on a parallel decoupled version of the well known</span> class="cmr-12">decomposition preconditioners of additive-Schwarz type, and was based on a parallel</span>
<span <span
class="cmr-12">smoothed aggregation method to generate the multilevel hierarchy of coarser</span> class="cmr-12">decoupled version of the well known smoothed aggregation method to generate the</span>
<span <span
class="cmr-12">matrices. In the last years, within the context of the EU-H2020 EoCoE project</span> class="cmr-12">multilevel hierarchy of coarser matrices. In the last few years the package</span>
<span <span
class="cmr-12">(Energy Oriented Center of Excellence), the package was extended to include</span> class="cmr-12">was extended to include new algorithms and functionalities for the setup</span>
<span <span
class="cmr-12">new algorithms and functionalities for the setup and application of new AMG</span> class="cmr-12">and application of new AMG preconditioners with the final aims of improving</span>
<span <span
class="cmr-12">preconditioners with the final aims of improving efficiency and scalability when tens of</span> class="cmr-12">efficiency and scalability when tens of thousands of cores are used, and of boosting</span>
<span <span
class="cmr-12">thousands of cores are used, and of boosting reliability in dealing with general</span> class="cmr-12">reliability in dealing with general symmetric positive definite linear systems; these</span>
<span <span
class="cmr-12">symmetric positive definite linear systems. Due to the significant number</span> class="cmr-12">developments have been supported in the context of the EU-H2020 EoCoE</span>
<span
class="cmr-12">project (Energy Oriented Center of Excellence). Due to the significant number</span>
<span <span
class="cmr-12">of changes and the increase in scope, we decided to rename the package as</span> class="cmr-12">of changes and the increase in scope, we decided to rename the package as</span>
<span <span
class="cmr-12">AMG4PSBLAS.</span> class="cmr-12">AMG4PSBLAS.</span>
<!--l. 16--><p class="indent" > <span <!--l. 27--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS has been designed to provide scalable and easy-to-use</span> class="cmr-12">AMG4PSBLAS has been designed to provide scalable and easy-to-use</span>
<span <span
class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra</span> class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra</span>
@ -111,14 +113,14 @@ class="cmr-12">algebraic approach; therefore users level interfaces assume that
class="cmr-12">and preconditioners are represented as PSBLAS distributed sparse matrices.</span> class="cmr-12">and preconditioners are represented as PSBLAS distributed sparse matrices.</span>
<span <span
class="cmr-12">AMG4PSBLAS enables the user to easily specify different features of an algebraic</span> class="cmr-12">AMG4PSBLAS enables the user to easily specify different features of an algebraic</span>
<span
class="cmr-12">multilevel preconditioner, thus making it possible to experiment with different preconditioners for</span>
<span
class="cmr-12">multilevel preconditioner, thus making it possible to experiment with different preconditioners for</span>
<span <span
class="cmr-12">the problem and parallel computers at hand.</span> class="cmr-12">the problem and parallel computers at hand.</span>
<!--l. 27--><p class="indent" > <span <!--l. 39--><p class="indent" > <span
class="cmr-12">The package employs object-oriented design techniques in Fortran</span><span class="cmr-12">The package employs object-oriented design techniques in Fortran</span><span
class="cmr-12">&#x00A0;2003, with</span> class="cmr-12">&#x00A0;2003, with</span>
<span <span
@ -132,7 +134,7 @@ class="cmr-12">parallel implementation is based on a Single Program Multiple Dat
class="cmr-12">paradigm; the inter-process communication is based on MPI and is managed mainly</span> class="cmr-12">paradigm; the inter-process communication is based on MPI and is managed mainly</span>
<span <span
class="cmr-12">through PSBLAS.</span> class="cmr-12">through PSBLAS.</span>
<!--l. 35--><p class="indent" > <span <!--l. 47--><p class="indent" > <span
class="cmr-12">This guide provides a brief description of the functionalities and the user interface</span> class="cmr-12">This guide provides a brief description of the functionalities and the user interface</span>
<span <span
class="cmr-12">of AMG4PSBLAS.</span> class="cmr-12">of AMG4PSBLAS.</span>

@ -100,18 +100,18 @@ src="userhtml0x.png" alt="Ax = b,
id="x4-3001r1"></a></div> id="x4-3001r1"></a></div>
</td><td class="equation-label"><span </td><td class="equation-label"><span
class="cmr-12">(1)</span></td></tr></table> class="cmr-12">(1)</span></td></tr></table>
<!--l. 11--><p class="nopar" ><span <!--l. 13--><p class="nopar" ><span
class="cmr-12">where </span><span class="cmr-12">where </span><span
class="cmmi-12">A </span><span class="cmmi-12">A </span><span
class="cmr-12">is a square, real or complex, sparse symmetric positive definite (s.p.d)</span> class="cmr-12">is a square, real or complex, sparse symmetric positive definite (s.p.d)</span>
<span <span
class="cmr-12">matrix.</span> class="cmr-12">matrix.</span>
<!--l. 19--><p class="indent" > <span <!--l. 21--><p class="indent" > <span
class="cmr-12">The preconditioners implemented in AMG4PSBLAS are obtained by combining 3</span> class="cmr-12">The preconditioners implemented in AMG4PSBLAS are obtained by combining 3</span>
<span <span
class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. Available</span> class="cmr-12">different types of AMG cycles with smoothers and coarsest-level solvers. We provide a</span>
<span <span
class="cmr-12">multigrid cycles include the V-, W-, and a version of a Krylov-type cycle</span> class="cmr-12">number of multigrid cycles, including the V-, W-, and a version of a Krylov-type cycle</span>
<span <span
class="cmr-12">(K-cycle)</span><span class="cmr-12">(K-cycle)</span><span
class="cmr-12">&#x00A0;</span><span class="cite"><span class="cmr-12">&#x00A0;</span><span class="cite"><span
@ -140,7 +140,7 @@ href="userhtmlli3.html#XDDF2020"><span
class="cmr-12">14</span></a><span class="cmr-12">14</span></a><span
class="cmr-12">]</span></span><span class="cmr-12">]</span></span><span
class="cmr-12">.</span> class="cmr-12">.</span>
<!--l. 30--><p class="indent" > <span <!--l. 34--><p class="indent" > <span
class="cmr-12">An algebraic approach is used to generate a hierarchy of coarse-level matrices and</span> class="cmr-12">An algebraic approach is used to generate a hierarchy of coarse-level matrices and</span>
<span <span
class="cmr-12">operators, without explicitly using any information on the geometry of the original</span> class="cmr-12">operators, without explicitly using any information on the geometry of the original</span>
@ -150,7 +150,7 @@ class="cmr-12">problem, e.g., the discretization of a PDE. To this end, two diff
class="cmr-12">strategies, based on aggregation, are available:</span> class="cmr-12">strategies, based on aggregation, are available:</span>
<ul class="itemize1"> <ul class="itemize1">
<li class="itemize"> <li class="itemize">
<!--l. 35--><p class="noindent" ><span <!--l. 39--><p class="noindent" ><span
class="cmr-12">a decoupled version of the smoothed aggregation procedure proposed in</span><span class="cmr-12">a decoupled version of the smoothed aggregation procedure proposed in</span><span
class="cmr-12">&#x00A0;</span><span class="cite"><span class="cmr-12">&#x00A0;</span><span class="cite"><span
class="cmr-12">[</span><a class="cmr-12">[</span><a
@ -178,7 +178,7 @@ class="cmr-12">;</span>
</li> </li>
<li class="itemize"> <li class="itemize">
<!--l. 39--><p class="noindent" ><span <!--l. 43--><p class="noindent" ><span
class="cmr-12">a coupled, parallel implementation of the Coarsening based on Compatible</span> class="cmr-12">a coupled, parallel implementation of the Coarsening based on Compatible</span>
<span <span
class="cmr-12">Weighted Matching introduced in</span><span class="cmr-12">Weighted Matching introduced in</span><span
@ -198,7 +198,7 @@ href="userhtmlli3.html#XDDF2020"><span
class="cmr-12">14</span></a><span class="cmr-12">14</span></a><span
class="cmr-12">]</span></span><span class="cmr-12">]</span></span><span
class="cmr-12">;</span></li></ul> class="cmr-12">;</span></li></ul>
<!--l. 43--><p class="noindent" ><span <!--l. 47--><p class="noindent" ><span
class="cmr-12">Either exact or approximate solvers can be used on the coarsest-level system. We provide</span> class="cmr-12">Either exact or approximate solvers can be used on the coarsest-level system. We provide</span>
<span <span
class="cmr-12">interfaces to various parallel and sequential sparse LU factorizations from external</span> class="cmr-12">interfaces to various parallel and sequential sparse LU factorizations from external</span>
@ -210,7 +210,7 @@ class="cmr-12">parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solve
class="cmr-12">preconditioned Krylov methods; all smoothers can also be exploited as one-level</span> class="cmr-12">preconditioned Krylov methods; all smoothers can also be exploited as one-level</span>
<span <span
class="cmr-12">preconditioners.</span> class="cmr-12">preconditioners.</span>
<!--l. 50--><p class="indent" > <span <!--l. 55--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS is written in Fortran</span><span class="cmr-12">AMG4PSBLAS is written in Fortran</span><span
class="cmr-12">&#x00A0;2003, following an object-oriented design</span> class="cmr-12">&#x00A0;2003, following an object-oriented design</span>
<span <span
@ -225,12 +225,12 @@ class="cmr-12">Single and double precision implementations of AMG4PSBLAS are ava
class="cmr-12">for both the real and the complex case, which can be used through a single</span> class="cmr-12">for both the real and the complex case, which can be used through a single</span>
<span <span
class="cmr-12">interface.</span> class="cmr-12">interface.</span>
<!--l. 60--><p class="indent" > <span <!--l. 65--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use</span> class="cmr-12">AMG4PSBLAS has been designed to implement scalable and easy-to-use multilevel</span>
<span <span
class="cmr-12">multilevel preconditioners in the context of the PSBLAS (Parallel Sparse BLAS)</span> class="cmr-12">preconditioners in the context of the PSBLAS (Parallel Sparse BLAS) computational</span>
<span <span
class="cmr-12">computational framework</span><span class="cmr-12">framework</span><span
class="cmr-12">&#x00A0;</span><span class="cite"><span class="cmr-12">&#x00A0;</span><span class="cite"><span
class="cmr-12">[</span><a class="cmr-12">[</span><a
href="userhtmlli3.html#Xpsblas_00"><span href="userhtmlli3.html#Xpsblas_00"><span
@ -240,37 +240,35 @@ class="cmr-12">&#x00A0;</span><a
href="userhtmlli3.html#XPSBLAS3"><span href="userhtmlli3.html#XPSBLAS3"><span
class="cmr-12">22</span></a><span class="cmr-12">22</span></a><span
class="cmr-12">]</span></span><span class="cmr-12">]</span></span><span
class="cmr-12">. PSBLAS provides basic linear algebra operators</span> class="cmr-12">. PSBLAS provides basic linear algebra operators and data</span>
<span <span
class="cmr-12">and data management facilities for distributed sparse matrices, kernels for</span> class="cmr-12">management facilities for distributed sparse matrices, kernels for sequential incomplete</span>
<span <span
class="cmr-12">sequential incomplete factorizations needed for the parallel block-Jacobi and</span> class="cmr-12">factorizations needed for the parallel block-Jacobi and additive Schwarz smoothers, and</span>
<span <span
class="cmr-12">additive Schwarz smoothers, and parallel Krylov solvers which can be used with</span> class="cmr-12">parallel Krylov solvers which can be used with the AMG4PSBLAS preconditioners.</span>
<span <span
class="cmr-12">the AMG4PSBLAS preconditioners. The choice of PSBLAS has been mainly</span> class="cmr-12">The choice of PSBLAS has been mainly motivated by the need of having a portable</span>
<span <span
class="cmr-12">motivated by the need of having a portable and efficient software infrastructure</span> class="cmr-12">and efficient software infrastructure implementing &#8220;de facto&#8221; standard parallel sparse</span>
<span <span
class="cmr-12">implementing &#8220;de facto&#8221; standard parallel sparse linear algebra kernels, to</span> class="cmr-12">linear algebra kernels, to pursue goals such as performance, portability, modularity</span>
<span <span
class="cmr-12">pursue goals such as performance, portability, modularity and extensibility</span> class="cmr-12">and extensibility in the development of the preconditioner package. On the</span>
<span <span
class="cmr-12">in the development of the preconditioner package. On the other hand, the</span> class="cmr-12">other hand, the implementation of AMG4PSBLAS, which was driven by the</span>
<span <span
class="cmr-12">implementation of AMG4PSBLAS, which was driven by the need to face the exascale</span> class="cmr-12">need to face the exascale challenge, has led to some important revisions and</span>
<span <span
class="cmr-12">challenge, has led to some important revisions and extensions of the PSBLAS</span> class="cmr-12">extensions of the PSBLAS infrastructure. The inter-process communication</span>
<span <span
class="cmr-12">infrastructure. The inter-process communication required by AMG4PSBLAS</span> class="cmr-12">required by AMG4PSBLAS is encapsulated in the PSBLAS routines; therefore,</span>
<span <span
class="cmr-12">is encapsulated in the PSBLAS routines; therefore, AMG4PSBLAS can be</span> class="cmr-12">AMG4PSBLAS can be run on any parallel machine where PSBLAS implementations</span>
<span <span
class="cmr-12">run on any parallel machine where PSBLAS implementations are available.</span> class="cmr-12">are available. The most recent version of PSBLAS (release 3.9) includes a plug-in for</span>
<span <span
class="cmr-12">In the most recent version of PSBLAS (release 3.7), a plug-in for GPU is</span> class="cmr-12">GPU; it contains CUDA versions of main vector operations and of sparse</span>
<span
class="cmr-12">included; it includes CUDA versions of main vector operations and of sparse</span>
<span <span
class="cmr-12">matrix-vector multiplication, so that Krylov methods coupled with AMG4PSBLAS</span> class="cmr-12">matrix-vector multiplication, so that Krylov methods coupled with AMG4PSBLAS</span>
<span <span
@ -279,17 +277,17 @@ class="cmr-12">preconditioners relying on Jacobi and block-Jacobi smoothers with
class="cmr-12">approximate inverses on the blocks can be efficiently executed on clusters of</span> class="cmr-12">approximate inverses on the blocks can be efficiently executed on clusters of</span>
<span <span
class="cmr-12">GPUs.</span> class="cmr-12">GPUs.</span>
<!--l. 85--><p class="indent" > <span <!--l. 90--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS has a layered and modular software architecture where three main</span> class="cmr-12">AMG4PSBLAS has a layered and modular software architecture where three main</span>
<span <span
class="cmr-12">layers can be identified. The lower layer consists of the PSBLAS kernels, the middle</span> class="cmr-12">layers can be identified. The lower layer consists of the PSBLAS kernels, the middle</span>
<span <span
class="cmr-12">one implements the construction and application phases of the preconditioners, and the</span> class="cmr-12">one implements the construction and application phases of the preconditioners, and the</span>
<span
class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span>
<span
class="cmr-12">upper one provides a uniform interface to all the preconditioners. This architecture</span>
<span <span
class="cmr-12">allows for different levels of use of the package: a few black-box routines at the upper</span> class="cmr-12">allows for different levels of use of the package: a few black-box routines at the upper</span>
<span <span
@ -304,7 +302,7 @@ class="cmr-12">&#x00A0;</span><a
href="userhtmlse6.html#x9-310006"><span href="userhtmlse6.html#x9-310006"><span
class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span class="cmr-12">6</span><!--tex4ht:ref: sec:adding --></a><span
class="cmr-12">).</span> class="cmr-12">).</span>
<!--l. 96--><p class="indent" > <span <!--l. 102--><p class="indent" > <span
class="cmr-12">This guide is organized as follows. General information on the distribution of the</span> class="cmr-12">This guide is organized as follows. General information on the distribution of the</span>
<span <span
class="cmr-12">source code is reported in Section</span><span class="cmr-12">source code is reported in Section</span><span

@ -58,10 +58,10 @@ class="cmr-12">. Most Fortran compilers provide this feature; in particular, thi
<span <span
class="cmr-12">supported by the GNU Fortran compiler, for which we recommend to use at least</span> class="cmr-12">supported by the GNU Fortran compiler, for which we recommend to use at least</span>
<span <span
class="cmr-12">version 4.8. The software defines data types and interfaces for real and complex data,</span> class="cmr-12">version 12. The software defines data types and interfaces for real and complex data, in</span>
<span <span
class="cmr-12">in both single and double precision.</span> class="cmr-12">both single and double precision.</span>
<!--l. 20--><p class="indent" > <span <!--l. 19--><p class="indent" > <span
class="cmr-12">Building AMG4PSBLAS requires some base libraries (see Section</span><span class="cmr-12">Building AMG4PSBLAS requires some base libraries (see Section</span><span
class="cmr-12">&#x00A0;</span><a class="cmr-12">&#x00A0;</span><a
href="#x6-80003.1"><span href="#x6-80003.1"><span
@ -3911,7 +3911,7 @@ class="cmtt-12">issues</span></span><span style="color:#000000"><span
class="cmtt-12">&#x003E;.</span></span> class="cmtt-12">&#x003E;.</span></span>
</pre> </pre>
<!--l. 160--><p class="noindent" ><span <!--l. 160--><p class="noindent" ><span
class="cmr-12">For instance, if a user has built and installed PSBLAS 3.7 under the </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">For instance, if a user has built and installed PSBLAS 3.9 under the </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">/opt</span></span></span> <span class="cmtt-12">/opt</span></span></span> <span
class="cmr-12">directory and is</span> class="cmr-12">directory and is</span>
<span <span
@ -3922,7 +3922,7 @@ class="cmr-12">might be configured with:</span>
<pre class="verbatim" id="verbatim-4"> <pre class="verbatim" id="verbatim-4">
./configure&#x00A0;--with-psblas=/opt/psblas-3.7/&#x00A0;\ ./configure&#x00A0;--with-psblas=/opt/psblas-3.9/&#x00A0;\
--with-umfpackincdir=/usr/include/suitesparse/ --with-umfpackincdir=/usr/include/suitesparse/
</pre> </pre>
<!--l. 172--><p class="nopar" > <span <!--l. 172--><p class="nopar" > <span

@ -351,7 +351,7 @@ class="content">Preconditioner types, corresponding strings and default choices.
</div><hr class="endfloat" /> </div><hr class="endfloat" />
</div> </div>
<!--l. 98--><p class="indent" > <span <!--l. 97--><p class="indent" > <span
class="cmr-12">Note that the module </span><code class="lstinline"><span style="color:#000000">amg_prec_mod</span></code><span class="cmr-12">Note that the module </span><code class="lstinline"><span style="color:#000000">amg_prec_mod</span></code><span
class="cmr-12">, containing the definition of the preconditioner</span> class="cmr-12">, containing the definition of the preconditioner</span>
<span <span
@ -370,7 +370,7 @@ class="cmr-12">4.1</span><!--tex4ht:ref: sec:examples --></a><span
class="cmr-12">).</span> class="cmr-12">).</span>
<br <br
class="newline" /> class="newline" />
<!--l. 105--><p class="indent" > <span <!--l. 104--><p class="indent" > <span
class="cmbx-12">Remark 1. </span><span class="cmbx-12">Remark 1. </span><span
class="cmr-12">Coarsest-level solvers based on the LU factorization, such as those</span> class="cmr-12">Coarsest-level solvers based on the LU factorization, such as those</span>
<span <span
@ -385,11 +385,24 @@ class="cmr-12">problems. However, this does not necessarily correspond to the sh
<span <span
class="cmr-12">on parallel</span><span class="cmr-12">on parallel</span><span
class="cmr-12">&#x00A0;computers.</span> class="cmr-12">&#x00A0;computers.</span>
<!--l. 112--><p class="indent" > <span
class="cmbx-12">Remark 2. </span><span
class="cmr-12">Memory allocation on GPUs is a costly operation implying a</span>
<span
class="cmr-12">synchronization; therefore, it is convenient to preallocate internal preconditioner</span>
<span
class="cmr-12">workspace with the method </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">prec%allocate_wrk(info)</span></span></span> <span
class="cmr-12">before invoking an iterative</span>
<span
class="cmr-12">method, and release it upon exit with </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">prec%deallocate_wrk(info)</span></span></span><span
class="cmr-12">.</span>
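The workflow described in Remark 2 can be sketched as follows. This is a minimal illustration, not code taken from the package: the subroutine name `solve_with_preallocated_wrk` is hypothetical, the choice of CG and the tolerance value are assumptions, and the matrix, descriptor, vectors and preconditioner are assumed to have been assembled and built by the caller. Only the `allocate_wrk(info)` and `deallocate_wrk(info)` calls come from the remark itself.

```fortran
! Hypothetical helper: preallocate the preconditioner workspace once,
! run the Krylov solver, then release the workspace.
subroutine solve_with_preallocated_wrk(a, desc_a, prec, b, x, info)
  use psb_base_mod
  use amg_prec_mod
  use psb_krylov_mod
  implicit none
  type(psb_dspmat_type), intent(in)    :: a       ! assembled sparse matrix
  type(psb_desc_type),   intent(in)    :: desc_a  ! communication descriptor
  type(amg_dprec_type),  intent(inout) :: prec    ! preconditioner, already built
  type(psb_d_vect_type), intent(inout) :: b, x    ! right-hand side and solution
  integer(psb_ipk_),     intent(out)   :: info
  integer(psb_ipk_) :: info_wrk
  real(psb_dpk_)    :: tol

  tol = 1.0e-6_psb_dpk_   ! assumed stopping tolerance

  ! Allocate the internal workspace before entering the iterative method,
  ! so that no (synchronizing) GPU allocation happens inside the iterations.
  call prec%allocate_wrk(info)
  if (info /= 0) return

  ! CG is an assumption; any PSBLAS Krylov method could be used here.
  call psb_krylov('CG', a, prec, b, x, tol, desc_a, info)

  ! Release the workspace on exit, keeping the solver status in info
  ! unless the solver succeeded and the deallocation failed.
  call prec%deallocate_wrk(info_wrk)
  if (info == 0) info = info_wrk
end subroutine solve_with_preallocated_wrk
```

Keeping the allocation outside the solver call means that repeated solves with the same preconditioner can share one workspace, at the cost of the caller having to remember the matching `deallocate_wrk`.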
<h4 class="subsectionHead"><span class="titlemark"><span <h4 class="subsectionHead"><span class="titlemark"><span
class="cmr-12">4.1 </span></span> <a class="cmr-12">4.1 </span></span> <a
id="x7-140004.1"></a><span id="x7-140004.1"></a><span
class="cmr-12">Examples</span></h4> class="cmr-12">Examples</span></h4>
<!--l. 116--><p class="noindent" ><span <!--l. 121--><p class="noindent" ><span
class="cmr-12">The code reported in Figure</span><span class="cmr-12">The code reported in Figure</span><span
class="cmr-12">&#x00A0;</span><a class="cmr-12">&#x00A0;</span><a
href="#x7-14001r1"><span href="#x7-14001r1"><span
@ -418,7 +431,7 @@ class="cmr-12">and </span><code class="lstinline"><span style="color:#000000">ps
class="cmr-12">must be used by the example</span> class="cmr-12">must be used by the example</span>
<span <span
class="cmr-12">program.</span> class="cmr-12">program.</span>
<!--l. 126--><p class="indent" > <span <!--l. 131--><p class="indent" > <span
class="cmr-12">The part of the code dealing with reading and assembling the sparse matrix and the</span> class="cmr-12">The part of the code dealing with reading and assembling the sparse matrix and the</span>
<span <span
class="cmr-12">right-hand side vector and the deallocation of the relevant data structures, performed</span> class="cmr-12">right-hand side vector and the deallocation of the relevant data structures, performed</span>
@ -451,7 +464,7 @@ href="userhtmlli3.html#XPSBLASGUIDE"><span
class="cmr-12">21</span></a><span class="cmr-12">21</span></a><span
class="cmr-12">]</span></span><span class="cmr-12">]</span></span><span
class="cmr-12">.</span> class="cmr-12">.</span>
<!--l. 138--><p class="indent" > <span <!--l. 143--><p class="indent" > <span
class="cmr-12">The setup and application of the default multilevel preconditioner for the real single</span> class="cmr-12">The setup and application of the default multilevel preconditioner for the real single</span>
<span <span
class="cmr-12">precision and the complex, single and double precision, versions are obtained</span> class="cmr-12">precision and the complex, single and double precision, versions are obtained</span>
@ -461,6 +474,9 @@ class="cmr-12">&#x00A0;</span><a
href="userhtmlse5.html#x8-160005"><span href="userhtmlse5.html#x8-160005"><span
class="cmr-12">5</span><!--tex4ht:ref: sec:userinterface --></a> <span class="cmr-12">5</span><!--tex4ht:ref: sec:userinterface --></a> <span
class="cmr-12">for</span> class="cmr-12">for</span>
<span <span
class="cmr-12">details). If these versions are installed, the corresponding codes are available in</span> class="cmr-12">details). If these versions are installed, the corresponding codes are available in</span>
<span class="obeylines-h"><span class="verb"><span <span class="obeylines-h"><span class="verb"><span
@ -470,7 +486,7 @@ class="cmr-12">.</span>
<!--l. 144--><p class="indent" > <a <!--l. 148--><p class="indent" > <a
id="x7-14001r1"></a><hr class="float"><div class="float" id="x7-14001r1"></a><hr class="float"><div class="float"
> >
@ -478,7 +494,7 @@ class="cmr-12">.</span>
<div class="center" <div class="center"
> >
<!--l. 145--><p class="noindent" > <!--l. 149--><p class="noindent" >
@ -535,7 +551,7 @@ class="cmr-12">.</span>
&#x00A0;&#x00A0;call&#x00A0;psb_exit(ctxt) &#x00A0;&#x00A0;call&#x00A0;psb_exit(ctxt)
&#x00A0;&#x00A0;stop &#x00A0;&#x00A0;stop
</pre> </pre>
<!--l. 255--><p class="nopar" > </div> <!--l. 259--><p class="nopar" > </div>
@ -548,7 +564,7 @@ class="content">setup and application of the default multilevel preconditioner (
</div><hr class="endfloat" /> </div><hr class="endfloat" />
<!--l. 264--><p class="indent" > <span <!--l. 267--><p class="indent" > <span
class="cmr-12">Different versions of the multilevel preconditioner can be obtained by changing the</span> class="cmr-12">Different versions of the multilevel preconditioner can be obtained by changing the</span>
<span <span
class="cmr-12">default values of the preconditioner parameters. The code reported in Figure</span><span class="cmr-12">default values of the preconditioner parameters. The code reported in Figure</span><span
@ -557,42 +573,40 @@ href="#x7-14002r2"><span
class="cmr-12">2</span><!--tex4ht:ref: fig:ex2 --></a> <span class="cmr-12">2</span><!--tex4ht:ref: fig:ex2 --></a> <span
class="cmr-12">shows</span> class="cmr-12">shows</span>
<span <span
class="cmr-12">how to set a V-cycle preconditioner which applies 1 block-Jacobi sweep as pre-</span> class="cmr-12">how to set a V-cycle preconditioner which applies 1 block-Jacobi sweep as pre- and</span>
<span <span
class="cmr-12">and post-smoother, and solves the coarsest-level system with 8 block-Jacobi</span> class="cmr-12">post-smoother, and solves the coarsest-level system with 8 block-Jacobi sweeps. Note</span>
<span <span
class="cmr-12">sweeps. Note that the ILU(0) factorization (plus triangular solve) is used as</span> class="cmr-12">that the ILU(0) factorization (plus triangular solve) is used as local solver for the</span>
<span <span
class="cmr-12">local solver for the block-Jacobi sweeps, since this is the default associated</span> class="cmr-12">block-Jacobi sweeps, since this is the default associated with block-Jacobi and set</span>
<span <span
class="cmr-12">with block-Jacobi and set by</span><span class="cmr-12">by</span><span
class="cmr-12">&#x00A0;</span><code class="lstinline"><span style="color:#000000">P</span><span style="color:#000000">%</span><span style="color:#000000">init</span></code><span class="cmr-12">&#x00A0;</span><code class="lstinline"><span style="color:#000000">P</span><span style="color:#000000">%</span><span style="color:#000000">init</span></code><span
class="cmr-12">. Furthermore, specifying block-Jacobi as</span> class="cmr-12">. Furthermore, specifying block-Jacobi as coarsest-level solver implies that</span>
<span
class="cmr-12">coarsest-level solver implies that the coarsest-level matrix is distributed among</span>
<span <span
class="cmr-12">the processes. Figure</span><span class="cmr-12">the coarsest-level matrix is distributed among the processes. Figure</span><span
class="cmr-12">&#x00A0;</span><a class="cmr-12">&#x00A0;</span><a
href="#x7-14003r3"><span href="#x7-14003r3"><span
class="cmr-12">3</span><!--tex4ht:ref: fig:ex3 --></a> <span class="cmr-12">3</span><!--tex4ht:ref: fig:ex3 --></a> <span
class="cmr-12">shows how to set a W-cycle preconditioner using the</span> class="cmr-12">shows how</span>
<span <span
class="cmr-12">Coarsening based on Compatible Weighted Matching, aggregates of size at</span> class="cmr-12">to set a W-cycle preconditioner using the Coarsening based on Compatible</span>
<span <span
class="cmr-12">most 8 and smoothed prolongators. It applies 2 hybrid Gauss-Seidel sweeps as</span> class="cmr-12">Weighted Matching, aggregates of size at most 8 and smoothed prolongators. It</span>
<span <span
class="cmr-12">pre- and post-smoother, and solves the coarsest-level system with the parallel</span> class="cmr-12">applies 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother, and solves the</span>
<span <span
class="cmr-12">flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi</span> class="cmr-12">coarsest-level system with the parallel flexible Conjugate Gradient method (KRM)</span>
<span <span
class="cmr-12">preconditioner having ILU(0) on the blocks. Default parameters are used for stopping</span> class="cmr-12">coupled with the block-Jacobi preconditioner having ILU(0) on the blocks, with</span>
<span <span
class="cmr-12">criterion of the coarsest solver. Note that, also in this case, specifying KRM as</span> class="cmr-12">default parameters used for the coarsest solver. Note that specifying KRM as</span>
<span <span
class="cmr-12">coarsest-level solver implies that the coarsest-level matrix is distributed among the</span> class="cmr-12">coarsest-level solver implies that the coarsest-level matrix is distributed among the</span>
<span <span
class="cmr-12">processes.</span> class="cmr-12">processes.</span>
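To make the notion of a "sweep" concrete, the following self-contained sketch applies 8 point-Jacobi sweeps to a small tridiagonal system and prints the residual norm after each one. It is only an illustration under simplifying assumptions: it uses plain point-Jacobi on a dense 4x4 matrix on a single process, not the block-Jacobi smoother with ILU(0) on the local blocks used by AMG4PSBLAS. The program name and all variable names are hypothetical and are not part of the library.

```fortran
! Illustrative sketch only: point-Jacobi sweeps on a small dense system.
! This is NOT the AMG4PSBLAS implementation; all names are hypothetical.
program jacobi_sweeps_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: a(n,n), b(n), x(n), d(n)
  integer  :: i, sweep

  ! Strictly diagonally dominant tridiagonal matrix, so Jacobi converges
  a = 0.0_dp
  do i = 1, n
    a(i,i) = 4.0_dp
    if (i > 1) a(i,i-1) = -1.0_dp
    if (i < n) a(i,i+1) = -1.0_dp
    d(i) = a(i,i)
  end do
  b = 1.0_dp
  x = 0.0_dp

  do sweep = 1, 8
    ! One sweep: x <- x + D^{-1} (b - A x).
    ! Every entry is updated from the previous iterate only, so no
    ! triangular solve is involved.
    x = x + (b - matmul(a, x)) / d
    print '(a,i2,a,es10.3)', 'sweep ', sweep, '  residual norm = ', &
         norm2(b - matmul(a, x))
  end do
end program jacobi_sweeps_demo
```

In a block-Jacobi sweep the division by the diagonal is replaced by an (approximate) solve with the local diagonal block of each process, which in the default configuration is the ILU(0) factorization followed by triangular solves.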
<!--l. 291--><p class="indent" > <span <!--l. 299--><p class="indent" > <span
class="cmr-12">The code fragments shown in Figures</span><span class="cmr-12">The code fragments shown in Figures</span><span
class="cmr-12">&#x00A0;</span><a class="cmr-12">&#x00A0;</span><a
href="#x7-14002r2"><span href="#x7-14002r2"><span
@ -605,7 +619,7 @@ class="cmr-12">are included in the example program</span>
class="cmr-12">file </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">file </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">amg_dexample_ml.f90</span></span></span> <span class="cmtt-12">amg_dexample_ml.f90</span></span></span> <span
class="cmr-12">too.</span> class="cmr-12">too.</span>
<!--l. 294--><p class="indent" > <span <!--l. 302--><p class="indent" > <span
class="cmr-12">Finally, Figure</span><span class="cmr-12">Finally, Figure</span><span
class="cmr-12">&#x00A0;</span><a class="cmr-12">&#x00A0;</span><a
href="#x7-14004r4"><span href="#x7-14004r4"><span
@ -620,7 +634,7 @@ class="cmr-12">nonsymmetric. The corresponding example program is available in t
<span class="obeylines-h"><span class="verb"><span <span class="obeylines-h"><span class="verb"><span
class="cmtt-12">amg_dexample_1lev.f90</span></span></span><span class="cmtt-12">amg_dexample_1lev.f90</span></span></span><span
class="cmr-12">.</span> class="cmr-12">.</span>
<!--l. 301--><p class="indent" > <span <!--l. 309--><p class="indent" > <span
class="cmr-12">For all the previous preconditioners, example programs where the sparse matrix</span> class="cmr-12">For all the previous preconditioners, example programs where the sparse matrix</span>
<span <span
class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet</span> class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet</span>
@ -631,7 +645,7 @@ class="cmr-12">.</span>
<!--l. 304--><p class="indent" > <a <!--l. 312--><p class="indent" > <a
id="x7-14002r2"></a><hr class="float"><div class="float" id="x7-14002r2"></a><hr class="float"><div class="float"
> >
@ -639,7 +653,7 @@ class="cmr-12">.</span>
<div class="center" <div class="center"
> >
<!--l. 318--><p class="noindent" > <!--l. 326--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-8"> <div class="minipage"><pre class="verbatim" id="verbatim-8">
...&#x00A0;... ...&#x00A0;...
!&#x00A0;build&#x00A0;a&#x00A0;V-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;1&#x00A0;block-Jacobi&#x00A0;sweep&#x00A0;(with !&#x00A0;build&#x00A0;a&#x00A0;V-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;1&#x00A0;block-Jacobi&#x00A0;sweep&#x00A0;(with
@ -653,7 +667,7 @@ class="cmr-12">.</span>
&#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info) &#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info)
...&#x00A0;... ...&#x00A0;...
</pre> </pre>
<!--l. 333--><p class="nopar" > </div></div> <!--l. 341--><p class="nopar" > </div></div>
<br /><div class="caption" <br /><div class="caption"
><span class="id">Listing 2: </span><span ><span class="id">Listing 2: </span><span
class="content">setup of a multilevel preconditioner based on the default decoupled coarsening</span></div><!--tex4ht:label?: x7-14002r2 --> class="content">setup of a multilevel preconditioner based on the default decoupled coarsening</span></div><!--tex4ht:label?: x7-14002r2 -->
@ -664,7 +678,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
<!--l. 340--><p class="indent" > <a <!--l. 348--><p class="indent" > <a
id="x7-14003r3"></a><hr class="float"><div class="float" id="x7-14003r3"></a><hr class="float"><div class="float"
> >
@ -672,7 +686,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
<div class="center" <div class="center"
> >
<!--l. 362--><p class="noindent" > <!--l. 370--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-9"> <div class="minipage"><pre class="verbatim" id="verbatim-9">
...&#x00A0;... ...&#x00A0;...
!&#x00A0;build&#x00A0;a&#x00A0;W-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;2&#x00A0;hybrid&#x00A0;Gauss-Seidel&#x00A0;sweeps !&#x00A0;build&#x00A0;a&#x00A0;W-cycle&#x00A0;preconditioner&#x00A0;with&#x00A0;2&#x00A0;hybrid&#x00A0;Gauss-Seidel&#x00A0;sweeps
@ -692,7 +706,7 @@ class="content">setup of a multilevel preconditioner based on the default decoup
&#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info) &#x00A0;&#x00A0;call&#x00A0;P%smoothers_build(A,desc_A,info)
...&#x00A0;... ...&#x00A0;...
</pre> </pre>
<!--l. 383--><p class="nopar" > </div></div> <!--l. 391--><p class="nopar" > </div></div>
<br /> <div class="caption" <br /> <div class="caption"
><span class="id">Listing 3: </span><span ><span class="id">Listing 3: </span><span
class="content">setup of a multilevel preconditioner based on the coupled coarsening using class="content">setup of a multilevel preconditioner based on the coupled coarsening using
@ -704,7 +718,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
<!--l. 390--><p class="indent" > <a <!--l. 398--><p class="indent" > <a
id="x7-14004r4"></a><hr class="float"><div class="float" id="x7-14004r4"></a><hr class="float"><div class="float"
> >
@ -712,7 +726,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
<div class="center" <div class="center"
> >
<!--l. 402--><p class="noindent" > <!--l. 410--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-10"> <div class="minipage"><pre class="verbatim" id="verbatim-10">
...&#x00A0;... ...&#x00A0;...
!&#x00A0;set&#x00A0;RAS&#x00A0;with&#x00A0;overlap&#x00A0;2&#x00A0;and&#x00A0;ILU(0)&#x00A0;on&#x00A0;the&#x00A0;local&#x00A0;blocks !&#x00A0;set&#x00A0;RAS&#x00A0;with&#x00A0;overlap&#x00A0;2&#x00A0;and&#x00A0;ILU(0)&#x00A0;on&#x00A0;the&#x00A0;local&#x00A0;blocks
@ -723,7 +737,7 @@ weighted matching</span></div><!--tex4ht:label?: x7-14003r3 -->
!&#x00A0;solve&#x00A0;Ax=b&#x00A0;with&#x00A0;preconditioned&#x00A0;BiCGSTAB !&#x00A0;solve&#x00A0;Ax=b&#x00A0;with&#x00A0;preconditioned&#x00A0;BiCGSTAB
&#x00A0;&#x00A0;call&#x00A0;psb_krylov(&#8217;BICGSTAB&#8217;,A,P,b,x,tol,desc_A,info) &#x00A0;&#x00A0;call&#x00A0;psb_krylov(&#8217;BICGSTAB&#8217;,A,P,b,x,tol,desc_A,info)
</pre> </pre>
<!--l. 414--><p class="nopar" > </div></div> <!--l. 422--><p class="nopar" > </div></div>
<br /> <div class="caption" <br /> <div class="caption"
><span class="id">Listing 4: </span><span ><span class="id">Listing 4: </span><span
class="content">setup of a one-level Schwarz preconditioner.</span></div><!--tex4ht:label?: x7-14004r4 --> class="content">setup of a one-level Schwarz preconditioner.</span></div><!--tex4ht:label?: x7-14004r4 -->
@ -735,7 +749,7 @@ class="content">setup of a one-level Schwarz preconditioner.</span></div><!--tex
class="cmr-12">4.2 </span></span> <a class="cmr-12">4.2 </span></span> <a
id="x7-150004.2"></a><span id="x7-150004.2"></a><span
class="cmr-12">GPU example</span></h4> class="cmr-12">GPU example</span></h4>
<!--l. 426--><p class="noindent" ><span <!--l. 434--><p class="noindent" ><span
class="cmr-12">The code discussed here shows how to set up a program exploiting the combined GPU</span> class="cmr-12">The code discussed here shows how to set up a program exploiting the combined GPU</span>
<span <span
class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the</span> class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the</span>
@ -743,14 +757,14 @@ class="cmr-12">capabilities of PSBLAS and AMG4PSBLAS. The code example is availa
class="cmr-12">source distribution directory </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">source distribution directory </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">amg4psblas/examples/gpu</span></span></span><span class="cmtt-12">amg4psblas/examples/gpu</span></span></span><span
class="cmr-12">.</span> class="cmr-12">.</span>
<!--l. 431--><p class="indent" > <span <!--l. 439--><p class="indent" > <span
class="cmr-12">First of all, we need to include the appropriate modules and declare some auxiliary</span> class="cmr-12">First of all, we need to include the appropriate modules and declare some auxiliary</span>
<span <span
class="cmr-12">variables:</span> class="cmr-12">variables:</span>
<!--l. 433--><p class="indent" > <a <!--l. 441--><p class="indent" > <a
id="x7-15001r5"></a><hr class="float"><div class="float" id="x7-15001r5"></a><hr class="float"><div class="float"
> >
@ -758,7 +772,7 @@ class="cmr-12">variables:</span>
<div class="center" <div class="center"
> >
<!--l. 452--><p class="noindent" > <!--l. 460--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-11"> <div class="minipage"><pre class="verbatim" id="verbatim-11">
program&#x00A0;amg_dexample_gpu program&#x00A0;amg_dexample_gpu
&#x00A0;&#x00A0;use&#x00A0;psb_base_mod &#x00A0;&#x00A0;use&#x00A0;psb_base_mod
@ -777,7 +791,7 @@ program&#x00A0;amg_dexample_gpu
&#x00A0; &#x00A0;
</pre> </pre>
<!--l. 471--><p class="nopar" > </div></div> <!--l. 479--><p class="nopar" > </div></div>
<br /> <div class="caption" <br /> <div class="caption"
><span class="id">Listing 5: </span><span ><span class="id">Listing 5: </span><span
class="content">setup of a GPU-enabled test program part one.</span></div><!--tex4ht:label?: x7-15001r5 --> class="content">setup of a GPU-enabled test program part one.</span></div><!--tex4ht:label?: x7-15001r5 -->
@ -785,7 +799,7 @@ class="content">setup of a GPU-enabled test program part one.</span></div><!--te
</div><hr class="endfloat" /> </div><hr class="endfloat" />
<!--l. 478--><p class="indent" > <span <!--l. 486--><p class="indent" > <span
class="cmr-12">In this particular example we employ an </span><span class="obeylines-h"><span class="verb"><span class="cmr-12">In this particular example we employ an </span><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">HLG</span></span></span> <span class="cmtt-12">HLG</span></span></span> <span
class="cmr-12">data structure for</span> class="cmr-12">data structure for</span>
@ -793,14 +807,14 @@ class="cmr-12">data structure for</span>
class="cmr-12">sparse matrices on GPUs; for more information please refer to the PSBLAS users&#8217;</span> class="cmr-12">sparse matrices on GPUs; for more information please refer to the PSBLAS users&#8217;</span>
<span <span
class="cmr-12">guide.</span> class="cmr-12">guide.</span>
<!--l. 482--><p class="indent" > <span <!--l. 490--><p class="indent" > <span
class="cmr-12">We then have to initialize the GPU environment, and pass the appropriate MOLD</span> class="cmr-12">We then have to initialize the GPU environment, and pass the appropriate MOLD</span>
<span <span
class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217; guide).</span> class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217; guide).</span>
<!--l. 485--><p class="indent" > <a <!--l. 493--><p class="indent" > <a
id="x7-15002r6"></a><hr class="float"><div class="float" id="x7-15002r6"></a><hr class="float"><div class="float"
> >
@ -808,7 +822,7 @@ class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217;
<div class="center" <div class="center"
> >
<!--l. 501--><p class="noindent" > <!--l. 509--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-12"> <div class="minipage"><pre class="verbatim" id="verbatim-12">
&#x00A0;&#x00A0;call&#x00A0;psb_init(ctxt) &#x00A0;&#x00A0;call&#x00A0;psb_init(ctxt)
&#x00A0;&#x00A0;call&#x00A0;psb_info(ctxt,iam,np) &#x00A0;&#x00A0;call&#x00A0;psb_info(ctxt,iam,np)
@ -823,7 +837,7 @@ class="cmr-12">variables to the build methods (see also the PSBLAS users&#8217;
&#x00A0; &#x00A0;
</pre> </pre>
<!--l. 516--><p class="nopar" > </div></div> <!--l. 524--><p class="nopar" > </div></div>
<br /> <div class="caption" <br /> <div class="caption"
><span class="id">Listing 6: </span><span ><span class="id">Listing 6: </span><span
class="content">setup of a GPU-enabled test program part two.</span></div><!--tex4ht:label?: x7-15002r6 --> class="content">setup of a GPU-enabled test program part two.</span></div><!--tex4ht:label?: x7-15002r6 -->
@ -831,7 +845,7 @@ class="content">setup of a GPU-enabled test program part two.</span></div><!--te
</div><hr class="endfloat" /> </div><hr class="endfloat" />
<!--l. 523--><p class="indent" > <span <!--l. 531--><p class="indent" > <span
class="cmr-12">Finally, we convert the input matrix, the descriptor and the vectors to use a</span> class="cmr-12">Finally, we convert the input matrix, the descriptor and the vectors to use a</span>
<span <span
class="cmr-12">GPU-enabled internal storage format. We then preallocate the preconditioner</span> class="cmr-12">GPU-enabled internal storage format. We then preallocate the preconditioner</span>
@ -842,7 +856,7 @@ class="cmr-12">GPU environment</span>
<!--l. 527--><p class="indent" > <a <!--l. 535--><p class="indent" > <a
id="x7-15003r7"></a><hr class="float"><div class="float" id="x7-15003r7"></a><hr class="float"><div class="float"
> >
@ -850,7 +864,7 @@ class="cmr-12">GPU environment</span>
<div class="center" <div class="center"
> >
<!--l. 557--><p class="noindent" > <!--l. 565--><p class="noindent" >
<div class="minipage"><pre class="verbatim" id="verbatim-13"> <div class="minipage"><pre class="verbatim" id="verbatim-13">
&#x00A0;&#x00A0;call&#x00A0;desc_a%cnv(mold=igmold) &#x00A0;&#x00A0;call&#x00A0;desc_a%cnv(mold=igmold)
&#x00A0;&#x00A0;call&#x00A0;a%cscnv(info,mold=agmold) &#x00A0;&#x00A0;call&#x00A0;a%cscnv(info,mold=agmold)
@ -877,7 +891,7 @@ class="cmr-12">GPU environment</span>
&#x00A0; &#x00A0;
</pre> </pre>
<!--l. 584--><p class="nopar" > </div></div> <!--l. 592--><p class="nopar" > </div></div>
<br /> <div class="caption" <br /> <div class="caption"
><span class="id">Listing 7: </span><span ><span class="id">Listing 7: </span><span
class="content">setup of a GPU-enabled test program part three.</span></div><!--tex4ht:label?: x7-15003r7 --> class="content">setup of a GPU-enabled test program part three.</span></div><!--tex4ht:label?: x7-15003r7 -->
@ -885,7 +899,7 @@ class="content">setup of a GPU-enabled test program part three.</span></div><!--
</div><hr class="endfloat" /> </div><hr class="endfloat" />
<!--l. 592--><p class="indent" > <span <!--l. 600--><p class="indent" > <span
class="cmr-12">It is very important to employ smoothers and coarsest solvers that are suited to the</span> class="cmr-12">It is very important to employ smoothers and coarsest solvers that are suited to the</span>
<span <span
class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that</span> class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that</span>
@ -893,30 +907,30 @@ class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kern
class="cmr-12">satisfy this constraint include:</span> class="cmr-12">satisfy this constraint include:</span>
<ul class="itemize1"> <ul class="itemize1">
<li class="itemize"> <li class="itemize">
<!--l. 596--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span <!--l. 604--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">JACOBI</span></span></span> class="cmtt-12">JACOBI</span></span></span>
</li> </li>
<li class="itemize"> <li class="itemize">
<!--l. 597--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span <!--l. 605--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">BJAC</span></span></span> <span class="cmtt-12">BJAC</span></span></span> <span
class="cmr-12">with the following methods on the local blocks:</span> class="cmr-12">with the following methods on the local blocks:</span>
<ul class="itemize2"> <ul class="itemize2">
<li class="itemize"> <li class="itemize">
<!--l. 599--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span <!--l. 607--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">INVK</span></span></span> class="cmtt-12">INVK</span></span></span>
</li> </li>
<li class="itemize"> <li class="itemize">
<!--l. 600--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span <!--l. 608--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">INVT</span></span></span> class="cmtt-12">INVT</span></span></span>
</li> </li>
<li class="itemize"> <li class="itemize">
<!--l. 601--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span <!--l. 609--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">AINV</span></span></span></li></ul> class="cmtt-12">AINV</span></span></span></li></ul>
</li> </li>
<li class="itemize"> <li class="itemize">
<!--l. 603--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span <!--l. 611--><p class="noindent" ><span class="obeylines-h"><span class="verb"><span
class="cmtt-12">POLY</span></span></span></li></ul> class="cmtt-12">POLY</span></span></span></li></ul>
<!--l. 605--><p class="noindent" ><span <!--l. 613--><p class="noindent" ><span
class="cmr-12">and their </span><span class="cmr-12">and their </span><span
class="cmmi-12">&#x2113;</span><sub><span class="cmmi-12">&#x2113;</span><sub><span
class="cmr-8">1</span></sub> <span class="cmr-8">1</span></sub> <span


@ -64,6 +64,10 @@ class="cmr-12">.</span>
<!--l. 148--><p class="indent" >

@ -38,46 +38,47 @@ class="cmr-12">AMG4PSBLAS is freely distributable under the following copyright
<pre class="verbatim" id="verbatim-15"> <pre class="verbatim" id="verbatim-15">
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;AMG4PSBLAS&#x00A0;&#x00A0;version&#x00A0;1.0
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Algebraic&#x00A0;MultiGrid&#x00A0;Preconditioners&#x00A0;Package
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;based&#x00A0;on&#x00A0;PSBLAS&#x00A0;(Parallel&#x00A0;Sparse&#x00A0;BLAS&#x00A0;version&#x00A0;3.7)
&#x00A0;&#x00A0;(C)&#x00A0;Copyright&#x00A0;2021 &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;AMG4PSBLAS&#x00A0;version&#x00A0;1.2
&#x00A0;&#x00A0;&#x00A0;&#x00A0;Algebraic&#x00A0;Multigrid&#x00A0;Package
&#x00A0;&#x00A0;Pasqua&#x00A0;D&#8217;Ambra&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;IAC-CNR,&#x00A0;IT &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;based&#x00A0;on&#x00A0;PSBLAS&#x00A0;(Parallel&#x00A0;Sparse&#x00A0;BLAS&#x00A0;version&#x00A0;3.9)
&#x00A0;&#x00A0;Fabio&#x00A0;Durastante&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;University&#x00A0;of&#x00A0;Pisa&#x00A0;and&#x00A0;IAC-CNR,&#x00A0;IT
&#x00A0;&#x00A0;Salvatore&#x00A0;Filippone&#x00A0;&#x00A0;&#x00A0;&#x00A0;University&#x00A0;of&#x00A0;Rome&#x00A0;Tor-Vergata&#x00A0;and&#x00A0;IAC-CNR,&#x00A0;IT &#x00A0;&#x00A0;&#x00A0;&#x00A0;(C)&#x00A0;Copyright&#x00A0;2025
&#x00A0;&#x00A0;Redistribution&#x00A0;and&#x00A0;use&#x00A0;in&#x00A0;source&#x00A0;and&#x00A0;binary&#x00A0;forms,&#x00A0;with&#x00A0;or&#x00A0;without &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Salvatore&#x00A0;Filippone
&#x00A0;&#x00A0;modification,&#x00A0;are&#x00A0;permitted&#x00A0;provided&#x00A0;that&#x00A0;the&#x00A0;following&#x00A0;conditions &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Pasqua&#x00A0;D&#8217;Ambra
&#x00A0;&#x00A0;are&#x00A0;met: &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;Fabio&#x00A0;Durastante
&#x00A0;&#x00A0;&#x00A0;&#x00A0;1.&#x00A0;Redistributions&#x00A0;of&#x00A0;source&#x00A0;code&#x00A0;must&#x00A0;retain&#x00A0;the&#x00A0;above&#x00A0;copyright
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer. &#x00A0;&#x00A0;&#x00A0;&#x00A0;Redistribution&#x00A0;and&#x00A0;use&#x00A0;in&#x00A0;source&#x00A0;and&#x00A0;binary&#x00A0;forms,&#x00A0;with&#x00A0;or&#x00A0;without
&#x00A0;&#x00A0;&#x00A0;&#x00A0;2.&#x00A0;Redistributions&#x00A0;in&#x00A0;binary&#x00A0;form&#x00A0;must&#x00A0;reproduce&#x00A0;the&#x00A0;above&#x00A0;copyright &#x00A0;&#x00A0;&#x00A0;&#x00A0;modification,&#x00A0;are&#x00A0;permitted&#x00A0;provided&#x00A0;that&#x00A0;the&#x00A0;following&#x00A0;conditions
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions,&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer&#x00A0;in&#x00A0;the &#x00A0;&#x00A0;&#x00A0;&#x00A0;are&#x00A0;met:
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;documentation&#x00A0;and/or&#x00A0;other&#x00A0;materials&#x00A0;provided&#x00A0;with&#x00A0;the&#x00A0;distribution. &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;1.&#x00A0;Redistributions&#x00A0;of&#x00A0;source&#x00A0;code&#x00A0;must&#x00A0;retain&#x00A0;the&#x00A0;above&#x00A0;copyright
&#x00A0;&#x00A0;&#x00A0;&#x00A0;3.&#x00A0;The&#x00A0;name&#x00A0;of&#x00A0;the&#x00A0;MLD2P4&#x00A0;group&#x00A0;or&#x00A0;the&#x00A0;names&#x00A0;of&#x00A0;its&#x00A0;contributors&#x00A0;may &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer.
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;not&#x00A0;be&#x00A0;used&#x00A0;to&#x00A0;endorse&#x00A0;or&#x00A0;promote&#x00A0;products&#x00A0;derived&#x00A0;from&#x00A0;this &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;2.&#x00A0;Redistributions&#x00A0;in&#x00A0;binary&#x00A0;form&#x00A0;must&#x00A0;reproduce&#x00A0;the&#x00A0;above&#x00A0;copyright
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;software&#x00A0;without&#x00A0;specific&#x00A0;written&#x00A0;permission. &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;notice,&#x00A0;this&#x00A0;list&#x00A0;of&#x00A0;conditions,&#x00A0;and&#x00A0;the&#x00A0;following&#x00A0;disclaimer&#x00A0;in&#x00A0;the
&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;documentation&#x00A0;and/or&#x00A0;other&#x00A0;materials&#x00A0;provided&#x00A0;with&#x00A0;the&#x00A0;distribution.
&#x00A0;&#x00A0;THIS&#x00A0;SOFTWARE&#x00A0;IS&#x00A0;PROVIDED&#x00A0;BY&#x00A0;THE&#x00A0;COPYRIGHT&#x00A0;HOLDERS&#x00A0;AND&#x00A0;CONTRIBUTORS &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;3.&#x00A0;The&#x00A0;name&#x00A0;of&#x00A0;the&#x00A0;AMG4PSBLAS&#x00A0;group&#x00A0;or&#x00A0;the&#x00A0;names&#x00A0;of&#x00A0;its&#x00A0;contributors&#x00A0;may
&#x00A0;&#x00A0;&#8216;&#8216;AS&#x00A0;IS&#8217;&#8217;&#x00A0;AND&#x00A0;ANY&#x00A0;EXPRESS&#x00A0;OR&#x00A0;IMPLIED&#x00A0;WARRANTIES,&#x00A0;INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;not&#x00A0;be&#x00A0;used&#x00A0;to&#x00A0;endorse&#x00A0;or&#x00A0;promote&#x00A0;products&#x00A0;derived&#x00A0;from&#x00A0;this
&#x00A0;&#x00A0;TO,&#x00A0;THE&#x00A0;IMPLIED&#x00A0;WARRANTIES&#x00A0;OF&#x00A0;MERCHANTABILITY&#x00A0;AND&#x00A0;FITNESS&#x00A0;FOR&#x00A0;A&#x00A0;PARTICULAR &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;&#x00A0;software&#x00A0;without&#x00A0;specific&#x00A0;written&#x00A0;permission.
&#x00A0;&#x00A0;PURPOSE&#x00A0;ARE&#x00A0;DISCLAIMED.&#x00A0;IN&#x00A0;NO&#x00A0;EVENT&#x00A0;SHALL&#x00A0;THE&#x00A0;MLD2P4&#x00A0;GROUP&#x00A0;OR&#x00A0;ITS&#x00A0;CONTRIBUTORS
&#x00A0;&#x00A0;BE&#x00A0;LIABLE&#x00A0;FOR&#x00A0;ANY&#x00A0;DIRECT,&#x00A0;INDIRECT,&#x00A0;INCIDENTAL,&#x00A0;SPECIAL,&#x00A0;EXEMPLARY,&#x00A0;OR &#x00A0;&#x00A0;&#x00A0;&#x00A0;THIS&#x00A0;SOFTWARE&#x00A0;IS&#x00A0;PROVIDED&#x00A0;BY&#x00A0;THE&#x00A0;COPYRIGHT&#x00A0;HOLDERS&#x00A0;AND&#x00A0;CONTRIBUTORS
&#x00A0;&#x00A0;CONSEQUENTIAL&#x00A0;DAMAGES&#x00A0;(INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED&#x00A0;TO,&#x00A0;PROCUREMENT&#x00A0;OF &#x00A0;&#x00A0;&#x00A0;&#x00A0;&#8216;&#8216;AS&#x00A0;IS&#8217;&#8217;&#x00A0;AND&#x00A0;ANY&#x00A0;EXPRESS&#x00A0;OR&#x00A0;IMPLIED&#x00A0;WARRANTIES,&#x00A0;INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED
&#x00A0;&#x00A0;SUBSTITUTE&#x00A0;GOODS&#x00A0;OR&#x00A0;SERVICES;&#x00A0;LOSS&#x00A0;OF&#x00A0;USE,&#x00A0;DATA,&#x00A0;OR&#x00A0;PROFITS;&#x00A0;OR&#x00A0;BUSINESS &#x00A0;&#x00A0;&#x00A0;&#x00A0;TO,&#x00A0;THE&#x00A0;IMPLIED&#x00A0;WARRANTIES&#x00A0;OF&#x00A0;MERCHANTABILITY&#x00A0;AND&#x00A0;FITNESS&#x00A0;FOR&#x00A0;A&#x00A0;PARTICULAR
&#x00A0;&#x00A0;INTERRUPTION)&#x00A0;HOWEVER&#x00A0;CAUSED&#x00A0;AND&#x00A0;ON&#x00A0;ANY&#x00A0;THEORY&#x00A0;OF&#x00A0;LIABILITY,&#x00A0;WHETHER&#x00A0;IN &#x00A0;&#x00A0;&#x00A0;&#x00A0;PURPOSE&#x00A0;ARE&#x00A0;DISCLAIMED.&#x00A0;IN&#x00A0;NO&#x00A0;EVENT&#x00A0;SHALL&#x00A0;THE&#x00A0;AMG4PSBLAS&#x00A0;GROUP&#x00A0;OR&#x00A0;ITS&#x00A0;CONTRIBUTORS
&#x00A0;&#x00A0;CONTRACT,&#x00A0;STRICT&#x00A0;LIABILITY,&#x00A0;OR&#x00A0;TORT&#x00A0;(INCLUDING&#x00A0;NEGLIGENCE&#x00A0;OR&#x00A0;OTHERWISE) &#x00A0;&#x00A0;&#x00A0;&#x00A0;BE&#x00A0;LIABLE&#x00A0;FOR&#x00A0;ANY&#x00A0;DIRECT,&#x00A0;INDIRECT,&#x00A0;INCIDENTAL,&#x00A0;SPECIAL,&#x00A0;EXEMPLARY,&#x00A0;OR
&#x00A0;&#x00A0;ARISING&#x00A0;IN&#x00A0;ANY&#x00A0;WAY&#x00A0;OUT&#x00A0;OF&#x00A0;THE&#x00A0;USE&#x00A0;OF&#x00A0;THIS&#x00A0;SOFTWARE,&#x00A0;EVEN&#x00A0;IF&#x00A0;ADVISED&#x00A0;OF&#x00A0;THE &#x00A0;&#x00A0;&#x00A0;&#x00A0;CONSEQUENTIAL&#x00A0;DAMAGES&#x00A0;(INCLUDING,&#x00A0;BUT&#x00A0;NOT&#x00A0;LIMITED&#x00A0;TO,&#x00A0;PROCUREMENT&#x00A0;OF
&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE. &#x00A0;&#x00A0;&#x00A0;&#x00A0;SUBSTITUTE&#x00A0;GOODS&#x00A0;OR&#x00A0;SERVICES;&#x00A0;LOSS&#x00A0;OF&#x00A0;USE,&#x00A0;DATA,&#x00A0;OR&#x00A0;PROFITS;&#x00A0;OR&#x00A0;BUSINESS
&#x00A0;&#x00A0;&#x00A0;&#x00A0;INTERRUPTION)&#x00A0;HOWEVER&#x00A0;CAUSED&#x00A0;AND&#x00A0;ON&#x00A0;ANY&#x00A0;THEORY&#x00A0;OF&#x00A0;LIABILITY,&#x00A0;WHETHER&#x00A0;IN
&#x00A0;&#x00A0;&#x00A0;&#x00A0;CONTRACT,&#x00A0;STRICT&#x00A0;LIABILITY,&#x00A0;OR&#x00A0;TORT&#x00A0;(INCLUDING&#x00A0;NEGLIGENCE&#x00A0;OR&#x00A0;OTHERWISE)
&#x00A0;&#x00A0;&#x00A0;&#x00A0;ARISING&#x00A0;IN&#x00A0;ANY&#x00A0;WAY&#x00A0;OUT&#x00A0;OF&#x00A0;THE&#x00A0;USE&#x00A0;OF&#x00A0;THIS&#x00A0;SOFTWARE,&#x00A0;EVEN&#x00A0;IF&#x00A0;ADVISED&#x00A0;OF&#x00A0;THE
&#x00A0;&#x00A0;&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.
</pre> </pre>
<!--l. 44--><p class="nopar" > <!--l. 45--><p class="nopar" >
<!--l. 47--><p class="indent" > <span <!--l. 48--><p class="indent" > <span
class="cmr-12">AMG4PSBLAS is an evolution of MLD2P4, whose license we reproduce here to</span> class="cmr-12">AMG4PSBLAS is an evolution of MLD2P4, whose license we reproduce here to</span>
<span <span
class="cmr-12">abide by its terms:</span> class="cmr-12">abide by its terms:</span>
@ -123,7 +124,7 @@ class="cmr-12">abide by its terms:</span>
&#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE. &#x00A0;&#x00A0;POSSIBILITY&#x00A0;OF&#x00A0;SUCH&#x00A0;DAMAGE.
</pre> </pre>
<!--l. 87--><p class="nopar" > <span <!--l. 88--><p class="nopar" > <span
class="cmr-12">AMG4PSBLAS is distributed together with (a small part of) the graph-matching</span> class="cmr-12">AMG4PSBLAS is distributed together with (a small part of) the graph-matching</span>
@ -183,7 +184,7 @@ class="cmr-12">here.</span>
// //
//&#x00A0;************************************************************************ //&#x00A0;************************************************************************
</pre> </pre>
<!--l. 135--><p class="nopar" > <!--l. 136--><p class="nopar" >

@ -4,30 +4,42 @@
\fi \fi
\textsc{AMG4PSBLAS (Algebraic MultiGrid Preconditioners Package \textsc{AMG4PSBLAS (Algebraic MultiGrid Preconditioners Package
based on PSBLAS}) is a package of parallel algebraic multilevel preconditioners included in the PSCToolkit (Parallel Sparse Computation Toolkit) software framework. based on PSBLAS}) is a package of parallel algebraic multilevel
It is a progress of a software development project started in 2007, named MLD2P4, which originally implemented a preconditioners included in the PSCToolkit (Parallel Sparse
multilevel version of some domain decomposition preconditioners of additive-Schwarz type, and was based on a parallel decoupled version of the well known smoothed Computation Toolkit) software framework.
It is an evolution of a software development project started in 2007,
named MLD2P4, which originally implemented a
multilevel version of some domain decomposition preconditioners of
additive-Schwarz type, and was based on a parallel decoupled version
of the well known smoothed
aggregation method to generate the multilevel hierarchy of coarser matrices. aggregation method to generate the multilevel hierarchy of coarser matrices.
In the last years, within the context of the EU-H2020 EoCoE project (Energy Oriented Center of Excellence), the package was extended for including new algorithms and In the last few years the package was extended for
functionalities for the setup and application of new AMG preconditioners with the final aims of improving efficiency and scalability when tens of thousands of cores are including new algorithms and
used, and of boosting reliability in dealing with general symmetric positive definite linear systems. functionalities for the setup and application of new AMG preconditioners
Due to the significant number of changes and the increase in scope, we decided to rename the package as AMG4PSBLAS. with the final aims of improving efficiency and scalability when tens
of thousands of cores are used, and of boosting reliability in dealing
with general symmetric positive definite linear systems; these
developments have been supported in the context of the EU-H2020 EoCoE
project (Energy Oriented Center of Excellence).
Due to the significant number of changes and the increase in scope, we
decided to rename the package as AMG4PSBLAS.
AMG4PSBLAS has been designed to provide scalable and easy-to-use preconditioners AMG4PSBLAS has been designed to provide scalable and easy-to-use
in the context of the PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms) preconditioners in the context of the PSBLAS (Parallel Sparse Basic
computational framework and can be used in conjunction with the Krylov solvers Linear Algebra Subprograms) computational framework and can be used in
available in this framework. conjunction with the Krylov solvers available in this framework.
Our package is based on a completely algebraic approach; therefore Our package is based on a completely algebraic approach; therefore
user-level interfaces assume that the system matrix and user-level interfaces assume that the system matrix and
preconditioners are represented as PSBLAS distributed sparse matrices. preconditioners are represented as PSBLAS distributed sparse matrices.
AMG4PSBLAS enables the user to easily specify different AMG4PSBLAS enables the user to easily specify different
features of an algebraic multilevel preconditioner, thus allowing to experiment features of an algebraic multilevel preconditioner, thus allowing to
with different preconditioners for the problem and parallel computers at hand. experiment with different preconditioners for the problem and parallel
computers at hand.
The package employs object-oriented design techniques in The package employs object-oriented design techniques in
Fortran~2003, with interfaces to additional third party libraries Fortran~2003, with interfaces to additional third party libraries
such as MUMPS, UMFPACK, SuperLU, and SuperLU\_Dist, which such as MUMPS, UMFPACK, SuperLU, and SuperLU\_Dist, which
can be exploited in building multilevel preconditioners. The parallel can be exploited in building multilevel preconditioners. The parallel
implementation is based on a Single Program Multiple Data (SPMD) implementation is based on a Single Program Multiple Data (SPMD)
paradigm; the inter-process communication is based on MPI and paradigm; the inter-process communication is based on MPI and
is managed mainly through PSBLAS. is managed mainly through PSBLAS.

@ -12,20 +12,20 @@ interfaces to external libraries in C; the Fortran compiler
must support the Fortran~2003 standard plus the extension \verb|MOLD=| must support the Fortran~2003 standard plus the extension \verb|MOLD=|
feature, which enhances the usability of \verb|ALLOCATE|. feature, which enhances the usability of \verb|ALLOCATE|.
Most Fortran compilers provide this feature; in particular, this is Most Fortran compilers provide this feature; in particular, this is
supported by the GNU Fortran compiler, for which we supported by the GNU Fortran compiler, for which we recommend to use at least version 12.
recommend to use at least version 4.8.
The software defines data types and interfaces for The software defines data types and interfaces for
real and complex data, in both single and double precision. real and complex data, in both single and double precision.
Building AMG4PSBLAS requires some base libraries (see Building AMG4PSBLAS requires some base libraries (see
Section~\ref{sec:prerequisites}); interfaces to optional third-party Section~\ref{sec:prerequisites}); interfaces to optional third-party
libraries, which extend the functionalities of AMG4PSBLAS (see libraries, which extend the functionalities of AMG4PSBLAS (see
Section~\ref{sec:third-party}), are also available. A number of Linux Section~\ref{sec:third-party}), are also available. A number of Linux
distributions (e.g., Ubuntu, Fedora, CentOS) provide precompiled distributions (e.g., Ubuntu, Fedora, CentOS) provide precompiled
packages for the prerequisite and optional software. In many cases packages for the prerequisite and optional software. In many cases
these packages are split between a runtime part and a ``developer'' these packages are split between a runtime part and a ``developer''
part; in order to build AMG4PSBLAS you need both. A description of the part; in order to build AMG4PSBLAS you need both. A description of the
base and optional software used by AMG4PSBLAS is given in the next sections. base and optional software used by AMG4PSBLAS is given in the next
sections.
\subsection{Prerequisites\label{sec:prerequisites}} \subsection{Prerequisites\label{sec:prerequisites}}
@ -157,17 +157,17 @@ The full set of options may be looked at by issuing the command
\else \else
\lstinputlisting{../configureout.txt} \lstinputlisting{../configureout.txt}
\fi \fi
For instance, if a user has built and installed PSBLAS 3.7 under the For instance, if a user has built and installed PSBLAS 3.9 under the
\verb|/opt| directory and is \verb|/opt| directory and is
using the SuiteSparse package (which includes UMFPACK), then AMG4PSBLAS using the SuiteSparse package (which includes UMFPACK), then AMG4PSBLAS
might be configured with: might be configured with:
\ifpdf \ifpdf
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{console} \begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{console}
./configure --with-psblas=/opt/psblas-3.7/ --with-umfpackincdir=/usr/include/suitesparse/ ./configure --with-psblas=/opt/psblas-3.9/ --with-umfpackincdir=/usr/include/suitesparse/
\end{minted} \end{minted}
\else \else
\begin{verbatim} \begin{verbatim}
./configure --with-psblas=/opt/psblas-3.7/ \ ./configure --with-psblas=/opt/psblas-3.9/ \
--with-umfpackincdir=/usr/include/suitesparse/ --with-umfpackincdir=/usr/include/suitesparse/
\end{verbatim} \end{verbatim}
\fi \fi

@ -94,7 +94,6 @@ Multilevel &\fortinline|'ML'| & V-cycle with one hybrid forward Gauss-
\label{tab:precinit}} \label{tab:precinit}}
\end{center} \end{center}
\end{table} \end{table}
Note that the module \fortinline|amg_prec_mod|, containing the definition of the Note that the module \fortinline|amg_prec_mod|, containing the definition of the
preconditioner data type and the interfaces to the routines of AMG4PSBLAS, preconditioner data type and the interfaces to the routines of AMG4PSBLAS,
must be used in any program calling such routines. must be used in any program calling such routines.
@ -110,6 +109,12 @@ a standard discretization of basic scalar elliptic PDE problems. However,
this does not necessarily correspond to the shortest execution time this does not necessarily correspond to the shortest execution time
on parallel~computers. on parallel~computers.
\textbf{Remark 2.} Memory allocation on GPUs is a costly operation
implying a synchronization; therefore, it is convenient to preallocate
internal preconditioner workspace with the method
\verb|prec%allocate_wrk(info)| before invoking an iterative method,
and release it upon exit with \verb|prec%deallocate_wrk(info)|.
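A minimal sketch of this pattern is given below, assuming a double-precision
preconditioner that has already been built. The subroutine name
\verb|solve_with_wrk|, its argument list and the choice of the \verb|'FCG'|
method are illustrative and not part of the library; the module name
\verb|psb_krylov_mod| is an assumption and may differ in your PSBLAS release.
\begin{verbatim}
subroutine solve_with_wrk(a, desc_a, prec, b, x, tol, info)
  use psb_base_mod
  use psb_krylov_mod   ! assumed name of the module providing psb_krylov
  use amg_prec_mod
  implicit none
  type(psb_dspmat_type), intent(in)    :: a
  type(psb_desc_type),   intent(in)    :: desc_a
  type(amg_dprec_type),  intent(inout) :: prec   ! already built
  type(psb_d_vect_type), intent(inout) :: b, x
  real(psb_dpk_),        intent(in)    :: tol
  integer(psb_ipk_),     intent(out)   :: info
  integer(psb_ipk_) :: info_wrk

  ! Preallocate the preconditioner workspace once, before the solver.
  call prec%allocate_wrk(info)
  if (info /= psb_success_) return

  ! Iterative solve; the preallocated workspace is reused at each
  ! application of the preconditioner.
  call psb_krylov('FCG', a, prec, b, x, tol, desc_a, info)

  ! Release the workspace in any case, keeping the error code
  ! returned by the solver if there is one.
  call prec%deallocate_wrk(info_wrk)
  if (info == psb_success_) info = info_wrk
end subroutine solve_with_wrk
\end{verbatim}
Keeping the allocation outside the solver call is most useful when the same
preconditioner is applied to several right-hand sides, since the
synchronization cost is then paid only once.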
\subsection{Examples\label{sec:examples}} \subsection{Examples\label{sec:examples}}
@ -140,7 +145,6 @@ for the real single precision and the complex, single and double
precision, versions are obtained with straightforward modifications of the previous precision, versions are obtained with straightforward modifications of the previous
example (see Section~\ref{sec:userinterface} for details). If these versions are installed, example (see Section~\ref{sec:userinterface} for details). If these versions are installed,
the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|. the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
\begin{listing}[tbp] \begin{listing}[tbp]
\begin{center} \begin{center}
\begin{minipage}{.90\textwidth} \begin{minipage}{.90\textwidth}
@ -260,7 +264,6 @@ stop
\label{fig:ex1}} \label{fig:ex1}}
\end{center} \end{center}
\end{listing} \end{listing}
Different versions of the multilevel preconditioner can be obtained by changing Different versions of the multilevel preconditioner can be obtained by changing
the default values of the preconditioner parameters. The code reported in the default values of the preconditioner parameters. The code reported in
Figure~\ref{fig:ex2} shows how to set a V-cycle preconditioner Figure~\ref{fig:ex2} shows how to set a V-cycle preconditioner
@ -272,10 +275,15 @@ with block-Jacobi and set by~\fortinline|P%init|.
Furthermore, specifying block-Jacobi as coarsest-level Furthermore, specifying block-Jacobi as coarsest-level
solver implies that the coarsest-level matrix is distributed solver implies that the coarsest-level matrix is distributed
among the processes. among the processes.
Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using the Coarsening based on Compatible Weighted Matching, aggregates of size at most $8$ and smoothed prolongators. It applies Figure~\ref{fig:ex3} shows how to set a W-cycle preconditioner using
the Coarsening based on Compatible Weighted Matching, aggregates of
size at most $8$ and smoothed prolongators. It applies
2 hybrid Gauss-Seidel sweeps as pre- and post-smoother, 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother,
and solves the coarsest-level system with the parallel flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi preconditioner having ILU(0) on the blocks. Default parameters are used for stopping criterion of the coarsest solver. and solves the coarsest-level system with the parallel flexible
Note that, also in this case, specifying KRM as coarsest-level Conjugate Gradient method (KRM) coupled with the block-Jacobi
preconditioner having ILU(0) on the blocks, with default parameters
used for the coarsest solver.
Note that specifying KRM as coarsest-level
solver implies that the coarsest-level matrix is distributed solver implies that the coarsest-level matrix is distributed
among the processes. among the processes.
%It is specified that the coarsest-level %It is specified that the coarsest-level

@ -6,40 +6,41 @@
AMG4PSBLAS is freely distributable under the following copyright AMG4PSBLAS is freely distributable under the following copyright
terms: {\small terms: {\small
\begin{verbatim} \begin{verbatim}
AMG4PSBLAS version 1.0
Algebraic MultiGrid Preconditioners Package AMG4PSBLAS version 1.2
based on PSBLAS (Parallel Sparse BLAS version 3.7) Algebraic Multigrid Package
based on PSBLAS (Parallel Sparse BLAS version 3.9)
(C) Copyright 2021
(C) Copyright 2025
Pasqua D'Ambra IAC-CNR, IT
Fabio Durastante University of Pisa and IAC-CNR, IT Salvatore Filippone
Salvatore Filippone University of Rome Tor-Vergata and IAC-CNR, IT Pasqua D'Ambra
Fabio Durastante
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions Redistribution and use in source and binary forms, with or without
are met: modification, are permitted provided that the following conditions
1. Redistributions of source code must retain the above copyright are met:
notice, this list of conditions and the following disclaimer. 1. Redistributions of source code must retain the above copyright
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer.
notice, this list of conditions, and the following disclaimer in the 2. Redistributions in binary form must reproduce the above copyright
documentation and/or other materials provided with the distribution. notice, this list of conditions, and the following disclaimer in the
3. The name of the MLD2P4 group or the names of its contributors may documentation and/or other materials provided with the distribution.
not be used to endorse or promote products derived from this 3. The name of the AMG4PSBLAS group or the names of its contributors may
software without specific written permission. not be used to endorse or promote products derived from this
software without specific written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
POSSIBILITY OF SUCH DAMAGE. ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
\end{verbatim} \end{verbatim}
} }

@ -3,7 +3,9 @@
{\textsc{\ref{sec:overview} General Overview}} {\textsc{\ref{sec:overview} General Overview}}
The \textsc{Algebraic MultiGrid Preconditioners Package based on The \textsc{Algebraic MultiGrid Preconditioners Package based on
PSBLAS} (\textsc{AMG\-4\-PSBLAS}) provides parallel Algebraic MultiGrid (AMG) preconditioners (see, e.g., \cite{Briggs2000,Stuben_01}), PSBLAS} (\textsc{AMG\-4\-PSBLAS}) provides parallel Algebraic
MultiGrid (AMG) preconditioners (see, e.g.,
\cite{Briggs2000,Stuben_01}),
to be used in the iterative solution of linear systems, to be used in the iterative solution of linear systems,
\begin{equation} \begin{equation}
Ax=b, Ax=b,
@ -18,12 +20,14 @@ where $A$ is a square, real or complex, sparse symmetric positive definite (s.p.
The preconditioners implemented in AMG4PSBLAS are obtained by combining The preconditioners implemented in AMG4PSBLAS are obtained by combining
3 different types of AMG cycles with smoothers and coarsest-level 3 different types of AMG cycles with smoothers and coarsest-level
solvers. Available multigrid cycles include the V-, W-, and a version of a Krylov-type cycle solvers. We provide a number of multigrid cycles, including the V-,
W-, and a version of a Krylov-type cycle
(K-cycle)~\cite{Briggs2000,Notay2008}; they can be (K-cycle)~\cite{Briggs2000,Notay2008}; they can be
combined with Jacobi, hybrid combined with Jacobi, hybrid
%\footnote{see Note 2 in Table~\ref{tab:p_coarse}, p.~28.} %\footnote{see Note 2 in Table~\ref{tab:p_coarse}, p.~28.}
forward/backward Gauss-Seidel, block-Jacobi and additive Schwarz forward/backward Gauss-Seidel, block-Jacobi and additive Schwarz
smoothers with various versions of local incomplete factorizations and approximate inverses smoothers with various versions of local incomplete factorizations and
approximate inverses
on the blocks. The Jacobi, block-Jacobi and on the blocks. The Jacobi, block-Jacobi and
Gauss-Seidel smoothers are also available in the $\ell_1$ version~\cite{DDF2020}. Gauss-Seidel smoothers are also available in the $\ell_1$ version~\cite{DDF2020}.
@ -41,7 +45,8 @@ two different coarsening strategies, based on aggregation, are available:
and described in detail in~\cite{DDF2020}; and described in detail in~\cite{DDF2020};
\end{itemize} \end{itemize}
Either exact or approximate solvers can be used on the coarsest-level Either exact or approximate solvers can be used on the coarsest-level
system. We provide interfaces to various parallel and sequential sparse LU factorizations from external system. We provide interfaces to various parallel and sequential
sparse LU factorizations from external
packages, sequential native incomplete LU and approximate inverse factorizations, packages, sequential native incomplete LU and approximate inverse factorizations,
parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solvers and parallel weighted Jacobi, hybrid Gauss-Seidel, block-Jacobi solvers and
calls to preconditioned Krylov methods; all calls to preconditioned Krylov methods; all
@ -74,36 +79,41 @@ important revisions and extentions of the PSBLAS infrastructure.
The inter-process communication required by AMG4PSBLAS is encapsulated The inter-process communication required by AMG4PSBLAS is encapsulated
in the PSBLAS routines; in the PSBLAS routines;
therefore, AMG4PSBLAS can be run on any parallel machine where PSBLAS therefore, AMG4PSBLAS can be run on any parallel machine where PSBLAS
implementations are available. In the most recent version of PSBLAS implementations are available. The most recent version of PSBLAS
(release 3.7), a plug-in for GPU is included; it includes CUDA (release 3.9) includes a plug-in for GPU; it contains CUDA
versions of main vector operations and of sparse matrix-vector versions of main vector operations and of sparse matrix-vector
multiplication, so that Krylov methods coupled with AMG4PSBLAS multiplication, so that Krylov methods coupled with AMG4PSBLAS
preconditioners relying on Jacobi and block-Jacobi smoothers with preconditioners relying on Jacobi and block-Jacobi smoothers with
sparse approximate inverses on the blocks can be efficiently executed sparse approximate inverses on the blocks can be efficiently executed
on clusters of GPUs. on clusters of GPUs.
AMG4PSBLAS has a layered and modular software architecture where three main layers can be AMG4PSBLAS has a layered and modular software architecture where three
identified. The lower layer consists of the PSBLAS kernels, the middle one implements main layers can be identified. The lower layer consists of the PSBLAS
the construction and application phases of the preconditioners, and the upper one kernels, the middle one implements the construction and application
provides a uniform interface to all the preconditioners. phases of the preconditioners, and the upper one provides a uniform
This architecture allows for different levels of use of the package: interface to all the preconditioners. This architecture allows for
few black-box routines at the upper layer allow all users to easily different levels of use of the package: few black-box routines at the
build and apply any preconditioner available in AMG4PSBLAS; upper layer allow all users to easily build and apply any
facilities are also available allowing expert users to extend the set of smoothers preconditioner available in AMG4PSBLAS; facilities are also available
and solvers for building new versions of the preconditioners (see allowing expert users to extend the set of smoothers and solvers for
Section~\ref{sec:adding}). building new versions of the preconditioners (see
Section~\ref{sec:adding}).
This guide is organized as follows. General information on the distribution of the source This guide is organized as follows. General information on the
code is reported in Section~\ref{sec:distribution}, while details on the configuration distribution of the source code is reported in
and installation of the package are given in Section~\ref{sec:building}. The basics for building and applying the Section~\ref{sec:distribution}, while details on the configuration and
preconditioners with the Krylov solvers implemented in PSBLAS are reported installation of the package are given in
in~Section~\ref{sec:started}, where the Fortran codes of a few sample programs Section~\ref{sec:building}. The basics for building and applying the
are also shown. A reference guide for the user interface routines is provided preconditioners with the Krylov solvers implemented in PSBLAS are
in Section~\ref{sec:userinterface}. Information on the extension of the package reported in~Section~\ref{sec:started}, where the Fortran codes of a
through the addition of new smoothers and solvers is reported in Section~\ref{sec:adding}. few sample programs are also shown. A reference guide for the user
The error handling mechanism used by the package interface routines is provided in
is briefly described in Section~\ref{sec:errors}. The copyright terms concerning the Section~\ref{sec:userinterface}. Information on the extension of the
distribution and modification of AMG4PSBLAS are reported in Appendix~\ref{sec:license}. package through the addition of new smoothers and solvers is reported
in Section~\ref{sec:adding}. The error handling mechanism used by the
package is briefly described in Section~\ref{sec:errors}. The
copyright terms concerning the distribution and modification of
AMG4PSBLAS are reported in Appendix~\ref{sec:license}.
%%% Local Variables: %%% Local Variables:
%%% mode: latex %%% mode: latex

@ -154,7 +154,7 @@ Preconditioners Package based on PSBLAS}
\flushright \flushright
\large Software version: 1.2\\ \large Software version: 1.2\\
%\todaym %\todaym
\large December 31st, 2025 \large December 23rd, 2025
\end{minipage}} \end{minipage}}
%\addtolength{\textwidth}{\centeroffset} %\addtolength{\textwidth}{\centeroffset}
\vspace{\stretch{2}} \vspace{\stretch{2}}

@ -114,7 +114,7 @@
%\today %\today
Software version: 1.2\\ Software version: 1.2\\
%\today %\today
December 31st, 2025 December 23rd, 2025
\clearpage \clearpage
\ \\ \ \\
\thispagestyle{empty} \thispagestyle{empty}

@ -160,7 +160,7 @@ the smoothers. However, for simplicity, shortcuts are
provided to set all versions of point-Jacobi, hybrid (forward) Gauss-Seidel, and provided to set all versions of point-Jacobi, hybrid (forward) Gauss-Seidel, and
hybrid backward Gauss-Seidel, i.e., the previous smoothers can be defined hybrid backward Gauss-Seidel, i.e., the previous smoothers can be defined
just by setting \fortinline|'SMOOTHER_TYPE'| to certain specific just by setting \fortinline|'SMOOTHER_TYPE'| to certain specific
values (see Tables~\ref{tab:p_smoother}), without the need to set values (see Table~\ref{tab:p_smoother}), without the need to set
\fortinline|'SUB_SOLVE'| as well. \fortinline|'SUB_SOLVE'| as well.
The smoother and solver objects are arranged in a The smoother and solver objects are arranged in a
@ -182,47 +182,50 @@ the polynomial used. Consequently, the \fortinline|'SMOOTHER_SWEEPS'| option is
the \fortinline|'POLY_DEGREE'| option. This smoother is paired with a base smoother the \fortinline|'POLY_DEGREE'| option. This smoother is paired with a base smoother
object, whose iterations are accelerated using the specified polynomial smoothing technique. object, whose iterations are accelerated using the specified polynomial smoothing technique.
By default, the $\ell_1$-Jacobi smoother serves as the base smoother, offering theoretical By default, the $\ell_1$-Jacobi smoother serves as the base smoother, offering theoretical
guarantees on the resulting convergence factor~\cite{DDFMT2024,LOTTES}. Alternative combinations guarantees on the resulting convergence
are experimental and lack established guarantees.\\ factor~\cite{DDFMT2024,LOTTES}. Alternative combinations are
experimental.\\
% and lack established guarantees.\\
\textbf{Remark 4.} Many of the coarsest-level solvers apply to a \textbf{Remark 4.} Many of the coarsest-level solvers apply to a
specific coarsest-matrix layout; specific coarsest-matrix layout; therefore, setting the solver after
therefore, setting the solver after the layout may change the layout the layout may change the layout to either distributed or replicated,
to either distributed or replicated. and similarly, setting the layout after the solver may change the
Similarly, setting the layout after the solver may change the solver. solver. More specifically, UMFPACK and SuperLU require the coarsest-level
matrix to be replicated, while SuperLU\_Dist and KRM require it to be
More precisely, UMFPACK and SuperLU require the coarsest-level distributed; therefore, setting the coarsest-level solver implies
matrix to be replicated, while SuperLU\_Dist and KRM require it to be distributed. that the layout is redefined according to the solver, overriding any
In these cases, setting the coarsest-level solver implies that
the layout is redefined according to the solver, overriding any
previous settings. MUMPS, point-Jacobi, previous settings. MUMPS, point-Jacobi,
hybrid Gauss-Seidel and block-Jacobi can be applied to hybrid Gauss-Seidel and block-Jacobi can be applied to
replicated and distributed matrices, thus their choice replicated and distributed matrices, thus their choice
does not modify any previously specified layout. does not modify any previously specified layout.
It is worth noting that, when the matrix is replicated, It is worth noting that, when the matrix is replicated,
the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers and their $\ell_1$ versions the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers and
reduce to the corresponding local solver objects (see Remark~2). their $\ell_1$ versions reduce to the corresponding local solver
For the point-Jacobi and Gauss-Seidel solvers, these objects objects (see Remark~2). For the point-Jacobi and Gauss-Seidel solvers,
correspond to a \emph{single} point-Jacobi sweep and a \emph{single} these objects correspond to a \emph{single} point-Jacobi sweep and a
Gauss-Seidel sweep, respectively, which are very poor solvers. \emph{single} Gauss-Seidel sweep, respectively, which are very poor
solvers.
On the other hand, the distributed layout can be used with any solver
but UMFPACK and SuperLU; therefore, if any of these two solvers has already On the other hand, the distributed layout can be used with any solver
been selected, the coarsest-level solver is changed to block-Jacobi, except UMFPACK and SuperLU; therefore, if any of these two solvers has
with the previously chosen solver applied to the local blocks. already been selected, the coarsest-level solver is changed to
Likewise, the replicated layout can be used with any solver but SuperLU\_Dist and KRM; block-Jacobi, with the previously chosen solver applied to the local
therefore, if SuperLU\_Dist or KRM have been previously set, the coarsest-level blocks. Likewise, the replicated layout can be used with any solver
solver is changed to the default sequential solver. but SuperLU\_Dist and KRM; therefore, if SuperLU\_Dist or KRM have
been previously set, the coarsest-level solver is changed to the
In a parallel setting with many cores, we suggest to the users to change the default default sequential solver.
coarsest solver for using the KRM choice, i.e. a parallel distributed iterative solution of the
coarsest system based on Krylov methods. In a parallel setting with many cores, we suggest to the users to
change the default coarsest solver for using the KRM choice, i.e. a
\textbf{Remark 4.} The argument \fortinline|idx| can be used to allow finer parallel distributed iterative solution of the coarsest system based
control for those solvers; for instance, by specifying the keyword on Krylov methods.
\fortinline|'MUMPS_IPAR_ENTRY'| and an appropriate value for \fortinline|idx|, it is
possible to set any entry in the MUMPS integer control array. \textbf{Remark 4.} The argument \fortinline|idx| can be used to allow
See also Sec.~\ref{sec:adding}. finer control for those solvers; for instance, by specifying the
keyword \fortinline|'MUMPS_IPAR_ENTRY'| and an appropriate value for
\fortinline|idx|, it is possible to set any entry in the MUMPS integer
control array. See also Sec.~\ref{sec:adding}.
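
The interplay between solver and layout described in the remarks above can be
sketched as follows. This is an illustrative fragment, not code taken from the
package: the subroutine names are hypothetical, the keywords
\verb|'COARSE_MAT'| and \verb|'COARSE_SOLVE'| and their values are assumed to
be those listed in the tables of this section, and the index and value passed
with \verb|'MUMPS_IPAR_ENTRY'| are placeholders.
\begin{verbatim}
subroutine set_coarsest_krm(prec, info)
  use psb_base_mod
  use amg_prec_mod
  implicit none
  type(amg_dprec_type), intent(inout) :: prec  ! already initialized
  integer(psb_ipk_),    intent(out)   :: info

  ! Ask for a replicated coarsest-level matrix first ...
  call prec%set('COARSE_MAT', 'REPL', info)
  if (info /= psb_success_) return
  ! ... then select KRM: since KRM requires a distributed matrix,
  ! the layout is redefined, overriding the previous setting.
  call prec%set('COARSE_SOLVE', 'KRM', info)
end subroutine set_coarsest_krm

subroutine set_coarsest_mumps(prec, info)
  use psb_base_mod
  use amg_prec_mod
  implicit none
  type(amg_dprec_type), intent(inout) :: prec  ! already initialized
  integer(psb_ipk_),    intent(out)   :: info

  ! MUMPS works with both layouts, so the layout is left untouched.
  call prec%set('COARSE_SOLVE', 'MUMPS', info)
  if (info /= psb_success_) return
  ! Placeholder example: set entry 4 of the MUMPS integer control
  ! array to 1; index and value are for illustration only.
  call prec%set('MUMPS_IPAR_ENTRY', 1, info, idx=4)
end subroutine set_coarsest_mumps
\end{verbatim}
Because the last setting wins, it is safer to set the coarsest-level solver
last whenever a specific solver is required.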
%The \verb|what,val| pairs described here are those of the predefined %The \verb|what,val| pairs described here are those of the predefined
%smoother/solver objects; newly developed solvers may define new pairs %smoother/solver objects; newly developed solvers may define new pairs
%according to their needs. %according to their needs.
