diff --git a/docs/amg4psblas_1.0-guide.pdf b/docs/amg4psblas_1.2-guide.pdf similarity index 93% rename from docs/amg4psblas_1.0-guide.pdf rename to docs/amg4psblas_1.2-guide.pdf index 372b3db2..e741019b 100644 Binary files a/docs/amg4psblas_1.0-guide.pdf and b/docs/amg4psblas_1.2-guide.pdf differ diff --git a/docs/html/index.html b/docs/html/index.html index b5db3f13..1d079559 100644 --- a/docs/html/index.html +++ b/docs/html/index.html @@ -45,122 +45,62 @@ class="newline" />
-  Abstract -
 Contents -
 1 General Overview -
 2 Code Distribution -
Contributors -
Citing AMG4PSBLAS -
 3 Configuring and Building AMG4PSBLAS -
3.1 Prerequisites -
3.2 Optional third party libraries -
3.3 Configuration options -
3.4 Bug reporting -
3.5 Example and test programs -
 4 Getting Started -
4.1 Examples -
4.2 GPU example -
 5 User Interface -
5.1 Method init -
5.2 Method set -
5.3 Method hierarchy_build -
5.4 Method smoothers_build -
5.5 Method build -
5.6 Method apply -
5.7 Method free -
5.8 Method descr -
5.9 Auxiliary Methods -
 6 Adding new smoother and solver objects to AMG4PSBLAS -
 7 Error Handling -
 A License -
 B Contributor Covenant Code of Conduct -
 References
@@ -171,6 +111,9 @@ class="cmr-12">References + + + diff --git a/docs/html/userhtml.css b/docs/html/userhtml.css index c29347d8..f0aa87fa 100644 --- a/docs/html/userhtml.css +++ b/docs/html/userhtml.css @@ -87,7 +87,6 @@ div.flushleft {text-align: left;} .framebox-r {text-align:right;} span.thank-mark{ vertical-align: super } span.footnote-mark sup.textsuperscript, span.footnote-mark a sup.textsuperscript{ font-size:80%; } -code.verb{font-family:monospace,monospace;} div.tabular, div.center div.tabular {text-align: center; margin-top:0.5em; margin-bottom:0.5em; } table.tabular td p{margin-top:0em;} table.tabular {margin-left: auto; margin-right: auto;} @@ -133,6 +132,7 @@ table.pmatrix {width:100%;} span.bar-css {text-decoration:overline;} img.cdots{vertical-align:middle;} .partToc a, .partToc, .likepartToc a, .likepartToc {line-height: 200%; font-weight:bold; font-size:110%;} +.chapterToc a, .chapterToc, .likechapterToc a, .likechapterToc, .appendixToc a, .appendixToc {line-height: 200%; font-weight:bold;} .index-item, .index-subitem, .index-subsubitem {display:block} div.caption {text-indent:-2em; margin-left:3em; margin-right:1em; text-align:left;} div.caption span.id{font-weight: bold; white-space: nowrap; } @@ -152,10 +152,10 @@ div.author{white-space: nowrap;} div.abstract p {margin-left:5%; margin-right:5%;} div.abstract {width:100%;} .abstracttitle{text-align:center;margin-bottom:1em;} -.subsectionToc, .likesubsectionToc {margin-left:1em;} -.subsubsectionToc, .likesubsubsectionToc {margin-left:2em;} -.paragraphToc, .likeparagraphToc {margin-left:3em;} -.subparagraphToc, .likesubparagraphToc {margin-left:4em;} +.subsectionToc, .likesubsectionToc {margin-left:2em;} +.subsubsectionToc, .likesubsubsectionToc {margin-left:4em;} +.paragraphToc, .likeparagraphToc {margin-left:6em;} +.subparagraphToc, .likesubparagraphToc {margin-left:8em;} .ovalbox { padding-left:3pt; padding-right:3pt; border:solid thin; } .Ovalbox-thick { padding-left:3pt; padding-right:3pt; 
border:solid thick; } .shadowbox { padding-left:3pt; padding-right:3pt; border:solid thin; border-right:solid thick; border-bottom:solid thick; } diff --git a/docs/html/userhtml.html b/docs/html/userhtml.html index b5db3f13..1d079559 100644 --- a/docs/html/userhtml.html +++ b/docs/html/userhtml.html @@ -45,122 +45,62 @@ class="newline" />
-  Abstract -
 Contents -
 1 General Overview -
 2 Code Distribution -
Contributors -
Citing AMG4PSBLAS -
 3 Configuring and Building AMG4PSBLAS -
3.1 Prerequisites -
3.2 Optional third party libraries -
3.3 Configuration options -
3.4 Bug reporting -
3.5 Example and test programs -
 4 Getting Started -
4.1 Examples -
4.2 GPU example -
 5 User Interface -
5.1 Method init -
5.2 Method set -
5.3 Method hierarchy_build -
5.4 Method smoothers_build -
5.5 Method build -
5.6 Method apply -
5.7 Method free -
5.8 Method descr -
5.9 Auxiliary Methods -
 6 Adding new smoother and solver objects to AMG4PSBLAS -
 7 Error Handling -
 A License -
 B Contributor Covenant Code of Conduct -
 References
@@ -171,6 +111,9 @@ class="cmr-12">References + + + diff --git a/docs/html/userhtmlli1.html b/docs/html/userhtmlli1.html index 27911345..328c05e6 100644 --- a/docs/html/userhtmlli1.html +++ b/docs/html/userhtmlli1.html @@ -77,9 +77,6 @@ class="cmr-12">PSCToolkit (Parallel Sparse Computation Toolkit) software framewo class="cmr-12">of a software development project started in 2007, named MLD2P4, which originally implemented a multilevel version of some domain decomposition preconditioners of - - - additive-Schwarz type, and was based on a parallel decoupled version of the well known and preconditioners are represented as PSBLAS distributed sparse class="cmr-12">AMG4PSBLAS enables the user to easily specify different features of an algebraic multilevel preconditioner, thus allowing to experiment with different preconditioners for + + + the problem and parallel computers at hand.

]

id="x3-2000">Contents
-  1 General Overview -
 2 Code Distribution -
 3 Configuring and Building AMG4PSBLAS -
  3.1 Prerequisites -
  3.2 Optional third party libraries -
  3.3 Configuration options -
  3.4 Bug reporting -
  3.5 Example and test programs -
 4 Getting Started -
  4.1 Examples -
  4.2 GPU example -
 5 User Interface -
  5.1 Method init -
  5.2 Method set -
  5.3 Method hierarchy_build -
  5.4 Method smoothers_build -
  5.5 Method build -
  5.6 Method apply -
  5.7 Method free -
  5.8 Method descr -
  5.9 Auxiliary Methods -
   5.9.1 Method: dump -
   5.9.2 Method: clone -
   5.9.3 Method: sizeof -
   5.9.4 Method: allocate_wrk -
   5.9.5 Method: free_wrk -
 6 Adding new smoother and solver objects to AMG4PSBLAS -
 7 Error Handling -
 A License -
 B Contributor Covenant Code of Conduct
diff --git a/docs/html/userhtmlli3.html b/docs/html/userhtmlli3.html index 40146ec5..eaf2b366 100644 --- a/docs/html/userhtmlli3.html +++ b/docs/html/userhtmlli3.html @@ -1,7 +1,7 @@ -Contributors +References @@ -10,49 +10,739 @@ -

whose current value is 1.0whose current value is 1.0.

-

- Contributors -
Citing AMG4PSBLAS -
+

Contributors

+ + + + +

+

Citing AMG4PSBLAS

+

When you use the library, please cite the following: + + + +

+@article{DDF2021,
+       author = {D’Ambra, Pasqua and Durastante, Fabio and Filippone, Salvatore},
+       title = {{AMG Preconditioners for Linear Solvers towards Extreme Scale}},
+       journal = {arXiv e-prints},
+       eprint = {2006.16147v3},
+       archivePrefix = {arXiv},
+       year={2021}
+     }
+
+@Misc{psctoolkit-web-page,
+       author = {D’Ambra, Pasqua and Durastante, Fabio and Filippone, Salvatore},
+       title =  {{PSCToolkit} {W}eb page},
+       url =    {https://psctoolkit.github.io/},
+       howpublished = {\url{https://psctoolkit.github.io/}},
+       year = {2021}
+     }
+
+

+ + + + + + + +

+ id="tailuserhtmlse2.html"> diff --git a/docs/html/userhtmlse3.html b/docs/html/userhtmlse3.html index c076066f..837f2426 100644 --- a/docs/html/userhtmlse3.html +++ b/docs/html/userhtmlse3.html @@ -29,12 +29,13 @@ class="cmr-12">up]

3 Configuring and Building AMG4PSBLAS

In order to build AMG4PSBLAS it is necessary to set up a Makefile with appropriate system-dependent variables; this is done by means of the configure system-dependent variables; this is done by means of the configure script. The distribution also includes the autoconf and automake sources employed to generate the @@ -47,10 +48,12 @@ class="cmr-12"> 2003, with some class="cmr-12">interfaces to external libraries in C; the Fortran compiler must support the Fortran 2003 standard plus the extension MOLD=  2003 standard plus the extension MOLD= feature, which enhances the usability of ALLOCATEof ALLOCATE. Most Fortran compilers provide this feature; in particular, this is supported by the GNU Fortran compiler, for which we recommend to use at least @@ -61,7 +64,7 @@ class="cmr-12">in both single and double precision.

Building AMG4PSBLAS requires some base libraries (see Section 3.1); interfaces to optional third-party libraries, which extend the functionalities Section 3.2), are also available. A number of Linux distributions (e.g., Ubuntu, the base and optional software used by AMG4PSBLAS is given in the sections.

-

- 3.1 Prerequisites -
3.2 Optional third party libraries -
3.3 Configuration options -
3.4 Bug reporting -
3.5 Example and test programs -
+

3.1 Prerequisites

+

The following base libraries are needed: +

+

+BLAS

+

[18, 19, 26] Many vendors provide optimized versions of BLAS; if no + vendor version is available for a given platform, the ATLAS software + (math-atlas.sourceforge.net) may be employed. The reference BLAS from + Netlib (www.netlib.org/blas) are meant to define the standard behaviour of + the BLAS interface, so they are not optimized for any particular platform, + and should only be used as a last resort. Note that BLAS computations form + + + + a relatively small part of the AMG4PSBLAS/PSBLAS computations; however they are + critical when using preconditioners based on the MUMPS, UMFPACK or + SuperLU third party libraries. UMFPACK requires a full LAPACK library; + our experience is that configuring ATLAS for building full LAPACK does + not always work in the expected way. Our advice is first to download the + LAPACK tarfile from www.netlib.org/lapack and install it independently of + ATLAS. In this case, you need to modify the OPTS and NOOPT definitions + to include the -fPIC compilation option in the make.inc file of the LAPACK + library. +

+

+MPI

+

[25, 32] A version of MPI is available on most high-performance computing + systems. +

+

+PSBLAS

+

[21, 23] Parallel Sparse BLAS + (PSBLAS) is available from psctoolkit.github.io/products/psblas/; version + 3.7.0 (or later) is required. Indeed, all the prerequisites listed so far are also + prerequisites of PSBLAS.

+

Please note that the previous libraries must have Fortran interfaces compatible with +AMG4PSBLAS; usually this means that they should all be built with the same +compiler being used for AMG4PSBLAS. +
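The LAPACK advice given under BLAS above (adding -fPIC to the OPTS and NOOPT definitions in make.inc) can be sketched as follows. This is only an illustration under stated assumptions: the helper name add_fpic and the sample flag values are hypothetical, and recent LAPACK releases may use different variable names in make.inc, so check your copy before applying it.

```shell
#!/bin/sh
# Append -fPIC to the OPTS and NOOPT definitions of a LAPACK make.inc.
# The modified text is printed on stdout; the input file is left untouched.
# add_fpic is a hypothetical helper name, not part of LAPACK or AMG4PSBLAS.
add_fpic() {
  sed -e '/^OPTS[[:space:]]*=/ s/[[:space:]]*$/ -fPIC/' \
      -e '/^NOOPT[[:space:]]*=/ s/[[:space:]]*$/ -fPIC/' "$1"
}

# Demo on a throwaway file; the flag values below are placeholders.
demo=$(mktemp)
printf 'OPTS = -O2\nNOOPT = -O0\nLOADER = gfortran\n' > "$demo"
add_fpic "$demo"
rm -f "$demo"
```

Redirect the output to a new file and compare it with the original make.inc before replacing it; the sketch does not check whether -fPIC is already present.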

If you want to use the PSBLAS support for NVIDIA GPUs, you will also +need a working version of the CUDA Toolkit that is compatible with the +compiler choice made to compile PSBLAS and AMG4PSBLAS. After that +you will need to have configured and compiled the PSBLAS library with the +options: + + + +

+./configure --enable-cuda --with-cudadir=${CUDA_HOME} --with-cudacc=xx,yy,zz
+
+

Previous versions required you to have the auxiliary libraries SPGPU and +PSBLAS-EXT compiled; this is no longer necessary because they have been integrated +into PSBLAS and are compiled by activating the previous flags during configuration. +See also Section 4.2. +

+

3.2 Optional third party libraries

+

We provide interfaces to the following third-party software libraries; note that these are +optional, but if you enable them some defaults for multilevel preconditioners may +change to reflect their presence. +

+

+

+UMFPACK

+

[16] A sparse LU factorization package included in the SuiteSparse library, + available from faculty.cse.tamu.edu/davis/suitesparse.html; + it provides sequential factorization and triangular system solution for + double precision real and complex data. We tested version 4.5.4 + of SuiteSparse. Note that for configuring SuiteSparse you should + provide the right path to the BLAS and LAPACK libraries in the + SuiteSparse_config/SuiteSparse_config.mk file. +

+

+MUMPS

+

[2] A sparse LU factorization package available from mumps.enseeiht.fr; + it provides sequential and parallel factorizations and triangular system + solution for single and double precision, real and complex data. We tested + versions 4.10.0 and 5.0.1. +

+

+SuperLU

+ + + +

[17] A sparse LU factorization package available from +crd.lbl.gov/~xiaoye/SuperLU/; it provides sequential factorization and + triangular system solution for single and double precision, real and complex + data. We tested versions 4.3 and 5.0. If you installed BLAS from ATLAS, + remember to define the BLASLIB variable in the make.inc file. +

+

+SuperLU_Dist

+

[28] A sparse LU factorization package available from the same site as + SuperLU; it provides parallel factorization and triangular system solution + for double precision real and complex data. We tested versions 3.3 and + 4.2. If you installed BLAS from ATLAS, remember to define the BLASLIB + variable in the make.inc file and to add the -std=c99 option to the C + compiler options. Note that this library requires the ParMETIS library for + parallel graph partitioning and fill-reducing matrix ordering, available from + glaros.dtc.umn.edu/gkhome/metis/parmetis/overview.

+

+

3.3 Configuration options

+

In order to build AMG4PSBLAS, the first step is to use the configure script in the +main directory to generate the necessary makefile. +

As a minimal example consider the following: + + + +

+./configure --with-psblas=PSB-INSTALL-DIR
+
+

which assumes that the various MPI compilers and support libraries are available in +the standard directories on the system, and specifies only the PSBLAS install directory +(note that the latter directory must be specified with an absolute path). The full set of +options can be displayed by issuing the command ./configure --help, which +produces:

configure configures AMG4PSBLAS 1.0.0 to adapt to many kinds of systems. 
+ 
+Usage: ./configure [OPTION]... [VAR=VALUE]... 
+ 
+To assign environment variables (e.g., CC, CFLAGS...), specify them as 
+VAR=VALUE.  See below for descriptions of some of the useful variables. 
+ 
+Defaults for the options are specified in brackets. 
+ 
+Configuration: 
+  -h, --help              display this help and exit 
+      --help=short        display options specific to this package 
+      --help=recursive    display the short help of all the included packages 
+  -V, --version           display version information and exit 
+  -q, --quiet, --silent   do not print ‘checking ...’ messages 
+      --cache-file=FILE   cache test results in FILE [disabled] 
+  -C, --config-cache      alias for ‘--cache-file=config.cache’ 
+  -n, --no-create         do not create output files 
+      --srcdir=DIR        find the sources in DIR [configure dir or ‘..’] 
+ 
+Installation directories: 
+  --prefix=PREFIX         install architecture-independent files in PREFIX 
+                          [/usr/local] 
+  --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX 
+                          [PREFIX] 
+ 
+By default, make install will install all the files in 
+‘/usr/local/bin’, ‘/usr/local/lib’ etc.  You can specify 
+an installation prefix other than ‘/usr/local’ using ‘--prefix’, 
+for instance ‘--prefix=$HOME’. 
+ 
+For better control, use the options below. 
+ 
+Fine tuning of the installation directories: 
+  --bindir=DIR            user executables [EPREFIX/bin] 
+  --sbindir=DIR           system admin executables [EPREFIX/sbin] 
+  --libexecdir=DIR        program executables [EPREFIX/libexec] 
+  --sysconfdir=DIR        read-only single-machine data [PREFIX/etc] 
+  --sharedstatedir=DIR    modifiable architecture-independent data [PREFIX/com] 
+  --localstatedir=DIR     modifiable single-machine data [PREFIX/var] 
+  --libdir=DIR            object code libraries [EPREFIX/lib] 
+  --includedir=DIR        C header files [PREFIX/include] 
+  --oldincludedir=DIR     C header files for non-gcc [/usr/include] 
+  --datarootdir=DIR       read-only arch.-independent data root [PREFIX/share] 
+  --datadir=DIR           read-only architecture-independent data [DATAROOTDIR] 
+  --infodir=DIR           info documentation [DATAROOTDIR/info] 
+  --localedir=DIR         locale-dependent data [DATAROOTDIR/locale] 
+  --mandir=DIR            man documentation [DATAROOTDIR/man] 
+  --docdir=DIR            documentation root [DATAROOTDIR/doc/amg4psblas] 
+  --htmldir=DIR           html documentation [DOCDIR] 
+  --dvidir=DIR            dvi documentation [DOCDIR] 
+  --pdfdir=DIR            pdf documentation [DOCDIR] 
+  --psdir=DIR             ps documentation [DOCDIR] 
+ 
+Program names: 
+  --program-prefix=PREFIX            prepend PREFIX to installed program names 
+  --program-suffix=SUFFIX            append SUFFIX to installed program names 
+  --program-transform-name=PROGRAM   run sed PROGRAM on installed program names 
+ 
+Optional Features: 
+  --disable-option-checking  ignore unrecognized --enable/--with options 
+  --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no) 
+  --enable-FEATURE[=ARG]  include FEATURE [ARG=yes] 
+  --enable-silent-rules   less verbose build output (undo: "make V=1") 
+  --disable-silent-rules  verbose build output (undo: "make V=0") 
+  --enable-dependency-tracking 
+                          do not reject slow dependency extractors 
+  --disable-dependency-tracking 
+                          speeds up one-time build 
+  --enable-serial         Specify whether to enable a fake mpi library to run 
+                          in serial mode. 
+ 
+Optional Packages: 
+  --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes] 
+  --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no) 
+  --with-psblas=DIR       The install directory for PSBLAS, for example, 
+                          --with-psblas=/opt/packages/psblas-3.5 
+  --with-psblas-incdir=DIR 
+                          Specify the directory for PSBLAS C includes. 
+  --with-psblas-moddir=DIR 
+                          Specify the directory for PSBLAS Fortran modules. 
+  --with-psblas-libdir=DIR 
+                          Specify the directory for PSBLAS library. 
+  --with-ccopt            additional [CCOPT] flags to be added: will prepend 
+                          to [CCOPT] 
+  --with-fcopt            additional [FCOPT] flags to be added: will prepend 
+                          to [FCOPT] 
+  --with-libs             List additional link flags here. For example, 
+                          --with-libs=-lspecial_system_lib or 
+                          --with-libs=-L/path/to/libs 
+  --with-clibs            additional [CLIBS] flags to be added: will prepend 
+                          to [CLIBS] 
+  --with-flibs            additional [FLIBS] flags to be added: will prepend 
+                          to [FLIBS] 
+  --with-library-path     additional [LIBRARYPATH] flags to be added: will 
+                          prepend to [LIBRARYPATH] 
+  --with-include-path     additional [INCLUDEPATH] flags to be added: will 
+                          prepend to [INCLUDEPATH] 
+  --with-module-path      additional [MODULE_PATH] flags to be added: will 
+                          prepend to [MODULE_PATH] 
+  --with-extra-libs       List additional link flags here. For example, 
+                          --with-extra-libs=-lspecial_system_lib or 
+                          --with-extra-libs=-L/path/to/libs 
+  --with-blas=<lib>       use BLAS library <lib> 
+  --with-blasdir=<dir>    search for BLAS library in <dir> 
+  --with-lapack=<lib>     use LAPACK library <lib> 
+  --with-mumps=LIBNAME    Specify the libname for MUMPS. Default: autodetect 
+                          with minimum "-lmumps_common -lpord" 
+  --with-mumpsdir=DIR     Specify the directory for MUMPS library and 
+                          includes. Note: you will need to add auxiliary 
+                          libraries with --extra-libs; this depends on how 
+                          MUMPS was configured and installed, at a minimum you 
+                          will need SCALAPACK and BLAS 
+  --with-mumpsincdir=DIR  Specify the directory for MUMPS includes. 
+  --with-mumpsmoddir=DIR  Specify the directory for MUMPS Fortran modules. 
+  --with-mumpslibdir=DIR  Specify the directory for MUMPS library. 
+  --with-umfpack=LIBNAME  Specify the library name for UMFPACK and its support 
+                          libraries. Default: "-lumfpack -lamd" 
+  --with-umfpackdir=DIR   Specify the directory for UMFPACK library and 
+                          includes. 
+  --with-umfpackincdir=DIR 
+                          Specify the directory for UMFPACK includes. 
+  --with-umfpacklibdir=DIR 
+                          Specify the directory for UMFPACK library. 
+  --with-superlu=LIBNAME  Specify the library name for SUPERLU library. 
+                          Default: "-lsuperlu" 
+  --with-superludir=DIR   Specify the directory for SUPERLU library and 
+                          includes. 
+  --with-superluincdir=DIR 
+                          Specify the directory for SUPERLU includes. 
+  --with-superlulibdir=DIR 
+                          Specify the directory for SUPERLU library. 
+  --with-superludist=LIBNAME 
+                          Specify the libname for SUPERLUDIST library. 
+                          Requires you also specify SuperLU. Default: 
+                          "-lsuperlu_dist" 
+  --with-superludistdir=DIR 
+                          Specify the directory for SUPERLUDIST library and 
+                          includes. 
+  --with-superludistincdir=DIR 
+                          Specify the directory for SUPERLUDIST includes. 
+  --with-superludistlibdir=DIR 
+                          Specify the directory for SUPERLUDIST library. 
+ 
+Some influential environment variables: 
+  FC          Fortran compiler command 
+  FCFLAGS     Fortran compiler flags 
+  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a 
+              nonstandard directory <lib dir> 
+  LIBS        libraries to pass to the linker, e.g. -l<library> 
+  CC          C compiler command 
+  CFLAGS      C compiler flags 
+  CPPFLAGS    (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if 
+              you have headers in a nonstandard directory <include dir> 
+  MPICC       MPI C compiler command 
+  MPIFC       MPI Fortran compiler command 
+  CPP         C preprocessor 
+ 
+Use these variables to override the choices made by configure or to help 
+it to find libraries and programs with nonstandard names/locations. 
+ 
+Report bugs to <https://github.com/psctoolkit/psctoolkit/issues>.
+   
+

For instance, if a user has built and installed PSBLAS 3.7 under the /opt directory and is +using the SuiteSparse package (which includes UMFPACK), then AMG4PSBLAS +might be configured with: + + +

+./configure --with-psblas=/opt/psblas-3.7/ \
+--with-umfpackincdir=/usr/include/suitesparse/
+
+

Once the configure script has completed execution, it will have generated the file +Make.inc which will then be used by all Makefiles in the directory tree; this file will be +copied in the install directory under the name Make.inc.AMG4PSBLAS. +

To use the MUMPS solver package, the user has to add the appropriate options to +the configure script; by default the script looks for the libraries -ldmumps -lsmumps + -lzmumps -lcmumps -lmumps_common -lpord. MUMPS often uses additional +packages such as ScaLAPACK, ParMETIS, SCOTCH, as well as enabling OpenMP; in +such cases it is necessary to add linker options with the --with-extra-libs configure +option. +
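As a hedged illustration of the MUMPS paragraph above, the following sketch assembles a configure invocation that combines --with-psblas, --with-mumpsdir and --with-extra-libs, all of which appear in the help output. The install directories and the extra library names are assumptions chosen for the example; the right values depend on how MUMPS was built on your system. The script only prints the command so that it can be inspected before being run.

```shell
#!/bin/sh
# Hypothetical install locations and link flags: adjust to your system.
PSBLAS_DIR="${PSBLAS_DIR:-/opt/psblas-3.7}"
MUMPS_DIR="${MUMPS_DIR:-/opt/mumps}"
# Assumed auxiliary libraries needed by this particular MUMPS build.
EXTRA_LIBS="${EXTRA_LIBS:--lscalapack -lparmetis -lmetis}"

# Print the configure command line instead of executing it.
# amg_configure_cmd is a hypothetical helper name used only in this sketch.
amg_configure_cmd() {
  printf '%s\n' "./configure --with-psblas=${PSBLAS_DIR} --with-mumpsdir=${MUMPS_DIR} --with-extra-libs=\"${EXTRA_LIBS}\""
}

amg_configure_cmd
```

Once the printed line looks right, it can be pasted into a shell in the AMG4PSBLAS source directory.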

To build the library the user will now enter + + +

+make
+
+

followed (optionally) by + -

4 Getting Started

This section describes the basics for building and applying AMG4PSBLAS one-level @@ -39,7 +39,7 @@ class="cmr-12">and multilevel (i.e., AMG) preconditioners with the Krylov solver class="cmr-12">PSBLAS [21]. @@ -47,25 +47,36 @@ class="cmr-12">. class="cmr-12">The following steps are required:

  1. + class="enumerate" id="x7-13002x1">

    Declare the preconditioner data structure. It is a derived data type, - amg_xprec_ typeamg_xprec_ type, where x may be s, d, c or zmay be s, d, c or z, according to the basic data type of the sparse matrix (s = real single precision; d type of the sparse matrix (s = real single precision; d = real double precision; - c = complex single precision; z c = complex single precision; z = complex double precision). This data structure is accessed by the user only through the AMG4PSBLAS routines, @@ -73,7 +84,7 @@ class="cmr-12">structure is accessed by the user only through the AMG4PSBLAS rou class="cmr-12">following an object-oriented approach.

  2. + class="enumerate" id="x7-13004x2">

    Allocate and initialize the preconditioner data structure, according to a the user. The preconditioner types and the defaults associated wi are given in Table 1, where the strings used by init to identify the @@ -96,7 +107,7 @@ class="cmr-12">preconditioner types are also given. Note that these strings are class="cmr-12">uppercase letters are substituted by corresponding lowercase ones.

  3. + class="enumerate" id="x7-13006x3">

    Modify the selected preconditioner type, by properly setting preconditioner associated with the selected preconditioner type, to obtain a var class="cmr-12">preconditioner. Examples of use of set are given in Section 4.1; a complete - - - list of all the preconditioner parameters and their allowed and default values is provided in Section 5, Tables 2-8. + + +

  4. + class="enumerate" id="x7-13008x4">

    Build the preconditioner for a given matrix. If the selected preconditioner is @@ -142,7 +153,7 @@ class="cmr-12">. If the selected preconditioner is class="cmr-12">multilevel, then two steps must be performed, as specified next.

    1. + class="enumerate" id="x7-13009x0">

      Build the AMG hierarchy for a given matrix. This is performed by the @@ -151,7 +162,7 @@ class="cmr-12">routine .

    2. + class="enumerate" id="x7-13010x0">

      Build the preconditioner for a given matrix. This is performed by the @@ -165,7 +176,7 @@ class="cmr-12">the routine .

    3. + class="enumerate" id="x7-13012x5">

      Apply the preconditioner at each iteration of a Krylov solver. This is performed by @@ -180,7 +191,7 @@ class="cmr-12">implementing the Krylov solver ().

    4. + class="enumerate" id="x7-13014x6">

      Free the preconditioner data structure. This is performed by the routine freeAll the previous routines are available as methods of the precond detailed description of them is given in Section 5. Examples showing the basic use of AMG4PSBLAS are reported in Section 4.1.

      @@ -208,7 +219,7 @@ class="cmr-12">.



      @@ -224,21 +235,21 @@ id="TBL-1-2">


      type

      e

      string

      g

      deioner r




      - No preconditioner

      NONE -

      Considered to use the PSBLAS Krylov - solvers with no preconditioner. - +class="td11">No preconditioner

      NONE

      Considered to use the PSBLAS Krylov +solvers with no preconditioner.




      - Diagonal

      DIAG, - JACOBI, - L1-JACOBI -

      Diagonal preconditioner. For any zero - diagonal entry of the matrix to be - preconditioned, the corresponding entry - of the preconditioner is set to 1. - +class="td11">Diagonal

      DIAG, +JACOBI, +L1-JACOBI

      Diagonal preconditioner. For any zero +diagonal entry of the matrix to be +preconditioned, the corresponding entry +of the preconditioner is set to 1.




      - Gauss-Seidel

      GS, - L1-GS -

      Hybrid Gauss-Seidel (forward), that is, - global block Jacobi with Gauss-Seidel as - local solver. - +class="td11">Gauss-Seidel

      GS, +L1-GS

      Hybrid Gauss-Seidel (forward), that is, +global block Jacobi with Gauss-Seidel as +local solver.




      - Symmetrized Gauss-Seidel

      FBGS, - L1-FBGS -

      Symmetrized hybrid Gauss-Seidel, that - is, forward Gauss-Seidel followed by - backward Gauss-Seidel. - +class="td11">Symmetrized Gauss-Seidel

      FBGS, +L1-FBGS

      Symmetrized hybrid Gauss-Seidel, that +is, forward Gauss-Seidel followed by +backward Gauss-Seidel.




      - Block Jacobi

      BJAC, - L1-BJAC -

      Block-Jacobi with ILU(0) on the local - blocks. - +class="td11">Block Jacobi

      BJAC, +L1-BJAC

      Block-Jacobi with ILU(0) on the local +blocks.




      - Additive Schwarz

      AS -

      Additive Schwarz (AS), with overlap 1 - and ILU(0) on the local blocks. - +class="td11">Additive Schwarz

      AS

      Additive Schwarz (AS), with overlap 1 +and ILU(0) on the local blocks.




      - Multilevel

      ML -

      V-cycle with one hybrid - forward Gauss-Seidel (GS) sweep as - pre-smoother and one hybrid backward - GS sweep as post-smoother, decoupled - smoothed aggregation as coarsening - algorithm, and LU (plus triangular solve) - as coarsest-level solver. See the default - values in Tables 2-8 for further details of - the preconditioner. - +class="td11">Multilevel

      ML

      V-cycle with one hybrid +forward Gauss-Seidel (GS) sweep as +pre-smoother and one hybrid backward +GS sweep as post-smoother, decoupled +smoothed aggregation as coarsening +algorithm, and LU (plus triangular solve) +as coarsest-level solver. See the default +values in Tables 2-8 for further details of +the preconditioner.




      +class="td11">

      Table 1: Preconditioner types, corresponding strings and default choices.
      +class="content">Preconditioner types, corresponding strings and default choices.
      @@ -375,7 +365,7 @@ class="cmr-12">, for interfacing with the Krylov solvers, must be also used (see Section 4.1).
      problems. However, this does not necessarily correspond to the sh on parallel computers. -
      - 4.1 Examples -
      4.2 GPU example -
      +

      4.1 Examples

      +

      The code reported in Figure 1 shows how to set and apply the default multilevel +preconditioner available in the real double precision version of AMG4PSBLAS +(see Table 1). This preconditioner is chosen by simply specifying ML as the +second argument of P%init (a call to P%set is not needed) and is applied +with the flexible CG (FCG) solver provided by PSBLAS (the matrix of the system to be +solved is assumed to be symmetric positive definite). As previously observed, the modules +psb_base_mod, amg_prec_mod and psb_krylov_mod must be used by the example +program. +

      The part of the code dealing with reading and assembling the sparse matrix and the +right-hand side vector and the deallocation of the relevant data structures, performed +through the PSBLAS routines for sparse matrix and vector management, +is not reported here for the sake of conciseness. The complete code can be +found in the example program file amg_dexample_ml.f90, in the directory +samples/simple/fileread of the AMG4PSBLAS implementation (see Section 3.5). A +sample test problem along with the relevant input data is available in +samples/simple/fileread/runs. For details on the use of the PSBLAS routines, see +the PSBLAS User’s Guide [21]. +

      The setup and application of the default multilevel preconditioner for the real single +precision and the complex, single and double precision, versions are obtained +with straightforward modifications of the previous example (see Section 5 for +details). If these versions are installed, the corresponding codes are available in +samples/simple/fileread. + + + +


      + + + +
      +

      + + + +

      +  use psb_base_mod
      +  use amg_prec_mod
      +  use psb_krylov_mod
      +... ...
      +!
      +! sparse matrix
      +  type(psb_dspmat_type) :: A
      +! sparse matrix descriptor
      +  type(psb_desc_type)   :: desc_A
      +! preconditioner
      +  type(amg_dprec_type)  :: P
      +! right-hand side and solution vectors
      +  type(psb_d_vect_type) :: b, x
      +... ...
      +!
      +! initialize the parallel environment
      +  call psb_init(ctxt)
      +  call psb_info(ctxt,iam,np)
      +... ...
      +!
      +! read and assemble the spd matrix A and the right-hand side b
      +! using PSBLAS routines for sparse matrix / vector management
      +... ...
      +!
      +! initialize the default multilevel preconditioner, i.e. V-cycle
      +! with basic smoothed aggregation, 1 hybrid forward/backward
      +! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
      +! solver
      +  call P%init(ctxt,'ML',info)
      +!
      +! build the preconditioner
      +  call P%hierarchy_build(A,desc_A,info)
      +  call P%smoothers_build(A,desc_A,info)
      +
      +!
      +! set the solver parameters and the initial guess
      +  ... ...
      +!
      +! solve Ax=b with preconditioned FCG
      +  call psb_krylov('FCG',A,P,b,x,tol,desc_A,info)
      +  ... ...
      +!
      +! deallocate the preconditioner
      +  call P%free(info)
      +!
      +! deallocate other data structures
      +  ... ...
      +!
      +! exit the parallel environment
      +  call psb_exit(ctxt)
      +  stop
      +
      +

      + + + +
      Listing 1: setup and application of the default multilevel preconditioner (example 1). +
      +
      + +

      +

      Different versions of the multilevel preconditioner can be obtained by changing the default values of the preconditioner parameters. The code reported in Listing 2 shows how to set a V-cycle preconditioner which applies 1 block-Jacobi sweep as pre- and post-smoother, and solves the coarsest-level system with 8 block-Jacobi sweeps. Note that the ILU(0) factorization (plus triangular solve) is used as local solver for the block-Jacobi sweeps, since this is the default associated with block-Jacobi and set by P%init. Furthermore, specifying block-Jacobi as coarsest-level solver implies that the coarsest-level matrix is distributed among the processes. Listing 3 shows how to set a W-cycle preconditioner using the coarsening based on compatible weighted matching, aggregates of size at most 8 and smoothed prolongators. It applies 2 hybrid Gauss-Seidel sweeps as pre- and post-smoother, and solves the coarsest-level system with the parallel flexible Conjugate Gradient method (KRM) coupled with the block-Jacobi preconditioner having ILU(0) on the blocks. Default parameters are used for the stopping criterion of the coarsest-level solver. Note that, also in this case, specifying KRM as coarsest-level solver implies that the coarsest-level matrix is distributed among the processes.

      The code fragments shown in Listings 2 and 3 are also included in the example program file amg_dexample_ml.f90.

      Finally, Listing 4 shows the setup of a one-level additive Schwarz preconditioner, i.e., RAS with overlap 2. Note also that a Krylov method different from CG must be used to solve the preconditioned system, since the preconditioner is nonsymmetric. The corresponding example program is available in the file amg_dexample_1lev.f90.

      For all the previous preconditioners, example programs where the sparse matrix +and the right-hand side are generated by discretizing a PDE with Dirichlet +boundary conditions are also available in the directory samples/simple/pdegen. + + + +


      + + + +
      +

      +

      +... ...
      +! build a V-cycle preconditioner with 1 block-Jacobi sweep (with
      +! ILU(0) on the blocks) as pre- and post-smoother, and 8  block-Jacobi
      +! sweeps (with ILU(0) on the blocks) as coarsest-level solver
      +  call P%init(ctxt,'ML',info)
      +  call P%set('SMOOTHER_TYPE','BJAC',info)
      +  call P%set('COARSE_SOLVE','BJAC',info)
      +  call P%set('COARSE_SWEEPS',8,info)
      +  call P%hierarchy_build(A,desc_A,info)
      +  call P%smoothers_build(A,desc_A,info)
      +... ...
      +
      +

      +
      Listing 2: setup of a multilevel preconditioner based on the default decoupled coarsening
      + + + +

      + + + +


      + + + +
      +

      +

      +... ...
      +! build a W-cycle preconditioner with 2 hybrid Gauss-Seidel sweeps
      +! as pre- and post-smoother, a distributed coarsest
      +! matrix, and preconditioned FCG (KRM) as coarsest-level solver
      +  call P%init(ctxt,'ML',info)
      +  call P%set('PAR_AGGR_ALG','COUPLED',info)
      +  call P%set('AGGR_TYPE','MATCHBOXP',info)
      +  call P%set('AGGR_SIZE',8,info)
      +  call P%set('ML_CYCLE','WCYCLE',info)
      +  call P%set('SMOOTHER_TYPE','FBGS',info)
      +  call P%set('SMOOTHER_SWEEPS',2,info)
      +  call P%set('COARSE_SOLVE','KRM',info)
      +  call P%set('COARSE_MAT','DIST',info)
      +  call P%set('KRM_METHOD','FCG',info)
      +  call P%hierarchy_build(A,desc_A,info)
      +  call P%smoothers_build(A,desc_A,info)
      +... ...
      +
      +

      +
      Listing 3: setup of a multilevel preconditioner based on the coupled coarsening using +weighted matching
      + + + +

      + + + +


      + + + +
      +

      +

      +... ...
      +! set RAS with overlap 2 and ILU(0) on the local blocks
      +  call P%init(ctxt,'AS',info)
      +  call P%set('SUB_OVR',2,info)
      +  call P%build(A,desc_A,info)
      +... ...
      +! solve Ax=b with preconditioned BiCGSTAB
      +  call psb_krylov('BICGSTAB',A,P,b,x,tol,desc_A,info)
      +
      +

      +
      Listing 4: setup of a one-level Schwarz preconditioner.
      + + + +

      +

      4.2 GPU example

      +

      The code discussed here shows how to set up a program exploiting the combined GPU +capabilities of PSBLAS and AMG4PSBLAS. The code example is available in the +source distribution directory amg4psblas/examples/gpu. +

      First of all, we need to include the appropriate modules and declare some auxiliary +variables: + + + +


      + + + +
      +

      +

      +program amg_dexample_gpu
      +  use psb_base_mod
      +  use amg_prec_mod
      +  use psb_krylov_mod
      +  use psb_util_mod
      +  use psb_gpu_mod
      +  use data_input
      +  use amg_d_pde_mod
      +  implicit none
      +  .......
      +  ! GPU variables
      +  type(psb_d_hlg_sparse_mat) :: agmold
      +  type(psb_d_vect_gpu)       :: vgmold
      +  type(psb_i_vect_gpu)       :: igmold
      +
      + 
      +
      +

      +
      Listing 5: setup of a GPU-enabled test program part one.
      + + + +

      +

      In this particular example we are choosing to employ a HLG data structure for +sparse matrices on GPUs; for more information please refer to the PSBLAS-EXT users’ +guide. +

      We then have to initialize the GPU environment, and pass the appropriate MOLD +variables to the build methods (see also the PSBLAS and PSBLAS-EXT users’ +guides). + + + +


      + + + +
      +

      +

      +  call psb_init(ctxt)
      +  call psb_info(ctxt,iam,np)
      +  !
      +  ! BEWARE: if you have NGPUS  per node, the default is to
      +  ! attach to mod(IAM,NGPUS)
      +  !
      +  call psb_gpu_init(ctxt)
      +  ......
      +  t1 = psb_wtime()
      +  call prec%smoothers_build(a,desc_a,info, amold=agmold, vmold=vgmold, imold=igmold)
      +
      + 
      +
      +

      +
      Listing 6: setup of a GPU-enabled test program part two.
      + + + +

      +

      Finally, we convert the input matrix, the descriptor and the vectors to use a GPU-enabled internal storage format. We then preallocate the preconditioner workspace before entering the Krylov method. At the end of the code, we close the GPU environment.


      + + + +
      +

      +

      +  call desc_a%cnv(mold=igmold)
      +  call a%cscnv(info,mold=agmold)
      +  call psb_geasb(x,desc_a,info,mold=vgmold)
      +  call psb_geasb(b,desc_a,info,mold=vgmold)
      +
      +  !
      +  ! iterative method parameters
      +  !
      +  call psb_barrier(ctxt)
      +  call prec%allocate_wrk(info)
      +  t1 = psb_wtime()
      +  call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
      +       & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,&
      +       & istop=s_choice%istopc,irst=s_choice%irst)
      +  call prec%deallocate_wrk(info)
      +  call psb_barrier(ctxt)
      +  tslv = psb_wtime() - t1
      +
      +  ......
      +  call psb_gpu_exit()
      +  call psb_exit(ctxt)
      +  stop
      +
      + 
      +
      +

      +
      Listing 7: setup of a GPU-enabled test program part three.
      + + + +

      +

      It is very important to employ smoothers and coarsest solvers that are suited to the +GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that +satisfy this constraint include: +

      • JACOBI
      • BJAC with the following methods on the local blocks:
        • INVK
        • INVT
        • AINV

      and their ℓ1 variants.
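As a minimal sketch of a GPU-friendly configuration, the fragment below selects block-Jacobi with the INVK approximate inverse on the local blocks, both as smoother and as coarsest-level solver. It is not taken from the distributed example: it assumes the declarations of the GPU listings above (prec, a, desc_a, info, and the mold variables agmold, vgmold, igmold) and uses only the keywords documented in the tables of Section 5.2. The particular combination is an illustration, not a recommended setting.

```fortran
  ! Illustrative fragment (assumption: prec, a, desc_a, info and the
  ! mold variables are declared as in the GPU listings above)
  call prec%init(ctxt,'ML',info)
  ! block-Jacobi smoother with the INVK approximate inverse on the blocks;
  ! SUB_SOLVE is set after SMOOTHER_TYPE so that it is not reset to the default
  call prec%set('SMOOTHER_TYPE','BJAC',info)
  call prec%set('SUB_SOLVE','INVK',info)
  ! block-Jacobi at the coarsest level, again with INVK on the blocks
  call prec%set('COARSE_SOLVE','BJAC',info)
  call prec%set('COARSE_SUBSOLVE','INVK',info)
  ! build the hierarchy, then the smoothers with GPU-enabled storage
  call prec%hierarchy_build(a,desc_a,info)
  call prec%smoothers_build(a,desc_a,info,amold=agmold,vmold=vgmold,imold=igmold)
```

None of the selected objects relies on triangular system solve kernels, which is the constraint stated above.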

      + id="tailuserhtmlse4.html"> diff --git a/docs/html/userhtmlse5.html b/docs/html/userhtmlse5.html index aa19b30e..5a4c35b8 100644 --- a/docs/html/userhtmlse5.html +++ b/docs/html/userhtmlse5.html @@ -29,7 +29,7 @@ class="cmr-12">up]

      5 User Interface

      The basic user interface of AMG4PSBLAS consists of eight methods. The six methods @@ -63,36 +63,48 @@ class="cmr-12">must be passed to the method, i.e.,

      the sparse matrix data structure, containing the matrix to be preconditioned, must be of type psb_xspmat_type with x = s for real single precision, x = d for real double precision, x = c for complex single precision, x = z for complex double precision;

    5. the preconditioner data structure must be of type amg_xprec_type, with x = s, d, c, z, according to the sparse matrix data structure;

    6. @@ -110,15 +122,21 @@ class="cmmi-12">B⁻¹v must be of type psb_xvect_type with x = s, d, c, z, in a manner completely analogous to the sparse matrix type;
    7. @@ -129,7 +147,7 @@ class="cmr-12">the precision of the sparse matrix and preconditioner data struct Section 5.2).
    8. A description of each method is given in the remainder of this se -

      5.1 Method init
      +

      +

      call p%init(contxt,ptype,info)

      +

      This method allocates and initializes the preconditioner p, according to the +preconditioner type chosen by the user. +

      Arguments +

      + + + + + +

      contxt

      type(psb_ctxt_type), intent(in).

      The communication context.

      ptype

      character(len=*), intent(in).

      The type of preconditioner. Its values are specified in Table 1.

      Note that strings are case insensitive.

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      + + + +

      5.2 Method set

      +
      +

      +

      call p%set(what,val,info [,ilev, ilmax, pos, idx])

      +

      This method sets the parameters defining the preconditioner p. More precisely, the +parameter identified by what is assigned the value contained in val. +

      Arguments +

      + + + + + + + + + + + + + +

      what

      character(len=*).

      The parameter to be set. It can be specified through its name; the +string is case-insensitive. See Tables 2-8.

      val

      integer or character(len=*) or real(psb_spk_) or +real(psb_dpk_), intent(in).

      The value of the parameter to be set. The list of allowed values and +the corresponding data types is given in Tables 2-8. When the value +is of type character(len=*), it is also treated as case insensitive.

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      ilev

      integer, optional, intent(in).

      For the multilevel preconditioner, the level at which the +preconditioner parameter has to be set. The levels are numbered +in increasing order starting from the finest one, i.e., level 1 is the +finest level. If ilev is not present, the parameter identified by what +is set at all levels that are appropriate (see Tables 2-8).

      ilmax

      integer, optional, intent(in).

      For the multilevel preconditioner, when both ilev and ilmax are +present, the settings are applied at all levels ilev:ilmax. When +ilev is present but ilmax is not, then the default is ilmax=ilev. +The levels are numbered in increasing order starting from the finest +one, i.e., level 1 is the finest level.

      pos

      character(len=*), optional, intent(in).

      Whether the other arguments apply only to the pre-smoother +(PRE) or to the post-smoother (POST). If pos is not present, +the other arguments are applied to both smoothers. If the +preconditioner is one-level or the parameter identified by what does +not concern the smoothers, pos is ignored.

      idx

      integer, optional, intent(in).

      An auxiliary input argument that can be passed to the underlying +objects.
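The optional arguments ilev, ilmax and pos can be combined as in the following sketch. It assumes the declarations of Listing 1 (P, A, desc_A, ctxt, info) and that the hierarchy has at least three levels; the numerical values are arbitrary illustrations. The level-specific smoother settings are placed between hierarchy_build and smoothers_build, so that the levels they refer to already exist.

```fortran
  ! Illustrative fragment (assumption: declarations as in Listing 1)
  call P%init(ctxt,'ML',info)
  call P%hierarchy_build(A,desc_A,info)
  ! 2 sweeps at all levels, for both the pre- and the post-smoother
  call P%set('SMOOTHER_SWEEPS',2,info)
  ! 4 sweeps of the pre-smoother only, at the finest level (level 1)
  call P%set('SMOOTHER_SWEEPS',4,info,ilev=1,pos='PRE')
  ! block-Jacobi smoother at levels 2 to 3 (ilev:ilmax)
  call P%set('SMOOTHER_TYPE','BJAC',info,ilev=2,ilmax=3)
  call P%smoothers_build(A,desc_A,info)
```

Since info is returned by every call, a production code would test it after each call instead of letting later calls overwrite it.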

      +

      +

      A variety of preconditioners can be obtained by setting the appropriate +preconditioner parameters. These parameters can be logically divided into four groups, + + + +i.e., parameters defining +

      1. the type of multilevel cycle and how many cycles must be applied;
      2. the coarsening algorithm;
      3. the solver at the coarsest level (for multilevel preconditioners only);
      4. the smoother of the multilevel preconditioners, or the one-level preconditioner.

      A list of the parameters that can be set, along with their allowed and default values, is +given in Tables 2-8.
      +

      Remark 2. A smoother is usually obtained by combining two objects: a smoother (SMOOTHER_TYPE) and a local solver (SUB_SOLVE), as specified in Tables 7-9. For example, the block-Jacobi smoother using ILU(0) on the blocks is obtained by combining the block-Jacobi smoother object with the ILU(0) solver object. Similarly, the hybrid Gauss-Seidel smoother (see Note in Table 7) is obtained by combining the block-Jacobi smoother object with a single sweep of the Gauss-Seidel solver object, while the point-Jacobi smoother is the result of combining the block-Jacobi smoother object with a single sweep of the point-Jacobi solver object. The ℓ1-versions of the smoothers are obtained in the same way. However, for simplicity, shortcuts are provided to set all versions of point-Jacobi, hybrid (forward) Gauss-Seidel, and hybrid backward Gauss-Seidel, i.e., the previous smoothers can be defined just by setting SMOOTHER_TYPE to certain specific values (see Table 7), without the need to set SUB_SOLVE as well.

      The smoother and solver objects are arranged in a hierarchical manner. When specifying a smoother object, its parameters, including the local solver, are set to their default values, and when a solver object is specified, its defaults are also set, overriding in both cases any previous settings even if explicitly specified. Therefore, if the user sets a smoother and wishes to use a solver different from the default one, the call to set the solver must come after the call to set the smoother.

      Similar considerations apply to the point-Jacobi, Gauss-Seidel and block-Jacobi +coarsest-level solvers, and shortcuts are available in this case too (see Table 5). +
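The ordering rule of Remark 2 can be sketched as follows. The fragment assumes the declarations of Listing 1 and the psb_dpk_ kind from psb_base_mod; the fill-in level and drop tolerance are arbitrary illustrative values.

```fortran
  ! Illustrative fragment (assumption: declarations as in Listing 1)
  call P%init(ctxt,'ML',info)
  ! selecting the smoother resets its local solver to the default, ILU(0)
  call P%set('SMOOTHER_TYPE','BJAC',info)
  ! the non-default local solver is therefore selected afterwards ...
  call P%set('SUB_SOLVE','ILUT',info)
  ! ... and its own parameters come last, since selecting a solver
  ! also resets the solver defaults
  call P%set('SUB_FILLIN',1,info)
  call P%set('SUB_ILUTHRS',1.0e-2_psb_dpk_,info)
```

Reversing the first two set calls would silently restore ILU(0), because the smoother selection overrides any local solver chosen before it.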
      +

      Remark 3. The polynomial-accelerated smoother described in Tables 7 and 9 redefines a sweep or iteration as corresponding to the degree of the polynomial used. Consequently, the SMOOTHER_SWEEPS option is overridden by the POLY_DEGREE option. This smoother is paired with a base smoother object, whose iterations are accelerated using the specified polynomial smoothing technique. By default, the ℓ1-Jacobi smoother serves as the base smoother, offering theoretical guarantees on the resulting convergence factor [15, 27]. Alternative combinations are experimental and lack established guarantees.
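A minimal sketch of selecting the polynomial-accelerated smoother, assuming the declarations of Listing 1; the degree 3 is an arbitrary illustrative value within the documented range.

```fortran
  ! Illustrative fragment (assumption: declarations as in Listing 1)
  call P%init(ctxt,'ML',info)
  ! polynomial-accelerated smoother; the base smoother is left at its default
  call P%set('SMOOTHER_TYPE','POLY',info)
  ! degree 3, i.e. 3 matrix-vector products per smoother application;
  ! SMOOTHER_SWEEPS is ignored for this smoother
  call P%set('POLY_DEGREE',3,info)
  call P%hierarchy_build(A,desc_A,info)
  call P%smoothers_build(A,desc_A,info)
```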
      +

      Remark 4. Many of the coarsest-level solvers apply to a specific coarsest-matrix +layout; therefore, setting the solver after the layout may change the layout to either +distributed or replicated. Similarly, setting the layout after the solver may change the +solver. +

      More precisely, UMFPACK and SuperLU require the coarsest-level matrix to be replicated, while SuperLU_Dist and KRM require it to be distributed. In these cases, setting the coarsest-level solver implies that the layout is redefined according to the solver, overriding any previous settings. MUMPS, point-Jacobi, hybrid Gauss-Seidel and block-Jacobi can be applied to replicated and distributed matrices, thus their choice does not modify any previously specified layout. It is worth noting that, when the matrix is replicated, the point-Jacobi, hybrid Gauss-Seidel and block-Jacobi solvers and their ℓ1-versions reduce to the corresponding local solver objects (see Remark 2). For the point-Jacobi and Gauss-Seidel solvers, these objects correspond to a single point-Jacobi sweep and a single Gauss-Seidel sweep, respectively, which are very poor solvers.

      On the other hand, the distributed layout can be used with any solver but UMFPACK and SuperLU; therefore, if either of these two solvers has already been selected, the coarsest-level solver is changed to block-Jacobi, with the previously chosen solver applied to the local blocks. Likewise, the replicated layout can be used with any solver but SuperLU_Dist and KRM; therefore, if SuperLU_Dist or KRM has been previously set, the coarsest-level solver is changed to the default sequential solver.

      In a parallel setting with many cores, we suggest that users replace the default coarsest-level solver with KRM, i.e., a parallel distributed iterative solution of the coarsest system based on Krylov methods.
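The suggestion above can be sketched as follows, assuming the declarations of Listing 1 and the psb_dpk_ kind from psb_base_mod. The KRM values shown are the defaults listed in Table 6, written out only to show where they would be changed.

```fortran
  ! Illustrative fragment (assumption: declarations as in Listing 1)
  call P%init(ctxt,'ML',info)
  ! KRM requires a distributed coarsest matrix, so this call also
  ! redefines the layout (COARSE_MAT) as distributed
  call P%set('COARSE_SOLVE','KRM',info)
  ! optional tuning of the coarsest-level Krylov solver
  call P%set('KRM_METHOD','FCG',info)
  call P%set('KRM_ITMAX',40,info)
  call P%set('KRM_EPS',1.0e-6_psb_dpk_,info)
```

Calling P%set('COARSE_MAT','REPL',info) after these lines would, by the rules of the remark, replace KRM with the default sequential solver.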

      Remark 5. The argument idx can be used to allow finer control for some solvers; for instance, by specifying the keyword MUMPS_IPAR_ENTRY and an appropriate value for idx, it is possible to set any entry in the MUMPS integer control array. See also Sec. 6.
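As a hedged sketch of the use of idx, the fragment below sets one entry of the MUMPS integer control array. It assumes that AMG4PSBLAS was configured with MUMPS, that the declarations of Listing 1 are available, and that idx corresponds directly to the index of the MUMPS ICNTL array; the entry and value are illustrative only.

```fortran
  ! Illustrative fragment (assumptions: MUMPS installed, declarations as
  ! in Listing 1, idx mapped to the index of the MUMPS ICNTL array)
  call P%init(ctxt,'ML',info)
  call P%set('COARSE_SOLVE','MUMPS',info)
  ! set entry 14 of the integer control array to 40; in MUMPS, ICNTL(14)
  ! is the percentage increase of the estimated working space
  call P%set('MUMPS_IPAR_ENTRY',40,info,idx=14)
```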

      + + + +


      + + + +
      +

      +

      + +





      what

      data type

      val

      default

      comments






      ML_CYCLE

      character(len=*)

      VCYCLE +

      WCYCLE +

      KCYCLE +

      ADD

      VCYCLE

      Multilevel cycle: V-cycle, W-cycle, K-cycle, +and additive composition.






      CYCLE_SWEEPS

      integer

      Any integer +

      number ≥ 1

      1

      Number of multilevel cycles.






      +
      Table 2: Parameters defining the multilevel cycle and the number of cycles to be +applied.
      + + + +

      +
      +
      + + + +


      + + + +
      +

      + + + +

      + + + + + + + +
      Note. The aggregation algorithm stops when at least one of the following criteria is met: the coarse size threshold,
      +
      the minimum coarsening ratio, or the maximum number of levels is reached.
      +
      Therefore, the actual number of levels may be smaller than the specified maximum number of levels.
      +





      what

      data type

      val

      default

      comments






      MIN_COARSE_SIZE_PER_PROCESS

      integer

      Any number +

      > 0

      200

      Coarse size threshold per process. The +aggregation stops if the global number of +variables of the computed coarsest matrix +is lower than or equal to this threshold +multiplied by the number of processes +(see Note).






      MIN_COARSE_SIZE

      integer

      Any number +

      > 0

      -1

      Coarse size threshold. The aggregation +stops if the global number of variables +of the computed coarsest matrix is lower +than or equal +to this threshold (see Note). If negative, +it is ignored in favour of the default for +MIN_COARSE_SIZE_PER_PROCESS.






      MIN_CR_RATIO

      real

      Any number +

      > 1

      1.5

      Minimum coarsening +ratio. The aggregation stops if the ratio +between the global matrix dimensions at +two consecutive levels is lower than or +equal to this threshold (see Note).






      MAX_LEVS

      integer

      Any integer +

      number > 1

      20

      Maximum number of levels. The +aggregation stops if the number of levels +reaches this value (see Note).






      PAR_AGGR_ALG

      character(len=*)

      ’DEC’, +’SYMDEC’, +’COUPLED’

      ’DEC’

      Parallel aggregation algorithm. +

      The SYMDEC option applies decoupled aggregation to the sparsity pattern of A + A^T.






      AGGR_TYPE

      character(len=*)

      SOC1, +SOC2, +MATCHBOXP

      SOC1

      Type of aggregation algorithm: currently, for the decoupled aggregation we implement two measures of strength of connection, the one by Vaněk, Mandel and Brezina [35], and the one by Gratton et al. [24]. The coupled aggregation is based on a parallel version of the half-approximate matching implemented in the MatchBox-P software package [9].






      AGGR_SIZE

      integer

      Any integer +

      power of 2, with aggr_size ≥ 2

      4

      Maximum size of aggregates when the +coupled aggregation based on matching +is applied. For aggressive coarsening +with size of aggregate larger than 8 +we recommend the use of smoothed +prolongators. Used only with ’COUPLED’ +and ’MATCHBOXP’






      AGGR_PROL

      character(len=*)

      SMOOTHED, +UNSMOOTHED

      SMOOTHED

      Prolongator used by the aggregation +algorithm: smoothed or unsmoothed (i.e., +tentative prolongator).











      + + + +
      Table 3: Parameters defining the aggregation algorithm.
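The three stopping criteria of the aggregation can be set as in the sketch below, which assumes the declarations of Listing 1 and the psb_dpk_ kind from psb_base_mod; the values are arbitrary illustrations. With 4 processes, for instance, a per-process threshold of 500 stops the coarsening once the coarsest matrix has at most 4 × 500 = 2000 global variables.

```fortran
  ! Illustrative fragment (assumption: declarations as in Listing 1)
  call P%init(ctxt,'ML',info)
  ! stop when the coarsest matrix has at most 500*np global variables
  call P%set('MIN_COARSE_SIZE_PER_PROCESS',500,info)
  ! stop when two consecutive levels shrink by a factor of 2 or less
  call P%set('MIN_CR_RATIO',2.0_psb_dpk_,info)
  ! never build more than 10 levels
  call P%set('MAX_LEVS',10,info)
  ! the criteria are applied while the hierarchy is built
  call P%hierarchy_build(A,desc_A,info)
```

Whichever criterion is met first ends the coarsening, so the hierarchy may have fewer than 10 levels.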
      +
      + + + +

      +
      +
      + + + +


      + + + +
      +

      +

      + + +
      Note. Different thresholds at different levels, such as those used in [35, Section 5.1], can be easily set by invoking the routine set with the parameter ilev.
      +





      what

      data type

      val

      default

      comments






      AGGR_ORD

      character(len=*)

      ’NATURAL’ +

      ’DEGREE’

      ’NATURAL’

      Initial ordering of indices for +the decoupled aggregation algorithm: +either natural ordering or sorted by +descending degrees of the nodes in the +matrix graph.






      AGGR_THRESH

      real(kind_parameter)

      Any real +

      number ∈ [0,1]

      0.01

      The threshold θ in the strength of +connection algorithm. See also the note +at the bottom of this table.






      AGGR_FILTER

      character(len=*)

      ’FILTER’ +

      ’NOFILTER’

      ’NOFILTER’

      Matrix used in computing the smoothed +prolongator: filtered or unfiltered.











      +
      Table 4: Parameters defining the aggregation algorithm (continued).
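Level-dependent thresholds, as mentioned in the note of Table 4, can be set with a loop over ilev. The sketch below assumes the declarations of Listing 1 and the psb_dpk_ kind from psb_base_mod; the variables lev and thr, the starting value and the halving rule are hypothetical choices made for illustration, and every value stays within the allowed interval [0,1].

```fortran
  ! Illustrative fragment (assumption: declarations as in Listing 1;
  ! lev and thr are hypothetical helper variables)
  integer        :: lev
  real(psb_dpk_) :: thr
  ... ...
  call P%init(ctxt,'ML',info)
  thr = 0.08_psb_dpk_
  do lev = 1, 3
    ! threshold for the strength of connection used at level lev
    call P%set('AGGR_THRESH',thr,info,ilev=lev)
    ! halve the threshold for the next coarser level
    thr = 0.5_psb_dpk_*thr
  end do
  ! the thresholds must be in place before the hierarchy is built
  call P%hierarchy_build(A,desc_A,info)
```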
      + + + +

      +
      +
      + + + +


      + + + +
      +

      + + + +

      + + +
      Note. Defaults for COARSE_SOLVE and COARSE_SUBSOLVE are chosen in the following order:
      +
      single precision version – MUMPS if installed, then SLU if installed, ILU otherwise;
      +
      double precision version – UMF if installed, then MUMPS if installed, then SLU if installed, ILU otherwise.
      + + + + +
      Note. Further options for coarse solvers are contained in Table 6.
      +
      For a first use it is suggested to use the default options obtained by simply selecting the solver type.
      +





      what

      data type

      val

      default

      comments






      COARSE_MAT

      character(len=*)

      DIST +

      REPL

      REPL

      Coarsest matrix layout: distributed among the +processes or replicated on each of them.






      COARSE_SOLVE

      character(len=*)

      MUMPS +

      UMF +

      SLU +

      SLUDIST +

      ILU +

      JACOBI +

      GS +

      BJAC +

      KRM +

      L1-JACOBI +

      L1-BJAC +

      L1-FBGS

      See Note.

      Solver used at the coarsest level: sequential LU from MUMPS, UMFPACK, or SuperLU (plus triangular solve); distributed LU from MUMPS or SuperLU_Dist (plus triangular solve); point-Jacobi, hybrid Gauss-Seidel or block-Jacobi and related ℓ1-versions; Krylov Method (flexible Conjugate Gradient) coupled with the block-Jacobi preconditioner with ILU(0) on the blocks. Note that UMF and SLU require the coarsest matrix to be replicated, SLUDIST, JACOBI, GS, BJAC and KRM require it to be distributed, MUMPS can be used with either a replicated or a distributed matrix. When any of the previous solvers is specified, the matrix layout is set to a default value which allows the use of the solver (see Remark 4, p. 21). Note also that UMFPACK and SuperLU_Dist are available only in double precision.






      COARSE_SUBSOLVE

      character(len=*)

      ILU +

      ILUT +

      MILU +

      MUMPS +

      SLU +

      UMF +

      INVT +

      INVK +

      AINV

      See Note.

      Solver for the diagonal blocks of the coarsest +matrix, in case the block Jacobi solver is chosen +as coarsest-level solver: ILU(p), ILU(p,t), MILU(p), +LU from MUMPS, SuperLU or UMFPACK +(plus triangular solve), Approximate Inverses +INVK(p,q), INVT(p1,p2,t1,t2) and AINV(t); note +that approximate inverses are specifically suited +for GPUs since they do not employ triangular +system solve kernels, see [3]. Note that UMFPACK +and SuperLU_Dist are available only in double +precision.











      what

      data type

      val

      default

      comments






      COARSE_SWEEPS

      integer

      Any +integer +

      number > +0

      10

      Number of sweeps when JACOBI, GS or BJAC is +chosen as coarsest-level solver.






      COARSE_FILLIN

      integer

      Any +integer +

      number ≥ 0

      0

      Fill-in level p of the ILU factorizations and first +fill-in for the approximate inverses.






      COARSE_ILUTHRS

      real(kind_parameter)

      Any real +

      number ≥ 0

      0

      Drop tolerance t in the ILU(p,t) factorization and +first drop-tolerance for the approximate inverses.











      + + + +
      Table 5: Parameters defining the solver at the coarsest level (continued).
      + + + +

      +
      +
      + + + +


      + + + +
      +

      + + + +

      + + + + + + + + + + + + + + +





      what

      data type

      val

      default

      comments






      BJAC_STOP

      character(len=*)

      FALSE +

      TRUE

      FALSE

      Select whether to use a stopping criterion for +the Block-Jacobi method used as a coarse +solver.






      BJAC_TRACE

      character(len=*)

      FALSE +

      TRUE

      FALSE

      Select whether to print a trace for the +calculated residual for the Block-Jacobi +method used as a coarse solver.






      BJAC_ITRACE

      integer

      Any integer +

      > 0

      -1

      Number of iterations after which a trace is to +be printed.






      BJAC_RESCHECK

      integer

      Any integer +

      > 0

      -1

      Number of iterations after which a residual is +to be calculated.






      BJAC_STOPTOL

      real(kind_parameter)

      Any real +

      < 1

      0

      Tolerance for the stopping criterion on the +residual.






      KRM_METHOD

      character(len=*)

      CG +

      FCG +

      CGS +

      GCR +

      BICG +

      BICGSTAB +

      BICGSTABL +

      RGMRES

      FCG

      A string that defines the iterative method to be used when employing a Krylov method KRM as a coarse solver: CG, the Conjugate Gradient method; FCG, the Flexible Conjugate Gradient method; CGS, the Conjugate Gradient Squared method; GCR, the Generalized Conjugate Residual method; BICG, the Bi-Conjugate Gradient method; BICGSTAB, the Bi-Conjugate Gradient Stabilized method; BICGSTABL, the Bi-Conjugate Gradient Stabilized method with restarting; RGMRES, the Generalized Minimal Residual method with restarting. Refer to the PSBLAS guide [21] for further information.






      KRM_KPREC

      character(len=*)

      Table 1

      BJAC

      The one-level preconditioners from Table 1 can be used for the coarse Krylov solver.






      KRM_SUB_SOLVE

      character(len=*)

      Table 5

      ILU

      Solver for the diagonal blocks of the coarsest matrix preconditioner, in case the block Jacobi solver is chosen as KRM_KPREC: ILU(p), ILU(p,t), MILU(p), LU from MUMPS, SuperLU or UMFPACK (plus triangular solve), Approximate Inverses INVK(p,q), INVT(p1,p2,t1,t2) and AINV(t). The same caveat from Table 5 applies here.






      KRM_GLOBAL

      character(len=*)

      TRUE, +FALSE

      FALSE

      Choose between a global Krylov solver, all +unknowns on a single node, or a distributed +one. The default choice is the distributed +solver.






      KRM_EPS

      real(kind_parameter)

      Real < 1

      10^-6

      The stopping tolerance.






      KRM_IRST

      integer

      Integer +

      ≥ 1

      30

      An integer specifying the restart parameter. +This is employed for the BiCGSTABL or RGMRES +methods, otherwise it is ignored.






      KRM_ISTOPC

      integer

      Integers +1,2,3

      2

      If 1 then the method uses the normwise backward error in the infinity norm; if 2, it uses the relative residual in the 2-norm; if 3, the relative residual reduction in the 2-norm is used instead; refer to the PSBLAS guide [21] for the details.






      KRM_ITMAX

      integer

      Integer +

      ≥ 1

      40

      The maximum number of iterations to +perform.






      KRM_ITRACE

      integer

      Integer +

      ≥ 0

      -1

      If > 0 print out +an informational message about convergence +every KRM_ITRACE iterations. If = 0 print a +message in case of convergence failure.






      KRM_FILLIN

      integer

      Integer +

      ≥ 0

      0

      Fill-in level p of the ILU factorizations and +first fill-in for the approximate inverses.






      + + + +
      Table 6: Additional parameters defining the solver at the coarsest level.
      + + + +

      +
      +
      + + + +


      + + + +
      +

      +

      + + + +





      what

      data type

      val

      default

      comments






      SMOOTHER_TYPE

      character(len=*)

      JACOBI +

      GS +

      BGS +

      BJAC +

      AS +

      L1-JACOBI +

      L1-BJAC +

      L1-FBGS +

      POLY

      FBGS

      Type of smoother used in the multilevel preconditioner: point-Jacobi, hybrid (forward) Gauss-Seidel, hybrid backward Gauss-Seidel, block-Jacobi, ℓ1-Jacobi, ℓ1-hybrid (forward) Gauss-Seidel, ℓ1-point-Jacobi and Additive Schwarz, polynomial accelerators; see [15] and Remark 3 (p. 21).

      It is ignored by one-level preconditioners.






      SUB_SOLVE

      character(len=*)

      JACOBI +GS +

      BGS +

      ILU +

      ILUT +

      MILU +

      MUMPS +

      SLU +

      UMF +

      INVT +

      INVK +

      AINV

      GS and BGS for pre- +and post-smoothers of +multilevel +preconditioners, +respectively +

      ILU for block-Jacobi +and Additive Schwarz +one-level +preconditioners

      The local solver to be used with the +smoother or one-level preconditioner (see +Remark 2, page 24): point-Jacobi, hybrid +(forward) Gauss-Seidel, hybrid backward +Gauss-Seidel, ILU(p), ILU(p,t), MILU(p), +LU from MUMPS, +SuperLU or UMFPACK (plus triangular +solve), Approximate Inverses INVK(p,q), +INVT(p1,p2,t1,t2) and AINV(t); note +that approximate inverses are specifically +suited for GPUs since they do not employ +triangular system solve kernels, see [3]. See +Note for details on hybrid Gauss-Seidel.






      SMOOTHER_SWEEPS

      integer

      Any integer +

      number ≥ 0

      1

      Number of sweeps of the smoother or one-level preconditioner. In the multilevel case, no pre-smoother or post-smoother is used if this parameter is set to 0 together with pos=PRE or pos=POST, respectively. It is ignored if the smoother is POLY.






      POLY_DEGREE

      integer

      Any integer +

      number ≥ 1 and ≤ 30

      1

      Degree of the polynomial accelerator; it is equal to the number of matrix-vector products performed by the smoother. It is ignored if the smoother is not POLY.






      +
      Table 7: Parameters defining the smoother or the details of the one-level +preconditioner.
      + + + +

      +
      +
      + + + +


      + + + +
      +

      +

      + + + + + + + +





      what

      data type

      val

      default

      comments






      SUB_OVR

      integer

      Any integer +

      number 0

      1

      Number of overlap layers, for Additive +Schwarz only.

      SUB_RESTR

      character(len=*)

      HALO +

      NONE

      HALO

      Type of restriction operator, for Additive +Schwarz only: HALO for taking into account +the overlap, NONE for neglecting it. +

      Note that HALO must be chosen for the +classical Addditive Schwarz smoother and +its RAS variant.






      SUB_PROL

      character(len=*)

      SUM +

      NONE

      NONE

      Type of prolongation operator, for Additive +Schwarz only: SUM for adding the +contributions from the overlap, NONE for +neglecting them. +

      Note that SUM must be chosen for the +classical Additive Schwarz smoother, and +NONE for its RAS variant.






      SUB_FILLIN

      integer

      Any integer +

      number 0

      0

      Fill-in level p of the incomplete LU +factorizations.






      SUB_ILUTHRS

      real(kind_parameter)

      Any real +number 0

      0

      Drop tolerance t in the ILU(p,t) +factorization.






      MUMPS_LOC_GLOB

      character(len=*)

      LOCAL_SOLVER +

      GLOBAL_SOLVER

      GLOBAL_SOLVER

      Whether MUMPS should be used as a +distributed solver, or as a serial solver acting +only on the part of the matrix local to each +process.






      MUMPS_IPAR_ENTRY

      integer

      Any integer +number

      0

      Set an entry in the MUMPS integer control +array, as chosen via the idx optional +argument.






      MUMPS_RPAR_ENTRY

      real

      Any real number

      0

      Set an entry in the MUMPS real control +array, as chosen via the idx optional +argument.






      +
      Table 8: Parameters defining the smoother or the details of the one-level preconditioner +(continued).
      + + + +

      +
      +
      + + + +


      + + + +
      +

      +

      + + + +





      what

      data type

      val

      default

      comments






      POLY_VARIANT

      character(len=*)

      CHEB_4 +

      CHEB_4_OPT +

      CHEB_1_OPT

      CHEB_4

      Select the type of +polynomial accelerator. +The CHEB_4 and +CHEB_4_OPT types +are those based on the +Chebyshev +polynomials of +the 4th-kind described +in [27]. The +CHEB_1_OPT version +is the one described +in [15] and based on +the Chebyshev +polynomials of the +1st-kind.






      POLY_RHO_ESTIMATE

      character(len=*)

      POLY_RHO_EST_POWER

      POLY_RHO_EST_POWER

      Algorithm for +estimating the spectral +radius of the smoother +to +which the polynomial +acceleration is applied. +The only implemented +algorithm is the power +method; see also the +two following options.






      POLY_RHO_ESTIMATE_ITERATIONS

      integer

      Any integer +

      number 1

      20

      Number of iterations +for the spectral radius +estimate.






      POLY_RHO_BA

      real(kind_parameter)

      Any real +

      number (0,1]

      1

      Sets an estimate of +the spectral radius of +the base smoother to +which the polynomial +accelerator is applied.






      +
      Table 9: Parameters defining the smoother or the details of the one-level preconditioner +(continued).
      + + + +

      +
      + + + +

      5.3 Method hierarchy_build -
      5.4 _build

      +
      +

      +

      call p%hierarchy_build(a,desc_a,info)
      +

      +

      This method builds the hierarchy of matrices and restriction/prolongation operators for +the multilevel preconditioner p, according to the requirements made by the user +through the methods init and set. +

      Arguments +

      + + + + + +

      a

      type(psb_xspmat_type), intent(in).

      The sparse matrix structure containing the local part of the matrix +to be preconditioned. Note that x must be chosen according to the +real/complex, single/double precision version of AMG4PSBLAS +under use. See the PSBLAS User’s Guide for details [21].

      desc_a

      type(psb_desc_type), intent(in).

      The communication descriptor of a. See the PSBLAS User’s Guide +for details [21].

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      + + + +

      5.4 Method smoothers_build -
      5.5 Method build -
      5.6 Method apply -
      5.7 Method free -
      5.8 Method descr -
      5.9 Auxiliary Methods -
      5.9.1 Method: dump -
      5.9.2 Method: clone -
      5.9.3 Method: sizeof -
      5.9.4 Method: allocate_wrk -
      5.9.5 Method: free_wrk -

      +class="cmr-12">_build +
      +

      +

      call p%smoothers_build(a,desc_a,p,info[,amold,vmold,imold])
      +

      +

      This method builds the smoothers and the coarsest-level solvers for the multilevel +preconditioner p, according to the requirements made by the user through the methods +init and set, and based on the aggregation hierarchy produced by a previous call to +hierarchy_build (see Section 5.3). +

      Arguments +

      + + + + + + + + + + + +

      a

      type(psb_xspmat_type), intent(in).

      The sparse matrix structure containing the local part of the matrix +to be preconditioned. Note that x must be chosen according to the +real/complex, single/double precision version of AMG4PSBLAS +under use. See the PSBLAS User’s Guide for details [21].

      desc_a

      type(psb_desc_type), intent(in).

      The communication descriptor of a. See the PSBLAS User’s Guide +for details [21].

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      amold

      class(psb_x_base_sparse_mat), intent(in), optional.

      The desired dynamic type for internal matrix components; this +allows e.g. running on GPUs; it needs not be the same on all +processes. See the PSBLAS User’s Guide for details [21].

      vmold

      class(psb_x_base_vect_type), intent(in), optional.

      The desired dynamic type for internal vector components; this +allows e.g. running on GPUs.

      imold

      class(psb_i_base_vect_type), intent(in), optional.

      The desired dynamic type for internal integer vector components; +this allows e.g. running on GPUs.

      + + + +

      5.5 Method build

      +
      +

      +

      call p%build(a,desc_a,info[,amold,vmold,imold])
      +

      +

      This method builds the preconditioner p according to the requirements made by the +user through the methods init and set (see Sections 5.3 and 5.4 for multilevel +preconditioners). It is mostly provided for backward compatibility; indeed, it is +internally implemented by invoking the two previous methods hierarchy_build and +smoothers_build, whose nomenclature would however be somewhat unnatural when +dealing with simple one-level preconditioners. +

      Arguments +

      + + + + + + + + + + + +

      a

      type(psb_xspmat_type), intent(in).

      The sparse matrix structure containing the local part of the matrix +to be preconditioned. Note that x must be chosen according to the +real/complex, single/double precision version of AMG4PSBLAS +under use. See the PSBLAS User’s Guide for details [21].

      desc_a

      type(psb_desc_type), intent(in).

      The communication descriptor of a. See the PSBLAS User’s Guide +for details [21].

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      amold

      class(psb_x_base_sparse_mat), intent(in), optional.

      The desired dynamic type for internal matrix components; this +allows e.g. running on GPUs; it needs not be the same on all +processes. See the PSBLAS User’s Guide for details [21].

      vmold

      class(psb_x_base_vect_type), intent(in), optional.

      The desired dynamic type for internal vector components; this +allows e.g. running on GPUs.

      imold

      class(psb_i_base_vect_type), intent(in), optional.

      The desired dynamic type for internal integer vector components; +this allows e.g. running on GPUs.

      +

      The method can be used to build multilevel preconditioners too. + + + +

      5.6 Method apply

      +
      +

      +

      call p%apply(x,y,desc_a,info [,trans,work])
      +

      +

      This method computes y = op(B-1) x, where B is a previously built preconditioner, +stored into p, and op denotes the preconditioner itself or its transpose, according to the +value of trans. Note that, when AMG4PSBLAS is used with a Krylov solver from +PSBLAS, p%apply is called within the PSBLAS method psb_krylov and hence it is +completely transparent to the user. +

      Arguments +

      + + + + + + + + + + + +

      x

      type(kind_parameter), dimension(:), intent(in)—.

      The local part of the vector x. Note that type and kind_parameter +must be chosen according to the real/complex, single/double +precision version of AMG4PSBLAS under use.

      y

      type(kind_parameter), dimension(:), intent(out)—.

      The local part of the vector y. Note that type and kind_parameter +must be chosen according to the real/complex, single/double +precision version of AMG4PSBLAS under use.

      desc_a

      type(psb_desc_type), intent(in).

      The communication descriptor associated to the matrix to be +preconditioned.

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      trans

      character(len=1), optional, intent(in).

      If trans = N,n then op(B-1) = B-1; if trans = T,t +then op(B-1) = B-T (transpose of B-1); if trans = C,c then +op(B-1) = B-C (conjugate transpose of B-1).

      work

      type(kind_parameter), dimension(:), optional, target—.

      Workspace. Its size should be at least 4 * psb_cd_get_local_ +cols(desc_a) (see the PSBLAS User’s Guide). Note that type and +kind_parameter must be chosen according to the real/complex, +single/double precision version of AMG4PSBLAS under use.

      + + + +

      5.7 Method free

      +
      +

      +

      call p%free(p,info)
      +

      +

      This method deallocates the preconditioner data structure p. +

      Arguments +

      + +

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for +details.

      + + +

      5.8 Method descr

      +
      +

      +

      call p%descr(info, [iout, root, verbosity])
      +

      +

      This method prints a description of the preconditioner p to the standard output or to a +file. It must be called after hierachy_build and smoothers_build, or build, have +been called. +

      Arguments +

      + + + + + + + +

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      iout

      integer, intent(in), optional.

      The id of the file where the preconditioner description will be +printed; the default is the standard output.

      root

      integer, intent(in), optional.

      The id of the process where the preconditioner description +will be printed; the default is psb_root_.

      verbosity

      integer, intent(in), optional.

      The verbosity level of the description. Default value is 0. For +values higher than 0, it prints out further information, e.g., for +a distributed multilevel preconditioner the size of the coarse +matrices on every process.

      +

      +

      5.9 Auxiliary Methods

      +

      Various functionalities are implemented as additional methods of the preconditioner +object. +

      +

      5.9.1 Method: dump
      +
      +

      +

      call p%dump(info[,istart,iend,prefix,head,ac,rp,smoother,solver,global_num])
      +

      + + +

      Dump on file. +

      Arguments +

      + + + +

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      amold

      class(psb_x_base_sparse_mat), intent(in), optional.

      The desired dynamic type for internal matrix components; this +allows e.g. running on GPUs; it needs not be the same on all +processes. See the PSBLAS User’s Guide for details [21].

      +

      +

      5.9.2 Method: clone
      +
      +

      +

      call p%clone(pout,info)
      +

      +

      Create a (deep) copy of the preconditioner object. +

      Arguments +

      + + + +

      pout

      type(amg_xprec_type), intent(out).

      The copy of the preconditioner data structure. Note that x must +be chosen according to the real/complex, single/double precision +version of AMG4PSBLAS under use.

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      +

      +

      5.9.3 Method: sizeof
      +
      +

      +

      sz = p%sizeof([global])
      +

      +
      + +

      global

      logical, optional.

      Whether the global or local preconditioner memory occupation is +desired. Default: .false..

      +Return memory footprint in bytes. + + +

      +

      5.9.4 Method: allocate_wrk
      +
      +

      +

      call p%allocate_wrk(info[, vmold])
      +

      +

      Allocate internal work vectors. Each application of the preconditioner uses a number of +work vectors which are allocated internally as necessary; therefore allocation and +deallocation of memory occurs multiple times during the execution of a Krylov method. +In most cases this strategy is perfectly acceptable, but on some platforms, most +notably GPUs, memory allocation is a slow operation, and the default behaviour would +lead to a slowdown. This method allows to trade space for time by preallocating +the internal workspace outside of the invocation of a Krylov method. When +using GPUs or other specialized devices, the vmold argument is also necessary +to ensure the internal work vectors are of the appropriate dynamic type to +exploit the accelerator hardware; when allocation occurs internally this is +taken care of based on the dynamic type of the x argument to the apply +method. +

      Arguments +

      + + + +

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      vmold

      class(psb_x_base_vect_type), intent(in), optional.

      The desired dynamic type for internal vector components; this +allows e.g. running on GPUs.

      +

      +

      5.9.5 Method: free_wrk
      +
      +

      +

      call p%free_wrk(info)
      +

      + + +

      Deallocate internal work vectors. +

      Arguments +

      + +

      info

      integer, intent(out).

      Error code. If no error, 0 is returned. See Section 7 for details.

      + + + + + + + +

      + id="tailuserhtmlse5.html"> diff --git a/docs/html/userhtmlse6.html b/docs/html/userhtmlse6.html index a3db46f5..da9e4040 100644 --- a/docs/html/userhtmlse6.html +++ b/docs/html/userhtmlse6.html @@ -29,7 +29,7 @@ class="cmr-12">up]

      6 Adding new smoother and solver objects to AMG4PSBLAS

      Developers can add completely new smoother and/or solver classes derived from the @@ -37,7 +37,7 @@ class="cmr-12">Developers can add completely new smoother and/or solver classes class="cmr-12">base objects in the library (see Remark 2 in Section 5.2), without recompiling the declare in the application program a variable of the new type;

    9. pass that variable as the argument to the set pass that variable as the argument to the set routine as in the following:

      @@ -96,8 +97,10 @@ class="cmr-12">the AMG4PSBLAS library has not been modified to account for this development.

      It is possible to define new values for the keyword WHAT in the set It is possible to define new values for the keyword WHAT in the set routine; if the library code does not recognize a keyword, it passes it down the composition hierarchy @@ -113,7 +116,8 @@ class="cmr-12">any keyword/value pair that does not pertain to a given solver is class="cmr-12">ignored.

      An example is provided in the source code distribution under the folder -tests/newslvtests/newslv. In this example we are implementing a new incomplete factorization variant (which is simply the ILU(0) factorization under a new name). Because of the @@ -136,32 +140,34 @@ class="cmr-12">The interfaces for the calls shown above are defined using id="TBL-23-1">

      smoother

      class(amg_x_base_smoother_type) +class="td11">

      smoother

      class(amg_x_base_smoother_type)

      The user-defined new smoother to be employed in the - preconditioner. +preconditioner.

      solver

      class(amg_x_base_solver_type) +class="td11">

      solver

      class(amg_x_base_solver_type)

      The user-defined new solver to be employed in the preconditioner.

      +class="td11">

      The user-defined new solver to be employed in the preconditioner.

      The other arguments are defined in the way described in Sec. 5.2. As an example, in the -tests/newslv code we define a new object of type amg_d_tlu_solver_typetests/newslv code we define a new object of type amg_d_tlu_solver_type, and we pass it as follows: diff --git a/docs/html/userhtmlse7.html b/docs/html/userhtmlse7.html index 6e1c59a4..83d32c02 100644 --- a/docs/html/userhtmlse7.html +++ b/docs/html/userhtmlse7.html @@ -29,12 +29,13 @@ class="cmr-12">up]

      7 Error Handling

      The error handling in AMG4PSBLAS is based on the PSBLAS error handling. Error conditions are signaled via an integer argument infoconditions are signaled via an integer argument info; whenever an error condition is detected, an error trace stack is built by the library up to the top-level, user-callable @@ -50,7 +51,7 @@ class="cmr-12">an error message should be printed. These options may be set by u class="cmr-12">PSBLAS error handling routines; for further details see the PSBLAS User’s Guide [21]. @@ -66,10 +67,7 @@ class="cmr-12">. - - - -

      A License

      AMG4PSBLAS is freely distributable under the following copyright terms: @@ -132,7 +132,7 @@ class="cmr-12">AMG4PSBLAS is distributed together with (a small part of) the gra class="cmr-12">library MatchBox-P [9]. Per the license requirements, we reproduce the relevant part diff --git a/docs/html/userhtmlse9.html b/docs/html/userhtmlse9.html index d4373dbf..45324c91 100644 --- a/docs/html/userhtmlse9.html +++ b/docs/html/userhtmlse9.html @@ -12,7 +12,7 @@ >

      B Contributor Covenant Code of Conduct

      -
      -

      Our Pledge We as members, contributors, and leaders pledge to make participation in @@ -78,11 +76,11 @@ class="cmr-12">and learning from the experience class="cmr-12">Focusing on what is best not just for us as individuals, but for the overall community

    10. +

      Examples of unacceptable behavior include: -

      Examples of unacceptable behavior include:

      • Other conduct which could reasonably be considered inappro professional setting

      Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of @@ -137,7 +135,7 @@ class="cmr-12">appointed representative at an online or offline event. Enforceme abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at community leaders responsible for enforcement at eocoe@na.iac.cnr.it. All @@ -151,7 +149,7 @@ class="cmr-12">incident.

      Enforcement Guidelines Community leaders will follow these Community Impact Guidelines in @@ -161,7 +159,7 @@ class="cmr-12">determining the consequences for any action they deem in violatio class="cmr-12">Conduct:

      1. + class="enumerate" id="x12-37002x1">

        Correction

        clarity around the nature of the violation and an explanation of class="cmr-12">behavior was inappropriate. A public apology may be requested.

      2. + class="enumerate" id="x12-37004x2">

        Warning

        channels like social media. Violating these terms may lead to a t class="cmr-12">or permanent ban.

      3. + class="enumerate" id="x12-37006x3">

        Temporary Ban

        public or private interaction with the people involved, including class="cmr-12">interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. +

      4. -
      5. + class="enumerate" id="x12-37008x4">

        Permanent Ban

        A permanent ban from any sort of public interaction within the community.

      Attribution This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at available at https://www.contributor-covenant.org/version/2/0/code_of. Community Impact Guidelines were inspired by Mozilla’s co enforcement ladder. For answers to common questions about this code of conduct, see the FAQ at the FAQ at https://www.contributor-covenant.org/faq. Translations are available at @@ -284,7 +282,7 @@ class="cmr-12">.