diff --git a/docs/amg4psblas_1.0-guide.pdf b/docs/amg4psblas_1.0-guide.pdf
index efadc6b5..edff7f0a 100644
Binary files a/docs/amg4psblas_1.0-guide.pdf and b/docs/amg4psblas_1.0-guide.pdf differ
diff --git a/docs/html/index.html b/docs/html/index.html
index 96d69556..ffb1cef4 100644
--- a/docs/html/index.html
+++ b/docs/html/index.html
@@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR
Software version: 1.0
-April 12, 2021
+May 11th, 2021
diff --git a/docs/html/userhtml.css b/docs/html/userhtml.css
index 02a4674d..70c936b2 100644
--- a/docs/html/userhtml.css
+++ b/docs/html/userhtml.css
@@ -149,6 +149,7 @@ div.abstract {width:100%;}
.Ovalbox-thick { padding-left:3pt; padding-right:3pt; border:solid thick; }
.shadowbox { padding-left:3pt; padding-right:3pt; border:solid thin; border-right:solid thick; border-bottom:solid thick; }
.doublebox { padding-left:3pt; padding-right:3pt; border-style:double; border:solid thick; }
+.rotatebox{display: inline-block;}
.figure img.graphics {margin-left:10%;}
.lstlisting .label{margin-right:0.5em; }
div.lstlisting{font-family: monospace; white-space: nowrap; margin-top:0.5em; margin-bottom:0.5em; }
diff --git a/docs/html/userhtml.html b/docs/html/userhtml.html
index 96d69556..ffb1cef4 100644
--- a/docs/html/userhtml.html
+++ b/docs/html/userhtml.html
@@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR
Software version: 1.0
-April 12, 2021
+May 11th, 2021
diff --git a/docs/html/userhtmlli5.html b/docs/html/userhtmlli5.html
index fbb1075a..e7026c20 100644
--- a/docs/html/userhtmlli5.html
+++ b/docs/html/userhtmlli5.html
@@ -211,7 +211,7 @@ class="cmr-12"> Pothen, Distributed-memory parallel algorithms for matching and
-coloring, in PCO11 New Trends in Parallel Computing and Optimization,
+coloring, in PCO’11 New Trends in Parallel Computing and Optimization,
IEEE International Symposium on Parallel and Distributed Processing
abide by its terms:
-AMG4PSBLAS is distributed together with (a small part) of the graph-matching
+AMG4PSBLAS is distributed together with (a small part of) the graph-matching
@@ -133,7 +133,7 @@ class="cmr-12">[9]. Per the license requirements, we reproduce the relative part
+class="cmr-12">. Per the license requirements, we reproduce the relevant part
here.
diff --git a/docs/html/userhtmlse9.html b/docs/html/userhtmlse9.html
index dc508773..1f30b61b 100644
--- a/docs/html/userhtmlse9.html
+++ b/docs/html/userhtmlse9.html
@@ -91,7 +91,7 @@ class="cmr-12">Trolling, insulting or derogatory comments, and personal or polit
class="cmr-12">Public or private harassment
-Publishing others private information, such as a physical or email address,
+Publishing others’ private information, such as a physical or email address,
without their explicit permission
@@ -234,7 +234,7 @@ class="cmr-12">_of_conduct
-.html. Community Impact Guidelines were inspired by Mozillas code of conduct
+.html. Community Impact Guidelines were inspired by Mozilla’s code of conduct
enforcement ladder. For answers to common questions about this code of conduct, see
github.com/sfilipponepsctoolkit/amg4psblaspsctoolkit/issues>.
diff --git a/docs/html/userhtmlsu4.html b/docs/html/userhtmlsu4.html
index ffe106f5..d9e5c8c6 100644
--- a/docs/html/userhtmlsu4.html
+++ b/docs/html/userhtmlsu4.html
@@ -36,8 +36,8 @@ class="cmr-12">If you find any bugs in our codes, please report them through our
on
-https://github.com/psctoolkit/amg4psblas/issues
+https://github.com/psctoolkit/psctoolkit/issues
To enable us to track the bug, please provide a log from the failing application, the
diff --git a/docs/html/userhtmlsu5.html b/docs/html/userhtmlsu5.html
index 4c715b88..6033e14c 100644
--- a/docs/html/userhtmlsu5.html
+++ b/docs/html/userhtmlsu5.html
@@ -29,40 +29,41 @@ class="cmr-12">3.5 Example and test programs
-The package contains the examples and tests directories; both of them are
-further divided into fileread and pdegen subdirectories. Their purpose is as
-follows:
+The package contains a samples directory, divided into two subdirectories simple and
+advanced; both of them are further divided into fileread and pdegen
+subdirectories. Their purpose is as follows:
-examples
+simple
-contains a set of simple example programs with a predefined choice
-of preconditioners, selectable via integer values. These are intended to get
+contains a set of simple example programs with a predefined choice of
+preconditioners, selectable via integer values. These are intended to get
acquainted with the multilevel preconditioners available in AMG4PSBLAS.
-tests
+advanced
-contains a set of more sophisticated examples that will allow the user, via
-the input files in the runs subdirectories, to experiment with the full range
-of preconditioners implemented in the package.
+contains a set of more sophisticated examples that will allow the user,
+via the input files in the runs subdirectories, to experiment with the full
+range of preconditioners implemented in the package.
The fileread directories contain sample programs that read sparse matrices from files,
diff --git a/docs/html/userhtmlsu6.html b/docs/html/userhtmlsu6.html
index 9c5fa760..0fa4d4a6 100644
--- a/docs/html/userhtmlsu6.html
+++ b/docs/html/userhtmlsu6.html
@@ -71,38 +71,39 @@ class="cmr-12">The part of the code dealing with reading and assembling the spar
right-hand side vector and the deallocation of the relevant data structures, performed
-through the PSBLAS routines for sparse matrix and vector management, is not
-reported here for the sake of conciseness. The complete code can be found in the
-example program file amg_dexample_ml.f90, in the directory examples/fileread of
-the AMG4PSBLAS implementation (see Section 3.5). A sample test problem along
-with the relevant input data is available in examples/fileread/runs. For details on
-the use of the PSBLAS routines, see the PSBLAS User’s Guide [20].
+through the PSBLAS routines for sparse matrix and vector management,
+is not reported here for the sake of conciseness. The complete code can be
+found in the example program file amg_dexample_ml.f90, in the directory
+samples/simple/fileread of the AMG4PSBLAS implementation (see Section 3.5). A
+sample test problem along with the relevant input data is available in
+samples/simple/fileread/runs. For details on the use of the PSBLAS routines, see
+the PSBLAS User’s Guide [20].
-
The setup and application of the default multilevel preconditioner for the real single
+
The setup and application of the default multilevel preconditioner for the real single
precision and the complex, single and double precision, versions are obtained
for
details). If these versions are installed, the corresponding codes are available in
-examples/fileread/.
+samples/simple/fileread.
@@ -278,7 +280,7 @@ class="cmr-12">For all the previous preconditioners, example programs where the
class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet
-boundary conditions are also available in the directory examples/pdegen.
+boundary conditions are also available in the directory samples/simple/pdegen.
diff --git a/docs/html/userhtmlsu7.html b/docs/html/userhtmlsu7.html
index c0086ee7..d784817b 100644
--- a/docs/html/userhtmlsu7.html
+++ b/docs/html/userhtmlsu7.html
@@ -145,7 +145,7 @@ class="cmr-12">GPU environment
-
+
call desc_a%cnv(mold=igmold)
call a%cscnv(info,mold=agmold)
@@ -172,7 +172,7 @@ class="cmr-12">GPU environment
-
+
Listing 7: setup of a GPU-enabled test program part three.
@@ -180,25 +180,30 @@ class="content">setup of a GPU-enabled test program part three. It is very important to employ solvers that are suited to the GPU, i.e. solvers that
+
It is very important to employ smoothers and coarsest solvers that are suited to the
do NOT employ triangular system solve kernels. Solvers that satisfy this constraint
+class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that
include:
+class="cmr-12">satisfy this constraint include:
- JACOBI
- BJAC with the following methods on the local blocks:
+
+ - INVK
-
- -
+
- INVT
-
- -
+
- AINV
-
+and their ℓ1
An auxiliary input argument that can be passed to the underlying
objects. |
-
+class="cmr-12">objects.
+
A variety of preconditioners can be obtained by setting the appropriate
diff --git a/docs/src/building.tex b/docs/src/building.tex
index 282e4b34..a0013a36 100644
--- a/docs/src/building.tex
+++ b/docs/src/building.tex
@@ -190,22 +190,23 @@ make install
\subsection{Bug reporting}
If you find any bugs in our codes, please report them through our
issues page on \\[2mm]
-\url{https://github.com/psctoolkit/amg4psblas/issues}\\
+\url{https://github.com/psctoolkit/psctoolkit/issues}\\
To enable us to track the bug, please provide a log from the failing
application, the test conditions, and ideally a self-contained test
program reproducing the issue.
\subsection{Example and test programs\label{sec:ex_and_test}}
-The package contains the \verb|examples| and \verb|tests| directories;
+The package contains a \verb|samples| directory, divided into two
+subdirectories \verb|simple| and \verb|advanced|;
both of them are further divided into \verb|fileread| and
\verb|pdegen| subdirectories. Their purpose is as follows:
\begin{description}
-\item[\tt examples] contains a set of simple example programs with a
+\item[\tt simple] contains a set of simple example programs with a
predefined choice of preconditioners, selectable via integer
values. These are intended to get acquainted with the
multilevel preconditioners available in AMG4PSBLAS.
-\item[\tt tests] contains a set of more sophisticated examples that
+\item[\tt advanced] contains a set of more sophisticated examples that
will allow the user, via the input files in the \verb|runs|
subdirectories, to experiment with the full range of preconditioners
implemented in the package.
diff --git a/docs/src/configureout.txt b/docs/src/configureout.txt
index e1ae3914..29fdc72a 100644
--- a/docs/src/configureout.txt
+++ b/docs/src/configureout.txt
@@ -159,4 +159,4 @@ Some influential environment variables:
Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations.
-Report bugs to .
+Report bugs to .
diff --git a/docs/src/gettingstarted.tex b/docs/src/gettingstarted.tex
index cf3ecaca..d67cd548 100644
--- a/docs/src/gettingstarted.tex
+++ b/docs/src/gettingstarted.tex
@@ -129,9 +129,9 @@ relevant data structures, performed
through the PSBLAS routines for sparse matrix and vector management, is not reported
here for the sake of conciseness.
The complete code can be found in the example program file \verb|amg_dexample_ml.f90|,
-in the directory \verb|examples/fileread| of the AMG4PSBLAS implementation (see
+in the directory \verb|samples/simple/file|\-\verb|read| of the AMG4PSBLAS implementation (see
Section~\ref{sec:ex_and_test}). A sample test problem along with the relevant
-input data is available in \verb|examples/fileread/runs|.
+input data is available in \verb|samples/simple/fileread/runs|.
For details on the use of the PSBLAS routines, see the PSBLAS User's
Guide~\cite{PSBLASGUIDE}.
@@ -139,7 +139,7 @@ The setup and application of the default multilevel preconditioner
for the real single precision and the complex, single and double
precision, versions are obtained with straightforward modifications of the previous
example (see Section~\ref{sec:userinterface} for details). If these versions are installed,
-the corresponding codes are available in \verb|examples/fileread/|.
+the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|.
\begin{listing}[tbp]
\begin{center}
@@ -300,7 +300,7 @@ The corresponding example program is available in the file
For all the previous preconditioners, example programs where the sparse matrix and
the right-hand side are generated by discretizing a PDE with Dirichlet
-boundary conditions are also available in the directory \verb|examples/pdegen|.
+boundary conditions are also available in the directory \verb|samples/simple/pdegen|.
\vspace{-1em}\begin{listing}[tbh]
\ifpdf%
\begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran}
@@ -535,7 +535,8 @@ Krylov method. At the end of the code, we close the GPU environment
call prec%allocate_wrk(info)
t1 = psb_wtime()
call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,&
- & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,&
+ & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,&
+ & itrace=s_choice%itrace,&
& istop=s_choice%istopc,irst=s_choice%irst)
call prec%deallocate_wrk(info)
call psb_barrier(ctxt)
@@ -584,15 +585,18 @@ Krylov method. At the end of the code, we close the GPU environment
\caption{setup of a GPU-enabled test program part three.\label{fig:gpu-ex3}}
\end{listing}
-It is very important to employ solvers that are suited
-to the GPU, i.e. solvers that do NOT employ triangular
-system solve kernels. Solvers that satisfy this constraint include:
+It is very important to employ smoothers and coarsest solvers that are suited
+to the GPU, i.e. methods that do NOT employ triangular
+system solve kernels. Methods that satisfy this constraint include:
\begin{itemize}
\item \verb|JACOBI|
+\item \verb|BJAC| with the following methods on the local blocks:
+\begin{itemize}
\item \verb|INVK|
\item \verb|INVT|
\item \verb|AINV|
\end{itemize}
+\end{itemize}
and their $\ell_1$ variants.
%%% Local Variables:
diff --git a/docs/src/license.tex b/docs/src/license.tex
index 14785321..a1d6574c 100644
--- a/docs/src/license.tex
+++ b/docs/src/license.tex
@@ -87,9 +87,9 @@ terms: {\small
\end{verbatim}
}
\pagebreak
-AMG4PSBLAS is distributed together with (a small part) of the graph-matching
+AMG4PSBLAS is distributed together with (a small part of) the graph-matching
library MatchBox-P~\cite{MatchBoxP}. Per the license requirements, we reproduce
-the relative part here.
+the relevant part here.
{\small
\begin{verbatim}
// ***********************************************************************
diff --git a/docs/src/userguide.tex b/docs/src/userguide.tex
index ff3a8c44..517e138b 100644
--- a/docs/src/userguide.tex
+++ b/docs/src/userguide.tex
@@ -154,7 +154,7 @@ Preconditioners Package based on PSBLAS}
\flushright
\large Software version: 1.0\\
%\todaym
-\large April 12, 2021
+\large May 11th, 2021
\end{minipage}}
%\addtolength{\textwidth}{\centeroffset}
\vspace{\stretch{2}}
diff --git a/docs/src/userhtml.tex b/docs/src/userhtml.tex
index bea768f6..edf3eebe 100644
--- a/docs/src/userhtml.tex
+++ b/docs/src/userhtml.tex
@@ -114,7 +114,7 @@
%\today
Software version: 1.0\\
%\today
- April 12, 2021
+ May 11th, 2021
\clearpage
\ \\
\thispagestyle{empty}
diff --git a/examples/gpu/amg_dexample_gpu.f90 b/examples/gpu/amg_dexample_gpu.f90
index 142dabe0..13fc343e 100644
--- a/examples/gpu/amg_dexample_gpu.f90
+++ b/examples/gpu/amg_dexample_gpu.f90
@@ -39,23 +39,18 @@
!
! This sample program solves a linear system obtained by discretizing a
! PDE with Dirichlet BCs. The solver is CG, coupled with one of the
-! following multi-level preconditioner, as explained in Section 4.1 of
+! following multi-level preconditioners, as explained in Section 4.2 of
! the AMG4PSBLAS User's and Reference Guide:
!
-! - choice = 1, the default multi-level preconditioner solver, i.e.,
-! V-cycle with decoupled smoothed aggregation, 1 hybrid forward/backward
-! GS sweep as pre/post-smoother and UMFPACK as coarsest-level
-! solver (Sec. 4.1, Listing 1)
+! - choice = 1, a V-cycle with decoupled smoothed aggregation, 4 Jacobi
+! sweeps as pre/post-smoother and 8 Jacobi sweeps as coarsest-level
+! solver with replicated coarsest matrix
!
-! - choice = 2, a V-cycle preconditioner with 1 block-Jacobi sweep
-! (with ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi
-! sweeps (with ILU(0) on the blocks) as coarsest-level solver (Sec. 4.1, Listing 2)
-!
-! - choice = 3, W-cycle preconditioner based on the coupled aggregation relying
-! on matching, with maximum size of aggregates equal to 8 and smoothed prolongators,
-! 2 hybrid forward/backward GS sweeps as pre/post-smoother, a distributed coarsest
-! matrix, and preconditioned Flexible Conjugate Gradient as coarsest-level solver
-! (Sec. 4.1, Listing 3)
+! - choice = 2, a W-cycle based on the coupled aggregation relying on matching,
+! with maximum size of aggregates equal to 8 and smoothed prolongators,
+! 2 sweeps of Block-Jacobi pre/post-smoother using approximate inverse INVK and
+! 4 sweeps of Block-Jacobi with INVK as coarsest-level solver on distributed
+! coarsest matrix
!
! The matrix and the rhs are read from files (if an rhs is not available, the
! unit rhs is set).
@@ -183,8 +178,9 @@ program amg_dexample_gpu
case(1)
- ! initialize a V-cycle preconditioner with 4 Jacobi sweep
- ! and 8 Jacobi sweeps as coarsest-level solver
+ ! initialize a V-cycle preconditioner, relying on decoupled smoothed aggregation
+ ! with 4 Jacobi sweeps as pre/post-smoother
+ ! and 8 Jacobi sweeps as coarsest-level solver on replicated coarsest matrix
call P%init(ctxt,'ML',info)
call P%set('SMOOTHER_TYPE','JACOBI',info)
@@ -195,19 +191,22 @@ program amg_dexample_gpu
case(2)
- ! initialize a V-cycle preconditioner based on the coupled aggregation relying on matching,
+ ! initialize a W-cycle preconditioner based on the coupled aggregation relying on matching,
! with maximum size of aggregates equal to 8 and smoothed prolongators,
- ! Block-Jacobi smoother using approximate inverse INVK and
- ! and 4 sweeps of INVK on he coarsest level
+ ! 2 sweeps of Block-Jacobi pre/post-smoother using approximate inverse INVK and
+ ! 4 sweeps of Block-Jacobi with INVK on the coarsest level distributed matrix
call P%init(ctxt,'ML',info)
call P%set('PAR_AGGR_ALG','COUPLED',info)
call P%set('AGGR_TYPE','MATCHBOXP',info)
call P%set('AGGR_SIZE',8,info)
call P%set('ML_CYCLE','WCYCLE',info)
+ call P%set('SMOOTHER_TYPE','BJAC',info)
call P%set('SMOOTHER_SWEEPS',2,info)
call P%set('SUB_SOLVE','INVK',info)
- call P%set('COARSE_SOLVE','INVK',info)
+ call P%set('COARSE_SOLVE','BJAC',info)
+ call P%set('COARSE_SUBSOLVE','INVK',info)
+ call P%set('COARSE_SWEEPS',4,info)
call P%set('COARSE_MAT','DIST',info)
kmethod = 'CG'
diff --git a/samples/advanced/fileread/data_input.f90 b/samples/advanced/fileread/data_input.f90
index b25cdeb0..6b961352 100644
--- a/samples/advanced/fileread/data_input.f90
+++ b/samples/advanced/fileread/data_input.f90
@@ -1,14 +1,14 @@
!
!
-! MLD2P4 version 2.2
-! MultiLevel Domain Decomposition Parallel Preconditioners Package
-! based on PSBLAS (Parallel Sparse BLAS version 3.5)
+! AMG4PSBLAS version 1.0
+! Algebraic Multigrid Package
+! based on PSBLAS (Parallel Sparse BLAS version 3.7)
!
-! (C) Copyright 2008-2018
+! (C) Copyright 2021
!
! Salvatore Filippone
! Pasqua D'Ambra
-! Daniela di Serafino
+! Fabio Durastante
!
! Redistribution and use in source and binary forms, with or without
! modification, are permitted provided that the following conditions
@@ -18,14 +18,14 @@
! 2. Redistributions in binary form must reproduce the above copyright
! notice, this list of conditions, and the following disclaimer in the
! documentation and/or other materials provided with the distribution.
-! 3. The name of the MLD2P4 group or the names of its contributors may
+! 3. The name of the AMG4PSBLAS group or the names of its contributors may
! not be used to endorse or promote products derived from this
! software without specific written permission.
!
! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS
+! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS
! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
diff --git a/samples/advanced/pdegen/data_input.f90 b/samples/advanced/pdegen/data_input.f90
index b25cdeb0..6b961352 100644
--- a/samples/advanced/pdegen/data_input.f90
+++ b/samples/advanced/pdegen/data_input.f90
@@ -1,14 +1,14 @@
!
!
-! MLD2P4 version 2.2
-! MultiLevel Domain Decomposition Parallel Preconditioners Package
-! based on PSBLAS (Parallel Sparse BLAS version 3.5)
+! AMG4PSBLAS version 1.0
+! Algebraic Multigrid Package
+! based on PSBLAS (Parallel Sparse BLAS version 3.7)
!
-! (C) Copyright 2008-2018
+! (C) Copyright 2021
!
! Salvatore Filippone
! Pasqua D'Ambra
-! Daniela di Serafino
+! Fabio Durastante
!
! Redistribution and use in source and binary forms, with or without
! modification, are permitted provided that the following conditions
@@ -18,14 +18,14 @@
! 2. Redistributions in binary form must reproduce the above copyright
! notice, this list of conditions, and the following disclaimer in the
! documentation and/or other materials provided with the distribution.
-! 3. The name of the MLD2P4 group or the names of its contributors may
+! 3. The name of the AMG4PSBLAS group or the names of its contributors may
! not be used to endorse or promote products derived from this
! software without specific written permission.
!
! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS
+! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS
! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS