diff --git a/docs/amg4psblas_1.0-guide.pdf b/docs/amg4psblas_1.0-guide.pdf index efadc6b5..edff7f0a 100644 Binary files a/docs/amg4psblas_1.0-guide.pdf and b/docs/amg4psblas_1.0-guide.pdf differ diff --git a/docs/html/index.html b/docs/html/index.html index 96d69556..ffb1cef4 100644 --- a/docs/html/index.html +++ b/docs/html/index.html @@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR
Software version: 1.0
April 12, 2021 +class="cmr-12">May 11th, 2021 diff --git a/docs/html/userhtml.css b/docs/html/userhtml.css index 02a4674d..70c936b2 100644 --- a/docs/html/userhtml.css +++ b/docs/html/userhtml.css @@ -149,6 +149,7 @@ div.abstract {width:100%;} .Ovalbox-thick { padding-left:3pt; padding-right:3pt; border:solid thick; } .shadowbox { padding-left:3pt; padding-right:3pt; border:solid thin; border-right:solid thick; border-bottom:solid thick; } .doublebox { padding-left:3pt; padding-right:3pt; border-style:double; border:solid thick; } +.rotatebox{display: inline-block;} .figure img.graphics {margin-left:10%;} .lstlisting .label{margin-right:0.5em; } div.lstlisting{font-family: monospace; white-space: nowrap; margin-top:0.5em; margin-bottom:0.5em; } diff --git a/docs/html/userhtml.html b/docs/html/userhtml.html index 96d69556..ffb1cef4 100644 --- a/docs/html/userhtml.html +++ b/docs/html/userhtml.html @@ -31,7 +31,7 @@ class="cmr-12">University of Rome Tor-Vergata and IAC-CNR
Software version: 1.0
April 12, 2021 +class="cmr-12">May 11th, 2021 diff --git a/docs/html/userhtmlli5.html b/docs/html/userhtmlli5.html index fbb1075a..e7026c20 100644 --- a/docs/html/userhtmlli5.html +++ b/docs/html/userhtmlli5.html @@ -211,7 +211,7 @@ class="cmr-12"> Pothen, Distributed-memory parallel algorithms for matching and coloring, in PCO11 New Trends in Parallel Computing and Optimization, +class="cmr-12">, in PCO’11 New Trends in Parallel Computing and Optimization, IEEE International Symposium on Parallel and Distributed Processing abide by its terms:  

AMG4PSBLAS is distributed together with (a small part) of the graph-matching +class="cmr-12">AMG4PSBLAS is distributed together with (a small part of) the graph-matching @@ -133,7 +133,7 @@ class="cmr-12">[9]. Per the license requirements, we reproduce the relative part +class="cmr-12">. Per the license requirements, we reproduce the relevant part here. diff --git a/docs/html/userhtmlse9.html b/docs/html/userhtmlse9.html index dc508773..1f30b61b 100644 --- a/docs/html/userhtmlse9.html +++ b/docs/html/userhtmlse9.html @@ -91,7 +91,7 @@ class="cmr-12">Trolling, insulting or derogatory comments, and personal or polit class="cmr-12">Public or private harassment

  • Publishing others private information, such as a physical or email address, +class="cmr-12">Publishing others’ private information, such as a physical or email address, without their explicit permission
  • @@ -234,7 +234,7 @@ class="cmr-12">_of_conduct .html. Community Impact Guidelines were inspired by Mozillas code of conduct +class="cmr-12">. Community Impact Guidelines were inspired by Mozilla’s code of conduct enforcement ladder. For answers to common questions about this code of conduct, see github.com/sfilipponepsctoolkit/amg4psblaspsctoolkit/issues>. diff --git a/docs/html/userhtmlsu4.html b/docs/html/userhtmlsu4.html index ffe106f5..d9e5c8c6 100644 --- a/docs/html/userhtmlsu4.html +++ b/docs/html/userhtmlsu4.html @@ -36,8 +36,8 @@ class="cmr-12">If you find any bugs in our codes, please report them through our on
    https://github.com/psctoolkit/amg4psblas/issues
    https://github.com/psctoolkit/psctoolkit/issues

    To enable us to track the bug, please provide a log from the failing application, the diff --git a/docs/html/userhtmlsu5.html b/docs/html/userhtmlsu5.html index 4c715b88..6033e14c 100644 --- a/docs/html/userhtmlsu5.html +++ b/docs/html/userhtmlsu5.html @@ -29,40 +29,41 @@ class="cmr-12">3.5 Example and test programs

    The package contains the examples and tests directories; both of them are -further divided into The package contains a samples directory, divided in two subdirs simple and +advanced; both of them are further divided into fileread and pdegen subdirectories. Their purpose is as +class="cmr-12">subdirectories. follows: +class="cmr-12">Their purpose is as follows:

    examples
    simple
    contains a set of simple example programs with a predefined choice +class="cmr-12">contains a set of simple example programs with a predefined choice of of preconditioners, selectable via integer values. These are intended to get +class="cmr-12">preconditioners, selectable via integer values. These are intended to get acquainted with the multilevel preconditioners available in AMG4PSBLAS.
    tests
    advanced
    contains a set of more sophisticated examples that will allow the user, via +class="cmr-12">contains a set of more sophisticated examples that will allow the user, the input files in the via the input files in the runs subdirectories, to experiment with the full range +class="cmr-12">subdirectories, to experiment with the full of preconditioners implemented in the package.
    -

    range of preconditioners implemented in the package. +

    The fileread directories contain sample programs that read sparse matrices from files, diff --git a/docs/html/userhtmlsu6.html b/docs/html/userhtmlsu6.html index 9c5fa760..0fa4d4a6 100644 --- a/docs/html/userhtmlsu6.html +++ b/docs/html/userhtmlsu6.html @@ -71,38 +71,39 @@ class="cmr-12">The part of the code dealing with reading and assembling the spar right-hand side vector and the deallocation of the relevant data structures, performed through the PSBLAS routines for sparse matrix and vector management, is not +class="cmr-12">through the PSBLAS routines for sparse matrix and vector management, reported here for the sake of conciseness. The complete code can be found in the +class="cmr-12">is not reported here for the sake of conciseness. The complete code can be example program file found in the example program file amg_dexample_ml.f90, in the directory examples/fileread of -the AMG4PSBLAS implementation (see Section, in the directory +samples/simple/fileread of the AMG4PSBLAS implementation (see Section 3.5). A sample test problem along +class="cmr-12">). A with the relevant input data is available in examples/fileread/runs. For details on +class="cmr-12">sample test problem along with the relevant input data is available in +samples/simple/fileread/runs. For details on the use of the PSBLAS routines, see the use of the PSBLAS routines, see the PSBLAS User’s Guidethe PSBLAS User’s Guide [20]. -

    The setup and application of the default multilevel preconditioner for the real single +

    The setup and application of the default multilevel preconditioner for the real single precision and the complex, single and double precision, versions are obtained for details). If these versions are installed, the corresponding codes are available in examples/fileread/samples/simple/fileread. @@ -278,7 +280,7 @@ class="cmr-12">For all the previous preconditioners, example programs where the class="cmr-12">and the right-hand side are generated by discretizing a PDE with Dirichlet boundary conditions are also available in the directory examples/pdegensamples/simple/pdegen. diff --git a/docs/html/userhtmlsu7.html b/docs/html/userhtmlsu7.html index c0086ee7..d784817b 100644 --- a/docs/html/userhtmlsu7.html +++ b/docs/html/userhtmlsu7.html @@ -145,7 +145,7 @@ class="cmr-12">GPU environment

    -

    +

      call desc_a%cnv(mold=igmold)  
      call a%cscnv(info,mold=agmold) @@ -172,7 +172,7 @@ class="cmr-12">GPU environment  
     
     
    -

    +


    Listing 7: setup of a GPU-enabled test program part three.
    @@ -180,25 +180,30 @@ class="content">setup of a GPU-enabled test program part three.

    It is very important to employ solvers that are suited to the GPU, i.e. solvers that +

    It is very important to employ smoothers and coarsest solvers that are suited to the do NOT employ triangular system solve kernels. Solvers that satisfy this constraint +class="cmr-12">GPU, i.e. methods that do NOT employ triangular system solve kernels. Methods that include: +class="cmr-12">satisfy this constraint include:

    +

    and their ℓ1 variants.

    An auxiliary input argument that can be passed to the underlying objects. - +class="cmr-12">objects. +

    A variety of preconditioners can be obtained by setting the appropriate diff --git a/docs/src/building.tex b/docs/src/building.tex index 282e4b34..a0013a36 100644 --- a/docs/src/building.tex +++ b/docs/src/building.tex @@ -190,22 +190,23 @@ make install \subsection{Bug reporting} If you find any bugs in our codes, please report them through our issues page on \\[2mm] -\url{https://github.com/psctoolkit/amg4psblas/issues}\\ +\url{https://github.com/psctoolkit/psctoolkit/issues}\\ To enable us to track the bug, please provide a log from the failing application, the test conditions, and ideally a self-contained test program reproducing the issue. \subsection{Example and test programs\label{sec:ex_and_test}} -The package contains the \verb|examples| and \verb|tests| directories; +The package contains a \verb|samples| directory, divided in two +subdirs \verb|simple| and \verb|advanced|; both of them are further divided into \verb|fileread| and \verb|pdegen| subdirectories. Their purpose is as follows: \begin{description} -\item[\tt examples] contains a set of simple example programs with a +\item[\tt simple] contains a set of simple example programs with a predefined choice of preconditioners, selectable via integer values. These are intended to get acquainted with the multilevel preconditioners available in AMG4PSBLAS. -\item[\tt tests] contains a set of more sophisticated examples that +\item[\tt advanced] contains a set of more sophisticated examples that will allow the user, via the input files in the \verb|runs| subdirectories, to experiment with the full range of preconditioners implemented in the package. 
diff --git a/docs/src/configureout.txt b/docs/src/configureout.txt index e1ae3914..29fdc72a 100644 --- a/docs/src/configureout.txt +++ b/docs/src/configureout.txt @@ -159,4 +159,4 @@ Some influential environment variables: Use these variables to override the choices made by `configure' or to help it to find libraries and programs with nonstandard names/locations. -Report bugs to . +Report bugs to . diff --git a/docs/src/gettingstarted.tex b/docs/src/gettingstarted.tex index cf3ecaca..d67cd548 100644 --- a/docs/src/gettingstarted.tex +++ b/docs/src/gettingstarted.tex @@ -129,9 +129,9 @@ relevant data structures, performed through the PSBLAS routines for sparse matrix and vector management, is not reported here for the sake of conciseness. The complete code can be found in the example program file \verb|amg_dexample_ml.f90|, -in the directory \verb|examples/fileread| of the AMG4PSBLAS implementation (see +in the directory \verb|samples/simple/file|\-\verb|read| of the AMG4PSBLAS implementation (see Section~\ref{sec:ex_and_test}). A sample test problem along with the relevant -input data is available in \verb|examples/fileread/runs|. +input data is available in \verb|samples/simple/fileread/runs|. For details on the use of the PSBLAS routines, see the PSBLAS User's Guide~\cite{PSBLASGUIDE}. @@ -139,7 +139,7 @@ The setup and application of the default multilevel preconditioner for the real single precision and the complex, single and double precision, versions are obtained with straightforward modifications of the previous example (see Section~\ref{sec:userinterface} for details). If these versions are installed, -the corresponding codes are available in \verb|examples/fileread/|. +the corresponding codes are available in \verb|samples/simple/file|\-\verb|read|. 
\begin{listing}[tbp] \begin{center} @@ -300,7 +300,7 @@ The corresponding example program is available in the file For all the previous preconditioners, example programs where the sparse matrix and the right-hand side are generated by discretizing a PDE with Dirichlet -boundary conditions are also available in the directory \verb|examples/pdegen|. +boundary conditions are also available in the directory \verb|samples/simple/pdegen|. \vspace{-1em}\begin{listing}[tbh] \ifpdf% \begin{minted}[breaklines=true,bgcolor=bg,fontsize=\small]{fortran} @@ -535,7 +535,8 @@ Krylov method. At the end of the code, we close the GPU environment call prec%allocate_wrk(info) t1 = psb_wtime() call psb_krylov(s_choice%kmethd,a,prec,b,x,s_choice%eps,& - & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,itrace=s_choice%itrace,& + & desc_a,info,itmax=s_choice%itmax,iter=iter,err=err,& + & itrace=s_choice%itrace,& & istop=s_choice%istopc,irst=s_choice%irst) call prec%deallocate_wrk(info) call psb_barrier(ctxt) @@ -584,15 +585,18 @@ Krylov method. At the end of the code, we close the GPU environment \caption{setup of a GPU-enabled test program part three.\label{fig:gpu-ex3}} \end{listing} -It is very important to employ solvers that are suited -to the GPU, i.e. solvers that do NOT employ triangular -system solve kernels. Solvers that satisfy this constraint include: +It is very important to employ smoothers and coarsest solvers that are suited +to the GPU, i.e. methods that do NOT employ triangular +system solve kernels. Methods that satisfy this constraint include: \begin{itemize} \item \verb|JACOBI| +\item \verb|BJAC| with the following methods on the local blocks: +\begin{itemize} \item \verb|INVK| \item \verb|INVT| \item \verb|AINV| \end{itemize} +\end{itemize} and their $\ell_1$ variants. 
%%% Local Variables: diff --git a/docs/src/license.tex b/docs/src/license.tex index 14785321..a1d6574c 100644 --- a/docs/src/license.tex +++ b/docs/src/license.tex @@ -87,9 +87,9 @@ terms: {\small \end{verbatim} } \pagebreak -AMG4PSBLAS is distributed together with (a small part) of the graph-matching +AMG4PSBLAS is distributed together with (a small part of) the graph-matching library MatchBox-P~\cite{MatchBoxP}. Per the license requirements, we reproduce -the relative part here. +the relevant part here. {\small \begin{verbatim} // *********************************************************************** diff --git a/docs/src/userguide.tex b/docs/src/userguide.tex index ff3a8c44..517e138b 100644 --- a/docs/src/userguide.tex +++ b/docs/src/userguide.tex @@ -154,7 +154,7 @@ Preconditioners Package based on PSBLAS} \flushright \large Software version: 1.0\\ %\todaym -\large April 12, 2021 +\large May 11th, 2021 \end{minipage}} %\addtolength{\textwidth}{\centeroffset} \vspace{\stretch{2}} diff --git a/docs/src/userhtml.tex b/docs/src/userhtml.tex index bea768f6..edf3eebe 100644 --- a/docs/src/userhtml.tex +++ b/docs/src/userhtml.tex @@ -114,7 +114,7 @@ %\today Software version: 1.0\\ %\today - April 12, 2021 + May 11th, 2021 \clearpage \ \\ \thispagestyle{empty} diff --git a/examples/gpu/amg_dexample_gpu.f90 b/examples/gpu/amg_dexample_gpu.f90 index 142dabe0..13fc343e 100644 --- a/examples/gpu/amg_dexample_gpu.f90 +++ b/examples/gpu/amg_dexample_gpu.f90 @@ -39,23 +39,18 @@ ! ! This sample program solves a linear system obtained by discretizing a ! PDE with Dirichlet BCs. The solver is CG, coupled with one of the -! following multi-level preconditioner, as explained in Section 4.1 of +! following multi-level preconditioner, as explained in Section 4.2 of ! the AMG4PSBLAS User's and Reference Guide: ! -! - choice = 1, the default multi-level preconditioner solver, i.e., -! V-cycle with decoupled smoothed aggregation, 1 hybrid forward/backward -! 
GS sweep as pre/post-smoother and UMFPACK as coarsest-level -! solver (Sec. 4.1, Listing 1) +! - choice = 1, a V-cycle with decoupled smoothed aggregation, 4 Jacobi +! sweeps as pre/post-smoother and 8 Jacobi sweeps as coarsest-level +! solver with replicated coarsest matrix ! -! - choice = 2, a V-cycle preconditioner with 1 block-Jacobi sweep -! (with ILU(0) on the blocks) as pre- and post-smoother, and 8 block-Jacobi -! sweeps (with ILU(0) on the blocks) as coarsest-level solver (Sec. 4.1, Listing 2) -! -! - choice = 3, W-cycle preconditioner based on the coupled aggregation relying -! on matching, with maximum size of aggregates equal to 8 and smoothed prolongators, -! 2 hybrid forward/backward GS sweeps as pre/post-smoother, a distributed coarsest -! matrix, and preconditioned Flexible Conjugate Gradient as coarsest-level solver -! (Sec. 4.1, Listing 3) +! - choice = 2, a W-cycle based on the coupled aggregation relying on matching, +! with maximum size of aggregates equal to 8 and smoothed prolongators, +! 2 sweeps of Block-Jacobi pre/post-smoother using approximate inverse INVK and +! 4 sweeps of Block-Jacobi with INVK as coarsest-level solver on distributed +! coarsest matrix ! ! The matrix and the rhs are read from files (if an rhs is not available, the ! unit rhs is set). @@ -183,8 +178,9 @@ program amg_dexample_gpu case(1) - ! initialize a V-cycle preconditioner with 4 Jacobi sweep - ! and 8 Jacobi sweeps as coarsest-level solver + ! initialize a V-cycle preconditioner, relying on decoupled smoothed aggregation + ! with 4 Jacobi sweeps as pre/post-smoother + ! and 8 Jacobi sweeps as coarsest-level solver on replicated coarsest matrix call P%init(ctxt,'ML',info) call P%set('SMOOTHER_TYPE','JACOBI',info) @@ -195,19 +191,22 @@ program amg_dexample_gpu case(2) - ! initialize a V-cycle preconditioner based on the coupled aggregation relying on matching, + ! initialize a W-cycle preconditioner based on the coupled aggregation relying on matching, ! 
with maximum size of aggregates equal to 8 and smoothed prolongators, - ! Block-Jacobi smoother using approximate inverse INVK and - ! and 4 sweeps of INVK on he coarsest level + ! 2 sweeps of Block-Jacobi pre/post-smoother using approximate inverse INVK and + ! 4 sweeps of Block-Jacobi with INVK on the coarsest level distributed matrix call P%init(ctxt,'ML',info) call P%set('PAR_AGGR_ALG','COUPLED',info) call P%set('AGGR_TYPE','MATCHBOXP',info) call P%set('AGGR_SIZE',8,info) call P%set('ML_CYCLE','WCYCLE',info) + call P%set('SMOOTHER_TYPE','BJAC',info) call P%set('SMOOTHER_SWEEPS',2,info) call P%set('SUB_SOLVE','INVK',info) - call P%set('COARSE_SOLVE','INVK',info) + call P%set('COARSE_SOLVE','BJAC',info) + call P%set('COARSE_SUBSOLVE','INVK',info) + call P%set('COARSE_SWEEPS',4,info) call P%set('COARSE_MAT','DIST',info) kmethod = 'CG' diff --git a/samples/advanced/fileread/data_input.f90 b/samples/advanced/fileread/data_input.f90 index b25cdeb0..6b961352 100644 --- a/samples/advanced/fileread/data_input.f90 +++ b/samples/advanced/fileread/data_input.f90 @@ -1,14 +1,14 @@ ! ! -! MLD2P4 version 2.2 -! MultiLevel Domain Decomposition Parallel Preconditioners Package -! based on PSBLAS (Parallel Sparse BLAS version 3.5) +! AMG4PSBLAS version 1.0 +! Algebraic Multigrid Package +! based on PSBLAS (Parallel Sparse BLAS version 3.7) ! -! (C) Copyright 2008-2018 +! (C) Copyright 2021 ! ! Salvatore Filippone ! Pasqua D'Ambra -! Daniela di Serafino +! Fabio Durastante ! ! Redistribution and use in source and binary forms, with or without ! modification, are permitted provided that the following conditions @@ -18,14 +18,14 @@ ! 2. Redistributions in binary form must reproduce the above copyright ! notice, this list of conditions, and the following disclaimer in the ! documentation and/or other materials provided with the distribution. -! 3. The name of the MLD2P4 group or the names of its contributors may +! 3. 
The name of the AMG4PSBLAS group or the names of its contributors may ! not be used to endorse or promote products derived from this ! software without specific written permission. ! ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR -! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS +! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS ! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS diff --git a/samples/advanced/pdegen/data_input.f90 b/samples/advanced/pdegen/data_input.f90 index b25cdeb0..6b961352 100644 --- a/samples/advanced/pdegen/data_input.f90 +++ b/samples/advanced/pdegen/data_input.f90 @@ -1,14 +1,14 @@ ! ! -! MLD2P4 version 2.2 -! MultiLevel Domain Decomposition Parallel Preconditioners Package -! based on PSBLAS (Parallel Sparse BLAS version 3.5) +! AMG4PSBLAS version 1.0 +! Algebraic Multigrid Package +! based on PSBLAS (Parallel Sparse BLAS version 3.7) ! -! (C) Copyright 2008-2018 +! (C) Copyright 2021 ! ! Salvatore Filippone ! Pasqua D'Ambra -! Daniela di Serafino +! Fabio Durastante ! ! Redistribution and use in source and binary forms, with or without ! modification, are permitted provided that the following conditions @@ -18,14 +18,14 @@ ! 2. Redistributions in binary form must reproduce the above copyright ! notice, this list of conditions, and the following disclaimer in the ! documentation and/or other materials provided with the distribution. -! 3. The name of the MLD2P4 group or the names of its contributors may +! 3. The name of the AMG4PSBLAS group or the names of its contributors may ! 
not be used to endorse or promote products derived from this ! software without specific written permission. ! ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ! ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED ! TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR -! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MLD2P4 GROUP OR ITS CONTRIBUTORS +! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AMG4PSBLAS GROUP OR ITS CONTRIBUTORS ! BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS