diff --git a/docs/amg4psblas_1.0-guide.pdf b/docs/amg4psblas_1.0-guide.pdf index 2303b96c..60f23ddf 100644 Binary files a/docs/amg4psblas_1.0-guide.pdf and b/docs/amg4psblas_1.0-guide.pdf differ diff --git a/docs/html/index.html b/docs/html/index.html index e720e239..6957ab26 100644 --- a/docs/html/index.html +++ b/docs/html/index.html @@ -123,57 +123,57 @@ class="cmr-12">Method set
 5.3 Method hierarchy_build
 5.4 Method smoothers_build
 5.5 Method build
 5.6 Method apply
 5.7 Method free
 5.8 Method descr
 5.9 Auxiliary Methods
6 Adding new smoother and solver objects to AMG4PSBLAS
7 Error Handling
A License
References
References diff --git a/docs/html/userhtml.css b/docs/html/userhtml.css index 3d2eaa23..f8b025b4 100644 --- a/docs/html/userhtml.css +++ b/docs/html/userhtml.css @@ -194,9 +194,9 @@ div.lstinputlisting{ font-family: monospace; white-space: nowrap; } #TBL-5{border-collapse:collapse;} #TBL-5 colgroup{border-left: 1px solid black;border-right:1px solid black;} #TBL-5{border-collapse:collapse;} -td#TBL-5-10-5{border-left:solid black 0.4pt;border-right:solid black 0.4pt;} td#TBL-5-11-5{border-left:solid black 0.4pt;border-right:solid black 0.4pt;} td#TBL-5-12-5{border-left:solid black 0.4pt;border-right:solid black 0.4pt;} +td#TBL-5-13-5{border-left:solid black 0.4pt;border-right:solid black 0.4pt;} #TBL-6 colgroup{border-left: 1px solid black;border-right:1px solid black;} #TBL-6{border-collapse:collapse;} #TBL-6 colgroup{border-left: 1px solid black;border-right:1px solid black;} @@ -250,17 +250,5 @@ td#TBL-7-7-5{border-left:solid black 0.4pt;border-right:solid black 0.4pt;} #TBL-9{border-collapse:collapse;} #TBL-9 colgroup{border-left: 1px solid black;border-right:1px solid black;} #TBL-9{border-collapse:collapse;} -#TBL-10 colgroup{border-left: 1px solid black;border-right:1px solid black;} -#TBL-10{border-collapse:collapse;} -#TBL-10 colgroup{border-left: 1px solid black;border-right:1px solid black;} -#TBL-10{border-collapse:collapse;} -#TBL-10 colgroup{border-left: 1px solid black;border-right:1px solid black;} -#TBL-10{border-collapse:collapse;} -#TBL-10 colgroup{border-left: 1px solid black;border-right:1px solid black;} -#TBL-10{border-collapse:collapse;} -#TBL-10 colgroup{border-left: 1px solid black;border-right:1px solid black;} -#TBL-10{border-collapse:collapse;} -#TBL-10 colgroup{border-left: 1px solid black;border-right:1px solid black;} -#TBL-10{border-collapse:collapse;} /* end css.sty */ diff --git a/docs/html/userhtml.html b/docs/html/userhtml.html index e720e239..6957ab26 100644 --- a/docs/html/userhtml.html +++ b/docs/html/userhtml.html @@ -123,57 
+123,57 @@ class="cmr-12">Method set
 5.3 Method hierarchy_build
 5.4 Method smoothers_build
 5.5 Method build
 5.6 Method apply
 5.7 Method free
 5.8 Method descr
 5.9 Auxiliary Methods
6 Adding new smoother and solver objects to AMG4PSBLAS
7 Error Handling
A License
References
References diff --git a/docs/html/userhtmlli2.html b/docs/html/userhtmlli2.html index 061f8760..8f14145c 100644 --- a/docs/html/userhtmlli2.html +++ b/docs/html/userhtmlli2.html @@ -135,32 +135,32 @@ class="cmr-12">Auxiliary Methods class="cmr-12">  5.9.1 Method: dump
  5.9.2 Method: clone
  5.9.3 Method: sizeof
  5.9.4 Method: allocate_wrk
  5.9.5 Method: free_wrk
A License
References diff --git a/docs/html/userhtmlli4.html b/docs/html/userhtmlli4.html index 31370ba6..38af5368 100644 --- a/docs/html/userhtmlli4.html +++ b/docs/html/userhtmlli4.html @@ -29,7 +29,7 @@ class="cmr-12">]

class="cmr-12">References

+ id="Q1-28-46">

. class="cmr-12">[2]   D. Bertaccini and S. Filippone, Sparse approximate inverse + preconditioners on high performance GPU platforms, Comput. Math. Appl. + 71 (2016), no. 3, 693–711. +

+

+ [3]   M., Computing, 63, 1999, 233–263.

[3][4]   , SIAM, 2000.

[4][5]   State of the Art in Scientific Computing, Lecture Notes in Comput Springer, 2005, 593–602.

+ + +

[5][6]   applications, Applicable Algebra in Engineering, Communications and Computing, 18 (3) 2007, 223–239. - - -

[6][7]   (2), 1999, 792–797.

[7][8]   Workshops, IEEE CS, 2011.

[8][9]   , Applied Numerical Mathematics, Elsevier Science, class="cmr-12">57 (11-12), 2007, 1181-1196.

- [9][10]   , ACM Trans. Math. Softw., 37(3), 2010, art. 30.

[10][11]   , Appl. Algebra Engrg. Comm. Comput., 18(3), 2007, 223–239.

[11][12]   , 2020, arXiv:2006.16147v3. + + +

[12][13]   , ACM Transactions on Mathematical Software, 30, 2004, 196– class="cmr-12">(See also http://www.cise.ufl.edu/~davis/) - - -

[13][14]   SIAM Journal on Matrix Analysis and Applications, 20 (3), 1999, 7

[14][15]   Software, 16 (1) 1990, 1–17.

[15][16]   Transactions on Mathematical Software, 14 (1) 1988, 1–17.

[16][17]   .

[17][18]    23.

[18][19]   Library for Parallel Linear Algebra Computation on Sparse Matric class="cmr-12">, ACM Transactions on Mathematical Software, 26 (4), 2000, 527–550. + + +

[19][20]   algebraic multigrid by aggregation, Numerical Lin. Algebra with Applications, 2016, 23:501-518 - - -

[20][21]   , MIT Press, 1998.

[21][22]   Mathematical Software, 5 (3), 1979, 308–323.

[22][23]   ACM Transactions on Mathematical Software, 29 (2), 2003, 110̵

[23][24]   Numerical Linear Algebra with Applications, 15 (5), 2008, 473R

[24][25]   2003.

[25][26]   , Cambridge University Press, 1996.

+ + +

[26][27]   Press, 1998.

[27][28]    class="cmr-12">U. Trottenberg, C. Oosterlee, Multigrid, Academic Press, 2001. - - -

[28][29]   editor, Proceedings of SuperComputing 2000, Dallas, 2000.

[29][30]   (3) 1996, 179–196.

[30][31]   16, (2013) 59–76.

[31][32]   ) provides parallel Algebraic MultiGrid (AMG) preconditioners (se class="cmr-12">e.g., [34, 2728]), to be used in the iterative solution of linear systems, and a version of a Krylov-type cycle (K-cycle) [34, 2324]are available, which can beproposed in [23, 2930], and already included in the previous versions of thepackage [10, 911]; @@ -177,11 +177,11 @@ class="cmr-12">on Compatible Weighted Matching introduced in [3031, 3132]and described indetails in [1112];

computational framework [1819, 1718]. PSBLAS provides basic linear algebra operators multilevel (i.e., AMG) preconditioners with the Krylov solvers in [1617]. The following steps are required:

    @@ -134,8 +134,8 @@ class="cmr-12"> 2-87.
  1. 2-8 for further details of +href="userhtmlsu8.html#x17-16014r7">7 for further details of the preconditioner.
The interfaces for the calls shown above are defined using

-




smoother

smoother

class(amg_x_base_smoother_type)

The user-defined new smoother to be employed in the preconditioner.

solver

solver

class(amg_x_base_solver_type)

The user-defined new solver to be employed in the preconditioner.

PSBLAS error handling routines; for further details see the PSBLA [1617]. @@ -61,6 +61,10 @@ class="cmr-12">. +

+ + + diff --git a/docs/html/userhtmlsu1.html b/docs/html/userhtmlsu1.html index 945fed8d..f6d3e1f0 100644 --- a/docs/html/userhtmlsu1.html +++ b/docs/html/userhtmlsu1.html @@ -36,15 +36,15 @@ class="cmbx-12">BLAS

[1415, 1516, 2122] Many vendors provide optimized versions of BLAS; if no MPI
[2021, 2627] A version of MPI is available on most high-performance computing PSBLAS
[1617, 1819] Parallel Sparse BLAS (PSBLAS) is available from -
[1213] A sparse LU factorization package included in the SuiteSparse SuperLU
[1314] A sparse LU factorization package available from _Dist
[2223] A sparse LU factorization package available from the same PSBLAS User’s Guide [1617].

 2-87.  2-87. When the value is of type  2-87).  2-87.

), as specified in Tables 76-87. For example, the block-Jacobi smoother using ILU(0) on the blocks is obtained by combining the block-Jacobi smoother object with the @@ -410,8 +410,8 @@ class="cmr-12">ILU(0) solver object. Similarly, the hybrid Gauss-Seidel smoother Table 76) is obtained by combining the block-Jacobi smoother object with a single sweep of the Gauss-Seidel solver object, while the point-Jacobi smoother @@ -429,8 +429,8 @@ class="cmtt-12">SMOOTHER_TYPE to appropriate values (see Tables 76), i.e., without setting smoother. class="cmr-12">Similar considerations apply to the point-Jacobi, Gauss-Seidel and block-Jacobi coarsest-level solvers, and shortcuts are available in this case too (see Table 5 ??).
@@ -606,16 +605,26 @@ class="cmtt-10x-x-109">character
(len=*)

’VCYCLE’ -

’WCYCLE’ -

’KCYCLE’ -

’ADD’

’VCYCLE’

VCYCLE +

WCYCLE +

KCYCLE +

ADD

VCYCLE

Multilevel cycle: V-cycle, W-cycle, K-cycle, and additive composition.

OUTER_SWEEPS integer

integer

Any integer

number 1 ments s






+






MIN_COARSE_SIZE_PER_PROCESS. +class="cmtt-10x-x-109">’.




Minimum ratio. The aggregation stops if the ratio between the global matrix dimensions at two consecutive levels is lower than or -equal to this threshold (see Note). +equal to this threshold (see Note).






> 1

20

Maximum number of levels. The aggregation stops if the number of levels -reaches this value (see Note). +reaches this value (see Note).






PAR_AGGR_ALG

character(len=*)

’DEC’, ’SYMDEC’, ’COUPLED’

’DEC’

Parallel aggregation algorithm. -

the

Parallel aggregation algorithm. +

the SYMDEC option applies decoupled aggregation to the sparsity pattern of A + AT . +class="cmmi-8">T .






what data type

val

default

comments +






AGGR_TYPE

character(len=*)

’SOC1’

’SOC1’, -’SOC2’, -’MATCHBOXP’

Type of aggregation algorithm: currently, +class="cmtt-10x-x-109">=*)

SOC1

SOC1, +SOC2, +MATCHBOXP

Type of aggregation algorithm: currently, for the decoupled aggregation we implement two measures of strength of connection, the one by Vaněk, Mandel and Brezina [29], +href="userhtmlli4.html#XVANEK_MANDEL_BREZINA">30], and the one by Gratton et al [19]. The +href="userhtmlli4.html#XGrHeJi:16">20]. The coupled aggregation is based on a parallel version of the half-approximate matching implemented in the MatchBox-P software -package ADD LINK TO THE -PACKAGE? +package [8].






AGGR_SIZE

integer

Any integer -

number +class="cmtt-10x-x-109">’

integer

Any integer +

number power of 2 and > 2

4

Maximum size of aggregates when the +class="cmmi-10x-x-109">> 2

4

Maximum size of aggregates when the coupled aggregation based on matching is applied. For aggressive coarsening with size of aggregate larger than 8 @@ -865,49 +916,56 @@ we recommend the use of smoothed prolongators. MODIFY CODE +class="cmbx-10x-x-109">CODE






AGGR_PROL

character(len=*)

’SMOOTHED’, -’UNSMOOTHED’

’SMOOTHED’

Prolongator used by the aggregation +class="cmtt-10x-x-109">=*)

SMOOTHED, +UNSMOOTHED

SMOOTHED

Prolongator used by the aggregation algorithm: smoothed or unsmoothed (i.e., -tentative prolongator). +tentative prolongator).






Note. The aggregation algorithm stops when at least one of the following criteria is met: the coarse size threshold, the coarse size threshold per process, the
-
minimum coarsening ratio, or the maximum number of levels is reached. Therefore, the actual number of levels may be
+class="td11">
Note. The aggregation algorithm stops when at least one of the following criteria is met: the coarse size threshold,
smaller than the specified maximum number of levels.
+class="td11">
the coarse size threshold per process, the minimum coarsening ratio, or the maximum number of levels is reached.
+
Therefore, the actual number of levels may be smaller than the specified maximum number of levels.





- +
Table 3: Parameters defining the aggregation algorithm.
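
The stopping rule described in the Note to Table 3 can be sketched as follows. This is a minimal illustration in Python, not code from AMG4PSBLAS: the class `AggregationLimits`, the function `stop_reasons`, and all field names are hypothetical. The ratio and level-count tests follow the table text; the two size tests (coarse dimension at or below the threshold, per-process threshold scaled by the number of processes) are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class AggregationLimits:
    """Hypothetical container; field names are illustrative, not library identifiers."""
    coarse_size: int              # assumed global coarse size threshold
    coarse_size_per_process: int  # assumed per-process coarse size threshold
    min_ratio: float              # minimum coarsening ratio between consecutive levels
    max_levels: int = 20          # maximum number of levels (default listed in the table)


def stop_reasons(limits, n_prev, n_curr, n_levels, n_procs):
    """Return the names of all stopping criteria met after building a new level.

    n_prev, n_curr: global matrix dimensions at the previous and the new level.
    n_levels: number of levels built so far, including the new one.
    n_procs: number of processes sharing the coarse matrix.
    """
    reasons = []
    # Assumption: a size threshold is met when the coarse dimension drops to it or below.
    if n_curr <= limits.coarse_size:
        reasons.append("coarse_size")
    # Assumption: the per-process threshold is scaled by the number of processes.
    if n_curr <= limits.coarse_size_per_process * n_procs:
        reasons.append("coarse_size_per_process")
    # From the table: stop if the ratio of consecutive dimensions is <= the threshold.
    if n_prev / n_curr <= limits.min_ratio:
        reasons.append("min_ratio")
    # From the table: stop if the number of levels reaches the maximum.
    if n_levels >= limits.max_levels:
        reasons.append("max_levels")
    return reasons
```

Because any single criterion is enough to stop, a hierarchy may end with fewer levels than the configured maximum, which is the point the Note makes.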
+ @@ -917,7 +975,7 @@ class="content">Parameters defining the aggregation algorithm.


@@ -925,7 +983,7 @@ class="content">Parameters defining the aggregation algorithm.

+

< style="vertical-align:baseline;" id="TBL-6-5-">
Note. Different thresholds at different levels, such as those used in [29, Section 5.1], can be easily set by invoking the rou-
+href="userhtmlli4.html#XVANEK_MANDEL_BREZINA">30, Section 5.1], can be easily set by invoking the rou-
tine Parameters defining the aggregation algorithm (continued).


@@ -1075,7 +1133,10 @@ class="content">Parameters defining the aggregation algorithm (continued). -

+

+ + +






what

data type

val

default

comments






AGGR_ORD

character(len=*)

’NATURAL’ -

’DEGREE’

’NATURAL’

Initial ordering of indices for +class="td11">

Initial ordering of indices for the decoupled aggregation algorithm: either natural ordering or sorted by descending degrees of the nodes in the @@ -994,23 +1052,23 @@ matrix graph.






AGGR_THRESH

real( kind_parameter

_parameter

)

Any real -

number 

Any real +

number  [0,1]

0.01

The threshold

0.01

The threshold θ in the decoupled aggregation algorithm, see (??) in @@ -1020,22 +1078,22 @@ bottom of this table.






AGGR_FILTER

character(len=*)

’FILTER’ -

’NOFILTER’

’NOFILTER’

Matrix used in computing the smoothed +class="td11">

Matrix used in computing the smoothed prolongator: filtered or unfiltered (see (??) in Section 






+LU from MUMPS, SuperLU or UMFPACK (plus +triangular solve); approximate inverse solvers +INVK(p,q), INVT(p1,p2,t1,t2) and AINV(t), suitable +for GPUs (no triangular solve), see [2]. Note that +UMFPACK and SuperLU_Dist are available only in +double precision.





what

data type

val

default

comments






COARSE_MAT

character(len=*)

’DIST’ -

’REPL’

’REPL’

Coarsest matrix layout: distributed among the +class="td11">

DIST +

REPL

REPL

Coarsest matrix layout: distributed among the processes or replicated on each of them.






COARSE_SOLVE

character(len=*)

’MUMPS’ -

’UMF’ -

’SLU’ -

’SLUDIST’ -

’JACOBI’ -

’GS’ -

’BJAC’ -

’PCG’

See Note.

Solver used at the coarsest level: sequential LU +class="td11">

MUMPS +

UMF +

SLU +

SLUDIST +

JACOBI +

GS +

BJAC +

RKR

See Note.

Solver used at the coarsest level: sequential LU from MUMPS, UMFPACK, or SuperLU (plus triangular solve); distributed LU from MUMPS or SuperLU_Dist (plus triangular solve); point-Jacobi, @@ -1180,62 +1263,95 @@ class="cmbx-10x-x-109">preconditioned Conjugate Gradient coupled with the block-Jacobi preconditioner with ILU(0) on the blocks. -

Note that UMF and SLU require the coarsest matrix -to be replicated, ILU(0) on the blocks. Note that UMF and +SLU require the coarsest matrix to be replicated, +SLUDIST, JACOBI, GS, BJAC and PCG -require it to be distributed, MUMPS can be used with -either a replicated or a distributed matrix. When -any of the previous solvers is specified, the matrix -layout is set to a default value which allows the use -of the solver (see Remark 3, p. 24). Note also that +class="cmtt-10x-x-109">PCG require it to +be distributed, MUMPS can be used with either a +replicated or a distributed matrix. When any of +the previous solvers is specified, the matrix layout +is set to a default value which allows the use of +the solver (see Remark 3, p. 24). Note also that UMFPACK and SuperLU_Dist are available only in double precision.






COARSE_SUBSOLVE

character(len=*)

’ILU’ -

’ILUT’ -

’MILU’ -

’MUMPS’ -

’SLU’ -

’UMF’

See Note.

Solver for the diagonal blocks of the coarse matrix, +class="td11">

ILU +

ILUT +

MILU +

MUMPS +

SLU +

UMF +

INVT +

INVK +

AINV

See Note.

Solver for the diagonal blocks of the coarse matrix, in case the block Jacobi solver is chosen as coarsest-level solver: ILU(p), ILU(p,t), MILU(p), -LU from MUMPS, SuperLU or UMFPACK -(plus triangular solve). Add Sparse -Approximate for GPU? Note that UMFPACK -and SuperLU_Dist are available only in double -precision.






ILU otherwise.





-
Table 5: Parameters defining the coarse-space correction at the coarsest level.
- - - -
- -
- - - -


- - - -
-

-

s + +class="cmbx-10x-x-109">stopping criterion for PCG? +class="cmtt-10x-x-109">’ +class="cmti-10x-x-109">_parameter

)






what

what

data type

val

e

val

default

t

comments






COARSE_SWEEPS

integer

Any integer -

number > 0

10

Number of sweeps when JACOBI, GS or BJAC is -chosen as coarsest-level solver.

integer

Any +integer +

number > +0

10

Number of sweeps when JACOBI, GS or BJAC +is chosen as coarsest-level solver. Add stopping criterion for PCG?






COARSE_FILLIN

integer

Any integer -

number 0

0

Fill-in level p of the ILU factorizations.

integer

Any +integer +

number +0

0

Fill-in level p of the ILU factorizations.






COARSE_ILUTHRS

real( kind_parameter

)

Any real -

number 0

0

Drop tolerance t in the ILU(p,t) -factorization.

Any real +

number +0

0

Drop tolerance t in the ILU(p,t) factorization.






+ style="vertical-align:baseline;" id="TBL-7-12-"> +
+ + +
Table 6: Table 5: Parameters defining the coarse-space correction at the coarsest level -(continued).
+(continued). @@ -1399,27 +1492,27 @@ class="content">Parameters defining the coarse-space correction at the co -



-

-

+






what

what

data type

val

e

val

default

t

comments






SMOOTHER_TYPE

character(len=*)

=*)

JACOBI -

GS -

BGS -

BJAC -

AS

FBGS

Type of smoother used in the multilevel preconditioner: point-Jacobi, hybrid @@ -1499,39 +1592,55 @@ class="cmr-7">1-versions? and Additive Schwarz. -

It is ignored by one-level preconditioners.






SUB_SOLVE

character(len=*)

’JACOBI’ -

’GS’ -

=*)

JACOBI +

GS +

’BGS’ -

’ILU’ -

’ILUT’ -

’MILU’ -

’MUMPS’ -

’SLU’ -

’UMF’

ILU +

ILUT +

MILU +

MUMPS +

SLU +

UMF

GS and BGS multilevel class="cmr-10">preconditioners, respectively -

ILU for block-Jacobi preconditioners 1-versions?

-versions?

The local solver to be used with the smoother or one-level preconditioner (see @@ -1583,23 +1692,23 @@ class="cmr-10">or UMFPACK (plus triangular solve). See class="cmr-10">Note for details on hybrid Gauss-Seidel.






SMOOTHER_SWEEPS

integer

integer

Any integer -

number 0

1

0

1

Number of sweeps of the smoother or one-level preconditioner. In the multilevel @@ -1624,34 +1733,34 @@ class="cmr-10">, class="cmr-10">respectively.






SUB_OVR

integer

integer

Any integer -

number 0

1

0

1

Number of overlap layers, for Additive Schwarz only.







Table 7: Table 6: Parameters defining the smoother or the details of the one-level -preconditioner.
+preconditioner.
@@ -1661,27 +1770,27 @@ preconditioner.
-



-

-

+






what

what

data type

val

e

val

default

t

comments






SUB_RESTR

character(len=*)

=*)

HALO -

NONE

HALO

Type of restriction operator, for Additive Schwarz only: NONE for neglecting it. -

Note that HALO must be chosen for the @@ -1758,29 +1867,29 @@ class="cmr-10">classical Additive Schwarz smoother and class="cmr-10">its RAS variant.






SUB_PROL

character(len=*)

=*)

SUM -

NONE

NONE

Type of prolongation operator, for Additive Schwarz only: for neglecting them. -

Note that SUM for its RAS variant.






SUB_FILLIN

integer

integer

Any integer -

number 0

0

0

0

Fill-in level p of the incomplete LU @@ -1835,27 +1944,27 @@ class="cmr-10">of the incomplete LU class="cmr-10">factorizations.






SUB_ILUTHRS

real( kind_parameter

)

_parameter

)

Any real number 0

0

0

0

Drop tolerance t in the ILU() factorization.

MUMPS_LOC_GLOB

character(len=*)

=*)

LOCAL_SOLVER -

GLOBAL_SOLVER

GLOBAL_SOLVER

Whether MUMPS should be used as a distributed solver, or as a serial solver acting @@ -1895,20 +2004,20 @@ class="cmr-10">only on the part of the matrix local to each process.

MUMPS_IPAR_ENTRY

integer

integer

Any integer number

0

number

0

Set an entry in the MUMPS integer control array, as chosen via the optional argument.

MUMPS_RPAR_ENTRY

real

Any real number

0

real

Any real number

0

Set an entry in the MUMPS real control array, as chosen via the optional class="cmr-10">argument.







Table 8: Table 7: Parameters defining the smoother or the details of the one-level preconditioner -(continued).
+(continued).
@@ -1955,7 +2064,7 @@ class="content">Parameters defining the smoother or the details of the one-level -