Nested Preconditioner Report
Executive Summary
This report documents the double-precision nested preconditioner functionality added to PSBLAS. The implementation provides a PSBLAS-native field-split preconditioner for matrices represented through the nested matrix infrastructure. It is implemented primarily in:
- prec/psb_d_nestedprec.f90
- prec/impl/psb_d_prec_type_impl.f90
- base/modules/serial/psb_d_nest_base_mat_mod.F90
- nested test drivers under test/nested
The nested preconditioner is intended for block-structured systems, especially multiphysics and field-split problems, where the global matrix is naturally written as:
    [ A_11  A_12  ...  A_1m ]
A = [ A_21  A_22  ...  A_2m ]
    [ ...   ...   ...  ...  ]
    [ A_m1  A_m2  ...  A_mm ]
Each field owns a PSBLAS descriptor and each block is a normal sparse PSBLAS matrix. The preconditioner extracts the field diagonal blocks, builds ordinary PSBLAS sub-preconditioners on those blocks, and combines the field solves through additive, multiplicative, symmetric multiplicative, or two-field Schur-style compositions.
The implementation now also includes independent per-field inner Krylov solve contexts. These are configured through the existing prec%set option mechanism and are applied inside field solves when requested. The current inner methods are local nested-preconditioner kernels for CG and BICGSTAB, rather than calls into the top-level psb_krylov driver.
Outer Krylov solver (psb_krylov)
  |
  +-- Preconditioner / Block solve
        |
        +-- Field 1 solve
        |     |
        |     +-- Inner CG/BICGSTAB (optional)
        |
        +-- Field 2 solve
              |
              +-- Inner CG/BICGSTAB (optional)
PSBLAS Background
PSBLAS separates sparse linear algebra into a few recurring concepts:
- a sparse matrix object, such as psb_dspmat_type
- a communication descriptor, psb_desc_type, that describes distributed row ownership and halo data
- dense vector storage, either raw arrays or PSBLAS vector objects
- a preconditioner wrapper, psb_dprec_type, that owns a polymorphic implementation derived from psb_d_base_prec_type
- Krylov solvers, such as psb_krylov, that operate on a matrix, a descriptor, and an optional preconditioner
The usual preconditioner lifecycle is:
call prec%init('...', info)
call prec%set('...', value, info)
call prec%build(a, desc_a, info)
call psb_krylov(method, a, prec, b, x, eps, desc_a, info, ...)
call prec%free(info)
The nested preconditioner follows this same lifecycle. From the outside it is just another psb_dprec_type implementation selected by:
call prec%init('NEST', info)
The outer Krylov method does not need to know that the preconditioner is block structured. It applies prec through the standard PSBLAS preconditioner virtual interface.
Motivation
Many application matrices are block systems, not just scalar sparse matrices. Examples include:
- coupled PDE systems
- saddle point systems
- fluid-structure and multiphysics discretizations
- systems reordered by physical field or variable type
- matrices where different fields require different smoothers or approximate solvers
A scalar preconditioner treats the matrix as a single undifferentiated sparse operator. That is often too weak or too inflexible for block systems. A field-split preconditioner can use the mathematical structure of the problem:
- each field can use a different sub-preconditioner
- off-diagonal coupling can be handled additively, multiplicatively, or through Schur approximations
- expensive exact block solves can be replaced by approximate preconditioned field iterations
- the outer solver can remain unchanged
The goal of this implementation is to provide that behavior inside PSBLAS without introducing a separate solver framework or requiring users to leave the existing prec%init, prec%set, prec%build, and psb_krylov flow.
Nested Matrix Infrastructure
The nested preconditioner depends on the nested matrix support already added to the repository. A nested matrix stores a global operator as a block matrix over fields. The important pieces are:
- psb_d_nest_matrix: user-facing builder/helper for nested matrices
- psb_d_nest_base_mat: sparse matrix backend that adapts the nested object to the standard PSBLAS sparse matrix interface
- a_glob: global sparse wrapper used by ordinary PSBLAS solver calls
- desc_glob: global descriptor for the composed vector layout
- per-field descriptors for field-local vectors and block operations
- helper routines to restrict/prolong between global and field vector layouts
The nested preconditioner uses the following helper operations from psb_d_nest_base_mat_mod:
psb_d_nest_get_n_fields
psb_d_nest_get_field_owned
psb_d_nest_get_block
psb_d_nest_get_field_desc
psb_d_nest_restrict_field
psb_d_nest_restrict_field_local
psb_d_nest_prolong_field
psb_d_nest_apply_block
These routines are what keep the preconditioner PSBLAS-native. The preconditioner does not invent its own distributed layout. It asks the nested matrix infrastructure how fields map into the global vector and uses the existing descriptors for communication.
Design Goals
The main design goals were:
- expose nested preconditioning through the standard PSBLAS preconditioner API
- reuse existing PSBLAS preconditioner implementations for diagonal field blocks
- preserve parallel correctness by using PSBLAS descriptors and nested field helpers
- support practical field-split compositions first
- allow per-field configuration of block preconditioners
- allow per-field inner Krylov solves without creating a dependency cycle in the library
- keep the first implementation scoped to double precision
The implementation is deliberately conservative. It does not attempt to become a separate nonlinear or block-solver framework. It provides preconditioner application paths that can be used by existing PSBLAS Krylov solvers.
Mathematical Model
Assume a nested matrix with m fields:
    [ A_11  A_12  ...  A_1m ]
A = [ A_21  A_22  ...  A_2m ]
    [ ...   ...   ...  ...  ]
    [ A_m1  A_m2  ...  A_mm ]
For field i, let B_i denote the field preconditioner built from the diagonal block A_ii. A global vector x is restricted into field components:
x -> (x_1, x_2, ..., x_m)
The nested preconditioner applies field solves and then prolongs field corrections back into the global vector layout:
z_i ~= B_i r_i
z = prolong(z_1, z_2, ..., z_m)
The different compositions change how field residuals are formed and how off-diagonal blocks are used.
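The restrict and prolong steps can be pictured with a small sketch. It is written in Python rather than against the PSBLAS Fortran API, and it models field ownership as plain lists of global indices. The names restrict, prolong, and field_indices and the interleaved layout are assumptions for illustration only; the real helpers (psb_d_nest_restrict_field, psb_d_nest_prolong_field) work through descriptors.

```python
def restrict(x_global, indices):
    """Gather the entries of a global vector that belong to one field."""
    return [x_global[g] for g in indices]


def prolong(z_global, z_field, indices):
    """Scatter a field vector back into its global positions."""
    for g, value in zip(indices, z_field):
        z_global[g] = value


# Hypothetical layout: two fields interleaved in a 5-entry global vector.
field_indices = [[0, 2, 4], [1, 3]]
x = [10.0, 20.0, 30.0, 40.0, 50.0]

# x -> (x_1, x_2)
fields = [restrict(x, idx) for idx in field_indices]

# Round trip: prolonging every restricted field rebuilds the global vector.
z = [0.0] * len(x)
for x_i, idx in zip(fields, field_indices):
    prolong(z, x_i, idx)
```

In this toy layout each global entry belongs to exactly one field, so restriction followed by prolongation is the identity.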
Additive And Diagonal Composition
The additive composition is the block-Jacobi field split:
z_i = B_i r_i, i = 1, ..., m
All fields are solved independently from the original restricted residual. Off-diagonal blocks are ignored during the preconditioner application. This is the simplest and most parallel composition.
For a three-field system:
    [ A_11  A_12  A_13 ]
A = [ A_21  A_22  A_23 ]
    [ A_31  A_32  A_33 ]
and residual:
    [ r_1 ]
r = [ r_2 ]
    [ r_3 ]
the additive preconditioner applies:
z_1 = B_1 r_1
z_2 = B_2 r_2
z_3 = B_3 r_3
Graphically:
r_1 --> B_1 --> z_1
r_2 --> B_2 --> z_2
r_3 --> B_3 --> z_3
There is no dependency between field solves. Field 1 does not wait for field 2 or field 3; field 2 does not use corrections from field 1 or field 3; field 3 is also independent. In matrix terms, additive composition approximates the inverse of the block diagonal part:
         [ A_11  0     0    ]^{-1}
P^{-1} ~ [ 0     A_22  0    ]
         [ 0     0     A_33 ]
but with B_i in place of exact A_ii^{-1}:
         [ B_1  0    0   ]
P^{-1} = [ 0    B_2  0   ]
         [ 0    0    B_3 ]
This is why additive composition is robust as a first preconditioner: it is cheap, parallel, and easy to reason about. Its weakness is that it does not directly account for coupling terms like A_12, A_21, A_23, or A_32. The outer Krylov method must handle those couplings.
DIAGONAL and ADDITIVE are treated as equivalent names in the implementation.
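A minimal sketch of the additive application, assuming small dense diagonal blocks and a diagonal (DIAG-like) field solve. It is written in Python for illustration and is not the PSBLAS code path; jacobi_solve and additive_apply are hypothetical names.

```python
def jacobi_solve(A_ii):
    """Return B_i as a function: divide by the diagonal of A_ii."""
    diag = [A_ii[k][k] for k in range(len(A_ii))]
    return lambda r: [rk / dk for rk, dk in zip(r, diag)]


def additive_apply(field_solves, r_fields):
    """z_i = B_i r_i for every field; no field sees another field's result."""
    return [B(r) for B, r in zip(field_solves, r_fields)]


# Two fields: a 2x2 block and a 1x1 block (illustrative values).
A_11 = [[4.0, 1.0], [1.0, 3.0]]
A_22 = [[2.0]]
solves = [jacobi_solve(A_11), jacobi_solve(A_22)]

z = additive_apply(solves, [[8.0, 6.0], [5.0]])
```

Because the list comprehension in additive_apply has no data dependency between iterations, the field solves could run in any order or concurrently, which is the property the text attributes to this composition.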
Multiplicative Composition
The multiplicative composition is a field Gauss-Seidel sweep. Fields are visited in order. For field i, the residual is corrected using already computed field updates:
r_i^hat = r_i - sum_{j < i} A_ij z_j
z_i = B_i r_i^hat
The implementation uses nested block application for the off-diagonal products and prolongs intermediate field corrections into global work storage as needed. This lets later fields see the effect of earlier field corrections.
For a three-field forward sweep, the fields are processed as 1, 2, 3.
Field 1 has nothing before it:
rhs_1 = r_1
z_1 = B_1 rhs_1
Field 2 can use the field-1 correction:
rhs_2 = r_2 - A_21 z_1
z_2 = B_2 rhs_2
Field 3 can use both earlier corrections:
rhs_3 = r_3 - A_31 z_1 - A_32 z_2
z_3 = B_3 rhs_3
Graphically:
r_1 -----------------------> B_1 ---> z_1
                                       |
                                       v
r_2 - A_21 z_1 ------------> B_2 ---> z_2
                                       |
                                       v
r_3 - A_31 z_1 - A_32 z_2 -> B_3 ---> z_3
This resembles a lower triangular block solve. In exact block form, replacing each B_i by A_ii^{-1} would make the sweep apply the exact inverse of the block lower triangular part of A:
[ A_11 0 0 ]
[ A_21 A_22 0 ]
[ A_31 A_32 A_33 ]
The actual preconditioner does not assemble this triangular matrix. It computes the needed off-diagonal products with the existing nested blocks and solves each field with psb_d_nested_field_solve.
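The forward sweep can be sketched as follows. This is an illustrative Python version on small dense blocks, not the psb_d_nested_sweep routine; forward_sweep and the block values are assumptions. The blocks are 1x1 so that B_i = 1 / A_ii is an exact field solve and the result can be checked against the lower triangular system by hand.

```python
def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def forward_sweep(blocks, field_solves, r_fields):
    """Block Gauss-Seidel: rhs_i = r_i - sum_{j<i} A_ij z_j, z_i = B_i rhs_i."""
    z = []
    for i, r_i in enumerate(r_fields):
        rhs = list(r_i)
        for j in range(i):  # only fields that were already updated
            coupling = matvec(blocks[i][j], z[j])
            rhs = [a - b for a, b in zip(rhs, coupling)]
        z.append(field_solves[i](rhs))
    return z


# blocks[i][j] holds A_ij; the upper blocks are present but never used.
blocks = [[[[2.0]], [[1.0]], [[0.0]]],
          [[[1.0]], [[4.0]], [[1.0]]],
          [[[2.0]], [[1.0]], [[5.0]]]]
solves = [lambda r, d=blocks[i][i][0][0]: [r[0] / d] for i in range(3)]

z = forward_sweep(blocks, solves, [[2.0], [5.0], [8.0]])
```

With these values the sweep returns z_1 = z_2 = z_3 = 1, which satisfies 2 z_1 = 2, z_1 + 4 z_2 = 5, and 2 z_1 + z_2 + 5 z_3 = 8, the rows of the lower triangular system.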
Symmetric Multiplicative Composition
The symmetric multiplicative composition performs a forward sweep followed by a backward sweep:
forward: i = 1, ..., m
backward: i = m, ..., 1
The same sweep kernel is reused in both directions. This gives a symmetric-style block Gauss-Seidel preconditioner when the underlying block choices are compatible with the matrix properties.
For a three-field backward sweep, the fields are processed as 3, 2, 1.
Field 3 has nothing above it in this ordering:
rhs_3 = r_3
z_tilde_3 = B_3 rhs_3
Field 2 can now use the already computed field-3 correction:
rhs_2 = r_2 - A_23 z_tilde_3
z_tilde_2 = B_2 rhs_2
Field 1 can use both later-field corrections:
rhs_1 = r_1 - A_12 z_tilde_2 - A_13 z_tilde_3
z_tilde_1 = B_1 rhs_1
Graphically:
r_3 -----------------------------------> B_3 ---> z_tilde_3
                                                      |
                                                      v
r_2 - A_23 z_tilde_3 ------------------> B_2 ---> z_tilde_2
                                                      |
                                                      v
r_1 - A_12 z_tilde_2 - A_13 z_tilde_3 -> B_1 ---> z_tilde_1
This resembles an upper triangular block solve:
[ A_11 A_12 A_13 ]
[ 0 A_22 A_23 ]
[ 0 0 A_33 ]
The symmetric multiplicative composition combines the information flow from both directions. The forward sweep accounts for lower-block couplings such as A_21, A_31, and A_32; the backward sweep accounts for upper-block couplings such as A_12, A_13, and A_23. This is more expensive than additive composition because fields are no longer independent, but it can be much stronger when coupling terms dominate convergence.
Schur-Style Composition
Schur-complement preconditioning starts from the two-field coupled system:
    [ A_11  A_12 ]
A = [ A_21  A_22 ]
and:
    [ x_1 ]        [ r_1 ]
x = [ x_2 ],   r = [ r_2 ]
The block system A x = r expands to:
A_11 x_1 + A_12 x_2 = r_1
A_21 x_1 + A_22 x_2 = r_2
To eliminate field 1, solve the first equation for x_1:
A_11 x_1 = r_1 - A_12 x_2
x_1 = A_11^{-1} (r_1 - A_12 x_2)
Insert this expression into the second equation:
A_21 A_11^{-1} (r_1 - A_12 x_2) + A_22 x_2 = r_2
Rearranging gives:
(A_22 - A_21 A_11^{-1} A_12) x_2 = r_2 - A_21 A_11^{-1} r_1
The matrix:
S = A_22 - A_21 A_11^{-1} A_12
is the Schur complement.
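A worked numerical instance of the elimination, using scalar blocks so every step can be checked by hand. The values are hypothetical and the snippet is plain Python arithmetic, not PSBLAS code.

```python
# Hypothetical 2x2 system with scalar blocks.
A_11, A_12 = 4.0, 1.0
A_21, A_22 = 2.0, 3.0
r_1, r_2 = 1.0, 2.0

# Schur complement S = A_22 - A_21 A_11^{-1} A_12 = 3 - 0.5 = 2.5
S = A_22 - A_21 * (1.0 / A_11) * A_12

# Reduced field-2 problem: S x_2 = r_2 - A_21 A_11^{-1} r_1
x_2 = (r_2 - A_21 * (1.0 / A_11) * r_1) / S

# Recover field 1 from the first block equation.
x_1 = (r_1 - A_12 * x_2) / A_11

# Residuals of the original coupled equations; both should be near zero.
res_1 = A_11 * x_1 + A_12 * x_2 - r_1
res_2 = A_21 * x_1 + A_22 * x_2 - r_2
```

Here x_2 comes out as 0.6 and x_1 as 0.1 up to rounding, and substituting them back reproduces r_1 = 1 and r_2 = 2.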
This is important because the original coupled system has been reduced to three conceptual steps:
- solve a field-1 problem
- solve a field-2 Schur problem
- recover or correct field 1
That pattern is the basis of many modern multiphysics and saddle-point preconditioners.
The exact block LU factorization is:
    [ I               0 ] [ A_11  0 ] [ I  A_11^{-1} A_12 ]
A = [ A_21 A_11^{-1}  I ] [ 0     S ] [ 0  I              ]
The diagonal blocks in this factorization are A_11 and S; the remaining factors are triangular. This immediately suggests a preconditioner based on applying approximate triangular solves and approximate inverses for the diagonal factors.
Writing the three factors, from left to right, as L, D, and U, so that A = L D U, the exact inverse action is:
A^{-1} = U^{-1} D^{-1} L^{-1}
         [ A_11^{-1}  0      ]
D^{-1} = [ 0          S^{-1} ]
In practice, explicitly computing the exact Schur complement is rarely feasible for large sparse problems. The obstacle is A_11^{-1}: even if A_11, A_12, and A_21 are sparse, A_11^{-1} is generally dense, and therefore:
A_21 A_11^{-1} A_12
can be dense or nearly dense. Assembling S would destroy sparsity and can make storage and computation infeasible for large systems.
The nested preconditioner therefore uses approximations:
B_1 ~= A_11^{-1}
B_2 ~= A_22^{-1} or B_2 ~= S^{-1}
where B_1 and B_2 are field solves provided by psb_d_nested_field_solve. A field solve may be a direct block preconditioner application, or it may use the independent per-field inner Krylov context if INNER_SOLVE is enabled.
The implementation provides three Schur-style compositions.
Lower Schur Variant
The lower Schur variant computes:
z_1 = B_1 r_1
z_2 = B_2 (r_2 - A_21 z_1)
Substituting the first equation into the second gives:
z_2 = B_2 (r_2 - A_21 B_1 r_1)
This is the approximate forward solve associated with the lower triangular block factor.
r_1 --> B_1 --> z_1
|
v
A_21 z_1
|
v
r_2 - A_21 z_1 --> B_2 --> z_2
Upper Schur Variant
The upper Schur variant reverses the ordering:
z_2 = B_2 r_2
z_1 = B_1 (r_1 - A_12 z_2)
This is the approximate backward solve associated with the upper triangular block factor:
r_2 --> B_2 --> z_2
|
v
A_12 z_2
|
v
r_1 - A_12 z_2 --> B_1 --> z_1
Full Schur Variant
The full Schur variant applies the lower-style solve and then corrects field 1 with the upper coupling:
z_1 = B_1 r_1
z_2 = B_2 (r_2 - A_21 z_1)
z_1 = z_1 - B_1 A_12 z_2
This approximates the full U^{-1} D^{-1} L^{-1} block-LU inverse action. It is generally stronger than purely additive block-Jacobi or a single directional block sweep because it accounts for both off-diagonal couplings.
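The three variants can be compared side by side in a short sketch. It uses scalar blocks and Python functions with hypothetical names (schur_lower, schur_upper, schur_full); it is not the psb_d_nested_apply_schur routine. As an assumption for the check, B_1 is the exact inverse of A_11 and B_2 is the exact inverse of S, in which case the full variant should reproduce the exact solution.

```python
def schur_lower(B_1, B_2, A_21, r_1, r_2):
    z_1 = B_1(r_1)
    z_2 = B_2(r_2 - A_21 * z_1)
    return z_1, z_2


def schur_upper(B_1, B_2, A_12, r_1, r_2):
    z_2 = B_2(r_2)
    z_1 = B_1(r_1 - A_12 * z_2)
    return z_1, z_2


def schur_full(B_1, B_2, A_12, A_21, r_1, r_2):
    z_1, z_2 = schur_lower(B_1, B_2, A_21, r_1, r_2)
    z_1 = z_1 - B_1(A_12 * z_2)  # correct field 1 with the upper coupling
    return z_1, z_2


# Hypothetical scalar blocks; S = A_22 - A_21 * A_12 / A_11 = 2.5.
A_11, A_12, A_21, A_22 = 4.0, 1.0, 2.0, 3.0
S = A_22 - A_21 * A_12 / A_11
B_1 = lambda r: r / A_11  # exact field-1 solve
B_2 = lambda r: r / S     # exact Schur solve (assumption for this check)

lower = schur_lower(B_1, B_2, A_21, 1.0, 2.0)
upper = schur_upper(B_1, B_2, A_12, 1.0, 2.0)
full = schur_full(B_1, B_2, A_12, A_21, 1.0, 2.0)
```

With exact solves, full returns approximately (0.1, 0.6), the exact solution of the coupled system, while lower and upper each leave one coupling unaccounted for.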
Matrix-Free Schur Action
The most important implementation detail is that the code can apply a Schur action without assembling S.
Given a field-2 vector x, the matrix-free Schur action computes:
y = S x
through the sequence:
t = A_12 x
u = B_1 t
v = A_21 u
y = A_22 x - v
or equivalently:
y = A_22 x - A_21 B_1 A_12 x
Graphically:
x
|
v
A_12
|
v
B_1
|
v
A_21
|
v
subtract from A_22 x
|
v
y
No Schur matrix is ever stored. The code only needs block products and field solves.
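The four-step sequence above can be sketched in a few lines. This is an illustrative Python version on small dense blocks, not psb_d_nested_schur_action; schur_action and the block values are assumptions. A_11 is chosen diagonal so that the diagonal field solve B_1 is exact and the result can be compared with the explicitly formed S.

```python
def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def schur_action(A_12, A_21, A_22, B_1, x):
    """y = A_22 x - A_21 B_1 A_12 x using only block products and B_1."""
    t = matvec(A_12, x)   # field 2 -> field 1
    u = B_1(t)            # approximate A_11^{-1} t
    v = matvec(A_21, u)   # field 1 -> field 2
    return [a - b for a, b in zip(matvec(A_22, x), v)]


# Field 1 has two unknowns, field 2 has one (illustrative values).
A_12 = [[1.0], [2.0]]
A_21 = [[2.0, 4.0]]
A_22 = [[5.0]]
B_1 = lambda t: [t[0] / 2.0, t[1] / 4.0]  # exact solve for A_11 = diag(2, 4)

y = schur_action(A_12, A_21, A_22, B_1, [1.0])
```

For these values S = 5 - (2 * 0.5 + 4 * 0.5) = 2, so applying the action to x = 1 gives y = 2 without S ever being formed.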
Richardson Schur Solve
When SCHUR_SOLVE = MATRIX_FREE, the preconditioner needs an approximation to S^{-1} even though it only has a routine for S x. The implementation therefore uses a small preconditioned Richardson iteration:
z_{k+1} = z_k + M^{-1} (b - S z_k)
where M^{-1} is the field-2 solve, implemented through psb_d_nested_field_solve and usually acting like B_2.
Each Richardson step uses:
- the matrix-free Schur product S z_k
- a Schur residual b - S z_k
- the field-2 preconditioner or field-2 inner solve to correct z_k
This gives a cheap inner solve for the Schur complement without assembling or storing S.
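A minimal sketch of such a Richardson loop, in Python and on a scalar Schur problem. The zero initial guess and the relative max-norm stopping test are assumptions made for the sketch; the source only states that the iteration is bounded by SCHUR_MAXIT and SCHUR_TOL. The function name richardson_schur_solve is hypothetical.

```python
def richardson_schur_solve(apply_S, apply_Minv, b, maxit, tol):
    """Iterate z <- z + M^{-1} (b - S z), starting from z = 0 (assumption)."""
    z = [0.0] * len(b)
    b_norm = max(abs(v) for v in b)
    for _ in range(maxit):
        Sz = apply_S(z)                         # matrix-free Schur product
        res = [bi - si for bi, si in zip(b, Sz)]
        if max(abs(v) for v in res) <= tol * b_norm:
            break                               # assumed stopping rule
        corr = apply_Minv(res)                  # field-2 solve as M^{-1}
        z = [zi + ci for zi, ci in zip(z, corr)]
    return z


# Scalar example: S = 2.5, approximated by A_22 = 3, right-hand side 1.5.
apply_S = lambda z: [2.5 * z[0]]
apply_Minv = lambda r: [r[0] / 3.0]

z_short = richardson_schur_solve(apply_S, apply_Minv, [1.5], maxit=4, tol=0.0)
z_tight = richardson_schur_solve(apply_S, apply_Minv, [1.5], maxit=50, tol=1e-10)
```

In this example the error shrinks by the factor 1 - 2.5/3 = 1/6 per step, so four steps with a zero tolerance already bring z within about 5e-4 of the exact value 0.6.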
The implementation does not assemble an exact sparse Schur complement. Instead, it applies Schur actions using:
- diagonal field solves through psb_d_nested_field_solve
- off-diagonal products through psb_d_nest_apply_block
- global/field work vectors managed by the nested matrix helpers
Supported Schur compositions are:
- SCHUR_LOWER
- SCHUR_UPPER
- SCHUR_FULL
The Schur solve mode is controlled by SCHUR_SOLVE:
- A22: use the field-2 block preconditioner as the Schur approximation
- MATRIX_FREE: use a small Richardson iteration around the matrix-free Schur action
The MATRIX_FREE mode is controlled by SCHUR_MAXIT and SCHUR_TOL.
Implementation Structure
The core implementation lives in prec/psb_d_nestedprec.f90, which defines the psb_d_nestedprec module, including its helper types and the concrete preconditioner type:
type :: psb_d_nested_block_prec
  character(len=16) :: ptype
  class(psb_d_base_prec_type), allocatable :: pc
end type psb_d_nested_block_prec

type :: psb_d_nested_krylov_context
  logical           :: enabled
  character(len=16) :: method
  integer(psb_ipk_) :: itmax
  integer(psb_ipk_) :: itrace
  integer(psb_ipk_) :: istop
  real(psb_dpk_)    :: tol
end type psb_d_nested_krylov_context

type, extends(psb_d_base_prec_type) :: psb_d_nested_prec_type
  integer(psb_ipk_) :: composition
  character(len=32) :: composition_name
  character(len=16) :: default_block_ptype
  integer(psb_ipk_) :: schur_solve
  character(len=32) :: schur_solve_name
  integer(psb_ipk_) :: schur_maxit
  real(psb_dpk_)    :: schur_tol
  type(psb_d_nested_krylov_context) :: default_krylov
  integer(psb_ipk_) :: nfields
  type(psb_d_nest_base_mat), pointer :: nest_op
  type(psb_d_nested_block_prec), allocatable :: blocks(:)
  character(len=16), allocatable :: field_block_ptype(:)
  type(psb_d_nested_krylov_context), allocatable :: field_krylov(:)
  type(psb_d_nested_iopt), allocatable :: field_iopts(:)
  type(psb_d_nested_ropt), allocatable :: field_ropts(:)
  type(psb_d_nested_copt), allocatable :: field_copts(:)
contains
  procedure, pass(prec) :: d_apply_v  => psb_d_nested_apply_vect
  procedure, pass(prec) :: d_apply    => psb_d_nested_apply
  procedure, pass(prec) :: precbld    => psb_d_nested_precbld
  procedure, pass(prec) :: precinit   => psb_d_nested_precinit
  procedure, pass(prec) :: precseti   => psb_d_nested_precseti
  procedure, pass(prec) :: precsetr   => psb_d_nested_precsetr
  procedure, pass(prec) :: precsetc   => psb_d_nested_precsetc
  procedure, pass(prec) :: precdescr  => psb_d_nested_precdescr
  procedure, pass(prec) :: dump       => psb_d_nested_dump
  procedure, pass(prec) :: clone      => psb_d_nested_clone
  procedure, pass(prec) :: free       => psb_d_nested_precfree
  procedure, pass(prec) :: sizeof     => psb_d_nested_sizeof
  procedure, pass(prec) :: get_nzeros => psb_d_nested_get_nzeros
end type psb_d_nested_prec_type
The defaults set by psb_d_nested_precinit are:
- composition: ADDITIVE
- default block solve: DIAG
- Schur solve: A22
- Schur maximum iterations: 4
- Schur tolerance: 0.0
- default inner Krylov disabled
- default inner Krylov method: CG
- default inner Krylov maximum iterations: 20
- default inner Krylov trace: -1
- default inner Krylov stopping rule: 2
- default inner Krylov tolerance: 1.0e-6
The preconditioner stores:
- the selected field-split composition
- the default block preconditioner type
- optional per-field block preconditioner types
- Schur solve configuration
- default and per-field inner Krylov configuration
- pending per-field sub-preconditioner options
- one built block preconditioner per field
- a pointer to the nested matrix backend
The preconditioner does not own the nested matrix. It stores a pointer to the nested operator exposed by the global sparse matrix wrapper.
Build Phase
The build phase is implemented by psb_d_nested_precbld.
At build time, the preconditioner:
- releases previously built field preconditioners while keeping user configuration
- verifies that the input sparse matrix is backed by a nested base matrix
- stores a pointer to the nested operator
- queries the number of fields
- allocates one psb_d_nested_block_prec entry per field
- obtains each diagonal block A_ii
- obtains each field descriptor
- allocates the selected PSBLAS block preconditioner type for that field
- replays routed per-field options onto the block preconditioner
- builds the field block preconditioner on A_ii
The build path deliberately uses existing PSBLAS preconditioner implementations. The nested preconditioner is a composition layer over ordinary field preconditioners, not a replacement for ILU, diagonal, approximate inverse, or block-Jacobi preconditioners.
Field Block Preconditioners
The nested block solve type is configured through BLOCK_SOLVE or NEST_BLOCK_SOLVE.
Supported block preconditioner type names currently include:
- DIAG
- BJAC
- NONE
The implementation allocates the corresponding concrete PSBLAS preconditioner for each diagonal field block. idx can be used on prec%set calls to override a particular field:
call prec%set('BLOCK_SOLVE', 'BJAC', info, idx=1)
call prec%set('BLOCK_SOLVE', 'DIAG', info, idx=2)
Field-level sub-preconditioner tuning is also routed through psb_d_prec_type_impl.f90. For nested preconditioners, options such as SUB_SOLVE, SUB_FILLIN, SUB_ILUTHRS, INV_FILLIN, and INV_THRESH are stored on the nested preconditioner and replayed onto the selected field block preconditioners during build.
This matters because the field block preconditioners do not exist until precbld. The routed option storage lets users configure fields before the nested object has been built.
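The store-then-replay pattern can be illustrated with a small sketch. It is a Python model with hypothetical class names (FieldBlock, NestedPrecSketch), not the Fortran implementation. Two behaviours are assumptions of the sketch: an option set without idx is replayed onto every field, and fields are numbered from 1 as in the idx examples above.

```python
class FieldBlock:
    """Stand-in for a built field block preconditioner (hypothetical)."""

    def __init__(self, ptype):
        self.ptype = ptype
        self.options = {}


class NestedPrecSketch:
    def __init__(self, default_ptype="DIAG"):
        self.default_ptype = default_ptype
        self.field_ptype = {}  # idx -> block type override
        self.pending = []      # (name, value, idx), kept until build
        self.blocks = []

    def set(self, name, value, idx=None):
        if name == "BLOCK_SOLVE":
            if idx is None:
                self.default_ptype = value
            else:
                self.field_ptype[idx] = value
        else:
            # The field blocks do not exist yet, so only record the request.
            self.pending.append((name, value, idx))

    def build(self, nfields):
        self.blocks = []
        for i in range(1, nfields + 1):
            block = FieldBlock(self.field_ptype.get(i, self.default_ptype))
            for name, value, idx in self.pending:
                if idx is None or idx == i:  # replay onto matching fields
                    block.options[name] = value
            self.blocks.append(block)


prec = NestedPrecSketch()
prec.set("BLOCK_SOLVE", "BJAC", idx=1)
prec.set("SUB_FILLIN", 1, idx=1)
prec.build(nfields=2)
```

After build, field 1 is a BJAC block carrying SUB_FILLIN = 1, while field 2 falls back to the DIAG default with no routed options. Because pending is kept after build, a rebuild would replay the same configuration, in line with the build phase keeping user configuration.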
Independent Per-Field Inner Krylov Contexts
The implementation includes independent per-field inner Krylov solve contexts:
type(psb_d_nested_krylov_context) :: default_krylov
type(psb_d_nested_krylov_context), allocatable :: field_krylov(:)
The default context applies to all fields when idx is not supplied. Field-specific contexts are allocated and updated when idx is supplied.
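One way to picture the default and per-field contexts is the following Python sketch. The class names are hypothetical, the default values are the ones listed for psb_d_nested_precinit, and one behaviour is an assumption rather than something the source states: a field context is created as a copy of the current default the first time idx is supplied for that field.

```python
from dataclasses import dataclass, replace


@dataclass
class KrylovContext:
    enabled: bool = False
    method: str = "CG"
    itmax: int = 20
    itrace: int = -1
    istop: int = 2
    tol: float = 1.0e-6


class InnerContexts:
    def __init__(self):
        self.default = KrylovContext()
        self.field = {}  # idx -> KrylovContext

    def set(self, name, value, idx=None):
        if idx is None:
            ctx = self.default
        else:
            # Assumption: a new field context starts as a copy of the default.
            ctx = self.field.setdefault(idx, replace(self.default))
        if name == "INNER_SOLVE":
            ctx.method = value
            ctx.enabled = value != "NONE"
        elif name == "INNER_MAXIT":
            ctx.itmax = value
        elif name == "INNER_TOL":
            ctx.tol = value

    def context_for(self, idx):
        """Field-specific context if one exists, otherwise the default."""
        return self.field.get(idx, self.default)


contexts = InnerContexts()
contexts.set("INNER_SOLVE", "CG", idx=1)
contexts.set("INNER_MAXIT", 50, idx=1)
```

Field 1 now has an enabled CG context with 50 iterations, while any other field still resolves to the disabled default with 20 iterations.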
Supported inner methods are:
- CG
- BICGSTAB
- NONE, which disables the inner Krylov path for that field
The inner Krylov options are:
- INNER_SOLVE, also accepted as KRYLOV_SOLVE or FIELD_SOLVE
- INNER_MAXIT, also accepted as INNER_ITMAX, KRYLOV_MAXIT, or KRYLOV_ITMAX
- INNER_TOL, also accepted as KRYLOV_TOL
- INNER_ITRACE, also accepted as KRYLOV_ITRACE
- INNER_ISTOP, also accepted as KRYLOV_ISTOP
Example:
call prec%set('INNER_SOLVE', 'CG', info, idx=1)
call prec%set('INNER_TOL', 1.0d-8, info, idx=1)
call prec%set('INNER_MAXIT', 50, info, idx=1)
call prec%set('INNER_SOLVE', 'BICGSTAB', info, idx=2)
call prec%set('INNER_TOL', 1.0d-6, info, idx=2)
call prec%set('INNER_MAXIT', 30, info, idx=2)
When a field has inner Krylov enabled, psb_d_nested_field_solve dispatches to:
- psb_d_nested_inner_cg
- psb_d_nested_inner_bicgstab
Those routines apply the field diagonal matrix through psb_d_nested_field_matvec and use the field block preconditioner as the inner preconditioner. When inner Krylov is disabled, psb_d_nested_field_solve directly applies the field block preconditioner.
The implementation does not call the top-level psb_krylov routine for inner solves. That choice avoids a build dependency cycle: the Krylov solvers live above the preconditioner layer, while prec must compile without depending on linsolve. The result is a PSBLAS-local inner Krylov implementation that provides independent per-field solve contexts without conflicting with the existing framework.
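The structure of a local inner CG kernel, a field matvec plus the field block preconditioner, can be sketched as below. This is a textbook preconditioned CG written in Python on a small dense block, not the psb_d_nested_inner_cg source. The zero initial guess and the relative 2-norm residual test are assumptions of the sketch, as are the names inner_pcg, field_matvec, and block_prec.

```python
def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def inner_pcg(field_matvec, block_prec, b, itmax, tol):
    """Preconditioned CG for A_ii z = b, starting from z = 0 (assumption)."""
    z = [0.0] * len(b)
    r = list(b)
    y = block_prec(r)
    p = list(y)
    rho = dot(r, y)
    b_norm = dot(b, b) ** 0.5
    for _ in range(itmax):
        if dot(r, r) ** 0.5 <= tol * b_norm:  # assumed stopping rule
            break
        q = field_matvec(p)
        alpha = rho / dot(p, q)
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        y = block_prec(r)
        rho_new = dot(r, y)
        p = [yi + (rho_new / rho) * pi for yi, pi in zip(y, p)]
        rho = rho_new
    return z


# Symmetric positive definite field block with a diagonal preconditioner.
A_ii = [[4.0, 1.0], [1.0, 3.0]]
field_matvec = lambda x: matvec(A_ii, x)
block_prec = lambda r: [r[0] / 4.0, r[1] / 3.0]

z = inner_pcg(field_matvec, block_prec, [1.0, 2.0], itmax=20, tol=1e-10)
```

For this block the exact solution is (1/11, 7/11). The kernel only needs two callables, which mirrors how an inner solve can live inside the preconditioner layer without referring to the outer Krylov driver.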
Apply Phase
The apply phase is implemented by:
- psb_d_nested_apply_vect for PSBLAS vector objects
- psb_d_nested_apply for raw double-precision arrays
The vector-object path copies into raw array buffers, calls the array path, and copies back. The array path handles:
- optional transpose rejection
- alpha/beta scaling
- work-vector management
- dispatch to the selected composition
The selected composition is then applied by one of:
- psb_d_nested_add_apply
- psb_d_nested_sweep
- psb_d_nested_apply_schur
All paths eventually solve field systems through psb_d_nested_field_solve, so per-field block-preconditioner settings and per-field inner Krylov settings are shared by additive, multiplicative, symmetric, and Schur-style compositions.
Additive Apply Path
psb_d_nested_add_apply performs the following for each field:
- restrict the global residual to the field vector
- solve the field problem through psb_d_nested_field_solve
- prolong the field correction into the global output vector
Because the additive path does not use off-diagonal blocks, all field solves are independent at the mathematical level.
Multiplicative Apply Path
psb_d_nested_sweep implements both forward and backward field sweeps. For each field:
- restrict the original residual to the current field
- subtract the effect of already computed field corrections through off-diagonal block products
- solve the corrected field residual
- prolong the correction into global work storage
The same routine is used for:
- MULTIPLICATIVE: one forward sweep
- SYMMETRIC_MULTIPLICATIVE: a forward sweep followed by a backward sweep
The sweep implementation uses field descriptors and global work vectors so that off-diagonal products see the correct local and halo data.
Schur Apply Path
psb_d_nested_apply_schur is defined for exactly two fields. It implements lower, upper, and full Schur-style block factorizations using field solves and block products.
Supporting routines are:
- psb_d_nested_schur_action
- psb_d_nested_schur_solve
psb_d_nested_schur_action applies the matrix-free Schur operator:
S x_2 = A_22 x_2 - A_21 B_1 A_12 x_2
where B_1 is the field-1 block solve or inner field solve. psb_d_nested_schur_solve either applies the field-2 block solve directly (A22) or runs a short preconditioned Richardson iteration using the matrix-free Schur action (MATRIX_FREE).
Public Options
The nested preconditioner exposes integer option keys in psb_d_nestedprec:
integer(psb_ipk_), parameter :: psb_d_nested_composition_ = 9101
integer(psb_ipk_), parameter :: psb_d_nested_block_solve_ = 9102
integer(psb_ipk_), parameter :: psb_d_nested_schur_solve_ = 9103
integer(psb_ipk_), parameter :: psb_d_nested_schur_maxit_ = 9104
integer(psb_ipk_), parameter :: psb_d_nested_schur_tol_ = 9105
integer(psb_ipk_), parameter :: psb_d_nested_inner_solve_ = 9106
integer(psb_ipk_), parameter :: psb_d_nested_inner_maxit_ = 9107
integer(psb_ipk_), parameter :: psb_d_nested_inner_tol_ = 9108
integer(psb_ipk_), parameter :: psb_d_nested_inner_itrace_ = 9109
integer(psb_ipk_), parameter :: psb_d_nested_inner_istop_ = 9110
The user-facing string options routed by psb_d_prec_type_impl.f90 are:
| Option | Type | Values | Scope |
|---|---|---|---|
| COMPOSITION, NEST_COMPOSITION | character | ADDITIVE, DIAGONAL, MULTIPLICATIVE, SYMMETRIC_MULTIPLICATIVE, SCHUR_LOWER, SCHUR_UPPER, SCHUR_FULL | global |
| BLOCK_SOLVE, NEST_BLOCK_SOLVE | character | DIAG, BJAC, NONE | global or idx field |
| SCHUR_SOLVE, NEST_SCHUR_SOLVE | character | A22, MATRIX_FREE | global |
| SCHUR_MAXIT, NEST_SCHUR_MAXIT | integer | nonnegative iteration count | global |
| SCHUR_TOL, NEST_SCHUR_TOL | real | nonnegative tolerance | global |
| INNER_SOLVE, KRYLOV_SOLVE, FIELD_SOLVE | character | NONE, CG, BICGSTAB | global or idx field |
| INNER_MAXIT, INNER_ITMAX, KRYLOV_MAXIT, KRYLOV_ITMAX | integer | positive iteration count | global or idx field |
| INNER_TOL, KRYLOV_TOL | real | nonnegative tolerance | global or idx field |
| INNER_ITRACE, KRYLOV_ITRACE | integer | trace level | global or idx field |
| INNER_ISTOP, KRYLOV_ISTOP | integer | stopping rule code | global or idx field |
| SUB_SOLVE | character | routed field sub-preconditioner selection | global or idx field |
| SUB_FILLIN | integer | routed ILU fill setting | global or idx field |
| SUB_ILUTHRS | real | routed ILUT threshold | global or idx field |
| INV_FILLIN | integer | routed inverse fill setting | global or idx field |
| INV_THRESH | real | routed inverse threshold | global or idx field |
Example Usage
A typical nested preconditioner setup looks like:
call prec%init('NEST', info)
call prec%set('COMPOSITION', 'SCHUR_FULL', info)
call prec%set('SCHUR_SOLVE', 'MATRIX_FREE', info)
call prec%set('SCHUR_MAXIT', 6, info)
call prec%set('SCHUR_TOL', 1.0d-8, info)
call prec%set('BLOCK_SOLVE', 'BJAC', info, idx=1)
call prec%set('BLOCK_SOLVE', 'DIAG', info, idx=2)
call prec%set('INNER_SOLVE', 'CG', info, idx=1)
call prec%set('INNER_MAXIT', 40, info, idx=1)
call prec%set('INNER_TOL', 1.0d-8, info, idx=1)
call prec%build(nested_matrix%a_glob, nested_matrix%desc_glob, info)
call psb_krylov('BICGSTAB', nested_matrix%a_glob, prec, b, x, eps, &
& nested_matrix%desc_glob, info)
This keeps the outer solve in the existing PSBLAS API while allowing field-specific behavior inside the nested preconditioner.
Factory And API Integration
psb_dprecinit recognizes the nested preconditioner name:
call prec%init('NEST', info)
and allocates:
psb_d_nested_prec_type
The generic prec%set wrappers in prec/impl/psb_d_prec_type_impl.f90 route nested-specific options only when the underlying concrete preconditioner is psb_d_nested_prec_type. For non-nested preconditioners, the existing behavior is preserved where appropriate.
This means nested-specific settings do not change the behavior of ordinary PSBLAS preconditioners.
Parallel Layout Considerations
Parallel correctness depends on using PSBLAS descriptors and nested helper routines consistently.
Each field has:
- its own descriptor
- local owned rows
- possible halo entries needed by block products
- mappings between field-local vectors and global nested vectors
The preconditioner uses:
- psb_d_nest_restrict_field to extract field data from global vectors
- psb_d_nest_restrict_field_local where only local field data are needed
- psb_d_nest_prolong_field to insert field corrections into global vectors
- psb_d_nest_apply_block for diagonal and off-diagonal block products
Multiplicative and Schur compositions require more care than additive composition because they use off-diagonal products. Intermediate field corrections are prolonged into global work storage before being consumed by later block products. This ensures that distributed halo data are consistent with the nested matrix layout.
Validation
The repository contains nested preconditioner tests under:
test/nested
The main nested preconditioner drivers are:
- test/nested/psb_d_nest_prec_cg_test.F90
- test/nested/psb_d_nest_mult_prec_cg_test.F90
- test/nested/psb_d_nest_schur_prec_cg_test.F90
These tests exercise:
- nested matrix construction
- nested preconditioner initialization
- additive/diagonal composition
- multiplicative and symmetric multiplicative composition
- Schur lower, upper, and full composition
- matrix-free Schur solve mode
- per-field inner Krylov configuration and apply through the Schur preconditioner
- serial and MPI execution paths in the nested examples
Validation of the nested preconditioner examples showed convergence for the additive, multiplicative, and Schur-style configurations on the nested Laplacian test problems.
The Schur nested preconditioner test includes an additional SCHUR_FULL / A22+INNER case. That case configures independent per-field inner Krylov contexts, builds the nested preconditioner, applies it directly to the right-hand side, and verifies a successful finite preconditioner application. The existing Schur cases continue to check outer BiCGSTAB convergence.
The serial and two-rank runs both passed after rebuilding the nested preconditioner object, the factory object, the preconditioner archive, and the test executable.
Current Limitations
The current implementation has the following limitations:
- double precision only
- nested preconditioner name currently exposed as NEST
- transposed preconditioner application is rejected
- Schur-style compositions are implemented for exactly two fields
- matrix-free Schur solve currently uses a small Richardson iteration
- inner Krylov methods currently include CG and BICGSTAB
Future Work
Recommended next steps are:
- add single-precision, complex, and double-complex variants
Summary
The nested preconditioner adds a field-split preconditioning layer to PSBLAS while preserving the existing solver and preconditioner API. It reuses standard PSBLAS block preconditioners on diagonal field blocks and combines them through additive, multiplicative, symmetric multiplicative, and Schur-style compositions.
The implementation integrates with:
- nested matrix storage and field mapping
- the psb_dprec_type factory
- the generic prec%set configuration path
- the ordinary PSBLAS preconditioner build/apply lifecycle
- the existing outer psb_krylov solver interface
The newer per-field inner Krylov support gives each field an independent local solve context, configured through the same PSBLAS preconditioner option API. It does not conflict with the existing framework because it is implemented inside the preconditioner layer and avoids adding a dependency from prec back to the higher-level Krylov solver layer.