Nested Preconditioner Report

Executive Summary

This report documents the double-precision nested preconditioner functionality added to PSBLAS. The implementation provides a PSBLAS-native field-split preconditioner for matrices represented through the nested matrix infrastructure. It is implemented primarily in:

prec/psb_d_nestedprec.f90
prec/impl/psb_d_prec_type_impl.f90
base/modules/serial/psb_d_nest_base_mat_mod.F90
nested test drivers under test/nested

The nested preconditioner is intended for block-structured systems, especially multiphysics and field-split problems, where the global matrix is naturally written as:

      [ A_11  A_12  ... A_1m ]
  A = [ A_21  A_22  ... A_2m ]
      [  ...   ...  ...  ... ]
      [ A_m1  A_m2  ... A_mm ]

Each field owns a PSBLAS descriptor and each block is a normal sparse PSBLAS matrix. The preconditioner extracts the field diagonal blocks, builds ordinary PSBLAS sub-preconditioners on those blocks, and combines the field solves through additive, multiplicative, symmetric multiplicative, or two-field Schur-style compositions.

The implementation now also includes independent per-field inner Krylov solve contexts. These are configured through the existing prec%set option mechanism and are applied inside field solves when requested. The current inner methods are local nested-preconditioner kernels for CG and BICGSTAB, rather than calls into the top-level psb_krylov driver.

Outer Krylov solver (psb_krylov)
    |
    +-- Preconditioner / Block solve
            |
            +-- Field 1 solve
            |      |
            |      +-- Inner CG/BICGSTAB (optional)
            |
            +-- Field 2 solve
                   |
                   +-- Inner CG/BICGSTAB (optional)

PSBLAS Background

PSBLAS separates sparse linear algebra into a few recurring concepts:

a sparse matrix object, such as psb_dspmat_type
a communication descriptor, psb_desc_type, that describes distributed row ownership and halo data
dense vector storage, either raw arrays or PSBLAS vector objects
a preconditioner wrapper, psb_dprec_type, that owns a polymorphic implementation derived from psb_d_base_prec_type
Krylov solvers, such as psb_krylov, that operate on a matrix, a descriptor, and an optional preconditioner

The usual preconditioner lifecycle is:

call prec%init('...', info)
call prec%set('...', value, info)
call prec%build(a, desc_a, info)
call psb_krylov(method, a, prec, b, x, eps, desc_a, info, ...)
call prec%free(info)

The nested preconditioner follows this same lifecycle. From the outside it is just another psb_dprec_type implementation selected by:

call prec%init('NEST', info)

The outer Krylov method does not need to know that the preconditioner is block structured. It applies prec through the standard PSBLAS preconditioner interface, which dispatches polymorphically to the nested implementation.

Motivation

Many application matrices are block systems, not just scalar sparse matrices. Examples include:

coupled PDE systems
saddle point systems
fluid-structure and multiphysics discretizations
systems reordered by physical field or variable type
matrices where different fields require different smoothers or approximate solvers

A scalar preconditioner treats the matrix as a single undifferentiated sparse operator. That is often too weak or too inflexible for block systems. A field-split preconditioner can use the mathematical structure of the problem:

each field can use a different sub-preconditioner
off-diagonal coupling can be handled additively, multiplicatively, or through Schur approximations
expensive exact block solves can be replaced by approximate preconditioned field iterations
the outer solver can remain unchanged

The goal of this implementation is to provide that behavior inside PSBLAS without introducing a separate solver framework or requiring users to leave the existing prec%init, prec%set, prec%build, and psb_krylov flow.

Nested Matrix Infrastructure

The nested preconditioner depends on the nested matrix support already added to the repository. A nested matrix stores a global operator as a block matrix over fields. The important pieces are:

psb_d_nest_matrix: user-facing builder/helper for nested matrices
psb_d_nest_base_mat: sparse matrix backend that adapts the nested object to the standard PSBLAS sparse matrix interface
a_glob: global sparse wrapper used by ordinary PSBLAS solver calls
desc_glob: global descriptor for the composed vector layout
per-field descriptors for field-local vectors and block operations
helper routines to restrict/prolong between global and field vector layouts

The nested preconditioner uses the following helper operations from psb_d_nest_base_mat_mod:

psb_d_nest_get_n_fields
psb_d_nest_get_field_owned
psb_d_nest_get_block
psb_d_nest_get_field_desc
psb_d_nest_restrict_field
psb_d_nest_restrict_field_local
psb_d_nest_prolong_field
psb_d_nest_apply_block

These routines are what keep the preconditioner PSBLAS-native. The preconditioner does not invent its own distributed layout. It asks the nested matrix infrastructure how fields map into the global vector and uses the existing descriptors for communication.

Design Goals

The main design goals were:

expose nested preconditioning through the standard PSBLAS preconditioner API
reuse existing PSBLAS preconditioner implementations for diagonal field blocks
preserve parallel correctness by using PSBLAS descriptors and nested field helpers
support practical field-split compositions first
allow per-field configuration of block preconditioners
allow per-field inner Krylov solves without creating a dependency cycle in the library
keep the first implementation scoped to double precision

The implementation is deliberately conservative. It does not attempt to become a separate nonlinear or block-solver framework. It provides preconditioner application paths that can be used by existing PSBLAS Krylov solvers.

Mathematical Model

Assume a nested matrix with m fields:

      [ A_11  A_12  ... A_1m ]
  A = [ A_21  A_22  ... A_2m ]
      [  ...   ...  ...  ... ]
      [ A_m1  A_m2  ... A_mm ]

For field i, let B_i denote the field preconditioner built from the diagonal block A_ii, so that B_i approximates A_ii^{-1}. A global vector x is restricted into field components:

x -> (x_1, x_2, ..., x_m)

The nested preconditioner applies field solves and then prolongs field corrections back into the global vector layout:

z_i ~= B_i r_i
z   = prolong(z_1, z_2, ..., z_m)

The different compositions change how field residuals are formed and how off-diagonal blocks are used.

Additive And Diagonal Composition

The additive composition is the block-Jacobi field split:

z_i = B_i r_i,  i = 1, ..., m

All fields are solved independently from the original restricted residual. Off-diagonal blocks are ignored during the preconditioner application. This is the simplest and most parallel composition.

For a three-field system:

      [ A_11  A_12  A_13 ]
  A = [ A_21  A_22  A_23 ]
      [ A_31  A_32  A_33 ]

and residual:

    [ r_1 ]
r = [ r_2 ]
    [ r_3 ]

the additive preconditioner applies:

z_1 = B_1 r_1
z_2 = B_2 r_2
z_3 = B_3 r_3

Graphically:

r_1 --> B_1 --> z_1

r_2 --> B_2 --> z_2

r_3 --> B_3 --> z_3

There is no dependency between field solves. Field 1 does not wait for field 2 or field 3; field 2 does not use corrections from field 1 or field 3; field 3 is also independent. In matrix terms, additive composition approximates the inverse of the block diagonal part:

          [ A_11  0     0    ]^{-1}
P^{-1} ~= [ 0     A_22  0    ]
          [ 0     0     A_33 ]

but with B_i in place of exact A_ii^{-1}:

         [ B_1  0    0   ]
P^{-1} = [ 0    B_2  0   ]
         [ 0    0    B_3 ]

This is why additive composition is robust as a first preconditioner: it is cheap, parallel, and easy to reason about. Its weakness is that it does not directly account for coupling terms like A_12, A_21, A_23, or A_32. The outer Krylov method must handle those couplings.
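The additive apply can be illustrated with a small dense sketch in plain Python rather than PSBLAS Fortran. The field index sets, the `restrict` and `prolong` helpers, and the use of pointwise diagonal scaling as B_i are assumptions made for illustration; they stand in for the nested helper routines and a DIAG-like block preconditioner and are not the library's API.

```python
from fractions import Fraction as F

# Hypothetical layout: a global vector of length 5 split into three fields.
FIELDS = [[0, 1], [2, 3], [4]]

# Dense stand-in for the global nested matrix A (off-diagonal blocks are nonzero).
A = [
    [F(4), F(1), F(1), F(0), F(0)],
    [F(1), F(4), F(0), F(1), F(0)],
    [F(1), F(0), F(5), F(1), F(1)],
    [F(0), F(1), F(1), F(5), F(0)],
    [F(0), F(0), F(1), F(0), F(2)],
]

def restrict(v, idx):
    """Extract the field component v_i from a global vector."""
    return [v[g] for g in idx]

def prolong(z, idx, z_field):
    """Insert a field correction into the global output vector."""
    for g, val in zip(idx, z_field):
        z[g] = val

def diag_solve(idx, r_field):
    """B_i as pointwise scaling by diag(A_ii); only diagonal entries are read."""
    return [val / A[g][g] for g, val in zip(idx, r_field)]

def additive_apply(r):
    """z_i = B_i r_i for every field, independently of all other fields."""
    z = [F(0)] * len(r)
    for idx in FIELDS:
        prolong(z, idx, diag_solve(idx, restrict(r, idx)))
    return z

r = [F(8), F(4), F(10), F(5), F(6)]
z = additive_apply(r)
print(z)
```

Changing any off-diagonal entry of `A` leaves `z` unchanged in this sketch, which is the sense in which the coupling is left to the outer Krylov method.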

DIAGONAL and ADDITIVE are treated as equivalent names in the implementation.

Multiplicative Composition

The multiplicative composition is a field Gauss-Seidel sweep. Fields are visited in order. For field i, the residual is corrected using already computed field updates:

r_i^hat = r_i - sum_{j < i} A_ij z_j
z_i     = B_i r_i^hat

The implementation uses nested block application for the off-diagonal products and prolongs intermediate field corrections into global work storage as needed. This lets later fields see the effect of earlier field corrections.

For a three-field forward sweep, the fields are processed as 1, 2, 3.

Field 1 has nothing before it:

rhs_1 = r_1
z_1   = B_1 rhs_1

Field 2 can use the field-1 correction:

rhs_2 = r_2 - A_21 z_1
z_2   = B_2 rhs_2

Field 3 can use both earlier corrections:

rhs_3 = r_3 - A_31 z_1 - A_32 z_2
z_3   = B_3 rhs_3

Graphically:

r_1 ----------------------> B_1 ---> z_1
                                      |
                                      v
r_2 - A_21 z_1 ----------> B_2 ---> z_2
                                      |
                                      v
r_3 - A_31 z_1 - A_32 z_2 -> B_3 ---> z_3

This resembles a block lower triangular solve. If each B_i were replaced by the exact A_ii^{-1}, the sweep would apply the exact inverse of the block lower triangular part of A:

[ A_11  0     0    ]
[ A_21  A_22  0    ]
[ A_31  A_32  A_33 ]

The actual preconditioner does not assemble this triangular matrix. It computes the needed off-diagonal products with the existing nested blocks and solves each field with psb_d_nested_field_solve.
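A minimal dense sketch of one field sweep, written in plain Python for illustration only. The index sets, `block_matvec`, and `field_solve` are hypothetical stand-ins for the nested block products and psb_d_nested_field_solve. The diagonal blocks are chosen to be diagonal matrices so that pointwise scaling is an exact field solve, which makes the triangular-solve interpretation easy to check.

```python
from fractions import Fraction as F

FIELDS = [[0, 1], [2, 3], [4]]  # hypothetical field index sets

# Dense stand-in for A; each diagonal block A_ii is itself diagonal.
A = [
    [F(4), F(0), F(1), F(0), F(1)],
    [F(0), F(2), F(0), F(1), F(0)],
    [F(1), F(0), F(5), F(0), F(1)],
    [F(0), F(1), F(0), F(4), F(0)],
    [F(1), F(0), F(1), F(0), F(2)],
]

def block_matvec(i, j, x_j):
    """Apply block A_ij to a field-j vector, giving a field-i vector."""
    return [sum(A[row][col] * x for col, x in zip(FIELDS[j], x_j))
            for row in FIELDS[i]]

def field_solve(i, rhs):
    """B_i: exact here because A_ii is diagonal in this example."""
    return [v / A[g][g] for g, v in zip(FIELDS[i], rhs)]

def sweep(r, order):
    """Visit fields in `order`; each field sees corrections computed before it."""
    done = {}
    for i in order:
        rhs = [r[g] for g in FIELDS[i]]
        for j, z_j in done.items():
            coupling = block_matvec(i, j, z_j)
            rhs = [a - b for a, b in zip(rhs, coupling)]
        done[i] = field_solve(i, rhs)
    z = [F(0)] * len(r)
    for i, z_i in done.items():
        for g, v in zip(FIELDS[i], z_i):
            z[g] = v
    return z

r = [F(4), F(2), F(6), F(5), F(5)]
z_fwd = sweep(r, order=[0, 1, 2])
print(z_fwd)
```

Multiplying `z_fwd` by the block lower triangular part of `A` reproduces `r`. Passing the fields in reverse order to the same function gives the corresponding backward sweep.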

Symmetric Multiplicative Composition

The symmetric multiplicative composition performs a forward sweep followed by a backward sweep:

forward:   i = 1, ..., m
backward:  i = m, ..., 1

The same sweep kernel is reused in both directions. This gives a symmetric-style block Gauss-Seidel preconditioner when the underlying block choices are compatible with the matrix properties.

For a three-field backward sweep, the fields are processed as 3, 2, 1.

Field 3 has nothing before it in this ordering:

rhs_3 = r_3
z_tilde_3 = B_3 rhs_3

Field 2 can now use the already computed field-3 correction:

rhs_2 = r_2 - A_23 z_tilde_3
z_tilde_2 = B_2 rhs_2

Field 1 can use both later-field corrections:

rhs_1 = r_1 - A_12 z_tilde_2 - A_13 z_tilde_3
z_tilde_1 = B_1 rhs_1

Graphically:

r_3 ------------------------------> B_3 ---> z_tilde_3
                                              |
                                              v
r_2 - A_23 z_tilde_3 ------------> B_2 ---> z_tilde_2
                                              |
                                              v
r_1 - A_12 z_tilde_2 - A_13 z_tilde_3 -> B_1 ---> z_tilde_1

This resembles an upper triangular block solve:

[ A_11  A_12  A_13 ]
[ 0     A_22  A_23 ]
[ 0     0     A_33 ]

The symmetric multiplicative composition combines the information flow from both directions. The forward sweep accounts for lower-block couplings such as A_21, A_31, and A_32; the backward sweep accounts for upper-block couplings such as A_12, A_13, and A_23. This is more expensive than additive composition because fields are no longer independent, but it can be much stronger when coupling terms dominate convergence.

Schur-Style Composition

Schur-complement preconditioning starts from the two-field coupled system:

    [ A_11  A_12 ]
A = [ A_21  A_22 ]

and:

    [ x_1 ]        [ r_1 ]
x = [ x_2 ],  r = [ r_2 ]

The block system A x = r expands to:

A_11 x_1 + A_12 x_2 = r_1
A_21 x_1 + A_22 x_2 = r_2

To eliminate field 1, solve the first equation for x_1:

A_11 x_1 = r_1 - A_12 x_2
x_1      = A_11^{-1} (r_1 - A_12 x_2)

Insert this expression into the second equation:

A_21 A_11^{-1} (r_1 - A_12 x_2) + A_22 x_2 = r_2

Rearranging gives:

(A_22 - A_21 A_11^{-1} A_12) x_2 = r_2 - A_21 A_11^{-1} r_1

The matrix:

S = A_22 - A_21 A_11^{-1} A_12

is the Schur complement.

This is important because the original coupled system has been reduced to three conceptual steps:

solve a field-1 problem
solve a field-2 Schur problem
recover or correct field 1

That pattern is the basis of many modern multiphysics and saddle-point preconditioners.

The exact block LU factorization is:

    [ I              0 ] [ A_11  0 ] [ I  A_11^{-1} A_12 ]
A = [ A_21 A_11^{-1} I ] [ 0     S ] [ 0  I              ]

The diagonal blocks in this factorization are A_11 and S; the remaining factors are triangular. This immediately suggests a preconditioner based on applying approximate triangular solves and approximate inverses for the diagonal factors.
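The factorization can be checked by multiplying the factors out. The labels L, D, and U for the three factors are notation introduced here for the check, and the last step only uses the definition of S.

```latex
\[
L = \begin{bmatrix} I & 0 \\ A_{21} A_{11}^{-1} & I \end{bmatrix},\quad
D = \begin{bmatrix} A_{11} & 0 \\ 0 & S \end{bmatrix},\quad
U = \begin{bmatrix} I & A_{11}^{-1} A_{12} \\ 0 & I \end{bmatrix}
\]
\[
L D = \begin{bmatrix} A_{11} & 0 \\ A_{21} & S \end{bmatrix},
\qquad
(L D)\,U
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{21} A_{11}^{-1} A_{12} + S \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\]
```

The lower-right entry collapses to A_22 because S = A_22 - A_21 A_11^{-1} A_12.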

Writing the three factors as A = L D U, the exact inverse action is:

A^{-1} = U^{-1} D^{-1} L^{-1}

D^{-1} = [ A_11^{-1}  0      ]
         [ 0          S^{-1} ]

In practice, nobody wants to explicitly compute the exact Schur complement for large sparse problems. The problem is A_11^{-1}. Even if A_11, A_12, and A_21 are sparse, A_11^{-1} is generally dense, and therefore:

A_21 A_11^{-1} A_12

can be dense or nearly dense. Assembling S would destroy sparsity and can make storage and computation infeasible for large systems.

The nested preconditioner therefore uses approximations:

B_1 ~= A_11^{-1}
B_2 ~= A_22^{-1}  or  B_2 ~= S^{-1}

where B_1 and B_2 are field solves provided by psb_d_nested_field_solve. A field solve may be a direct block preconditioner application, or it may use the independent per-field inner Krylov context if INNER_SOLVE is enabled.

The implementation provides three Schur-style compositions.

Lower Schur Variant

The lower Schur variant computes:

z_1 = B_1 r_1
z_2 = B_2 (r_2 - A_21 z_1)

Substituting the first equation into the second gives:

z_2 = B_2 (r_2 - A_21 B_1 r_1)

This is the approximate forward solve associated with the lower triangular block factor.

r_1 --> B_1 --> z_1
                  |
                  v
              A_21 z_1
                  |
                  v
r_2 - A_21 z_1 --> B_2 --> z_2

Upper Schur Variant

The upper Schur variant reverses the ordering:

z_2 = B_2 r_2
z_1 = B_1 (r_1 - A_12 z_2)

This is the approximate backward solve associated with the upper triangular block factor:

r_2 --> B_2 --> z_2
                  |
                  v
              A_12 z_2
                  |
                  v
r_1 - A_12 z_2 --> B_1 --> z_1

Full Schur Variant

The full Schur variant applies the lower-style solve and then corrects field 1 with the upper coupling:

z_1 = B_1 r_1
z_2 = B_2 (r_2 - A_21 z_1)
z_1 = z_1 - B_1 A_12 z_2

This approximates the full U^{-1} D^{-1} L^{-1} block-LU inverse action. It is generally stronger than purely additive block-Jacobi or a single directional block sweep because it accounts for both off-diagonal couplings.
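The three variants can be compared on a tiny two-field system. The following is a dense sketch in plain Python under stated assumptions: the block values are made up, `b1_exact`, `b2_schur`, and `b2_a22` are hypothetical field solves, and field 2 has a single unknown so that the exact Schur complement is a scalar that is cheap to form. None of these names belong to PSBLAS.

```python
from fractions import Fraction as F

# Hypothetical blocks: field 1 has two unknowns, field 2 has one.
A11 = [[F(4), F(1)], [F(1), F(3)]]
A12 = [[F(1)], [F(2)]]
A21 = [[F(2), F(1)]]
A22 = [[F(5)]]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def sub(a, b):
    return [u - v for u, v in zip(a, b)]

def b1_exact(r1):
    """Exact A_11^{-1} r_1 for the 2x2 block via Cramer's rule."""
    (a, b), (c, d) = A11
    det = a * d - b * c
    return [(d * r1[0] - b * r1[1]) / det, (a * r1[1] - c * r1[0]) / det]

# Scalar Schur complement S = A_22 - A_21 A_11^{-1} A_12.
S = A22[0][0] - matvec(A21, b1_exact([A12[0][0], A12[1][0]]))[0]

def b2_schur(r2):
    """B_2 ~= S^{-1} (exact in this toy case)."""
    return [r2[0] / S]

def b2_a22(r2):
    """B_2 ~= A_22^{-1}, the cheaper approximation."""
    return [r2[0] / A22[0][0]]

def schur_lower(r1, r2, B1, B2):
    z1 = B1(r1)
    z2 = B2(sub(r2, matvec(A21, z1)))
    return z1, z2

def schur_upper(r1, r2, B1, B2):
    z2 = B2(r2)
    z1 = B1(sub(r1, matvec(A12, z2)))
    return z1, z2

def schur_full(r1, r2, B1, B2):
    z1, z2 = schur_lower(r1, r2, B1, B2)
    z1 = sub(z1, B1(matvec(A12, z2)))  # upper coupling correction
    return z1, z2

# Right-hand side built from the known solution x_1 = (1, 2), x_2 = (3).
r1, r2 = [F(9), F(13)], [F(19)]
print(schur_full(r1, r2, b1_exact, b2_schur))
print(schur_full(r1, r2, b1_exact, b2_a22))
```

With exact field solves the full variant recovers the solution of the coupled system. Swapping in the A_22-based B_2 makes the result approximate, which is the situation the outer Krylov method is expected to correct.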

Matrix-Free Schur Action

The most important implementation detail is that the code can apply a Schur action without assembling S.

Given a field-2 vector x, the matrix-free Schur action computes:

y = S x

through the sequence:

t = A_12 x
u = B_1 t
v = A_21 u
y = A_22 x - v

or equivalently, with B_1 standing in for A_11^{-1}:

y = A_22 x - A_21 B_1 A_12 x

Graphically:

x
|
v
A_12
|
v
B_1
|
v
A_21
|
v
subtract from A_22 x
|
v
y

No Schur matrix is ever stored. The code only needs block products and field solves.
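The four-step sequence can be written as a short function. This is a dense sketch in plain Python with made-up 2x2 blocks; `b1_diag` is a hypothetical diagonal-scaling field solve standing in for B_1, and `matvec` stands in for the nested block products.

```python
from fractions import Fraction as F

# Hypothetical 2x2 blocks of a two-field system.
A11 = [[F(4), F(1)], [F(1), F(2)]]
A12 = [[F(1), F(0)], [F(0), F(1)]]
A21 = [[F(2), F(0)], [F(1), F(1)]]
A22 = [[F(5), F(1)], [F(1), F(4)]]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def b1_diag(t):
    """B_1 as scaling by diag(A_11): an approximation to A_11^{-1}."""
    return [t[i] / A11[i][i] for i in range(len(t))]

def schur_action(x, B1=b1_diag):
    """y = A_22 x - A_21 B_1 A_12 x, using only block products and one field solve."""
    t = matvec(A12, x)
    u = B1(t)
    v = matvec(A21, u)
    return [a - b for a, b in zip(matvec(A22, x), v)]

# Applying the action to unit vectors reveals the columns of the implied operator.
print(schur_action([F(1), F(0)]))
print(schur_action([F(0), F(1)]))
```

Probing with unit vectors is done here only to inspect the operator; an iterative solve would call `schur_action` on its current iterate and never form these columns.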

Richardson Schur Solve

When SCHUR_SOLVE = MATRIX_FREE, the preconditioner needs an approximation to S^{-1} even though it only has a routine for S x. The implementation therefore uses a small preconditioned Richardson iteration:

z_{k+1} = z_k + M^{-1} (b - S z_k)

where M^{-1} is the field-2 solve, implemented through psb_d_nested_field_solve and usually acting like B_2.

Each Richardson step uses:

the matrix-free Schur product S z_k
a Schur residual b - S z_k
the field-2 preconditioner or field-2 inner solve to correct z_k

This gives a cheap inner solve for the Schur complement without assembling or storing S.
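A minimal sketch of the preconditioned Richardson loop in plain Python, using made-up blocks. The stopping test on the relative residual 2-norm is an assumption for illustration; the source only states that the iteration is controlled by SCHUR_MAXIT and SCHUR_TOL. `field2_solve` is a hypothetical diagonal-scaling stand-in for the field-2 solve.

```python
import math

# Hypothetical blocks; A_11 enters only through its diagonal (B_1 = diagonal scaling).
A11_DIAG = [4.0, 2.0]
A12 = [[1.0, 0.0], [0.0, 1.0]]
A21 = [[2.0, 0.0], [1.0, 1.0]]
A22 = [[5.0, 1.0], [1.0, 4.0]]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def norm2(v):
    return math.sqrt(sum(c * c for c in v))

def schur_action(x):
    """Matrix-free y = A_22 x - A_21 B_1 A_12 x."""
    u = [t / d for t, d in zip(matvec(A12, x), A11_DIAG)]
    return [a - b for a, b in zip(matvec(A22, x), matvec(A21, u))]

def field2_solve(res):
    """M^{-1}: scaling by diag(A_22), standing in for the field-2 solve."""
    return [res[i] / A22[i][i] for i in range(len(res))]

def schur_richardson(b, maxit=4, tol=0.0):
    """z_{k+1} = z_k + M^{-1} (b - S z_k), starting from z_0 = 0."""
    z = [0.0] * len(b)
    bnorm = norm2(b) or 1.0
    steps = 0
    while steps < maxit:
        res = [bi - si for bi, si in zip(b, schur_action(z))]
        if norm2(res) <= tol * bnorm:  # assumed relative-residual test
            break
        z = [zi + di for zi, di in zip(z, field2_solve(res))]
        steps += 1
    return z, steps

b = schur_action([1.0, 1.0])  # right-hand side with known solution (1, 1)
z4, n4 = schur_richardson(b)  # default-like setting: 4 steps, tol 0.0
z_conv, n_conv = schur_richardson(b, maxit=60, tol=1e-12)
print(z4, n4)
print(z_conv, n_conv)
```

With a zero tolerance the loop simply runs for the full iteration count, so a small SCHUR_MAXIT acts as a fixed-cost approximate Schur solve.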

To summarize, Schur actions are applied without an assembled Schur complement, using:

diagonal field solves through psb_d_nested_field_solve
off-diagonal products through psb_d_nest_apply_block
global/field work vectors managed by the nested matrix helpers

Supported Schur compositions are:

SCHUR_LOWER
SCHUR_UPPER
SCHUR_FULL

The Schur solve mode is controlled by SCHUR_SOLVE:

A22: use the field-2 block preconditioner as the Schur approximation
MATRIX_FREE: use a small Richardson iteration around the matrix-free Schur action

The MATRIX_FREE mode is controlled by SCHUR_MAXIT and SCHUR_TOL.

Implementation Structure

The core implementation lives in prec/psb_d_nestedprec.f90, which defines the psb_d_nestedprec module, two helper types, and the concrete preconditioner type:

type :: psb_d_nested_block_prec
  character(len=16) :: ptype
  class(psb_d_base_prec_type), allocatable :: pc
end type psb_d_nested_block_prec

type :: psb_d_nested_krylov_context
  logical :: enabled
  character(len=16) :: method
  integer(psb_ipk_) :: itmax
  integer(psb_ipk_) :: itrace
  integer(psb_ipk_) :: istop
  real(psb_dpk_) :: tol
end type psb_d_nested_krylov_context

type, extends(psb_d_base_prec_type) :: psb_d_nested_prec_type
  integer(psb_ipk_) :: composition
  character(len=32) :: composition_name
  character(len=16) :: default_block_ptype
  integer(psb_ipk_) :: schur_solve
  character(len=32) :: schur_solve_name
  integer(psb_ipk_) :: schur_maxit
  real(psb_dpk_) :: schur_tol
  type(psb_d_nested_krylov_context) :: default_krylov
  integer(psb_ipk_) :: nfields
  type(psb_d_nest_base_mat), pointer :: nest_op
  type(psb_d_nested_block_prec), allocatable :: blocks(:)
  character(len=16), allocatable :: field_block_ptype(:)
  type(psb_d_nested_krylov_context), allocatable :: field_krylov(:)
  type(psb_d_nested_iopt), allocatable :: field_iopts(:)
  type(psb_d_nested_ropt), allocatable :: field_ropts(:)
  type(psb_d_nested_copt), allocatable :: field_copts(:)
contains
  procedure, pass(prec) :: d_apply_v => psb_d_nested_apply_vect
  procedure, pass(prec) :: d_apply   => psb_d_nested_apply
  procedure, pass(prec) :: precbld   => psb_d_nested_precbld
  procedure, pass(prec) :: precinit  => psb_d_nested_precinit
  procedure, pass(prec) :: precseti  => psb_d_nested_precseti
  procedure, pass(prec) :: precsetr  => psb_d_nested_precsetr
  procedure, pass(prec) :: precsetc  => psb_d_nested_precsetc
  procedure, pass(prec) :: precdescr => psb_d_nested_precdescr
  procedure, pass(prec) :: dump      => psb_d_nested_dump
  procedure, pass(prec) :: clone     => psb_d_nested_clone
  procedure, pass(prec) :: free      => psb_d_nested_precfree
  procedure, pass(prec) :: sizeof    => psb_d_nested_sizeof
  procedure, pass(prec) :: get_nzeros => psb_d_nested_get_nzeros
end type psb_d_nested_prec_type

The defaults set by psb_d_nested_precinit are:

composition: ADDITIVE
default block solve: DIAG
Schur solve: A22
Schur maximum iterations: 4
Schur tolerance: 0.0
default inner Krylov disabled
default inner Krylov method: CG
default inner Krylov maximum iterations: 20
default inner Krylov trace: -1
default inner Krylov stopping rule: 2
default inner Krylov tolerance: 1.0e-6

The preconditioner stores:

the selected field-split composition
the default block preconditioner type
optional per-field block preconditioner types
Schur solve configuration
default and per-field inner Krylov configuration
pending per-field sub-preconditioner options
one built block preconditioner per field
a pointer to the nested matrix backend

The preconditioner does not own the nested matrix. It stores a pointer to the nested operator exposed by the global sparse matrix wrapper.

Build Phase

The build phase is implemented by psb_d_nested_precbld.

At build time, the preconditioner:

releases previously built field preconditioners while keeping user configuration
verifies that the input sparse matrix is backed by a nested base matrix
stores a pointer to the nested operator
queries the number of fields
allocates one psb_d_nested_block_prec entry per field
obtains each diagonal block A_ii
obtains each field descriptor
allocates the selected PSBLAS block preconditioner type for that field
replays routed per-field options onto the block preconditioner
builds the field block preconditioner on A_ii

The build path deliberately uses existing PSBLAS preconditioner implementations. The nested preconditioner is a composition layer over ordinary field preconditioners, not a replacement for ILU, diagonal, approximate inverse, or block-Jacobi preconditioners.

Field Block Preconditioners

The nested block solve type is configured through BLOCK_SOLVE or NEST_BLOCK_SOLVE.

Supported block preconditioner type names currently include:

DIAG
BJAC
NONE

The implementation allocates the corresponding concrete PSBLAS preconditioner for each diagonal field block. Passing idx to prec%set overrides the block type for a single field:

call prec%set('BLOCK_SOLVE', 'BJAC', info, idx=1)
call prec%set('BLOCK_SOLVE', 'DIAG', info, idx=2)

Field-level sub-preconditioner tuning is also routed through psb_d_prec_type_impl.f90. For nested preconditioners, options such as SUB_SOLVE, SUB_FILLIN, SUB_ILUTHRS, INV_FILLIN, and INV_THRESH are stored on the nested preconditioner and replayed onto the selected field block preconditioners during build.

This matters because the field block preconditioners do not exist until precbld. The routed option storage lets users configure fields before the nested object has been built.
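The store-then-replay pattern can be sketched in a few lines of plain Python. Everything here is hypothetical: the class names, the `pending` list, and the rule that an option without idx applies to every field while an option with idx applies to that field only are assumptions chosen for illustration, not a description of the Fortran data structures.

```python
class BlockPrec:
    """Stand-in for a field block preconditioner that exists only after build."""
    def __init__(self, ptype):
        self.ptype = ptype
        self.options = {}

    def set(self, name, value):
        self.options[name] = value


class NestedPrec:
    """Stand-in for the nested preconditioner's option routing."""
    def __init__(self, default_ptype="DIAG"):
        self.default_ptype = default_ptype
        self.field_ptype = {}  # idx -> block type override
        self.pending = []      # (name, value, idx or None), kept in call order
        self.blocks = []       # empty until build()

    def set(self, name, value, idx=None):
        if name == "BLOCK_SOLVE":
            if idx is None:
                self.default_ptype = value
            else:
                self.field_ptype[idx] = value
        else:
            # No block preconditioner exists yet, so remember the request.
            self.pending.append((name, value, idx))

    def build(self, nfields):
        # Rebuild the blocks from scratch; user configuration is kept.
        self.blocks = []
        for i in range(1, nfields + 1):  # 1-based, like idx
            blk = BlockPrec(self.field_ptype.get(i, self.default_ptype))
            for name, value, idx in self.pending:
                if idx is None or idx == i:
                    blk.set(name, value)  # replay in original order
            self.blocks.append(blk)


prec = NestedPrec()
prec.set("BLOCK_SOLVE", "BJAC", idx=1)
prec.set("SUB_FILLIN", 1)          # all fields
prec.set("SUB_FILLIN", 2, idx=1)   # field 1 only, replayed later so it wins
prec.build(nfields=2)
print([(b.ptype, b.options) for b in prec.blocks])
```

Replaying in call order means a later field-specific setting overrides an earlier global one in this sketch; whether the library resolves conflicts the same way is not stated in the source.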

Independent Per-Field Inner Krylov Contexts

The implementation includes independent per-field inner Krylov solve contexts:

type(psb_d_nested_krylov_context) :: default_krylov
type(psb_d_nested_krylov_context), allocatable :: field_krylov(:)

The default context applies to all fields when idx is not supplied. Field-specific contexts are allocated and updated when idx is supplied.

Supported inner methods are:

CG
BICGSTAB
NONE, which disables the inner Krylov path for that field

The inner Krylov options are:

INNER_SOLVE, also accepted as KRYLOV_SOLVE or FIELD_SOLVE
INNER_MAXIT, also accepted as INNER_ITMAX, KRYLOV_MAXIT, or KRYLOV_ITMAX
INNER_TOL, also accepted as KRYLOV_TOL
INNER_ITRACE, also accepted as KRYLOV_ITRACE
INNER_ISTOP, also accepted as KRYLOV_ISTOP

Example:

call prec%set('INNER_SOLVE', 'CG', info, idx=1)
call prec%set('INNER_TOL', 1.0d-8, info, idx=1)
call prec%set('INNER_MAXIT', 50, info, idx=1)

call prec%set('INNER_SOLVE', 'BICGSTAB', info, idx=2)
call prec%set('INNER_TOL', 1.0d-6, info, idx=2)
call prec%set('INNER_MAXIT', 30, info, idx=2)

When a field has inner Krylov enabled, psb_d_nested_field_solve dispatches to:

psb_d_nested_inner_cg
psb_d_nested_inner_bicgstab

Those routines apply the field diagonal matrix through psb_d_nested_field_matvec and use the field block preconditioner as the inner preconditioner. When inner Krylov is disabled, psb_d_nested_field_solve directly applies the field block preconditioner.
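The structure of such an inner solve can be sketched as a textbook preconditioned CG in plain Python. This is not the code of psb_d_nested_inner_cg: the matrix, the `field_matvec` and `block_prec` callables, and the relative-residual stopping test are assumptions for illustration, and the defaults mirror the documented inner Krylov defaults (20 iterations, tolerance 1.0e-6).

```python
import math

# Hypothetical SPD diagonal block A_ii for one field (1D Laplacian stencil).
A_II = [[2.0, -1.0, 0.0],
        [-1.0, 2.0, -1.0],
        [0.0, -1.0, 2.0]]

def field_matvec(x):
    """Stand-in for applying the field diagonal block A_ii."""
    return [sum(a * v for a, v in zip(row, x)) for row in A_II]

def block_prec(r):
    """Stand-in for the field block preconditioner (diagonal scaling)."""
    return [r[i] / A_II[i][i] for i in range(len(r))]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def inner_cg(b, itmax=20, tol=1e-6):
    """Preconditioned CG from a zero initial guess; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = block_prec(r)
    p = list(z)
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b)) or 1.0
    its = 0
    while its < itmax and math.sqrt(dot(r, r)) > tol * bnorm:
        ap = field_matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        z = block_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
        its += 1
    return x, its

x, its = inner_cg([0.0, 0.0, 4.0], tol=1e-10)  # exact solution is (1, 2, 3)
print(x, its)
```

Because the loop takes the matrix and the preconditioner only as callables, it needs nothing from a higher-level solver layer, which mirrors the dependency argument made for the local inner kernels.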

The implementation does not call the top-level psb_krylov routine for inner solves. That choice avoids a build dependency cycle: the Krylov solvers live above the preconditioner layer, while prec must compile without depending on linsolve. The result is a PSBLAS-local inner Krylov implementation that provides independent per-field solve contexts without conflicting with the existing framework.

Apply Phase

The apply phase is implemented by:

psb_d_nested_apply_vect for PSBLAS vector objects
psb_d_nested_apply for raw double-precision arrays

The vector-object path copies into raw array buffers, calls the array path, and copies back. The array path handles:

rejection of transposed application requests
alpha/beta scaling
work-vector management
dispatch to the selected composition

The selected composition is then applied by one of:

psb_d_nested_add_apply
psb_d_nested_sweep
psb_d_nested_apply_schur

All paths eventually solve field systems through psb_d_nested_field_solve, so per-field block-preconditioner settings and per-field inner Krylov settings are shared by additive, multiplicative, symmetric, and Schur-style compositions.

Additive Apply Path

psb_d_nested_add_apply performs the following for each field:

restrict the global residual to the field vector
solve the field problem through psb_d_nested_field_solve
prolong the field correction into the global output vector

Because the additive path does not use off-diagonal blocks, all field solves are independent at the mathematical level.

Multiplicative Apply Path

psb_d_nested_sweep implements both forward and backward field sweeps. For each field:

restrict the original residual to the current field
subtract the effect of already computed field corrections through off-diagonal block products
solve the corrected field residual
prolong the correction into global work storage

The same routine is used for:

MULTIPLICATIVE: one forward sweep
SYMMETRIC_MULTIPLICATIVE: a forward sweep followed by a backward sweep

The sweep implementation uses field descriptors and global work vectors so that off-diagonal products see the correct local and halo data.

Schur Apply Path

psb_d_nested_apply_schur is defined for exactly two fields. It implements lower, upper, and full Schur-style block factorizations using field solves and block products.

Supporting routines are:

psb_d_nested_schur_action
psb_d_nested_schur_solve

psb_d_nested_schur_action applies the matrix-free Schur operator:

S x_2 = A_22 x_2 - A_21 B_1 A_12 x_2

where B_1 is the field-1 block solve or inner field solve. psb_d_nested_schur_solve either applies the field-2 block solve directly (A22) or runs a short preconditioned Richardson iteration using the matrix-free Schur action (MATRIX_FREE).

Public Options

The nested preconditioner exposes integer option keys in psb_d_nestedprec:

integer(psb_ipk_), parameter :: psb_d_nested_composition_ = 9101
integer(psb_ipk_), parameter :: psb_d_nested_block_solve_ = 9102
integer(psb_ipk_), parameter :: psb_d_nested_schur_solve_ = 9103
integer(psb_ipk_), parameter :: psb_d_nested_schur_maxit_ = 9104
integer(psb_ipk_), parameter :: psb_d_nested_schur_tol_ = 9105
integer(psb_ipk_), parameter :: psb_d_nested_inner_solve_ = 9106
integer(psb_ipk_), parameter :: psb_d_nested_inner_maxit_ = 9107
integer(psb_ipk_), parameter :: psb_d_nested_inner_tol_ = 9108
integer(psb_ipk_), parameter :: psb_d_nested_inner_itrace_ = 9109
integer(psb_ipk_), parameter :: psb_d_nested_inner_istop_ = 9110

The user-facing string options routed by psb_d_prec_type_impl.f90 are:

| Option | Type | Values | Scope |
| --- | --- | --- | --- |
| `COMPOSITION`, `NEST_COMPOSITION` | character | `ADDITIVE`, `DIAGONAL`, `MULTIPLICATIVE`, `SYMMETRIC_MULTIPLICATIVE`, `SCHUR_LOWER`, `SCHUR_UPPER`, `SCHUR_FULL` | global |
| `BLOCK_SOLVE`, `NEST_BLOCK_SOLVE` | character | `DIAG`, `BJAC`, `NONE` | global or `idx` field |
| `SCHUR_SOLVE`, `NEST_SCHUR_SOLVE` | character | `A22`, `MATRIX_FREE` | global |
| `SCHUR_MAXIT`, `NEST_SCHUR_MAXIT` | integer | nonnegative iteration count | global |
| `SCHUR_TOL`, `NEST_SCHUR_TOL` | real | nonnegative tolerance | global |
| `INNER_SOLVE`, `KRYLOV_SOLVE`, `FIELD_SOLVE` | character | `NONE`, `CG`, `BICGSTAB` | global or `idx` field |
| `INNER_MAXIT`, `INNER_ITMAX`, `KRYLOV_MAXIT`, `KRYLOV_ITMAX` | integer | positive iteration count | global or `idx` field |
| `INNER_TOL`, `KRYLOV_TOL` | real | nonnegative tolerance | global or `idx` field |
| `INNER_ITRACE`, `KRYLOV_ITRACE` | integer | trace level | global or `idx` field |
| `INNER_ISTOP`, `KRYLOV_ISTOP` | integer | stopping rule code | global or `idx` field |
| `SUB_SOLVE` | character | routed field sub-preconditioner selection | global or `idx` field |
| `SUB_FILLIN` | integer | routed ILU fill setting | global or `idx` field |
| `SUB_ILUTHRS` | real | routed ILUT threshold | global or `idx` field |
| `INV_FILLIN` | integer | routed inverse fill setting | global or `idx` field |
| `INV_THRESH` | real | routed inverse threshold | global or `idx` field |

Example Usage

A typical nested preconditioner setup looks like:

call prec%init('NEST', info)
call prec%set('COMPOSITION', 'SCHUR_FULL', info)
call prec%set('SCHUR_SOLVE', 'MATRIX_FREE', info)
call prec%set('SCHUR_MAXIT', 6, info)
call prec%set('SCHUR_TOL', 1.0d-8, info)

call prec%set('BLOCK_SOLVE', 'BJAC', info, idx=1)
call prec%set('BLOCK_SOLVE', 'DIAG', info, idx=2)

call prec%set('INNER_SOLVE', 'CG', info, idx=1)
call prec%set('INNER_MAXIT', 40, info, idx=1)
call prec%set('INNER_TOL', 1.0d-8, info, idx=1)

call prec%build(nested_matrix%a_glob, nested_matrix%desc_glob, info)
call psb_krylov('BICGSTAB', nested_matrix%a_glob, prec, b, x, eps, &
     & nested_matrix%desc_glob, info)

This keeps the outer solve in the existing PSBLAS API while allowing field-specific behavior inside the nested preconditioner.

Factory And API Integration

psb_dprecinit recognizes the nested preconditioner name:

call prec%init('NEST', info)

and allocates:

psb_d_nested_prec_type

The generic prec%set wrappers in prec/impl/psb_d_prec_type_impl.f90 route nested-specific options only when the underlying concrete preconditioner is psb_d_nested_prec_type. For non-nested preconditioners, the existing behavior is preserved where appropriate.

This means nested-specific settings do not change the behavior of ordinary PSBLAS preconditioners.

Parallel Layout Considerations

Parallel correctness depends on using PSBLAS descriptors and nested helper routines consistently.

Each field has:

its own descriptor
local owned rows
possible halo entries needed by block products
mappings between field-local vectors and global nested vectors

The preconditioner uses:

psb_d_nest_restrict_field to extract field data from global vectors
psb_d_nest_restrict_field_local where only local field data are needed
psb_d_nest_prolong_field to insert field corrections into global vectors
psb_d_nest_apply_block for diagonal and off-diagonal block products

Multiplicative and Schur compositions require more care than additive composition because they use off-diagonal products. Intermediate field corrections are prolonged into global work storage before being consumed by later block products. This ensures that distributed halo data are consistent with the nested matrix layout.

Validation

The repository contains nested preconditioner tests under:

test/nested

The main nested preconditioner drivers are:

test/nested/psb_d_nest_prec_cg_test.F90
test/nested/psb_d_nest_mult_prec_cg_test.F90
test/nested/psb_d_nest_schur_prec_cg_test.F90

These tests exercise:

nested matrix construction
nested preconditioner initialization
additive/diagonal composition
multiplicative and symmetric multiplicative composition
Schur lower, upper, and full composition
matrix-free Schur solve mode
per-field inner Krylov configuration and apply through the Schur preconditioner
serial and MPI execution paths in the nested examples

Validation of the nested preconditioner examples showed convergence for the additive, multiplicative, and Schur-style configurations on the nested Laplacian test problems.

The Schur nested preconditioner test includes an additional SCHUR_FULL / A22+INNER case. That case configures independent per-field inner Krylov contexts, builds the nested preconditioner, applies it directly to the right-hand side, and verifies a successful finite preconditioner application. The existing Schur cases continue to check outer BiCGSTAB convergence.

The serial and two-rank runs both passed after rebuilding the nested preconditioner object, the factory object, the preconditioner archive, and the test executable.

Current Limitations

The current implementation has the following limitations:

double precision only
nested preconditioner name currently exposed as NEST
transposed preconditioner application is rejected
Schur-style compositions are implemented for exactly two fields
matrix-free Schur solve currently uses a small Richardson iteration
inner Krylov methods currently include CG and BICGSTAB

Future Work

Recommended next steps are:

add single precision, complex, and double complex variants

Summary

The nested preconditioner adds a field-split preconditioning layer to PSBLAS while preserving the existing solver and preconditioner API. It reuses standard PSBLAS block preconditioners on diagonal field blocks and combines them through additive, multiplicative, symmetric multiplicative, and Schur-style compositions.

The implementation integrates with:

nested matrix storage and field mapping
the psb_dprec_type factory
the generic prec%set configuration path
the ordinary PSBLAS preconditioner build/apply lifecycle
the existing outer psb_krylov solver interface

The newer per-field inner Krylov support gives each field an independent local solve context, configured through the same PSBLAS preconditioner option API. It does not conflict with the existing framework because it is implemented inside the preconditioner layer and avoids adding a dependency from prec back to the higher-level Krylov solver layer.
