Complete the integration of the nested (MATNEST) operator into the standard
PSBLAS infrastructure:
- Preconditioners: implement get_diag and csgetrow on psb_d_nest_base_mat so
the stock one-level preconditioners build directly on the nested operator
(DIAG through the concatenated block diagonals, BJAC through the
format-agnostic csget path used by the ILU factorizations).
- Configurable block storage: psb_d_nest_rect_block and psb_d_nest_matrix%asb
accept an optional type ('CSR' default, 'CSC', 'COO') or mold (any class
extending psb_d_base_sparse_mat, e.g. the psb_ext ELL/HLL formats); the
operator is format-agnostic since every operation delegates to the blocks.
- Device-capable matvec: override vect_mv to gather/scatter through the
vectors' own gth/sct with encapsulated index vectors (device kernels on
device vectors) and to run each block through its vect_mv, so device block
formats execute their native kernels; bit-equivalent to csmv on host.
- Full psb_d_base_sparse_mat contract by delegation to the blocks: transposed
csmv (dedicated kernel, ghost contributions left to the transposed halo
exchange), multi-RHS csmm, cp_to_coo/mv_to_coo (unlocking cscnv, csclip,
tril/triu through the base generics), rowsum/arwsum/colsum/aclsum,
maxval/spnmi/spnm1, scal (left/right) and scals, clone (view semantics:
shared blocks, re-owned index maps), mold, sizeof. cp_from_coo/mv_from_coo,
csput and cssv/cssm are intentionally left to raise the base-class error (they
are meaningless for a block-operator view); this is documented in the type and
in the README.
Tests: the glob test assembles its blocks in HLL (psb_ext) and the rect test in CSC, both still
bit-identical to the monolithic CSR oracle; the CG test solves under NONE,
DIAG and BJAC/ILU(0), requiring convergence to the exact solution for all of
them and DIAG bit-identical to NONE (exactness check of the nested get_diag).
README updated with the user API reference, the preconditioner section and
the implemented-contract section.
Author: Simone Staccone (Stack-1)
* **Field** — a contiguous index space (e.g. velocity `V` and pressure `Q` in a saddle-point problem). Each field has its own `psb_desc_type` distribution.
* **Block (i,j)** — the sub-matrix coupling field `i` (rows) with field `j` (columns). It may be rectangular (`|field i| /= |field j|`) and may be absent.
* **Block (i,j)** — the sub-matrix coupling field `i` (rows) with field `j` (columns). It may be rectangular (different field sizes) and may be absent.
* **Global operator** — the blocks are concatenated into a single **square** operator `M` of size `sum(field_sizes)`, distributed over one **composed global descriptor** with a **union halo** (one halo exchange per matrix-vector product, covering all blocks of a given column field at once).
* **Rectangular blocks** — PSBLAS does not support rectangular *distributed* matrices, but it does support rectangular *local* CSR/COO matrices. The rectangular product therefore happens only in the **local** block `csmv`; the only object carrying a descriptor (and hence communication) is the global operator, which is always square.
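
The local rectangular product described in the last bullet can be sketched in plain Fortran, without any PSBLAS call. The sizes, values and variable names below are made up for illustration: a 4x2 CSR block couples the rows of field `V` with the columns of field `Q`, reads the `Q` slice of the concatenated global vector and accumulates into the `V` slice of the result.

```fortran
program rect_block_matvec
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical local sizes: field V has 4 entries, field Q has 2.
  integer, parameter :: nv = 4, nq = 2
  ! CSR storage of the rectangular 4x2 block B(V,Q):
  !   [ 1 0 ]
  !   [ 0 2 ]
  !   [ 3 0 ]
  !   [ 0 4 ]
  integer,  parameter :: irp(nv+1) = [1, 2, 3, 4, 5]   ! row pointers
  integer,  parameter :: ja(4)     = [1, 2, 1, 2]      ! column indices (local to Q)
  real(dp), parameter :: val(4)    = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
  ! Concatenated vectors of the square global operator: [ V part ; Q part ].
  real(dp) :: x(nv+nq), y(nv+nq)
  integer  :: i, k

  x = [0.0_dp, 0.0_dp, 0.0_dp, 0.0_dp, 10.0_dp, 100.0_dp]
  y = 0.0_dp

  ! y_V = y_V + B * x_Q : rows index the V slice, columns the Q slice.
  do i = 1, nv
    do k = irp(i), irp(i+1) - 1
      y(i) = y(i) + val(k) * x(nv + ja(k))
    end do
  end do

  print '(6f8.1)', y
end program rect_block_matvec
```

The block itself is rectangular, but it only ever acts on slices of vectors whose length is `nv + nq`, which is why the operator seen by the descriptor stays square.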
The global operator (`a_glob`) and global descriptor (`desc_glob`) can be passed unchanged to `psb_spmm`, `psb_krylov`, and the standard preconditioners.
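
As a hedged sketch of what "passed unchanged" looks like in practice, the routine below builds a stock preconditioner on `a_glob` and calls `psb_krylov` with `desc_glob`. The routine name `solve_nested`, the tolerance and the iteration cap are illustrative assumptions; the module name `psb_krylov_mod` and the `psb_ctxt_type` context type depend on the PSBLAS version in use. The vectors are assumed to be already allocated and assembled on `desc_glob`, and the operator is assumed SPD so that CG applies.

```fortran
subroutine solve_nested(ctxt, nested_matrix, b, x, info)
  use psb_base_mod
  use psb_prec_mod
  use psb_krylov_mod        ! assumption: name may differ across PSBLAS versions
  use psb_d_nest_mod
  implicit none
  type(psb_ctxt_type), intent(in)        :: ctxt
  type(psb_d_nest_matrix), intent(inout) :: nested_matrix  ! already assembled
  type(psb_d_vect_type), intent(inout)   :: b, x           ! live on desc_glob
  integer(psb_ipk_), intent(out)         :: info

  type(psb_dprec_type) :: prec
  real(psb_dpk_)       :: eps, err
  integer(psb_ipk_)    :: iter, info_free

  eps = 1.0e-9_psb_dpk_     ! illustrative tolerance

  ! Build a one-level preconditioner directly on the nested operator.
  call prec%init(ctxt, 'BJAC', info)
  if (info /= psb_success_) return
  call prec%build(nested_matrix%a_glob, nested_matrix%desc_glob, info)
  if (info /= psb_success_) return

  ! Solve with the global operator and the composed global descriptor.
  call psb_krylov('CG', nested_matrix%a_glob, prec, b, x, eps, &
       & nested_matrix%desc_glob, info, itmax=500, iter=iter, err=err)

  ! Release the preconditioner without overwriting the solver status.
  call prec%free(info_free)
end subroutine solve_nested
```

For an indefinite saddle-point operator the method string would have to be changed to a solver that does not require SPD.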
## 2. Recommended API: `psb_d_nest_matrix`
## 2. Quick start: `psb_d_nest_matrix`
The easy way to build a nested matrix is the `psb_d_nest_matrix` type (module `psb_d_nest_builder_mod`, re-exported by the umbrella `psb_d_nest_mod`), which follows the usual PSBLAS `init` / `ins` / `asb` pattern and hides all the descriptor / halo / compose / setup boilerplate:
@@ -47,8 +47,6 @@ integer(psb_lpk_) :: n1, n2
call nested_matrix%init(ctxt, [n1, n2], info)
! 2) insert the block values, owned rows only (PSBLAS convention).
* To know which rows it owns in a field, a process can query the per-field descriptor exposed as `nested_matrix%field_desc(i)` (e.g. `nested_matrix%field_desc(1)%get_local_rows()` and `%l2g(...)`), exactly as it would with a plain `psb_cdall` descriptor.
* Off-diagonal blocks may be rectangular: the cross-field column indices are registered into the union halo automatically by `ins`.
* The CG solver requires an SPD operator; a genuine saddle-point operator is indefinite and needs MINRES/GMRES (plus, possibly, a block preconditioner).
* **Do not copy/move** a `psb_d_nest_matrix` after `asb`: the wrapped operator holds internal pointers into the object.
## 3. User API reference
All of the public API is available through the umbrella module:
## 3. Low-level path (advanced)
```fortran
use psb_d_nest_mod
```
### 3.1 `type(psb_d_nest_matrix)` — the nested matrix (recommended)
| Member | Meaning |
|--------|---------|
| `a_glob` | `type(psb_dspmat_type)` — the assembled global operator; pass it to `psb_spmm`, `psb_krylov`, `prec%build` |
| `desc_glob` | `type(psb_desc_type)` — the composed global descriptor; pass it wherever a descriptor is expected |
| `field_desc(i)` | `type(psb_desc_type)` — the descriptor of field `i` (query `%get_local_rows()`, `%l2g(...)` to find the rows owned by this process) |
| `n_fields` | number of fields |
Methods (collective over the communicator unless noted):
Assemble: builds the per-field halos, the (possibly rectangular) local blocks,
the composed global descriptor `desc_glob` and the global operator `a_glob`.
After `asb` no further `ins` is allowed, and the object must not be
copied/moved (the operator holds internal pointers into it).
The optional arguments select the **storage format of the blocks**:
| Argument | Type | Meaning |
|----------|------|---------|
| `type` | `character(len=*)` | a base format name: `'CSR'` (default), `'CSC'`, `'COO'` |
| `mold` | `class(psb_d_base_sparse_mat)` | any format class, e.g. `psb_d_ell_sparse_mat` / `psb_d_hll_sparse_mat` from `psb_ext` |
The nested operator is format-agnostic: every operation delegates to the
blocks' own methods, so each block runs its native kernels.
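
A minimal sketch of the two ways of choosing the block format follows. The exact `asb` signature is not shown in this section, so the call shape (`info` first, then the optional `type` / `mold` keywords from the table) is an assumption, as are the routine name `assemble_blocks_hll` and the module name `psb_ext_mod` used to reach `psb_d_hll_sparse_mat`.

```fortran
subroutine assemble_blocks_hll(nested_matrix, info)
  use psb_base_mod
  use psb_ext_mod           ! assumption: module providing psb_d_hll_sparse_mat
  use psb_d_nest_mod
  implicit none
  type(psb_d_nest_matrix), intent(inout) :: nested_matrix  ! init and ins already done
  integer(psb_ipk_), intent(out)         :: info

  type(psb_d_hll_sparse_mat) :: hll_mold   ! only its dynamic type matters

  ! Variant A (commented out): pick a base format by name.
  ! call nested_matrix%asb(info, type='CSC')

  ! Variant B: pick any format class through a mold object.
  ! Assumed argument order: info first, then the optional keyword.
  call nested_matrix%asb(info, mold=hll_mold)
end subroutine assemble_blocks_hll
```

Only one of the two variants should be used on a given object, since no further `ins` or assembly is allowed after `asb`.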
#### `call nested_matrix%free(info)`
Release every internal object (blocks, descriptors, global operator).
### 3.2 Solvers and preconditioners
`a_glob` / `desc_glob` work with the standard PSBLAS infrastructure:
* **Krylov methods** — `psb_krylov('CG' | 'BICGSTAB' | 'GMRES' | ..., nested_matrix%a_glob, prec, b, x, eps, nested_matrix%desc_glob, info, ...)`. Remember that CG requires an SPD operator; a genuine saddle-point operator is indefinite and needs MINRES/GMRES.
* **Preconditioners** — all the stock PSBLAS one-level preconditioners can be built directly on the nested operator:
* `'NONE'` — identity;
* `'DIAG'` / `'JACOBI'` — diagonal scaling (served by the nested `get_diag`, which concatenates the diagonals of the diagonal blocks; absent blocks contribute zeros);
* `'BJAC'` — block Jacobi with ILU factorization of the local rows (served by the nested `csgetrow`, which extracts the local rows of the global operator across all blocks).
* **Mutation/bookkeeping** — `scal` (left/right) and `scals` (the operator is a
view: scaling acts on the blocks), `clone` (shares the blocks, re-owns the
private index maps), `mold`, `sizeof`, `free`, `get_nzeros`, `get_fmt`.
Intentionally **not** implemented (they fail with the standard "missing
override" error): `cp_from_coo`/`mv_from_coo` (a nested operator cannot be
built from a flat matrix without the field structure), `csput` (insertions go
to the blocks before assembly), `cssv`/`cssm` (a triangular solve is undefined
for a block operator).
### 3.4 Low-level API (advanced)
`psb_d_nest_matrix` is built on three lower-level pieces, available directly for advanced use (see `psb_d_nest_cg_test.F90` for an end-to-end example):
`psb_d_nest_matrix` is built on lower-level pieces, available directly (see `psb_d_nest_cg_test.F90` for an end-to-end example):
* `psb_cd_nest_compose(grid_desc, desc_glob, info)` — compose the per-field descriptors into the single global descriptor with the union halo.
* `psb_d_nest_base_setup(nest_op, block_storage, grid_desc, desc_glob, info)` — set up the `psb_d_nest_base_mat` operator (implements the local `csmv`).
* `psb_d_nest_base_setup(nest_op, block_storage, grid_desc, desc_glob, info)` — set up the `psb_d_nest_base_mat` operator (implements the local `csmv`, `get_diag`, `csgetrow`).
* `psb_d_nest_rect_block(blk, nz, ia, ja, val, desc_row, desc_col, info)` — build a single (possibly rectangular) local block from global triplets, with rows localized against `desc_row` and columns against `desc_col`.
A field-split interface (`psb_d_nest_get_block`, `psb_d_nest_get_field_desc`,
`psb_d_nest_apply_block`) is exposed on `psb_d_nest_base_mat` as the hook for a future block (field-split / Schur) preconditioner.
A field-split interface (`psb_d_nest_get_block`, `psb_d_nest_get_field_desc`, `psb_d_nest_restrict_field`, `psb_d_nest_prolong_field`, `psb_d_nest_apply_block`) is exposed on `psb_d_nest_base_mat` as the hook for a future block (field-split / Schur) preconditioner.
## 4. Tests
@@ -93,8 +192,8 @@ A field-split interface (`psb_d_nest_get_block`, `psb_d_nest_get_field_desc`,
| Test | What it checks |
|------------------------------|----------------|
| `psb_d_nest_glob_test` | Square 2×2 operator built with `psb_d_nest_matrix`; the nested `psb_spmm` is compared bit-for-bit against the same matrix assembled monolithically in CSR. |
| `psb_d_nest_rect_test` | Same, with fields of different size (`|V| = 2|Q|`) and genuinely **rectangular** off-diagonal blocks. |
| `psb_d_nest_cg_test` | Standard PSBLAS **CG** on an SPD, ill-conditioned operator (1D Laplacian reordered red-black), built on the **low-level path**; the solution is recovered to machine precision over hundreds of matvecs. |
| `psb_d_nest_rect_test` | Same, with fields of different size (`nV = 2 nQ`) and genuinely **rectangular** off-diagonal blocks. |
| `psb_d_nest_cg_test` | Standard PSBLAS **CG** on an SPD, ill-conditioned operator (1D Laplacian reordered red-black), built on the **low-level path**, solved under every stock preconditioner (`NONE`, `DIAG`, `BJAC`/ILU(0)); requires convergence to machine precision for all of them, and that `DIAG` reproduces the `NONE` iteration count exactly (a bit-precise check of the nested `get_diag`, since the diagonal is the constant `2I`). |
| `psb_d_nest_builder_test` | Same CG solve as above but built through the `psb_d_nest_matrix` utility (high-level path). |
All tests run both serially and in parallel, and the result is invariant with respect to the number of MPI processes.