|
|
# Nested (block-structured / MATNEST) matrices in PSBLAS
|
|
|
|
|
|
Author: Simone Staccone (Stack-1)
|
|
|
|
|
|
This directory contains the tests for the **nested matrix** support added to PSBLAS: a block-structured distributed operator
|
|
|
|
|
|
```
|
|
|
[ A11 A12 ... ]
|
|
|
M = [ A21 A22 ... ]
|
|
|
[ ... ... ]
|
|
|
```
|
|
|
|
|
|
whose blocks are kept as separate sparse matrices (one per field) but which presents itself to Krylov solvers and preconditioners as a **single ordinary distributed matrix**. It is the PSBLAS analogue of PETSc's `MATNEST`.
|
|
|
|
|
|
The motivating case is the **saddle-point** system
|
|
|
|
|
|
```
|
|
|
M = [ A B^T ]
|
|
|
[ B 0 ]
|
|
|
```
|
|
|
|
|
|
(symmetric indefinite, with the (2,2) block absent), but the implementation supports any square multi-field block operator with possibly **rectangular**
|
|
|
sub-blocks.
|
|
|
|
|
|
|
|
|
## 1. Concepts
|
|
|
|
|
|
* **Field** — a contiguous index space (e.g. velocity `V` and pressure `Q` in a saddle-point problem). Each field has its own `psb_desc_type` distribution.
|
|
|
* **Block (i,j)** — the sub-matrix coupling field `i` (rows) with field `j` (columns). It may be rectangular (different field sizes) and may be absent.
|
|
|
* **Global operator** — the blocks are concatenated into a single **square** operator `M` of size `sum(field_sizes)`, distributed over one **composed global descriptor** with a **union halo** (one halo exchange per matrix-vector product, covering all blocks of a given column field at once).
|
|
|
* **Rectangular blocks** — PSBLAS does not support rectangular *distributed* matrices, but it does support rectangular *local* CSR/COO matrices. The rectangular product therefore happens only in the **local** block `csmv`; the only object carrying a descriptor (and hence communication) is the global operator, which is always square.
|
|
|
|
|
|
The global operator (`a_glob`) and global descriptor (`desc_glob`) can be passed unchanged to `psb_spmm`, `psb_krylov`, and the standard preconditioners.
|
|
|
|
|
|
|
|
|
## 2. Quick start: `psb_d_nest_matrix`
|
|
|
|
|
|
The easy way to build a nested matrix is the `psb_d_nest_matrix` type (module `psb_d_nest_builder_mod`, re-exported by the umbrella `psb_d_nest_mod`), which follows the usual PSBLAS `init` / `ins` / `asb` pattern and hides all the descriptor / halo / compose / setup boilerplate:
|
|
|
|
|
|
```fortran
|
|
|
use psb_d_nest_mod
|
|
|
|
|
|
type(psb_d_nest_matrix) :: nested_matrix
|
|
|
integer(psb_lpk_) :: n1, n2
|
|
|
|
|
|
! 1) declare the field structure: 2 fields of global size n1, n2
|
|
|
call nested_matrix%init(ctxt, [n1, n2], info)
|
|
|
|
|
|
! 2) insert the block values, owned rows only (PSBLAS convention).
|
|
|
call nested_matrix%ins(1, 1, nz_A, iaA, jaA, valA, info) ! A = block (1,1)
|
|
|
call nested_matrix%ins(1, 2, nz_Bt, iaBt, jaBt, valBt, info) ! B^T = block (1,2)
|
|
|
call nested_matrix%ins(2, 1, nz_B, iaB, jaB, valB, info) ! B = block (2,1)
|
|
|
! (the (2,2) block is simply not inserted)
|
|
|
|
|
|
! 3) assemble: builds nested_matrix%a_glob and nested_matrix%desc_glob
|
|
|
call nested_matrix%asb(info)
|
|
|
|
|
|
! 4) from here on it is an ordinary distributed matrix/descriptor
|
|
|
call psb_geall(x, nested_matrix%desc_glob, info)
|
|
|
...
|
|
|
call prec%init(ctxt, 'BJAC', info)
|
|
|
call prec%build(nested_matrix%a_glob, nested_matrix%desc_glob, info)
|
|
|
call psb_krylov('CG', nested_matrix%a_glob, prec, b, x, eps, &
|
|
|
& nested_matrix%desc_glob, info, itmax=..., iter=..., err=...)
|
|
|
|
|
|
! 5) release
|
|
|
call nested_matrix%free(info)
|
|
|
```
|
|
|
|
|
|
|
|
|
## 3. User API reference
|
|
|
|
|
|
All of the public API is available through the umbrella module:
|
|
|
|
|
|
```fortran
|
|
|
use psb_d_nest_mod
|
|
|
```
|
|
|
|
|
|
### 3.1 `type(psb_d_nest_matrix)` — the nested matrix (recommended)
|
|
|
|
|
|
| Member | Meaning |
|
|
|
|--------|---------|
|
|
|
| `a_glob` | `type(psb_dspmat_type)` — the assembled global operator; pass it to `psb_spmm`, `psb_krylov`, `prec%build` |
|
|
|
| `desc_glob` | `type(psb_desc_type)` — the composed global descriptor; pass it wherever a descriptor is expected |
|
|
|
| `field_desc(i)` | `type(psb_desc_type)` — the descriptor of field `i` (advanced use; for the common queries see `get_owned_rows` below) |
|
|
|
| `n_fields` | number of fields |
|
|
|
|
|
|
To know which rows it must insert, a process asks the matrix directly — no
|
|
|
descriptor jargon needed:
|
|
|
|
|
|
```fortran
|
|
|
integer(psb_lpk_), allocatable :: my_rows(:)
|
|
|
|
|
|
my_rows = nested_matrix%get_owned_rows(1) ! global rows of field 1 owned here
|
|
|
do k = 1, size(my_rows)
|
|
|
global_row = my_rows(k)
|
|
|
... ! build the entries of this row
|
|
|
end do
|
|
|
```
|
|
|
|
|
|
| Query | Result |
|
|
|
|-------|--------|
|
|
|
| `nested_matrix%get_owned_rows(i_field)` | `integer(psb_lpk_), allocatable (:)` — the GLOBAL indices (in the field index space, 1..field size) of the rows of field `i_field` owned by this process |
|
|
|
| `nested_matrix%get_owned_row_count(i_field)` | `integer(psb_ipk_)` — how many rows of field `i_field` this process owns |
|
|
|
|
|
|
Methods (collective over the communicator unless noted):
|
|
|
|
|
|
#### `call nested_matrix%init(ctxt, field_sizes, info)`
|
|
|
|
|
|
Create the field structure. One descriptor per field is created with a block
|
|
|
row distribution; the total size is independent of the number of processes.
|
|
|
|
|
|
| Argument | Type | Intent | Meaning |
|
|
|
|----------|------|--------|---------|
|
|
|
| `ctxt` | `type(psb_ctxt_type)` | in | parallel context from `psb_init` |
|
|
|
| `field_sizes(:)` | `integer(psb_lpk_)` | in | global size of each field, e.g. `[n1, n2]` |
|
|
|
| `info` | `integer(psb_ipk_)` | out | return code, `psb_success_` on success |
|
|
|
|
|
|
#### `call nested_matrix%ins(block_row, block_col, n_entries, entry_rows, entry_cols, entry_vals, info)`
|
|
|
|
|
|
Insert a batch of coefficients into block `(block_row, block_col)`. May be
|
|
|
called any number of times per block, in any order, before `asb`. Each process
|
|
|
inserts only the rows it owns (PSBLAS convention); cross-field columns are
|
|
|
registered into the union halo automatically.
|
|
|
|
|
|
| Argument | Type | Intent | Meaning |
|
|
|
|----------|------|--------|---------|
|
|
|
| `block_row` | `integer(psb_ipk_)` | in | row-field index of the block (1..n_fields) |
|
|
|
| `block_col` | `integer(psb_ipk_)` | in | column-field index of the block (1..n_fields) |
|
|
|
| `n_entries` | `integer(psb_ipk_)` | in | number of triplets in this batch |
|
|
|
| `entry_rows(:)` | `integer(psb_lpk_)` | in | GLOBAL row indices in field `block_row` (1..field size) |
|
|
|
| `entry_cols(:)` | `integer(psb_lpk_)` | in | GLOBAL column indices in field `block_col` (1..field size) |
|
|
|
| `entry_vals(:)` | `real(psb_dpk_)` | in | coefficient values |
|
|
|
| `info` | `integer(psb_ipk_)` | out | return code |
|
|
|
|
|
|
#### `call nested_matrix%asb(info [, type] [, mold])`
|
|
|
|
|
|
Assemble: builds the per-field halos, the (possibly rectangular) local blocks,
|
|
|
the composed global descriptor `desc_glob` and the global operator `a_glob`.
|
|
|
After `asb` no further `ins` is allowed, and the object must not be
|
|
|
copied/moved (the operator holds internal pointers into it).
|
|
|
|
|
|
The optional arguments select the **storage format of the blocks**:
|
|
|
|
|
|
| Argument | Type | Meaning |
|
|
|
|----------|------|---------|
|
|
|
| `type` | `character(len=*)` | a base format name: `'CSR'` (default), `'CSC'`, `'COO'` |
|
|
|
| `mold` | `class(psb_d_base_sparse_mat)` | any format class, e.g. `psb_d_ell_sparse_mat` / `psb_d_hll_sparse_mat` from `psb_ext` |
|
|
|
|
|
|
The nested operator is format-agnostic: every operation delegates to the
|
|
|
blocks' own methods, so each block runs its native kernels.
|
|
|
|
|
|
#### `call nested_matrix%free(info)`
|
|
|
|
|
|
Release every internal object (blocks, descriptors, global operator).
|
|
|
|
|
|
### 3.2 Solvers and preconditioners
|
|
|
|
|
|
`a_glob` / `desc_glob` work with the standard PSBLAS infrastructure:
|
|
|
|
|
|
* **Krylov methods** — `psb_krylov('CG' | 'BICGSTAB' | 'GMRES' | ..., nested_matrix%a_glob, prec, b, x, eps, nested_matrix%desc_glob, info, ...)`. Remember that CG requires an SPD operator; a genuine saddle-point operator is indefinite and needs MINRES/GMRES.
|
|
|
* **Preconditioners** — all the stock PSBLAS one-level preconditioners can be built directly on the nested operator:
|
|
|
* `'NONE'` — identity;
|
|
|
* `'DIAG'` / `'JACOBI'` — diagonal scaling (served by the nested `get_diag`, which concatenates the diagonals of the diagonal blocks; absent blocks contribute zeros);
|
|
|
* `'BJAC'` — block Jacobi with ILU factorization of the local rows (served by the nested `csgetrow`, which extracts the local rows of the global operator across all blocks).
|
|
|
|
|
|
```fortran
|
|
|
call prec%init(ctxt, 'BJAC', info)
|
|
|
call prec%build(nested_matrix%a_glob, nested_matrix%desc_glob, info)
|
|
|
```
|
|
|
|
|
|
### 3.3 Implemented base-class contract
|
|
|
|
|
|
The nested operator (`psb_d_nest_base_mat`) implements the standard
|
|
|
`psb_d_base_sparse_mat` contract by delegation to the blocks, so it can be used
|
|
|
wherever an assembled PSBLAS matrix is expected:
|
|
|
|
|
|
* **Products** — `csmv` (also transposed, `trans='T'`), `csmm` (multi-RHS),
|
|
|
`vect_mv` (encapsulated vectors: gathers/scatters through the vectors' own
|
|
|
`gth`/`sct` and runs each block through its `vect_mv`, so device block
|
|
|
formats execute their device kernels).
|
|
|
* **Access/conversions** — `get_diag`, `csgetrow` (and `csget`/`csgetblk`
|
|
|
through the base generics), `cp_to_coo`/`mv_to_coo` (and `cscnv`, `csclip`,
|
|
|
`tril`/`triu`, ... through the base generics built on the COO route).
|
|
|
* **Reductions** — `rowsum`/`arwsum`, `colsum`/`aclsum`, `maxval`,
|
|
|
`spnmi` (infinity norm), `spnm1` (1-norm).
|
|
|
* **Mutation/bookkeeping** — `scal` (left/right) and `scals` (the operator is a
|
|
|
view: scaling acts on the blocks), `clone` (shares the blocks, re-owns the
|
|
|
private index maps), `mold`, `sizeof`, `free`, `get_nzeros`, `get_fmt`.
|
|
|
|
|
|
Intentionally **not** implemented (they fail with the standard "missing
|
|
|
override" error): `cp_from_coo`/`mv_from_coo` (a nested operator cannot be
|
|
|
built from a flat matrix without the field structure), `csput` (insertions go
|
|
|
to the blocks before assembly), `cssv`/`cssm` (a triangular solve is undefined
|
|
|
for a block operator).
|
|
|
|
|
|
### 3.4 Low-level API (advanced)
|
|
|
|
|
|
`psb_d_nest_matrix` is built on lower-level pieces, available directly (see `psb_d_nest_cg_test.F90` for an end-to-end example):
|
|
|
|
|
|
* `psb_cd_nest_compose(grid_desc, desc_glob, info)` — compose the per-field descriptors into the single global descriptor with the union halo.
|
|
|
* `psb_d_nest_base_setup(nest_op, block_storage, grid_desc, desc_glob, info)` — set up the `psb_d_nest_base_mat` operator (implements the local `csmv`, `get_diag`, `csgetrow`).
|
|
|
* `psb_d_nest_rect_block(blk, nz, ia, ja, val, desc_row, desc_col, info)` — build a single (possibly rectangular) local block from global triplets, with rows localized against `desc_row` and columns against `desc_col`.
|
|
|
|
|
|
A field-split interface (`psb_d_nest_get_block`, `psb_d_nest_get_field_desc`, `psb_d_nest_restrict_field`, `psb_d_nest_prolong_field`, `psb_d_nest_apply_block`) is exposed on `psb_d_nest_base_mat` as the hook for a future block (field-split / Schur) preconditioner.
|
|
|
|
|
|
|
|
|
## 4. Tests
|
|
|
|
|
|
| Test | What it checks |
|
|
|
|------------------------------|----------------|
|
|
|
| `psb_d_nest_glob_test` | Square 2×2 operator built with `psb_d_nest_matrix`; the nested `psb_spmm` is compared bit-for-bit against the same matrix assembled monolithically in CSR. |
|
|
|
| `psb_d_nest_rect_test` | Same, with fields of different size (`nV = 2 nQ`) and genuinely **rectangular** off-diagonal blocks. |
|
|
|
| `psb_d_nest_cg_test` | Standard PSBLAS **CG** on an SPD, ill-conditioned operator (1D Laplacian reordered red-black), built on the **low-level path**, solved under every stock preconditioner (`NONE`, `DIAG`, `BJAC`/ILU(0)); requires convergence to machine precision for all of them, and that `DIAG` reproduces the `NONE` iteration count exactly (a bit-precise check of the nested `get_diag`, since the diagonal is the constant `2I`). |
|
|
|
| `psb_d_nest_builder_test` | Same CG solve as above but built through the `psb_d_nest_matrix` utility (high-level path). |
|
|
|
|
|
|
All tests run both serially and in parallel, and the result is invariant with respect to the number of MPI processes.
|
|
|
|
|
|
### Build and run
|
|
|
|
|
|
The PSBLAS library must be built/installed first (from the repository root):
|
|
|
|
|
|
```sh
|
|
|
make # or the CMake build
|
|
|
```
|
|
|
|
|
|
Then, from this directory:
|
|
|
|
|
|
```sh
|
|
|
make # builds the executables into ./runs
|
|
|
./runs/psb_d_nest_glob_test # serial
|
|
|
mpirun -np 4 ./runs/psb_d_nest_rect_test
|
|
|
mpirun -np 4 ./runs/psb_d_nest_cg_test
|
|
|
mpirun -np 4 ./runs/psb_d_nest_builder_test
|
|
|
```
|
|
|
|
|
|
Each test prints a single `[PASS]` / `[FAIL]` line (printed by rank 0).
|
|
|
|
|
|
|
|
|
## 5. Source files
|
|
|
|
|
|
Library (under `base/modules/`):
|
|
|
|
|
|
* `desc/psb_desc_nest_mod.f90` — `psb_desc_nest_type` (grid of per-field descriptors)
|
|
|
* `serial/psb_d_nest_mat_mod.f90` — `psb_d_nest_sparse_mat` (block storage)
|
|
|
* `serial/psb_d_nest_base_mat_mod.F90`— `psb_d_nest_base_mat` (the MATNEST operator: `csmv`, `get_diag`, `csgetrow`)
|
|
|
* `tools/psb_cd_nest_tools_mod.F90` — descriptor tools (`psb_cd_nest_compose`, ...)
|
|
|
* `tools/psb_d_nest_tools_mod.F90` — block tools (`psb_d_nest_rect_block`, ...)
|
|
|
* `tools/psb_d_nest_builder_mod.F90` — `psb_d_nest_matrix` frontend (init/ins/asb)
|
|
|
* `psb_d_nest_mod.f90` — umbrella module (`use psb_d_nest_mod`)
|