You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Complete the integration of the nested (MATNEST) operator into the standard
PSBLAS infrastructure:
- Preconditioners: implement get_diag and csgetrow on psb_d_nest_base_mat so
the stock one-level preconditioners build directly on the nested operator
(DIAG through the concatenated block diagonals, BJAC through the
format-agnostic csget path used by the ILU factorizations).
- Configurable block storage: psb_d_nest_rect_block and psb_d_nest_matrix%asb
accept an optional type ('CSR' default, 'CSC', 'COO') or mold (any class
extending psb_d_base_sparse_mat, e.g. the psb_ext ELL/HLL formats); the
operator is format-agnostic since every operation delegates to the blocks.
- Device-capable matvec: override vect_mv to gather/scatter through the
vectors' own gth/sct with encapsulated index vectors (device kernels on
device vectors) and to run each block through its vect_mv, so device block
formats execute their native kernels; bit-equivalent to csmv on host.
- Full psb_d_base_sparse_mat contract by delegation to the blocks: transposed
csmv (dedicated kernel, ghost contributions left to the transposed halo
exchange), multi-RHS csmm, cp_to_coo/mv_to_coo (unlocking cscnv, csclip,
tril/triu through the base generics), rowsum/arwsum/colsum/aclsum,
maxval/spnmi/spnm1, scal (left/right) and scals, clone (view semantics:
shared blocks, re-owned index maps), mold, sizeof. cp_from_coo/mv_from_coo,
csput and cssv/cssm are intentionally left to the base error (meaningless
for a block-operator view), documented in the type and in the README.
Tests: glob assembles the blocks in HLL (psb_ext) and rect in CSC, both still
bit-identical to the monolithic CSR oracle; the CG test solves under NONE,
DIAG and BJAC/ILU(0), requiring convergence to the exact solution for all of
them and DIAG bit-identical to NONE (exactness check of the nested get_diag).
README updated with the user API reference, the preconditioner section and
the implemented-contract section.
Author: Simone Staccone (Stack-1)
|
2 weeks ago | |
|---|---|---|
| .. | ||
| auxil | 4 weeks ago | |
| comm | 2 weeks ago | |
| desc | 2 weeks ago | |
| penv | 4 weeks ago | |
| psblas | 2 weeks ago | |
| serial | 2 weeks ago | |
| tools | 2 weeks ago | |
| Makefile | 2 weeks ago | |
| README.F2003 | 8 years ago | |
| cutil.c | 1 year ago | |
| error.f90 | 3 months ago | |
| psb_base_mod.f90 | 3 months ago | |
| psb_cbind_const_mod.F90 | 3 months ago | |
| psb_check_mod.f90 | 3 months ago | |
| psb_config.h.in | 5 months ago | |
| psb_const_mod.F90 | 3 weeks ago | |
| psb_d_nest_mod.f90 | 2 weeks ago | |
| psb_error_impl.F90 | 1 year ago | |
| psb_error_mod.F90 | 3 months ago | |
| psb_fakempi.c | 1 year ago | |
| psb_fakempi.h | 1 year ago | |
| psb_penv_mod.F90 | 8 years ago | |
| psb_realloc_mod.F90 | 4 weeks ago | |
| psb_timers_mod.f90 | 3 months ago | |
| psb_types.h | 1 year ago | |
| psi_c_mod.F90 | 4 weeks ago | |
| psi_d_mod.F90 | 4 weeks ago | |
| psi_i2_mod.F90 | 4 weeks ago | |
| psi_i_mod.F90 | 4 weeks ago | |
| psi_l_mod.F90 | 4 weeks ago | |
| psi_mod.f90 | 3 months ago | |
| psi_s_mod.F90 | 4 weeks ago | |
| psi_z_mod.F90 | 4 weeks ago | |
README.F2003
Design principles for this directory.
1. What is a sparse matrix?
It is an object which does have some properties (number of rows,
number of columns, whether it is a triangle, and in that case
upper/lower, unit/nonunit), a state (null, build, assembled,
update), a type (real/complex, single/double), and a storage
format.
Thus we have a three-level inheritance chain:
i. The base object, defining the methods to set/query the various
properties, and allocate and free. Some of the property
getters/setters, allocate and free depend on the storage
format, so at this level they will just throw an error.
ii. The X_base_object, where X=s,d,c,z thus defining the
type. At this level we define the computational interfaces to
MV and SV, since they require the type of the vectors/scalars
involved (should also add NRMI here!!!!), but again they will
be empty shells. We also define the interface to CSPUT,
required to build the object, and TO_COO,FROM_COO (see
below).
iii. The X_YYY_object where the real implementation of the
MV/SV/NRMI/CSPUT/ALLOCATE/FREE/TO_COO/FROM_COO takes place.
2. What is a sparse matrix (take 2)?
The above structure by itself does not allow a sparse matrix to
switch among different storage formats during its life. To do this,
we define all of the above to be INNER objects, encapsulated in an
OUTER object which is what the rest of the library sees, as
follows:
type :: psbn_d_sparse_mat
class(psbn_d_base_sparse_mat), allocatable :: a
end type psbn_d_sparse_mat
type(psbn_d_sparse_mat) :: a
In this way we can have an outer object whose type is stable
both statically (at compile time) and at runtime, while at runtime
the type of the inner object switches from COO to CSR to whatever as
needed. All of the methods are simply thrown onto the corresponding
methods of the (allocatable, polymorphic) component A%A as needed
(provided the component is allocated, that is).
This is what is called a STATE design pattern (different from the
internal state we discussed above).
As an example, consider the allocate/build/assembly cycle:
the outer code would do the following:
1. Allocate(psbn_d_coo_sparse_mat :: a%a)
2. During the build loop a call to A%CSINS() gets translated into
CALL A%A%CSINS()
3. At assembly time the code would do the following
subroutine psb_spasb(a,....)
type(psbn_d_sparse_mat), intent(inout) :: a
class(psbn_d_base_sparse_mat), allocatable :: temp
select case (TYPE)
case('CSR')
allocate(psbn_d_csr_sparse_mat :: temp, stat=info)
end select
call temp%from_coo(a%a)
call a%a%free()
call move_alloc(temp,a%a)
4. Note in the above that to_coo, from_coo are defined so that every
conceivable storage representation provides just 2 conversion
routines, avoiding quadratic explosion. But since all have to
provide them, the to_coo/from_coo is defined in d_base_mat_mod
together with d_coo_sparse_mat, which enjoys the "crown prince"
status with respect to all the other types derived from
d_base_sparse_mat (its "siblings").
5. How does a user add a new storage format? Very simple. After
deriving the class and implementing all the necessary methods,
the user declares in the program a dummy variable of the new
inner type
type(X_YYY_sparse_mat) :: reftype
then calls
call psb_spasb(a,....,mold=reftype)
In psb_spasb we have
class(psbn_d_base_sparse_mat), intent(in), optional :: mold
if (present(mold)) then
allocate(temp,source=mold,stat=info)
end select
call temp%from_coo(a%a)
call a%a%free()
call move_alloc(temp,a%a)
AND IT'S DONE! Nothing else in the library requires the explicit
knowledge of type of MOLD.
3. Data precisoin (aka KIND / aka byte size)
Data precision is a bit of a thorny issue here, because it is used
by the Fortran language to disambiguate generic interfaces. This
means that we must be careful when choosing precision for data
structures. On the other hand, we want to have some freedom of
choice. The sticky point here is how to deal with integers, because
real and complex are already standardized on S/D/C/Z.
Integers are tricky because we do not want to use large integer
sizes (read: 8 bytes) unless they are really necessary; moreover,
the GPU code currently use 4 byte integers (and with good
reason). However if we want to tackle large index spaces, we will
need at some point 8-byte integers.
So, here is the plan.
A. We have two basic integer kinds, 4-byte PSB_MPK_ which takes its
name from being the kind that is going to be passed to MPI for
all arguments other than data buffers, and 8-byte PSB_EPK_, to
be used as necessary; the PSB_SIZEOF functions which return
data structure sizes (sometimes summed over all processes) are
always 8-byte.
B. At all levels where a function/subroutine is supposed to be
interfaced with an array of integers, there should be two
versions, distinguished by an M or an E in the specific name,
all adding to a generic set. This applies to the internal
utilities, such as sorting and reallocation.
C. For computation we have I and L, as in psb_ipk_ and
psb_lpk_. The idea is that I<=L, and I is used for almost
everything, e.g. for the integer parts of the sparse matrix data
structures. L is only used for a very small subset of data, and
specifically for the indices in *GLOBAL* numbering mode, hence
the I<=L constraint.
D. The values for I and L can be remapped independently at
configure time over M and E; thus, if a sparse matrix routine is
reallocating integer data through the generic names of the
utilities, the PSB_IPK_ is remapped at compile time onto
PSB_MPK_ or PSB_EPK_ as needed.
E. Because we must have I<=L, this means that supported
configurations are (I=4,L=4), (I=4,L=8), (I=8,L=8). Default
is (I=4,L=8), because it allows us to go to multi-billion linear
systems while still keeping all local data structures on 4-byte
integers.
F. Thus, care must be taken in defining specific interfaces: to
reiterate, if we are dealing with an interface which accepts an
integer array, it should be defined with M and E (which are
always distinct) and not I/L (which might be
indistinguishable). Example in case:
interface psb_realloc
Subroutine psb_r_m_m_rk1(len,rrax,info,pad,lb)
integer(psb_mpk_),Intent(in) :: len
integer(psb_mpk_), allocatable, intent(inout) :: rrax(:)
integer(psb_ipk_) :: info
integer(psb_mpk_), optional, intent(in) :: pad
integer(psb_mpk_), optional, intent(in) :: lb
G. The INFO argument and others related to error handling should
always be PSB_IPK_;
H. Arguments related to MPI interfacing should always be PSB_MPK_
I. Encapsulated types such as psb_i_base_vect_mod can still be I
and L, because if the name of the type is different, the types
are interpreted as distinguishable even when the contents are
identical.
L. This means that most user-level interfaces will deal in I and L,
not M and E, which are going to be used mostly in the
internals.
M. Actually, the user will probably never see M, but will (for
sizeof & friends) see E.