Internal doc changes.

pizdaint-runs
Salvatore Filippone 5 years ago
parent cc9ef42464
commit 6b2fa31ae1

@@ -70,14 +70,28 @@
! 3. Perform a local transpose;
! 4. Split the matrix: all local entries stay, all halo entries go into
! the send buffers, and are converted to global numbering;
-! 5. Do the all-to-all with our simple a2av (the exchange is with the halo
-! pattern, so the full MPI A2AV is almost certainly too heavy)
+! 5. Do the all-to-all (see below for a discussion of the alternative
+! communication strategies)
! 6. The receive is in the extra section of the ACOO buffer; convert
! the row indices to local numbering, and discard extra ones (there will
! be some)
-! 7. If desc_rx was required, make sure to insert the column indices
+! 7. If desc_rx was requested, make sure to insert the (new) column indices
! 8. Cleanup and sort the output matrix
! 9. Copy back into AIN or ATRANS if requested.
!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
+!
!
#undef SP_A2AV_MPI
#undef SP_A2AV_XI
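As an illustration of the psb_simple_a2av strategy described in the comment above, here is a minimal sketch of a sparse all-to-all built from nonblocking point-to-point calls. The subroutine name and interface are hypothetical, not the actual PSBLAS ones; it only assumes integer buffers packed contiguously, with per-process counts (sdsz, rvsz) and 0-based displacements (sdisp, rdisp).

! Sketch only: illustrative interface, not the PSBLAS psb_simple_a2av.
subroutine simple_a2av_sketch(sndbuf, sdsz, sdisp, rcvbuf, rvsz, rdisp, comm, info)
  use mpi
  implicit none
  integer, intent(in)    :: sndbuf(:), sdsz(0:), sdisp(0:)
  integer, intent(inout) :: rcvbuf(:)
  integer, intent(in)    :: rvsz(0:), rdisp(0:), comm
  integer, intent(out)   :: info
  integer, parameter     :: tag = 4242
  integer :: np, ip, nreq, ierr
  integer, allocatable   :: reqs(:)

  call mpi_comm_size(comm, np, ierr)
  allocate(reqs(2*np))
  nreq = 0
  ! Post receives only for the processes that actually send to us ...
  do ip = 0, np-1
    if (rvsz(ip) > 0) then
      nreq = nreq + 1
      call mpi_irecv(rcvbuf(rdisp(ip)+1), rvsz(ip), mpi_integer, ip, &
           & tag, comm, reqs(nreq), ierr)
    end if
  end do
  ! ... and send only the nonempty buffers: empty pairs cost nothing,
  ! which is the point on a sparse (halo) communication pattern.
  do ip = 0, np-1
    if (sdsz(ip) > 0) then
      nreq = nreq + 1
      call mpi_isend(sndbuf(sdisp(ip)+1), sdsz(ip), mpi_integer, ip, &
           & tag, comm, reqs(nreq), ierr)
    end if
  end do
  call mpi_waitall(nreq, reqs(1:nreq), mpi_statuses_ignore, ierr)
  info = ierr
  deallocate(reqs)
end subroutine simple_a2av_sketch

Ranks with zero counts are skipped entirely, which is where the advantage over a dense MPI_Alltoallv comes from when each process only exchanges with a handful of neighbours.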

@@ -31,11 +31,26 @@
!
! File: psb_csphalo.f90
!
-! Subroutine: psb_csphalo
+! Subroutine: psb_csphalo psb_lcsphalo
! This routine does the retrieval of remote matrix rows.
-! Note that retrieval is done through GTBLK, therefore it should work
-! for any matrix format in A; as for the output, default is CSR.
!
+! Retrieval is done through GETROW, therefore it should work
+! for any matrix format in A; as for the output, the default is CSR.
+!
+! There is also a specialized version lc_CSR whose interface
+! is adapted for the needs of c_par_csr_spspmm.
+!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
!
! Arguments:
! a - type(psb_cspmat_type) The local part of input matrix A

@@ -70,14 +70,28 @@
! 3. Perform a local transpose;
! 4. Split the matrix: all local entries stay, all halo entries go into
! the send buffers, and are converted to global numbering;
-! 5. Do the all-to-all with our simple a2av (the exchange is with the halo
-! pattern, so the full MPI A2AV is almost certainly too heavy)
+! 5. Do the all-to-all (see below for a discussion of the alternative
+! communication strategies)
! 6. The receive is in the extra section of the ACOO buffer; convert
! the row indices to local numbering, and discard extra ones (there will
! be some)
-! 7. If desc_rx was required, make sure to insert the column indices
+! 7. If desc_rx was requested, make sure to insert the (new) column indices
! 8. Cleanup and sort the output matrix
! 9. Copy back into AIN or ATRANS if requested.
!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
+!
!
#undef SP_A2AV_MPI
#undef SP_A2AV_XI
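For contrast with the point-to-point sketch shown earlier, the MPI_Alltoallv variant that the comment argues against amounts to a single call over the same (hypothetical) packed arguments. Every rank pair is processed, zero-sized entries included, and the internal schedule is rebuilt on every invocation because the operation is not persistent.

! Dense counterpart of the earlier sketch (same illustrative arguments).
subroutine dense_a2av_sketch(sndbuf, sdsz, sdisp, rcvbuf, rvsz, rdisp, comm, info)
  use mpi
  implicit none
  integer, intent(in)    :: sndbuf(:), sdsz(:), sdisp(:)
  integer, intent(inout) :: rcvbuf(:)
  integer, intent(in)    :: rvsz(:), rdisp(:), comm
  integer, intent(out)   :: info
  ! One call handles all np*np pairs, empty ones included; nothing is
  ! remembered between invocations.
  call mpi_alltoallv(sndbuf, sdsz, sdisp, mpi_integer, &
       &             rcvbuf, rvsz, rdisp, mpi_integer, comm, info)
end subroutine dense_a2av_sketch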

@@ -31,11 +31,26 @@
!
! File: psb_dsphalo.f90
!
-! Subroutine: psb_dsphalo
+! Subroutine: psb_dsphalo psb_ldsphalo
! This routine does the retrieval of remote matrix rows.
-! Note that retrieval is done through GTBLK, therefore it should work
-! for any matrix format in A; as for the output, default is CSR.
!
+! Retrieval is done through GETROW, therefore it should work
+! for any matrix format in A; as for the output, the default is CSR.
+!
+! There is also a specialized version ld_CSR whose interface
+! is adapted for the needs of d_par_csr_spspmm.
+!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
!
! Arguments:
! a - type(psb_dspmat_type) The local part of input matrix A

@@ -70,14 +70,28 @@
! 3. Perform a local transpose;
! 4. Split the matrix: all local entries stay, all halo entries go into
! the send buffers, and are converted to global numbering;
-! 5. Do the all-to-all with our simple a2av (the exchange is with the halo
-! pattern, so the full MPI A2AV is almost certainly too heavy)
+! 5. Do the all-to-all (see below for a discussion of the alternative
+! communication strategies)
! 6. The receive is in the extra section of the ACOO buffer; convert
! the row indices to local numbering, and discard extra ones (there will
! be some)
-! 7. If desc_rx was required, make sure to insert the column indices
+! 7. If desc_rx was requested, make sure to insert the (new) column indices
! 8. Cleanup and sort the output matrix
! 9. Copy back into AIN or ATRANS if requested.
!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
+!
!
#undef SP_A2AV_MPI
#undef SP_A2AV_XI
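The neighbour-collective alternative mentioned in the comment above would encode the halo pattern once in a distributed graph communicator and then exchange only along its edges. Here is a hedged sketch under that assumption; all names are illustrative, with srcs/dests listing the ranks we receive from and send to, and counts/displacements given per neighbour in the same order.

! Sketch of the neighbour-collective alternative; not PSBLAS code.
subroutine neigh_a2av_sketch(sndbuf, sdsz, sdisp, rcvbuf, rvsz, rdisp, &
     & srcs, dests, comm, info)
  use mpi
  implicit none
  integer, intent(in)    :: sndbuf(:), sdsz(:), sdisp(:)
  integer, intent(inout) :: rcvbuf(:)
  integer, intent(in)    :: rvsz(:), rdisp(:), srcs(:), dests(:), comm
  integer, intent(out)   :: info
  integer :: gcomm, ierr
  ! In a real code the graph communicator would be built once per halo
  ! pattern and cached, not recreated at every exchange.
  call mpi_dist_graph_create_adjacent(comm, size(srcs), srcs, mpi_unweighted, &
       & size(dests), dests, mpi_unweighted, mpi_info_null, .false., gcomm, ierr)
  ! The collective only moves data between actual neighbours.
  call mpi_neighbor_alltoallv(sndbuf, sdsz, sdisp, mpi_integer, &
       & rcvbuf, rvsz, rdisp, mpi_integer, gcomm, ierr)
  call mpi_comm_free(gcomm, ierr)
  info = ierr
end subroutine neigh_a2av_sketch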

@@ -31,11 +31,26 @@
!
! File: psb_ssphalo.f90
!
-! Subroutine: psb_ssphalo
+! Subroutine: psb_ssphalo psb_lssphalo
! This routine does the retrieval of remote matrix rows.
-! Note that retrieval is done through GTBLK, therefore it should work
-! for any matrix format in A; as for the output, default is CSR.
!
+! Retrieval is done through GETROW, therefore it should work
+! for any matrix format in A; as for the output, the default is CSR.
+!
+! There is also a specialized version ls_CSR whose interface
+! is adapted for the needs of s_par_csr_spspmm.
+!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
!
! Arguments:
! a - type(psb_sspmat_type) The local part of input matrix A

@@ -70,14 +70,28 @@
! 3. Perform a local transpose;
! 4. Split the matrix: all local entries stay, all halo entries go into
! the send buffers, and are converted to global numbering;
-! 5. Do the all-to-all with our simple a2av (the exchange is with the halo
-! pattern, so the full MPI A2AV is almost certainly too heavy)
+! 5. Do the all-to-all (see below for a discussion of the alternative
+! communication strategies)
! 6. The receive is in the extra section of the ACOO buffer; convert
! the row indices to local numbering, and discard extra ones (there will
! be some)
-! 7. If desc_rx was required, make sure to insert the column indices
+! 7. If desc_rx was requested, make sure to insert the (new) column indices
! 8. Cleanup and sort the output matrix
! 9. Copy back into AIN or ATRANS if requested.
!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
+!
!
#undef SP_A2AV_MPI
#undef SP_A2AV_XI
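The persistence argument in the comment above can be made concrete with MPI persistent requests: the communication schedule is built once per halo pattern and merely restarted at each exchange. This is a hedged sketch with hypothetical names, swapping in standard MPI_Send_init/MPI_Recv_init for illustration; it is not what PSBLAS currently does.

! Setup phase: done once per halo pattern; reqs must be sized by the
! caller to hold up to 2*np requests.
subroutine persistent_a2av_setup(sndbuf, sdsz, sdisp, rcvbuf, rvsz, rdisp, &
     & comm, reqs, nreq, info)
  use mpi
  implicit none
  integer, intent(in)    :: sndbuf(:), sdsz(0:), sdisp(0:)
  integer, intent(inout) :: rcvbuf(:)
  integer, intent(in)    :: rvsz(0:), rdisp(0:), comm
  integer, intent(out)   :: reqs(:), nreq, info
  integer, parameter :: tag = 4242
  integer :: np, ip, ierr
  call mpi_comm_size(comm, np, ierr)
  nreq = 0
  ! Build persistent requests only for the nonempty exchanges.
  do ip = 0, np-1
    if (rvsz(ip) > 0) then
      nreq = nreq + 1
      call mpi_recv_init(rcvbuf(rdisp(ip)+1), rvsz(ip), mpi_integer, ip, &
           & tag, comm, reqs(nreq), ierr)
    end if
    if (sdsz(ip) > 0) then
      nreq = nreq + 1
      call mpi_send_init(sndbuf(sdisp(ip)+1), sdsz(ip), mpi_integer, ip, &
           & tag, comm, reqs(nreq), ierr)
    end if
  end do
  info = ierr
end subroutine persistent_a2av_setup

! At each exchange, only restart and complete the prebuilt requests:
!   call mpi_startall(nreq, reqs, ierr)
!   call mpi_waitall(nreq, reqs, mpi_statuses_ignore, ierr)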

@@ -31,11 +31,26 @@
!
! File: psb_zsphalo.f90
!
-! Subroutine: psb_zsphalo
+! Subroutine: psb_zsphalo psb_lzsphalo
! This routine does the retrieval of remote matrix rows.
-! Note that retrieval is done through GTBLK, therefore it should work
-! for any matrix format in A; as for the output, default is CSR.
!
+! Retrieval is done through GETROW, therefore it should work
+! for any matrix format in A; as for the output, the default is CSR.
+!
+! There is also a specialized version lz_CSR whose interface
+! is adapted for the needs of z_par_csr_spspmm.
+!
+! There are three possible exchange algorithms:
+! 1. Use MPI_Alltoallv
+! 2. Use psb_simple_a2av
+! 3. Use psb_simple_triad_a2av
+! The default choice is 3. The MPI variant has proved inefficient: it is
+! not persistent, so you pay the initialization price at every call, and
+! it is not optimized for a sparse communication pattern, since most MPI
+! implementations assume that all communications are non-empty. The
+! PSB_SIMPLE variants reuse the same communicator and go for a simplistic
+! sequence of sends/receives that is quite efficient for a sparse
+! communication pattern. To be refined/reviewed in the future against
+! persistent neighbour collectives.
!
! Arguments:
! a - type(psb_zspmat_type) The local part of input matrix A
