Literate version of DOS calculation with PEXSI

1 Introduction

The inertia-count technology can be used to directly estimate the integrated density of states (IDOS). From the IDOS it is relatively easy to obtain an approximation to the DOS.
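As an illustration of that last step, the DOS can be recovered by differentiating the sampled integrated DOS numerically. The following is a minimal post-processing sketch (a hypothetical helper, not part of the module below), assuming the energy grid edos and accumulated counts intdos produced by the routine, and the dp kind from the precision module:

subroutine intdos_to_dos(npoints, edos, intdos, dos)
  ! Hypothetical helper: approximate the DOS as the derivative of the IDOS
  use precision, only: dp          ! same working-precision kind used below
  implicit none
  integer,  intent(in)  :: npoints
  real(dp), intent(in)  :: edos(npoints)    ! energy grid
  real(dp), intent(in)  :: intdos(npoints)  ! integrated DOS at edos(:)
  real(dp), intent(out) :: dos(npoints)
  integer :: j

  ! One-sided differences at the end points, centered differences inside
  dos(1) = (intdos(2) - intdos(1)) / (edos(2) - edos(1))
  do j = 2, npoints - 1
     dos(j) = (intdos(j+1) - intdos(j-1)) / (edos(j+1) - edos(j-1))
  enddo
  dos(npoints) = (intdos(npoints) - intdos(npoints-1)) / &
                 (edos(npoints) - edos(npoints-1))
end subroutine intdos_to_dos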

2 Auxiliary module for holding the plan

The plan feature allows, among other things, the re-use of factorization information.
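The module itself is not reproduced in this document. As a rough, purely illustrative sketch (all names here are hypothetical), such a module would essentially hold a saved plan handle together with flags recording its state:

module m_pexsi_plan_holder            ! hypothetical name; the real module is not shown here
  use iso_c_binding, only: c_intptr_t
  implicit none
  public
  integer(c_intptr_t) :: saved_plan                  ! handle from f_ppexsi_plan_initialize
  logical             :: plan_initialized    = .false.
  logical             :: factorization_valid = .false.
end module m_pexsi_plan_holder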

3 The PEXSI DOS code

3.1 Main structure

This is the main structure of the code.

module m_pexsi_dos
#ifdef SIESTA__PEXSI
    use precision, only  : dp

  public :: pexsi_dos

CONTAINS

<<routine-header>>
<<routine-variables>>
!  --------  for serial compilation
#ifndef MPI
    call die("PEXSI needs MPI")
#else
<<define-communicators>>
<<prepare-plan>>
<<re-distribute-matrices>>
<<set-options>>
<<load-hs-matrices>>
<<factorization>>
<<compute-int-dos>>
<<clean-up>>
#endif

CONTAINS

<<support-routines>>

end subroutine pexsi_dos
#endif
end module m_pexsi_dos

3.2 Routine header

subroutine pexsi_dos(no_u, no_l, nspin_in,  &
     maxnh, numh, listhptr, listh, H, S, qtot, ef_in)

<<used-modules>>

  implicit          none

  integer, intent(in)  :: maxnh, no_u, no_l, nspin_in
  integer, intent(in), target  :: listh(maxnh), numh(no_l), listhptr(no_l)
  real(dp), intent(in), target :: H(maxnh,nspin_in), S(maxnh)
  real(dp), intent(in)        :: qtot ! Total number of electrons
  real(dp), intent(in)        :: ef_in  ! Fermi energy

3.2.1 Used modules

    use fdf
    use parallel, only   : SIESTA_worker, BlockSize
    use parallel, only   : SIESTA_Group, SIESTA_Comm
    use m_mpi_utils, only: globalize_max
    use m_mpi_utils, only: broadcast
    use units,       only: eV
    use m_redist_spmatrix, only: aux_matrix, redistribute_spmatrix
    use class_Distribution
    use alloc,             only: re_alloc, de_alloc
#ifdef MPI
    use mpi_siesta
#endif

    use iso_c_binding
    use f_ppexsi_interface, only: f_ppexsi_options
    use f_ppexsi_interface, only: f_ppexsi_plan_finalize
    use f_ppexsi_interface, only: f_ppexsi_plan_initialize
    use f_ppexsi_interface, only: f_ppexsi_inertia_count_real_symmetric_matrix
    use f_ppexsi_interface, only: f_ppexsi_load_real_symmetric_hs_matrix
    use f_ppexsi_interface, only: f_ppexsi_set_default_options
    use f_ppexsi_interface, &
          only: f_ppexsi_symbolic_factorize_real_symmetric_matrix

3.3 Routine variables

The local variables for the routine must be declared in a specific place for the compiler, but it is clearer to introduce them as they are needed. The routine-variables noweb-ref is used for this throughout this document.

integer        :: ih, i
integer        :: info
logical        :: write_ok
!------------
external         :: timer

3.4 Define communicators

World_Comm, which is set to a duplicate of Siesta's copy of MPI_Comm_World, is the global communicator.

Some variables need to be broadcast since they were assigned only by the SIESTA worker subset of nodes. They are renamed for clarity.

integer          :: World_Comm, mpirank, ierr
!
real(dp)  :: numElectronExact, ef
integer   :: norbs, scf_step
!
integer   :: nspin
!
! Our global communicator is a duplicate of the passed communicator
!
call MPI_Comm_Dup(true_MPI_Comm_World, World_Comm, ierr)
call mpi_comm_rank( World_Comm, mpirank, ierr )

call timer("pexsi_dos", 1)  

if (SIESTA_worker) then

   ! rename and broadcast some intent(in) variables, which are only
   ! defined for the Siesta subset

   norbs = no_u
   nspin = nspin_in
   numElectronExact = qtot 
   ef = ef_in

endif
!
call broadcast(norbs,comm=World_Comm)
call broadcast(numElectronExact,World_Comm)
call broadcast(nspin,World_Comm)
call broadcast(ef,World_Comm)

Now we need to define the Siesta distribution object and the communicator and distribution object for the first team of PEXSI workers, for the purposes of re-distribution of the relevant matrices. The PEXSI library takes care of further redistribution among teams.

Note that the first team of PEXSI workers starts at the root node. This means that there is overlap between the Siesta workers and the PEXSI workers. While this is in principle more economical (and convenient for information exchange), it can pose problems later on.

I will leave it like that for now, as I do not yet know how to move information between disjoint communicators (perhaps via an intercommunicator?).

For spin, things are a bit more complicated. We need to make sure that the distributions are defined (via actual ranks) with respect to the same reference "bridge" communicator. For now, this is World_Comm.

integer :: PEXSI_Pole_Group, PEXSI_Spatial_Group, World_Group
integer, allocatable :: pexsi_pole_ranks_in_world(:)
integer  :: nworkers_SIESTA
integer, allocatable :: siesta_ranks_in_world(:)
integer :: PEXSI_Pole_Group_in_World
integer, allocatable :: PEXSI_Pole_ranks_in_World_Spin(:,:)
integer :: PEXSI_Pole_Comm, PEXSI_Spatial_Comm, PEXSI_Spin_Comm
integer :: numNodesTotal
integer :: npPerPole
logical  :: PEXSI_worker
!
type(Distribution)   :: dist1
type(Distribution), allocatable, target   :: dist2_spin(:)
type(Distribution), pointer :: dist2
integer  :: pbs, color, spatial_rank, spin_rank

Define the Siesta distribution. Note that this is known to all nodes.

call MPI_Comm_Group(World_Comm,World_Group, ierr)
call MPI_Group_Size(SIESTA_Group, nworkers_SIESTA, ierr)
allocate(siesta_ranks_in_world(nworkers_SIESTA))
call MPI_Group_translate_ranks( SIESTA_Group, nworkers_SIESTA, &
     (/ (i,i=0,nworkers_SIESTA-1) /), &
     World_Group, siesta_ranks_in_world, ierr )
call newDistribution(dist1,World_Comm,siesta_ranks_in_world, &
                     TYPE_BLOCK_CYCLIC,BlockSize,"bc dist")
deallocate(siesta_ranks_in_world)
call MPI_Barrier(World_Comm,ierr)

For possibly spin-polarized calculations, we split the communicator.

call mpi_comm_size( World_Comm, numNodesTotal, ierr )

npPerPole  = fdf_get("PEXSI.np-per-pole",4)
if (nspin*npPerPole > numNodesTotal) &
          call die("PEXSI.np-per-pole is too big for MPI size")

! "Row" communicator for independent PEXSI operations on each spin
! The name refers to "spatial" degrees of freedom.
color = mod(mpirank,nspin)    ! {0,1} for nspin = 2, or {0} for nspin = 1
call MPI_Comm_Split(World_Comm, color, mpirank, PEXSI_Spatial_Comm, ierr)

! "Column" communicator for spin reductions
color = mpirank/nspin       
call MPI_Comm_Split(World_Comm, color, mpirank, PEXSI_Spin_Comm, ierr)

! Group and Communicator for first-pole team of PEXSI workers
!
call MPI_Comm_Group(PEXSI_Spatial_Comm, PEXSI_Spatial_Group, Ierr)
call MPI_Group_incl(PEXSI_Spatial_Group, npPerPole,   &
     (/ (i,i=0,npPerPole-1) /),&
     PEXSI_Pole_Group, Ierr)
call MPI_Comm_create(PEXSI_Spatial_Comm, PEXSI_Pole_Group,&
     PEXSI_Pole_Comm, Ierr)


call mpi_comm_rank( PEXSI_Spatial_Comm, spatial_rank, ierr )
call mpi_comm_rank( PEXSI_Spin_Comm, spin_rank, ierr )
PEXSI_worker = (spatial_rank < npPerPole)   ! Could be spin up or spin down
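
! Worked example (assuming 8 MPI ranks, nspin = 2, npPerPole = 2):
!   spin "row" 0: PEXSI_Spatial_Comm groups world ranks 0, 2, 4, 6
!   spin "row" 1: PEXSI_Spatial_Comm groups world ranks 1, 3, 5, 7
!   spin "columns" (PEXSI_Spin_Comm): (0,1), (2,3), (4,5), (6,7)
!   first-pole teams (PEXSI_Pole_Comm): world ranks {0,2} and {1,3},
!   so PEXSI_worker is .true. on world ranks 0-3 only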

! PEXSI blocksize 
pbs = norbs/npPerPole

! Careful with this. For the purposes of matrix transfers,
! we need the ranks of the Pole group
! in the "bridge" communicator/group (World)

allocate(pexsi_pole_ranks_in_world(npPerPole))
call MPI_Comm_Group(World_Comm, World_Group, Ierr)

call MPI_Group_translate_ranks( PEXSI_Pole_Group, npPerPole, &
     (/ (i,i=0,npPerPole-1) /), &
     World_Group, pexsi_pole_ranks_in_world, ierr )

! What we need is to include the actual world ranks
! in the distribution object
allocate (PEXSI_Pole_ranks_in_World_Spin(npPerPole,nspin))
call MPI_AllGather(pexsi_pole_ranks_in_world,npPerPole,MPI_integer,&
     PEXSI_Pole_Ranks_in_World_Spin(1,1),npPerPole, &
     MPI_integer,PEXSI_Spin_Comm,ierr)

! Create distributions known to all nodes
allocate(dist2_spin(nspin))
do ispin = 1, nspin
   call newDistribution(dist2_spin(ispin), World_Comm, &
                        PEXSI_Pole_Ranks_in_World_Spin(:,ispin),  &
                        TYPE_PEXSI, pbs, "px dist")
enddo
deallocate(pexsi_pole_ranks_in_world,PEXSI_Pole_Ranks_in_World_Spin)
call MPI_Barrier(World_Comm,ierr)

3.5 Prepare plan

The plan is created, used, and destroyed in this routine. Presumably one could re-use the factorizations carried out during the scf cycle, but this is left for a future version.

integer(c_intptr_t) :: plan
integer :: numProcRow, numProcCol
integer :: outputFileIndex
! -- Prepare plan
call get_row_col(npPerPole,numProcRow,numProcCol)

  ! Set the outputFileIndex to be the pole index.
  ! Starting from PEXSI v0.8.0, the first processor for each pole outputs
  ! information

  if ( mod(mpirank, npPerPole) == 0 ) then
    outputFileIndex = mpirank / npPerPole
  else
    outputFileIndex = -1
  endif

   plan = f_ppexsi_plan_initialize(&
      PEXSI_Spatial_Comm,&
      numProcRow,&
      numProcCol,&
      outputFileIndex,&
      info) 
   call check_info(info,"plan_initialize in DOS")

3.6 Re-distribute matrices

This is slightly unseemly, but it works. The aux_matrix derived types are used to store and retrieve the matrices on either side. The code is in external auxiliary modules.

type(aux_matrix), allocatable, target :: m1_spin(:)
type(aux_matrix) :: m2
type(aux_matrix), pointer :: m1
integer :: nrows, nnz, nnzLocal, numColLocal
integer, pointer, dimension(:) ::  colptrLocal=> null(), rowindLocal=>null()
!
real(dp), pointer, dimension(:) :: &
        HnzvalLocal=>null(), SnzvalLocal=>null(),  &
        DMnzvalLocal => null() , EDMnzvalLocal => null(), &
        FDMnzvalLocal => null()
!
integer :: ispin, pexsi_spin
pexsi_spin = spin_rank+1  ! {1,2}
! This is done serially on the Siesta side, each time
! filling in the structures in one PEXSI set

allocate(m1_spin(nspin))
do ispin = 1, nspin

   m1 => m1_spin(ispin)

   if (SIESTA_worker) then
      m1%norbs = norbs
      m1%no_l  = no_l
      m1%nnzl  = sum(numH(1:no_l))
      m1%numcols => numH
      m1%cols    => listH
      allocate(m1%vals(2))
      m1%vals(1)%data => S(:)
      m1%vals(2)%data => H(:,ispin)

   endif  ! SIESTA_worker

   call timer("redist_orbs_fwd", 1)

   ! Note that we cannot simply wrap this in a pexsi_spin test, as
   ! there are Siesta nodes in both spin sets.
   ! We must discriminate the PEXSI workers by the distribution info
   dist2 => dist2_spin(ispin)
   call redistribute_spmatrix(norbs,m1,dist1,m2,dist2,World_Comm)

   call timer("redist_orbs_fwd", 2)

   if (PEXSI_worker .and. (pexsi_spin == ispin) ) then

      nrows = m2%norbs          ! or simply 'norbs'
      numColLocal = m2%no_l
      nnzLocal    = m2%nnzl
      call MPI_AllReduce(nnzLocal,nnz,1,MPI_integer,MPI_sum,PEXSI_Pole_Comm,ierr)

      call re_alloc(colptrLocal,1,numColLocal+1,"colptrLocal","pexsi_solver")
      colptrLocal(1) = 1
      do ih = 1,numColLocal
         colptrLocal(ih+1) = colptrLocal(ih) + m2%numcols(ih)
      enddo

      rowindLocal => m2%cols
      SnzvalLocal => m2%vals(1)%data
      HnzvalLocal => m2%vals(2)%data

      call re_alloc(DMnzvalLocal,1,nnzLocal,"DMnzvalLocal","pexsi_solver")
      call re_alloc(EDMnzvalLocal,1,nnzLocal,"EDMnzvalLocal","pexsi_solver")
      call re_alloc(FDMnzvalLocal,1,nnzLocal,"FDMnzvalLocal","pexsi_solver")

      call memory_all("after setting up H+S for PEXSI (PEXSI_workers)",PEXSI_Pole_Comm)

   endif ! PEXSI worker
enddo

! Make these available to all nodes
! (Note that the values are those on process 0, which is in the spin=1 set.)
! In fact, they are only needed for calls to the interface, so the broadcast
! could be over PEXSI_Spatial_Comm only.

call MPI_Bcast(nrows,1,MPI_integer,0,World_Comm,ierr)
call MPI_Bcast(nnz,1,MPI_integer,0,World_Comm,ierr)

call memory_all("after setting up H+S for PEXSI",World_comm)

3.7 Set options

We use the options interface to get a template with default values, and then fill in a few custom options based on fdf variables. Note that the options derived type is of limited usefulness when the simple DFT driver is not used. The most important entries are then the factorization flag and the number of processors per pole.

type(f_ppexsi_options) :: options
!
integer                :: isSIdentity
integer                :: verbosity
!
call f_ppexsi_set_default_options( options )

isSIdentity = 0

! Ordering flag:
!   1: Use METIS
!   0: Use PARMETIS/PTSCOTCH
options%ordering = fdf_get("PEXSI.ordering",1)

! Number of processors for symbolic factorization
! Only relevant for PARMETIS/PT_SCOTCH
options%npSymbFact = fdf_get("PEXSI.np-symbfact",1)

verbosity = fdf_get("PEXSI.verbosity",1)
options%verbosity = verbosity

3.8 Load H and S matrices

In this version H and S are symmetric. We associate them with the plan (I do not know in detail what happens behind the scenes; presumably no copy is made).

call f_ppexsi_load_real_symmetric_hs_matrix(&
      plan,&
      options,&
      nrows,&
      nnz,&
      nnzLocal,&
      numColLocal,&
      colptrLocal,&
      rowindLocal,&
      HnzvalLocal,&
      isSIdentity,&
      SnzvalLocal,&
      info) 

call check_info(info,"load_real_sym_hs_matrix")

3.9 Factorization

Presumably it could be done once and for all for one of the spins, but we have no way to transfer the info from the spin-up to the spin-down manifold.

call f_ppexsi_symbolic_factorize_real_symmetric_matrix(&
     plan, &
     options,&
     info)
call check_info(info,"symbolic_factorize_real_symmetric_matrix")

options%isSymbolicFactorize = 0 ! We do not need it anymore

3.10 Compute integrated DOS by inertia count

integer  :: npoints, nShifts, numNodesSpatial
real(dp), allocatable :: edos(:), intdos(:)

Since we use the processor teams corresponding to the different poles, the number of points in the energy interval should be a multiple of nShifts = numNodesSpatial/npPerPole, where numNodesSpatial is the total number of nodes per spin component. The number of points is rounded up to the nearest multiple of this number of samples per call, as there is no point (energy?) in leaving processors idle.

call mpi_comm_size( PEXSI_Spatial_Comm, numNodesSpatial, ierr )
nShifts = numNodesSpatial/npPerPole

npoints = fdf_get("PEXSI.DOS.npoints",200)

npoints = ceiling(dble(npoints)/nShifts) * nShifts
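! e.g. a requested 200 points with nShifts = 16 is rounded up to 208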

allocate (edos(npoints),intdos(npoints))

The range of energies is given in the fdf file. In principle it could be an absolute range, but it is referred to the Fermi level by default.

real(dp) :: emin, emax, delta
logical  :: ef_reference
emin = fdf_get("PEXSI.DOS.emin",-1.0_dp,"Ry")
emax = fdf_get("PEXSI.DOS.emax",+1.0_dp,"Ry")

ef_reference = fdf_get("PEXSI.DOS.Ef.Reference",.true.)
if (ef_reference) then
   emin = emin + ef
   emax = emax + ef
endif
  delta = (emax-emin)/(npoints-1)
  do j = 1, npoints
     edos(j) = emin + (j-1)*delta
  enddo

Now we call the inertia-counting routine.

call timer("pexsi-raw-inertia-ct", 1)

if(mpirank == 0) then
   write (6,"(a,f12.5,a,f12.5,a,a,i4)")  &
                'Calling inertia_count for DOS: [', &
                emin/eV, ",", emax/eV, "] (eV)", &
                " Nshifts: ", npoints
endif

call f_ppexsi_inertia_count_real_symmetric_matrix(&
  plan,&
  options,&
  npoints,&
  edos,&
  intdos,&
  info) 
call check_info(info,"inertia_count_real_symmetric_matrix in DOS")

call timer("pexsi-raw-inertia-ct", 2)

We need to gather the information for both spins, at least on the global root node, for output. In principle only a group of npPerPole processors per spin needs to take part in this. We implicitly assume that the global rank 0 is also rank 0 in the spin communicator!

real(dp), allocatable :: intdos_spin(:,:)
integer :: j, lun
allocate(intdos_spin(npoints,nspin))
call MPI_Gather( intdos, npoints, MPI_double_precision, &
     intdos_spin(1,1), npoints, MPI_double_precision, &
     0, PEXSI_Spin_Comm, ierr )

if (mpirank == 0) then
   call io_assign(lun)
   open(unit=lun,file="PEXSI_INTDOS",form="formatted",status="unknown", &
        position="rewind",action="write")
   write(lun,"(2f15.6,i6,i2,a)") ef/eV, qtot, npoints, nspin, &
        "# (Ef, qtot, npoints, nspin) / npoints lines: E(eV), IntDos(E)"
   ! Note that the inertia count procedure counts the number of eigenvalues,
   ! so for nspin=1 we have to multiply by two to get the number of states
   do j=1,npoints
      if (nspin == 1) then
         write(lun,"(f15.6,f15.2)") edos(j)/eV, 2*intdos(j)
      else
         write(lun,"(f15.6,2f15.2)") edos(j)/eV, intdos_spin(j,1), intdos_spin(j,2)
      endif
   enddo
endif

call timer("pexsi_dos", 2)

3.11 Clean up

We deallocate the auxiliary derived types for the sparse matrices, finalize the plan, and clean up the communicators and groups and the distributions.

deallocate(edos,intdos,intdos_spin)

if (SIESTA_worker) then
   ! The data pointers in m1 point to Siesta's own arrays and were not
   ! allocated here, so only the vals container itself is deallocated
   deallocate(m1%vals)
endif

call delete(dist1)
do ispin = 1, nspin
   call delete(dist2_spin(ispin))
enddo
deallocate(dist2_spin)
deallocate(m1_spin)

if (PEXSI_worker) then

   call de_alloc(colptrLocal,"colptrLocal","pexsi_dos")

   call de_alloc(m2%numcols,"m2%numcols","m_pexsi_dos")
   call de_alloc(m2%cols,"m2%cols","m_pexsi_dos")
   do j=1,size(m2%vals)
      call de_alloc(m2%vals(j)%data,"m2%vals(j)%data","m_pexsi_dos")
   enddo
   deallocate(m2%vals)

endif

call f_ppexsi_plan_finalize( plan, info )

call MPI_Comm_Free(PEXSI_Spatial_Comm, ierr)
call MPI_Comm_Free(PEXSI_Spin_Comm, ierr)
call MPI_Comm_Free(World_Comm, ierr)

! This communicator was created from a subgroup.
! As such, it is MPI_COMM_NULL for those processes
! not in the subgroup (non PEXSI_workers). Only
! defined communicators can be freed
if (PEXSI_worker) then
   call MPI_Comm_Free(PEXSI_Pole_Comm, ierr)
endif

call MPI_Group_Free(PEXSI_Spatial_Group, ierr)
call MPI_Group_Free(PEXSI_Pole_Group, ierr)
call MPI_Group_Free(World_Group, ierr)

3.12 TODO Support routines

Several routines

<<get-row-col>>
<<check-info>>

3.12.1 Row and column partition of npPerPole

subroutine get_row_col(np,nrow,ncol)
integer, intent(in)  :: np
integer, intent(out) :: nrow, ncol
!
! Finds the factors nrow and ncol such that nrow*ncol=np,
! are as similar as possible, and nrow>=ncol.
! For prime np, ncol=1, nrow=np.

ncol  = floor(sqrt(dble(np)))
do
  nrow = np/ncol
  if (nrow*ncol == np) exit
  ncol = ncol - 1
enddo
end subroutine get_row_col
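
For example, np = 12 gives nrow = 4 and ncol = 3, while a prime np such as 7 gives nrow = 7 and ncol = 1.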

3.12.2 Error dispatcher

subroutine check_info(info,str)
integer, intent(in) :: info
character(len=*), intent(in) :: str

if (mpirank == 0) then
   if (info /= 0) then
      write(6,*) trim(str) // " info : ", info
      call die("Error exit from " // trim(str) // " routine")
   endif
   call pxfflush(6)
endif
end subroutine check_info

Author: Alberto Garcia

Created: 2016-09-14 Wed 11:32
