Debugging Hints
Here we list some common assertion failures, errors, and wrong outputs, together with their solutions.
Ground State Calculation
[2021-05-09]
Assertion:
block2/parallel_mpi.hpp:330: void block2::MPICommunicator<S>::reduce_sum(double*, size_t, int) [with S = block2::SU2Long; size_t = long unsigned int]: Assertion `ierr == 0' failed.
Conditions: More than one MPI processor, QCTypes.Conventional, Random.rand_seed(0), and gaopt. The assertion fails randomly.
Reason: Each MPI process used a different gaopt reordering, so the processes ended up with different integrals. The error then appears during the initialization of the environments as an array-size mismatch.
Solution: Broadcast the orbital reordering indices from the root process before reordering the integrals.
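For example, a minimal sketch of this broadcast (assuming mpi4py is available, get_gaopt_ordering is a hypothetical placeholder for however the gaopt indices are obtained in your script, and the reordering is applied with the usual FCIDUMP.reorder call):
from mpi4py import MPI as PyMPI
from block2 import VectorUInt16
comm = PyMPI.COMM_WORLD
# compute the gaopt ordering only on the root rank, then broadcast it,
# so that every rank reorders the integrals identically
idx = get_gaopt_ordering(fcidump) if comm.rank == 0 else None
idx = comm.bcast(idx, root=0)
fcidump.reorder(VectorUInt16(idx), True)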
[2021-05-10]
Assertion:
block2/operator_functions.hpp:185: void block2::OperatorFunctions<S>::tensor_rotate(const std::shared_ptr<block2::SparseMatrix<S> >&, const std::shared_ptr<block2::SparseMatrix<S> >&, const std::shared_ptr<block2::SparseMatrix<S> >&, const std::shared_ptr<block2::SparseMatrix<S> >&, bool, double) const [with S = block2::SZLong]: Assertion `a->get_type() == SparseMatrixTypes::Normal && c->get_type() == SparseMatrixTypes::Normal && rot_bra->get_type() == SparseMatrixTypes::Normal && rot_ket->get_type() == SparseMatrixTypes::Normal' failed.
Conditions: Loaded MPO, CSR.
Reason: After loading the MPO, the non-CSR OperatorFunctions is used for a calculation that requires CSR matrices.
Solution: Change csr_opf = OperatorFunctions(cg) to csr_opf = CSROperatorFunctions(cg).
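A minimal sketch of the fix (assuming cg and the loaded mpo are objects already present in the script, and that the tensor functions of the loaded MPO are rebuilt from the CSR operator functions):
csr_opf = CSROperatorFunctions(cg)   # CSR-aware, instead of OperatorFunctions(cg)
mpo.tf = TensorFunctions(csr_opf)    # reattach tensor functions to the loaded MPO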
[2021-05-11]
Output:
Sweep = 15 | Direction = backward | Bond dimension = 2000 | Noise = 1.00e-07 | Dav threshold = 1.00e-08
<-- Site = 11- 12 .. Mmps = 3 Ndav = 1 E = -36.8356589402 Error = 0.00e+00 FLOPS = 4.12e+06 Tdav = 0.02 T = 0.17
<-- Site = 10- 11 .. Mmps = 10 Ndav = 1 E = -36.8356589402 Error = 0.00e+00 FLOPS = 2.91e+08 Tdav = 0.02 T = 0.18
<-- Site = 9- 10 .. Mmps = 35 Ndav = 1 E = -36.8356589402 Error = 0.00e+00 FLOPS = 9.70e+09 Tdav = 0.02 T = 0.20
<-- Site = 8- 9 .. Mmps = 126 Ndav = 1 E = -36.8356589402 Error = 0.00e+00 FLOPS = 6.96e+10 Tdav = 0.06 T = 0.40
<-- Site = 7- 8 .. Mmps = 462 Ndav = 1 E = -36.8356589402 Error = 0.00e+00 FLOPS = 1.52e+11 Tdav = 0.28 T = 1.08
<-- Site = 6- 7 .. Mmps = 1454 Ndav = 1 E = -36.8356589402 Error = 4.57e-13 FLOPS = 2.27e+11 Tdav = 0.79 T = 2.54
<-- Site = 5- 6 .. Mmps = 1679 Ndav = 12 E = -37.0888587109 Error = 1.41e-12 FLOPS = 2.83e+11 Tdav = 7.21 T = 12.32
<-- Site = 4- 5 .. Mmps = 904 Ndav = 1 E = -37.0888587109 Error = 1.53e-12 FLOPS = 1.95e+11 Tdav = 0.27 T = 1.91
<-- Site = 3- 4 .. Mmps = 490 Ndav = 1 E = -37.0888587109 Error = 8.62e-13 FLOPS = 7.32e+10 Tdav = 0.05 T = 0.60
<-- Site = 2- 3 .. Mmps = 209 Ndav = 1 E = -37.0888587109 Error = 2.47e-13 FLOPS = 9.69e+09 Tdav = 0.02 T = 0.28
<-- Site = 1- 2 .. Mmps = 64 Ndav = 1 E = -37.0888587109 Error = 9.69e-15 FLOPS = 1.06e+09 Tdav = 0.01 T = 0.26
<-- Site = 0- 1 .. Mmps = 11 Ndav = 1 E = -37.0888587109 Error = 5.58e-15 FLOPS = 5.78e+06 Tdav = 0.02 T = 0.16
Time elapsed = 187.772 | E = -37.0888587109 | DE = -6.18e-12 | DW = 1.53e-12
Time sweep = 20.100 | 2.11 TFLOP/SWP
| Tcomm = 7.916 | Tidle = 3.657 | Twait = 0.000 | Dmem = 89.2 MB (11%) | Imem = 93.8 KB (96%) | Hmem = 736 MB | Pmem = 50.8 MB
| Tread = 0.505 | Twrite = 0.553 | Tfpread = 0.462 | Tfpwrite = 0.090 | Tasync = 0.000
| Trot = 0.368 | Tctr = 0.055 | Tint = 0.016 | Tmid = 2.304 | Tdctr = 0.033 | Tdiag = 0.310 | Tinfo = 0.039
| Teff = 1.591 | Tprt = 2.578 | Teig = 8.760 | Tblk = 16.722 | Tmve = 3.376 | Tdm = 0.000 | Tsplt = 0.000 | Tsvd = 1.678
Sweep = 16 | Direction = forward | Bond dimension = 2000 | Noise = 1.00e-07 | Dav threshold = 1.00e-08
--> Site = 0- 1 .. Mmps = 3 Ndav = 1 E = -37.0888587109 Error = 0.00e+00 FLOPS = 4.51e+06 Tdav = 0.02 T = 0.18
--> Site = 1- 2 .. Mmps = 10 Ndav = 1 E = -37.0888587109 Error = 0.00e+00 FLOPS = 8.65e+08 Tdav = 0.01 T = 0.09
--> Site = 2- 3 .. Mmps = 35 Ndav = 1 E = -37.0888587109 Error = 0.00e+00 FLOPS = 1.03e+10 Tdav = 0.02 T = 0.11
--> Site = 3- 4 .. Mmps = 126 Ndav = 1 E = -37.0888587109 Error = 0.00e+00 FLOPS = 7.65e+10 Tdav = 0.05 T = 0.35
--> Site = 4- 5 .. Mmps = 462 Ndav = 1 E = -37.0888587109 Error = 0.00e+00 FLOPS = 1.61e+11 Tdav = 0.32 T = 1.25
--> Site = 5- 6 .. Mmps = 1511 Ndav = 1 E = -37.0888587109 Error = 3.25e-13 FLOPS = 2.24e+11 Tdav = 0.76 T = 2.50
--> Site = 6- 7 .. Mmps = 1805 Ndav = 17 E = -36.8356599462 Error = 1.65e-12 FLOPS = 3.10e+11 Tdav = 10.53 T = 15.73
--> Site = 7- 8 .. Mmps = 975 Ndav = 1 E = -36.8356599462 Error = 1.51e-12 FLOPS = 1.59e+11 Tdav = 0.38 T = 2.13
--> Site = 8- 9 .. Mmps = 408 Ndav = 1 E = -36.8356599462 Error = 8.11e-13 FLOPS = 7.52e+10 Tdav = 0.06 T = 0.53
--> Site = 9- 10 .. Mmps = 156 Ndav = 1 E = -36.8356599462 Error = 2.57e-13 FLOPS = 1.16e+10 Tdav = 0.02 T = 0.13
--> Site = 10- 11 .. Mmps = 57 Ndav = 1 E = -36.8356599462 Error = 1.46e-14 FLOPS = 4.29e+08 Tdav = 0.01 T = 0.18
--> Site = 11- 12 .. Mmps = 12 Ndav = 1 E = -36.8356599462 Error = 4.83e-15 FLOPS = 4.13e+06 Tdav = 0.02 T = 0.06
Time elapsed = 211.003 | E = -37.0888587109 | DE = 4.21e-12 | DW = 1.65e-12
Time sweep = 23.231 | 3.23 TFLOP/SWP
| Tcomm = 8.521 | Tidle = 2.996 | Twait = 0.000 | Dmem = 95.4 MB (10%) | Imem = 93.8 KB (96%) | Hmem = 736 MB | Pmem = 52.5 MB
| Tread = 0.550 | Twrite = 0.624 | Tfpread = 0.504 | Tfpwrite = 0.092 | Tasync = 0.000
| Trot = 0.385 | Tctr = 0.039 | Tint = 0.023 | Tmid = 2.480 | Tdctr = 0.052 | Tdiag = 0.323 | Tinfo = 0.035
| Teff = 1.734 | Tprt = 2.656 | Teig = 12.197 | Tblk = 19.563 | Tmve = 3.667 | Tdm = 0.000 | Tsplt = 0.000 | Tsvd = 1.508
Conditions: More than one MPI processor, and QCTypes.Conventional.
Reason: The output shows that the energy jumps between two values even at a very large bond dimension. With a single MPI process there is no such behavior. The cause is that the input integrals h1e and g2e are not synchronized across processes. In QCTypes.Conventional, communication between MPI processes only happens at the middle site. After this communication, the inconsistency between the integrals can cause an artificial change in the energy. Note that inside block2, the integrals are not explicitly synchronized. In the future, for larger systems, the integrals may even be distributed, so that synchronization would not be meaningful.
Solution: Synchronizing the input integrals h1e and g2e solves this problem.
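A minimal sketch of such a synchronization, using mpi4py and assuming h1e and g2e are plain numpy arrays prepared on the root rank:
from mpi4py import MPI as PyMPI
comm = PyMPI.COMM_WORLD
# broadcast the integrals from the root rank so that all ranks hold identical copies
h1e = comm.bcast(h1e if comm.rank == 0 else None, root=0)
g2e = comm.bcast(g2e if comm.rank == 0 else None, root=0)
# then build the FCIDUMP / Hamiltonian from the synchronized arrays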
[2021-05-12 | 2021-06-07]
Error Message: (note that this problem in block2main has been fixed in commit 4f87784)
Traceback (most recent call last):
File "block2/pyblock2/driver/block2main", line 302, in <module>
mps.load_data()
RuntimeError: MPS::load_data on '/central/scratch/.../F.MPS.KET.-1' failed.
or
Traceback (most recent call last):
File "block2/pyblock2/driver/block2main", line 313, in <module>
mps.load_mutable()
RuntimeError: SparseMatrix:load_data on '/central/scratch/.../F.MPS.KET.14' failed.
or
Traceback (most recent call last):
File "block2/pyblock2/driver/block2main", line 313, in <module>
mps.load_mutable()
ValueError: cannot create std::vector larger than max_size()
Conditions: More than one MPI processor, python driver; happens with a very low probability.
Reason A: The problematic code is:
mps.load_data()
if mps.dot != dot and nroots == 1:
mps.dot = dot
mps.save_data()
A non-root MPI process may load the data before or after the root process saves it. The incorrectly loaded data can cause the subsequent mps.load_mutable() to fail.
Solution A: Add MPI.barrier() around mps.save_data().
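A minimal sketch of Solution A, assuming MPI is the MPICommunicator object used elsewhere in the driver (or None in a serial run):
mps.load_data()
if mps.dot != dot and nroots == 1:
    mps.dot = dot
    if MPI is not None:
        MPI.barrier()    # no rank may still be reading the old file
    mps.save_data()
    if MPI is not None:
        MPI.barrier()    # the new file is complete before any rank reloads it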
Reason B: The problematic code is:
mps.load_mutable()
mps.save_mutable()
A non-root MPI process may load the data before or after the root process saves it, so the same file may be read and written simultaneously (with a very low probability).
Solution B: Add MPI.barrier() between mps.load_mutable() and mps.save_mutable().
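A minimal sketch of Solution B, with the same MPI object:
mps.load_mutable()
if MPI is not None:
    MPI.barrier()        # all ranks finish reading before the tensors are rewritten
mps.save_mutable()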
Linear
[2021-05-14]
Assertion:
block2/moving_environment.hpp:110: block2::MovingEnvironment<S>::MovingEnvironment(const std::shared_ptr<block2::MPO<S> >&, const std::shared_ptr<block2::MPS<S> >&, const std::shared_ptr<block2::MPS<S> >&, const string&) [with S = block2::SU2Long; std::string = std::__cxx11::basic_string<char>]: Assertion `bra->center == ket->center && bra->dot == ket->dot' failed.
Conditions: Different bra and ket.
Reason: The bra and ket used to initialize the MovingEnvironment do not have the same center.
Solution: Initialize the bra and ket with consistent centers, or do a sweep to align the MPS centers.
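A minimal sketch of aligning the centers before constructing the MovingEnvironment (assuming bra and ket are in compatible canonical forms, cg is the CG object of the Hamiltonian, the parallel rule argument may be None in a serial run, and "ME" is an arbitrary tag):
assert bra.dot == ket.dot
while bra.center > ket.center:
    bra.move_left(cg, None)      # shift the bra center one site to the left
while bra.center < ket.center:
    bra.move_right(cg, None)     # shift the bra center one site to the right
me = MovingEnvironment(mpo, bra, ket, "ME")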
[2021-05-14]
Assertion:
block2/operator_functions.hpp:194: void block2::OperatorFunctions<S>::tensor_rotate(const std::shared_ptr<block2::SparseMatrix<S> >&, const std::shared_ptr<block2::SparseMatrix<S> >&, const std::shared_ptr<block2::SparseMatrix<S> >&, const std::shared_ptr<block2::SparseMatrix<S> >&, bool, double) const [with S = block2::SU2Long]: Assertion `adq == cdq && a->info->n >= c->info->n' failed.
Conditions 1: Different bra and ket.
Reason 1: The bra and ket have different MPSInfo objects, but the two MPSInfo objects share the same tag. When saving to or loading from disk, the data stored under the two MPSInfo objects can interfere with each other.
Solution 1: Use different tags for different MPSInfo objects.
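A minimal sketch, assuming the two MPSInfo objects are created explicitly in the script:
bra_info.tag = "BRA"     # distinct tag for the bra
ket_info.tag = "KET"     # distinct tag for the ket
bra_info.save_mutable()  # each MPSInfo now writes to its own set of files
ket_info.save_mutable()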
Conditions 2: MPSInfo in MPS differs from data in MPS.
Reason 2: An MPS has been loaded from disk with a wrong MPSInfo.
Solution 2: Load the MPSInfo as well, or make sure the MPSInfo is correct.
[2021-05-14]
Assertion:
block2/csr_matrix_functions.hpp:387: static void block2::CSRMatrixFunctions::multiply(const MatrixRef&, bool, const block2::CSRMatrixRef&, bool, const MatrixRef&, double, double): Assertion `(conja ? a.m : a.n) == (conjb ? b.n : b.m)' failed.
Conditions: Different bra and ket, CSR, IdentityMPO with bra and ket having different bases.
Reason: The wrong basis was used in the constructor of IdentityMPO.
Solution: Change IdentityMPO(mpo_bra.basis, mpo_bra.basis, ...) to IdentityMPO(mpo_bra.basis, mpo_ket.basis, ...).
[2021-05-18]
Assertion:
block2/csr_matrix_functions.hpp:396: static void block2::CSRMatrixFunctions::multiply(const MatrixRef&, bool, const block2::CSRMatrixRef&, bool, const MatrixRef&, double, double): Assertion `st == SPARSE_STATUS_SUCCESS' failed.
Conditions: CSR, SeqTypes.Tasked.
Reason: SeqTypes.Tasked cannot be used together with CSR.
Solution: Change Global.threading.seq_type = SeqTypes.Tasked to Global.threading.seq_type = SeqTypes.Nothing.
[2021-05-22]
Assertion:
block2/sparse_matrix.hpp:552: void block2::SparseMatrixInfo<S, typename std::enable_if<std::integral_constant<bool, (sizeof (S) == sizeof (unsigned int))>::value>::type>::save_data(std::ostream&, bool) const [with S = block2::SU2Long; typename std::enable_if<std::integral_constant<bool, (sizeof (S) == sizeof (unsigned int))>::value>::type = void; std::ostream = std::basic_ostream<char>]: Assertion `n != -1' failed.
Conditions: mps.save_mutable.
Reason: Some MPS tensors are deallocated (unloaded) after mps.flip_fused_form(...) or mps.move_left(...).
Solution: Call mps.load_mutable() after using mps.flip_fused_form(...) or mps.move_left(...), so that mps.save_mutable() will be successful.
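A minimal sketch of the required order of calls (move_left is shown; flip_fused_form behaves the same way; the parallel rule argument may be None in a serial run):
mps.move_left(cg, None)  # unloads some site tensors internally
mps.load_mutable()       # reload all site tensors
mps.save_mutable()       # now succeeds with complete tensor data
mps.save_data()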
[2021-05-31]
Error:
exceeding allowed memory
Conditions: Linear with tme != nullptr.
Reason: By default, no bond dimension truncation is performed for the MPS in Linear::tme.
Solution: Set the target_bra_bond_dim and target_ket_bond_dim fields in Linear to a suitable bond dimension.
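A minimal sketch, assuming linear is the Linear object with tme already attached; 500 is an arbitrary example value:
linear.target_bra_bond_dim = 500   # truncate the bra MPS handled by tme
linear.target_ket_bond_dim = 500   # truncate the ket MPS handled by tme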
[2021-06-07]
Assertion:
block2/parallel_rule.hpp:215: void block2::ParallelRule<S>::distributed_apply(T, const std::vector<std::shared_ptr<block2::OpExpr<S> > >&, const std::vector<std::shared_ptr<block2::OpExpr<S> > >&, std::vector<std::shared_ptr<block2::SparseMatrix<S> > >&) const [with T = block2::ParallelTensorFunctions<S>::right_contract(const std::shared_ptr<block2::OperatorTensor<S> >&, const std::shared_ptr<block2::OperatorTensor<S> >&, std::shared_ptr<block2::OperatorTensor<S> >&, const std::shared_ptr<block2::Symbolic<S> >&, block2::OpNamesSet) const [with S = block2::SZLong]::<lambda(const std::vector<std::shared_ptr<block2::OpExpr<block2::SZLong> >, std::allocator<std::shared_ptr<block2::OpExpr<block2::SZLong> > > >&)>; S = block2::SZLong]: Assertion `expr->get_type() == OpTypes::ExprRef' failed.
Conditions: ParallelMPO.
Reason: The problematic code is:
impo = IdentityMPOSCI(hamil)
impo = ParallelMPO(impo, ParallelRuleIdentity(MPI))
In most cases, ParallelMPO does not work with an unsimplified MPO. The MPO should first be simplified and then parallelized.
Solution: Use ClassicParallelMPO (which may have bad performance) or change the code to
impo = IdentityMPOSCI(hamil)
impo = SimplifiedMPO(impo, Rule())
impo = ParallelMPO(impo, ParallelRuleIdentity(MPI))
[2021-06-08]
Assertion:
block2/sparse_matrix.hpp:1548: void block2::SparseMatrix<S>::swap_to_fused_left(const std::shared_ptr<block2::SparseMatrix<S> >&, const block2::StateInfo<S>&, const block2::StateInfo<S>&, const block2::StateInfo<S>&, const block2::StateInfo<S>&, const block2::StateInfo<S>&, const block2::StateInfo<S>&, const block2::StateInfo<S>&, const std::shared_ptr<block2::CG<S> >&) [with S = block2::SZLong]: Assertion `mat->info->is_wavefunction' failed.
Conditions: IdentityMPO used in an MPI simulation without ParallelMPO.
Reason: The problematic code is:
impo = IdentityMPOSCI(hamil)
me = MovingEnvironment(impo, mps1, mps2)
Solution: Use ParallelMPO (vide supra):
impo = IdentityMPOSCI(hamil)
impo = SimplifiedMPO(impo, Rule())
impo = ParallelMPO(impo, ParallelRuleIdentity(MPI))
[2021-08-20]
Assertion:
dmrg/mps.hpp:1547: void block2::MPS<S>::move_left(const std::shared_ptr<block2::CG<S> >&, const std::shared_ptr<block2::ParallelRule<S> >&) [with S = block2::SU2Long]: Assertion `tensors[center - 1]->info->n != 0' failed.
Reason: An SZ MPS was loaded for use in SU2 code.
[2021-12-14]
Assertion:
core/matrix_functions.hpp:307: static void block2::GMatrixFunctions<double>::multiply(const MatrixRef&, uint8_t, const MatrixRef&, uint8_t, const MatrixRef&, double, double): Assertion `a.n >= b.m && c.m == a.m && c.n >= b.n' failed.
Reason: For a transition reduced density matrix, if the bra and ket are MPSs with the same tag, they must be the same object. For example, this is the case when they are the same i-th root of a state-averaged MultiMPS. Therefore, one should not "extract" the same root twice with the same tag; doing so creates a conflict in the disk storage. This was a bug in the main driver for the onedot transition one-/two-particle reduced density matrices with more than one root.
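A minimal sketch of the intended usage, assuming mmps is the state-averaged MultiMPS, iroot is the root of interest, and MultiMPS.extract takes the root index and a tag:
ket = mmps.extract(iroot, "KET-%d" % iroot)   # extract the root once, under one tag
bra = ket                                     # reuse the same object for the bra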
MRCI/SCI Computations
[2021-06-08]
Error:
find_site_op_info cant find q:< N=? SZ=? PG=? >iSite=??
Conditions: Issue with quantum number setup.
Reason: This can happen if symmetry is used but the integrals do not obey the symmetry.
Solution: Add the following code. Attention: this will change the fcidump. Use with care and check symmetrize_error:
symmetrize_error = fcidump.symmetrize(orb_sym)
Library Import
[2022-08-18]
Error:
*** Error in `python': double free or corruption (out)
Reason: This can happen when two pybind11 libraries built with different compiler versions are imported in the same python script.
Solution: One workaround is to split the work into two python scripts so that the two pybind11 libraries are never imported in the same script. Otherwise, one needs to compile both extensions manually, or use the pip version of both libraries, so that they can be used together.
[2022-08-19]
Error:
/.../libmkl_avx2.so: undefined symbol: mkl__blas_write_lock_dgemm_hashtable
INTEL MKL ERROR: /.../libmkl_avx2.so.1: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
OSError: /.../libmkl_def.so: undefined symbol: mkl_dnn_getTtl_F32
Solution: This can be solved by adding the link flag -Wl,--no-as-needed together with the absolute *.so paths of all MKL libraries.
Note that the flags -Wl,-rpath -Wl,/../lib -L/.../lib should not be used.
A special case is when both the ".so.1" (2021 MKL) and ".so" (2019 MKL) libraries are present. One has to make sure that the block2.so library is linked against only ".so.1" or only ".so" MKL libraries, not a mixture of both.
[2022-10-11]
Error:
The SVD enters a dead loop with full CPU usage; the CPU has avx512.
Solution: Update MKL from 2019 to 2021.4 or 2022.1.
[2022-12-26]
Error: The program gets stuck after inserting cout in OpenMP parallel code.
Reason: cout is not thread-safe.
Solution: Use printf instead.