Global Settings
In block2, we try to minimize the use of global variables. Two global variables (frame_() and threading_()) are used to control global settings such as stack memory, the scratch folder, and threading schemes. Note that in block2 the distributed parallelization scheme is handled locally.
Threading
-
enum block2::ThreadingTypes
An indicator of where OpenMP shared-memory threading should be activated. With nested OpenMP, the total number of nested threading layers is determined from this enumeration.
For each enumerator, the number in brackets is the total number of threading layers.
Values:
-
enumerator SequentialGEMM
[0] seq mkl
-
enumerator BatchedGEMM
[1] parallel mkl
-
enumerator Quanta
[1] openmp quanta + seq mkl
-
enumerator QuantaBatchedGEMM
[2] openmp quanta + parallel mkl
-
enumerator Operator
[1] openmp operator
-
enumerator OperatorBatchedGEMM
[2] openmp operator + parallel mkl
-
enumerator OperatorQuanta
[2] openmp operator + openmp quanta
-
enumerator OperatorQuantaBatchedGEMM
[3] openmp operator + openmp quanta + parallel mkl
-
enumerator Global
[1] openmp for general non-core-algorithm tasks
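The bracketed layer counts above are consistent with a bit-flag encoding in which each combined enumerator is the bitwise OR of the basic ones. The following is a minimal sketch under that assumption. ToyThreadingTypes, its numeric values, has_flag and count_core_layers are hypothetical names for illustration and are not taken from the block2 headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for block2::ThreadingTypes. The numeric values are
// assumptions chosen so that combined enumerators are bitwise ORs of the
// basic ones.
enum class ToyThreadingTypes : std::uint8_t {
    SequentialGEMM = 0,
    BatchedGEMM = 1,
    Quanta = 2,
    QuantaBatchedGEMM = 2 | 1,
    Operator = 4,
    OperatorBatchedGEMM = 4 | 1,
    OperatorQuanta = 4 | 2,
    OperatorQuantaBatchedGEMM = 4 | 2 | 1,
    Global = 8
};

// Combine two flags, as in ThreadingTypes::Global | ThreadingTypes::BatchedGEMM.
inline ToyThreadingTypes operator|(ToyThreadingTypes a, ToyThreadingTypes b) {
    return static_cast<ToyThreadingTypes>(static_cast<std::uint8_t>(a) |
                                          static_cast<std::uint8_t>(b));
}

// Test whether any bit of flag f is set in t.
inline bool has_flag(ToyThreadingTypes t, ToyThreadingTypes f) {
    return (static_cast<std::uint8_t>(t) & static_cast<std::uint8_t>(f)) != 0;
}

// Count the layers requested by the Operator, Quanta and BatchedGEMM bits.
// The Global bit is deliberately ignored here.
inline int count_core_layers(ToyThreadingTypes t) {
    assert(static_cast<std::uint8_t>(t) < 16); // only four flag bits are defined
    int n = 0;
    for (std::uint8_t bits = static_cast<std::uint8_t>(t) & 7; bits != 0; bits >>= 1)
        n += bits & 1;
    return n;
}
```

With these assumed values, count_core_layers reproduces the bracketed numbers for the eight core enumerators, from 0 for SequentialGEMM up to 3 for OperatorQuantaBatchedGEMM.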
-
enum block2::SeqTypes
Method of GEMM (dense matrix multiplication) parallelism. For CSR matrix multiplication, the only possible case is SeqTypes::None, but one can still use SeqTypes::Simple, and it will only parallelize dense matrix multiplication.
Values:
-
enumerator None
GEMM are not parallelized. Parallelism may happen inside each GEMM, if a threaded version of MKL is linked.
-
enumerator Simple
GEMMs written to different outputs are parallelized; otherwise they are executed sequentially. In this mode, the code sorts the GEMMs and divides them into several groups (batches). Inside each batch, the output addresses are guaranteed to be different. cblas_dgemm_batch is invoked to compute each batch.
-
enumerator Auto
DGEMMs are automatically divided into several batches only when there are data dependencies. Output conflicts are automatically resolved by introducing temporary arrays. cblas_dgemm_batch is invoked to compute each batch. This option normally requires a large amount of preprocessing time and introduces a large number of temporary arrays, which is not memory friendly.
-
enumerator Tasked
GEMMs are evenly divided into n_threads groups. Different groups are executed in different threads. Since different threads may write into the same output array, there is an additional reduction step after all GEMMs finish. This mode is mainly implemented for the Davidson matrix-vector step (tensor_product_multiply), where the size of the output array (wavefunction) is small compared to that of all input arrays. For the blocking/rotation step, SeqTypes::Tasked has no effect and is equivalent to SeqTypes::None. cblas_dgemm_batch is not used in this mode.
-
enumerator SimpleTasked
This is the same as SeqTypes::Tasked for the Davidson matrix-vector step, and the same as SeqTypes::Simple for other steps.
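The batching rule described for SeqTypes::Simple (no two GEMMs in one batch write to the same output) can be illustrated with a small greedy grouping. This is only a sketch: ToyGemmTask and split_into_batches are hypothetical names, and the greedy strategy is an assumption standing in for whatever sorting and division block2 actually performs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical description of one GEMM call. Only the id of the output
// buffer matters when deciding which calls may share a batch.
struct ToyGemmTask {
    int output_id;
};

// Greedily place each task into the first batch that does not yet contain
// a task writing to the same output. Returns the task indices per batch.
inline std::vector<std::vector<std::size_t>>
split_into_batches(const std::vector<ToyGemmTask> &tasks) {
    std::vector<std::vector<std::size_t>> batches;
    std::vector<std::set<int>> outputs; // output ids already used in each batch
    for (std::size_t i = 0; i < tasks.size(); i++) {
        std::size_t b = 0;
        while (b < batches.size() && outputs[b].count(tasks[i].output_id) != 0)
            b++;
        if (b == batches.size()) { // every existing batch conflicts: open a new one
            batches.emplace_back();
            outputs.emplace_back();
        }
        batches[b].push_back(i);
        outputs[b].insert(tasks[i].output_id);
    }
    assert(batches.size() == outputs.size());
    return batches;
}
```

For six tasks with output ids 0, 1, 0, 2, 1, 0 this produces three batches, one per repeated write to output 0. Each batch could then be handed to a single batched GEMM call.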
-
struct Threading
Global information for threading schemes.
Public Functions
-
inline bool openmp_available() const
Whether openmp compiler option is set.
-
inline bool tbb_available() const
Whether tbb memory allocator is used.
-
inline bool mkl_available() const
Whether MKL math library is used.
-
inline bool blis_available() const
Whether BLIS math library is used.
-
inline bool complex_available() const
Whether complex number extension is used.
-
inline bool single_precision_available() const
Whether single precision extension is used.
-
inline bool ksymm_available() const
Whether K symmetry extension is used.
-
inline string get_mkl_version() const
Check version of the linked MKL library.
- Returns:
A version string of the linked MKL library if MKL is linked, or an empty string otherwise.
-
inline string get_mkl_threading_type() const
Return a string indicating which threaded MKL library is linked.
-
inline string get_seq_type() const
Return a string indicating which SeqTypes is used.
-
inline int get_thread_id() const
If inside an OpenMP parallel region, return the id of the current thread.
-
inline int activate_global() const
Set number of threads for a general task. Parallelism inside MKL will be deactivated for a general task.
- Returns:
Number of threads for general tasks. Returns 1 if openMP should not be used for a general task.
-
inline int activate_global_mkl() const
Set number of threads for a general task with parallelism inside MKL. Parallelism outside MKL will be deactivated.
- Returns:
Number of threads for general tasks. Returns 1 if MKL is not supported.
-
inline int activate_normal() const
Set number of threads for a normal (parallelism over renormalized operators) task.
- Returns:
Number of threads for parallelism over renormalized operators.
-
inline int activate_operator() const
Set number of threads for parallelism over renormalized operators.
- Returns:
Number of threads for parallelism over renormalized operators.
-
inline int activate_quanta() const
Set number of threads for parallelism over symmetry sectors.
- Returns:
Number of threads for parallelism over symmetry sectors.
-
inline Threading()
Default constructor. Uses ThreadingTypes::Global | ThreadingTypes::BatchedGEMM with the maximal available number of threads, and SeqTypes::None for dense matrix multiplication.
-
inline Threading(ThreadingTypes type, int nta = -1, int ntb = -1, int ntc = -1, int ntd = -1)
Constructor.
- Parameters:
type – Type of the threading scheme.
nta – Number of threads for a general task (if ThreadingTypes::Global is set) or number of threads in the first threading layer.
ntb – Number of threads in the first threading layer for a non-general threaded task (if ThreadingTypes::Global is set) or number of threads in the second threading layer.
ntc – Number of threads in the second threading layer for a non-general threaded task (if ThreadingTypes::Global is set) or number of threads in the third threading layer.
ntd – Number of threads in the third threading layer for a non-general threaded task (if ThreadingTypes::Global is set).
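The meaning of nta through ntd shifts by one position depending on whether ThreadingTypes::Global is set. The sketch below spells out that mapping. ToyThreadCounts and assign_thread_counts are hypothetical helpers and not part of block2. The value -1 is kept as a "not given" marker, and how block2 resolves it is not covered here.

```cpp
#include <array>
#include <cassert>

// Hypothetical result type: thread counts for general tasks and for the
// first, second and third nested threading layers. -1 means "not given".
struct ToyThreadCounts {
    int global;
    std::array<int, 3> layer;
};

// Map the constructor arguments onto layers following the parameter
// descriptions above. has_global stands for "ThreadingTypes::Global is set".
inline ToyThreadCounts assign_thread_counts(bool has_global, int nta = -1,
                                            int ntb = -1, int ntc = -1,
                                            int ntd = -1) {
    if (has_global)
        return ToyThreadCounts{nta, {{ntb, ntc, ntd}}};
    // Without Global there are at most three layers, so ntd has no role.
    assert(ntd == -1);
    return ToyThreadCounts{-1, {{nta, ntb, ntc}}};
}
```

For example, assign_thread_counts(true, 8, 4, 2) assigns 8 threads to general tasks, 4 to the first layer and 2 to the second, whereas assign_thread_counts(false, 4, 2) assigns 4 to the first layer and 2 to the second.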
Public Members
-
ThreadingTypes type
Type of the threading scheme.
-
int n_threads_op = 0
Number of threads for parallelism over renormalized operators.
-
int n_threads_quanta = 0
Number of threads for parallelism over symmetry sectors.
-
int n_threads_mkl = 0
Number of threads for parallelism within dense matrix multiplications.
-
int n_threads_global = 0
Number of threads for general tasks.
-
int n_levels = 0
Number of nested threading layers.
-
inline shared_ptr<Threading> &block2::threading_()
Implementation of the threading global variable.
-
threading
Global variable containing information for shared-memory parallelism schemes and number of threads used for each threading layer.
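An accessor that returns a reference to a shared_ptr, as threading_() does, is commonly written with a function-local static so that the global can live in a header. The sketch below shows that idiom with hypothetical names (ToyThreading, toy_threading_, set_global_threads). Whether block2 implements threading_() exactly this way is an assumption.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the Threading struct.
struct ToyThreading {
    int n_threads_global = 0;
};

// Accessor returning a reference to a function-local static pointer.
// The static is initialized once, on first use, and every call returns
// the same object, so assigning through the reference replaces the global.
inline std::shared_ptr<ToyThreading> &toy_threading_() {
    static std::shared_ptr<ToyThreading> instance =
        std::make_shared<ToyThreading>();
    return instance;
}

// Replace the global settings object and set one of its fields.
inline void set_global_threads(int n) {
    assert(n >= 0);
    toy_threading_() = std::make_shared<ToyThreading>();
    toy_threading_()->n_threads_global = n;
}
```

After set_global_threads(4), any code that calls toy_threading_() observes n_threads_global equal to 4.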
Allocators
-
template<typename T>
struct Allocator
Abstract memory allocator.
- Template Parameters:
T – The type of the element in the array.
Subclassed by block2::StackAllocator< T >, block2::VectorAllocator< T >
Public Functions
-
inline Allocator()
Default constructor.
-
virtual ~Allocator() = default
Default destructor.
-
inline virtual T *allocate(size_t n)
Allocate a length n array.
- Parameters:
n – Number of elements in the array.
- Returns:
The allocated pointer.
-
inline virtual complex<T> *complex_allocate(size_t n)
Allocate a length n complex array.
- Parameters:
n – Number of elements in the array.
- Returns:
The allocated pointer.
-
inline virtual void deallocate(void *ptr, size_t n)
Deallocate a length n array.
- Parameters:
ptr – The pointer to be deallocated.
n – Number of elements in the array.
-
inline virtual void complex_deallocate(void *ptr, size_t n)
Deallocate a length n complex array.
- Parameters:
ptr – The pointer to be deallocated.
n – Number of elements in the array.
-
template<typename T>
struct StackAllocator : public block2::Allocator<T>
Stack memory allocator.
- Template Parameters:
T – The type of the element in the array.
Subclassed by block2::TemporaryAllocator< T >
Public Functions
-
inline StackAllocator(T *ptr, size_t max_size)
Constructor.
- Parameters:
ptr – Pointer to the first element in the stack. The stack should be pre-allocated.
max_size – Total size of the stack (in number of elements).
-
inline StackAllocator()
Default constructor.
-
inline virtual T *allocate(size_t n) override
Allocate a length n array.
- Parameters:
n – Number of elements in the array.
- Returns:
The allocated pointer.
-
inline virtual void deallocate(void *ptr, size_t n) override
Deallocate a length n array. Must be invoked in the reverse order of allocation.
- Parameters:
ptr – The pointer to be deallocated.
n – Number of elements in the array.
-
inline virtual T *reallocate(T *ptr, size_t n, size_t new_n) override
Change the allocated size in the middle of the stack memory and introduce a shift for moving the memory after it.
- Parameters:
ptr – The allocated pointer.
n – Number of elements in original allocation.
new_n – Number of elements in the new allocation.
- Returns:
The new pointer.
Public Members
-
size_t size
Total size of the stack (in number of elements).
-
size_t used
Occupied size of the stack (in number of elements).
-
size_t shift
Temporary shift introduced due to deallocation in the middle of the stack.
Friends
-
inline friend ostream &operator<<(ostream &os, const StackAllocator &c)
Print the status of the allocator.
- Parameters:
os – The output stream.
c – The object to be printed.
- Returns:
The output stream.
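The requirement that StackAllocator deallocations happen in reverse order of allocation follows from how a stack allocator tracks memory: it only stores how much of a pre-allocated buffer is in use. The following is a minimal sketch of that idea. ToyStackAllocator is a hypothetical simplification and omits reallocate and shift.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simplified stack allocator over a pre-allocated buffer.
template <typename T> struct ToyStackAllocator {
    T *data;          // first element of the pre-allocated stack
    std::size_t size; // total size of the stack (in number of elements)
    std::size_t used; // occupied size of the stack (in number of elements)
    ToyStackAllocator(T *ptr, std::size_t max_size)
        : data(ptr), size(max_size), used(0) {}
    // Hand out the next n elements from the top of the stack.
    T *allocate(std::size_t n) {
        assert(used + n <= size); // stack overflow check
        T *p = data + used;
        used += n;
        return p;
    }
    // Release the most recent allocation. The pointer must be the current
    // top block, which is why the order must be the reverse of allocation.
    void deallocate(void *ptr, std::size_t n) {
        assert(n <= used);
        assert(ptr == static_cast<void *>(data + (used - n)));
        used -= n;
    }
};
```

Allocating a (4 elements) and then b (8 elements) leaves used at 12. Releasing b and then a returns it to 0, while releasing a first would trip the assertion.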
-
template<typename T>
struct VectorAllocator : public block2::Allocator<T>
Vector memory allocator.
- Template Parameters:
T – The type of the element in the array.
Public Functions
-
inline VectorAllocator()
Default constructor.
-
inline virtual T *allocate(size_t n) override
Allocate a length n array.
- Parameters:
n – Number of elements in the array.
- Returns:
The allocated pointer.
-
inline virtual void deallocate(void *ptr, size_t n) override
Deallocate a length n array. Note that explicit deallocation is not required for vector allocator. Can be invoked in arbitrary order.
- Parameters:
ptr – The pointer to be deallocated.
n – Number of elements in the array.
-
inline virtual T *reallocate(T *ptr, size_t n, size_t new_n) override
Change the allocated size for one allocated block.
- Parameters:
ptr – The allocated pointer.
n – Number of elements in original allocation.
new_n – Number of elements in the new allocation.
- Returns:
The new pointer.
-
inline virtual shared_ptr<Allocator<T>> copy() const override
Return a copy of the allocator. When deep-copying objects using VectorAllocator, the other object should have an independent allocator, since VectorAllocator is not global.
- Returns:
The copy of this allocator.
Friends
-
inline friend ostream &operator<<(ostream &os, const VectorAllocator &c)
Print the status of the allocator.
- Parameters:
os – The output stream.
c – The object to be printed.
- Returns:
The output stream.
-
inline shared_ptr<StackAllocator<uint32_t>> &block2::ialloc_()
Implementation of the ialloc global variable.
Implementation of the dalloc global variable.
-
ialloc
Global variable for the integer stack memory allocator.
Data Frame
-
template<typename FL>
struct DataFrame
DataFrame includes several (n_frames = 2) frames. Each frame includes one integer stack memory and one double stack memory. The two frames are used alternately to avoid data copying.
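One way to picture how alternating frames avoids copying: a step reads its input from the active frame, writes its output into the other frame, and then marks the old frame as unused in one operation. The sketch below illustrates this with hypothetical names (ToyFrames, advance) and a placeholder computation. It is an assumption about the usage pattern and not block2's actual sweep code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame set: one pre-allocated double stack per frame.
struct ToyFrames {
    std::vector<std::vector<double>> stacks;
    std::vector<std::size_t> used; // occupied elements in each stack
    int i_frame = 0;               // index of the currently active frame
    explicit ToyFrames(std::size_t per_frame, std::size_t n_frames = 2)
        : stacks(n_frames, std::vector<double>(per_frame)),
          used(n_frames, std::size_t(0)) {}
    void activate(int i) {
        assert(i >= 0 && static_cast<std::size_t>(i) < stacks.size());
        i_frame = i;
    }
    // Mark all memory of frame i as unused (contents are not cleared).
    void reset(int i) { used[static_cast<std::size_t>(i)] = 0; }
    // Allocate n elements on the active frame's stack.
    double *allocate(std::size_t n) {
        const std::size_t k = static_cast<std::size_t>(i_frame);
        assert(used[k] + n <= stacks[k].size());
        double *p = stacks[k].data() + used[k];
        used[k] += n;
        return p;
    }
};

// One step: read prev from the old frame, write the result into the next
// frame, then release the old frame wholesale. No data is copied back.
inline double *advance(ToyFrames &f, const double *prev, std::size_t n) {
    assert(f.stacks.size() >= 2);
    const int old_frame = f.i_frame;
    const int next = (old_frame + 1) % static_cast<int>(f.stacks.size());
    f.activate(next);
    double *out = f.allocate(n);
    for (std::size_t i = 0; i < n; i++)
        out[i] = prev[i] + 1.0; // placeholder for the real computation
    f.reset(old_frame);
    return out;
}
```

Calling advance twice returns to frame 0 and reuses the memory that held the original input, since that frame was reset after the first step.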
Public Functions
-
inline DataFrame(size_t isize = 1 << 28, size_t dsize = 1 << 30, const string &save_dir = "node0", double dmain_ratio = 0.7, double imain_ratio = 0.7, int n_frames = 2)
Constructor.
- Parameters:
isize – Max size (in bytes) of all integer stacks.
dsize – Max size (in bytes) of all double stacks.
save_dir – Scratch folder for renormalized operators.
dmain_ratio – The fraction of stack space occupied by the main double stacks.
imain_ratio – The fraction of stack space occupied by the main integer stacks.
n_frames – Number of data frames.
-
inline ~DataFrame()
Destructor.
-
inline void activate(int i)
Activate one data frame.
- Parameters:
i – The index of the data frame to be activated.
-
inline void reset(int i)
Reset one data frame, marking all stack memory as unused.
- Parameters:
i – The index of the data frame to be reset.
-
inline void reset_buffer(int i)
Reset saving and loading buffers for one data frame. Contents in the loading buffer will be deleted. Unsaved contents in the saving buffer will be saved to disk.
- Parameters:
i – The index of the data frame.
-
inline void rename_data(const string &old_filename, const string &new_filename) const
Rename one scratch file.
- Parameters:
old_filename – original filename.
new_filename – new filename.
-
inline void load_data_from(int i, istream &ifs) const
Load one data frame from input stream.
- Parameters:
i – The index of the data frame.
ifs – The input stream.
-
inline void load_data(int i, const string &filename) const
Load one data frame from disk.
- Parameters:
i – The index of the data frame.
filename – The filename for the data frame.
-
inline void save_data_to(int i, ostream &ofs) const
Save one data frame into output stream.
- Parameters:
i – The index of the data frame.
ofs – The output stream.
-
inline void save_data(int i, const string &filename) const
Save one data frame to disk.
- Parameters:
i – The index of the data frame.
filename – The filename for the data frame.
-
inline void deallocate()
Deallocate the memory allocated for all stacks. Note that this method is automatically invoked at destruction.
-
inline size_t memory_used() const
Return the current used memory in all stacks.
- Returns:
The current used memory in Bytes.
-
inline void update_peak_used_memory() const
Update peak used memory statistics.
-
inline void reset_peak_used_memory() const
Reset peak used memory statistics to zero.
Public Members
-
string save_dir
Scratch folder for renormalized operators.
-
string mps_dir
Scratch folder for MPS (default is the same as save_dir).
-
string mpo_dir
Scratch folder for MPO (default is the same as save_dir, only used when minimal_memory_usage is true).
-
string restart_dir = ""
If not empty, save MPS to this dir after each sweep.
-
string restart_dir_per_sweep = ""
If not empty, save MPS to this dir with sweep index as suffix, so that MPS from all sweeps will be kept in individual dirs.
-
string restart_dir_optimal_mps = ""
If not empty, save MPS to this dir whenever an optimal solution is reached in one sweep. For DMRG, this is the MPS with the lowest energy. Note that if the best solution from the current sweep is worse than the best solution from the previous sweep (for example in a reverse schedule), the best solution from the current sweep is saved.
-
string restart_dir_optimal_mps_per_sweep = ""
If not empty, save the optimal MPS from each sweep to this dir with sweep index as suffix.
-
string prefix = "F"
Filename prefix for common scratch files (such as MPS tensors).
-
string prefix_distri = "F0"
Filename prefix for distributed scratch files (such as renormalized operators). When distributed parallelization is used, different procs will have different values for this data.
-
bool prefix_can_write = true
Whether this proc should be able to write common scratch files (such as MPS tensors).
-
bool partition_can_write = true
Whether this proc should be able to write renormalized operators.
-
size_t isize
Max number of elements in all integer stacks.
-
size_t dsize
Max number of elements in all double stacks.
-
int n_frames
Total number of data frames.
-
int i_frame
The index of the currently activated data frame.
-
mutable double tread = 0
IO Time cost for reading scratch files.
-
mutable double twrite = 0
IO Time cost for writing scratch files.
-
mutable double tasync = 0
IO Time cost for async writing scratch files.
-
mutable double fpread = 0
IO Time cost for reading scratch files with floating-point decompression.
-
mutable double fpwrite = 0
IO Time cost for writing scratch files with floating-point compression.
-
mutable Timer _t
Temporary timer.
-
mutable Timer _t2
Auxiliary temporary timer.
-
vector<shared_ptr<StackAllocator<uint32_t>>> iallocs
Integer stacks allocators.
-
vector<shared_ptr<StackAllocator<FL>>> dallocs
Double stacks allocators.
-
mutable vector<size_t> peak_used_memory
Peak used memory by stacks (in Bytes). Even indices are for double stacks. Odd indices are for integer stacks.
-
mutable vector<string> present_filenames
The filename for the current stack memory content for each data frame. Used for tracking loading and saving buffering to avoid loading the same data into memory.
-
mutable vector<pair<string, shared_ptr<stringstream>>> load_buffers
Buffers for loading. Reading a file is skipped if the contents of the file with that filename are already in the loading buffer.
-
mutable vector<pair<string, shared_ptr<stringstream>>> save_buffers
Buffers for async saving.
-
mutable vector<shared_future<void>> save_futures
Async saving files.
-
bool load_buffering = false
Whether load buffering should be used. If true, memory usage will increase.
-
bool save_buffering = false
Whether async saving and saving buffering should be used. If true, memory usage will increase.
-
bool use_main_stack = true
Whether main stack should be used for storing blocked operators in enlarged blocks. If false, these blocked operators will be stored in dynamically allocated memory.
-
bool minimal_disk_usage = false
Whether temporary renormalized operator files should be deleted as soon as possible. If true, will save roughly half of required storage for renormalized operators.
-
bool minimal_memory_usage = false
Whether MPO should be built in minimal memory mode by saving intermediates to disk. In this mode, MPO should have different tags.
Public Static Functions
Save the data in the buffer stream to disk.
- Parameters:
filename – The filename for saving data.
ss – The buffer stream.
tasync – Pointer to the time recorder for async saving.
-
Global variable for accessing global stack memory and file I/O in scratch space.
Miscellanies
-
inline auto block2::check_signal_() -> void (*&)()
Function pointer for signal checking.
-
inline void block2::print_trace()
Print the call stack when an error happens. Does not work on non-Unix systems.
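The return type of check_signal_(), void (*&)(), is a reference to a pointer to a function taking no arguments and returning void. A caller can therefore both install a handler by assignment and invoke the installed handler. The sketch below mirrors that shape with hypothetical names (toy_check_signal_, install_toy_handler, toy_poll_signal). The null initial value and the polling helper are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical accessor with the same declaration shape as check_signal_():
// it returns a reference to a function pointer stored in a local static.
inline auto toy_check_signal_() -> void (*&)() {
    static void (*handler)() = nullptr;
    return handler;
}

// Counter used only to observe that the handler has run.
inline int &toy_signal_count() {
    static int n = 0;
    return n;
}

// Example handler: record that a signal check happened.
inline void toy_handler() { toy_signal_count()++; }

// Install a handler by assigning through the returned reference.
inline void install_toy_handler(void (*h)()) {
    assert(h != nullptr);
    toy_check_signal_() = h;
}

// Invoke the installed handler, if any. Returns whether one was called.
inline bool toy_poll_signal() {
    void (*h)() = toy_check_signal_();
    if (h == nullptr)
        return false;
    h();
    return true;
}
```

Before install_toy_handler(toy_handler) is called, toy_poll_signal() returns false. Afterwards each poll runs the handler once.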