Tools¶
Floating-Point Number Compression¶
For data like FCIDUMP
integrals, the array elements do not need to be stored in its full binary form.
Storage or memory can be saved by using flexible bit representation for numbers of different magnitudes.
The FPCodec
class implements the lossless and lossy compression and decompression of floating-point
number arrays. Lossless compression is possible when all numebrs are close in their order of magnitudes.
-
template<typename
T
>
structblock2
::
FPtraits
¶ Floating-point number representation details.
- tparam T
The floating type to implement.
-
template<>
structblock2
::
FPtraits
<float>¶ Representation details for single precision numbers.
Public Types
-
typedef uint32_t
U
¶ The representation integer type.
-
typedef uint32_t
-
template<>
structblock2
::
FPtraits
<double>¶ Representation details for double precision numbers.
Public Types
-
typedef uint64_t
U
¶ The representation integer type.
-
typedef uint64_t
-
template<typename
T
, typenameU
= typename FPtraits<T>::U, intmbits
= FPtraits<T>::mbits, intebits
= FPtraits<T>::ebits>
inline stringblock2
::
binary_repr
(T d)¶ Represent the binary form of a floating-point or integer number as a string.
- Template Parameters
T – The floating type to implement.
U – The representation integer type.
mbits – Number of bits in significand.
ebits – Number of bits in exponent.
- Parameters
d – The number to be represented.
- Returns
The binary form of the number as a string.
-
template<typename
T
, typenameU
>
structblock2
::
BitsCodec
¶ Read/write bits from/into a floating-point number array.
- tparam T
The floating type for the elemenet of the array.
- tparam U
The corresponding integer/representation type of T.
Public Functions
-
inline
BitsCodec
(T *op_data)¶ Constructor.
- Parameters
op_data – The floating-point number array for storing the bits.
-
inline void
begin_decode
()¶ Prepare for decoding.
-
template<typename
X
>
inline voidencode
(X x, int l)¶ Encode data and write into the array.
- Template Parameters
X – The type of the data.
- Parameters
x – The data to encode.
l – The number of bits of the data.
-
template<typename
X
>
inline voiddecode
(X &x, int l)¶ Read from the array and decode data.
- Template Parameters
X – The type of the data.
- Parameters
x – The output decoded data.
l – The number of bits of the data.
-
inline size_t
finish_encode
()¶ Finalize encoding.
-
template<typename
T
, typenameU
= typename FPtraits<T>::U, intmbits
= FPtraits<T>::mbits, intebits
= FPtraits<T>::ebits>
structblock2
::
FPCodec
¶ Codec for compressing/decompressing array of floating-point numbers.
- tparam T
Floating type to implement.
- tparam U
The corresponding integer type.
- tparam mbits
Number of bits in significand.
- tparam ebits
Number of bits in exponent.
Public Functions
-
inline
FPCodec
()¶ Default constructor.
-
inline
FPCodec
(T prec, size_t chunk_size)¶ Constructor.
- Parameters
prec – Floating-point number precision.
chunk_size – Length of the array elements that should be processed at one time.
-
inline size_t
decode
(T *ip_data, size_t len, T *op_data) const¶ Decode data.
- Parameters
ip_data – The compressed floating-point array.
len – Length of the original floating-point array.
op_data – Output array for storing original data. Memory should be pre-allocated with length = len.
- Returns
Length of the array for the compressed data.
-
inline size_t
encode
(T *ip_data, size_t len, T *op_data) const¶ Encode data.
- Parameters
ip_data – The original floating-point array.
len – Length of the original floating-point array.
op_data – Output array for storing compressed data. Memory should be pre-allocated with length >= len + 1.
- Returns
Length of the array for the compressed data.
-
inline void
write_array
(ostream &ofs, T *data, size_t len) const¶ Compress array of floating-point data and write into file stream.
- Parameters
ofs – Output stream.
data – The original floating-point array.
len – The length of the original floating-point array.
-
inline void
read_array
(istream &ifs, T *data, size_t len) const¶ Read from file stream and deompress the data.
- Parameters
ifs – Input stream.
data – The floating-point array for storing the original data.
len – The length of the original floating-point array.
-
inline void
read_chunks
(istream &ifs, size_t len, vector<vector<T>> &chunks, size_t &chunk_size) const¶ Read from file stream (but not decompress the data).
- Parameters
ifs – Input stream.
len – The length of the original floating-point array.
chunks – For storing chunks of the compressed data.
chunk_size – Number of original array elements in each chunk (output).
Public Members
-
mutable size_t
ndata
= 0¶ Length of the array of the data that has been compressed.
-
mutable size_t
ncpsd
= 0¶ Length of the array for the compressed data.
-
size_t
chunk_size
= 4096¶ Length of the array elements that should be processed at one time.
-
size_t
n_parallel_chunks
= 4096¶ Number of chunks to be processed in the same batch.
-
template<typename
T
>
structblock2
::
CompressedVector
¶ An array with data stored in compressed form. Only support single thread.
- tparam T
Elemenet type.
Subclassed by block2::CompressedVectorMT< T >
Public Functions
-
inline
CompressedVector
(size_t arr_len, T prec, size_t chunk_size, int ncache = 4)¶ Constructor.
- Parameters
arr_len – Size of the array.
prec – Precision for compression.
chunk_size – Number of array elements in each chunk.
ncache – Number of cached decompressed chunks.
-
inline
CompressedVector
(istream &ifs, size_t arr_len, T prec, int ncache = 4)¶ Constructor.
- Parameters
ifs – Input stream, from which the compressed data is obtained.
arr_len – Size of the array.
prec – Precision for compression.
ncache – Number of cached decompressed chunks.
-
inline
CompressedVector
(T *arr, size_t arr_len, T prec, int ncache = 4)¶ Constructor.
- Parameters
arr – The compressed data as an array.
arr_len – Size of the array.
prec – Precision for compression.
ncache – Number of cached decompressed chunks.
-
virtual
~CompressedVector
() = default¶ Deconstructor.
-
inline void
clear
()¶ Set all array elements to zero.
-
inline void
shrink_to_fit
()¶ Minimize the storage cost of the compressed data (after the data has been changed).
-
inline void
finalize
()¶ Write all cached data into compressed form.
-
inline virtual T &
operator[]
(size_t i)¶ Write one element into the array.
- Parameters
i – Array index.
- Returns
The array elemenet.
-
inline virtual T
operator[]
(size_t i) const¶ Read one element from the array.
- Parameters
i – Array index.
- Returns
The array elemenet.
-
inline size_t
size
() const¶ Get the size of the array.
Public Members
-
size_t
arr_len
¶ Size of the array.
-
size_t
chunk_size
¶ Number of array elements in each chunk.
-
int
ncache
¶ Number of cached decompressed chunks.
-
mutable int
icache
¶ Index of next available cache (head of the circular queue).
-
mutable vector<bool>
cache_dirty
¶ Whether each cached chunk has been changed.
-
template<typename
T
>
structblock2
::
CompressedVectorMT
: public block2::CompressedVector<T>¶ A read-only array with data stored in compressed form, with multi-threading support.
- tparam T
Elemenet type.
Public Functions
Constructor.
- Parameters
ref_cv – Pointer to the single-thread compressed array.
ntg – Number of threads.
-
inline virtual T &
operator[]
(size_t i)¶ Write one element into the array (not supported, will cause assertion failure).
- Parameters
i – Array index.
- Returns
The array elemenet.
-
inline virtual T
operator[]
(size_t i) const¶ Read one element from the array.
- Parameters
i – Array index.
- Returns
The array elemenet.
-
inline size_t
size
() const¶ Get the size of the array.
Public Members
-
mutable vector<int>
icaches
¶ Index of next available cache (head of the circular queue) for each thread.
-
mutable vector<vector<pair<size_t, vector<T>>>>
cache_datas
¶ Cached data of decompressed chunks for each thread.
-
shared_ptr<CompressedVector<T>>
ref_cv
¶ Pointer to the single-thread compressed array.
Kuhn-Munkres Algorithm¶
The KuhnMunkres
class implements the Kuhn-Munkres algorithm for
finding the best matching in a bipartite with the lowest cost.
-
struct
block2
::
KuhnMunkres
¶ Kuhn-Munkres algorithm for finding the best matching with lowest cost. Complexity: O(n^3).
Public Functions
-
inline
KuhnMunkres
(const vector<double> &cost_matrix, int m, int n = 0)¶ Constructor.
- Parameters
cost_matrix – Flattened matrix of cost.
m – Number of rows.
n – Number of columns.
-
inline bool
match
(vector<int> &x, int u)¶ Find an augmenting alternating path in the equality subgraph. If an equality subgraph has a perfect matching, the it is a maximum-weight matching in the graph.
- Parameters
x – Current matching.
u – Starting vertex.
- Returns
true
if an augmenting alternating path is found.
-
inline pair<double, vector<int>>
solve
()¶ Find the lowest cost and a matching.
- Returns
the lowest cost and an index array x, with x[i] = j meaning that column index i should be matched to row index j.
Public Members
-
vector<double>
cost
¶ Flattened n x n matrix of cost.
-
int
n
¶ Number of rows or columns.
-
double
inf
¶ Infinity (constant).
-
double
eps
¶ Machine precision (constant).
-
vector<double>
lx
¶ Left feasible labeling (working array).
-
vector<double>
ly
¶ Right feasible labeling (working array).
-
vector<double>
slack
¶ Slack working array.
-
vector<char>
st
¶ Candidate augmenting alternating path (working array).
-
inline
Number Theory Algorithms¶
The Prime
class includes some number theory algorithms necessary for
integer factorization and finding primitive roots, which is used in some
Fast Fourier Transform algorithms.
-
struct
block2
::
Prime
¶ Number theory algorithms for prime numbers and primitive root.
Public Functions
-
inline
Prime
()¶ Default constructor.
-
inline void
init_primes
(int np = 50000)¶ Initialize set of small primes, using sieve of Eratosthenes.
- Parameters
np – Maximal number considered for finding primes.
-
inline void
factors
(LL x, vector<pair<LL, int>> &pp)¶ Integer factorization.
- Parameters
x – Integer x.
pp – (output) Set of prime factors and their occurrence.
-
inline LL
euler
(LL n)¶ Euler’s totient function.
- Parameters
n – Integer n (not necessarily prime).
- Returns
phi(n), which counts the positive integers up to the given integer n that are relatively prime to n.
-
inline bool
is_prime
(LL n)¶ Prime testing.
- Parameters
n – Integer n to be tested (n < 2^63 - 1).
- Returns
true
if n is a prime,false
if n is not a prime.
-
inline int
primitive_root
(LL p)¶ Find one of primitive roots modulo n. A number g is a primitive root modulo n if every number a coprime to n is congruent to a power of g modulo n.
- Parameters
p – Modulus p.
- Returns
a primitive root g.
-
inline void
primitive_roots
(LL p, vector<LL> &gg)¶ Find all primitive roots modulo n.
- Parameters
p – Modulus p.
gg – (output) All primitive roots modulo n.
Public Members
-
int
np
¶ Maximal number considered for generation of set of small primes.
-
vector<int>
primes
¶ Set of small primes.
Public Static Functions
-
static inline int
pmod
(LL x, LL n)¶ Return positive x mod n.
-
static inline void
exgcd
(LL a, LL b, LL &d, LL &x, LL &y)¶ Extended Euclidean algorithm. Find integers x and y such that a x + b y = gcd(a, b).
- Parameters
a – Integer a.
b – Integer b.
d – (output) Greatest Common Divisor gcd(a, b).
x – (output) Integer x.
y – (output) Integer y.
-
static inline LL
gcd
(LL a, LL b)¶ Euclidean algorithm. Find Greatest Common Divisor gcd(a, b).
- Parameters
a – Integer a.
b – Integer b.
- Returns
Greatest Common Divisor gcd(a, b).
-
static inline LL
inv
(LL a, LL n)¶ Find modular multiplicative inverse of a (mod n). Using extended Euclidean algorithm.
- Parameters
a – Integer a.
n – Modulus n.
- Returns
a^{-1} (mod n) if it exists, otherwise -1.
-
static inline LL
power
(LL n, int i)¶ Find n to the power of i using binary algorithm.
- Parameters
n – Integer n.
i – Integer i (i >= 0).
- Returns
n^i.
-
static inline LL
sqrt
(LL x)¶ Find the largest number r such that r * r <= x. Using binary algorithm.
- Parameters
x – Integer x.
- Returns
floor(sqrt(x)).
-
static inline LL
quick_multiply
(LL x, LL y, LL p)¶ Find (x * y) mod p. Note that the expression x * y % p may overflow if the intermediate x * y > 2^63 - 1. This function will not overflow (if p <= 2^62 - 1)
- Parameters
x – Integer x.
y – Integer y.
p – Modulus p.
- Returns
(x * y) mod p.
-
static inline LL
quick_power
(LL n, LL i, LL p)¶ Find (n ^ i) mod p using binary algorithm.
- Parameters
n – Integer n.
i – Integer i (i >= 0).
p – Modulus p.
- Returns
(n ^ i) mod p.
-
static inline bool
miller_rabin
(LL a, LL n)¶ Miller-Rabin primality test.
Note
If n < 3215031751, it is enough to test a = 2, 3, 5, and 7; if n < 341550071728321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.
- Parameters
a – Base number a.
n – The number to be tested for prime (n >= 3).
- Returns
true
if n is likely to be a prime.false
if n is not a prime.
-
static inline LL
pollard_rho
(LL n)¶ Pollard’s rho algorithm. Find a factor of integer n.
- Parameters
n – Integer n.
- Returns
A (not necessarily prime) factor of n.
-
inline
Fast Fourier Transform (FFT)¶
A collection of FFT algorithms is implemented here. The implementation focuses more on flexibility and readability, rather than optimal performance (but the performance should be acceptable).
The block2
implementation is faster than numpy
implementation for some special
array lengths, such as 5929741 = 181 ** 3
and 5929742 = 2 * 7 * 13 * 31 * 1051
.
-
template<int
base
>
structblock2
::
BasicFFT
¶ Fast Fourier Transform (FFT) with array length of base^k (k >= 0). The complexity is O(n log_base n * base^2).
- tparam base
The radix (base = 2 is specialized)
Public Functions
-
inline
BasicFFT
()¶ Constructor.
-
inline void
init
(size_t n)¶ Precompute for array length n for both forward and backward FFT.
- Parameters
n – The array length, must be a power of
base
.
-
inline void
fft
(complex<double> *arr, size_t n, bool forth)¶ Perform inplace FFT.
- Parameters
arr – A pointer to the array of complex numbers.
n – Number of elements in the array, must be a power of
base
.forth – Whether this is forward transform.
Public Members
-
vector<size_t>
r
¶ The index permutation array.
-
vector<complex<double>>
wb
¶ The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.
-
vector<complex<double>>
wf
¶ The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.
Public Static Functions
-
static inline size_t
pad
(size_t n)¶ Find smallest number x such that x = base^k >= n.
- Parameters
n – The array length.
- Returns
The padded array length.
-
template<>
structblock2
::
BasicFFT
<2>¶ Radix-2 Fast Fourier Transform (FFT) with complexity O(nlog n).
Public Functions
-
inline
BasicFFT
()¶ Constructor.
-
inline void
init
(size_t n)¶ Precompute for array length n for both forward and backward FFT.
- Parameters
n – The array length must be a power of 2.
-
inline void
fft
(complex<double> *arr, size_t n, bool forth)¶ Perform inplace FFT.
- Parameters
arr – A pointer to the array of complex numbers.
n – Number of elements in the array, must be a power of 2.
forth – Whether this is forward transform.
Public Members
-
vector<size_t>
r
¶ The index permutation array.
-
vector<complex<double>>
wb
¶ The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.
-
vector<complex<double>>
wf
¶ The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.
Public Static Functions
-
static inline size_t
pad
(size_t n)¶ Find smallest number x such that x = 2^k >= n.
- Parameters
n – The array length.
- Returns
The padded array length.
-
inline
-
template<typename
B
= BasicFFT<2>>
structblock2
::
RaderFFT
¶ Rader’s FFT algorithm for prime array length. This algorithm transforms a FFT of length n to two (forward and backward) FFTs of length m, where m is the padded array length for 2 * n - 3 for the backend FFT.
- tparam B
The backend FFT for computing the padded FFT.
Public Functions
-
inline
RaderFFT
()¶ Default constructor.
Constructor.
- Parameters
prime – Instance for prime number algorithms.
-
inline void
init
(size_t n)¶ Precompute for array length n for both forward and backward FFT.
- Parameters
n – The array length, which must be a prime number.
-
inline void
fft
(complex<double> *arr, size_t n, bool forth)¶ Perform inplace FFT.
- Parameters
arr – A pointer to the array of complex numbers.
n – Number of elements in the array, which must be a prime number.
forth – Whether this is forward transform.
Public Members
-
vector<LL>
wb
¶ Precomputed inverse of the power of primitive root g^{-k} mod n for k = 0, 1, …, n - 1.
-
vector<LL>
wf
¶ Precomputed power of primitive root g^k mod n for k = 0, 1, …, n - 1.
-
vector<complex<double>>
cb
¶ FFT transformed exp(i2pi g^{-k}/n).
-
vector<complex<double>>
cf
¶ FFT transformed exp(-i2pi g^{-k}/n).
-
vector<complex<double>>
arx
¶ Working space for padded array.
-
size_t
nn
¶ Padded array length.
-
template<typename
B
= BasicFFT<2>>
structblock2
::
BluesteinFFT
¶ Bluestein’s FFT algorithm for arbitrary array length. This algorithm transforms a FFT of length n to two (forward and backward) FFTs of length m, where m is the padded array length for 2 * n for the backend FFT.
- tparam B
The backend FFT for computing the padded FFT.
Public Functions
-
inline
BluesteinFFT
()¶ Default constructor.
Constructor.
- Parameters
prime – Instance for prime number algorithms (ignored).
-
inline void
init
(size_t n)¶ Precompute for array length n for both forward and backward FFT.
- Parameters
n – The array length.
-
inline void
fft
(complex<double> *arr, size_t n, bool forth)¶ Perform inplace FFT.
- Parameters
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
Public Members
-
vector<complex<double>>
wb
¶ The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.
-
vector<complex<double>>
wf
¶ The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.
-
vector<complex<double>>
cb
¶ FFT transformed exp(i2pi [((k - n) - (k - n)(k - n)) / 2]/n).
-
vector<complex<double>>
cf
¶ FFT transformed exp(-i2pi [((k - n) - (k - n)(k - n)) / 2]/n).
-
vector<complex<double>>
arx
¶ Working space for padded array.
-
size_t
nn
¶ Padded array length.
-
struct
block2
::
DFT
¶ Naive Discrete Fourier Transform (DFT) algorithm with complexity O(n^2).
Public Functions
-
inline
DFT
()¶ Default constructor.
Constructor.
- Parameters
prime – Instance for prime number algorithms (ignored).
-
inline void
init
(size_t n)¶ Precompute for array length n for both forward and backward FFT. For DFT this method does nothing.
- Parameters
n – The array length.
-
inline void
fft
(complex<double> *arr, size_t n, bool forth)¶ Perform inplace DFT.
Forward DFT: \(X[k] = \sum_{j=0}^{n - 1} x[j] \exp(-2 \pi \mathrm{i} jk/n)\)
Backward DFT: \(X[k] = \frac{1}{n} \sum_{j=0}^{n - 1} x[j] \exp(2 \pi \mathrm{i} jk/n)\)
- Parameters
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
-
inline
-
template<typename
F
, intP
, int...Q
>
structblock2
::
FactorizedFFT
¶ FFT algorithm using different radix FFT backends. The array length is first factorized, then different FFT backends will be used and then the results is merged using the Cooley-Tukey FFT algorithm.
- tparam F
The prime number FFT backend.
- tparam P
Using Radix-P FFT backend.
- tparam Q
Using Radix-(Q1, Q2, …) FFT backend.
Public Functions
-
inline
FactorizedFFT
()¶ Default constructor.
-
inline
FactorizedFFT
(int max_factor)¶ Constructor.
- Parameters
max_factor – Maximal radix that should be checked for radix based FFT.
-
inline void
fft_internal
(complex<double> *arr, size_t p, size_t q, bool forth, int b) override¶ Perform independent FFTs for p arrays, each with length q.
- Parameters
arr – A pointer to the array of complex numbers (as a matrix).
p – Number of rows (FFTs).
q – Number of columns (length of each FFT).
forth – Whether this is forward transform.
b – Radix. Zero if radix based FFT should not be used.
-
template<typename
F
, intP
>
structblock2
::
FactorizedFFT
<F, P>¶ FFT algorithm using different radix FFT backends. The array length is first factorized, then different FFT backends will be used and then the results is merged using the Cooley-Tukey FFT algorithm.
- tparam F
The prime number FFT backend.
- tparam P
Using Radix-P FFT backend.
Public Functions
-
inline
FactorizedFFT
()¶ Default constructor.
-
inline
FactorizedFFT
(int max_factor)¶ Constructor.
- Parameters
max_factor – Maximal radix that should be checked for radix based FFT.
-
inline void
init
(size_t n)¶ Precompute for array length n for both forward and backward FFT. For FactorizedFFT this method does nothing.
- Parameters
n – The array length.
-
inline virtual void
fft_internal
(complex<double> *arr, size_t p, size_t q, bool forth, int b)¶ Perform independent FFTs for p arrays, each with length q.
- Parameters
arr – A pointer to the array of complex numbers (as a matrix).
p – Number of rows (FFTs).
q – Number of columns (length of each FFT).
forth – Whether this is forward transform.
b – Radix. Zero if radix based FFT should not be used.
-
inline void
cooley_tukey
(complex<double> *arr, size_t n, bool forth, const size_t *pr, const int *b, size_t np)¶ Cooley-Tukey FFT algorithm for FFT with array length being a composite number.
- Parameters
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
pr – A pointer to the array of factors in n;
b – A pointer to the array of radices for each factor. pr should be multiple of b, if b is not zero.
np – Number of factors.
-
inline void
fft
(complex<double> *arr, size_t n, bool forth)¶ Perform inplace FFT.
Forward FFT: \(X[k] = \sum_{j=0}^{n - 1} x[j] \exp(-2 \pi \mathrm{i} jk/n)\)
Backward FFT: \(X[k] = \frac{1}{n} \sum_{j=0}^{n - 1} x[j] \exp(2 \pi \mathrm{i} jk/n)\)
- Parameters
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
-
typedef FactorizedFFT<RaderFFT<>, 2, 3, 5, 7, 11>
block2
::
FFT2
¶ FFT with small prime factorization implemeted using Rader’s algorithm.
-
typedef FactorizedFFT<BluesteinFFT<>, 2, 3, 5, 7, 11>
block2
::
FFT
¶ FFT with small prime factorization implemeted using Bluestein’s algorithm.