Tools
Floating-Point Number Compression
For data like FCIDUMP
integrals, the array elements do not need to be stored in its full binary form.
Storage or memory can be saved by using flexible bit representation for numbers of different magnitudes.
The FPCodec
class implements the lossless and lossy compression and decompression of floating-point
number arrays. Lossless compression is possible when all numebrs are close in their order of magnitudes.
-
template<typename T>
struct FPtraits Floating-point number representation details.
- Template Parameters:
T – The floating type to implement.
-
template<>
struct FPtraits<float> Representation details for single precision numbers.
Public Types
-
typedef uint32_t U
The representation integer type.
-
typedef uint32_t U
-
template<>
struct FPtraits<double> Representation details for double precision numbers.
Public Types
-
typedef uint64_t U
The representation integer type.
-
typedef uint64_t U
-
template<typename T, typename U = typename FPtraits<T>::U, int mbits = FPtraits<T>::mbits, int ebits = FPtraits<T>::ebits>
inline string block2::binary_repr(T d) Represent the binary form of a floating-point or integer number as a string.
- Template Parameters:
T – The floating type to implement.
U – The representation integer type.
mbits – Number of bits in significand.
ebits – Number of bits in exponent.
- Parameters:
d – The number to be represented.
- Returns:
The binary form of the number as a string.
-
template<typename T, typename U>
struct BitsCodec Read/write bits from/into a floating-point number array.
- Template Parameters:
T – The floating type for the elemenet of the array.
U – The corresponding integer/representation type of T.
Public Functions
-
inline BitsCodec(T *op_data)
Constructor.
- Parameters:
op_data – The floating-point number array for storing the bits.
-
inline void begin_decode()
Prepare for decoding.
-
template<typename X>
inline void encode(X x, int l) Encode data and write into the array.
- Template Parameters:
X – The type of the data.
- Parameters:
x – The data to encode.
l – The number of bits of the data.
-
template<typename X>
inline void decode(X &x, int l) Read from the array and decode data.
- Template Parameters:
X – The type of the data.
- Parameters:
x – The output decoded data.
l – The number of bits of the data.
-
inline size_t finish_encode()
Finalize encoding.
-
template<typename T, typename U = typename FPtraits<T>::U, int mbits = FPtraits<T>::mbits, int ebits = FPtraits<T>::ebits>
struct FPCodec Codec for compressing/decompressing array of floating-point numbers.
- Template Parameters:
T – Floating type to implement.
U – The corresponding integer type.
mbits – Number of bits in significand.
ebits – Number of bits in exponent.
Public Functions
-
inline FPCodec()
Default constructor.
-
inline FPCodec(T prec, size_t chunk_size)
Constructor.
- Parameters:
prec – Floating-point number precision.
chunk_size – Length of the array elements that should be processed at one time.
-
inline size_t decode(T *ip_data, size_t len, T *op_data) const
Decode data.
- Parameters:
ip_data – The compressed floating-point array.
len – Length of the original floating-point array.
op_data – Output array for storing original data. Memory should be pre-allocated with length = len.
- Returns:
Length of the array for the compressed data.
-
inline size_t encode(T *ip_data, size_t len, T *op_data) const
Encode data.
- Parameters:
ip_data – The original floating-point array.
len – Length of the original floating-point array.
op_data – Output array for storing compressed data. Memory should be pre-allocated with length >= len + 1.
- Returns:
Length of the array for the compressed data.
-
inline void write_array(ostream &ofs, T *data, size_t len) const
Compress array of floating-point data and write into file stream.
- Parameters:
ofs – Output stream.
data – The original floating-point array.
len – The length of the original floating-point array.
-
inline void read_array(istream &ifs, T *data, size_t len) const
Read from file stream and deompress the data.
- Parameters:
ifs – Input stream.
data – The floating-point array for storing the original data.
len – The length of the original floating-point array.
-
inline void read_chunks(istream &ifs, size_t len, vector<vector<T>> &chunks, size_t &chunk_size) const
Read from file stream (but not decompress the data).
- Parameters:
ifs – Input stream.
len – The length of the original floating-point array.
chunks – For storing chunks of the compressed data.
chunk_size – Number of original array elements in each chunk (output).
Public Members
-
mutable size_t ndata = 0
Length of the array of the data that has been compressed.
-
mutable size_t ncpsd = 0
Length of the array for the compressed data.
-
size_t chunk_size = 4096
Length of the array elements that should be processed at one time.
-
size_t n_parallel_chunks = 4096
Number of chunks to be processed in the same batch.
-
template<typename T>
struct CompressedVector An array with data stored in compressed form. Only support single thread.
- Template Parameters:
T – Elemenet type.
Subclassed by block2::CompressedVectorMT< T >
Public Functions
-
inline CompressedVector(size_t arr_len, T prec, size_t chunk_size, int ncache = 4)
Constructor.
- Parameters:
arr_len – Size of the array.
prec – Precision for compression.
chunk_size – Number of array elements in each chunk.
ncache – Number of cached decompressed chunks.
-
inline CompressedVector(istream &ifs, size_t arr_len, T prec, int ncache = 4)
Constructor.
- Parameters:
ifs – Input stream, from which the compressed data is obtained.
arr_len – Size of the array.
prec – Precision for compression.
ncache – Number of cached decompressed chunks.
-
inline CompressedVector(T *arr, size_t arr_len, T prec, int ncache = 4)
Constructor.
- Parameters:
arr – The compressed data as an array.
arr_len – Size of the array.
prec – Precision for compression.
ncache – Number of cached decompressed chunks.
-
virtual ~CompressedVector() = default
Deconstructor.
-
inline void clear()
Set all array elements to zero.
-
inline void shrink_to_fit()
Minimize the storage cost of the compressed data (after the data has been changed).
-
inline void finalize()
Write all cached data into compressed form.
-
inline virtual T &operator[](size_t i)
Write one element into the array.
- Parameters:
i – Array index.
- Returns:
The array elemenet.
-
inline virtual T operator[](size_t i) const
Read one element from the array.
- Parameters:
i – Array index.
- Returns:
The array elemenet.
-
inline size_t size() const
Get the size of the array.
Public Members
-
size_t arr_len
Size of the array.
-
size_t chunk_size
Number of array elements in each chunk.
-
int ncache
Number of cached decompressed chunks.
-
mutable int icache
Index of next available cache (head of the circular queue).
-
mutable vector<bool> cache_dirty
Whether each cached chunk has been changed.
-
template<typename T>
struct CompressedVectorMT : public block2::CompressedVector<T> A read-only array with data stored in compressed form, with multi-threading support.
- Template Parameters:
T – Elemenet type.
Public Functions
Constructor.
- Parameters:
ref_cv – Pointer to the single-thread compressed array.
ntg – Number of threads.
-
inline virtual T &operator[](size_t i)
Write one element into the array (not supported, will cause assertion failure).
- Parameters:
i – Array index.
- Returns:
The array elemenet.
-
inline virtual T operator[](size_t i) const
Read one element from the array.
- Parameters:
i – Array index.
- Returns:
The array elemenet.
-
inline size_t size() const
Get the size of the array.
Public Members
-
mutable vector<int> icaches
Index of next available cache (head of the circular queue) for each thread.
-
mutable vector<vector<pair<size_t, vector<T>>>> cache_datas
Cached data of decompressed chunks for each thread.
-
shared_ptr<CompressedVector<T>> ref_cv
Pointer to the single-thread compressed array.
Kuhn-Munkres Algorithm
The KuhnMunkres
class implements the Kuhn-Munkres algorithm for
finding the best matching in a bipartite with the lowest cost.
-
struct KuhnMunkres
Kuhn-Munkres algorithm for finding the best matching with lowest cost. Complexity: O(n^3).
Public Functions
-
inline KuhnMunkres(const vector<double> &cost_matrix, int m, int n = 0)
Constructor.
- Parameters:
cost_matrix – Flattened matrix of cost.
m – Number of rows.
n – Number of columns.
-
inline bool match(vector<int> &x, int u)
Find an augmenting alternating path in the equality subgraph. If an equality subgraph has a perfect matching, the it is a maximum-weight matching in the graph.
- Parameters:
x – Current matching.
u – Starting vertex.
- Returns:
true
if an augmenting alternating path is found.
-
inline pair<double, vector<int>> solve()
Find the lowest cost and a matching.
- Returns:
the lowest cost and an index array x, with x[i] = j meaning that column index i should be matched to row index j.
Public Members
-
vector<double> cost
Flattened n x n matrix of cost.
-
int n
Number of rows or columns.
-
double inf
Infinity (constant).
-
double eps
Machine precision (constant).
-
vector<double> lx
Left feasible labeling (working array).
-
vector<double> ly
Right feasible labeling (working array).
-
vector<double> slack
Slack working array.
-
vector<char> st
Candidate augmenting alternating path (working array).
-
inline KuhnMunkres(const vector<double> &cost_matrix, int m, int n = 0)
Number Theory Algorithms
The Prime
class includes some number theory algorithms necessary for
integer factorization and finding primitive roots, which is used in some
Fast Fourier Transform algorithms.
-
struct Prime
Number theory algorithms for prime numbers and primitive root.
Public Functions
-
inline Prime()
Default constructor.
-
inline void init_primes(int np = 50000)
Initialize set of small primes, using sieve of Eratosthenes.
- Parameters:
np – Maximal number considered for finding primes.
-
inline void factors(LL x, vector<pair<LL, int>> &pp)
Integer factorization.
- Parameters:
x – Integer x.
pp – (output) Set of prime factors and their occurrence.
-
inline LL euler(LL n)
Euler’s totient function.
- Parameters:
n – Integer n (not necessarily prime).
- Returns:
phi(n), which counts the positive integers up to the given integer n that are relatively prime to n.
-
inline bool is_prime(LL n)
Prime testing.
- Parameters:
n – Integer n to be tested (n < 2^63 - 1).
- Returns:
true
if n is a prime,false
if n is not a prime.
-
inline int primitive_root(LL p)
Find one of primitive roots modulo n. A number g is a primitive root modulo n if every number a coprime to n is congruent to a power of g modulo n.
- Parameters:
p – Modulus p.
- Returns:
a primitive root g.
-
inline void primitive_roots(LL p, vector<LL> &gg)
Find all primitive roots modulo n.
- Parameters:
p – Modulus p.
gg – (output) All primitive roots modulo n.
Public Members
-
int np
Maximal number considered for generation of set of small primes.
-
vector<int> primes
Set of small primes.
Public Static Functions
-
static inline int pmod(LL x, LL n)
Return positive x mod n.
-
static inline void exgcd(LL a, LL b, LL &d, LL &x, LL &y)
Extended Euclidean algorithm. Find integers x and y such that a x + b y = gcd(a, b).
- Parameters:
a – Integer a.
b – Integer b.
d – (output) Greatest Common Divisor gcd(a, b).
x – (output) Integer x.
y – (output) Integer y.
-
static inline LL gcd(LL a, LL b)
Euclidean algorithm. Find Greatest Common Divisor gcd(a, b).
- Parameters:
a – Integer a.
b – Integer b.
- Returns:
Greatest Common Divisor gcd(a, b).
-
static inline LL inv(LL a, LL n)
Find modular multiplicative inverse of a (mod n). Using extended Euclidean algorithm.
- Parameters:
a – Integer a.
n – Modulus n.
- Returns:
a^{-1} (mod n) if it exists, otherwise -1.
-
static inline LL power(LL n, int i)
Find n to the power of i using binary algorithm.
- Parameters:
n – Integer n.
i – Integer i (i >= 0).
- Returns:
n^i.
-
static inline LL sqrt(LL x)
Find the largest number r such that r * r <= x. Using binary algorithm.
- Parameters:
x – Integer x.
- Returns:
floor(sqrt(x)).
-
static inline LL quick_multiply(LL x, LL y, LL p)
Find (x * y) mod p. Note that the expression x * y % p may overflow if the intermediate x * y > 2^63 - 1. This function will not overflow (if p <= 2^62 - 1)
- Parameters:
x – Integer x.
y – Integer y.
p – Modulus p.
- Returns:
(x * y) mod p.
-
static inline LL quick_power(LL n, LL i, LL p)
Find (n ^ i) mod p using binary algorithm.
- Parameters:
n – Integer n.
i – Integer i (i >= 0).
p – Modulus p.
- Returns:
(n ^ i) mod p.
-
static inline bool miller_rabin(LL a, LL n)
Miller-Rabin primality test.
Note
If n < 3215031751, it is enough to test a = 2, 3, 5, and 7; if n < 341550071728321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.
- Parameters:
a – Base number a.
n – The number to be tested for prime (n >= 3).
- Returns:
true
if n is likely to be a prime.false
if n is not a prime.
-
static inline LL pollard_rho(LL n)
Pollard’s rho algorithm. Find a factor of integer n.
- Parameters:
n – Integer n.
- Returns:
A (not necessarily prime) factor of n.
-
inline Prime()
Fast Fourier Transform (FFT)
A collection of FFT algorithms is implemented here. The implementation focuses more on flexibility and readability, rather than optimal performance (but the performance should be acceptable).
The block2
implementation is faster than numpy
implementation for some special
array lengths, such as 5929741 = 181 ** 3
and 5929742 = 2 * 7 * 13 * 31 * 1051
.
-
template<int base>
struct BasicFFT Fast Fourier Transform (FFT) with array length of base^k (k >= 0). The complexity is O(n log_base n * base^2).
- Template Parameters:
base – The radix (base = 2 is specialized)
Public Functions
-
inline BasicFFT()
Constructor.
-
inline void init(size_t n)
Precompute for array length n for both forward and backward FFT.
- Parameters:
n – The array length, must be a power of
base
.
-
inline void fft(complex<double> *arr, size_t n, bool forth)
Perform inplace FFT.
- Parameters:
arr – A pointer to the array of complex numbers.
n – Number of elements in the array, must be a power of
base
.forth – Whether this is forward transform.
Public Members
-
vector<size_t> r
The index permutation array.
-
vector<complex<double>> wb
The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.
-
vector<complex<double>> wf
The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.
Public Static Functions
-
static inline size_t pad(size_t n)
Find smallest number x such that x = base^k >= n.
- Parameters:
n – The array length.
- Returns:
The padded array length.
-
template<>
struct BasicFFT<2> Radix-2 Fast Fourier Transform (FFT) with complexity O(nlog n).
Public Functions
-
inline BasicFFT()
Constructor.
-
inline void init(size_t n)
Precompute for array length n for both forward and backward FFT.
- Parameters:
n – The array length must be a power of 2.
-
inline void fft(complex<double> *arr, size_t n, bool forth)
Perform inplace FFT.
- Parameters:
arr – A pointer to the array of complex numbers.
n – Number of elements in the array, must be a power of 2.
forth – Whether this is forward transform.
Public Members
-
vector<size_t> r
The index permutation array.
-
vector<complex<double>> wb
The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.
-
vector<complex<double>> wf
The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.
Public Static Functions
-
static inline size_t pad(size_t n)
Find smallest number x such that x = 2^k >= n.
- Parameters:
n – The array length.
- Returns:
The padded array length.
-
inline BasicFFT()
-
template<typename B = BasicFFT<2>>
struct RaderFFT Rader’s FFT algorithm for prime array length. This algorithm transforms a FFT of length n to two (forward and backward) FFTs of length m, where m is the padded array length for 2 * n - 3 for the backend FFT.
- Template Parameters:
B – The backend FFT for computing the padded FFT.
Public Functions
-
inline RaderFFT()
Default constructor.
Constructor.
- Parameters:
prime – Instance for prime number algorithms.
-
inline void init(size_t n)
Precompute for array length n for both forward and backward FFT.
- Parameters:
n – The array length, which must be a prime number.
-
inline void fft(complex<double> *arr, size_t n, bool forth)
Perform inplace FFT.
- Parameters:
arr – A pointer to the array of complex numbers.
n – Number of elements in the array, which must be a prime number.
forth – Whether this is forward transform.
Public Members
-
vector<LL> wb
Precomputed inverse of the power of primitive root g^{-k} mod n for k = 0, 1, …, n - 1.
-
vector<LL> wf
Precomputed power of primitive root g^k mod n for k = 0, 1, …, n - 1.
-
vector<complex<double>> cb
FFT transformed exp(i2pi g^{-k}/n).
-
vector<complex<double>> cf
FFT transformed exp(-i2pi g^{-k}/n).
-
vector<complex<double>> arx
Working space for padded array.
-
size_t nn
Padded array length.
-
template<typename B = BasicFFT<2>>
struct BluesteinFFT Bluestein’s FFT algorithm for arbitrary array length. This algorithm transforms a FFT of length n to two (forward and backward) FFTs of length m, where m is the padded array length for 2 * n for the backend FFT.
- Template Parameters:
B – The backend FFT for computing the padded FFT.
Public Functions
-
inline BluesteinFFT()
Default constructor.
Constructor.
- Parameters:
prime – Instance for prime number algorithms (ignored).
-
inline void init(size_t n)
Precompute for array length n for both forward and backward FFT.
- Parameters:
n – The array length.
-
inline void fft(complex<double> *arr, size_t n, bool forth)
Perform inplace FFT.
- Parameters:
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
Public Members
-
vector<complex<double>> wb
The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.
-
vector<complex<double>> wf
The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.
-
vector<complex<double>> cb
FFT transformed exp(i2pi [((k - n) - (k - n)(k - n)) / 2]/n).
-
vector<complex<double>> cf
FFT transformed exp(-i2pi [((k - n) - (k - n)(k - n)) / 2]/n).
-
vector<complex<double>> arx
Working space for padded array.
-
size_t nn
Padded array length.
-
struct DFT
Naive Discrete Fourier Transform (DFT) algorithm with complexity O(n^2).
Public Functions
-
inline DFT()
Default constructor.
Constructor.
- Parameters:
prime – Instance for prime number algorithms (ignored).
-
inline void init(size_t n)
Precompute for array length n for both forward and backward FFT. For DFT this method does nothing.
- Parameters:
n – The array length.
-
inline void fft(complex<double> *arr, size_t n, bool forth)
Perform inplace DFT.
Forward DFT: \(X[k] = \sum_{j=0}^{n - 1} x[j] \exp(-2 \pi \mathrm{i} jk/n)\)
Backward DFT: \(X[k] = \frac{1}{n} \sum_{j=0}^{n - 1} x[j] \exp(2 \pi \mathrm{i} jk/n)\)
- Parameters:
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
-
inline DFT()
-
template<typename F, int P, int... Q>
struct FactorizedFFT FFT algorithm using different radix FFT backends. The array length is first factorized, then different FFT backends will be used and then the results is merged using the Cooley-Tukey FFT algorithm.
- Template Parameters:
F – The prime number FFT backend.
P – Using Radix-P FFT backend.
Q – Using Radix-(Q1, Q2, …) FFT backend.
Public Functions
-
inline FactorizedFFT()
Default constructor.
-
inline FactorizedFFT(int max_factor)
Constructor.
- Parameters:
max_factor – Maximal radix that should be checked for radix based FFT.
-
inline void fft_internal(complex<double> *arr, size_t p, size_t q, bool forth, int b) override
Perform independent FFTs for p arrays, each with length q.
- Parameters:
arr – A pointer to the array of complex numbers (as a matrix).
p – Number of rows (FFTs).
q – Number of columns (length of each FFT).
forth – Whether this is forward transform.
b – Radix. Zero if radix based FFT should not be used.
-
template<typename F, int P>
struct FactorizedFFT<F, P> FFT algorithm using different radix FFT backends. The array length is first factorized, then different FFT backends will be used and then the results is merged using the Cooley-Tukey FFT algorithm.
- Template Parameters:
F – The prime number FFT backend.
P – Using Radix-P FFT backend.
Public Functions
-
inline FactorizedFFT()
Default constructor.
-
inline FactorizedFFT(int max_factor)
Constructor.
- Parameters:
max_factor – Maximal radix that should be checked for radix based FFT.
-
inline void init(size_t n)
Precompute for array length n for both forward and backward FFT. For FactorizedFFT this method does nothing.
- Parameters:
n – The array length.
-
inline virtual void fft_internal(complex<double> *arr, size_t p, size_t q, bool forth, int b)
Perform independent FFTs for p arrays, each with length q.
- Parameters:
arr – A pointer to the array of complex numbers (as a matrix).
p – Number of rows (FFTs).
q – Number of columns (length of each FFT).
forth – Whether this is forward transform.
b – Radix. Zero if radix based FFT should not be used.
-
inline void cooley_tukey(complex<double> *arr, size_t n, bool forth, const size_t *pr, const int *b, size_t np)
Cooley-Tukey FFT algorithm for FFT with array length being a composite number.
- Parameters:
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
pr – A pointer to the array of factors in n;
b – A pointer to the array of radices for each factor. pr should be multiple of b, if b is not zero.
np – Number of factors.
-
inline void fft(complex<double> *arr, size_t n, bool forth)
Perform inplace FFT.
Forward FFT: \(X[k] = \sum_{j=0}^{n - 1} x[j] \exp(-2 \pi \mathrm{i} jk/n)\)
Backward FFT: \(X[k] = \frac{1}{n} \sum_{j=0}^{n - 1} x[j] \exp(2 \pi \mathrm{i} jk/n)\)
- Parameters:
arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
-
typedef FactorizedFFT<RaderFFT<>, 2, 3, 5, 7, 11> block2::FFT2
FFT with small prime factorization implemeted using Rader’s algorithm.
-
typedef FactorizedFFT<BluesteinFFT<>, 2, 3, 5, 7, 11> block2::FFT
FFT with small prime factorization implemeted using Bluestein’s algorithm.