Tools

Floating-Point Number Compression

For data like FCIDUMP integrals, the array elements do not need to be stored in its full binary form. Storage or memory can be saved by using flexible bit representation for numbers of different magnitudes. The FPCodec class implements the lossless and lossy compression and decompression of floating-point number arrays. Lossless compression is possible when all numebrs are close in their order of magnitudes.

template<typename T> struct FPtraits

Floating-point number representation details.

Template Parameters:: T – The floating type to implement.

Public Types

typedef T U: The representation integer type.

Public Static Attributes

static const int mbits = sizeof(T) * 8: Number of bits in significand.

static const int ebits = 0: Number of bits in exponent.

template<> struct FPtraits<float>

Representation details for single precision numbers.

Public Types

typedef uint32_t U: The representation integer type.

Public Static Attributes

static const int mbits = 23: Number of bits in significand.

static const int ebits = 8: Number of bits in exponent.

template<> struct FPtraits<double>

Representation details for double precision numbers.

Public Types

typedef uint64_t U: The representation integer type.

Public Static Attributes

static const int mbits = 52: Number of bits in significand.

static const int ebits = 11: Number of bits in exponent.

template<typename T, typename U = typename FPtraits<T>::U, int mbits = FPtraits<T>::mbits, int ebits = FPtraits<T>::ebits> inline string block2::binary_repr(T d)

Represent the binary form of a floating-point or integer number as a string.

Template Parameters:

T – The floating type to implement.
U – The representation integer type.
mbits – Number of bits in significand.
ebits – Number of bits in exponent.

Parameters:

d – The number to be represented.

Returns:

The binary form of the number as a string.

template<typename T, typename U> struct BitsCodec

Read/write bits from/into a floating-point number array.

Template Parameters:

T – The floating type for the elemenet of the array.
U – The corresponding integer/representation type of T.

Public Functions

inline BitsCodec(T *op_data)

Constructor.

Parameters:: op_data – The floating-point number array for storing the bits.

inline void begin_decode(): Prepare for decoding.

template<typename X> inline void encode(X x, int l)

Encode data and write into the array.

Template Parameters:

X – The type of the data.

Parameters:

x – The data to encode.
l – The number of bits of the data.

template<typename X> inline void decode(X &x, int l)

Read from the array and decode data.

Template Parameters:

X – The type of the data.

Parameters:

x – The output decoded data.
l – The number of bits of the data.

inline size_t finish_encode(): Finalize encoding.

Public Members

U buf: Current buffer.

T *op_data: The floating-point number array for storing the bits.

size_t d_offset: The position in the array of the current buffer.

int i_offset: Number of bits already used in the current buffer.

Public Static Attributes

static const int i_length = sizeof(U) * 8: Number of bits in single array element.

template<typename T, typename U = typename FPtraits<T>::U, int mbits = FPtraits<T>::mbits, int ebits = FPtraits<T>::ebits> struct FPCodec

Codec for compressing/decompressing array of floating-point numbers.

Template Parameters:

T – Floating type to implement.
U – The corresponding integer type.
mbits – Number of bits in significand.
ebits – Number of bits in exponent.

Public Functions

inline FPCodec(): Default constructor.

inline FPCodec(T prec)

Constructor.

Parameters:: prec – Floating-point number precision.

inline FPCodec(T prec, size_t chunk_size)

Constructor.

Parameters:

prec – Floating-point number precision.
chunk_size – Length of the array elements that should be processed at one time.

inline size_t decode(T *ip_data, size_t len, T *op_data) const

Decode data.

Parameters:

ip_data – The compressed floating-point array.
len – Length of the original floating-point array.
op_data – Output array for storing original data. Memory should be pre-allocated with length = len.

Returns:

Length of the array for the compressed data.

inline size_t encode(T *ip_data, size_t len, T *op_data) const

Encode data.

Parameters:

ip_data – The original floating-point array.
len – Length of the original floating-point array.
op_data – Output array for storing compressed data. Memory should be pre-allocated with length >= len + 1.

Returns:

Length of the array for the compressed data.

inline void write_array(ostream &ofs, T *data, size_t len) const

Compress array of floating-point data and write into file stream.

Parameters:

ofs – Output stream.
data – The original floating-point array.
len – The length of the original floating-point array.

inline void read_array(istream &ifs, T *data, size_t len) const

Read from file stream and deompress the data.

Parameters:

ifs – Input stream.
data – The floating-point array for storing the original data.
len – The length of the original floating-point array.

inline void read_chunks(istream &ifs, size_t len, vector<vector<T>> &chunks, size_t &chunk_size) const

Read from file stream (but not decompress the data).

Parameters:

ifs – Input stream.
len – The length of the original floating-point array.
chunks – For storing chunks of the compressed data.
chunk_size – Number of original array elements in each chunk (output).

Public Members

T prec: Precision for compression.

U prec_u: Integer representation of the precision.

mutable size_t ndata = 0: Length of the array of the data that has been compressed.

size_t ncpsd = 0: Length of the array for the compressed data.

mutable size_t ncpsd_last = 0: Length of the array written in the last call.

size_t chunk_size = 4096: Length of the array elements that should be processed at one time.

size_t n_parallel_chunks = 4096: Number of chunks to be processed in the same batch.

Public Static Attributes

static const int m = mbits : Number of bits in significand.

static const U e = U(1) << mbits : Exponent least significant bit mask.

static const U s = e << ebits : Sign bit mask.

static const U x = ~(e + s - 1): Exponent mask.

template<typename T> struct CompressedVector

An array with data stored in compressed form. Only support single thread.

Template Parameters:: T – Elemenet type.

Subclassed by block2::CompressedVectorMT< T >

Public Functions

inline CompressedVector(size_t arr_len, T prec, size_t chunk_size, int ncache = 4)

Constructor.

Parameters:

arr_len – Size of the array.
prec – Precision for compression.
chunk_size – Number of array elements in each chunk.
ncache – Number of cached decompressed chunks.

inline CompressedVector(istream &ifs, size_t arr_len, T prec, int ncache = 4)

Constructor.

Parameters:

ifs – Input stream, from which the compressed data is obtained.
arr_len – Size of the array.
prec – Precision for compression.
ncache – Number of cached decompressed chunks.

inline CompressedVector(T *arr, size_t arr_len, T prec, int ncache = 4)

Constructor.

Parameters:

arr – The compressed data as an array.
arr_len – Size of the array.
prec – Precision for compression.
ncache – Number of cached decompressed chunks.

virtual ~CompressedVector() = default: Deconstructor.

inline void clear(): Set all array elements to zero.

inline void shrink_to_fit(): Minimize the storage cost of the compressed data (after the data has been changed).

inline void finalize(): Write all cached data into compressed form.

inline virtual T &operator[](size_t i)

Write one element into the array.

Parameters:: i – Array index.
Returns:: The array elemenet.

inline virtual T operator[](size_t i) const

Read one element from the array.

Parameters:: i – Array index.
Returns:: The array elemenet.

inline size_t size() const: Get the size of the array.

Public Members

size_t arr_len: Size of the array.

size_t chunk_size: Number of array elements in each chunk.

int ncache: Number of cached decompressed chunks.

mutable int icache: Index of next available cache (head of the circular queue).

mutable vector<vector<T>> cp_data: Chunks of compressed data.

mutable vector<pair<size_t, vector<T>>> cache_data: Cached data of decompressed chunks.

mutable vector<bool> cache_dirty: Whether each cached chunk has been changed.

FPCodec<T> fpc: Floating-point number compression driver.

template<typename T> struct CompressedVectorMT : public block2::CompressedVector<T>

A read-only array with data stored in compressed form, with multi-threading support.

Template Parameters:: T – Elemenet type.

Public Functions

inline CompressedVectorMT(const shared_ptr<CompressedVector<T>> &ref_cv, int ntg)

Constructor.

Parameters:

ref_cv – Pointer to the single-thread compressed array.
ntg – Number of threads.

inline virtual T &operator[](size_t i)

Write one element into the array (not supported, will cause assertion failure).

Parameters:: i – Array index.
Returns:: The array elemenet.

inline virtual T operator[](size_t i) const

Read one element from the array.

Parameters:: i – Array index.
Returns:: The array elemenet.

inline size_t size() const: Get the size of the array.

Public Members

mutable vector<int> icaches: Index of next available cache (head of the circular queue) for each thread.

mutable vector<vector<pair<size_t, vector<T>>>> cache_datas: Cached data of decompressed chunks for each thread.

shared_ptr<CompressedVector<T>> ref_cv: Pointer to the single-thread compressed array.

Kuhn-Munkres Algorithm

The KuhnMunkres class implements the Kuhn-Munkres algorithm for finding the best matching in a bipartite with the lowest cost.

struct KuhnMunkres

Kuhn-Munkres algorithm for finding the best matching with lowest cost. Complexity: O(n^3).

Public Functions

inline KuhnMunkres(const vector<double> &cost_matrix, int m, int n = 0)

Constructor.

Parameters:

cost_matrix – Flattened matrix of cost.
m – Number of rows.
n – Number of columns.

inline bool match(vector<int> &x, int u)

Find an augmenting alternating path in the equality subgraph. If an equality subgraph has a perfect matching, the it is a maximum-weight matching in the graph.

Parameters:

x – Current matching.
u – Starting vertex.

Returns:

true if an augmenting alternating path is found.

inline pair<double, vector<int>> solve()

Find the lowest cost and a matching.

Returns:: the lowest cost and an index array x, with x[i] = j meaning that column index i should be matched to row index j.

Public Members

vector<double> cost: Flattened n x n matrix of cost.

int n: Number of rows or columns.

double inf: Infinity (constant).

double eps: Machine precision (constant).

vector<double> lx: Left feasible labeling (working array).

vector<double> ly: Right feasible labeling (working array).

vector<double> slack: Slack working array.

vector<char> st: Candidate augmenting alternating path (working array).

Number Theory Algorithms

The Prime class includes some number theory algorithms necessary for integer factorization and finding primitive roots, which is used in some Fast Fourier Transform algorithms.

struct Prime

Number theory algorithms for prime numbers and primitive root.

Public Functions

inline Prime(): Default constructor.

inline void init_primes(int np = 50000)

Initialize set of small primes, using sieve of Eratosthenes.

Parameters:: np – Maximal number considered for finding primes.

inline void factors(LL x, vector<pair<LL, int>> &pp)

Integer factorization.

Parameters:

x – Integer x.
pp – (output) Set of prime factors and their occurrence.

inline LL euler(LL n)

Euler’s totient function.

Parameters:: n – Integer n (not necessarily prime).
Returns:: phi(n), which counts the positive integers up to the given integer n that are relatively prime to n.

inline bool is_prime(LL n)

Prime testing.

Parameters:: n – Integer n to be tested (n < 2^63 - 1).
Returns:: true if n is a prime, false if n is not a prime.

inline int primitive_root(LL p)

Find one of primitive roots modulo n. A number g is a primitive root modulo n if every number a coprime to n is congruent to a power of g modulo n.

Parameters:: p – Modulus p.
Returns:: a primitive root g.

inline void primitive_roots(LL p, vector<LL> &gg)

Find all primitive roots modulo n.

Parameters:

p – Modulus p.
gg – (output) All primitive roots modulo n.

Public Members

int np: Maximal number considered for generation of set of small primes.

vector<int> primes: Set of small primes.

Public Static Functions

static inline LL pmod(LL x, LL n): Return positive x mod n.

static inline void exgcd(LL a, LL b, LL &d, LL &x, LL &y)

Extended Euclidean algorithm. Find integers x and y such that a x + b y = gcd(a, b).

Parameters:

a – Integer a.
b – Integer b.
d – (output) Greatest Common Divisor gcd(a, b).
x – (output) Integer x.
y – (output) Integer y.

static inline LL gcd(LL a, LL b)

Euclidean algorithm. Find Greatest Common Divisor gcd(a, b).

Parameters:

a – Integer a.
b – Integer b.

Returns:

Greatest Common Divisor gcd(a, b).

static inline LL inv(LL a, LL n)

Find modular multiplicative inverse of a (mod n). Using extended Euclidean algorithm.

Parameters:

a – Integer a.
n – Modulus n.

Returns:

a^{-1} (mod n) if it exists, otherwise -1.

static inline LL power(LL n, int i)

Find n to the power of i using binary algorithm.

Parameters:

n – Integer n.
i – Integer i (i >= 0).

Returns:

n^i.

static inline LL sqrt(LL x)

Find the largest number r such that r * r <= x. Using binary algorithm.

Parameters:: x – Integer x.
Returns:: floor(sqrt(x)).

static inline LL quick_multiply(LL x, LL y, LL p)

Find (x * y) mod p. Note that the expression x * y % p may overflow if the intermediate x * y > 2^63 - 1. This function will not overflow (if p <= 2^62 - 1)

Parameters:

x – Integer x.
y – Integer y.
p – Modulus p.

Returns:

(x * y) mod p.

static inline LL quick_power(LL n, LL i, LL p)

Find (n ^ i) mod p using binary algorithm.

Parameters:

n – Integer n.
i – Integer i (i >= 0).
p – Modulus p.

Returns:

(n ^ i) mod p.

static inline bool miller_rabin(LL a, LL n)

Miller-Rabin primality test.

Note

If n < 3215031751, it is enough to test a = 2, 3, 5, and 7; if n < 341550071728321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.

Parameters:

a – Base number a.
n – The number to be tested for prime (n >= 3).

Returns:

true if n is likely to be a prime. false if n is not a prime.

static inline LL pollard_rho(LL n)

Pollard’s rho algorithm. Find a factor of integer n.

Parameters:: n – Integer n.
Returns:: A (not necessarily prime) factor of n.

Fast Fourier Transform (FFT)

A collection of FFT algorithms is implemented here. The implementation focuses more on flexibility and readability, rather than optimal performance (but the performance should be acceptable).

The block2 implementation is faster than numpy implementation for some special array lengths, such as 5929741 = 181 ** 3 and 5929742 = 2 * 7 * 13 * 31 * 1051.

template<int base> struct BasicFFT

Fast Fourier Transform (FFT) with array length of base^k (k >= 0). The complexity is O(n log_base n * base^2).

Template Parameters:: base – The radix (base = 2 is specialized)

Public Functions

inline BasicFFT(): Constructor.

inline void init(size_t n)

Precompute for array length n for both forward and backward FFT.

Parameters:: n – The array length, must be a power of base.

inline void fft(complex<double> *arr, size_t n, bool forth)

Perform inplace FFT.

Parameters:

arr – A pointer to the array of complex numbers.
n – Number of elements in the array, must be a power of base.
forth – Whether this is forward transform.

Public Members

vector<size_t> r: The index permutation array.

vector<complex<double>> wb: The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.

vector<complex<double>> wf: The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.

array<array<complex<double>, base>, base> xw[2]: The precomputed primitive base-th root of 1 exp(-/+ i2pi jk/base) for forward/backward FFT.

Public Static Functions

static inline size_t pad(size_t n)

Find smallest number x such that x = base^k >= n.

Parameters:: n – The array length.
Returns:: The padded array length.

template<> struct BasicFFT<2>

Radix-2 Fast Fourier Transform (FFT) with complexity O(nlog n).

Public Functions

inline BasicFFT(): Constructor.

inline void init(size_t n)

Precompute for array length n for both forward and backward FFT.

Parameters:: n – The array length must be a power of 2.

inline void fft(complex<double> *arr, size_t n, bool forth)

Perform inplace FFT.

Parameters:

arr – A pointer to the array of complex numbers.
n – Number of elements in the array, must be a power of 2.
forth – Whether this is forward transform.

Public Members

vector<size_t> r: The index permutation array.

vector<complex<double>> wb: The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.

vector<complex<double>> wf: The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.

Public Static Functions

static inline size_t pad(size_t n)

Find smallest number x such that x = 2^k >= n.

Parameters:: n – The array length.
Returns:: The padded array length.

template<typename B = BasicFFT<2>> struct RaderFFT

Rader’s FFT algorithm for prime array length. This algorithm transforms a FFT of length n to two (forward and backward) FFTs of length m, where m is the padded array length for 2 * n - 3 for the backend FFT.

Template Parameters:: B – The backend FFT for computing the padded FFT.

Public Functions

inline RaderFFT(): Default constructor.

inline RaderFFT(const shared_ptr<Prime> &prime)

Constructor.

Parameters:: prime – Instance for prime number algorithms.

inline void init(size_t n)

Precompute for array length n for both forward and backward FFT.

Parameters:: n – The array length, which must be a prime number.

inline void fft(complex<double> *arr, size_t n, bool forth)

Perform inplace FFT.

Parameters:

arr – A pointer to the array of complex numbers.
n – Number of elements in the array, which must be a prime number.
forth – Whether this is forward transform.

Public Members

vector<LL> wb: Precomputed inverse of the power of primitive root g^{-k} mod n for k = 0, 1, …, n - 1.

vector<LL> wf: Precomputed power of primitive root g^k mod n for k = 0, 1, …, n - 1.

vector<complex<double>> cb: FFT transformed exp(i2pi g^{-k}/n).

vector<complex<double>> cf: FFT transformed exp(-i2pi g^{-k}/n).

vector<complex<double>> arx: Working space for padded array.

size_t nn: Padded array length.

B b: The backend FFT instance.

shared_ptr<Prime> prime: Instance for prime number algorithms.

template<typename B = BasicFFT<2>> struct BluesteinFFT

Bluestein’s FFT algorithm for arbitrary array length. This algorithm transforms a FFT of length n to two (forward and backward) FFTs of length m, where m is the padded array length for 2 * n for the backend FFT.

Template Parameters:: B – The backend FFT for computing the padded FFT.

Public Functions

inline BluesteinFFT(): Default constructor.

inline BluesteinFFT(const shared_ptr<Prime> &prime)

Constructor.

Parameters:: prime – Instance for prime number algorithms (ignored).

inline void init(size_t n)

Precompute for array length n for both forward and backward FFT.

Parameters:: n – The array length.

inline void fft(complex<double> *arr, size_t n, bool forth)

Perform inplace FFT.

Parameters:

arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.

Public Members

vector<complex<double>> wb: The precomputed primitive nth root of 1 exp(i2pi k/n) for backward FFT.

vector<complex<double>> wf: The precomputed primitive nth root of 1 exp(-i2pi k/n) for forward FFT.

vector<complex<double>> cb: FFT transformed exp(i2pi [((k - n) - (k - n)(k - n)) / 2]/n).

vector<complex<double>> cf: FFT transformed exp(-i2pi [((k - n) - (k - n)(k - n)) / 2]/n).

vector<complex<double>> arx: Working space for padded array.

size_t nn: Padded array length.

B b: The backend FFT instance.

struct DFT

Naive Discrete Fourier Transform (DFT) algorithm with complexity O(n^2).

Public Functions

inline DFT(): Default constructor.

inline DFT(const shared_ptr<Prime> &prime)

Constructor.

Parameters:: prime – Instance for prime number algorithms (ignored).

inline void init(size_t n)

Precompute for array length n for both forward and backward FFT. For DFT this method does nothing.

Parameters:: n – The array length.

inline void fft(complex<double> *arr, size_t n, bool forth)

Perform inplace DFT.

Forward DFT: \(X[k] = \sum_{j=0}^{n - 1} x[j] \exp(-2 \pi \mathrm{i} jk/n)\)

Backward DFT: \(X[k] = \frac{1}{n} \sum_{j=0}^{n - 1} x[j] \exp(2 \pi \mathrm{i} jk/n)\)

Parameters:

arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.

template<typename F, int P, int... Q> struct FactorizedFFT

FFT algorithm using different radix FFT backends. The array length is first factorized, then different FFT backends will be used and then the results is merged using the Cooley-Tukey FFT algorithm.

Template Parameters:

F – The prime number FFT backend.
P – Using Radix-P FFT backend.
Q – Using Radix-(Q1, Q2, …) FFT backend.

Public Functions

inline FactorizedFFT(): Default constructor.

inline FactorizedFFT(int max_factor)

Constructor.

Parameters:: max_factor – Maximal radix that should be checked for radix based FFT.

inline void fft_internal(complex<double> *arr, size_t p, size_t q, bool forth, int b) override

Perform independent FFTs for p arrays, each with length q.

Parameters:

arr – A pointer to the array of complex numbers (as a matrix).
p – Number of rows (FFTs).
q – Number of columns (length of each FFT).
forth – Whether this is forward transform.
b – Radix. Zero if radix based FFT should not be used.

template<typename F, int P> struct FactorizedFFT<F, P>

FFT algorithm using different radix FFT backends. The array length is first factorized, then different FFT backends will be used and then the results is merged using the Cooley-Tukey FFT algorithm.

Template Parameters:

F – The prime number FFT backend.
P – Using Radix-P FFT backend.

Public Functions

inline FactorizedFFT(): Default constructor.

inline FactorizedFFT(int max_factor)

Constructor.

Parameters:: max_factor – Maximal radix that should be checked for radix based FFT.

inline void init(size_t n)

Precompute for array length n for both forward and backward FFT. For FactorizedFFT this method does nothing.

Parameters:: n – The array length.

inline virtual void fft_internal(complex<double> *arr, size_t p, size_t q, bool forth, int b)

Perform independent FFTs for p arrays, each with length q.

Parameters:

arr – A pointer to the array of complex numbers (as a matrix).
p – Number of rows (FFTs).
q – Number of columns (length of each FFT).
forth – Whether this is forward transform.
b – Radix. Zero if radix based FFT should not be used.

inline void cooley_tukey(complex<double> *arr, size_t n, bool forth, const size_t *pr, const int *b, size_t np)

Cooley-Tukey FFT algorithm for FFT with array length being a composite number.

Parameters:

arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.
pr – A pointer to the array of factors in n;
b – A pointer to the array of radices for each factor. pr should be multiple of b, if b is not zero.
np – Number of factors.

inline void fft(complex<double> *arr, size_t n, bool forth)

Perform inplace FFT.

Forward FFT: \(X[k] = \sum_{j=0}^{n - 1} x[j] \exp(-2 \pi \mathrm{i} jk/n)\)

Backward FFT: \(X[k] = \frac{1}{n} \sum_{j=0}^{n - 1} x[j] \exp(2 \pi \mathrm{i} jk/n)\)

Parameters:

arr – A pointer to the array of complex numbers.
n – Number of elements in the array.
forth – Whether this is forward transform.

Public Members

const int max_factor = P : Maximal radix number.

shared_ptr<Prime> prime: Instance for prime number algorithms.

typedef FactorizedFFT<RaderFFT<>, 2, 3, 5, 7, 11> block2::FFT2: FFT with small prime factorization implemeted using Rader’s algorithm.

typedef FactorizedFFT<BluesteinFFT<>, 2, 3, 5, 7, 11> block2::FFT: FFT with small prime factorization implemeted using Bluestein’s algorithm.