The LC Framework for Synthesizing High-Speed Lossless and Error-Bounded Lossy Compressors
PI: Martin Burtscher (Texas State University)
Co-PI: Sheng Di (Argonne National Laboratory)
Senior advisor: Franck Cappello (ANL)
Ph.D. students: Noushin Azami, Alex Fallin, Yiqian Liu, Brandon Burtchell, and Benila Jerald
DE-SC0022223 and DE-AC02-06CH11357
LC Framework
Preprocessor library: Q_abs_f32, Lor2D_i32, Q_rel_f64, . . .
Component library: RLE_1, BIT_4, . . ., ZE_8, RE_2, . . .
LC compression pipeline (decompressor is inverse stages in reverse order):
input -> Q_abs_f32 -> . . . -> BIT_4 -> . . . -> RE_2 -> output
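The pipeline idea above (apply stages in order to compress, apply their inverses in reverse order to decompress) can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the framework's code: the `Stage` struct, the two toy stages (`delta_stage`, `invert_stage`), and the helpers `run_forward`, `run_inverse`, and `pipeline_roundtrip_ok` are all hypothetical names, and the toy stages only transform data without shrinking it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical stage: a forward transform paired with its inverse.
struct Stage {
  std::function<Bytes(const Bytes&)> encode;
  std::function<Bytes(const Bytes&)> decode;
};

// Toy stage: flip every bit (this transform is its own inverse).
static Stage invert_stage() {
  auto flip = [](const Bytes& in) {
    Bytes out(in);
    for (auto& b : out) b = static_cast<uint8_t>(~b);
    return out;
  };
  return Stage{flip, flip};
}

// Toy stage: replace each byte by its difference from the previous byte.
static Stage delta_stage() {
  auto enc = [](const Bytes& in) {
    Bytes out(in.size());
    uint8_t prev = 0;
    for (std::size_t i = 0; i < in.size(); i++) {
      out[i] = static_cast<uint8_t>(in[i] - prev);
      prev = in[i];
    }
    return out;
  };
  auto dec = [](const Bytes& in) {
    Bytes out(in.size());
    uint8_t prev = 0;
    for (std::size_t i = 0; i < in.size(); i++) {
      out[i] = static_cast<uint8_t>(in[i] + prev);  // running sum undoes the delta
      prev = out[i];
    }
    return out;
  };
  return Stage{enc, dec};
}

// Forward direction: apply the stages first to last.
static Bytes run_forward(const std::vector<Stage>& pipeline, Bytes data) {
  for (const Stage& s : pipeline) data = s.encode(data);
  return data;
}

// Inverse direction: apply the inverse stages last to first.
static Bytes run_inverse(const std::vector<Stage>& pipeline, Bytes data) {
  for (auto it = pipeline.rbegin(); it != pipeline.rend(); ++it) data = it->decode(data);
  return data;
}

// Returns true if the two-stage toy pipeline reproduces the input exactly.
static bool pipeline_roundtrip_ok(const Bytes& input) {
  const std::vector<Stage> pipeline = {delta_stage(), invert_stage()};
  return run_inverse(pipeline, run_forward(pipeline, input)) == input;
}
```

Calling `pipeline_roundtrip_ok` on any byte sequence should return true, because each inverse stage undoes exactly the stage that ran last in the forward direction.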
LC Algorithms
Supported lossless algorithms
Supported lossy algorithms
Component library: RLE_1, BIT_4, . . ., ZE_8, RE_2
Preprocessor library: Q_abs_f32, Lor2D_i32, Q_rel_f64, . . .
LC Search and Statistics
Search capabilities
Result statistics
Demo
FLDSC_1_1800_3600.dat from SDRbench suite
./generate_Device_LC-Framework.py
nvcc -O3 -arch=sm_86 -DUSE_GPU -Xcompiler "-O3 -march=native -fopenmp" -o lc lc.cu
./lc FLDSC_1_1800_3600.dat CR "" "BIT_4 RLE_4"
./lc FLDSC_1_1800_3600.dat CR "" "BIT_4 RLE_1"
./lc FLDSC_1_1800_3600.dat AL "" "BIT_4 RLE_1"
./lc
./lc FLDSC_1_1800_3600.dat CR "" "BIT_4 .+"
./lc FLDSC_1_1800_3600.dat CR "" ".+ .+"
./lc FLDSC_1_1800_3600.dat PR "" ".+ .+ .+"
./lc FLDSC_1_1800_3600.dat CR "" ".+ .+ .+"
./lc FLDSC_1_1800_3600.dat CR "" ".+ .+ L.+|R.+|Z.+|C.+"
./lc FLDSC_1_1800_3600.dat CR "" "DIFF_4 .+ .+ L.+|R.+|Z.+|C.+"
// pipeline script
./scripts/ga_search.py -s 5 -r 5 FLDSC_1_1800_3600.dat
./lc FLDSC_1_1800_3600.dat EX "" ".+ .+ L.+|R.+|Z.+|C.+"
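The quoted pipeline arguments in the commands above look like one regular expression per stage. The following sketch shows how such a specification could be expanded into concrete pipelines. It rests on assumptions: that each whitespace-separated token is matched against whole component names, and that the search covers every combination. The helpers `split_stages` and `expand_pipelines` are hypothetical, and the component list in the usage note is only a handful of names that appear in this deck, not the full library.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Pipeline = std::vector<std::string>;

// Split a specification such as "BIT_4 .+" into one pattern per stage.
static std::vector<std::string> split_stages(const std::string& spec) {
  std::istringstream iss(spec);
  std::vector<std::string> patterns;
  for (std::string tok; iss >> tok;) patterns.push_back(tok);
  return patterns;
}

// Enumerate every pipeline whose i-th component fully matches the i-th pattern.
static std::vector<Pipeline> expand_pipelines(const std::string& spec,
                                              const std::vector<std::string>& components) {
  std::vector<Pipeline> pipelines(1);  // start with a single empty pipeline
  for (const std::string& pattern : split_stages(spec)) {
    const std::regex re(pattern);
    std::vector<Pipeline> next;
    for (const Pipeline& prefix : pipelines) {
      for (const std::string& name : components) {
        if (std::regex_match(name, re)) {  // whole-name match, not a substring search
          Pipeline p = prefix;
          p.push_back(name);
          next.push_back(std::move(p));
        }
      }
    }
    pipelines = std::move(next);
  }
  return pipelines;
}
```

With the six names `BIT_4`, `DIFF_4`, `RLE_1`, `RLE_4`, `RE_2`, and `ZE_8`, the specification `".+ .+ L.+|R.+|Z.+|C.+"` expands to 6 x 6 x 4 = 144 candidate pipelines under these assumptions, which illustrates why restricting a stage with a pattern shrinks the search space.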
Performance Optimization
Internal operation
Chunk processing
Demo (cont.)
ls -lrt
// redo for other test files
// script to find best pipeline across inputs
// ./scripts/are_components_used_or_not.py 10 FLDSC_1_1800_3600.dat.CR3.csv
./lc FLDSC_1_1800_3600.dat CR "QUANT_ABS_R_f32(0.01)" ".+ .+ L.+|R.+|Z.+|C.+"
./lc FLDSC_1_1800_3600.dat CR "QUANT_ABS_R_f32(0.1)" ".+ .+ L.+|R.+|Z.+|C.+"
./lc FLDSC_1_1800_3600.dat CR "QUANT_REL_R_f32(0.01)" ".+ .+ L.+|R.+|Z.+|C.+"
./lc FLDSC_1_1800_3600.dat EX "QUANT_REL_R_f32(0.01)" ".+ .+ L.+|R.+|Z.+|C.+"
./lc FLDSC_1_1800_3600.dat EX "QUANT_REL_R_f32(0.01)" ".+ .+ L.+|R.+|Z.+|C.+" "MAXREL_f32(0.01)"
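The preprocessor arguments above carry an error bound (for example 0.01). As a rough illustration of what an absolute error bound means, the sketch below uses the textbook uniform quantizer with bins of width 2*eb. It is not taken from the framework and may differ from how its quantizers work; `quantize_abs`, `reconstruct_abs`, and `within_abs_bound` are hypothetical names. A production quantizer would also need to handle NaN, infinities, and floating-point rounding at bin boundaries, which this sketch ignores.

```cpp
#include <cassert>
#include <cmath>

// Map a value to the index of the nearest bin center, with bins of width 2*eb.
static long long quantize_abs(float value, double eb) {
  return std::llround(static_cast<double>(value) / (2.0 * eb));
}

// Recover the bin center from the bin index.
static double reconstruct_abs(long long bin, double eb) {
  return static_cast<double>(bin) * (2.0 * eb);
}

// Check that the reconstructed value stays within the absolute error bound.
static bool within_abs_bound(float value, double eb) {
  const double recon = reconstruct_abs(quantize_abs(value, eb), eb);
  return std::fabs(static_cast<double>(value) - recon) <= eb;
}
```

The point of the bin width is that the nearest bin center is never more than eb away from the original value, so the bin indices can be compressed losslessly while the reconstruction still respects the bound.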
Extensible Libraries and Scripts
Extensible design
Included scripts
CPU Component Interface
// CPU encoder
// returns false if the encoded data does not fit in the out array
static inline bool h_[name](int& csize, byte in[CS], byte out[CS]);
// CPU decoder
static inline void h_i[name](int& csize, byte in[CS], byte out[CS]);
// losslessly transform the first csize bytes of the 'in' array
// write the result to the 'out' array
// update csize if the transformed data has a different size than the input
// must be serial code (e.g., cannot use OpenMP)
// are allowed to change the contents of both arrays
// the two arrays are guaranteed to start at an 8-byte aligned address
GPU Component Interface
// GPU encoder
// returns false if the encoded data does not fit in the out array
static __device__ inline bool d_[name](int& csize, byte in[CS], byte out[CS], byte temp[CS]);
// GPU decoder
static __device__ inline void d_i[name](int& csize, byte in[CS], byte out[CS], byte temp[CS]);
// losslessly transform the first csize bytes of the 'in' array
// write the result to the 'out' array
// update csize if the transformed data has a different size than the input
// must be thread-block-local code
// are allowed to change the contents of all three arrays
// do not allocate __shared__ memory (use temp, e.g., int* buf = (int*)temp;)
// the three arrays are guaranteed to start at an 8-byte aligned address
CPU Component Example
// invert data (at byte granularity)
static inline bool h_INV_1(int& csize, const byte in [CS], byte out [CS])
{
for (int i = 0; i < csize; i++) {
out[i] = ~in[i];
}
return true;
}
static inline void h_iINV_1(int& csize, const byte in [CS], byte out [CS])
{
for (int i = 0; i < csize; i++) {
out[i] = ~in[i];
}
}
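The example component above can be exercised with a small round-trip check. In this sketch, the `byte` alias, the value chosen for `CS`, and the helper `inv_roundtrip_ok` are assumptions made so that the snippet compiles on its own; the framework supplies its own definitions.

```cpp
#include <cassert>
#include <cstring>

using byte = unsigned char;  // assumption: stand-in for the framework's byte type
static const int CS = 16384; // assumption: stand-in for the framework's chunk size

// invert data (at byte granularity), as in the example component
static inline bool h_INV_1(int& csize, const byte in[CS], byte out[CS])
{
  for (int i = 0; i < csize; i++) {
    out[i] = ~in[i];
  }
  return true;
}

static inline void h_iINV_1(int& csize, const byte in[CS], byte out[CS])
{
  for (int i = 0; i < csize; i++) {
    out[i] = ~in[i];
  }
}

// Hypothetical helper: encode, decode, and compare against the original chunk.
static bool inv_roundtrip_ok()
{
  alignas(8) static byte original[CS];
  alignas(8) static byte encoded[CS];
  alignas(8) static byte decoded[CS];
  const int n = 1000;  // only the first csize bytes of a chunk are meaningful
  int csize = n;
  for (int i = 0; i < n; i++) original[i] = static_cast<byte>(i * 7);
  if (!h_INV_1(csize, original, encoded)) return false;
  h_iINV_1(csize, encoded, decoded);
  return csize == n && std::memcmp(original, decoded, n) == 0;
}
```

Because inversion never changes the data size, `csize` is left untouched by both functions; a component that shrinks or grows the data would update it in the encoder and restore it in the decoder.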
CPU Preprocessor Interface
// CPU preprocessor encoder
static inline void h_[name](int& size, byte*& data, const int paramc, const double paramv []);
// CPU preprocessor decoder
static inline void h_i[name](int& size, byte*& data, const int paramc, const double paramv []);
// transforms the 'size' bytes in the 'data' array and writes the result either back to the 'data' array or to a new array and then makes 'data' point to this new array
// if the number of bytes changes, the 'size' needs to be updated accordingly
// the data array must start at an 8-byte aligned address
// 'paramc' specifies the number of elements in the 'paramv' array
// the 'paramv' array passes the command-line arguments provided to this preprocessor (e.g., the error bound, data set dimensionality, etc.)
// this code must be manually parallelized (using OpenMP) if desired
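The case where a preprocessor writes to a new array and redirects `data` can be sketched as follows. Everything here is hypothetical: `h_TAG` and `h_iTAG` are made-up preprocessors that prepend and strip a one-byte tag taken from `paramv[0]`, the `byte` alias is a stand-in, and the sketch assumes that the caller allocated `data` with `new[]` and that the preprocessor may release the old array with `delete[]`. The interface description above does not state the ownership rules, so check the framework before relying on this.

```cpp
#include <cassert>
#include <cstring>

using byte = unsigned char;  // assumption: stand-in for the framework's byte type

// Hypothetical encoder: prepend a one-byte tag, which grows the data by one byte.
static inline void h_TAG(int& size, byte*& data, const int paramc, const double paramv[])
{
  assert(paramc == 1);
  byte* const out = new byte[size + 1];  // new[] storage is suitably aligned on common platforms
  out[0] = static_cast<byte>(paramv[0]);
  std::memcpy(out + 1, data, size);
  delete[] data;  // assumption: the old array came from new[]
  data = out;     // make 'data' point to the new array
  size += 1;      // the number of bytes changed, so update 'size'
}

// Hypothetical decoder: check and strip the tag, restoring the original bytes.
static inline void h_iTAG(int& size, byte*& data, const int paramc, const double paramv[])
{
  assert(paramc == 1);
  assert(size >= 1 && data[0] == static_cast<byte>(paramv[0]));
  byte* const out = new byte[size - 1];
  std::memcpy(out, data + 1, size - 1);
  delete[] data;
  data = out;
  size -= 1;
}

// Hypothetical helper: run the encoder and decoder and compare with the input.
static bool tag_roundtrip_ok()
{
  const byte original[5] = {10, 20, 30, 40, 50};
  int size = 5;
  byte* data = new byte[size];
  std::memcpy(data, original, size);
  const double params[1] = {7.0};
  h_TAG(size, data, 1, params);
  const bool grew = (size == 6) && (data[0] == 7);
  h_iTAG(size, data, 1, params);
  const bool same = (size == 5) && (std::memcmp(data, original, 5) == 0);
  delete[] data;
  return grew && same;
}
```

The decoder receives the same `paramv` values as the encoder, which is what lets it verify the tag before removing it.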
GPU Preprocessor Interface
// GPU preprocessor encoder
static inline void d_[name](int& size, byte*& data, const int paramc, const double paramv []);
// GPU preprocessor decoder
static inline void d_i[name](int& size, byte*& data, const int paramc, const double paramv []);
// transforms the 'size' bytes in the 'data' array and writes the result either back to the 'data' array or to a new array and then makes 'data' point to this new array
// if the number of bytes changes, the 'size' needs to be updated accordingly
// the data array must start at an 8-byte aligned address
// 'paramc' specifies the number of elements in the 'paramv' array
// the 'paramv' array passes the command-line arguments provided to this preprocessor (e.g., the error bound, data set dimensionality, etc.)
// must be a host function that launches a kernel to do the preprocessing
// the kernel is allowed to allocate and use shared memory
// 'data' must be in device memory
CPU Preprocessor Example
// add user-provided constant (at byte granularity)
static inline void h_ADD(int& size, byte*& data, const int paramc, const double paramv [])
{
assert(paramc == 1);
const byte offset = paramv[0];
#pragma omp parallel for default(none) shared(size, data, offset)
for (int i = 0; i < size; i++) {
data[i] += offset;
}
}
static inline void h_iADD(int& size, byte*& data, const int paramc, const double paramv [])
{
...
data[i] -= offset;
...
}
Languages and Guarantees
Supported languages
Guarantees on compressed and decompressed output files