Applying Business Intelligence for Analyzing Big Data Using High-Performance Compressed Columnar Database System for Data Warehouse
Paper Submission ID: 136
Mohammad Shohal Bhuiyan
Department of Computer Science & Engineering,
Dhaka University of Engineering & Technology (DUET), Gazipur
Prof. Dr. Mohammad Abdur Rouf
Dhaka University of Engineering & Technology (DUET), Gazipur
Email: marouf.cse@student.duet.ac.bd
Md. Mostafijur Rahman
Dhaka University of Engineering & Technology (DUET), Gazipur
Email: 20104003@student.duet.ac.bd
Outline
Introduction
Business Intelligence
Business Intelligence (BI) combines:
Big Data
Data Warehouse
Problem Definition
Motivation
We have used the following existing technologies in this research:
Existing work
Our Approach
Basic Step Diagram of BI
Fig. 1. Basic Step Diagram of Proposed Algorithm
Proposed Prototype
Fig. 2. Basic Diagram of Proposed Algorithm
Proposed Prototype
Fig. 3. Proposed High-Performance Compressed Columnar Database System for Data Warehouse
Compressed Columnar Dictionary
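The slide gives only this title. As a hedged illustration, assuming the dictionary follows the common per-column dictionary-encoding idea (each distinct value is stored once, and every row keeps only a small integer code), a minimal Python sketch is shown below. The function name and the sample column values are hypothetical and are not taken from the authors' system.

```python
from typing import Dict, List, Tuple


def build_column_dictionary(values: List[str]) -> Tuple[List[str], Dict[str, int], List[int]]:
    """Dictionary-encode one column (generic sketch, not the authors' exact scheme).

    Returns the dictionary (code -> value), the reverse lookup (value -> code),
    and the column rewritten as integer codes.
    """
    dictionary: List[str] = []   # position in the list is the code
    lookup: Dict[str, int] = {}  # value -> code, for fast encoding
    codes: List[int] = []
    for v in values:
        if v not in lookup:      # first time we see this value: assign next code
            lookup[v] = len(dictionary)
            dictionary.append(v)
        codes.append(lookup[v])
    return dictionary, lookup, codes


# Hypothetical sample column, e.g. c_birth_country
country = ["Bangladesh", "India", "Bangladesh", "Nepal", "India"]
dictionary, lookup, codes = build_column_dictionary(country)
print(dictionary)  # ['Bangladesh', 'India', 'Nepal']
print(codes)       # [0, 1, 0, 2, 1]
```

Repeated values cost one small code per row instead of a full string, which is where the storage saving comes from.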
Data Insertion Using Compressed Columnar Storage
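The insertion procedure itself is not spelled out in the text. The following is a minimal sketch under the assumption that every column is dictionary-encoded: inserting a tuple looks each attribute value up in its column's dictionary, adds the value if it is new, and appends only the code to the column. The class `CompressedColumnTable` and its methods are hypothetical names.

```python
from typing import Dict, List


class CompressedColumnTable:
    """Hypothetical columnar table where each column is dictionary-encoded."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        self.dictionaries: Dict[str, List[str]] = {c: [] for c in columns}
        self.lookups: Dict[str, Dict[str, int]] = {c: {} for c in columns}
        self.codes: Dict[str, List[int]] = {c: [] for c in columns}

    def insert(self, row: Dict[str, str]) -> None:
        """Append one tuple; each attribute value is replaced by its code."""
        for col in self.columns:
            value = row[col]
            lookup = self.lookups[col]
            if value not in lookup:  # new value: extend this column's dictionary
                lookup[value] = len(self.dictionaries[col])
                self.dictionaries[col].append(value)
            self.codes[col].append(lookup[value])  # store only the code

    def row(self, i: int) -> Dict[str, str]:
        """Reconstruct tuple i by decoding every column."""
        return {c: self.dictionaries[c][self.codes[c][i]] for c in self.columns}


table = CompressedColumnTable(["c_birth_year", "c_birth_country"])
table.insert({"c_birth_year": "1991", "c_birth_country": "Bangladesh"})
table.insert({"c_birth_year": "1985", "c_birth_country": "Bangladesh"})
table.insert({"c_birth_year": "1991", "c_birth_country": "India"})
print(table.codes)   # {'c_birth_year': [0, 1, 0], 'c_birth_country': [0, 0, 1]}
print(table.row(2))  # {'c_birth_year': '1991', 'c_birth_country': 'India'}
```

Keeping a value-to-code map next to each dictionary makes each insertion a constant-time lookup per attribute, at the cost of holding that map in memory.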
Data Compression
Fig. 4. Data compression technique
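The figure itself did not survive extraction, so the exact compression technique is unknown. One common way to compress dictionary codes is to store each code in the minimum number of bits, ceil(log2 |D|), where |D| is the dictionary size. The sketch below shows that fixed-width bit packing. It is an assumption for illustration (the authors may use a different or additional scheme), and the helper names are hypothetical.

```python
from typing import List


def bits_needed(dictionary_size: int) -> int:
    """Minimum bits per code so every dictionary entry gets a distinct code."""
    return max(1, (dictionary_size - 1).bit_length())


def pack_codes(codes: List[int], width: int) -> bytes:
    """Pack fixed-width codes into bytes (code i occupies bits i*width onward)."""
    acc = 0
    for i, code in enumerate(codes):
        acc |= code << (i * width)
    n_bytes = (len(codes) * width + 7) // 8  # round up to whole bytes
    return acc.to_bytes(n_bytes, "little")


def unpack_codes(packed: bytes, width: int, count: int) -> List[int]:
    """Inverse of pack_codes."""
    acc = int.from_bytes(packed, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


codes = [0, 1, 0, 2, 1]          # codes from a hypothetical 3-entry dictionary
width = bits_needed(3)           # 2 bits per code
packed = pack_codes(codes, width)
print(width, len(packed))        # 2 2
print(unpack_codes(packed, width, len(codes)))  # [0, 1, 0, 2, 1]
```

Five 2-bit codes fit in 2 bytes, whereas the same five country names stored as plain strings would take tens of bytes.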
Proposed Prototype
Fig. 4. Prototype of BI
Experimental Results
Performance - Storage Capacity
Number of Tuples (Million) | Estimated Uncompressed Storage (GB) | Actual Storage, SQL Server (Ss) (GB) | Actual Storage, Oracle (So) (GB) | Estimated Storage, HPMDB (GB) | Actual Storage, HPMDB (Sh) (GB) |
1 | 0.02 | 0.34 | 0.34 | 0.018 | 0.011 |
5 | 0.1 | 2.3 | 2.3 | 0.09 | 0.055 |
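To make the storage gap concrete, the short script below computes the ratios Ss/Sh and So/Sh using only the figures in the table above. By this arithmetic HPMDB is roughly 31 times smaller than SQL Server and Oracle at 1 million tuples and roughly 42 times smaller at 5 million tuples. No values other than the table's are used.

```python
# Storage figures copied from the table above (GB); keys are tuple counts in millions.
storage = {
    1: {"sql_server": 0.34, "oracle": 0.34, "hpmdb": 0.011},
    5: {"sql_server": 2.3, "oracle": 2.3, "hpmdb": 0.055},
}

ratios = {}
for tuples, s in storage.items():
    ratio_ss = s["sql_server"] / s["hpmdb"]  # Ss / Sh
    ratio_so = s["oracle"] / s["hpmdb"]      # So / Sh
    ratios[tuples] = (ratio_ss, ratio_so)
    print(f"{tuples}M tuples: Ss/Sh = {ratio_ss:.1f}, So/Sh = {ratio_so:.1f}")
# 1M tuples: Ss/Sh = 30.9, So/Sh = 30.9
# 5M tuples: Ss/Sh = 41.8, So/Sh = 41.8
```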
List of SELECT Queries
Q1 | SELECT * FROM Customer WHERE c_birth_year>=1991 |
Q2 | SELECT * FROM Customer WHERE c_birth_year>=1991 AND c_birth_country='Bangladesh' |
Q3 | SELECT * FROM Customer WHERE c_birth_year>=1991 AND c_last_review_date_sk='2022-03-21' |
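The slides do not describe how these predicates are evaluated on compressed data. A common approach with dictionary-encoded columns, assumed here for illustration, is to evaluate a predicate once per distinct dictionary value, turn the result into a set of qualifying codes, and then scan only the integer code column. The sketch applies this to Q1 and to the conjunction in Q2. All column contents and helper names are hypothetical.

```python
from typing import Callable, List, Set


def matching_codes(dictionary: List[str], predicate: Callable[[str], bool]) -> Set[int]:
    """Evaluate the predicate once per distinct value instead of once per row."""
    return {code for code, value in enumerate(dictionary) if predicate(value)}


def scan(codes: List[int], wanted: Set[int]) -> List[int]:
    """Return the row ids whose code is in the qualifying set."""
    return [row for row, code in enumerate(codes) if code in wanted]


# Hypothetical encoded columns (dictionary + per-row codes)
year_dict, year_codes = ["1991", "1985", "1995"], [0, 1, 2, 1, 0]
country_dict, country_codes = ["Bangladesh", "India"], [0, 0, 1, 0, 1]

# Q1: c_birth_year >= 1991
q1_rows = scan(year_codes, matching_codes(year_dict, lambda v: int(v) >= 1991))

# Q2: Q1 AND c_birth_country = 'Bangladesh' (intersect the row-id sets)
bangladesh = matching_codes(country_dict, lambda v: v == "Bangladesh")
q2_rows = sorted(set(q1_rows) & set(scan(country_codes, bangladesh)))

print(q1_rows)  # [0, 2, 4]
print(q2_rows)  # [0]
```

Because the predicate runs over the dictionary rather than over every row, its cost depends on the number of distinct values, and the row scan compares only small integers.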
Performance - Query Time Difference
Fig. 5. Query Time Difference with Existing Schemes (time in seconds)
Performance - Trace of Elapsed Time
Fig. 6. Trace of Elapsed Time on Indexed and Non-Indexed Form of Compressed Database
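Fig. 6 contrasts indexed and non-indexed access, but the slide does not say which index structure is used. Assuming a simple inverted list (each dictionary code mapped to the row ids that hold it), the sketch below shows the difference in access paths. A non-indexed lookup examines every row. An indexed lookup jumps directly to the matching rows after a one-time build cost. The function names and the sample column are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(codes: List[int]) -> Dict[int, List[int]]:
    """Map each dictionary code to the row ids that hold it (inverted list)."""
    index: Dict[int, List[int]] = defaultdict(list)
    for row, code in enumerate(codes):
        index[code].append(row)
    return dict(index)


def lookup_scan(codes: List[int], code: int) -> List[int]:
    """Non-indexed access: examine every row."""
    return [row for row, c in enumerate(codes) if c == code]


def lookup_indexed(index: Dict[int, List[int]], code: int) -> List[int]:
    """Indexed access: go straight to the matching rows."""
    return index.get(code, [])


country_codes = [0, 0, 1, 0, 1]  # hypothetical encoded c_birth_country column
idx = build_index(country_codes)
print(idx)  # {0: [0, 1, 3], 1: [2, 4]}
assert lookup_indexed(idx, 0) == lookup_scan(country_codes, 0) == [0, 1, 3]
```

The index trades extra storage for lookup time. That trade-off matters in a system whose main goal is compact storage.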
Estimated Storage Space Comparison
Fig. 7. Estimated Storage Space Comparison in Different Architecture
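The slides do not state how the estimated storage was computed. If the data are dictionary-encoded with fixed-width codes (an assumption), a back-of-the-envelope estimate for N tuples and C columns, where D_c is the dictionary of column c (with |D_c| ≥ 2), adds the packed code columns to the dictionaries themselves:

```latex
S_{\text{est}} \approx
\underbrace{\frac{N}{8}\sum_{c=1}^{C}\left\lceil \log_2 |D_c| \right\rceil}_{\text{packed code columns (bytes)}}
\;+\;
\underbrace{\sum_{c=1}^{C}\sum_{v \in D_c} \operatorname{size}(v)}_{\text{dictionaries (bytes)}}
```

When dictionaries are small relative to N, the first term dominates. Storage then grows roughly linearly with the number of tuples, and the dictionary overhead stays close to fixed.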
Data Format
Programming Languages
Technologies Used
Impacts and Constraints
Economic Impact
Future work
Key Outcomes
Conclusion
Any Questions?