Automatic Selection of Bitmap Join Indexes in Data Warehouses Using CFPGrowth++ Algorithm

Mohammed Yahyaoui, Noura Aknin, Souad Amjad, Lamia Benameur

Abstract


In relational data warehouses modeled as a star schema, analysis and decision-making are typically carried out through OLAP (On-Line Analytical Processing) queries. These queries are generally complex, combining selections, joins, grouping, and aggregations over voluminous tables, which demands considerable computing time and therefore leads to very high response times. Reducing this cost is essential if decision-makers are to interact with the system within a reasonable time frame. The objective of this study is to improve system performance by minimizing the response time of OLAP decision-support queries. The proposed approach mines frequent patterns from the query workload in order to automatically select bitmap join indexes that reduce the execution cost of these queries. To generate the index configuration automatically, we implemented the CFPGrowth++ frequent pattern mining algorithm and applied it to a query workload on a test data warehouse created with the Analytical Processing Benchmark 1 (APB-1), in order to validate the approach. The experimental results indicate that the index configuration produced by the proposed approach yields a performance improvement of approximately 75%. The overall query execution time decreased from 20,032.57 seconds before applying the approach to 5,388.49 seconds after applying it, and execution time improved significantly for a large portion of the workload.
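To illustrate the idea of mining a query workload for candidate index attributes, the following is a minimal sketch and not the authors' implementation. It replaces the tree-based CFPGrowth++ procedure with plain brute-force enumeration, and keeps only the multiple-minimum-supports rule on which that family of algorithms is based: an attribute set is frequent when its support reaches the lowest minimum item support among its members. The function name, the workload format, the attribute names, and the thresholds are all hypothetical.

```python
from itertools import combinations


def frequent_attribute_sets(workload, mis, max_size=3):
    """Return {attribute set: support} for sets that meet their threshold.

    workload: list of sets, one per query, holding the dimension attributes
              that the query restricts (assumed input format).
    mis:      dict mapping each attribute to its minimum item support,
              expressed as a fraction of the workload.
    """
    n = len(workload)
    items = sorted({attr for query in workload for attr in query})
    frequent = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = fraction of queries that use every attribute in the set.
            support = sum(1 for query in workload if cset <= query) / n
            # Multiple-minimum-supports rule: compare against the lowest
            # threshold among the attributes of the candidate.
            threshold = min(mis[attr] for attr in candidate)
            if support > 0 and support >= threshold:
                frequent[cset] = support
    return frequent


# Hypothetical workload: each query is reduced to the attributes it filters on.
workload = [
    {"time.year", "product.family"},
    {"time.year", "product.family", "customer.retailer"},
    {"time.year", "channel.base"},
    {"product.family", "customer.retailer"},
]
# A rarely used attribute gets a lower threshold so it is not discarded outright.
mis = {
    "time.year": 0.5,
    "product.family": 0.5,
    "customer.retailer": 0.5,
    "channel.base": 0.25,
}

patterns = frequent_attribute_sets(workload, mis)
for attrs, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(attrs), support)
```

Each frequent attribute set is only a candidate for a bitmap join index. A real selection step would still need to rank the candidates with a cost model and respect a storage budget, which this sketch leaves out.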


DOI: https://doi.org/10.31449/inf.v49i27.7807

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.