Knowledge redundancy approach to reduce size in association rules

Julio César Díaz Vera; Guillermo Manuel Negrín Ortiz; Carlos Molina; María Amparo Vila

doi:10.31449/inf.v44i2.2839

Abstract

Association rule mining is one of the most studied and widely applied fields in data mining. However, the discovered models usually consist of a very large set of rules, which diminishes the user's ability to analyze them. Hence, it is difficult to use the discovered model to assist the decision-making process. This handicap is heightened when the final set contains redundant rules. In this work, a new definition of redundancy in association rules is proposed, based on the user's prior knowledge. A post-processing method is developed to eliminate this kind of redundancy, using association rules already known to the user. Our proposal makes it possible to find more compact models of association rules, easing their use in the decision-making process. The experiments show reduction levels that exceed 90 percent of all generated rules, using prior knowledge that is always below ten percent. Thus, our method improves the efficiency of association rule mining and the exploitation of discovered association rules.

