Feature Selection Method Based on Honeybee-SMOTE for Medical Data Classification

Shobha Aswal, Neelu Jyothi Ahuja, Ritika Mehra

Abstract


Biomedical data analysis plays an important role in clinical practice. Biomedical data typically present complex problems such as skewed class distributions and redundant or irrelevant attributes. On imbalanced datasets, redundant and unrelated features frequently degrade classifier accuracy, which makes feature selection critical. The key goal of feature selection is to find a feature subspace that maintains classifier accuracy while reducing excessive computational learning cost and removing noise. Whether a feature selection approach is appropriate depends heavily on how well it matches the problem context and uncovers the fundamental patterns within the data. The main goal of this study is to construct a disease detection model that uses a hybrid feature selection strategy based on Honeybee-SMOTE, with classification performed by the C4.5 algorithm. The empirical results establish the superiority of the proposed hybrid methodology over competing methods in terms of accuracy, precision, recall, F1-score and G-Mean. A statistical analysis of the results demonstrates that the proposed hybrid method outperforms, or is competitive with, existing state-of-the-art algorithms.
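SMOTE, the oversampling component named in the hybrid, rebalances a skewed dataset by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates only that generic interpolation rule; it is not the authors' Honeybee-SMOTE procedure, whose honeybee-inspired feature search is not described in this abstract. It assumes numeric feature vectors and Euclidean distance, and uses a simplified variant that picks the base sample at random. The function name `smote` and its parameters are illustrative.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_synthetic minority samples (illustrative SMOTE variant).

    Each new sample = base + gap * (neighbour - base), where base is a random
    minority sample, neighbour is one of its k nearest minority neighbours,
    and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority_samples = [[1.0, 2.0], [1.5, 1.75], [2.0, 2.25], [1.25, 2.5]]
for point in smote(minority_samples, n_synthetic=4, k=2):
    print(point)
```

Every synthetic point lies between two existing minority samples, so appending the output to the training set reduces the imbalance without leaving the region the minority class already occupies. Oversampling should be applied to training folds only, so that synthetic points do not leak into the evaluation data.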
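The abstract reports accuracy, precision, recall, F1-score and G-Mean. On imbalanced medical data these metrics can disagree sharply, which is why G-Mean is included. The sketch below computes all five from binary labels. It assumes class `1` is the positive (disease) class and uses the common binary definition of G-Mean as the square root of sensitivity (recall) times specificity. The function name and the convention of reporting 0/0 as 0.0 are illustrative choices, not taken from the paper.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num: float, den: float) -> float:
        # Assumed convention: an undefined ratio such as 0/0 is reported as 0.0
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }


labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(binary_metrics(labels, [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]))
print(binary_metrics(labels, [0] * 10))
```

With 3 positive and 7 negative cases, the first prediction gives accuracy 0.8, precision, recall and F1 of 2/3, and a G-Mean of about 0.756. The second, a trivial classifier that labels every case negative, still reaches accuracy 0.7, but its recall and G-Mean are 0. G-Mean exposes exactly this failure mode.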






DOI: https://doi.org/10.31449/inf.v46i9.4098

This work is licensed under a Creative Commons Attribution 3.0 License.