ML-Based Stroke Detection Model using Different Feature Selection Algorithms

Hussein Abdel-Jaber, Ahmed Abdel-Wahab Rakha, Anas Abdualqader Hadi, Nesrine Atitallah, Ali Wagdy Mohamed

Abstract


A stroke occurs when the blood flow that carries oxygen and nutrients to the brain is blocked, or when sudden bleeding occurs within the brain. Delaying stroke treatment can lead to serious consequences, including death. This paper proposes a machine learning model, based on classification algorithms, to detect whether a stroke has occurred. The classification algorithms used in this study are k-nearest neighbours, decision tree, random forest, naïve Bayes, multilayer perceptron and support vector machine. Each algorithm was applied to the classification task under six feature selection settings: all features (no selection), select K best (SelectKBest), select percentile (SelectPercentile), select false-discovery rate (SelectFdr), select false-positive rate (SelectFpr) and select family-wise error (SelectFwe). The paper compares the algorithms under each setting to determine which gives the best classification results in terms of accuracy, recall, precision and F1-score. The decision tree algorithm achieves the highest accuracy, precision and F1-score regardless of the feature selection method used. When all features are used, decision tree and random forest yield identical recall, the highest among the compared algorithms. For the other feature selection methods, decision tree consistently provides the highest recall. Performance was also evaluated by comparing the proposed model with the most relevant prior works that use different machine learning algorithms. The results indicate that the proposed model outperforms these approaches, particularly with the decision tree algorithm. Descriptive statistics (means, standard deviations and 95% confidence intervals) were obtained for all features and the target variable in the stroke dataset. Trade-offs between precision and recall for the compared algorithms are also presented.
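
The comparison described in the abstract can be sketched with scikit-learn, whose feature selection module provides the SelectKBest, SelectPercentile, SelectFdr, SelectFpr and SelectFwe classes named above. The sketch below is illustrative only and is not the authors' code. It assumes a synthetic imbalanced dataset in place of the stroke dataset, the ANOVA F-test (f_classif) as the scoring function, a single stratified train/test split, Gaussian naïve Bayes, and arbitrary hyperparameters (k=5, percentile=50, alpha=0.05). None of these choices are stated in the abstract.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (
    SelectFdr, SelectFpr, SelectFwe, SelectKBest, SelectPercentile, f_classif,
)
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic, imbalanced stand-in for the stroke dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_redundant=2,
    n_clusters_per_class=1, class_sep=1.5, weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The six feature selection settings named in the abstract.
# ASSUMPTION: f_classif scoring and the k / percentile / alpha values.
selectors = {
    "all features": "passthrough",  # Pipeline keyword: skip this step
    "SelectKBest": SelectKBest(f_classif, k=5),
    "SelectPercentile": SelectPercentile(f_classif, percentile=50),
    "SelectFdr": SelectFdr(f_classif, alpha=0.05),
    "SelectFpr": SelectFpr(f_classif, alpha=0.05),
    "SelectFwe": SelectFwe(f_classif, alpha=0.05),
}

# The six classifiers named in the abstract (hyperparameters are assumptions).
classifiers = {
    "k-nearest neighbours": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive Bayes": GaussianNB(),
    "multilayer perceptron": MLPClassifier(max_iter=500, random_state=0),
    "support vector machine": SVC(random_state=0),
}

results = {}
for sel_name, selector in selectors.items():
    for clf_name, clf in classifiers.items():
        # Fresh copies so no fitted state is shared between combinations.
        step = selector if isinstance(selector, str) else clone(selector)
        model = Pipeline([
            ("scale", StandardScaler()),
            ("select", step),
            ("clf", clone(clf)),
        ])
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[(sel_name, clf_name)] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
        }

# Report the classifier with the highest F1-score under each setting.
for sel_name in selectors:
    best = max(classifiers, key=lambda name: results[(sel_name, name)]["f1"])
    print(f"{sel_name}: best F1 = {results[(sel_name, best)]['f1']:.3f} ({best})")
```

Because the data here are synthetic, the resulting ranking says nothing about the paper's findings. The sketch only shows how the six selectors and six classifiers combine into 36 evaluated pipelines. Placing the selector inside the pipeline means it is fitted on the training split only, which avoids leaking test information into the feature selection step.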




DOI: https://doi.org/10.31449/inf.v48i17.6096

This work is licensed under a Creative Commons Attribution 3.0 License.