A Random Forest-Based Machine Learning Framework with PCA, SMOTE, and SHAP for Efficient and Interpretable Coronary Artery Disease Prediction

Aswani T, Dr. Jose Moses Gummadi, Dr. Sharada G.

Abstract


Coronary artery disease (CAD) is a major global cause of morbidity and mortality, creating an urgent need for precise and scalable diagnostic tools. Conventional machine learning (ML) models such as XGBoost and Gradient Boosting have demonstrated good predictive performance, but they suffer from limitations including weak handling of class imbalance, redundant feature spaces, and a lack of interpretability. To address these gaps, this work proposes an optimized Random Forest-based framework for CAD prediction that integrates feature engineering and optimization techniques. Specifically, dimensionality reduction is achieved using principal component analysis (PCA), class imbalance is handled through the Synthetic Minority Oversampling Technique (SMOTE), and hyperparameters such as the number of estimators, maximum depth, and minimum samples per split are tuned via GridSearchCV. Additionally, SHAP (SHapley Additive exPlanations) values enhance interpretability by showing the contribution of each feature to the model's predictions; for example, features such as chest pain type and cholesterol level are shown to influence CAD outcomes significantly. The proposed framework is evaluated on the UCI Heart Disease dataset, which comprises 303 samples. Experimental results show that the optimized Random Forest model achieves an accuracy of 95.0%, outperforming the Gradient Boosting (93.08%) and XGBoost (92.4%) classifiers. The framework provides a clinically relevant, interpretable, and scalable solution for CAD prediction, bridging the gap between technical advancements and their practical deployment in healthcare environments.
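
As a rough illustration of the class-balancing step named in the abstract, the sketch below shows the core idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. This is a minimal standard-library sketch, not the authors' implementation (which is not shown here); the function name `smote`, its parameters, and the toy data are assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy two-feature data (assumed, not the UCI Heart Disease dataset):
# 20 majority-class points and 4 minority-class points.
majority = [[float(x), float(y)] for x in range(5) for y in range(4)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5], [12.0, 11.0]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.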



DOI: https://doi.org/10.31449/inf.v49i22.7998

This work is licensed under a Creative Commons Attribution 3.0 License.