Optimizing Public Hospital Budgets Using Ensemble Machine Learning and SHAP Analysis for Interpretable Cost Prediction
Abstract
Public hospitals face growing financial pressure, which makes careful resource management essential. Most traditional cost-forecasting models, however, fail to capture the dynamic and non-linear nature of healthcare costs. This paper proposes an AI-based financial optimization framework built on interpretable ensemble machine learning. The methodology covers data preprocessing, feature engineering, and model training with tuned Random Forest and XGBoost algorithms, followed by SHAP (SHapley Additive exPlanations) analysis for model interpretability. The optimized XGBoost model achieved an R² score of 0.89, outperforming Random Forest (R² = 0.88) and the baseline models. It also achieved a Mean Absolute Error (MAE) of 2502.36 and a Mean Squared Error (MSE) of 11230456.12, indicating high predictive accuracy. The SHAP analysis identifies key cost-driving factors such as smoking status, BMI, and age, enabling more transparent and better-informed decision-making by stakeholders. The framework offers a scalable solution for predictive budgeting and decision-making in public healthcare institutions.
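To make the interpretability step more concrete, the following is a minimal sketch of the Shapley attribution idea that SHAP builds on. It is not the paper's pipeline: the cost function `toy_cost`, its coefficients, the `patient` and `reference` records, and the helper `shapley_values` are all hypothetical and chosen only for illustration. The sketch enumerates every feature coalition exactly, which is feasible for three features; SHAP libraries use faster algorithms for tree ensembles and typically average over a background dataset rather than a single reference record.

```python
from itertools import combinations
from math import factorial


def toy_cost(age, bmi, smoker):
    """Hypothetical annual cost model with a smoker-BMI interaction.

    The coefficients are invented for illustration and are not taken
    from the paper or from any fitted model.
    """
    cost = 2000.0 + 250.0 * age + 300.0 * bmi
    if smoker:
        cost += 15000.0 + 400.0 * max(bmi - 30.0, 0.0)
    return cost


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    The cost is exponential in the number of features, so this is
    only practical for very small feature sets.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Evaluate the model with coalition features taken from the
        # instance and all remaining features taken from the baseline.
        args = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(**args)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


# Hypothetical patient and reference records (assumptions, not study data).
patient = {"age": 50, "bmi": 35.0, "smoker": 1}
reference = {"age": 40, "bmi": 28.0, "smoker": 0}

phi = shapley_values(toy_cost, patient, reference)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:.2f}")
```

For this toy model the attributions come out to roughly 2500 for age, 3100 for BMI, and 16000 for smoking status. They sum to the difference between the patient's predicted cost and the reference prediction, which is the additivity property that makes such explanations easy to read as a budget breakdown. The smoker-BMI interaction term is split evenly between the two features involved.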
DOI: https://doi.org/10.31449/inf.v49i22.7981

This work is licensed under a Creative Commons Attribution 3.0 License.