Comparison of Machine Learning Algorithms for Predicting Thyroid Disorders in Diabetic Patients
Abstract
Machine Learning (ML), a subfield of Artificial Intelligence (AI), has been used successfully for disease diagnosis in healthcare. Thyroid disorders and diabetes are two of the most prevalent chronic diseases, and they are closely interconnected because the endocrine systems involved in both regulate many physiological processes in the body. This study aims to predict thyroid disorders in diabetic patients using six machine learning algorithms: Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Logistic Regression (LR), Naïve Bayes (NB), and Support Vector Machine (SVM). A locally sourced dataset of 44,534 records from diabetic patients was used and preprocessed through data cleaning, encoding, and class balancing. Two balancing techniques were applied: manual balancing and random undersampling with RandomUnderSampler. Training and testing sets were formed with stratified 10-fold cross-validation (Stratified K-Fold) to obtain a robust evaluation. The performance of each algorithm was assessed with metrics including accuracy and F1-score. RF outperformed the other models, achieving the highest accuracy: 95% on the manually balanced dataset and 84% on the dataset balanced with RandomUnderSampler. Its F1-scores were 95% and 82%, respectively, indicating that it handled the class imbalance in the data robustly. This study highlights the importance of selecting appropriate preprocessing techniques and machine learning methods for healthcare datasets. The findings can help healthcare providers diagnose thyroid disorders in diabetic patients early and intervene sooner, potentially improving the patients' quality of life and overall healthcare outcomes.
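The evaluation protocol summarized in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the authors' code, and several parts of it are assumptions:

- The function names (`random_undersample`, `stratified_k_fold`, `accuracy_and_f1`) are hypothetical.
- The toy data are synthetic.
- A majority-class predictor stands in for the six classifiers.
- F1 is computed for a single positive class, because the abstract does not say how F1 was averaged.

The undersampling step mimics the default behaviour of imbalanced-learn's `RandomUnderSampler`, which reduces every class, without replacement, to the size of the smallest class. The fold construction approximates scikit-learn's `StratifiedKFold`.

```python
import random
from collections import Counter, defaultdict


def _indices_by_class(y):
    """Group sample indices by class label."""
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly keep, without replacement, as many samples of each class as the
    smallest class has (similar to RandomUnderSampler's default strategy)."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    n_min = min(len(idx) for idx in groups.values())
    keep = []
    for idx in groups.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


def stratified_k_fold(y, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs. Shuffled indices of each class are
    dealt round-robin over the k folds, so every test fold keeps roughly the
    overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0
    groups = _indices_by_class(y)
    for label in sorted(groups):
        idx = groups[label][:]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        yield train, test


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical toy data: 900 negatives, 100 positives (NOT the study's dataset).
    X = [[i] for i in range(1000)]
    y = [0] * 900 + [1] * 100
    X_bal, y_bal = random_undersample(X, y)
    print("after undersampling:", Counter(y_bal))

    accs, f1s = [], []
    for train, test in stratified_k_fold(y, k=10):
        # Placeholder "model": always predict the training fold's majority class.
        majority = Counter(y[i] for i in train).most_common(1)[0][0]
        acc, f1 = accuracy_and_f1([y[i] for i in test], [majority] * len(test))
        accs.append(acc)
        f1s.append(f1)
    print(f"mean accuracy {sum(accs) / len(accs):.2f}, mean F1 {sum(f1s) / len(f1s):.2f}")
```

Running the script prints a mean accuracy of 0.90 but a mean F1 of 0.00 for the majority-class stand-in. On imbalanced data, a model can score high accuracy while never detecting the minority class, which is one reason to report F1 alongside accuracy. In a real pipeline, the placeholder would be replaced by fitted scikit-learn estimators.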
DOI: https://doi.org/10.31449/inf.v49i12.6927
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can request to retain the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







