Hybrid Machine Learning and Optimization Algorithms for pH-Based Water Quality Classification

Xiaolin Li, Baomeng Pang

Abstract


Water quality, defined by its physical, chemical, and biological parameters, is essential for critical applications such as drinking and irrigation. Among these parameters, pH plays a significant role: it influences metal solubility and nutrient availability and thereby affects aquatic ecosystems. In this study, a Support Vector Classifier (SVC) and an Extra Trees Classifier (ETC) were employed to classify water quality based on pH values. To improve classification accuracy, each model was hybridized with two metaheuristic algorithms, the Transit Search Optimization Algorithm (TSOA) and Chaos Game Optimization (CGO), yielding four hybrid variants: ETTS, ETCG, SVTS, and SVCG. The models were compared using standard evaluation metrics, including accuracy, precision, recall, and F1 score. The ETTS model achieved the best performance, with a training accuracy of 0.910 and a testing accuracy of 0.778, along with a training precision of 0.911, recall of 0.910, and F1 score of 0.910. By comparison, the base ETC model recorded training and testing accuracies of 0.881 and 0.750, respectively. Similarly, SVTS and SVCG outperformed the base SVC model; SVTS achieved training and testing accuracies of 0.894 and 0.760, compared with 0.850 and 0.745 for SVC. The hybrid models thus outperform their non-optimized SVC and ETC baselines, which underscores the value of integrating optimization techniques with machine learning for robust and reliable water quality assessment. The framework is a promising tool for environmental monitoring, supporting sustainable water resource management and public health protection.
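The abstract reports accuracy, precision, recall, and F1 score but does not state how the per-class values are averaged. As a minimal sketch, the example below assumes support-weighted averaging, which is at least consistent with the reported training recall matching the training accuracy (both 0.910), since support-weighted recall always equals accuracy. The function name `weighted_metrics`, the class names, and the labels are hypothetical and are not taken from the study or its dataset.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    # Correct predictions (true positives) per class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n_label in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n_label
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n_label / n     # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for three illustrative pH classes (assumed, not real data).
y_true = ["acidic", "acidic", "neutral", "neutral", "neutral",
          "alkaline", "alkaline", "alkaline"]
y_pred = ["acidic", "neutral", "neutral", "neutral", "alkaline",
          "alkaline", "alkaline", "acidic"]

scores = weighted_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under this averaging scheme, recall carries no information beyond accuracy, so precision and F1 are the more informative values when comparing the hybrid models against their baselines.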







DOI: https://doi.org/10.31449/inf.v49i12.7724

This work is licensed under a Creative Commons Attribution 3.0 License.