A Comparative Analysis of Extreme Gradient Boosting, Decision Tree, Support Vector Machines, and Random Forest Algorithm in Data Analysis of College Students' Psychological Health
Abstract
To address the problem of identifying the mental health status of college students, this study surveyed the psychological condition of students in one department of a university in Hubei Province using the SCL-90 scale, and applied machine learning algorithms to assess model applicability and to explore the differences between students in healthy and sub-healthy mental states. Data (including basic information) from 500 students were randomly collected. A self-compiled questionnaire, combined with on-site scoring by psychology teachers, was used to classify each of the 500 students as healthy or sub-healthy. The questionnaire data were analyzed with decision tree, support vector machine, random forest, and XGBoost algorithms to identify the healthy and sub-healthy states quickly and to mine behavioral characteristics correlated with students' mental health status. The data of the 500 students were modeled with each algorithm in turn, and the classification performance of the models was evaluated by accuracy, precision, recall, F1-score, and AUC. Among the four methods, random forest achieved the best classification performance, with an R2 score of 0.8891, higher than the R2 scores of 0.8393 for the decision tree, 0.8840 for the support vector machine, and 0.8618 for XGBoost. Considering the advantages of random forest in classification performance, modeling time, interpretability, feature selection, and simplicity, we recommend the random forest model to assist in the diagnostic classification of mental health status.
Experimental results on the SCL-90 survey and the student basic-information dataset show that the proposed model achieves high accuracy and converges quickly, enabling more effective and accurate prediction of students' mental health status.
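As an illustration of the evaluation metrics named in the abstract (accuracy, precision, recall, F1-score, and AUC), the following is a minimal sketch for a binary healthy/sub-healthy classifier. It is not the authors' code: the function names `binary_metrics` and `roc_auc`, the label coding (0 = healthy, 1 = sub-healthy), the 0.5 decision threshold, and the eight toy samples are all assumptions made for illustration and do not come from the study's data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 (sub-healthy) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 0 = healthy, 1 = sub-healthy.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical predicted probabilities of the sub-healthy class.
scores = [0.10, 0.40, 0.35, 0.80, 0.45, 0.70, 0.90, 0.30]
# Assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"auc: {auc:.4f}")
```

The first four metrics depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent, which is one reason the two kinds of measure are usually reported together when comparing classifiers.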
DOI: https://doi.org/10.31449/inf.v49i15.7004

This work is licensed under a Creative Commons Attribution 3.0 License.