Integrating Equation-Based Labeling and Classification for Adaptive Turkish Vocabulary Acquisition
Abstract
Traditional vocabulary assessment techniques frequently emphasize correctness over behavioral indicators such as attempt count and response time. To address this gap, this study proposes a machine learning technique that combines behavioral analysis with linguistic insights to detect vocabulary gaps among Turkish language learners. A Support Vector Machine (SVM) model with a Radial Basis Function (RBF) kernel was trained on a dataset of 1,000 interactions from 20 students, with hyperparameters tuned via grid search (C=10, γ=0.1). Behavioral attributes such as attempt count, response time, and answer correctness were collected to quantify student uncertainty and engagement. The approach also integrates word difficulty levels and thematic categories. An equation-based labeling technique was first applied to identify vocabulary weaknesses, laying the foundation for subsequent machine learning classification. The model achieved an accuracy of 89%, precision of 86%, recall of 91%, and an F1-score of 88%, surpassing linear and polynomial kernel alternatives. These results underscore the importance of behavioral metrics in adaptive learning systems and support scalable integration into mobile applications.
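As a brief illustration of two quantities named in the abstract, the sketch below evaluates the standard RBF kernel, K(x, y) = exp(-γ‖x − y‖²), with the reported γ=0.1, and checks that the reported F1-score agrees with the reported precision and recall. The feature vectors and function names are hypothetical and chosen only for illustration; they are not taken from the study's dataset or code, and a real pipeline would normally scale features first.

```python
import math


def rbf_kernel(x, y, gamma=0.1):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical interactions: (attempt count, response time in seconds, correctness).
# These values are illustrative assumptions, not data from the study.
interaction_a = (1.0, 2.0, 1.0)
interaction_b = (3.0, 2.5, 0.0)

# Kernel similarity with the gamma reported in the abstract.
similarity = rbf_kernel(interaction_a, interaction_b, gamma=0.1)

# F1 implied by the reported precision (86%) and recall (91%).
implied_f1 = f1_score(0.86, 0.91)

print(f"RBF similarity: {similarity:.4f}")
print(f"F1 implied by reported precision and recall: {implied_f1:.2f}")
```

The implied F1 rounds to 0.88, which matches the reported F1-score of 88%.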
DOI: https://doi.org/10.31449/inf.v49i27.8821

This work is licensed under a Creative Commons Attribution 3.0 License.