Application of LASSO Algorithm and GBDT Algorithm in Predicting Financial Distress of Companies
Abstract
With the global economy in a downward cycle under the influence of the epidemic, companies face crises in their business and financial conditions and are more likely to fall into financial distress. Concept drift degrades the practical performance of financial distress prediction, and existing approaches handle only limited types of drift. Most existing research on financial distress prediction uses machine learning methods such as random forests, but these have limitations when dealing with concept drift, such as difficulty in model updating and data imbalance. Therefore, this study proposes a model that combines the least absolute shrinkage and selection operator (LASSO) with the gradient boosting decision tree (GBDT) algorithm to address dynamic concept drift and accurately predict corporate financial distress. The study uses financial datasets of Chinese A-share listed companies from 2019 to 2022, with selection criteria including but not limited to market value, industry representativeness, and financial information. To reduce potential sample bias caused by market structure changes, policy adjustments, and other factors, the study adopts time-series and industry-stratified sampling to keep the samples representative. First, the two algorithms are analyzed in depth and applied to dynamic financial indicator selection in the financial samples. Second, a comprehensive prediction model is established using the sample similarity index. The experimental results show that, in dynamic environments, the model achieves accuracy rates of 92.47% and 92.31%, F values of 85.33% and 85.12%, and G values of 91.78%, 91.65%, and 91.92%. The prediction model thus shows high accuracy and dynamic stability in handling concept drift in financial distress prediction.
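The abstract does not define the F and G values it reports. A minimal sketch of how such scores are commonly computed, assuming that "F value" means the F1-measure (harmonic mean of precision and recall) and "G value" means the geometric mean of sensitivity and specificity, as is usual for imbalanced classification. The function names and the confusion-matrix counts below are hypothetical and are not taken from the study.

```python
import math


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1-measure: harmonic mean of precision and recall (assumed definition)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    """G-mean: geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: "positive" = financially distressed company.
tp, fp, tn, fn = 80, 20, 880, 20
print(f"F = {f_measure(tp, fp, fn):.4f}")
print(f"G = {g_mean(tp, fp, tn, fn):.4f}")
```

Unlike plain accuracy, both scores penalize a model that ignores the minority (distressed) class, which is why they are often reported alongside accuracy on imbalanced data.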
The study reports effective processing of dynamic concept drift for the first time, achieved by combining the LASSO and GBDT algorithms and using a sample similarity index.
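The general idea of pairing LASSO-based indicator selection with a GBDT classifier can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it uses scikit-learn, synthetic imbalanced data in place of the A-share financial indicators, and arbitrarily chosen hyperparameters. It does not reproduce the sample similarity index or any concept drift handling.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Lasso
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for financial indicators; about 10% "distressed" samples (assumption).
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize so the L1 penalty treats all indicators on the same scale.
scaler = StandardScaler().fit(X_train)
Xs_train, Xs_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: LASSO shrinks uninformative coefficients to exactly zero;
# the indicators with non-zero coefficients are kept.
lasso = Lasso(alpha=0.02).fit(Xs_train, y_train)
selected = np.flatnonzero(lasso.coef_)
if selected.size == 0:  # fallback if the penalty removed every indicator
    selected = np.arange(X.shape[1])

# Step 2: GBDT is trained only on the selected indicators.
gbdt = GradientBoostingClassifier(random_state=0)
gbdt.fit(Xs_train[:, selected], y_train)
y_pred = gbdt.predict(Xs_test[:, selected])

# F1 and G-mean (geometric mean of sensitivity and specificity).
f1 = f1_score(y_test, y_pred)
sensitivity = recall_score(y_test, y_pred, pos_label=1)
specificity = recall_score(y_test, y_pred, pos_label=0)
g = float(np.sqrt(sensitivity * specificity))
print(f"kept {selected.size} of {X.shape[1]} indicators, F1={f1:.3f}, G-mean={g:.3f}")
```

In a drifting setting, one plausible use of this two-step structure is to repeat the selection and training steps on each new time window so that the set of indicators can change over time. How the study actually updates the model is not described in the abstract.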
DOI: https://doi.org/10.31449/inf.v48i17.6493
This work is licensed under a Creative Commons Attribution 3.0 License.