Parallel Support Vector Machines for Multi-Label Classification in Imbalanced Databases
Abstract
We propose a multi-label classification mining method for imbalanced sample databases based on parallel support vector machines (SVMs). A hierarchical clustering algorithm first partitions the samples into majority and minority sub-clusters, which provides the basis for oversampling the imbalanced database. Clustering does not itself generate new samples; rather, it gives oversampling a structured partition of the data, so that new samples can be generated in a targeted way within the minority-class sub-clusters, avoiding the noise and overfitting that blind oversampling can cause. To improve minority-class accuracy through parallel computing, MapReduce is then used to solve the SVM dual problem in parallel and optimize the separating hyperplanes for multi-label classification. The Map function divides the training set into small subsets and trains an SVM on each; in the Reduce stage these SVMs are integrated and a new SVM is trained as the final decision function, so that multi-label classification problems are handled efficiently. Experimental results show that the proposed method consistently maintains a G-mean of 0.95 or higher, far exceeding the comparison methods. In terms of speedup (acceleration ratio), as the sample size increased from 1,000 to 10,000 the speedup of the proposed method rose steadily from 1.0 to 2.5, whereas the two comparison methods reached only 1.5 and 2.0, respectively, with significant fluctuations.
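The Map and Reduce stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: a linear SVM trained in the primal by subgradient descent stands in for the dual solver the abstract mentions; "integrating" the local SVMs is read as pooling the samples on or near each local margin; the classes are taken to be already balanced by oversampling; and Python's built-in `map` and `functools.reduce` stand in for a real MapReduce cluster. All function names, parameters, and the toy data are hypothetical.

```python
from functools import reduce

def decision(model, x):
    """Signed score w.x + b of a linear model (w, b)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(samples, lam=0.01, lr=0.01, epochs=2000):
    """Fit (w, b) by batch subgradient descent on the L2-regularised hinge loss."""
    n, dim = len(samples), len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for x, y in samples:
            if y * decision((w, b), x) < 1:  # sample violates the margin
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def map_partition(partition, tol=0.2):
    """Map step: train a local SVM and emit the samples on or near its margin."""
    model = train_linear_svm(partition)
    kept = [(x, y) for x, y in partition if y * decision(model, x) <= 1 + tol]
    return kept or partition  # assumption: keep everything if nothing is near the margin

def reduce_support(acc, kept):
    """Reduce step: pool the samples emitted by the mappers."""
    return acc + kept

def mapreduce_svm(samples, n_parts):
    """Split the data, run the Map step on each part, then train the final SVM."""
    size = -(-len(samples) // n_parts)  # ceiling division
    partitions = [samples[i:i + size] for i in range(0, len(samples), size)]
    merged = reduce(reduce_support, map(map_partition, partitions), [])
    return train_linear_svm(merged), merged

# Hypothetical, already balanced toy data: each positive point has a mirrored negative.
positives = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0), (1.5, 2.5), (2.5, 1.5)]
data = []
for p in positives:
    data.append((p, 1))
    data.append((tuple(-v for v in p), -1))

model, merged = mapreduce_svm(data, n_parts=3)
predictions = [1 if decision(model, x) >= 0 else -1 for x, _ in data]
```

In a real deployment each `map_partition` call would run on a separate worker, which is where any speedup would come from. The final SVM sees only the pooled near-margin samples rather than the full training set.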
DOI: https://doi.org/10.31449/inf.v49i8.8350
This work is licensed under a Creative Commons Attribution 3.0 License.








