Hybrid Variable-Length Spider Monkey Optimization with Good-Point Set Initialization for Data Clustering

Athraa Qays Obaid, Maytham Alabbas


Data clustering groups data points that share similar patterns or characteristics. It is used for various purposes, including image analysis, pattern recognition, and data mining. The K-means algorithm, commonly used for clustering, has two notable limitations: the number of clusters must be specified in advance, and the result is sensitive to the initial center points. To address these limitations, this study proposes a novel method that determines the optimal number of clusters and the initial centroids using a variable-length spider monkey optimization algorithm (VLSMO) with a proposed hybrid measure. Experiments on real-life datasets demonstrate that VLSMO performs better than standard K-means in terms of accuracy and clustering capacity.
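
The idea of a variable-length candidate solution can be illustrated with a minimal Python sketch. It is not the VLSMO algorithm of this study: each candidate is simply a list of centroids whose length (the number of clusters) varies, candidates are drawn by plain random search rather than spider monkey optimization, and the Davies-Bouldin index is used as a stand-in because the proposed hybrid measure is not specified in this abstract. The function names, the synthetic data, and the bounds on the number of clusters are all assumptions made for illustration.

```python
import math
import random


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def davies_bouldin(points, centroids):
    """Stand-in validity index (lower is better); NOT the paper's hybrid measure."""
    labels = assign(points, centroids)
    k = len(centroids)
    # Mean distance of each cluster's members to its centroid (0.0 if empty).
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            scatter.append(
                sum(math.dist(p, centroids[j]) for p in members) / len(members)
            )
        else:
            scatter.append(0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            sep = math.dist(centroids[i], centroids[j])
            # Coincident centroids are treated as infinitely bad.
            ratio = (scatter[i] + scatter[j]) / sep if sep > 0 else math.inf
            worst = max(worst, ratio)
        total += worst
    return total / k


def random_candidate(points, k_min, k_max, rng):
    """Variable-length candidate: k data points used as centroids, k chosen at random."""
    k = rng.randint(k_min, k_max)
    return rng.sample(points, k)


# Synthetic data (assumption): three Gaussian blobs in two dimensions.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in centers
    for _ in range(30)
]

# Plain random search over candidates of different lengths (not SMO).
population = [random_candidate(points, 2, 6, rng) for _ in range(20)]
best = min(population, key=lambda c: davies_bouldin(points, c))
print("clusters in best candidate:", len(best))
print("index value:", round(davies_bouldin(points, best), 3))
```

Because a candidate carries both the number of centroids and their positions, one search can compare solutions with different cluster counts; the selected centroids could then be passed to K-means as its initial centers.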

DOI: https://doi.org/10.31449/inf.v47i8.4872

This work is licensed under a Creative Commons Attribution 3.0 License.