Parametrized MTree Clusterer for Weka

Marian Cristian Mihăescu, Marius Andrei Ciurez

Abstract


In the area of clustering, proposing new algorithms or improving existing ones is a challenging task, because a well-established set of algorithms and implementations already exists and allows rapid evaluation on publicly available datasets. In this work, we present an improved version of the MTree clustering algorithm, implemented within the Weka workbench. The algorithmic approach starts from classical metric spaces and integrates parametrized business logic for finding the optimal number of clusters, choosing the division policy, and setting other characteristics. The result is a versatile data structure that may be used in clustering to find the optimal number of clusters, but mainly to load datasets that already have a known structure. Experimental results show that the MTree finds the right structure in two clustering tasks, whereas other algorithms fail in various ways. We also discuss further improvements and experiments on real datasets and tasks.



This work was supported by the grant 135C/2021 "Dezvoltarea de aplicații software care integrează algoritmi de învățare automată" cofinanced by the University of Craiova.




DOI: https://doi.org/10.31449/inf.v46i4.3565

This work is licensed under a Creative Commons Attribution 3.0 License.