Application of Algorithms with Variable Greedy Heuristics for k-Medoids Problems

Lev A. Kazakovtsev, Ivan Rozhnov

Abstract


Progress in location theory methods and clustering algorithms is mainly targeted at improving the performance of the algorithms. The most popular clustering models are based on solving the p-median and similar location problems (k-means, k-medoids). In such problems, the algorithm must find several points called cluster centers, centroids, medoids, depending on the specific problem which minimize some function of distances from known objects to the centers. In the the k-medoids problem, the centers (medoids) of the cluster must coincide with one of the clustered objects. The problem is NP-hard, and the efforts of researchers are focused on the development of compromise heuristic algorithms that provide a fairly quick solution with minimal error. In this paper, we propose new algorithms of the Greedy Heuristic Method which use the idea of the Variable Neighborhood Search (VNS) algorithms for solving the k-medoids problem (which is also called the discrete p-median problem). In addition to the known PAM (Partition Around Medoids) algorithm, neighborhoods of a known solution are formed by applying greedy agglomerative heuristic procedures. According to the results of computational experiments, the new search algorithms (Greedy PAM-VNS) give more accurate and stable results (lower average value of the objective function and its standard deviation, smaller spread) in comparison with known algorithms on various data sets.

Povzetek: Avtorji predlagajo nove algoritme za reševanje problema lokacije k-medoidov in grozdenja.


Full Text:

PDF

References


Kazakovtsev L.A. (2016) The greedy heuristics method for systems of automatic grouping of objects. Diss ... Dr. tech. of science. Krasnoyarsk. Siberian Federal University.

Ghosh J., Acharya A. (2011) Cluster ensembles. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 1(4), pp..305−315.

Rozhnov I., Orlov V., Kazakovtsev L. (2018) Ensembles of clustering algorithms for problem of detection of homogeneous production batches of semiconductor devices. CEUR-WS, Vol.2098, pp.338-348.

Drezner Z., Hamacher H. (2004) Facility location: applications and theory. Berlin:Springer-Verlag.

Struyf A., Hubert M., Rousseeuw P. (1997) Clustering in an Object-Oriented Environment. Journal of Statistical Software, Issue 1 (4), pp.1-30.

Kaufman L., Rousseeuw P.J. (1990) Finding groups in data: an introduction to cluster analysis. New York:Wiley.

Moreno-Perez J.A., Roda Garcia J.L., Moreno-Vega J.M. (1994) A Parallel Genetic Algorithm for the Discrete p-Median Problem. Studies in Location Analysis, Issue 7, pp.131-141.

Wesolowsky, G. (1993) The Weber problem: History and perspectives. Location Science, No.1, pp.5-23.

Drezner Z., Wesolowsky G.O. (1978) A Trajectory Method for the Optimization of the Multifacility Location Problem with lp Distances. Management Science, Vol.24, pp.1507-1514.

Farahani R., Hekmatfar M. (2009). Facility location: Concepts, models, algorithms and case studies. Berlin Heidelberg:Springer-Verlag.

Deza M.M., Deza E. (2013) Metrics on Normed Structures. Encyclopedia of Distances. Berlin Heidelberg: Springer, pp.89-99, DOI: 10.1007/978-3-642-30958-85.

Weiszfeld, E. (1937) Sur le point sur lequel la somme des distances de n points donnes est minimum. Tohoku Mathematical Journal, Vol.43, No.1, pp.335-0386.

Kaufman L. and Rousseeuw P.J. (1987) Clustering by means of Medoids. Statistical Data Analysis Based on the L1-Norm and Related Methods, Springer US. pp. 405-416.

Kochetov Yu., Mladenovic N., Hansen P. (2003) Local search with alternating neighborhoods. Discrete analysis and operations research, Series 2, Vol.10(1), pp.11-43.

Nicholson T. A. J. (1965) A sequential method for discrete optimization problems and its application to the assignment, traveling salesman and tree scheduling problems. J. Inst. Math. Appl, Vol.13, pp.362-375.

Page E. S. (1965) On Monte Carlo methods in congestion problems. I: Searching for an optimum in discrete situations. Oper. Res. Vol.13(2), pp. 291-299.

Kernighan B. W., Lin S. (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. Vol.49. pp.291-307.

Gastrigin L.А. (1978) Random search - specificity, stages of history and prejudice. Questions of cybernetics. M.: Nauch. Council on the complex problem "Cybernetics" of the USSR Academy of Sciences, Vol. 33, pp.3-16.

Kochetov Yu.А. (2010) Local search methods for discrete placement problems. Dis. Doctors of Physical and Mathematical Sciences. Novosibirsk.

Hansen P., Mladenovic N., Bruke E.K., Kendall G. (2014) Variable Neighborhood Search. Search Methodology. Springer US. P.211-238, doi: 10.1007/0-387-28356-0_8.

Kazakovtsev, L.A., Antamoshkin, A.N. (2014) Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica, Vol.38(3), pp.229-240.

Osman I. H., Laporte G. (1996) Metaheuristics: a bibliography. Ann. Oper. Res, Vol.63. pp.513-628.

Mladenovic N., Hansen P. Variable neighborhood search. Comput. Oper. Res, Vol.24, P.1097-1100.

Hansen P., Mladenovic N. (2001) Variable neighborhood search: principles and applications (invited review). European J. Oper. Res, Vol.130(3), pp.449-467.

Lopez F. G., Batista B. M., Moreno-Perez J., Moreno-Vega M. (2000) The parallel variable neighborhood search for the p-median problem. Res. Rep. Univ. of La Laguna, Spain.

Brimberg J., Mladenovic N. (1996) A variable neighborhood algorithm for solving the continuous location-allocation problem. Stud. Locat. Anal. V. 10. pp. 1-12.

Hansen P., Mladenovic N., Perez-Brito D. (2001) Variable neighborhood decomposition search. J. Heuristics, Vol.7 (4), pp. 335-350.

Goldberg D. E. (1989) Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.

Houck C.R., Joines J. A., G.Kay. M. (1996) Comparison of Genetic Algorithms, Random Restart, and Two-Opt Switching for Solving Large Location-Allocation Problems. Computers and Operations Research, Vol. 23, pp. 587-596.

Alp O., Erkut E., Drezner Z. (2003) An Efficient Genetic Algorithm for the p-Median Problem. Annals of Operations Research. Vol.122, pp.21-42, DOI:10.1023/A:1026130003508. (2003).

Neema M.N., Maniruzzaman K.M., Ohgai A. (2011) New Genetic Algorithms Based Approaches to Continuous p-Median Problem. Netw. Spat. Econ., Vol.11, pp.83-99, DOI:10.1007/s11067-008-9084-5.

UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. access date 28.03.2019.

Clustering basic benchmark [http://cs.joensuu.fi/sipu/datasets], access date 28.03.2019.

Sheng W., Liu X. (2006) A genetic k-medoids clustering algorithm. Journal of Heuristics. Vol.12, No.6. P. 447-466.

Wilcoxon F. (1945) Individual comparisons by ranking methods. Biometrics Bulletin, Vol.1(6), pp. 80–83, DOI:10.2307/3001968.

Derrac J., Garcia S., Molina D., Herrera F. (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, Vol. 1(1), pp. 3-18, DOI:j.swevo.2011.02.002.




DOI: https://doi.org/10.31449/inf.v44i1.2737

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.