A New Ensemble Self-labeled Semi-supervised Algorithm

Ioannis E. Livieris

Abstract


As an alternative to traditional classification methods, semi-supervised learning algorithms have attracted significant research interest, since they exploit the knowledge hidden in unlabeled data to build powerful and effective classifiers. In this work, a new ensemble-based semi-supervised algorithm is proposed, which combines its members through a maximum-probability voting scheme. The reported numerical results illustrate the efficacy of the proposed algorithm, which outperforms classical semi-supervised algorithms in terms of classification accuracy, leading to more efficient and robust predictive models.
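The abstract does not spell out the voting rule. The sketch below assumes the common reading of maximum-probability voting (the "max" combination rule): each base self-labeled classifier outputs class probabilities for an instance, and the ensemble predicts the class that receives the single highest probability from any member. The function name `max_probability_vote`, the input layout and the example probabilities are hypothetical, and the sketch covers only the combination step, not how the members label the unlabeled data.

```python
from typing import List, Sequence


def max_probability_vote(
    prob_outputs: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Combine member outputs with an assumed maximum-probability rule.

    prob_outputs[k][i][c] is the probability that member classifier k
    assigns to class c for instance i (hypothetical layout). For each
    instance, the predicted class is the one given the single highest
    probability by any member. On ties, the first maximum found wins
    (earlier member first, then lower class index).
    """
    if not prob_outputs:
        raise ValueError("at least one classifier output is required")
    n_instances = len(prob_outputs[0])
    if any(len(probs) != n_instances for probs in prob_outputs):
        raise ValueError("all members must score the same instances")

    predictions = []
    for i in range(n_instances):
        best_class, best_prob = -1, float("-inf")
        for probs in prob_outputs:          # loop over ensemble members
            for c, p in enumerate(probs[i]):  # loop over classes
                if p > best_prob:
                    best_class, best_prob = c, p
        predictions.append(best_class)
    return predictions


# Hypothetical probability outputs of three members on two instances.
clf_a = [[0.6, 0.4], [0.3, 0.7]]
clf_b = [[0.2, 0.8], [0.55, 0.45]]
clf_c = [[0.7, 0.3], [0.4, 0.6]]
print(max_probability_vote([clf_a, clf_b, clf_c]))  # [1, 1]
```

For the first instance, two of the three members favour class 0, so a plain majority vote would return class 0; under the assumed max rule the prediction is class 1 because one member is 80% confident in it. This shows how a maximum-probability scheme lets the most confident member decide.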


A New Ensemble Semi-supervised Self-labeled Algorithm, Informatica 43 (2019) 221–234




DOI: https://doi.org/10.31449/inf.v43i2.2217

This work is licensed under a Creative Commons Attribution 3.0 License.