Software Features Extraction From Object-Oriented Source Code Using an Overlapping Clustering Approach

Imad Eddine Araar, Hassina Seridi


For many decades, numerous organizations have launched software reuse initiatives to improve their productivity. Software product lines (SPLs) support this goal by organizing software development around a set of features shared by a family of products. To exploit existing software products when building a new SPL, the features that compose each of these products must first be identified. In this paper, we analyze the effectiveness of an overlapping-clustering-based technique for mining functional features from the object-oriented (OO) source code of existing systems. An evaluation of the proposed approach on two Java open-source applications, “Mobile media” and “Drawing Shapes”, yielded encouraging results.
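To illustrate what "overlapping" means here, the following is a minimal Python sketch. It is not the authors' algorithm. It assumes a hypothetical class dependency graph, measures the distance between classes with Floyd-Warshall shortest paths, and grows one cluster around each hypothetical seed class. Unlike a partition, a class (here the shared `MediaStore`) may belong to several clusters, which mirrors how one code element can implement more than one feature. The class names, seeds, and the `max_dist` threshold are illustrative assumptions only.

```python
from collections import Counter

INF = float("inf")

# Hypothetical OO system: classes and undirected dependency edges (illustrative only).
NODES = ["CaptureController", "MediaStore", "AlbumView", "SmsSender", "AudioPlayer"]
EDGES = [
    ("CaptureController", "MediaStore"),
    ("CaptureController", "AlbumView"),
    ("SmsSender", "MediaStore"),
    ("AudioPlayer", "MediaStore"),
]
# Hypothetical seed classes, one per candidate feature.
SEEDS = ["CaptureController", "SmsSender", "AudioPlayer"]


def all_pairs_distances(nodes, edges):
    """Floyd-Warshall shortest-path lengths on an unweighted, undirected graph."""
    dist = {(a, b): (0 if a == b else INF) for a in nodes for b in nodes}
    for a, b in edges:
        dist[(a, b)] = dist[(b, a)] = 1
    for k in nodes:  # intermediate node must be the outermost loop
        for i in nodes:
            for j in nodes:
                if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                    dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    return dist


def overlapping_clusters(nodes, edges, seeds, max_dist=1):
    """Each seed gathers every class within max_dist hops; clusters may share members."""
    dist = all_pairs_distances(nodes, edges)
    return {s: {n for n in nodes if dist[(s, n)] <= max_dist} for s in seeds}


def shared_elements(clusters):
    """Classes that appear in more than one cluster (the overlap)."""
    counts = Counter(n for members in clusters.values() for n in members)
    return {n for n, c in counts.items() if c > 1}


if __name__ == "__main__":
    clusters = overlapping_clusters(NODES, EDGES, SEEDS, max_dist=1)
    for seed, members in clusters.items():
        print(seed, sorted(members))
    print("shared:", sorted(shared_elements(clusters)))
```

With `max_dist=1`, each seed keeps its direct dependencies, so `MediaStore` lands in all three clusters while `AlbumView` stays with `CaptureController` only. A distance threshold is just one simple way to allow overlap; the actual similarity measure and clustering criterion used in the paper may differ.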

This work is licensed under a Creative Commons Attribution 3.0 License.