Ensemble-Based Network Anomaly Detection Using RFE and Information Gain for Optimized Feature Selection
Abstract
Intrusion Detection Systems (IDSs) play a significant role in mitigating dynamic cyber threats. However, current machine learning-based IDSs may suffer from high false positive rates and suboptimal feature selection, which can lead to low detection rates. This paper proposes an ensemble IDS architecture that uses Recursive Feature Elimination (RFE) and Information Gain (IG) for feature selection, with the aim of improving anomaly detection performance while reducing computational cost. A preprocessing pipeline first cleans the data, one-hot encodes categorical features, and normalizes feature scales. RFE and IG then select the most discriminative attributes, minimizing redundancy. The selected feature subset is used to train a set of ensemble classifiers: Random Forest, XGBoost, Extra Trees, and a weighted Voting Classifier. Extensive experiments on the CIC-IDS2017 dataset show that the proposed ensemble approach outperforms the baselines on every reported metric, achieving 97.5% accuracy, 97.2% precision, 97.8% recall, and a 97.5% F1-score. In particular, the ensemble model achieves higher recall, and hence greater robustness, than the two standalone baseline classifiers, Random Forest (recall: 96.5%) and XGBoost (recall: 97.3%). An ablation study comparing configurations with and without feature selection confirms the effectiveness of RFE and IG. These findings suggest that the proposed IDS architecture is feasible and scalable for real-time network anomaly detection. Future work could investigate adaptive feature selection and deployment in streaming settings to strengthen resistance to novel attacks.
DOI: https://doi.org/10.31449/inf.v49i10.8387
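The abstract does not give implementation details, so the following is only a minimal, dependency-free sketch of two of the steps it names: ranking features by Information Gain and combining classifier outputs with a weighted vote. Several choices here are assumptions, not taken from the paper: numeric features are discretized with equal-width binning before computing IG, the vote is soft (a weighted average of class probabilities), and the helper names (`information_gain`, `top_k_by_ig`, `weighted_soft_vote`), the toy data, and the per-model probabilities are hypothetical. RFE is not shown because it requires refitting a model on each shrinking feature subset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(values, bins=4):
    """Equal-width binning of a numeric column (an assumed discretization step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: a single bin
    width = (hi - lo) / bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(column, labels, bins=4):
    """IG(Y; X) = H(Y) - H(Y | X) for one (binned) feature column."""
    binned = discretize(column, bins)
    n = len(labels)
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def top_k_by_ig(rows, labels, k, bins=4):
    """Return indices of the k features with the highest information gain."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((information_gain(column, labels, bins), j))
    # Highest IG first; ties broken by lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def weighted_soft_vote(prob_lists, weights):
    """Weighted average of per-model class probabilities; returns (argmax, averages)."""
    n_classes = len(prob_lists[0])
    total = sum(weights)
    combined = [
        sum(w * probs[c] for probs, w in zip(prob_lists, weights)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: combined[c]), combined


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not,
# and feature 2 is constant.
rows = [[0.1, 5.0, 1.0], [0.2, 6.0, 1.0], [0.9, 5.0, 1.0], [0.8, 6.0, 1.0]]
labels = ["benign", "benign", "attack", "attack"]
print("IG-selected features:", top_k_by_ig(rows, labels, k=1))

# Hypothetical [benign, attack] probabilities from three base models,
# with the second model weighted twice as heavily as the others.
classes = ["benign", "attack"]
winner, combined = weighted_soft_vote([[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]], [1, 2, 1])
print("ensemble prediction:", classes[winner], combined)
```

In a scikit-learn pipeline, the corresponding building blocks would typically be `mutual_info_classif` (which estimates the mutual information, i.e. information gain, between each feature and the class), `RFE` for recursive elimination, and `VotingClassifier(voting="soft", weights=...)` for the ensemble. The abstract does not specify how the paper combines the RFE and IG rankings or how the voting weights are chosen.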
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







