Ensemble-Based Network Anomaly Detection Using RFE and Information Gain for Optimized Feature Selection

Nagamani Uddamari; P Sammulal

doi:10.31449/inf.v49i10.8387

Abstract

Intrusion Detection Systems (IDSs) play a significant role in mitigating dynamic cyber threats. However, current machine-learning-based IDSs often suffer from high false positive rates and suboptimal feature selection, which lowers the detection rate. This paper proposes an ensemble IDS architecture that uses Recursive Feature Elimination (RFE) and Information Gain (IG) for feature selection, aiming to improve anomaly detection performance and reduce computational cost. The preprocessing pipeline comprises data cleaning, one-hot encoding of categorical features, and normalization to scale the features. RFE and IG then select the most discriminative attributes to minimize redundancy. The selected feature subset is used to train a set of ensemble classifiers: Random Forest, XGBoost, Extra Trees, and a weighted Voting Classifier. Experiments on the CIC-IDS2017 dataset show that the proposed ensemble outperforms the compared classifiers on all reported metrics, achieving 97.5% accuracy, 97.2% precision, 97.8% recall, and a 97.5% F1-score. Its recall, and hence its robustness, improves on that of the two baseline classifiers, standalone Random Forest (recall: 96.5%) and XGBoost (recall: 97.3%). An ablation study comparing settings with and without feature selection confirms the effectiveness of RFE and Information Gain. These findings indicate that the proposed IDS architecture is feasible and scalable for real-time network anomaly detection. Future work could investigate adaptive feature selection and deployment in a streaming setting to improve resistance to novel attacks.
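The pipeline described in the abstract can be sketched roughly as follows with scikit-learn. This is only an illustrative outline, not the authors' implementation. The abstract does not say how RFE and IG are combined, so the sketch assumes an IG filter (approximated here by `mutual_info_classif`) followed by RFE. The data is synthetic rather than CIC-IDS2017, `GradientBoostingClassifier` stands in for XGBoost to keep the example dependency-free, and the feature counts and voting weights are hypothetical.

```python
from functools import partial

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in for flow records (NOT CIC-IDS2017): 20 numeric
# features plus one categorical "protocol" code in column 20.
rng = np.random.default_rng(0)
X_num, y = make_classification(n_samples=600, n_features=20,
                               n_informative=8, random_state=0)
protocol = rng.integers(0, 3, size=(600, 1))
X = np.hstack([X_num, protocol])

# Preprocessing: scale numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer(
    [("num", MinMaxScaler(), list(range(20))),
     ("cat", OneHotEncoder(handle_unknown="ignore"), [20])],
    sparse_threshold=0.0,  # always return a dense array
)

# Weighted soft-voting ensemble; the weights are hypothetical.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),  # XGBoost stand-in
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
    weights=[2, 2, 1],
)

model = Pipeline([
    ("prep", preprocess),
    # Assumed ordering: information-gain filter first, then RFE.
    ("ig", SelectKBest(partial(mutual_info_classif, random_state=0), k=15)),
    ("rfe", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                n_features_to_select=10)),
    ("vote", ensemble),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Wrapping the selectors and the ensemble in a single `Pipeline` means feature selection is fitted on the training split only, which avoids leaking test information into the selected subset. Removing the `ig` and `rfe` steps gives a rough analogue of the "without feature selection" setting mentioned for the ablation study.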

This work is licensed under a Creative Commons Attribution 3.0 License.

Informatica is financially supported by the Slovenian Research Agency through the Call for co-financing of scientific periodical publications.
