Optimizing Network Intrusion Detection Systems Through Ensemble Learning and Feature Selection Using the CIC-IDS2017 Dataset
Abstract
The increasing complexity of cyber threats demands Network Intrusion Detection Systems (NIDS) that are both accurate and efficient. This study presents an optimized NIDS framework that combines feature selection with ensemble learning. Experiments were performed on the CIC-IDS2017 dataset using a stratified 70/30 train/test split. Three feature selection methods were compared: Information Gain (24 features), Chi-square (χ², 25 features), and Principal Component Analysis (PCA, 20 components). Bagging classifiers (Random Forest, Extra Trees, Bagged Decision Tree) and boosting classifiers (XGBoost, Gradient Boosting, LightGBM, AdaBoost, CatBoost) were evaluated. With the 24 features selected by Information Gain, Extra Trees achieved 99.98% accuracy with near-perfect precision, recall, and F1-score, and false positive and false negative rates of only 0.0001397 and 0.0002597, respectively. Boosting-based models showed superior sensitivity to minority attack classes, improving performance under class imbalance. These results indicate that integrating feature selection with diverse ensemble techniques yields a scalable, interpretable, and highly effective NIDS suitable for practical cybersecurity applications.
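The feature-selection stage described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. It substitutes a synthetic, imbalanced dataset from `make_classification` for CIC-IDS2017. It uses `mutual_info_classif` as the Information Gain score, since scikit-learn has no function by that exact name. It rescales features to [0, 1] with `MinMaxScaler` because `chi2` requires non-negative inputs, and it standardizes features before PCA. The k values (24, 25, 20), the stratified 70/30 split, and the choice of Extra Trees follow the abstract. The helper names `make_model` and `evaluate` and all other hyperparameters are hypothetical.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def make_model(method, seed=0):
    """Hypothetical helper: feature reduction followed by an Extra Trees classifier."""
    if method == "info_gain":
        # Mutual information estimate, used here as the Information Gain score.
        steps = [SelectKBest(partial(mutual_info_classif, random_state=seed), k=24)]
    elif method == "chi2":
        # chi2 needs non-negative features, so rescale to [0, 1] first.
        steps = [MinMaxScaler(), SelectKBest(chi2, k=25)]
    elif method == "pca":
        # PCA is scale-sensitive, so standardize before projecting.
        steps = [StandardScaler(), PCA(n_components=20, random_state=seed)]
    else:
        raise ValueError(f"unknown method: {method}")
    steps.append(ExtraTreesClassifier(n_estimators=100, random_state=seed))
    return make_pipeline(*steps)


def evaluate(X, y, seed=0):
    """Stratified 70/30 split; returns {method: (test accuracy, features seen by the classifier)}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    results = {}
    for method in ("info_gain", "chi2", "pca"):
        model = make_model(method, seed).fit(X_tr, y_tr)
        acc = accuracy_score(y_te, model.predict(X_te))
        results[method] = (acc, model[-1].n_features_in_)
    return results


if __name__ == "__main__":
    # Synthetic stand-in for flow features: 40 columns, 3 imbalanced classes.
    X, y = make_classification(
        n_samples=3000, n_features=40, n_informative=15, n_redundant=10,
        n_classes=3, weights=[0.8, 0.15, 0.05], random_state=0,
    )
    for method, (acc, k) in evaluate(X, y).items():
        print(f"{method:>9}: {k} features, accuracy = {acc:.4f}")
```

Each selector or PCA step sits inside the pipeline. As a result, it is fitted on the training split only, so the test rows never influence which features or components are kept.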
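CIC-IDS2017 labels several attack classes, and the abstract does not state how its single false positive and false negative rates were aggregated across them. One common convention, assumed here, collapses the labels into benign (negative) versus any attack (positive). It then computes FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The sketch below shows that calculation. The helper `binary_error_rates` and the benign label `0` are hypothetical. Under this convention, mistaking one attack class for another still counts as a true positive.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def binary_error_rates(y_true, y_pred, benign_label=0):
    """Hypothetical helper: FPR and FNR with benign as negative and any attack as positive."""
    is_attack_true = np.asarray(y_true) != benign_label
    is_attack_pred = np.asarray(y_pred) != benign_label
    # With labels=[False, True] the matrix is [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(
        is_attack_true, is_attack_pred, labels=[False, True]
    ).ravel()
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # benign flows flagged as attacks
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # attacks missed as benign
    return float(fpr), float(fnr)


# Toy labels: 0 = benign, 1 and 2 = two attack classes.
y_true = [0, 0, 0, 0, 1, 2, 2, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1]
print(binary_error_rates(y_true, y_pred))  # → (0.25, 0.25)
```

In the toy example, one of four benign flows is flagged and one of four attacks is missed. The attack of class 2 predicted as class 1 is not counted as an error.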
DOI: https://doi.org/10.31449/inf.v49i4.7678
This work is licensed under a Creative Commons Attribution 3.0 License.








