Optimizing Network Intrusion Detection Systems Through Ensemble Learning and Feature Selection Using the CIC-IDS2017 Dataset

Dharmaraj Rajaram Patil, Tareek M Pattewar, Trupti S Shinde, Kavita S. Kumavat, Sujit N. Deshpande

Abstract


The increasing complexity of cyber threats demands Network Intrusion Detection Systems (NIDS) that are both accurate and efficient. This study presents an optimized NIDS framework that combines feature selection with ensemble learning. Experiments were performed on the CIC-IDS2017 dataset using a stratified 70/30 train/test split. Three feature selection methods were applied: Information Gain (24 features), Chi-square (χ², 25 features), and Principal Component Analysis (PCA, 20 features). Bagging classifiers (Random Forest, Extra Trees, Bagged Decision Tree) and boosting classifiers (XGBoost, Gradient Boosting, LightGBM, AdaBoost, CatBoost) were evaluated. With the 24 features selected by Information Gain, Extra Trees achieved 99.98% accuracy with near-perfect precision, recall, and F1-score, and false positive and false negative rates of 0.0001397 and 0.0002597, respectively. Boosting-based models demonstrated superior sensitivity for minority attack classes, improving performance under imbalanced conditions. These results indicate that integrating feature selection with diverse ensemble techniques produces a scalable, interpretable, and highly effective NIDS suitable for practical cybersecurity applications.
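
The workflow summarized in the abstract (stratified 70/30 split, selection of 24 features, an Extra Trees classifier, and false positive and false negative rates) can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. It assumes synthetic binary data from `make_classification` in place of CIC-IDS2017, uses `mutual_info_classif` as a stand-in for Information Gain, and picks the sample count, number of estimators, and random seeds only for illustration.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Assumption: synthetic, imbalanced binary data stands in for CIC-IDS2017.
X, y = make_classification(
    n_samples=2000,
    n_features=40,
    n_informative=12,
    weights=[0.8, 0.2],  # class 0 = benign (majority), class 1 = attack
    random_state=0,
)

# Stratified 70/30 train/test split, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Assumption: mutual information approximates Information Gain scoring.
score_func = partial(mutual_info_classif, random_state=0)

pipeline = Pipeline(
    [
        ("select", SelectKBest(score_func=score_func, k=24)),
        ("clf", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ]
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy = (tp + tn) / (tn + fp + fn + tp)
fpr = fp / (fp + tn)  # false positive rate: benign flows flagged as attacks
fnr = fn / (fn + tp)  # false negative rate: attacks missed

print(f"accuracy={accuracy:.4f} FPR={fpr:.4f} FNR={fnr:.4f}")
```

Fitting the selector inside the pipeline means the feature scores are computed from the training portion only, so the held-out 30% does not influence which features are kept.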






DOI: https://doi.org/10.31449/inf.v49i4.7678

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.