A Framework for Malicious Domain Names Detection using Feature Selection and Majority Voting Approach
Abstract
names has become critical to assuring the security of online environments. This paper
presents a framework for detecting malicious domain names using feature selection
and majority voting. The framework first extracts features from domain names and
their related characteristics, then applies feature selection to identify the most
discriminating attributes. Several feature selection techniques, including chi-square
statistics, information gain, gain ratio, and correlation-based feature selection, are
used to score how well each feature distinguishes benign from malicious domain names.
A majority voting strategy then improves the detection system’s overall accuracy and
reliability by combining the predictions of several classifiers, including AdaBoost,
logistic regression, k-nearest neighbours, naive Bayes, and multilayer perceptron. The
ensemble of classifiers is trained on the selected features, yielding a robust model that
recognises malicious domain names accurately while minimising false positives. The
proposed approach is evaluated on real-world examples of malicious domain names.
With chi-square feature selection and majority voting, the framework detects malicious
domain names with an accuracy of 99.44%, precision of 99.44%, recall of 99.44%, and
F-measure of 99.44%. The use of feature selection and majority voting improves the
system’s adaptability and resilience in the face of emerging cyber threats.
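
The two core steps named in the abstract, chi-square feature ranking and hard majority voting, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the feature names (has_digits, long_name, has_hyphen), the toy data, and the helper functions are all hypothetical, and the per-classifier predictions are hard-coded stand-ins for the outputs of trained models.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Chi-square statistic between one discrete feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = f_count * c_count / n
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    ranked = sorted(
        columns,
        key=lambda name: chi_square_score(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


def majority_vote(predictions):
    """Hard voting: for each sample, return the label most classifiers chose."""
    return [
        Counter(sample_votes).most_common(1)[0][0]
        for sample_votes in zip(*predictions)
    ]


# Hypothetical toy data: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_digits": [1, 1, 1, 0, 0, 0, 0, 1],
    "long_name": [1, 1, 1, 1, 0, 0, 0, 0],
    "has_hyphen": [1, 0, 1, 0, 1, 0, 1, 0],
}
selected = select_top_k(columns, labels, k=2)

# Hypothetical predictions from five classifiers for three domain names;
# each inner list holds one classifier's labels for the three samples.
classifier_predictions = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
]
final_labels = majority_vote(classifier_predictions)
```

With an odd number of classifiers and two classes, as in this sketch, a vote can never tie, so no tie-breaking rule is needed.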
DOI: https://doi.org/10.31449/inf.v48i3.5824
This work is licensed under a Creative Commons Attribution 3.0 License.