A Framework for Malicious Domain Names Detection using Feature Selection and Majority Voting Approach

Dharmaraj Rajaram Patil

Abstract


As cyber attacks become more sophisticated, identifying and mitigating malicious domain
names has become critical to securing online environments. This paper presents a
framework for detecting malicious domain names using feature selection and majority
voting. The framework first extracts features from domain names and their related
characteristics, then applies feature selection to determine the most discriminating
attributes. Several feature selection techniques are used, including chi-square
statistics, information gain, gain ratio, and correlation-based feature selection, to
assess the value of each feature in distinguishing benign from malicious domain names.
A majority voting strategy then improves the detection system’s overall accuracy and
reliability by combining the predictions of several classifiers, including AdaBoost,
logistic regression, k-nearest neighbours, naive Bayes, and multilayer perceptron. The
ensemble of classifiers is trained on the selected features, yielding a robust model
capable of accurately recognising malicious domain names while minimising false
positives. The proposed approach is evaluated on real-world examples of malicious
domain names. With chi-square feature selection and majority voting, the framework
detects malicious domain names with an accuracy of 99.44%, precision of 99.44%, recall
of 99.44%, and F-measure of 99.44%. The use of feature selection and majority voting
improves the system’s adaptability and resilience in the face of emerging cyber threats.
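
The two core ideas in the abstract, ranking features by a chi-square statistic and combining classifier outputs by majority vote, can be illustrated with a minimal sketch. This is not the author's implementation: the feature names (`has_digit_run`, `length_over_20`), the toy data, and the helper functions below are hypothetical and exist only to show the mechanics. The classifiers are assumed to have already produced label predictions; with five voters and two classes no ties can occur.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (value, class)
    feature_totals = Counter(feature)          # marginal counts per feature value
    label_totals = Counter(labels)             # marginal counts per class
    score = 0.0
    for value, value_count in feature_totals.items():
        for cls, cls_count in label_totals.items():
            expected = value_count * cls_count / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "has_digit_run": [1, 1, 1, 0, 0, 0],   # perfectly separates the classes
    "length_over_20": [1, 0, 1, 0, 1, 0],  # only weakly related to the class
}
selected = select_top_k(columns, labels, k=1)

# Hypothetical predictions from five classifiers on three samples.
votes = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
]
final = majority_vote(votes)
```

Here `selected` keeps the feature most strongly associated with the class, and `final` holds the label chosen by most classifiers for each sample. Using an odd number of voters for a two-class problem is what makes the vote unambiguous.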

DOI: https://doi.org/10.31449/inf.v48i3.5824

This work is licensed under a Creative Commons Attribution 3.0 License.