Enhanced Cybercrime Detection on Twitter Using Aho-Corasick Algorithm and Machine Learning Techniques
Abstract
The proposed work addresses online social networking (OSN), a type of interactive computer-mediated technology that allows people to share information through virtual networks. Twitter's microblogging feature gives it a prominent place in cyberspace, parts of which are usually accessed via the dark web.
Tweets were scraped in Python using the snscrape module and in Java using the Twitter4J API. Both tools extract social data from profiles on the Twitter platform, selecting tweets by location, keyword, and hashtag to build the datasets. The experiments use two datasets. The first has over 1700 tweets with a focus on location as a key point (hacking-for-fun data, cyber-violence data, and vulnerability-injector data), whereas the second comprises only 370 tweets with a focus on reposting of tweet status as a key point.
The method centres on a new system model for analysing Twitter data and detecting terrorist attacks. The weights of susceptible keywords are found using a ternary search with the Aho-Corasick algorithm (ACA), which performs signature and pattern matching.
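To illustrate how signature matching of this kind can assign a weight to a tweet, the following is a minimal Python sketch of a textbook Aho-Corasick automaton. It is not the authors' implementation: the class and function names, the keyword list, and the weight values are hypothetical, and the sketch uses a plain dictionary-based trie rather than the ternary search mentioned above.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher: a keyword trie plus failure links."""

    def __init__(self, weighted_keywords):
        self.weights = dict(weighted_keywords)  # keyword -> weight
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # keywords recognised on reaching each state
        for word in self.weights:
            self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(word)

    def _build_failure_links(self):
        # Breadth-first pass; depth-1 states keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit keywords that end at the failure state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, keyword) for every occurrence in text."""
        state, matches = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word in self.out[state]:
                matches.append((i - len(word) + 1, word))
        return matches


def tweet_weight(automaton, tweet):
    """Sum the weights of all keyword occurrences in a lower-cased tweet."""
    matches = automaton.search(tweet.lower())
    return sum(automaton.weights[word] for _, word in matches), matches


# Hypothetical keywords and weights, chosen only for illustration.
HYPOTHETICAL_WEIGHTS = {"hack": 2, "hacker": 3, "attack": 5, "exploit": 4}

automaton = AhoCorasick(HYPOTHETICAL_WEIGHTS)
total, found = tweet_weight(automaton, "Hackers plan an attack using a new exploit")
print(total, found)
```

Because the automaton scans the tweet once regardless of how many keywords are loaded, the same structure scales to a larger signature list; overlapping keywords such as "hack" and "hacker" are both reported.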
The results show that the ACA performs signature matching to assign weights to the words extracted from each tweet. ML is then used to evaluate the Twitter data, classifying patterns and determining behaviour to identify whether a person is a terrorist. The Support Vector Machine (SVM) proved a more accurate classifier for predicting terrorist attacks than the other classifiers considered, K-Nearest Neighbour (KNN) and Naïve Bayes (NB). On the first dataset, KNN reached an accuracy of 98.38% and SVM 98.85%; on the second dataset, KNN reached 91.68% and SVM 93.97%.
In conclusion, the generated weights are grouped into three categories (cyber-violence, vulnerability injector, and hacking-for-fun) for further feature classification. Machine learning (ML) classifiers (KNN and SVM) are used to predict the occurrence of criminal incidents, and the accuracy and efficacy of the model are evaluated using several parameters.
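The classification and accuracy evaluation described above can be sketched in a few lines of Python. This is a minimal illustration of the KNN baseline only, under stated assumptions: each tweet is represented by a hypothetical three-value feature vector (one summed keyword weight per category), and the training and test samples are invented for the example. SVM is omitted because it would require an external library.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical feature vectors (assumption, not data from the paper):
# (cyber-violence weight, vulnerability-injector weight, hacking-for-fun weight)
TRAIN = [
    ((9, 1, 0), "cyber-violence"),
    ((8, 0, 1), "cyber-violence"),
    ((7, 2, 1), "cyber-violence"),
    ((1, 9, 2), "vulnerability-injector"),
    ((0, 8, 1), "vulnerability-injector"),
    ((2, 7, 0), "vulnerability-injector"),
    ((1, 1, 9), "hacking-for-fun"),
    ((0, 2, 8), "hacking-for-fun"),
    ((2, 0, 7), "hacking-for-fun"),
]
TEST = [
    ((8, 1, 1), "cyber-violence"),
    ((1, 8, 1), "vulnerability-injector"),
    ((1, 1, 8), "hacking-for-fun"),
]

print(f"KNN accuracy on the toy data: {accuracy(TRAIN, TEST, k=3):.2%}")
```

The toy data are deliberately well separated, so the figure printed here says nothing about the accuracies reported for the real datasets; the sketch only shows how a per-classifier accuracy of that kind is computed.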
DOI: https://doi.org/10.31449/inf.v48i18.6272
This work is licensed under a Creative Commons Attribution 3.0 License.