Hybrid-MELAu: A Hybrid Mixing Engineered Linguistic Features Framework Based on Autoencoder for Social Bot Detection.

Zineb Ferhat Hamida, Allaoua Refouf, Ahlem Drif, Silvia Giordano

Abstract


Social bots are computer algorithms that automatically generate massive amounts of obnoxious or meaningful information. Most bot detection methods leverage a multitude of characteristics, including network features, temporal dynamics, activity features, and sentiment features. However, comparatively little work has explored lexicon measurements and linguistic indicators for detecting bots. The main purpose of this research is to recognize social bots through their writing style. We therefore carry out an exploratory study on the effectiveness of a set of only 17 linguistic features for bot detection, without the need to resort to other types of features, and we develop a novel framework in a hybrid fashion, Mixing Engineered Linguistic features based on Autoencoders (Hybrid-MELAu). The semi-supervised Hybrid-MELAu framework is composed of two essential constituents: the features learner and the predictors. The features learner is built on two structures: a) a deep dense autoencoder fed by the lexical and syntactic content (DALS), which represents high-order lexical and syntactic features in a latent space, and b) a GloVe-BiLSTM autoencoder, which captures the semantic features; we then generate elite features from the pre-trained encoder of each latent space through transfer learning. To compare the writing styles of humans and bots, we conduct a linguistic analysis on a one-million-sample subset of the Cresci datasets. On this sample, we observe that the median textual lexical diversity of bots is greater than that of humans, while the syntactic analysis based on part-of-speech tagging shows more creative behavior in the human writing style. Finally, we test the model's robustness on several public datasets (celebrity, pronbots-2019, and political bots). The proposed framework achieves an accuracy of 92.22%. Overall, the results shown in this paper, and the related discussion, argue that it is possible to discern the differences between humans' and bots' writing styles with an efficient linguistic deep framework.
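To make the two-branch feature learner described above concrete, the following is a minimal, hypothetical Keras/TensorFlow sketch: a deep dense autoencoder over the 17 engineered lexical/syntactic features (DALS) and a GloVe-BiLSTM autoencoder over token sequences, whose pre-trained encoders are reused and concatenated to feed the predictor. All layer sizes, the vocabulary size, the sequence length, and the variable names are illustrative assumptions, not values taken from the paper.

from tensorflow.keras import layers, Model

N_LING = 17       # the 17 engineered lexical/syntactic features mentioned in the abstract
MAX_LEN = 50      # assumed maximum sequence length in tokens
VOCAB = 20000     # assumed vocabulary size
EMB_DIM = 100     # assumed GloVe embedding dimension
LATENT = 8        # assumed latent-code size of each autoencoder

# a) DALS: deep dense autoencoder over the engineered lexical/syntactic features
ling_in = layers.Input(shape=(N_LING,), name="lexical_syntactic")
h = layers.Dense(32, activation="relu")(ling_in)
dals_code = layers.Dense(LATENT, activation="relu", name="dals_latent")(h)
h = layers.Dense(32, activation="relu")(dals_code)
dals_out = layers.Dense(N_LING, activation="linear")(h)
dals_ae = Model(ling_in, dals_out, name="dals_autoencoder")
dals_ae.compile(optimizer="adam", loss="mse")

# b) GloVe-BiLSTM autoencoder over token-id sequences for the semantic features;
#    the Embedding layer would be initialised from pre-trained GloVe vectors (not shown)
seq_in = layers.Input(shape=(MAX_LEN,), name="token_ids")
emb = layers.Embedding(VOCAB, EMB_DIM, trainable=False)(seq_in)
sem_code = layers.Bidirectional(layers.LSTM(LATENT // 2), name="semantic_latent")(emb)
dec = layers.RepeatVector(MAX_LEN)(sem_code)
dec = layers.LSTM(EMB_DIM, return_sequences=True)(dec)
seq_out = layers.TimeDistributed(layers.Dense(VOCAB, activation="softmax"))(dec)
sem_ae = Model(seq_in, seq_out, name="glove_bilstm_autoencoder")
sem_ae.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# After unsupervised pre-training of both autoencoders, the two encoders are reused
# (transfer learning) and their latent codes concatenated to feed the predictor.
joint = layers.Concatenate()([dals_code, sem_code])
x = layers.Dense(16, activation="relu")(joint)
bot_prob = layers.Dense(1, activation="sigmoid", name="bot_probability")(x)
predictor = Model([ling_in, seq_in], bot_prob, name="hybrid_melau_predictor")
predictor.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

The human-versus-bot writing-style comparison can likewise be illustrated with a small self-contained snippet. The type-token ratio below is only a stand-in for whatever lexical-diversity measure the paper actually uses, NLTK is an assumed toolkit, and the example texts are toy inputs rather than data from the Cresci datasets.

from collections import Counter
from statistics import median
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def lexical_diversity(text):
    """Type-token ratio: distinct tokens divided by total tokens (0.0 for empty text)."""
    tokens = [t.lower() for t in nltk.word_tokenize(text)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def pos_profile(text):
    """Distribution of Penn Treebank POS tags, used to contrast syntactic behavior."""
    return Counter(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text)))

# Toy inputs; in the study these would be the texts of human and bot accounts.
human_texts = ["Just finished a great book tonight, can't stop thinking about it!"]
bot_texts = ["Get free followers now!!! Click here, click here, click here."]

print("median diversity (humans):", median(lexical_diversity(t) for t in human_texts))
print("median diversity (bots):  ", median(lexical_diversity(t) for t in bot_texts))
print("POS profile (humans):", pos_profile(" ".join(human_texts)))
print("POS profile (bots):  ", pos_profile(" ".join(bot_texts)))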



DOI: https://doi.org/10.31449/inf.v46i6.4081

This work is licensed under a Creative Commons Attribution 3.0 License.