Hybrid-MELAu: A Hybrid Mixing Engineered Linguistic Features Framework Based on Autoencoder for Social Bot Detection.

Zineb Ferhat Hamida, Allaoua Refouf, Ahlem Drif, Silvia Giordano

Abstract


Social bots are computer algorithms that automatically generate massive amounts of obnoxious or meaningful information. Most bot detection methods leverage a multitude of characteristics, including network features, temporal dynamics, activity features, and sentiment features. However, comparatively little work has explored lexicon measurements and linguistic indicators for detecting bots. The main purpose of this research is to recognize social bots through their writing style. We therefore carry out an exploratory study on the effectiveness of a set of only 17 linguistic features for bot detection, without the need to resort to other types of features, and we develop a novel framework in a hybrid fashion, Mixing Engineered Linguistic features based on Autoencoders (Hybrid-MELAu). The semi-supervised Hybrid-MELAu framework is composed of two essential constituents: the features learner and the predictors. The features learner is built on two structures: a) a deep dense autoencoder fed by the lexical and syntactic content (DALS), which represents high-order lexical and syntactic features in a latent space, and b) a GloVe-BiLSTM autoencoder, which captures the semantic features; we then generate elite features from the pre-trained encoder of each latent space through transfer learning. To compare the writing styles of humans and bots, we conduct a linguistic analysis on a one-million-sample subset of the Cresci datasets. On this sample, we observe that the median textual lexical diversity of bots is greater than that of humans, while the syntactic analysis based on part-of-speech tagging shows more creative behavior in the human writing style. Finally, we test the model's robustness on several public datasets (celebrity, pronbots-2019, and political bots). The proposed framework achieves an accuracy of 92.22%. Overall, the results shown in this paper, and the related discussion, argue that it is possible to discern the differences between humans' and bots' writing styles with an efficient linguistic deep framework.
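To make the two-branch feature learner described above concrete, the following is a minimal, hypothetical Keras/TensorFlow sketch: a deep dense autoencoder over the 17 engineered lexical/syntactic features (DALS) and a GloVe-BiLSTM autoencoder over token sequences, whose pre-trained encoders are reused and concatenated to feed the predictor. All layer sizes, the vocabulary size, the sequence length, and the variable names are illustrative assumptions, not values taken from the paper.

from tensorflow.keras import layers, Model

N_LING = 17       # the 17 engineered lexical/syntactic features mentioned in the abstract
MAX_LEN = 50      # assumed maximum sequence length in tokens
VOCAB = 20000     # assumed vocabulary size
EMB_DIM = 100     # assumed GloVe embedding dimension
LATENT = 8        # assumed latent-code size of each autoencoder

# a) DALS: deep dense autoencoder over the engineered lexical/syntactic features
ling_in = layers.Input(shape=(N_LING,), name="lexical_syntactic")
h = layers.Dense(32, activation="relu")(ling_in)
dals_code = layers.Dense(LATENT, activation="relu", name="dals_latent")(h)
h = layers.Dense(32, activation="relu")(dals_code)
dals_out = layers.Dense(N_LING, activation="linear")(h)
dals_ae = Model(ling_in, dals_out, name="dals_autoencoder")
dals_ae.compile(optimizer="adam", loss="mse")

# b) GloVe-BiLSTM autoencoder over token-id sequences for the semantic features;
#    the Embedding layer would be initialised from pre-trained GloVe vectors (not shown)
seq_in = layers.Input(shape=(MAX_LEN,), name="token_ids")
emb = layers.Embedding(VOCAB, EMB_DIM, trainable=False)(seq_in)
sem_code = layers.Bidirectional(layers.LSTM(LATENT // 2), name="semantic_latent")(emb)
dec = layers.RepeatVector(MAX_LEN)(sem_code)
dec = layers.LSTM(EMB_DIM, return_sequences=True)(dec)
seq_out = layers.TimeDistributed(layers.Dense(VOCAB, activation="softmax"))(dec)
sem_ae = Model(seq_in, seq_out, name="glove_bilstm_autoencoder")
sem_ae.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# After unsupervised pre-training of both autoencoders, the two encoders are reused
# (transfer learning) and their latent codes concatenated to feed the predictor.
joint = layers.Concatenate()([dals_code, sem_code])
x = layers.Dense(16, activation="relu")(joint)
bot_prob = layers.Dense(1, activation="sigmoid", name="bot_probability")(x)
predictor = Model([ling_in, seq_in], bot_prob, name="hybrid_melau_predictor")
predictor.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

The human-versus-bot writing-style comparison can likewise be illustrated with a small self-contained snippet. The type-token ratio below is only a stand-in for whatever lexical-diversity measure the paper actually uses, NLTK is an assumed toolkit, and the example texts are toy inputs rather than data from the Cresci datasets.

from collections import Counter
from statistics import median
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def lexical_diversity(text):
    """Type-token ratio: distinct tokens divided by total tokens (0.0 for empty text)."""
    tokens = [t.lower() for t in nltk.word_tokenize(text)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def pos_profile(text):
    """Distribution of Penn Treebank POS tags, used to contrast syntactic behavior."""
    return Counter(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text)))

# Toy inputs; in the study these would be the texts of human and bot accounts.
human_texts = ["Just finished a great book tonight, can't stop thinking about it!"]
bot_texts = ["Get free followers now!!! Click here, click here, click here."]

print("median diversity (humans):", median(lexical_diversity(t) for t in human_texts))
print("median diversity (bots):  ", median(lexical_diversity(t) for t in bot_texts))
print("POS profile (humans):", pos_profile(" ".join(human_texts)))
print("POS profile (bots):  ", pos_profile(" ".join(bot_texts)))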



DOI: https://doi.org/10.31449/inf.v46i6.4081

This work is licensed under a Creative Commons Attribution 3.0 License.