Facets of Fakes in Cyberspace: Machine and Ensemble Learning-Based Decisions and Detections

Ram Chatterjee, Mrinal Pandey, Hardeo Kumar Thakur, Anand Gupta

Abstract


Fake online reviews undermine internet marketing efforts to build businesses and brands in a competitive market with changing consumer expectations. Because reviews help brands attract clients, fake ones are written to pass as genuine and are hard to uncover. Hence, fake reviews and fake websites are examined extensively in this work. AI models, namely the Generalized Additive2 Model (GA2M) and its ensemble with the Elastic-Net Classifier model, are studied using the LogLoss metric. This analysis helps separate bogus hotel reviews and websites from genuine ones. The paper also uses ML classifiers (Decision Tree, Logistic Regression, Naïve Bayes) and ensemble models (Random Forest, Gradient Boosting) to identify legitimate websites through binary classification, and compares the classifiers and ensemble models by accuracy, precision, recall, F1-score, and ROC-AUC to evaluate their strengths and weaknesses. On the Hotel dataset, the Elastic-Net Classifier (L2 / Binomial Deviance), with a LogLoss holdout score of 0.2879, outperformed the GA2M model by 0.66%. LogLoss is preferred over ROC-AUC here because it measures how close the predicted probabilities are to the actual values. The Elastic-Net Classifier (L2 / Binomial Deviance) also surpassed GA2M in F1-score, precision, and accuracy by 0.4%, 1.84%, and 0.63%, respectively. On the Fraudulent and Legitimate Online Shops dataset, ensemble techniques outperform the ML classifiers with ROC-AUC margins of 0.71%, 1.73%, 0.76%, 1.10%, and 0.63% when 50% to 90% of the data is used for training and the remaining 50% to 10% is held out.
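The abstract's remark about LogLoss and ROC-AUC can be illustrated with a minimal sketch. It is not the authors' pipeline: the labels and scores below are made up, and the helper names `log_loss` and `roc_auc` are hypothetical, written in plain Python so the arithmetic is visible. ROC-AUC depends only on how the scores rank the examples, while LogLoss also rewards predicted probabilities that sit close to the true labels.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = fake, 0 = genuine (not taken from the paper)
y = [1, 1, 0, 0]
confident = [0.90, 0.80, 0.20, 0.10]  # probabilities close to the labels
hesitant = [0.60, 0.55, 0.45, 0.40]   # same ranking, probabilities near 0.5

for name, scores in [("confident", confident), ("hesitant", hesitant)]:
    print(name, "ROC-AUC:", roc_auc(y, scores),
          "LogLoss:", round(log_loss(y, scores), 4))
```

Both score sets rank every fake example above every genuine one, so their ROC-AUC is identical, yet the hesitant scores receive a noticeably higher (worse) LogLoss. This is one way to read the claim that LogLoss stays closer to the actual values.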


DOI: https://doi.org/10.31449/inf.v49i13.7050

This work is licensed under a Creative Commons Attribution 3.0 License.