Predicting Fraud in Mobile Money Transactions using Machine Learning: The Effects of Sampling Techniques on the Imbalanced Dataset

Francis Effirim Botchey, Zhen Qin, Kwesi Hughes-Lartey, Ernest Kwame Ampomah

Abstract


Mobile money fraud is growing in developing countries. We propose a solution to this problem based on machine learning. Labeled data from financial transactions, including mobile money transactions, are, however, skewed towards the negative class. Machine learning models built on such datasets are unreliable because the prediction algorithms become biased towards the negative class. We investigate the performance of different sampling and weighting techniques, including Adaptive Synthetic Sampling (ADASYN) and the Synthetic Minority Oversampling Technique (SMOTE). We select Logistic Regression for the experiments because of its simplicity and relatively low computational requirements. Performance is evaluated with several metrics. Manually tuning the class weights achieved the best results in our experiments.
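The two families of techniques named in the abstract, class weighting and synthetic oversampling, can be illustrated with a small standard-library Python sketch. It is not the authors' implementation. The function names (`balanced_class_weights`, `weighted_log_loss`, `smote_like`) and the toy data are hypothetical. The inverse-frequency weights are only a common starting heuristic, not the manually tuned weights reported above. The interpolation step is a simplified SMOTE-style procedure, and ADASYN additionally varies how many synthetic points each minority sample receives, which is not shown here.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    A common starting heuristic (assumption); manual tuning would adjust
    these values by hand and compare validation metrics.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Class-weighted mean log loss, the quantity a weighted logistic
    regression minimises (regularisation omitted)."""
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative only).

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + u * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (hypothetical): 95 legitimate and 5 fraudulent transactions.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)  # the fraud class gets weight 10.0

minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 2.1)]
synthetic = smote_like(minority, n_new=90)  # 5 + 90 matches the 95 negatives
```

With weights like these, each misclassified fraud case contributes more to the loss than a misclassified legitimate one, which counteracts the bias towards the negative class. Oversampling reaches a similar effect by changing the training data instead of the loss.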



DOI: https://doi.org/10.31449/inf.v45i7.3179

This work is licensed under a Creative Commons Attribution 3.0 License.