Deep Learning Methods for Ancient Arabic Handwritten Script Recognition: A Review of Challenges and Approaches
Abstract
Recognition of ancient handwritten Arabic script (AHR) is difficult because the script is cursive, historical writing styles vary widely, and manuscripts are often damaged. In addition, ancient handwriting does not follow modern conventions: word spacing is irregular, often excessively wide, and diacritics are complex, which makes the text extremely difficult to process. This irregularity causes ambiguity in character segmentation and word boundaries, increasing the error rate of automatic recognition systems. Despite advances in deep learning through CNNs, LSTMs, and hybrid models, AHR remains a complex problem that requires further exploration. Some recent models have achieved accuracy between 70% and 90% on modern Arabic datasets, but performance drops to 50%–75% when they are applied to ancient texts because of noise, script variation, and limited annotated data. This article consolidates the major issues and recent developments with regard to dataset constraints, preprocessing requirements, and machine learning methodologies. The review is based on an analysis of over 50 peer-reviewed papers published between 2016 and 2024. It focuses on the role of deep learning: image feature extraction by CNNs, sequential feature modeling by LSTMs, and hybrid models that combine the two. For instance, CNN-LSTM architectures have shown promising results on historical scripts with limited training data. Because so little annotated data is available, the review also concentrates on dataset augmentation and the creation of synthetic data. Techniques such as elastic distortions, GAN-generated samples, and noise injection are discussed as potential solutions. By analyzing existing techniques and identifying gaps for further research, this work aims to improve the accuracy and scalability of AHR and to aid the digitization and analysis of manuscripts so that they can be safeguarded as part of cultural heritage.
In particular, this review highlights the lack of standardized benchmarks and the need for multilingual ancient Arabic datasets to support reproducible research.
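As an illustration of the noise-injection augmentation mentioned in the abstract, the following is a minimal sketch and not a method taken from any of the reviewed papers. It assumes a grayscale page image stored as rows of 0-255 integers; the function names `inject_noise` and `augment` and the parameters `sigma` and `salt_pepper` are hypothetical and chosen only for this example.

```python
import random


def inject_noise(image, sigma=10.0, salt_pepper=0.02, seed=None):
    """Return a noisy copy of a grayscale image (rows of 0-255 ints).

    Assumed noise model: a fraction `salt_pepper` of pixels is forced to
    black or white (mimicking ink spots and holes), and every other pixel
    receives zero-mean Gaussian noise with standard deviation `sigma`.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            r = rng.random()
            if r < salt_pepper / 2:
                value = 0        # "pepper": dark speck
            elif r < salt_pepper:
                value = 255      # "salt": bright speck
            else:
                value = pixel + rng.gauss(0.0, sigma)
            # Clamp to the valid intensity range and store as an int.
            new_row.append(int(min(255, max(0, round(value)))))
        noisy.append(new_row)
    return noisy


def augment(image, copies=3, seed=0):
    """Produce several differently seeded noisy variants of one image."""
    return [inject_noise(image, seed=seed + i) for i in range(copies)]


# Usage: a blank 8x8 "page" yields three degraded training variants.
page = [[255] * 8 for _ in range(8)]
samples = augment(page, copies=3)
```

Fixed seeds make the variants reproducible, which can help when comparing recognition models trained on the same augmented set.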
DOI: https://doi.org/10.31449/inf.v49i28.8920

This work is licensed under a Creative Commons Attribution 3.0 License.