Hybrid Context Aware Gujarati Spell Correction Using Norvig Algorithm, GRU, and IndicBERT

Brijeshkumar Y Panchal, Apurva Shah

Abstract


Many Natural Language Processing (NLP) applications, including email, opinion mining, text summarization, and chatbots, rely on spelling and grammar checking. Spelling mistakes in regional languages such as Assamese, Gujarati, and Hindi can damage an individual's credibility, weaken cybersecurity efforts, create legal ambiguity, and degrade NLP application performance. This article focuses on reducing the frequency of spelling errors in Gujarati. It examines issues specific to the Gujarati language and reviews up-to-date strategies for correcting spelling mistakes based on the context of the word. We propose a novel hybrid approach to context-aware Gujarati spelling correction. After all candidate suggestions are considered, a two-layer GRU network and the IndicBERTv2-SS model, which was fine-tuned only on our curated Gujarati dataset of about 20,000 sentences (70/15/15 split into training, validation, and test), choose the best correction while keeping the context in mind. Preprocessing comprised Gujarati normalization (diacritics, compound characters, and numbers), regex-based tokenization, and edit-distance candidate creation. We evaluated the approach on the test split using accuracy, precision, and recall. Our proposed IndicBERT-GUJBRIJAPU tool achieved 93.49% accuracy, 94.46% precision, and 91.59% recall, which is substantially better than other approaches for context-aware correction.
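
The preprocessing steps named above (normalization, regex-based tokenization, and edit-distance candidate creation) can be illustrated with a minimal Python sketch in the style of Norvig's spelling corrector. It is not the authors' implementation: the Unicode NFC normalization, the Gujarati-block token pattern, the alphabet, and the names `normalize`, `tokenize`, `edits1`, and `candidates` are all assumptions chosen for illustration, and the context-aware choice among candidates by the GRU and IndicBERT models is not shown.

```python
import re
import unicodedata

# Assumed alphabet: letters and combining signs from the Gujarati Unicode
# block (U+0A80..U+0AFF). The authors' actual character set may differ.
GUJARATI_CHARS = [
    chr(cp) for cp in range(0x0A80, 0x0B00)
    if unicodedata.category(chr(cp)) in ("Lo", "Mn", "Mc")
]

# Assumed token pattern: a token is a maximal run of Gujarati-block characters.
TOKEN_RE = re.compile(r"[\u0A80-\u0AFF]+")


def normalize(text):
    """Apply Unicode NFC normalization (a stand-in for fuller Gujarati normalization)."""
    return unicodedata.normalize("NFC", text)


def tokenize(text):
    """Split text into Gujarati tokens with a regular expression."""
    return TOKEN_RE.findall(normalize(text))


def edits1(word):
    """Return every string one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in GUJARATI_CHARS]
    inserts = [left + c + right for left, right in splits for c in GUJARATI_CHARS]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, vocabulary):
    """Return the word if known, else known words one edit away, else the word itself."""
    if word in vocabulary:
        return {word}
    known = edits1(word) & vocabulary
    return known or {word}
```

For example, `candidates("ગુજરત", {"ગુજરાત"})` returns the dictionary word, because inserting the single vowel sign ા restores it. In a hybrid pipeline of the kind described, a candidate set like this would then be passed to a context model to choose among alternatives.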







DOI: https://doi.org/10.31449/inf.v49i34.9836

This work is licensed under a Creative Commons Attribution 3.0 License.