New Proposed Solution for Speech Recognition Without Labeled Data: Tutoring System for Children with Autism Spectrum Disorder

Amel Ziani, Amine Adouane, Mohamed Nassim Amiri, Sabiha Smail

Abstract


Children diagnosed with Autism Spectrum Disorder (ASD) face challenges in understanding situations, verbal communication, and social interaction. Autism manifests differently in each child and varies in severity. Behaviors commonly observed in children with ASD include poor skills, repetitive behaviors, delayed speech, reasoning difficulties, narrow interests, and difficulty with social interaction and communication, such as recognizing social cues. Because every child with ASD has unique educational needs, there is no universal solution for treating the condition. This paper introduces an adaptive educational system intended to help children with ASD acquire new skills and improve their communication abilities, enabling them to better integrate into society. The proposed system is built on activities researched and analyzed by therapists and uses speech recognition technology. To avoid the resource requirements of labeled datasets, we propose a new approach that uses a generative adversarial network (MelGAN) to produce responses that closely resemble a child's voice. The generated response can then be compared with the correct answer using similarity metrics. The system was tested on children with ASD who speak the Algerian dialect, and the results were promising. This may open a new direction for developing educational systems that do not rely on labeled datasets.
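As an illustration of the comparison step only, the following minimal sketch scores a generated response against a reference answer. The abstract does not name the similarity metric or the feature representation, so cosine similarity over fixed-length feature vectors, the function names `cosine_similarity` and `is_correct`, and the threshold of 0.8 are all assumptions made for this example, not the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors.

    The choice of cosine similarity is an assumption; the abstract
    only says "similarity metrics".
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_correct(generated, reference, threshold=0.8):
    """Accept the answer when similarity reaches a hypothetical threshold."""
    return cosine_similarity(generated, reference) >= threshold
```

In practice the vectors would be audio features extracted from the generated response and from the reference answer, and the threshold would need to be tuned on real recordings.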







DOI: https://doi.org/10.31449/inf.v48i18.5204

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.