Automatic speech recognition and text-to-speech technologies for L2 pronunciation improvement: reflections on their affordances

William Gottardi; Janaina Fernanda de Almeida; Celso Henrique Soufen Tumolo

doi:10.35699/1983-3652.2022.36736

Authors

William Gottardi, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Florianópolis, SC, Brazil. ORCID: https://orcid.org/0000-0002-1291-3953
Janaina Fernanda de Almeida, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Florianópolis, SC, Brazil. ORCID: https://orcid.org/0000-0003-3747-0279
Celso Henrique Soufen Tumolo, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Florianópolis, SC, Brazil. ORCID: https://orcid.org/0000-0001-5045-8712

Keywords:

Automatic speech recognition, Text-to-speech, CALL, Pronunciation teaching, Pronunciation improvement

Abstract

This paper presents a reflection on two technologies, automatic speech recognition (ASR) and text-to-speech (TTS), as means of improving learners’ pronunciation with a view to successful spoken communication. It sheds light on the practical use of these technologies, discussing their effectiveness, strengths, and limitations in order to help teachers choose the digital resources best suited to their students’ needs. A literature review was carried out of previous empirical studies, quantitative and/or qualitative, that investigated teachers’ and learners’ perceptions of ASR and TTS and their use as pedagogical tools for pronunciation practice. The review led to three conclusions: a) these resources seem to have the potential to enhance pronunciation practice, in terms of both perception and production; b) technology can bring considerable benefits to learners, mainly as a supplement to pronunciation teaching; and c) these digital resources give learners the opportunity to focus on their specific difficulties and receive personalized feedback while becoming more autonomous in their learning process.

References

ASHWELL, T.; ELAM, J. R. How accurately can the Google Web Speech API recognize and transcribe Japanese L2 English learners’ oral production? The JALT CALL Journal, v. 13, n. 1, p. 59–76, Apr. 2017. DOI: 10.29140/jaltcall.v13n1.212. Available from: https://www.castledown.com/journals/jaltcall/article/?reference=j212. Visited on: 9 Feb. 2022.

BIONE ALVES, T. Synthetic voices in the foreign language context. 2017. Master’s Thesis – Concordia University, Montreal, CA.

BIONE ALVES, T.; GRIMSHAW, J.; CARDOSO, W. An evaluation of text-to-speech synthesizers in the foreign language classroom: learners’ perceptions. In: PAPADIMA-SOPHOCLEOUS, S.; BRADLEY, L.; THOUËSNY, S. (Eds.). CALL communities and culture – short papers from EUROCALL 2016. [S.l.]: Research-publishing.net, Dec. 2016. p. 50–54. DOI: 10.14705/rpnet.2016.eurocall2016.537. Available from: https://research-publishing.net/manuscript?10.14705/rpnet.2016.eurocall2016.537. Visited on: 9 Feb. 2022.

BOGACH, N. et al. Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching. Electronics, v. 10, n. 3, p. 235, Jan. 2021. DOI: 10.3390/electronics10030235. Available from: https://www.mdpi.com/2079-9292/10/3/235. Visited on: 9 Feb. 2022.

CARDOSO, W. Learning L2 pronunciation with a text-to-speech synthesizer. In: FUTURE-PROOF CALL: language learning as exploration and encounters – short papers from EUROCALL 2018. [S.l.]: Research-publishing.net, Dec. 2018. p. 16–21. DOI: 10.14705/rpnet.2018.26.806. Available from: https://research-publishing.net/manuscript?10.14705/rpnet.2018.26.806. Visited on: 9 Feb. 2022.

CARDOSO, W.; SMITH, G.; GARCIA FUENTES, C. Evaluating text-to-speech synthesizers. In: CRITICAL CALL – Proceedings of the 2015 EUROCALL Conference, Padova, Italy. [S.l.]: Research-publishing.net, Dec. 2015. p. 108–113. DOI: 10.14705/rpnet.2015.000318. Available from: https://research-publishing.net/manuscript?10.14705/rpnet.2015.000318. Visited on: 9 Feb. 2022.

CARLET, A.; KIVISTÖ-DE SOUZA, H. Improving L2 pronunciation inside and outside the classroom. Ilha do Desterro: A Journal of English Language, Literatures in English and Cultural Studies, v. 71, n. 3, p. 99–124, Sept. 2018. DOI: 10.5007/2175-8026.2018v71n3p99. Available from: https://periodicos.ufsc.br/index.php/desterro/article/view/2175-8026.2018v71n3p99. Visited on: 9 Feb. 2022.

CARRIER, M. Automated Speech Recognition in language learning: Potential models, benefits and impact. Training Language and Culture, v. 1, n. 1, p. 46–61, Feb. 2017. DOI: 10.29366/2017tlc.1.1.3. Available from: http://rudn.tlcjournal.org/issues/1(1)-03.html. Visited on: 9 Feb. 2022.

CHAPELLE, C.; JAMIESON, J. Tips for teaching with CALL: practical approaches to computer assisted language learning. White Plains, NY: Pearson Education, 2008. (Tips on teaching).

CHEN, H. H.-J. Developing and evaluating an oral skills training website supported by automatic speech recognition technology. ReCALL, v. 23, n. 1, p. 59–78, Jan. 2011. DOI: 10.1017/S0958344010000285. Available from: https://www.cambridge.org/core/product/identifier/S0958344010000285/type/journal_article. Visited on: 9 Feb. 2022.

CHUN, D.; KERN, R.; SMITH, B. Technology in Language Use, Language Teaching, and Language Learning. The Modern Language Journal, v. 100, S1, p. 64–80, Jan. 2016. DOI: 10.1111/modl.12302. Available from: https://onlinelibrary.wiley.com/doi/10.1111/modl.12302. Visited on: 9 Feb. 2022.

DARCY, I. Powerful and Effective Pronunciation Instruction: How Can We Achieve It? CATESOL Journal, v. 30, n. 1, p. 13–45, 2018. Available from: https://eric.ed.gov/?id=EJ1174218. Visited on: 9 Feb. 2022.

DARCY, I.; ROCCA, B.; HANCOCK, Z. A Window into the Classroom: How Teachers Integrate Pronunciation Instruction. RELC Journal, v. 52, n. 1, p. 110–127, Apr. 2021. DOI: 10.1177/0033688220964269. Available from: http://journals.sagepub.com/doi/10.1177/0033688220964269. Visited on: 9 Feb. 2022.

DAVIES, G. Computer-Assisted Language Education. In: BERNS, M.; BROWN, C. (Eds.). Concise Encyclopedia of Applied Linguistics. Oxford: Elsevier, 2006. p. 261–271.

DEKEYSER, R. Skill Acquisition Theory. In: VANPATTEN, B.; WILLIAMS, J. (Eds.). Theories in Second Language Acquisition: An Introduction. New York and London: Routledge, 2015. p. 97–113.

DEMENKO, G.; WAGNER, A.; CYLWIK, N. The Use of Speech Technology in Foreign Language Pronunciation Training. Archives of Acoustics, v. 35, n. 3, p. 309–329, Sept. 2010. DOI: 10.2478/v10168-010-0027-z. Available from: https://content.sciendo.com/doi/10.2478/v10168-010-0027-z. Visited on: 9 Feb. 2022.

DERWING, T. M. The efficacy of pronunciation instruction. In: KANG, O.; THOMSON, R. I.; MURPHY, J. M. (Eds.). The Routledge Handbook of Contemporary English Pronunciation. Milton Park: Routledge, 2018. p. 320–334.

DERWING, T. M.; MUNRO, M. J.; CARBONARO, M. Does Popular Speech Recognition Software Work with ESL Speech? TESOL Quarterly, v. 34, n. 3, p. 592, 2000. DOI: 10.2307/3587748. Available from: https://www.jstor.org/stable/3587748?origin=crossref. Visited on: 9 Feb. 2022.

DIZON, G. Evaluating Intelligent Personal Assistants for L2 Listening and Speaking Development. Language Learning & Technology, v. 24, p. 16–26, 2020.

DIZON, G.; TANG, D. Intelligent personal assistants for autonomous second language learning: An investigation of Alexa. The JALT CALL Journal, v. 16, n. 2, p. 107–120, Aug. 2020. DOI: 10.29140/jaltcall.v16n2.273. Available from: https://www.castledown.com/journals/jaltcall/article/?reference=273. Visited on: 9 Feb. 2022.

EKSI, G. Y.; YESILCINAR, S. An Investigation of the Effectiveness of Online Text-to-Speech Tools in Improving EFL Teacher Trainees’ Pronunciation. English Language Teaching, v. 9, n. 2, p. 205, Jan. 2016. DOI: 10.5539/elt.v9n2p205. Available from: http://www.ccsenet.org/journal/index.php/elt/article/view/56606. Visited on: 9 Feb. 2022.

GOLONKA, E. M. et al. Technologies for foreign language learning: a review of technology types and their effectiveness. Computer Assisted Language Learning, v. 27, n. 1, p. 70–105, Feb. 2014. DOI: 10.1080/09588221.2012.700315. Available from: http://www.tandfonline.com/doi/abs/10.1080/09588221.2012.700315. Visited on: 9 Feb. 2022.

GOMES, A. A. de A.; CARDOSO, W.; LUCENA, R. M. de. Can TTS help L2 learners develop their phonological awareness? In: FUTURE-PROOF CALL: language learning as exploration and encounters – short papers from EUROCALL 2018. [S.l.: s.n.], 2018. p. 29–34. DOI: 10.14705/rpnet.2018.26.808. Available from: https://research-publishing.net. Visited on: 9 Feb. 2022.

GORDON, J.; DARCY, I. The development of comprehensible speech in L2 learners: A classroom study on the effects of short-term pronunciation instruction. Journal of Second Language Pronunciation, v. 2, n. 1, p. 56–92, Mar. 2016. DOI: 10.1075/jslp.2.1.03gor. Available from: http://www.jbe-platform.com/content/journals/10.1075/jslp.2.1.03gor. Visited on: 9 Feb. 2022.

GASS, S. M.; MACKEY, A. Input, Interaction, and Output in Second Language Acquisition. In: VANPATTEN, B.; WILLIAMS, J. (Eds.). Theories in second language acquisition: an introduction. Second Edition. New York: Routledge, 2015. (Second Language Acquisition Research Series).

GRIMSHAW, J.; BIONE, T.; CARDOSO, W. Who’s got talent? Comparing TTS systems for comprehensibility, naturalness, and intelligibility. In: FUTURE-PROOF CALL: language learning as exploration and encounters – short papers from EUROCALL 2018. [S.l.]: Research-publishing.net, Dec. 2018. p. 83–88. DOI: 10.14705/rpnet.2018.26.817. Available from: https://research-publishing.net/manuscript?10.14705/rpnet.2018.26.817. Visited on: 9 Feb. 2022.

HANDLEY, Z. Text-to-Speech Synthesis in Computer-Assisted Language Learning. In: CHAPELLE, C. A. (Ed.). The encyclopedia of applied linguistics. New York: Wiley-Blackwell, 2013. p. 5846–5851.

HARMER, J. Essential teacher knowledge. Harlow: Pearson Education, 2012. (Always learning).

HENRICHSEN, L. E. An Illustrated Taxonomy of Online CAPT Resources. RELC Journal, v. 52, n. 1, p. 179–188, Apr. 2021. DOI: 10.1177/0033688220954560. Available from: http://journals.sagepub.com/doi/10.1177/0033688220954560. Visited on: 9 Feb. 2022.

INCEOGLU, S.; LIM, H.; CHEN, W.-H. ASR for EFL Pronunciation Practice: Segmental Development and Learners’ Beliefs. The Journal of AsiaTEFL, v. 17, n. 3, p. 824–840, Sept. 2020. DOI: 10.18823/asiatefl.2020.17.3.5.824. Available from: http://journal.asiatefl.org/main/main.php?inx_journals=64&inx_contents=842&submode=3&PageMode=JournalView&s_title=ASR_for_EFL_Pronunciation_Practice_Segmental_Development_and_Learners_Beliefs. Visited on: 9 Feb. 2022.

JURAFSKY, D.; MARTIN, J. H. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Uttar Pradesh (India): Pearson, 2000.

KIM, I.-S. Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation. Journal of Educational Technology & Society, v. 9, n. 1, p. 322–334, 2006. Available from: https://www.jstor.org/stable/jeductechsoci.9.1.322. Visited on: 9 Feb. 2022.

KNILL, K. et al. Impact of ASR Performance on Free Speaking Language Assessment. In: INTERSPEECH 2018. [S.l.]: ISCA, Sept. 2018. p. 1641–1645. DOI: 10.21437/Interspeech.2018-1312. Available from: https://www.isca-speech.org/archive/interspeech_2018/knill18_interspeech.html. Visited on: 9 Feb. 2022.

LEE, B.; PLONSKY, L.; SAITO, K. The effects of perception- vs. production-based pronunciation instruction. System, v. 88, p. 102185, Feb. 2020. DOI: 10.1016/j.system.2019.102185. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0346251X19305196. Visited on: 9 Feb. 2022.

LEVIS, J.; SUVOROV, R. Automatic speech recognition. In: CHAPELLE, C. (Ed.). The encyclopedia of applied linguistics. [S.l.: s.n.], 2013. Available from: http://www.credoreference.com/book/wileyenapl. Visited on: 9 Feb. 2022.

LIAKIN, D.; CARDOSO, W.; LIAKINA, N. Mobilizing Instruction in a Second-Language Context: Learners’ Perceptions of Two Speech Technologies. Languages, v. 2, n. 3, p. 11, July 2017. DOI: 10.3390/languages2030011. Available from: http://www.mdpi.com/2226-471X/2/3/11. Visited on: 9 Feb. 2022.

LONG, M. H. Focus on form: A design feature in language teaching methodology. In: DE BOT, K.; GINSBERG, R.; KRAMSCH, C. (Eds.). Foreign language research in cross-cultural perspective. Amsterdam: John Benjamins, 1991. p. 39. Available from: https://benjamins.com/catalog/sibil.2.07lon. Visited on: 9 Feb. 2022.

MARTINS, C. B.; MOREIRA, H. O campo CALL (Computer Assisted Language Learning): definições, escopo e abrangência. Calidoscópio, v. 10, n. 3, p. 247–255, Dec. 2012. DOI: 10.4013/cld.2012.103.01. Available from: http://revistas.unisinos.br/index.php/calidoscopio/article/view/3254. Visited on: 9 Feb. 2022.

MCCROCKLIN, S.; EDALATISHAMS, I. Revisiting Popular Speech Recognition Software for ESL Speech. TESOL Quarterly, v. 54, n. 4, p. 1086–1097, Dec. 2020. DOI: 10.1002/tesq.3006. Available from: https://onlinelibrary.wiley.com/doi/10.1002/tesq.3006. Visited on: 9 Feb. 2022.

MENEZES, V. Tecnologias digitais no ensino de línguas: passado, presente e futuro. Revista da ABRALIN, Aug. 2019. DOI: 10.25189/rabralin.v18i1.1323. Available from: https://revista.abralin.org/index.php/abralin/article/view/1323. Visited on: 9 Feb. 2022.

MOON, D. Web-Based Text-to-Speech Technologies in Foreign Language Learning: Opportunities and Challenges. In: KIM, T.-H. et al. (Eds.). Computer Applications for Database, Education, and Ubiquitous Computing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. v. 352. p. 120–125. DOI: 10.1007/978-3-642-35603-2_19. Available from: http://link.springer.com/10.1007/978-3-642-35603-2_19. Visited on: 9 Feb. 2022.

MOUSSALLI, S.; CARDOSO, W. Intelligent personal assistants: can they understand and be understood by accented L2 learners? Computer Assisted Language Learning, v. 33, n. 8, p. 865–890, Nov. 2020. DOI: 10.1080/09588221.2019.1595664. Available from: https://www.tandfonline.com/doi/full/10.1080/09588221.2019.1595664. Visited on: 9 Feb. 2022.

MROZ, A. Seeing how people hear you: French learners experiencing intelligibility through automatic speech recognition. Foreign Language Annals, v. 51, n. 3, p. 617–637, Sept. 2018. DOI: 10.1111/flan.12348. Available from: https://onlinelibrary.wiley.com/doi/10.1111/flan.12348. Visited on: 9 Feb. 2022.

MUÑOZ, C. Symmetries and Asymmetries of Age Effects in Naturalistic and Instructed L2 Learning. Applied Linguistics, v. 29, n. 4, p. 578–596, Jan. 2008. DOI: 10.1093/applin/amm056. Available from: https://academic.oup.com/applij/article-lookup/doi/10.1093/applin/amm056. Visited on: 9 Feb. 2022.

MUNRO, M. J.; DERWING, T. M. Foreign Accent, Comprehensibility, and Intelligibility in the Speech of Second Language Learners. Language Learning, v. 45, n. 1, p. 73–97, Mar. 1995. DOI: 10.1111/j.1467-1770.1995.tb00963.x. Available from: https://onlinelibrary.wiley.com/doi/10.1111/j.1467-1770.1995.tb00963.x. Visited on: 9 Feb. 2022.

MUNRO, M. J.; DERWING, T. M. Intelligibility in Research and Practice: Teaching Priorities. In: REED, M.; LEVIS, J. M. (Eds.). The Handbook of English Pronunciation. 1. ed. [S.l.]: Wiley, May 2015. p. 375–396. DOI: 10.1002/9781118346952.ch21. Available from: https://onlinelibrary.wiley.com/doi/10.1002/9781118346952.ch21. Visited on: 9 Feb. 2022.

MUNRO, M. J.; DERWING, T. M.; MORTON, S. L. The Mutual Intelligibility of L2 Speech. Studies in Second Language Acquisition, v. 28, n. 1, Mar. 2006. DOI: 10.1017/S0272263106060049. Available from: http://www.journals.cambridge.org/abstract_S0272263106060049. Visited on: 9 Feb. 2022.

ORTEGA, L. Understanding second language acquisition. London: Routledge, 2009. (Understanding language series).

PENNINGTON, M. C.; ROGERSON-REVELL, P. English Pronunciation Teaching and Research: Contemporary Perspectives. London: Palgrave Macmillan UK, 2019. DOI: 10.1057/978-1-137-47677-7. Available from: http://link.springer.com/10.1057/978-1-137-47677-7. Visited on: 9 Feb. 2022.

ROCCAMO, A. Effective pronunciation instruction in basic language classrooms: A modular approach. In: LEVIS, J.; MCCROCKLIN, S. (Eds.). Proceedings of the 5th Pronunciation in Second Language Learning and Teaching Conference. Ames, IA: Iowa State University, 2014. p. 183–189.

ROGERSON-REVELL, P. M. Computer-Assisted Pronunciation Training (CAPT): Current Issues and Future Directions. RELC Journal, v. 52, n. 1, p. 189–205, Apr. 2021. DOI: 10.1177/0033688220977406. Available from: http://journals.sagepub.com/doi/10.1177/0033688220977406. Visited on: 9 Feb. 2022.

SICOLA, L.; DARCY, I. Integrating Pronunciation into the Language Classroom. In: REED, M.; LEVIS, J. M. (Eds.). The Handbook of English Pronunciation. 1. ed. [S.l.]: Wiley, May 2015. p. 471–487. DOI: 10.1002/9781118346952.ch26. Available from: https://onlinelibrary.wiley.com/doi/10.1002/9781118346952.ch26. Visited on: 9 Feb. 2022.

SLABAKOVA, R. Second Language Acquisition. Oxford: Oxford University Press, 2016.

THOMSON, R. I.; DERWING, T. M. The effectiveness of L2 pronunciation instruction: a narrative review. Applied Linguistics, v. 36, n. 3, p. 326–344, July 2014. DOI: 10.1093/applin/amu076. Available from: https://academic.oup.com/applij/article-lookup/doi/10.1093/applin/amu076. Visited on: 9 Feb. 2022.

TYLER, M. PAM-L2 and phonological category acquisition in the foreign language classroom. In: [s.l.: s.n.], May 2019. p. 607–630.

VANPATTEN, B. Processing matters in input enhancement. In: PISKE, T.; YOUNG-SCHOLTEN, M. (Eds.). Input Matters in SLA. [S.l.]: Multilingual Matters, Dec. 2008. p. 47–61. DOI: 10.21832/9781847691118-005. Available from: https://www.degruyter.com/document/doi/10.21832/9781847691118-005/html. Visited on: 9 Feb. 2022.

VANPATTEN, B.; SMITH, M.; BENATI, A. G. Key questions in second language acquisition: an introduction. New York: Cambridge University Press, 2019.

YAVAŞ, M. Applied English phonology. 2nd ed. Oxford; Malden, MA: Wiley-Blackwell, 2011.

YOSHIDA, M. T. Choosing Technology Tools to Meet Pronunciation Teaching and Learning Goals. CATESOL Journal, v. 30, n. 1, p. 195–212, 2018. Available from: https://eric.ed.gov/?id=EJ1174226. Visited on: 9 Feb. 2022.

ZHANG, R.; YUAN, Z.-M. Examining the effects of explicit pronunciation instruction on the development of L2 pronunciation. Studies in Second Language Acquisition, v. 42, n. 4, p. 905–918, Sept. 2020. DOI: 10.1017/S0272263120000121. Available from: https://www.cambridge.org/core/product/identifier/S0272263120000121/type/journal_article. Visited on: 9 Feb. 2022.