Lexical density in texts generated by ChatGPT

implications of artificial intelligence for writing in additional languages

Authors

DOI:

https://doi.org/10.1590/1983-3652.2024.47836

Keywords:

Additional languages, ChatGPT, Artificial Intelligence, Systemic Functional Linguistics, Lexical density

Abstract

Technological advancement has had a significant impact on written production, especially in Additional Languages (ALs). Although technology has brought new opportunities for AL teaching, it also poses challenges, including concerns about the complexity of writing and the authenticity of students’ work. One technology at the centre of these concerns is ChatGPT, an artificial intelligence (AI) platform that has been the subject of debate since its popularization in 2022. This study analyses a corpus consisting of six tasks produced by ChatGPT in five languages (German, Spanish, French, Italian, and Portuguese), considering the proficiency levels proposed by the Common European Framework of Reference for Languages (CEFR), totalling 2,991 texts and 706,401 words. The data were generated by students in a computer lab at a British university from 100 different profiles on the ChatGPT platform, following the researchers’ instructions. The analysis draws on Systemic Functional Linguistics (SFL) and the concept of lexical density (Halliday, 1985, 1987, 1993; Halliday; Matthiessen, 2014) to investigate the complexity of the texts, since lexical complexity is related to writing proficiency: more advanced texts use proportionally more “content words” (nouns, verbs, adjectives, and some adverbs of manner). The results reveal that ChatGPT does not adhere to task instructions regarding the requested word count, which affects the calculation of lexical density, and that it does not produce texts with significant differences in lexical density across additional languages or proficiency levels.
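
As a rough illustration of the proportional reading of lexical density mentioned in the abstract (content words as a share of all words), the following Python sketch may help. It is not the authors' procedure: the function names, the sample sentence, and the tiny English function-word list are all assumptions made for illustration, and a real analysis would likely rely on a part-of-speech tagger for each language.

```python
import re

# Hypothetical, deliberately tiny list of English function words.
# It is an assumption for this sketch only; it is not taken from the study.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "is", "are", "was", "were", "be", "it", "this", "that", "with", "for",
}


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    # [^\W\d_]+ matches one or more Unicode letters (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


def lexical_density(text):
    """Return the share of tokens that are not in FUNCTION_WORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content_words) / len(tokens)


sample = "The students wrote a short text in the lab"
print(f"{lexical_density(sample):.2f}")  # 5 content words out of 9 tokens
```

Because the result is a ratio, it depends on how many tokens the text contains, which is one way to see why texts that miss the requested word count complicate comparisons of lexical density.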

References

ANDERSON, Nash; BELAVY, Daniel L.; PERLE, Stephen M.; HENDRICKS, Sharief; HESPANHOL, Luiz; VERHAGEN, Evert; MEMON, Aamir R. AI did not write this manuscript, or did it? Can we trick the AI text detector into generated texts? The potential future of ChatGPT and AI in Sports & Exercise Medicine manuscript generation. BMJ Open Sport & Exercise Medicine, v. 9, n. 1, e001568, Feb. 2023. ISSN 2055-7647. DOI: 10.1136/bmjsem-2023-001568. Available at: https://bmjopensem.bmj.com/content/9/1/e001568. Accessed: 21 Nov. 2023.

CLAVEL-ARROITIA, Begoña; PENNOCK-SPECK, Barry. Analysing lexical density, diversity, and sophistication in written and spoken telecollaborative exchanges. Computer Assisted Language Learning Electronic Journal (CALL-EJ), v. 22, n. 3, p. 230–250, 2021. Available at: http://callej.org/journal/22-3/Clavel-Speck2021.pdf. Accessed: 23 Jun. 2023.

COLOMBI, Maria Cecilia. Academic language development in Latino student’s writing. In: SCHLEPPEGRELL, Mary J.; COLOMBI, Maria Cecilia (ed.). Developing advanced literacy in first and second languages. Mahwah: Lawrence Erlbaum Associates, 2000. p. 67–86.

DALE, Robert. GPT-3: What’s it good for? Natural Language Engineering, v. 27, n. 1, p. 113–118, Jan. 2021. ISSN 1351-3249, 1469-8110. DOI: 10.1017/S1351324920000601. Available at: https://www.cambridge.org/core/product/identifier/S1351324920000601/type/journal_article. Accessed: 21 Nov. 2023.

DEHOUCHE, N. Plagiarism in the age of massive Generative Pre-trained Transformers (GPT-3). Ethics in Science and Environmental Politics, v. 21, p. 17–23, Mar. 2021. ISSN 1863-5415, 1611-8014. DOI: 10.3354/esep00195. Available at: https://www.int-res.com/abstracts/esep/v21/p17-23/. Accessed: 21 Nov. 2023.

DÖRNYEI, Zoltán. Research methods in Applied Linguistics. New York: Oxford University Press, 2007.

FRÖHLING, Leon; ZUBIAGA, Arkaitz. Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover. PeerJ Computer Science, v. 7, e443, Apr. 2021. ISSN 2376-5992. DOI: 10.7717/peerj-cs.443. Available at: https://peerj.com/articles/cs-443. Accessed: 21 Nov. 2023.

GEHRMANN, Sebastian; STROBELT, Hendrik; RUSH, Alexander M. GLTR: Statistical Detection and Visualization of Generated Text, 2019. DOI: 10.48550/ARXIV.1906.04043. Available at: https://arxiv.org/abs/1906.04043. Accessed: 21 Nov. 2023.

GIL, Antônio Carlos. Como elaborar projetos de pesquisa. 4. ed. São Paulo: Atlas, 2002.

GONZÁLEZ FERNÁNDEZ, Adela. Big data y corpus lingüísticos para el estudio de la densidad léxica. Skopos, v. 9, p. 107–122, 2018. ISSN 2255-3703. Available at: http://helvia.uco.es/xmlui/handle/10396/19125. Accessed: 21 Nov. 2023.

GREGORI-SIGNES, Carmen; CLAVEL-ARROITIA, Begoña. Analysing Lexical Density and Lexical Diversity in University Students’ Written Discourse. Procedia - Social and Behavioral Sciences, v. 198, p. 546–556, Jul. 2015. ISSN 18770428. DOI: 10.1016/j.sbspro.2015.07.477. Available at: https://linkinghub.elsevier.com/retrieve/pii/S187704281504478X. Accessed: 21 Nov. 2023.

HALLIDAY, Michael Alexander Kirkwood. Spoken and written language. Geelong: Deakin University Press, 1985. (Language education).

HALLIDAY, Michael Alexander Kirkwood. Spoken and written modes of meaning. In: HOROWITZ, Rosalind; SAMUELS, S. Jay (ed.). Comprehending oral and written language. Orlando: Academic Press, 1987. p. 55–82.

HALLIDAY, Michael Alexander Kirkwood. Part A. In: HALLIDAY, Michael Alexander Kirkwood; HASAN, Ruqaiya (ed.). Language, context and text. 2. ed. Oxford: Oxford University Press, 1989. p. 3–49.

HALLIDAY, Michael Alexander Kirkwood. Some Grammatical Problems in Scientific English. In: HALLIDAY, Michael Alexander Kirkwood; MARTIN, Jim Robert (ed.). Writing science: Literacy and discursive power. London, New York: Routledge, 1993. p. 76–94.

HALLIDAY, Michael Alexander Kirkwood. The spoken language corpus: A foundation for grammatical theory. In: WEBSTER, Jonathan J. (ed.). Computational and quantitative studies. London; New York: Continuum, 2005. p. 157–190.

HALLIDAY, Michael Alexander Kirkwood; MATTHIESSEN, Christian Mathias Ingemar Martin. An Introduction to Functional Grammar. 4. ed. London: Edward Arnold, 2014.

JOHANSSON, Victoria. Lexical diversity and lexical density in speech and writing: a developmental perspective. Working Papers in Linguistics, v. 53, p. 61–79, 2008.

KASNECI, Enkelejda; SESSLER, Kathrin; KÜCHEMANN, Stefan; BANNERT, Maria; DEMENTIEVA, Daryna; FISCHER, Frank; GASSER, Urs; GROH, Georg; GÜNNEMANN, Stephan; HÜLLERMEIER, Eyke; KRUSCHE, Stephan; KUTYNIOK, Gitta; MICHAELI, Tilman; NERDEL, Claudia; PFEFFER, Jürgen; POQUET, Oleksandra; SAILER, Michael; SCHMIDT, Albrecht; SEIDEL, Tina; STADLER, Matthias; WELLER, Jochen; KUHN, Jochen; KASNECI, Gjergji. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, v. 103, p. 102274, Apr. 2023. ISSN 10416080. DOI: 10.1016/j.lindif.2023.102274. Available at: https://linkinghub.elsevier.com/retrieve/pii/S1041608023000195. Accessed: 21 Nov. 2023.

KEMBAREN, Farida Repelita; ASWANI, Ade Novira. Exploring Lexical Density in the New York Times. ELLITE: Journal of English Language, Literature, and Teaching, v. 7, n. 2, p. 109–119, Nov. 2022. ISSN 25280066, 25274120. DOI: 10.32528/ellite.v7i2.8795. Available at: http://jurnal.unmuhjember.ac.id/index.php/ELLITE/article/view/8795. Accessed: 21 Nov. 2023.

KING, Michael R.; CHATGPT. A Conversation on Artificial Intelligence, Chatbots, and Plagiarism in Higher Education. Cellular and Molecular Bioengineering, v. 16, n. 1, p. 1–2, Feb. 2023. ISSN 1865-5025, 1865-5033. DOI: 10.1007/s12195-022-00754-8. Available at: https://link.springer.com/10.1007/s12195-022-00754-8. Accessed: 21 Nov. 2023.

KONDAL, Bonala. Effects of lexical density and lexical variety in language performance and proficiency. International Journal of IT, Engineering and Applied Sciences Research (IJIEASR), v. 4, n. 10, p. 25–29, 2015.

KUMAR, Arun. Analysis of ChatGPT Tool to Assess the Potential of its Utility for Academic Writing in Biomedical Domain. Biology, Engineering, Medicine and Science Reports, v. 9, n. 1, p. 24–30, Jan. 2023. ISSN 24546895. DOI: 10.5530/bems.9.1.5. Available at: https://www.bemsreports.org/index.php/bems/article/view/132. Accessed: 21 Nov. 2023.

LANCASTER, Thomas. Artificial intelligence, text generation tools and ChatGPT – does digital watermarking offer a solution? International Journal for Educational Integrity, v. 19, n. 1, p. 10, Jun. 2023. ISSN 1833-2595. DOI: 10.1007/s40979-023-00131-6. Available at: https://edintegrity.biomedcentral.com/articles/10.1007/s40979-023-00131-6. Accessed: 21 Nov. 2023.

MARTINS, Mário. Densidade lexical na escrita de textos escolares. Signum: Estudos da Linguagem, v. 20, n. 1, p. 218, May 2017. ISSN 2237-4876. DOI: 10.5433/2237-4876.2017v20n1p218. Available at: http://www.uel.br/revistas/uel/index.php/signum/article/view/25225. Accessed: 21 Nov. 2023.

MITROVIĆ, Sandra; ANDREOLETTI, Davide; AYOUB, Omran. ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text. arXiv:2301.13852v1 [cs.CL], 2023. DOI: 10.48550/ARXIV.2301.13852. Available at: https://arxiv.org/abs/2301.13852. Accessed: 21 Nov. 2023.

MOOHEBAT, Mohammadreza; RAJ, Ram Gopal; KAREEM, Sameem Binti Abdul; THORLEUCHTER, Dirk. Identifying ISI-Indexed articles by their lexical usage: A text analysis approach. Journal of the Association for Information Science and Technology, v. 66, n. 3, p. 501–511, Mar. 2015. ISSN 2330-1635, 2330-1643. DOI: 10.1002/asi.23194. Available at: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.23194. Accessed: 21 Nov. 2023.

NALESSO, Giulia. El desarrollo de la competencia léxica de estudiantes italianos universitarios de ELE. Orillas: rivista d’ispanistica, n. 7, p. 381–394, 2018. ISSN 2280-4390. Available at: https://dialnet.unirioja.es/servlet/articulo?codigo=7819128. Accessed: 21 Nov. 2023.

NASSERI, Maryam; THOMPSON, Paul. Lexical density and diversity in dissertation abstracts: Revisiting English L1 vs. L2 text differences. Assessing Writing, v. 47, p. 100511, Jan. 2021. ISSN 10752935. DOI: 10.1016/j.asw.2020.100511. Available at: https://linkinghub.elsevier.com/retrieve/pii/S1075293520300726. Accessed: 21 Nov. 2023.

NATION, I.S. Paul. Learning vocabulary in another language. 2. ed. [S. l.]: Cambridge University Press, 2013.

PERKINS, Mike. Academic integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond. Journal of University Teaching and Learning Practice, v. 20, n. 2, Feb. 2023. ISSN 14499789. DOI: 10.53761/1.20.02.07. Available at: https://ro.uow.edu.au/jutlp/vol20/iss2/07/. Accessed: 21 Nov. 2023.

RAMOS, Anatália Saraiva Martins. Inteligência Artificial Generativa baseada em grandes modelos de linguagem - ferramentas de uso na pesquisa acadêmica. [S. l.], May 2023. DOI: 10.1590/SciELOPreprints.6105. Available at: https://preprints.scielo.org/index.php/scielo/preprint/view/6105/version/6463. Accessed: 21 Nov. 2023.

READ, John. Assessing vocabulary. Cambridge: Cambridge University Press, 2010.

RIFFO, Karina Fuentes; OSUNA, Sergio Hernández; LAGOS, Pedro Salcedo. Descripción de la diversidad y densidad léxicas en noticias escritas por estudiantes de periodismo. Revista Brasileira de Linguística Aplicada, v. 19, n. 3, p. 499–528, Sep. 2019. ISSN 1984-6398. DOI: 10.1590/1984-6398201914113. Available at: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1984-63982019000300499&tlng=es. Accessed: 21 Nov. 2023.

ROSPIGLIOSI, Pericles ‘Asher’. Artificial intelligence in teaching and learning: what questions should we ask of ChatGPT? Interactive Learning Environments, v. 31, n. 1, p. 1–3, Jan. 2023. ISSN 1049-4820, 1744-5191. DOI: 10.1080/10494820.2023.2180191. Available at: https://www.tandfonline.com/doi/full/10.1080/10494820.2023.2180191. Accessed: 21 Nov. 2023.

SCHNUR, Erin; RUBIO, Fernando. Lexical complexity, writing proficiency and task effects in Spanish Dual Language Immersion. Language Learning & Technology, v. 25, n. 1, p. 53–72, Feb. 2021. ISSN 1094-3501. Available at: http://hdl.handle.net/10125/73425. Accessed: 21 Nov. 2023.

URE, Jean. Lexical density and register differentiation. In: PERREN, George Ernest; TRIM, John Leslie Melville (ed.). Applications of Linguistics: selected papers of the Second International Congress of Applied Linguistics. Cambridge: Cambridge University Press, 1971. p. 443–452.

URE, Jean; ELLIS, Jeffrey. Register in descriptive linguistics and linguistic sociology. In: URIBE-VILLEGAS, Oscar (ed.). Issues in Sociolinguistics. The Hague: Mouton, 1977. p. 197–243.

Published

2023-11-29

How to Cite

DA SILVA, A. M.; ROTTAVA, L. Lexical density in texts generated by ChatGPT: implications of artificial intelligence for writing in additional languages. Texto Livre, Belo Horizonte-MG, v. 17, p. e47836, 2023. DOI: 10.1590/1983-3652.2024.47836. Available at: https://periodicos.ufmg.br/index.php/textolivre/article/view/47836. Accessed: 17 Jul. 2024.

Section

Dossier 2024: Linguistic and cultural education mediated by digital technologies