Challenges for large generative language models in reproducing textual complexity: a study on newspaper editorials
DOI:
https://doi.org/10.1590/1983-3652.2025.58530
Keywords:
Generative language models, Textual complexity, Discourse genres, Artificial intelligence and language, Human-AI comparative analysis
Abstract
This article evaluates the performance of the generative language model Sabiá-3 in the task of reproducing aspects of textual complexity characteristic of the editorial discourse genre, using human-produced editorials as a reference point. For this task, we employed metrics from the computational tool NILC-Metrix. Our results revealed differences in four out of the five analyzed metrics. Human texts showed greater complexity in the measures of “type-token ratio” and “cross-entropy”. We argue that this outcome may be linked, for instance, to the human ability to select words or form lexical combinations without the constraints of probabilistic parameters. In contrast, the texts generated by the Sabiá-3 model displayed higher complexity in the “syllables per word” and “subordinate clauses” metrics, possibly due, among other factors, to the fact that such tools do not face cognitive processing limitations. The only metric without a statistically significant difference was “hard conjunctions”. We attribute this result to the closed-class nature of this grammatical category, which tends to limit variation. This study reinforces the importance of considering multiple dimensions of textual complexity when evaluating the output of large generative language models, especially in genres that require refined linguistic control, such as the editorial.
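To make the "type-token ratio" measure mentioned in the abstract concrete, the following is a minimal sketch of the textbook definition: the number of distinct word forms divided by the total number of words. It is an illustration only. The function name, the tokenization rule, and the sample sentences are assumptions made for this example, and NILC-Metrix may tokenize or normalize text differently.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word forms (types) divided by total words (tokens).

    Hypothetical helper for illustration; not the NILC-Metrix implementation.
    """
    # Lowercase the text and treat each run of letters (accented ones included)
    # as one token; digits, underscores, and punctuation are ignored.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0  # Avoid dividing by zero on empty input.
    return len(set(tokens)) / len(tokens)


# Invented sample sentences, not taken from the study's corpus.
varied = "the editor weighs evidence and argues a position"
repetitive = "the editor said the editor said the editor said"

print(type_token_ratio(varied))
print(type_token_ratio(repetitive))
```

A text that repeats the same words scores lower than one that keeps introducing new ones, which is the sense in which a higher ratio is read as greater lexical complexity. The raw ratio is sensitive to text length, so comparisons are most meaningful between texts of similar size.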
References
ABONIZIO, Hugo; ALMEIDA, Thales Sales; LAITZ, Thiago; MALAQUIAS JUNIOR, Roseval; BONÁS, Giovana Kerche; NOGUEIRA, Rodrigo; PIRES, Ramon. Sabiá-3 technical report. [S. l.: s. n.], 2025. arXiv: 2410.12049. Available at: https://arxiv.org/abs/2410.12049.
ALMEIDA, Erica; CALLOU, Dinah. Sobre o uso variável do subjuntivo em português: um estudo de tendência. In: BRITO, Ana Maria; SILVA, Maria de Fátima Henriques da; VELOSO, João; FIÉIS, Alexandra (ed.). XXV Encontro Nacional da Associação Portuguesa de Linguística: Textos seleccionados. Porto: APL, 2010. p. 143–152.
BAKHTIN, Mikhail. Estética da criação verbal. São Paulo: Martins Fontes, 1997.
BIBER, Douglas; CONRAD, Susan. Register, genre, and style. Cambridge: Cambridge University Press, 2009.
FRANCESCHELLI, Giorgio; MUSOLESI, Mirco. On the creativity of large language models. AI & SOCIETY, p. 1–11, 2024.
FRANTZ, Roger; STARR, Laura; BAILEY, Alison. Syntactic complexity as an aspect of text complexity. Educational Researcher, v. 44, n. 7, p. 387–393, 2015.
GIBSON, Edward. The dependency locality theory: a distance-based theory of linguistic complexity. In: MARANTZ, Alec; MIYASHITA, Yasushi; O’NEIL, Wayne (ed.). Image, Language, Brain: Papers from the First Mind Articulation Project Symposium. Cambridge, MA: The MIT Press, 2000. p. 95–126.
GULORDAVA, Kristina; BOJANOWSKI, Piotr; GRAVE, Edouard; LINZEN, Tal; BARONI, Marco. Colorless green recurrent networks dream hierarchically. [S. l.: s. n.], 2018. arXiv: 1803.11138. Available at: https://arxiv.org/abs/1803.11138.
HOLTZMAN, Ari; BUYS, Jan; DU, Li; FORBES, Maxwell; CHOI, Yejin. The curious case of neural text degeneration. [S. l.: s. n.], 2020. arXiv: 1904.09751. Available at: https://arxiv.org/abs/1904.09751.
LEAL, Sidney Evaldo; DURAN, Magali Sanches; SCARTON, Carolina Evaristo; HARTMANN, Nathan Siegle; ALUÍSIO, Sandra Maria. NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese. Language Resources and Evaluation, v. 58, n. 1, p. 73–110, 2024.
LEVELT, Willem. Speaking: from intention to articulation. Cambridge, MA: The MIT Press, 1989.
MCNAMARA, Danielle; LOUWERSE, Max; GRAESSER, Arthur. Coh-Metrix: automated cohesion and coherence scores to predict text readability and facilitate comprehension. [S. l.: s. n.], 2002. Technical report, Institute for Intelligent Systems, University of Memphis, TN.
NAIK, Dishita; NAIK, Ishita; NAIK, Nitin. Applications of AI chatbots based on generative AI, large language models and large multimodal models. In: NAIK, Nitin; JENKINS, Paul; PRAJAPAT, Shaligram; GRACE, Paul (ed.). Contributions Presented at The International Conference on Computing, Communication, Cybersecurity and AI, July 3–4, 2024, London, UK. Cham: Springer Nature Switzerland, 2024. p. 668–690.
PIANTADOSI, Steven T. Zipf’s word frequency law in natural language: a critical review and future directions. Psychonomic Bulletin & Review, v. 21, p. 1112–1130, 2014.
RADFORD, Alec; WU, Jeffrey; CHILD, Rewon; LUAN, David; AMODEI, Dario; SUTSKEVER, Ilya. Language models are unsupervised multitask learners. OpenAI blog, v. 1, n. 8, p. 1–9, 2019.
REISENBICHLER, Martin; REUTTERER, Thomas; SCHWEIDEL, David A.; DAN, Daniel. Frontiers: supporting content marketing with natural language generation. Marketing Science, v. 41, n. 3, p. 441–452, 2022.
R CORE TEAM. R: A Language and Environment for Statistical Computing. Vienna, Austria, 2021. Available at: https://www.R-project.org/.
TOUVRON, Hugo; LAVRIL, Thibaut; IZACARD, Gautier; MARTINET, Xavier; LACHAUX, Marie-Anne; LACROIX, Timothée; ROZIÈRE, Baptiste; GOYAL, Naman; HAMBRO, Eric; AZHAR, Faisal; RODRIGUEZ, Aurelien; JOULIN, Armand; GRAVE, Edouard; LAMPLE, Guillaume. LLaMA: open and efficient foundation language models. [S. l.: s. n.], 2023. arXiv: 2302.13971. Available at: https://arxiv.org/abs/2302.13971.
VASWANI, Ashish; SHAZEER, Noam; PARMAR, Niki; USZKOREIT, Jakob; JONES, Llion; GOMEZ, Aidan; KAISER, Lukasz; POLOSUKHIN, Illia. Attention is all you need. In: LUXBURG, Ulrike von; GUYON, Isabelle; BENGIO, Samy; WALLACH, Hanna; FERGUS, Rob; VISHWANATHAN, S.; GARNETT, Roman (ed.). Advances in Neural Information Processing Systems 30. Red Hook: Curran Associates, Inc., 2018. p. 5999–6009.
VIEIRA, Rosaura Maria Marques. O editorial de jornal. In: DELL’ISOLA, Regina Lúcia Péret (ed.). Nos domínios dos gêneros textuais. Belo Horizonte: FALE/UFMG, 2009. v. 2. p. 15–20.
WELLECK, Sean; KULIKOV, Ilia; ROLLER, Stephen; DINAN, Emily; CHO, Kyunghyun; WESTON, Jason. Neural text generation with unlikelihood training. [S. l.: s. n.], 2019. arXiv: 1908.04319. Available at: https://arxiv.org/abs/1908.04319.
Data Availability Statement
The research data are available only upon request.
License
Copyright (c) 2025 André Luis Antonelli

This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an open access article that allows unrestricted use, distribution and reproduction in any medium as long as the original article is properly cited.








