Grammar in multimodal datasets: a case study of image captions


Authors

DOI:

https://doi.org/10.35699/2317-2096.2025.57565

Keywords:

verbalization of experience, usage-based theory, grammatical construction, meaning, multimodality

Abstract

This article presents a linguistic study of a sample of 150 image captions from the multimodal dataset Framed Multi30k (Viridiano, 2024). The aim is to analyze the captions in light of the socio-cognitive factors involved in verbalizing experience (Chafe, 2002, 2005; Croft, 2007), in order to partly explain the variation observed in their lexico-grammatical construction. To this end, the article discusses a computational methodology for the linguistic annotation of captions, which serves as the basis for extracting grammatical information. It also examines the experimental nature of controlled elicitation in caption production and the linguistic implications of that design. The grammatical analysis focuses on verbalization processes such as selection, categorization, and orientation. Finally, the study explores the nature of image captions as descriptive textual units, highlighting the descriptive operations observed in their production.
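The abstract names a computational annotation methodology as the basis for extracting grammatical information. As a minimal sketch of what such extraction can look like, assuming captions have already been parsed into the CoNLL-U format of Universal Dependencies (de Marneffe et al., 2021), as output by tools such as UDPipe (Straka et al., 2016), the Python below reads CoNLL-U text and reports the part of speech of each caption's root together with counts of its dependency relations. It is not the authors' pipeline; the caption, its hand annotation, and the helper names `parse_conllu` and `summarize` are hypothetical.

```python
from collections import Counter

# Standard CoNLL-U column names (10 tab-separated fields per token line).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hypothetical caption, hand-annotated for illustration only
# (not taken from Framed Multi30k): (id, form, lemma, upos, head, deprel).
ROWS = [
    ("1", "A", "a", "DET", "2", "det"),
    ("2", "man", "man", "NOUN", "3", "nsubj"),
    ("3", "rides", "ride", "VERB", "0", "root"),
    ("4", "a", "a", "DET", "5", "det"),
    ("5", "bicycle", "bicycle", "NOUN", "3", "obj"),
    ("6", ".", ".", "PUNCT", "3", "punct"),
]
SAMPLE = "# text = A man rides a bicycle .\n" + "\n".join(
    "\t".join([i, form, lemma, upos, "_", "_", head, rel, "_", "_"])
    for i, form, lemma, upos, head, rel in ROWS
) + "\n"


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        # Keep syntactic words only; skip multiword ranges ("1-2")
        # and empty nodes ("1.1").
        if cols[0].isdigit():
            current.append(dict(zip(FIELDS, cols)))
    if current:                           # input may not end with a blank line
        sentences.append(current)
    return sentences


def summarize(sentence):
    """Return the UPOS tag of the root and a count of dependency relations."""
    root = next(tok for tok in sentence if tok["deprel"] == "root")
    return root["upos"], Counter(tok["deprel"] for tok in sentence)


sentences = parse_conllu(SAMPLE)
root_upos, relations = summarize(sentences[0])
print(root_upos)          # VERB
print(relations["det"])   # 2
```

A verbal root, as in this example, marks a clausal caption, whereas a caption such as "A man on a bicycle" would receive a nominal root; tallying root types over a sample is one possible way to quantify variation in how captions are constructed.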

References

ABBOTT, B. Reference. Oxford: Oxford University Press, 2010.

ADAM, J. M.; PETITJEAN, A. Le texte descriptif. Paris: Nathan, 1989.

ADAM, J. M. Textos: tipos e protótipos. Translated by Monica Cavalcante. São Paulo: Editora Contexto, 2018.

BASILE, V. et al. Toward a Perspectivist Turn in Ground Truthing for Predictive Computing. Proceedings of the AAAI Conference on Artificial Intelligence, v. 39, n. 1, p. 1-17, 2023. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/25840. Accessed: 14 Apr. 2025.

BELCAVELLO, F. FrameNet Annotation for Multimodal Corpora: Devising a Methodology for the Semantic Representation of Text-image Interactions in Audiovisual Productions. 2023. 134 leaves. Thesis (PhD in Linguistic Studies) – Universidade Federal de Juiz de Fora, 2023. Available at: https://repositorio.ufjf.br/jspui/handle/ufjf/15527. Accessed: 14 Apr. 2025.

BYBEE, J. What is Usage-based Linguistics? In: DÍAZ-CAMPOS, M.; BALASCH, S. (eds.) The Handbook of Usage-based Linguistics. New York: Wiley Blackwell, 2023. p. 9-30.

CHAFE, W. Some Thoughts on Schemata. In: ASSOCIATION FOR COMPUTATIONAL LINGUISTICS. Proceedings TINLAP’75, p. 89-91, 1975a.

CHAFE, W. Creativity in Verbalization as Evidence for Analogic Knowledge. In: DEPARTMENT OF LINGUISTICS. Proceedings TINLAP’75, p. 144-145, 1975b.

CHAFE, W. Creativity in Verbalization and its Implications for the Nature of Stored Knowledge. In: FREEDLE, R. O. (ed.) Discourse Production and Comprehension. Norwood: Ablex, 1977a. p. 41-55.

CHAFE, W. The Recall and Verbalization of Past Experience. In: COLE, R. (ed.) Current Issues in Linguistic Theory. London: Indiana University Press, 1977b. p. 215-246.

CHAFE, W. Things we can Learn from Repeated Tellings of the Same Experience. Narrative Inquiry, v. 8, n. 2, p. 269-285, 1998.

CHAFE, W. Putting Grammaticalization in its Place. In: WISCHER, I.; DIEWALD, G. (eds.) New Reflections on Grammaticalization. Amsterdam: John Benjamins, 2002. p. 395-412.

CHAFE, W. The Relation of Grammar to Thought. In: BUTLER, C. S.; GÓMEZ-GONZÁLEZ, M. de los Á.; DOVAL-SUÁREZ, S. (eds.) The Dynamics of Language Use. Amsterdam: John Benjamins, 2005. p. 57-78.

CHELLIAH, S. L.; DE REUSE, W. J. Handbook of Descriptive Linguistic Fieldwork. New York: Springer, 2011.

CLARK, E. Languages and Representations. In: GENTNER, D.; GOLDIN-MEADOW, S. (eds.) Language in Mind: Advances in the Study of Language and Thought. Cambridge, MA: The MIT Press, 2003. p. 13-27.

CLARK, H. H. Arenas of Language Use. Cambridge, UK: Cambridge University Press, 1992.

COMRIE, B. Language Universals and Linguistic Typology. 2nd ed. Chicago: The University of Chicago Press, 1989.

CONEGLIAN, A. V. L. O modelo das dependências universais: assentando bases teóricas e revisando diretrizes metodológicas. Revista da Abralin, v. 23, n. 2, p. 187-214, 2023.

CROFT, W. The Origins of Grammar in the Verbalization of Experience. Cognitive Linguistics, v. 18, n. 3, p. 339-382, 2007.

CROFT, W. The origins of grammaticalization in the verbalization of experience. Linguistics, v. 48, n. 1, p. 1-48, 2010.

CROFT, W. Typology and Universals. 2nd ed. Cambridge, UK: Cambridge University Press, 2012.

CROFT, W. Ten lectures on construction grammar and typology. New York: Brill, 2020.

DE MARNEFFE, M.-C. et al. Universal Dependencies. Computational Linguistics, v. 47, n. 2, p. 255-308, 2021.

DIK, S. The theory of functional grammar. 2nd ed. Berlin: Mouton de Gruyter, 1997.

FRAMED MULTI30K. Image 1000092795.jpg. In: FRAMED MULTI30K. Database. [S. l.]: [s. n.]. Available at: https://github.com/FrameNetBrasil/framed-multi30k. Accessed: 25 Apr. 2025.

GIVÓN, T. Functionalism and grammar. Amsterdam: John Benjamins, 1995.

HALLIDAY, M.; HASAN, R. Cohesion in English. London: Longman, 1976.

JAIMES, A.; CHANG, S.-F. A Conceptual Framework for Indexing Visual Information at Multiple Levels. IS&T/SPIE Internet Imaging, v. 3964, n. p., 2000.

KEMMER, S.; BARLOW, M. Introduction: a usage-based conception of language. In: BARLOW, M.; KEMMER, S. (eds.) Usage-based models of language. Stanford: CSLI Publications, 2000. p. vii-xxviii.

MAURI, C. Ad hoc categorization in linguistic interaction. In: MAURI, C. et al. (eds.) Building categories in interaction: linguistic resources at work. Amsterdam: John Benjamins, 2021. p. 9-34.

MILTENBURG, E. Stereotyping and bias in the Flickr30K dataset. In: MULTIMODAL CORPORA: COMPUTER VISION AND LANGUAGE PROCESSING (MMC 2016). Proceedings [...]. [S. l.], 2016. p. 1-4. Available at: http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-MCC-2016-proceedings.pdf. Accessed: 14 Feb. 2024.

NEVES, M. H. M. Gramática de usos do português. 2nd ed. São Paulo: Editora Unesp, 2011.

PARDO, T. et al. Porttinari: a Large Multi-genre Treebank for Brazilian Portuguese. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DE INFORMAÇÃO E DA LINGUAGEM HUMANA. Proceedings [...]. Porto Alegre: SBC, 2021. p. 1-10.

PEZATTI, E. A ordem das palavras no português. São Paulo: Parábola, 2014.

RINKE, E. A combinação de artigo definido e pronome possessivo na história do português. Estudos de Linguística Galega, v. 2, p. 121-139, 2010.

SAMARIN, W. J. Field Linguistics. New York: Holt, 1967.

SLOBIN, D. Language and Thought Online: Cognitive Consequences of Linguistic Relativity. In: GENTNER, D.; GOLDIN-MEADOW, S. (eds.) Language in Mind: Advances in the Study of Language and Thought. Cambridge, MA: The MIT Press, 2003. p. 157-192.

STRAKA, M. et al. UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. In: CALZOLARI, N. et al. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association, 2016. p. 4290-4297. Available at: https://aclanthology.org/L16-1680.pdf. Accessed: 8 Jan. 2025.

TALMY, L. Toward cognitive semantics. Cambridge, MA: The MIT Press, 2000. 2 vols.

VIRIDIANO, M. Framed Multi30k: um dataset multimodal-multilíngue baseado em semântica de frames. 2024. 107 leaves. Thesis (PhD in Linguistic Studies) – Universidade Federal de Juiz de Fora, 2024. Available at: https://repositorio.ufjf.br/jspui/handle/ufjf/16854. Accessed: 14 Apr. 2025.

VIRIDIANO, M. et al. Framed Multi30k: a frame-based multimodal multilingual dataset. In: THE 2024 JOINT INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS, LANGUAGE RESOURCES AND EVALUATION (LREC-COLING 2024). Proceedings [...]. Torino, Italy, 2024. p. 7438-7449. Available at: https://aclanthology.org/2024.lrec-main.656.pdf. Accessed: 14 Apr. 2025.

YOUNG, P. et al. From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, v. 2, p. 67-78, 2014.

Published

2025-04-30

Issue

Vol. 30, No. 1 (2025)

Section

Multimodality: cognitive approaches and computational representations

How to Cite

Grammar in multimodal datasets: a case study of image captions. (2025). Caligrama: Revista de Estudos Românicos, 30(1), 24-51. https://doi.org/10.35699/2317-2096.2025.57565