Multimodality: Cognitive Approaches and Computational Representations

Tiago Timponi Torrent; André V. Lopes Coneglian

doi:10.35699/2317-2096.2025.58837

Authors

Tiago Timponi Torrent Universidade Federal de Juiz de Fora (UFJF) | Juiz de Fora | MG | BR https://orcid.org/0000-0001-5373-2297
André V. Lopes Coneglian Universidade Federal de Minas Gerais (UFMG) | Belo Horizonte | MG | BR https://orcid.org/0000-0003-1726-8890


Keywords:

Multimodality, computer vision, annotated datasets, Frame Semantics

Abstract

This article introduces the concept of multimodality, discussed from two main perspectives: the metatheoretical one, which understands multimodality as a field of inquiry into meaning-making through multiple semiotic forms; and the phenomenological one, which understands it as the integration of different expressive modalities (speech, gesture, image, among others) in communicative practices. Building on this conceptual basis, the text highlights the historical lack of attention to multimodality in Linguistics and Computer Science, reflected in theoretical and computational models that privilege isolated, conventionalized linguistic forms. In response to these challenges, it presents the projects developed within ReINVenTA, a research network dedicated to building and annotating multimodal datasets based on Frame Semantics, with the aim of integrating cognitive linguistics and computational models. The conclusion points to the need for interdisciplinary approaches that recognize language as a social, interactional, and intrinsically multimodal phenomenon.
