Multimodal representations of journalistic content: gains and challenges of expanding the ReINVenTA datasets

Frederico Belcavello; Marcelo Viridiano

doi:10.35699/2317-2096.2025.57569

Authors

Frederico Belcavello, Universidade Federal de Juiz de Fora (UFJF) | Juiz de Fora | MG | BR / College of Arts and Science, Case Western Reserve University (CWRU) | Cleveland | OH | USA https://orcid.org/0000-0001-5808-5201
Marcelo Viridiano, Case Western Reserve University https://orcid.org/0000-0002-9706-8663

Keywords:

Frame Semantics, multimodality, journalism, FrameNet

Abstract

This article discusses the gains and challenges of expanding the ReINVenTA dataset to include multimodal journalistic genres. It explores the specificities of, and the relations between, visual and textual elements in this new genre, and seeks to improve the semantics of the multimodal representations in the current database. Two new corpora are proposed: one of journalistic images and texts, and another of television newscasts, focused on televised news reports. The methodology involves automatic extraction and labeling of visual and textual data, with human validation to ensure accuracy and mitigate bias, and integrated annotation of spoken audio and images from journalistic audiovisual content according to the particular characteristics of the genre.

