Multimodal representations of content in the journalistic genre: gains and challenges of expanding ReINVenTA datasets

Frederico Belcavello; Marcelo Viridiano

doi:10.35699/2317-2096.2025.57569

Authors

Frederico Belcavello Universidade Federal de Juiz de Fora (UFJF) | Juiz de Fora | MG | BR / College of Arts and Sciences, Case Western Reserve University (CWRU) | Cleveland | OH | USA https://orcid.org/0000-0001-5808-5201
Marcelo Viridiano Case Western Reserve University https://orcid.org/0000-0002-9706-8663

Keywords:

Frame Semantics, multimodality, journalism, FrameNet, multimodal dataset

Abstract

This article examines the benefits and challenges of expanding the ReINVenTA dataset to include multimodal journalistic genres. It explores the specificities of, and relationships between, visual and textual elements in this new genre, with the aim of enhancing the semantic representation of multimodal data in the existing database. Two new corpora are proposed: one consisting of journalistic images and texts, and another focused on television news broadcasts, particularly news reports. The methodology involves automatic extraction and labeling of visual and textual data, complemented by human validation to ensure accuracy and mitigate biases, as well as integrated annotation of spoken audio and images from audiovisual journalistic content that takes the particular characteristics of the genre into account.

