Proposal of an algorithm for automatic classification of semantic roles in Portuguese within the Abstract Meaning Representation model

Authors

Jackson Wilke da Cruz Souza; Pedro Semcovici; Thiago Alexandre Salgueiro Pardo

DOI:

https://doi.org/10.1590/1983-3652.2025.55346

Keywords:

Semantic Roles, Abstract Meaning Representation, Natural Language Processing

Abstract

The semantic level of Natural Language Processing (NLP) presents significant challenges because semantic phenomena are complex and less amenable to objective description. Not all linguistic approaches, such as the semantic role theory proposed by Cançado and Amaral (2017), can be easily implemented in computational systems, owing to their terminological and methodological variability. The Abstract Meaning Representation (AMR) model (Banarescu et al., 2013; Weischedel et al., 2013) has gained prominence for providing a clear representation of argument structure, making explicit for both humans and computational systems how meaning is organized in natural-language sentences. Based on AMR, we developed an automatic semantic role classifier. Using Machine Learning techniques, we trained and tested the classifier on a multigenre corpus of Brazilian Portuguese. We conducted two experiments: the first distinguished Arguments 0 and 1, and the second distinguished Arguments 0 through 4. The first experiment achieved the better results. These results highlight the importance of applying semantic models in NLP for Portuguese and open possibilities for new research initiatives.

References

ALVA-MANCHEGO, Fernando Emilio; ROSA, João Luís G. Semantic Role Labeling for Brazilian Portuguese: A Benchmark. In: PAVÓN, Juan; DUQUE-MÉNDEZ, Néstor D.; FUENTES-FERNÁNDEZ, Rubén (ed.). Advances in Artificial Intelligence–IBERAMIA 2012: 13th Ibero-American Conference on AI. Cartagena de Indias, Colombia: Springer Berlin Heidelberg, 2012. p. 481–490. DOI: 10.1007/978-3-642-34654-5_49.

BANARESCU, Laura et al. Abstract Meaning Representation for Sembanking. In: PROCEEDINGS of the 7th Linguistic Annotation Workshop and Interoperability with Discourse. Sofia, Bulgaria: Association for Computational Linguistics, 2013. p. 178–186.

CAI, Shu; KNIGHT, Kevin. Smatch: An Evaluation Metric for Semantic Feature Structures. In: SCHUETZE, Hinrich; FUNG, Pascale; POESIO, Massimo (ed.). Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Sofia, Bulgaria: Association for Computational Linguistics, 2013. p. 748–752.

CAMACHO, R.G. Estrutura Argumental e Funções Semânticas. Alfa, São Paulo, v. 43, p. 145–170, 1999.

CANÇADO, Márcia. Argumentos: Complementos e Adjuntos. ALFA: Revista de Linguística, v. 53, n. 1, p. 35–59, 2009.

CANÇADO, Márcia. Verbos Psicológicos: Uma Classe Relevante Gramaticalmente? Veredas-Revista de Estudos Linguísticos, v. 16, n. 2, p. 1–18, 2012.

CANÇADO, Márcia; AMARAL, Luana. Introdução à Semântica Lexical: Papéis Temáticos, Aspecto Lexical e Decomposição de Predicados. [S. l.]: Editora Vozes Limitada, 2017.

CANÇADO, Márcia; GONÇALVES, Anabela. Lexical Semantics: Verb Classes and Alternations. In: WETZELS, Leo; COSTA, João; MENUZZI, Sergio (ed.). The Handbook of Portuguese Linguistics. [S. l.: s. n.], 2016. p. 374–391. DOI: 10.1002/9781118791844.ch20.

CASELI, Helena de Medeiros; NUNES, Maria das Graças Volpe; PAGANO, Adriana. O que é PLN? In: CASELI, Helena de Medeiros; NUNES, Maria das Graças Volpe (ed.). Processamento de Linguagem Natural: Conceitos, Técnicas e Aplicações em Português. [S. l.]: BPLN, 2023. ISBN 978-65-00-80693-9. Available at: https://brasileiraspln.com/livro-pln/1a-edicao/parte1/cap1/cap1.html.

CHAFE, W.L. Meaning and the Structure of Language. Chicago, USA: University of Chicago Press, 1970.

CHEN, T.; GUESTRIN, C. XGBoost: A Scalable Tree Boosting System. In: KRISHNAPURAM, Balaji; SHAH, Mohak; SMOLA, Alexander J.; AGGARWAL, Charu; SHEN, Dou; RASTOGI, Rajeev (ed.). Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016. p. 785–794. DOI: 10.1145/2939672. Available at: http://doi.acm.org/10.1145/2939672.

DAMONTE, Marco; COHEN, Shay B. Structural Neural Encoders for AMR-to-Text Generation. In: BURSTEIN, Jill; DORAN, Christy; SOLORIO, Thamar (ed.). Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, Minnesota: Association for Computational Linguistics (ACL), 2019.

DURAN, Magali Sanches; ALUÍSIO, Sandra Maria. PropBank-Br: A Brazilian Treebank Annotated with Semantic Role Labels. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; DECLERCK, Thierry; DOĞAN, Mehmet Uğur; MAEGAARD, Bente; MARIANI, Joseph; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios (ed.). Proceedings of the Eighth International Conference on Language Resources and Evaluation. Istanbul, Turkey: European Language Resources Association, 2012. p. 1862–1867.

FILLMORE, C.J. Lexical Entries for Verbs. Foundations of Language, v. 4, p. 373–393, 1968.

FONSECA, Erick R.; ROSA, João Luís Garcia. A Two-Step Convolutional Neural Network Approach for Semantic Role Labeling. In: THE 2013 International Joint Conference on Neural Networks. Dallas, USA: IEEE, 2013. p. 1–7.

FREITAS, Cláudia; SALGUEIRO PARDO, Thiago Alexandre. PropBank e Anotação de Papéis Semânticos para a Língua Portuguesa: O que Há de Novo? In: ANAIS do 15º Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (STIL). Belém: Sociedade Brasileira de Computação, 2024. p. 118–128. DOI: 10.5753/stil.2024.245377. Available at: https://sol.sbc.org.br/index.php/stil/article/view/31123.

GERALDI, J.W.; ILARI, R. Semântica. São Paulo: Ática, 1987. v. 3.

GILDEA, Daniel; JURAFSKY, Daniel. Automatic Labeling of Semantic Roles. Computational Linguistics, v. 28, n. 3, p. 245–288, 2002.

HALLIDAY, M. Some Notes on ‘Deep’ Grammar. Journal of Linguistics, v. 2, n. 1, p. 57–67, 1966.

HARTMANN, Nathan Siegle. Anotação Automática de Papéis Semânticos de Textos Jornalísticos e de Opinião sobre Árvores Sintáticas Não Revisadas. 2015. Dissertation (Master's in Computer Science and Computational Mathematics) – Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, Brazil. DOI: 10.11606/D.55.2015.tde-27112015-140053.

HARTMANN, Nathan Siegle; DURAN, Magali Sanches; ALUÍSIO, Sandra Maria. Automatic Semantic Role Labeling on Non-Revised Syntactic Trees of Journalistic Texts. In: SILVA, João; RIBEIRO, Ricardo; QUARESMA, Paulo; ADAMI, André; BRANCO, António (ed.). Computational Processing of the Portuguese Language. Cham: Springer International Publishing, 2017. p. 202–212. ISBN 978-3-319-41552-9. DOI: 10.1007/978-3-319-41552-9_20.

ILMY, Adylan Roaffa; KHODRA, Masayu Leylia. Parsing Indonesian Sentence into Abstract Meaning Representation using Machine Learning Approach. In: 2020 7th International Conference on Advance Informatics: Concepts, Theory and Applications (ICAICTA). [S. l.: s. n.], 2020. p. 1–6. DOI: 10.1109/icaicta49861.2020.9429051.

JACKENDOFF, Ray. Toward an Explanatory Semantic Representation. Linguistic Inquiry, The MIT Press, Cambridge, USA, v. 7, n. 1, p. 89–150, 1976. Available at: http://www.jstor.org/stable/4177913.

JURAFSKY, D.; MARTIN, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3. ed. [S. l.: s. n.], 2023.

LEMAÎTRE, Guillaume; NOGUEIRA, Fernando; ARIDAS, Christos K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning Research, v. 18, p. 1–5, 2017.

LIMA INÁCIO, Marcio; SOBREVILLA CABEZUDO, Marco Antonio; RAMISCH, Renata; DI FELIPPO, Ariani; SALGUEIRO PARDO, Thiago Alexandre. The AMR-PT Corpus and the Semantic Annotation of Challenging Sentences from Journalistic and Opinion Texts. DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada, v. 39, e202339355159, 2023. DOI: 10.1590/1678-460x202339355159. Available at: https://revistas.pucsp.br/index.php/delta/article/view/55159.

LUNDBERG, Scott M.; LEE, Su-In. A Unified Approach to Interpreting Model Predictions. In: PROCEEDINGS of the 31st Conference on Neural Information Processing Systems. Long Beach, California, USA: Curran Associates, 2017. v. 30, p. 4768–4777.

MIGUELES-ABRAIRA, Noelia; AGERRI, Rodrigo; DIAZ DE ILARRAZA, Arantza. Annotating Abstract Meaning Representations for Spanish. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; CIERI, Christopher; DECLERCK, Thierry; GOGGI, Sara; HASIDA, Koiti; ISAHARA, Hitoshi; MAEGAARD, Bente; MARIANI, Joseph; MAZO, Hélène; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios; TOKUNAGA, Takenobu (ed.). Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki, Japan: European Language Resources Association, 2018. p. 3074–3078.

PALMER, Martha; GILDEA, Daniel; KINGSBURY, Paul. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, v. 31, n. 1, p. 71–106, 2005.

RODRIGUES, Roana; SOUZA, Jackson Wilke da Cruz; SANTOS, Roney Lira de Sales. Descrição Linguística e Aprendizado de Máquina: Análise de Verbos Locativos do Espanhol. Cadernos de Estudos Linguísticos, v. 64, n. 00, e022038, 2022. DOI: 10.20396/cel.v64i00.8666995.

SPACY. Industrial-Strength Natural Language Processing. [S. l.: s. n.], 2024. Available at: https://spacy.io. Accessed: 20 Jul. 2024.

TORRES ANCHIÊTA, Rafael; SALGUEIRO PARDO, Thiago Alexandre. Towards AMR-BR: A Sembank for Brazilian Portuguese Language. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; CIERI, Christopher; DECLERCK, Thierry; GOGGI, Sara; HASIDA, Koiti; ISAHARA, Hitoshi; MAEGAARD, Bente; MARIANI, Joseph; MAZO, Hélène; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios; TOKUNAGA, Takenobu (ed.). Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki, Japan: European Language Resources Association (ELRA), 2018. p. 974–979.

TORRES ANCHIÊTA, Rafael; SALGUEIRO PARDO, Thiago Alexandre. Análise Semântica com Base em AMR para o Português. Linguamática, v. 14, n. 1, p. 33–48, 2022. DOI: 10.21814/lm.14.1.358. Available at: https://linguamatica.com/index.php/linguamatica/article/view/358.

VANDERWENDE, Lucy; MENEZES, Arul; QUIRK, Chris. An AMR Parser for English, French, German, Spanish and Japanese and a New AMR-annotated Corpus. In: GERBER, Matt; HAVASI, Catherine; LACATUSU, Finley (ed.). Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Denver, Colorado, USA: Association for Computational Linguistics, 2015. p. 26–30. DOI: 10.3115/v1/N15-3006.

WEISCHEDEL, Ralph et al. OntoNotes Release 5.0 LDC2013T19. Philadelphia, USA: Linguistic Data Consortium, 2013.

XUE, Nianwen; BOJAR, Ondřej; HAJIČ, Jan; PALMER, Martha; UREŠOVÁ, Zdeňka; ZHANG, Xiuhong. Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; DECLERCK, Thierry; LOFTSSON, Hrafn; MAEGAARD, Bente; MARIANI, Joseph; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios (ed.). Proceedings of the Ninth International Conference on Language Resources and Evaluation. Reykjavik, Iceland: European Language Resources Association, 2014. p. 1765–1772.

ZHANG, Sheng; MA, Xutai; DUH, Kevin; VAN DURME, Benjamin. AMR Parsing as Sequence-to-Graph Transduction. In: KORHONEN, Anna; TRAUM, David; MÀRQUEZ, Lluís (ed.). Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019. p. 80–94. DOI: 10.18653/v1/P19-1009. Available at: https://aclanthology.org/P19-1009/.

Published

2025-02-23

How to Cite

SOUZA, Jackson Wilke da Cruz; SEMCOVICI, Pedro; PARDO, Thiago Alexandre Salgueiro. Proposal of an algorithm for automatic classification of semantic roles in Portuguese within the Abstract Meaning Representation model. Texto Livre, Belo Horizonte-MG, v. 18, p. e55346, 2025. DOI: 10.1590/1983-3652.2025.55346. Available at: https://periodicos.ufmg.br/index.php/textolivre/article/view/55346. Accessed: 8 Dec. 2025.