Proposal for an algorithm for the automatic classification of semantic roles in Portuguese within the Abstract Meaning Representation model

Jackson Wilke da Cruz Souza; Pedro Semcovici; Thiago Alexandre Salgueiro Pardo

doi:10.1590/1983-3652.2025.55346

Authors

Jackson Wilke da Cruz Souza, Universidade Federal da Bahia, Instituto de Ciência, Tecnologia e Inovação, Camaçari, BA, Brazil https://orcid.org/0000-0003-1881-6780
Pedro Semcovici, Universidade de São Paulo, Escola de Artes, Ciências e Humanidades, São Paulo, SP, Brazil https://orcid.org/0009-0008-8455-8509
Thiago Alexandre Salgueiro Pardo, Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, São Carlos, SP, Brazil https://orcid.org/0000-0003-2111-1319

Keywords:

Semantic roles, Abstract Meaning Representation, Natural Language Processing

Abstract

The semantic level in Natural Language Processing (NLP) poses significant challenges because of the complexity of its phenomena, which are less amenable to objective description. Not all linguistic approaches, such as the theoretical model of semantic roles proposed by Cançado and Amaral (2017), are easy to implement in computational systems, owing to their terminological and methodological variability. The Abstract Meaning Representation (AMR) model (Banarescu et al., 2013; Weischedel et al., 2013) has stood out for offering a clear representation of argument structure, providing explainability for both humans and computational systems about how meaning is organized in natural-language sentences. Building on AMR, we developed an automatic semantic role classifier. Using Machine Learning techniques, our classifier was trained and tested on a multi-genre corpus of Brazilian Portuguese. We ran two experiments: the first comparing Arguments 0 and 1, and the second comparing Arguments 0 through 4; the first experiment yielded the better results. The results underscore the importance of applying semantic models in NLP for Portuguese and open possibilities for new research initiatives.

References

ALVA-MANCHEGO, Fernando Emilio; ROSA, João Luís G. Semantic Role Labeling for Brazilian Portuguese: A Benchmark. In: PAVÓN, Juan; DUQUE-MÉNDEZ, Néstor D.; FUENTES-FERNÁNDEZ, Rubén (ed.). Advances in Artificial Intelligence–IBERAMIA 2012: 13th Ibero-American Conference on AI. Cartagena de Indias, Colombia: Springer Berlin Heidelberg, 2012. p. 481–490. DOI: 10.1007/978-3-642-34654-5_49.

BANARESCU, Laura et al. Abstract Meaning Representation for Sembanking. In: PROCEEDINGS of the 7th Linguistic Annotation Workshop and Interoperability with Discourse. Sofia, Bulgaria: Association for Computational Linguistics, 2013. p. 178–186.

CAI, Shu; KNIGHT, Kevin. Smatch: An Evaluation Metric for Semantic Feature Structures. In: SCHUETZE, Hinrich; FUNG, Pascale; POESIO, Massimo (ed.). Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Sofia, Bulgaria: Association for Computational Linguistics, 2013. p. 748–752.

CAMACHO, R.G. Estrutura Argumental e Funções Semânticas. Alfa, São Paulo, v. 43, p. 145–170, 1999.

CANÇADO, Márcia. Argumentos: Complementos e Adjuntos. ALFA: Revista de Linguística, v. 53, n. 1, p. 35–59, 2009.

CANÇADO, Márcia. Verbos Psicológicos: Uma Classe Relevante Gramaticalmente? Veredas-Revista de Estudos Linguísticos, v. 16, n. 2, p. 1–18, 2012.

CANÇADO, Márcia; AMARAL, Luana. Introdução à Semântica Lexical: Papéis Temáticos, Aspecto Lexical e Decomposição de Predicados. [S. l.]: Editora Vozes Limitada, 2017.

CANÇADO, Márcia; GONÇALVES, Anabela. Lexical Semantics: Verb Classes and Alternations. In: WETZELS, Leo; COSTA, João; MENUZZI, Sergio (ed.). The Handbook of Portuguese Linguistics. [S. l.: s. n.], 2016. p. 374–391. DOI: 10.1002/9781118791844.ch20.

CASELI, Helena de Medeiros; NUNES, Maria das Graças Volpe; PAGANO, Adriana. O que é PLN? In: CASELI, Helena de Medeiros; NUNES, Maria das Graças Volpe (ed.). Processamento de Linguagem Natural: Conceitos, Técnicas e Aplicações em Português. [S. l.]: BPLN, 2023. ISBN 978-65-00-80693-9. Available at: https://brasileiraspln.com/livro-pln/1a-edicao/parte1/cap1/cap1.html.

CHAFE, W.L. Meaning and the Structure of Language. Chicago, USA: University of Chicago Press, 1970.

CHEN, T.; GUESTRIN, C. XGBoost: A Scalable Tree Boosting System. In: KRISHNAPURAM, Balaji; SHAH, Mohak; SMOLA, Alexander J.; AGGARWAL, Charu; SHEN, Dou; RASTOGI, Rajeev (ed.). Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, USA: ACM, 2016. p. 785–794. DOI: 10.1145/2939672. Available at: http://doi.acm.org/10.1145/2939672.

DAMONTE, Marco; COHEN, Shay B. Structural Neural Encoders for AMR-to-Text Generation. In: BURSTEIN, Jill; DORAN, Christy; SOLORIO, Thamar (ed.). Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis, Minnesota: Association for Computational Linguistics (ACL), 2019.

DURAN, Magali Sanches; ALUÍSIO, Sandra Maria. PropBank-Br: A Brazilian Treebank Annotated with Semantic Role Labels. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; DECLERCK, Thierry; DOĞAN, Mehmet Uğur; MAEGAARD, Bente; MARIANI, Joseph; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios (ed.). Proceedings of the Eighth International Conference on Language Resources and Evaluation. Istanbul, Turkey: European Language Resources Association, 2012. p. 1862–1867.

FILLMORE, C.J. Lexical Entries for Verbs. Foundations of Language, v. 4, p. 373–393, 1968.

FONSECA, Erick R.; ROSA, João Luís Garcia. A Two-Step Convolutional Neural Network Approach for Semantic Role Labeling. In: THE 2013 International Joint Conference on Neural Networks. Dallas, USA: IEEE, 2013. p. 1–7.

FREITAS, Cláudia; SALGUEIRO PARDO, Thiago Alexandre. PropBank e Anotação de Papéis Semânticos para a Língua Portuguesa: O que Há de Novo? In: ANAIS do 15º Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (STIL). Belém: Sociedade Brasileira de Computação, 2024. p. 118–128. DOI: 10.5753/stil.2024.245377. Available at: https://sol.sbc.org.br/index.php/stil/article/view/31123.

GERALDI, J.W.; ILARI, R. Semântica. São Paulo: Ática, 1987. v. 3.

GILDEA, Daniel; JURAFSKY, Daniel. Automatic Labeling of Semantic Roles. Computational Linguistics, v. 28, n. 3, p. 245–288, 2002.

HALLIDAY, M. Some Notes on 'Deep' Grammar. Journal of Linguistics, v. 2, n. 1, p. 57–67, 1966.

HARTMANN, Nathan Siegle. Anotação Automática de Papéis Semânticos de Textos Jornalísticos e de Opinião sobre Árvores Sintáticas Não Revisadas. 2015. Master's dissertation (Computer Science and Computational Mathematics) – Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, Brazil. DOI: 10.11606/D.55.2015.tde-27112015-140053.

HARTMANN, Nathan Siegle; DURAN, Magali Sanches; ALUÍSIO, Sandra Maria. Automatic Semantic Role Labeling on Non-Revised Syntactic Trees of Journalistic Texts. In: SILVA, João; RIBEIRO, Ricardo; QUARESMA, Paulo; ADAMI, André; BRANCO, António (ed.). Computational Processing of the Portuguese Language. Cham: Springer International Publishing, 2017. p. 202–212. ISBN 978-3-319-41552-9. DOI: 10.1007/978-3-319-41552-9_20.

ILMY, Adylan Roaffa; KHODRA, Masayu Leylia. Parsing Indonesian Sentence into Abstract Meaning Representation using Machine Learning Approach. In: 2020 7th International Conference on Advance Informatics: Concepts, Theory and Applications (ICAICTA). [S. l.: s. n.], 2020. p. 1–6. DOI: 10.1109/icaicta49861.2020.9429051.

JACKENDOFF, Ray. Toward an Explanatory Semantic Representation. Linguistic Inquiry, The MIT Press, Cambridge, USA, v. 7, n. 1, p. 89–150, 1976. Available at: http://www.jstor.org/stable/4177913.

JURAFSKY, D.; MARTIN, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3. ed. [S. l.: s. n.], 2023.

LEMAÎTRE, Guillaume; NOGUEIRA, Fernando; ARIDAS, Christos K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning Research, v. 18, p. 1–5, 2017.

LIMA INÁCIO, Marcio; SOBREVILLA CABEZUDO, Marco Antonio; RAMISCH, Renata; DI FELIPPO, Ariani; SALGUEIRO PARDO, Thiago Alexandre. The AMR-PT Corpus and the Semantic Annotation of Challenging Sentences from Journalistic and Opinion Texts. DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada, v. 39, e202339355159, 2023. DOI: 10.1590/1678-460x202339355159. Available at: https://revistas.pucsp.br/index.php/delta/article/view/55159.

LUNDBERG, Scott M.; LEE, Su-In. A Unified Approach to Interpreting Model Predictions. In: PROCEEDINGS of the 31st Conference on Neural Information Processing Systems. Long Beach, California, USA: Curran Associates, 2017. v. 30, p. 4768–4777.

MIGUELES-ABRAIRA, Noelia; AGERRI, Rodrigo; DIAZ DE ILARRAZA, Arantza. Annotating Abstract Meaning Representations for Spanish. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; CIERI, Christopher; DECLERCK, Thierry; GOGGI, Sara; HASIDA, Koiti; ISAHARA, Hitoshi; MAEGAARD, Bente; MARIANI, Joseph; MAZO, Hélène; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios; TOKUNAGA, Takenobu (ed.). Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki, Japan: European Language Resources Association, 2018. p. 3074–3078.

PALMER, Martha; GILDEA, Daniel; KINGSBURY, Paul. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, v. 31, n. 1, p. 71–106, 2005.

RODRIGUES, Roana; SOUZA, Jackson Wilke da Cruz; SANTOS, Roney Lira de Sales. Descrição Linguística e Aprendizado de Máquina: Análise de Verbos Locativos do Espanhol. Cadernos de Estudos Linguísticos, v. 64, n. 00, e022038, 2022. DOI: 10.20396/cel.v64i00.8666995.

SPACY. Industrial-Strength Natural Language Processing. [S. l.: s. n.], 2024. https://spacy.io. Accessed: 20 Jul. 2024.

TORRES ANCHIÊTA, Rafael; SALGUEIRO PARDO, Thiago Alexandre. Towards AMR-BR: A Sembank for Brazilian Portuguese Language. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; CIERI, Christopher; DECLERCK, Thierry; GOGGI, Sara; HASIDA, Koiti; ISAHARA, Hitoshi; MAEGAARD, Bente; MARIANI, Joseph; MAZO, Hélène; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios; TOKUNAGA, Takenobu (ed.). Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki, Japan: European Language Resources Association (ELRA), 2018. p. 974–979.

TORRES ANCHIÊTA, Rafael; SALGUEIRO PARDO, Thiago Alexandre. Análise Semântica com Base em AMR para o Português. Linguamática, v. 14, n. 1, p. 33–48, 2022. DOI: 10.21814/lm.14.1.358. Available at: https://linguamatica.com/index.php/linguamatica/article/view/358.

VANDERWENDE, Lucy; MENEZES, Arul; QUIRK, Chris. An AMR Parser for English, French, German, Spanish and Japanese and a New AMR-annotated Corpus. In: GERBER, Matt; HAVASI, Catherine; LACATUSU, Finley (ed.). Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Denver, Colorado, USA: Association for Computational Linguistics, 2015. p. 26–30. DOI: 10.3115/v1/N15-3006.

WEISCHEDEL, Ralph et al. OntoNotes Release 5.0 LDC2013T19. Philadelphia, USA: Linguistic Data Consortium, 2013.

XUE, Nianwen; BOJAR, Ondřej; HAJIČ, Jan; PALMER, Martha; UREŠOVÁ, Zdeňka; ZHANG, Xiuhong. Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech. In: CALZOLARI, Nicoletta; CHOUKRI, Khalid; DECLERCK, Thierry; LOFTSSON, Hrafn; MAEGAARD, Bente; MARIANI, Joseph; MORENO, Asuncion; ODIJK, Jan; PIPERIDIS, Stelios (ed.). Proceedings of the Ninth International Conference on Language Resources and Evaluation. Reykjavik, Iceland: European Language Resources Association, 2014. p. 1765–1772.

ZHANG, Sheng; MA, Xutai; DUH, Kevin; VAN DURME, Benjamin. AMR Parsing as Sequence-to-Graph Transduction. In: KORHONEN, Anna; TRAUM, David; MÀRQUEZ, Lluís (ed.). Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019. p. 80–94. DOI: 10.18653/v1/P19-1009. Available at: https://aclanthology.org/P19-1009/.