Omitted subjects revealed

a quantitative-descriptive approach

Authors

  • Cláudia Freitas Pontifícia Universidade Católica do Rio de Janeiro
  • Elvis de Souza Pontifícia Universidade Católica do Rio de Janeiro

DOI:

https://doi.org/10.17851/2237-2083.29.2.1033-1058

Keywords:

linguistic description, omitted subject, syntactic dependencies, computational linguistics, machine learning, corpus linguistics

Abstract

In this paper, we present descriptive and computational studies of omitted subjects. First, we carry out a quantitative-descriptive study based on three corpora covering the journalistic, literary and encyclopedic genres. Specifically, we quantify the sentences with omitted subjects in each corpus; omitted subjects were found in 24%, 41% and 46% of their sentences, respectively. Second, applying rule-based strategies, we reconstitute those subjects and reinsert them into the corpora in order to evaluate how much subject omission affects the automatic learning of syntactic dependencies. The results indicate that formal subject reconstitution can improve the learning of syntactic dependencies by up to 2% according to the CLAS metric, highlighting the relevance of linguistic modeling to the automatic learning process.
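
As an illustration of the kind of count reported above, the following is a minimal sketch and not the authors' procedure. It assumes the corpora are available in CoNLL-U format with Universal Dependencies labels, and it uses a deliberately naive heuristic: a sentence counts as having an omitted subject when its root is a verb with no subject dependent. The function names, the set of subject relations and the heuristic itself are assumptions made for this example; the paper's actual criteria are not stated in the abstract.

```python
# Hypothetical sketch: the relation set and the root-verb heuristic are
# assumptions for illustration, not the criteria used in the paper.
SUBJECT_RELATIONS = {"nsubj", "nsubj:pass", "csubj", "expl"}


def parse_conllu(text):
    """Split CoNLL-U text into sentences of (id, upos, head, deprel) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        current.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def has_omitted_subject(sentence):
    """True if the root token is a VERB with no subject dependent."""
    heads_with_subject = {
        head for _, _, head, deprel in sentence if deprel in SUBJECT_RELATIONS
    }
    return any(
        head == 0 and upos == "VERB" and tok_id not in heads_with_subject
        for tok_id, upos, head, _ in sentence
    )


def omitted_subject_rate(text):
    """Fraction of sentences flagged by the heuristic above."""
    sentences = parse_conllu(text)
    if not sentences:
        return 0.0
    return sum(has_omitted_subject(s) for s in sentences) / len(sentences)
```

Checking only the root predicate keeps the sketch short; it ignores subordinate and coordinated clauses, so it would undercount relative to any clause-level analysis.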

Published

2024-10-06

Issue

Section

Thematic issue 29:2 (2021): Corpus Linguistics: Achievements and Challenges

How to Cite

Omitted subjects revealed: a quantitative-descriptive approach. Revista de Estudos da Linguagem, [S. l.], v. 29, n. 2, p. 1033–1058, 2024. DOI: 10.17851/2237-2083.29.2.1033-1058. Available at: https://periodicos.ufmg.br/index.php/relin/article/view/54371. Accessed: 26 Dec. 2024.