Omitted subjects revealed
a quantitative-descriptive approach
DOI:
https://doi.org/10.17851/2237-2083.29.2.1033-1058
Keywords:
linguistic description, omitted subject, syntactic dependencies, computational linguistics, machine learning, corpus linguistics
Abstract
In this paper, we present a descriptive study and a computational study of omitted subjects. First, we carry out a quantitative descriptive study based on three corpora representing the journalistic, literary and encyclopedic genres. Specifically, we quantify the sentences with omitted subjects in each corpus: omitted subjects were found in 24%, 41% and 46% of the sentences, respectively. Second, we apply rule-based strategies to reconstitute those subjects and reinsert them into the corpora, in order to evaluate how much subject omission affects the automatic learning of syntactic dependencies. The results indicate that formal subject reconstitution can improve the learning of syntactic dependencies by up to 2% according to the CLAS metric, which highlights the role of linguistic modeling in the automatic learning process.
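The quantification step described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that the corpora are annotated in the CoNLL-U format with Universal Dependencies labels, and that a sentence counts as having an omitted subject when at least one finite verb has no `nsubj` or `csubj` dependent. The function names (`parse_conllu`, `has_omitted_subject`, `omitted_subject_rate`) and the two-sentence sample are hypothetical, and the heuristic is not the authors' rule set.

```python
# Assumption: CoNLL-U input with Universal Dependencies labels.
SUBJECT_RELATIONS = {"nsubj", "csubj"}


def parse_conllu(text):
    """Yield each sentence as a list of token dicts."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        sentence.append({
            "id": int(cols[0]),
            "upos": cols[3],
            "feats": cols[5],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


def has_omitted_subject(sentence):
    """True if some finite verb has no subject dependent (heuristic)."""
    for token in sentence:
        if token["upos"] != "VERB" or "VerbForm=Fin" not in token["feats"]:
            continue
        has_subject = any(
            other["head"] == token["id"]
            and other["deprel"].split(":")[0] in SUBJECT_RELATIONS
            for other in sentence
        )
        if not has_subject:
            return True
    return False


def omitted_subject_rate(text):
    """Fraction of sentences with at least one subjectless finite verb."""
    sentences = list(parse_conllu(text))
    if not sentences:
        return 0.0
    return sum(has_omitted_subject(s) for s in sentences) / len(sentences)


def _rows(*rows):
    # Build tab-separated CoNLL-U lines from space-separated fields.
    return "\n".join("\t".join(row.split()) for row in rows)


# Hypothetical sample: "Maria chegou." (overt subject), "Chegamos cedo." (omitted)
SAMPLE = "\n\n".join([
    _rows(
        "1 Maria Maria PROPN _ _ 2 nsubj _ _",
        "2 chegou chegar VERB _ Mood=Ind|VerbForm=Fin 0 root _ _",
        "3 . . PUNCT _ _ 2 punct _ _",
    ),
    _rows(
        "1 Chegamos chegar VERB _ Number=Plur|Person=1|VerbForm=Fin 0 root _ _",
        "2 cedo cedo ADV _ _ 1 advmod _ _",
        "3 . . PUNCT _ _ 1 punct _ _",
    ),
])

print(omitted_subject_rate(SAMPLE))  # 0.5
```

This heuristic overcounts in some constructions: for example, a second coordinated verb that shares its subject with the first has no subject dependent of its own, and copular clauses are ignored because the finite form is tagged `AUX`. A real study would need rules for such cases.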