Uso de sintagmas nominais na classificação automática de documentos eletrônicos

Luiz Cláudio Gomes Maia; Renato Rocha Souza

USE OF NOUN PHRASES IN AUTOMATIC CLASSIFICATION OF ELECTRONIC DOCUMENTS

Authors

Luiz Cláudio Gomes Maia
Renato Rocha Souza

Abstract

This study proposes an approach to classifying electronic documents using techniques and algorithms from natural language processing, indexing noun phrases alongside plain keywords. Two tools, OGMA and Weka, were used in the experiments. OGMA was developed by the author to automate the extraction of noun phrases and to calculate the weight of each term during document indexing for each of the six proposed methods. Weka was used to analyze the OGMA output with the clustering algorithm "SimpleKMeans" and the classification algorithm "NaiveBayes". The process yielded a percentage indicating how many documents were classified correctly. The best-performing methods were those using terms with stopwords removed and those using classified and scored noun phrases.
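The evaluation step described in the abstract (index terms without stopwords, classify, report the percentage of correctly classified documents) can be sketched roughly as follows. This is a minimal illustration, not OGMA or Weka: it uses a small hand-written multinomial Naive Bayes with add-one smoothing, it does not extract noun phrases, and the stopword list, the toy corpus, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "a", "and", "in", "to", "is", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def train(docs):
    """Collect the counts a multinomial Naive Bayes needs.

    docs is a list of (text, label) pairs.
    """
    class_docs = Counter()             # number of documents per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for term in tokenize(text):
            term_counts[label][term] += 1
            vocab.add(term)
    return class_docs, term_counts, vocab


def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)  # class prior
        total_terms = sum(term_counts[label].values())
        for term in tokenize(text):
            if term not in vocab:
                continue  # ignore terms never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = term_counts[label][term]
            score += math.log((count + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def percent_correct(test_docs, model):
    """Percentage of test documents whose predicted class matches the label."""
    hits = sum(1 for text, label in test_docs if classify(text, model) == label)
    return 100.0 * hits / len(test_docs)


# Toy corpus (hypothetical data, not from the study).
TRAIN = [
    ("indexing of documents in the digital library", "library"),
    ("library catalog and document indexing", "library"),
    ("the football match ended in a draw", "sport"),
    ("a football team and the league match", "sport"),
]
TEST = [
    ("document indexing for the library", "library"),
    ("the league football match", "sport"),
]

model = train(TRAIN)
accuracy = percent_correct(TEST, model)
print(f"{accuracy:.1f}% of test documents classified correctly")
```

To compare indexing methods in this style, one would swap `tokenize` for another term extractor (for example, one that emits noun phrases) and compare the resulting percentages on the same test set.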

Published

2010-03-24

How to Cite

Maia, L. C. G., & Souza, R. R. (2010). Use of noun phrases in automatic classification of electronic documents. Perspectivas em Ciência da Informação, 15(1), 154–172. Retrieved from https://periodicos.ufmg.br/index.php/pci/article/view/22418

Issue

Vol. 15 No. 1 (2010)

Section

Articles
