Modelagem de tópicos: Resumir e organizar corpus de dados por meio de algoritmos de aprendizagem de máquina

Marcos de Souza; Renato Rocha Souza

Topic modeling

Summarize and organize data corpus using machine learning algorithms

Authors

Marcos de Souza Universidade Federal de Minas Gerais
Renato Rocha Souza Universidade Federal de Minas Gerais

Keywords:

Topic modeling, Machine learning, Latent Dirichlet allocation, Latent semantic indexing

Abstract

This study compares the results and performance of two machine learning models, Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA), when applied to topic modeling of documents from formal channels of scientific communication. The corpus consists of 2006 scientific articles and expanded abstracts from the XIII to the XVII National Meeting of Research in Information Science (ENANCIB). The empirical steps are the collection of data to build the corpus, followed by the cleaning, manipulation, combination, normalization, treatment, and transformation of the corpus data so that it can be fed to the machine learning models. The models summarized and organized the corpus into topics, each made up of terms and weights. The LSI model showed greater variety among the terms and weights in each topic, unlike the LDA model, which showed greater similarity in its results, thus making it easier for the domain specialist to propose names for the topics.
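
The cleaning, normalization, and transformation steps named in the abstract can be illustrated with a minimal sketch using only the Python standard library. It is not the authors' pipeline: the stop-word list, the toy documents, and the helper names (`normalize`, `tokenize`, `build_vocabulary`, `to_bag_of_words`) are assumptions made for illustration. The sketch only shows how raw text might become the term-count vectors that LSI and LDA implementations typically take as input.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical stop-word list; the abstract does not give the one used in the study.
STOP_WORDS = {"a", "the", "of", "and", "in", "to", "is", "are"}


def normalize(text):
    """Lowercase, strip accents, and return the alphabetic tokens."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.findall(r"[a-z]+", text)


def tokenize(text):
    """Normalize a document, then drop stop words and one-letter tokens."""
    return [t for t in normalize(text) if t not in STOP_WORDS and len(t) > 1]


def build_vocabulary(token_lists):
    """Map each distinct term to an integer id, assigned in sorted order."""
    terms = sorted({t for tokens in token_lists for t in tokens})
    return {term: i for i, term in enumerate(terms)}


def to_bag_of_words(tokens, vocabulary):
    """Represent one document as sorted (term_id, count) pairs."""
    counts = Counter(vocabulary[t] for t in tokens if t in vocabulary)
    return sorted(counts.items())


# Toy documents standing in for the ENANCIB corpus (illustrative only).
documents = [
    "Information retrieval and the organização of knowledge.",
    "Knowledge organization in digital libraries.",
]
token_lists = [tokenize(d) for d in documents]
vocabulary = build_vocabulary(token_lists)
corpus = [to_bag_of_words(tokens, vocabulary) for tokens in token_lists]
```

Each entry of `corpus` is a sparse term-count vector for one document. A topic model fitted on such vectors would then describe each topic as a list of terms with weights, which is the kind of output the abstract compares between LSI and LDA.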

Published

2020-01-31

How to Cite

SOUZA, M. de; SOUZA, R. R. Topic modeling: Summarize and organize data corpus using machine learning algorithms. Múltiplos Olhares em Ciência da Informação, Belo Horizonte, v. 9, n. 2, 2020. Available at: https://periodicos.ufmg.br/index.php/moci/article/view/19138. Accessed: 21 Nov. 2024.

Issue

Vol. 9 No. 2 (2019): PPGGOG - Discentes

Section

Articles

License

Authors who publish in Revista Múltiplos Olhares em Ciência da Informação retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution License, which allows the work to be shared with acknowledgment of authorship and of its initial publication in this journal. Authors may separately enter into additional contracts for non-exclusive distribution of the version of the work published in this journal (for example, posting it in an institutional repository or publishing it as a book chapter), with acknowledgment of authorship and of its initial publication in this journal.
