Computational tools development for the dialectal and lexicographical data processing

Authors

DOI:

https://doi.org/10.1590/1983-3652.2023.42302

Keywords:

Dialectology, Lexicography, Computational tools, Programming languages, Database

Abstract

This paper is situated at the intersection of Corpus Linguistics (O’KEEFFE; MCCARTHY, 2010), Computational Linguistics (KEDIA; RASU, 2020; SRINIVASA-DESIKAN, 2018; MANNING, 2008; MANNING; SCHUTZE, 1999; CHOMSKY, 1965), Dialectology (CARDOSO, 2010; RADTKE; THUN, 1996; CHAMBERS; TRUDGILL, 1994), and Lexicography (TARP, 2008, 2011, 2015; FUERTES-OLIVEIRA; BERGENHOLTZ, 2015; LEROYER, 2011). It presents the development of computational tools for processing dialectal and lexicographic data, using a methodology that does not require hiring programming services and instead invites researchers to learn the computing resources needed to manipulate the information in a database automatically. The corpus used was that of the Atlas Linguístico do Brasil (ALiB) Project (COMITÊ NACIONAL DO PROJETO ALIB, 2001), specifically the data for the interior municipalities of the ALiB network located in the country’s North region. Two reasons mainly motivated the construction of these small programs: i) to give the ALiB dialectal data a lexicographical and electronic treatment; ii) to develop in-house computational tools that meet the goals of the ongoing doctoral research to which this article is linked. To this end, a database in Extensible Markup Language (XML) was built to store the dialectal information in lexicographical format. By executing lines of code, it was possible to retrieve specific data from the corpus electronically and to filter the results by the 'gender', 'age', and 'location' variables present in the ALiB data.
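The retrieval and filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the database schema or the programming language used, so everything below is an assumption made for illustration: the element names (vocabulary, entry, occurrence), the attribute names (gender, age, location), the placeholder values, and the helper function filter_entries are all hypothetical and are not taken from the ALiB database.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema and placeholder data: none of these names or values
# come from the ALiB corpus; they only illustrate the general idea.
SAMPLE_XML = """
<vocabulary>
  <entry headword="example-a">
    <occurrence gender="F" age="18-30" location="Locality 1"/>
    <occurrence gender="M" age="50-65" location="Locality 2"/>
  </entry>
  <entry headword="example-b">
    <occurrence gender="F" age="50-65" location="Locality 1"/>
  </entry>
</vocabulary>
"""


def filter_entries(root, **criteria):
    """Return the headwords that have at least one occurrence
    matching every attribute/value pair given in criteria."""
    matches = []
    for entry in root.iter("entry"):
        for occurrence in entry.iter("occurrence"):
            if all(occurrence.get(key) == value for key, value in criteria.items()):
                matches.append(entry.get("headword"))
                break  # one matching occurrence is enough for this entry
    return matches


root = ET.fromstring(SAMPLE_XML)
print(filter_entries(root, gender="F"))               # ['example-a', 'example-b']
print(filter_entries(root, gender="F", age="50-65"))  # ['example-b']
print(filter_entries(root, location="Locality 2"))    # ['example-a']
```

Storing the informant variables as attributes on each occurrence keeps the filter generic: any combination of 'gender', 'age', and 'location' can be passed as keyword arguments without changing the function.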

References

CARDOSO, Suzana Alice Marcelino. A dialetologia e os estudos da variação linguística. In: CARDOSO, Suzana Alice Marcelino (Ed.). Geolinguística - tradição e modernidade. São Paulo: Parábola Editorial, 2010. p. 15–30.

CARDOSO, Suzana Alice Marcelino et al. Atlas linguístico do Brasil: Cartas Linguísticas 1. Londrina: EDUEL, 2014. v. 2.

CHAMBERS, Jack; TRUDGILL, Peter. La dialectología. Madrid: Visor Libros, 1994.

CHOMSKY, Noam. Aspects of the theory of syntax. Cambridge, MA: MIT Press, 1965.

COMITÊ NACIONAL DO PROJETO ALIB. Atlas Lingüístico do Brasil: questionário 2001. Londrina: EDUEL, 2001.

CORREIA DE SOUSA, Cemary. Vocabulário dialetal da região norte do Brasil: um estudo das capitais com base nos dados do projeto ALIB. 2019. 134 f. Mestrado em Língua e Cultura – Universidade Federal da Bahia, Salvador.

COSTA, Daniela de Souza Silva. Vocabulário Dialetal do Centro-Oeste: interfaces entre a Lexicografia e a Dialetologia. 2018. 353 f. Doutorado em Estudos da Linguagem – Universidade Estadual de Londrina, Londrina.

FUERTES-OLIVEIRA, Pedro Antonio; BERGENHOLTZ, Henning. Introduction: The Construction of Internet Dictionaries. In: FUERTES-OLIVEIRA, Pedro Antonio; BERGENHOLTZ, Henning (Ed.). e-Lexicography: The Internet, Digital Initiative and Lexicography. London/New York: Continuum, 2011. p. 1–16.

FUERTES-OLIVEIRA, Pedro Antonio; BERGENHOLTZ, Henning. Los Diccionarios en Línea de Español “Universidad de Valladolid.” Estudios de Lexicografía. Revista Mensual del grupo de las dos vidas de las palabras, n. 4, p. 71–98, Jun. 2015. Available at: https://issuu.com/ldvp/docs/elex_4-_def. Accessed on: 2 Aug. 2022.

KEDIA, Aman; RASU, Mayank. Hands-on Python natural language processing: explore tools and techniques to analyze and process text with a view to building real-world NLP applications. Birmingham: Packt Publishing Ltd, 2020.

LEROYER, Patrick. Change of paradigm: from Linguistics to Information Science and from dictionaries to lexicographic information tools. In: FUERTES-OLIVEIRA, Pedro Antonio; BERGENHOLTZ, Henning (Ed.). e-Lexicography: The Internet, Digital Initiative and Lexicography. London/New York: Continuum, 2011. p. 121–140.

MACHADO FILHO, Américo Venâncio Lopes. Um ponto de interseção para a dialectologia e a lexicografia: a proposição de um dicionário dialetal brasileiro com base nos dados do ALiB. Estudos Linguísticos e Literários, v. 41, p. 49–70, 2010.

MANNING, Christopher D. Introduction to information retrieval. Cambridge: Cambridge University Press, 2008.

MANNING, Christopher D.; SCHUTZE, Hinrich. Foundations of statistical natural language processing. Cambridge: MIT Press, 1999.

MARAMALDO FERREIRA, Camila. Vocabulário Dialetal Maranhense: a contribuição do Maranhão para o Dicionário Dialetal Brasileiro 2019. 2019. 119 f. Mestrado em Letras – Universidade Federal do Maranhão, São Luís.

NEIVA, Isamar. Vocabulário Dialetal Baiano. 2017. 270 f. Doutorado em Língua e Cultura – Universidade Federal da Bahia, Salvador.

O’KEEFFE, Anne; MCCARTHY, Michael. What are corpora and how have they evolved? In: O’KEEFFE, Anne; MCCARTHY, Michael (Ed.). The Routledge handbook of corpus linguistics. London/New York: Routledge, 2010. p. 3–10.

RADTKE, Edgar; THUN, Harald. Nuevos caminos de la geolinguística románica. In: RADTKE, Edgar; THUN, Harald (Ed.). Neue Wege der Romanischen Geolinguistik. Kiel: Westensee-Verlag, 1996. p. 25–49.

SRINIVASA-DESIKAN, Bhargav. Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras. Birmingham: Packt, 2018.

TARP, Sven. Lexicography in the borderland between knowledge and non-knowledge: General Lexicographical Theory with Particular Focus on Learner’s Lexicography. Tübingen: Niemeyer, 2008.

TARP, Sven. Lexicographical and other e-tools for consultation purposes: towards the individualization of needs satisfaction. In: FUERTES-OLIVEIRA, Pedro Antonio; BERGENHOLTZ, Henning (Ed.). e-Lexicography: The Internet, Digital Initiative and Lexicography. London/New York: Continuum, 2011. p. 54–70.

TARP, Sven. La teoría funcional en pocas palabras. Estudios de Lexicografía. Revista Mensual del grupo de las dos vidas de las palabras, v. 4, p. 31–42, 2015. Available at: https://issuu.com/ldvp/docs/elex_4-_def. Accessed on: 2 Aug. 2022.

Published

2023-04-11

How to Cite

Computational tools development for the dialectal and lexicographical data processing. Texto Livre, Belo Horizonte-MG, v. 16, p. e42302, 2023. DOI: 10.1590/1983-3652.2023.42302. Available at: https://periodicos.ufmg.br/index.php/textolivre/article/view/42302. Accessed on: 18 Dec. 2024.
