Holistic Corpus-Based Dialectology
Keywords:
corpus-based dialectology, holistic approach, corpus-based dialectometry, feature aggregates, multivariate analysis, visualization techniques
Abstract
This paper sketches future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in this spirit. Specifically, we argue that, to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study that explores the joint frequency variability of 57 morphosyntactic features in 34 dialects across Great Britain.
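As an illustration of what moving from individual features to feature aggregates can look like in practice, the following minimal sketch builds a dialect-by-dialect distance matrix from a table of feature frequencies and then groups the dialects by cluster analysis. It is a sketch under stated assumptions, not the procedure used in the case study: the frequencies are synthetic (two artificial dialect groups generated at random), Euclidean distance and average linkage are simple stand-ins chosen for brevity, and the function names `distance_matrix` and `average_linkage` are hypothetical. Only the table dimensions (34 dialects, 57 features) are taken from the abstract.

```python
import math
import random


def distance_matrix(freqs):
    """Pairwise Euclidean distances between rows (one row per dialect)."""
    n = len(freqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Aggregate over ALL features at once rather than one at a time.
            d = math.dist(freqs[i], freqs[j])
            dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist, k):
    """Agglomerative clustering: merge the two closest clusters until k remain.

    Closeness is the mean pairwise distance between cluster members
    (average linkage), an illustrative choice among several linkage rules.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Synthetic stand-in data (NOT real dialect frequencies): dialects 0-16 and
# 17-33 are drawn around two different baseline frequencies.
random.seed(0)
N_DIALECTS, N_FEATURES = 34, 57
freqs = []
for d in range(N_DIALECTS):
    base = 1.0 if d < 17 else 3.0
    freqs.append([base + random.gauss(0, 0.2) for _ in range(N_FEATURES)])

dist = distance_matrix(freqs)
clusters = average_linkage(dist, k=2)
for members in clusters:
    print(len(members), members)
```

The same distance matrix could equally be passed to multidimensional scaling or, in place of the raw frequency table, to principal component analysis; clustering is shown here only because it can be written compactly without external libraries.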
License
Copyright (c) 2012 Revista Brasileira de Linguística Aplicada

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors of articles published by RBLA retain the copyright to their work, licensing it under the Creative Commons BY Attribution 4.0 license, which allows articles to be reused and distributed without restriction, provided that the original work is properly cited.


