Quantitative social dialectology: Explaining linguistic variation geographically and socially

Martijn Wieling, John Nerbonne, R. Harald Baayen

Paper package: WielingNerbonneBaayen2013_1.0 (tar.gz). PID: 11022/0000-0000-1F17-5


In this study we examine linguistic variation and its dependence on both social and geographic factors. We follow dialectometry in applying a quantitative methodology and focusing on dialect distances, and we follow social dialectology in our choice of the factors used to build a model that predicts word-level pronunciation distances between standard Dutch and 424 Dutch dialects. We combine linear mixed-effects regression modeling with generalized additive modeling to predict the pronunciation distance of 559 words. Although geographical position was the dominant predictor, several other factors also emerged as significant. The model predicts a greater distance from the standard for smaller communities, for communities with a higher average age, for nouns (as contrasted with verbs and adjectives), for more frequent words, and for words with relatively many vowels. The impact of the demographic variables, however, varied from word to word. For a majority of words, larger, richer and younger communities are moving towards the standard. For a minority of words, larger, richer and younger communities emerge as driving a change away from the standard. Similarly, the strength of the effects of word frequency and word category varied geographically. Compared to the more central areas, the peripheral areas of the Netherlands showed a greater distance from the standard for nouns (as opposed to verbs and adjectives) as well as for high-frequency words. Our findings indicate that changes in pronunciation have been spreading (in particular for low-frequency words) from the Hollandic center of economic power to the peripheral areas of the country, meeting resistance that is stronger wherever, for well-documented historical reasons, the political influence of Holland was reduced. Our results are also consistent with the theory of lexical diffusion, in that distances from the Hollandic norm vary systematically and predictably on a word-by-word basis.
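The abstract does not state how the pronunciation distance of a word is measured. Dialectometry classically compares phonetic transcriptions with the Levenshtein (edit) distance, so the sketch below assumes a plain unit-cost Levenshtein distance over segment sequences, plus a simple normalization by the longer transcription's length. Both choices are assumptions: the study may use a refined, segment-weighted variant and a different normalization. The function names and toy transcriptions are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of segment insertions, deletions and substitutions turning a into b.

    Assumption: every operation costs 1 (no phonetic weighting).
    """
    # prev[j] = distance between the first i-1 segments of a and the first j segments of b
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr[j] = min(
                prev[j] + 1,         # delete seg_a
                curr[j - 1] + 1,     # insert seg_b
                prev[j - 1] + cost,  # substitute seg_a by seg_b (free if identical)
            )
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical length normalization: divide by the longer transcription's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Toy, hypothetical segment lists: one "standard" form and one "dialect" form
standard = ["ɦ", "œy", "s"]
dialect = ["h", "u", "s"]
print(levenshtein(standard, dialect))          # 2
print(normalized_distance(standard, dialect))  # 0.6666666666666666
```

Representing transcriptions as lists of segments rather than plain strings lets a multi-character symbol such as a diphthong count as a single unit; a per-word distance of this kind is what a regression model would then take as its dependent variable.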

