Models, forests and trees of York English: Was/were variation as a case study for statistical practice

Sali A. Tagliamonte, R. Harald Baayen

Full Text: PDF   Paper Package: TagliamonteBaayen2012_1.0 tar.gz PID: 11022/0000-0000-1660-B


What is the explanation for vigorous variation between was and were in plural existential constructions and what is the optimal tool for analyzing it? The standard variationist tool — the variable rule program — is a generalized linear model; however, recent developments in statistics have introduced new tools, including mixed-effects models, random forests and conditional inference trees. In a step-by-step demonstration, we show how this well known variable benefits from these complementary techniques. Mixed-effects models provide a principled way of assessing the importance of random-effect factors such as the individuals in the sample. Random forests provide information about the importance of predictors, whether factorial or continuous, and do so also for unbalanced designs with high multicollinearity, cases for which the family of linear models is less appropriate. Conditional inference trees straightforwardly visualize how multiple predictors operate in tandem. Taken together the results confirm that polarity, distance from verb to plural element and the nature of the DP are significant predictors. Ongoing linguistic change and social reallocation via morphologization are operational. Furthermore, the results make predictions that can be tested in future research. We conclude that variationist research can be substantially enriched by an expanded tool kit