N-gram probability effects in a cloze task

Cyrus Shaoul, R. Harald Baayen, Chris F Westbury

Paper package: ShaoulBaayenWestbury2015_1.0 (tar.gz). PID: 11022/0000-0000-30AD-7


What knowledge influences our choice of words when we write or speak? Predicting which word a person will produce next is not easy, even when the linguistic context is known. One task that has been used to assess context-dependent word choice is the fill-in-the-blank task, also called the cloze task. The cloze probability of a word in a specific context is an empirical measure obtained by asking many people to fill in the blank. In this paper we harness the power of large corpora to examine how corpus-derived probabilistic information from a word's micro-context influences word choice. We asked young adults to complete short phrases called n-grams, with up to 20 responses per phrase. The probability of the response word and the conditional probability of the response given the context both predicted how often each response was produced. Furthermore, the order in which participants generated multiple completions of the same context was also predicted by the conditional probability. These results suggest that word choice in cloze tasks taps into implicit knowledge of a person's past experience with that word in various contexts. The importance of n-gram conditional probabilities in our analysis is further evidence of implicit knowledge about multi-word sequences, and it supports theories of language processing that involve anticipation or prediction based on context.
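The abstract contrasts an empirical cloze probability with a corpus-derived conditional probability. The sketch below shows the basic arithmetic for both under simple assumptions: the cloze probability of a word is taken as the proportion of all responses to a context that are that word, and P(word | context) is taken as the maximum-likelihood ratio count(context + word) / count(context) from n-gram counts. It also ranks candidate completions by that conditional probability, as a rough illustration of the claim that it predicts the order of multiple completions. The function names, the toy responses, and the n-gram counts are hypothetical illustrations, not the authors' data or code; the paper's own estimation (for example any smoothing or transformation of the probabilities) may differ.

```python
from collections import Counter


def cloze_probabilities(responses):
    """Cloze probability: share of all responses to one context that are a given word.

    Responses are lower-cased and stripped so that "Morning" and "morning" count together
    (an assumption for this sketch).
    """
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def conditional_probability(ngram_counts, context, word):
    """Maximum-likelihood estimate of P(word | context) = count(context word) / count(context)."""
    context_count = ngram_counts.get(context, 0)
    if context_count == 0:
        return 0.0  # unseen context: no corpus evidence
    return ngram_counts.get(context + " " + word, 0) / context_count


def predicted_order(ngram_counts, context, candidates):
    """Hypothetical helper: rank completions by corpus conditional probability, highest first."""
    return sorted(
        candidates,
        key=lambda w: conditional_probability(ngram_counts, context, w),
        reverse=True,
    )


# Toy data (hypothetical; not from the study).
responses = ["morning", "morning", "day", "Morning", "night"]
ngram_counts = {
    "in the": 1000,
    "in the morning": 120,
    "in the day": 30,
    "in the night": 50,
}

cloze = cloze_probabilities(responses)
for word, p in sorted(cloze.items(), key=lambda kv: -kv[1]):
    cond = conditional_probability(ngram_counts, "in the", word)
    print(f"{word:8s} cloze={p:.2f}  P(word|context)={cond:.3f}")

print(predicted_order(ngram_counts, "in the", ["day", "night", "morning"]))
```

With these toy numbers the two measures agree on the most frequent completion but not on every rank ("day" and "night" tie on cloze probability yet differ in corpus probability), which is the kind of divergence that makes it worthwhile to test corpus measures against human completions.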

The Mental Lexicon

