N-gram probability effects in a cloze task

Cyrus Shaoul, R. Harald Baayen, Chris F Westbury

Full Text: PDF   Paper Package: ShaoulBaayenWestbury2015_1.0 tar.gz PID: 11022/0000-0000-30AD-7


What knowledge influences our choice of words when we write or speak? Predicting which word a person will produce next is not easy, even when the linguistic context is known. One task that has been used to assess context-dependent word choice is the fill-in-the-blank task, also called the cloze task. The cloze probability of a specific context is an empirical measure found by asking many people to fill in the blank. In this paper we harness the power of large corpora to look at the influence of corpus-derived probabilistic information from a word’s micro-context on word choice. We asked young adults to complete short phrases called n-grams with up to 20 responses per phrase. The probability of the responded word and the conditional probability of the response given the context were predictive of the frequency with which each response was produced. Furthermore, the order in which the participants generated multiple completions of the same context was predicted by the conditional probability as well. These results suggest that word choice in cloze tasks taps into implicit knowledge of a person’s past experience with that word in various contexts. Furthermore, the importance of n-gram conditional probabilities in our analysis is further evidence of implicit knowledge about multi-word sequences and supports theories of language processing that involve anticipating or predicting based on context.
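
The two quantities contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the example context, and all counts below are made up for illustration, and the conditional probability is assumed to be a plain maximum-likelihood estimate from n-gram counts with no smoothing.

```python
from collections import Counter


def cloze_probabilities(responses):
    """Proportion of participant responses that used each completion."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def conditional_probability(ngram_counts, context, word):
    """Estimate P(word | context) as count(context + word) / count(context + any word).

    Assumption: an unsmoothed maximum-likelihood estimate over corpus counts.
    """
    context_total = sum(n for (ctx, _), n in ngram_counts.items() if ctx == context)
    if context_total == 0:
        return 0.0  # context never seen in the (hypothetical) corpus
    return ngram_counts.get((context, word), 0) / context_total


# Hypothetical participant responses to the context "in the middle of the ___".
responses = ["night", "night", "road", "night", "day"]

# Hypothetical corpus counts, keyed by (context, completion).
context = "in the middle of the"
ngram_counts = {
    (context, "night"): 30,
    (context, "road"): 15,
    (context, "day"): 5,
}

cloze = cloze_probabilities(responses)
corpus_p = conditional_probability(ngram_counts, context, "night")
print(f"cloze P(night) = {cloze['night']:.2f}, corpus P(night | context) = {corpus_p:.2f}")
```

The first function gives the empirical cloze measure obtained from participants, while the second gives a corpus-derived estimate of the kind the abstract describes as a predictor of response frequency.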

The Mental Lexicon


