Cloze but no cigar: The complex relationship between cloze, corpus, and subjective probabilities in language processing


During online language comprehension, comprehenders probabilistically anticipate upcoming words. Psycholinguistic studies thus often depend on accurately estimating stimulus predictability, either to control it or to study it, and this estimation is conventionally accomplished via the cloze task. But we do not know how effectively (or even, strictly speaking, whether) cloze probabilities reflect comprehender predictions. This is both methodologically worrisome and an obstacle to a detailed understanding of online predictive mechanisms. Here, we demonstrate first that cloze probabilities deviate substantially and systematically from normative corpus statistics, and second that some portion of these deviations is also reflected in online comprehension measures. Therefore, while there is some reason to be concerned that cloze norming may be distorting the results of psycholinguistic studies, these apparent distortions may instead reflect genuine errors in native speakers' probabilistic models of their language.

