This chapter presents examples of different approaches to quantifying the co-occurrence of multiword sequences in corpora. It gives a brief overview of several methods for interpreting the frequency of word co-occurrence, including absolute and relative metrics, and references some of the psycholinguistic research in this area. The approaches are illustrated with quantitative analyses of French corpus data, with comments on applications and potential statistical pitfalls. These discussions point toward a model of language change and cognition in which different statistical metrics serve complementary roles, both in cognition and in empirical corpus-based research. In a generative model, there is no reason to expect the frequency of multiword units to affect processing: sentences are expected to be generated by rules, and frequencies are irrelevant. However, a number of experiments provide evidence that sequences with high token frequency are easier for speakers to process.