Detail of contribution

Author: Scott LEE

The Role of Base Frequency in Evaluating Morphological Productivity

Abstract: Baayen (1993) proposes two measures of morphological productivity: the category-conditioned and hapax-conditioned degrees of productivity (P and P*, respectively). Both, however, are effectively blind to the structural composition of the hapaxes, giving us no way to evaluate whether those hapaxes are truly new formations or simply old and extremely rare words. Intended as an extension of Hay and Baayen’s (2002) work on parsing and productivity, the solution proposed here is to assess quantitatively the structural restrictions imposed on a given word formation rule (as outlined in Aronoff 1976) by considering the base frequency of its hapax legomena. This proposal is motivated by two hopefully intuitive suggestions about the general nature of the lexicon: first, that derivation is a kind of sampling by which an affix chooses forms from an underlying population of bases for combination, and second, that such a process should be affected by the strength of the affix’s structural restrictions. The hypothesis this paper examines, then, is that there should be a direct correlation between the proportion of high-frequency bases among a category’s hapaxes and its productivity as measured by the number of its hapaxes not occurring in a comprehensive dictionary (in this case, the online OED). Data gathered from the Corpus of Contemporary American English (COCA) support this hypothesis. More interestingly, base-frequency rankings of affixes corrected counter-intuitive productivity rankings based exclusively on P values. For instance, both –able (P=0.00096) and –ness (P=0.00391) ranked lower on P than –esque (P=0.117411), which is not normally assumed to be a fully productive affix in English, but they outranked nearly all other affixes in terms of mean base frequency. In addition, hapaxes attested in the OED were far more likely to be used with a meaning different from the recorded one when they contained a high-frequency base.
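As an illustration only, the two measures and the base-frequency proportion discussed above can be sketched in a few lines of Python. The sketch assumes the standard definitions of Baayen's measures (P as an affix's hapaxes divided by its tokens, P* as an affix's hapaxes divided by all hapaxes in the corpus). The function names, the naive suffix test, and the frequency threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter


def productivity_measures(tokens, has_affix):
    """Return (P, P*) for one affix, given corpus tokens and a membership test.

    has_affix is a hypothetical predicate; a real study would rely on
    morphological parsing rather than string matching.
    """
    counts = Counter(tokens)
    # Hapax legomena: word types that occur exactly once in the corpus.
    hapaxes = [word for word, count in counts.items() if count == 1]
    # n1: hapaxes that belong to the affix's category.
    n1 = sum(1 for word in hapaxes if has_affix(word))
    # N: total number of tokens that belong to the affix's category.
    n_tokens = sum(count for word, count in counts.items() if has_affix(word))
    p = n1 / n_tokens if n_tokens else 0.0
    p_star = n1 / len(hapaxes) if hapaxes else 0.0
    return p, p_star


def high_frequency_base_share(hapax_bases, base_freq, threshold):
    """Proportion of hapax bases whose corpus frequency reaches the threshold.

    The threshold separating high- from low-frequency bases is an
    assumption made for this sketch.
    """
    if not hapax_bases:
        return 0.0
    high = sum(1 for base in hapax_bases if base_freq.get(base, 0) >= threshold)
    return high / len(hapax_bases)
```

On a toy corpus such as `["kindness", "kindness", "sadness", "happy", "happy", "greenness", "table"]` with `has_affix = lambda w: w.endswith("ness")`, the affix has two hapaxes among four tokens, and the corpus has three hapaxes in total. Real COCA counts would of course be needed to reproduce the P values reported here.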
Based on these findings, I propose that productivity in the general sense is best captured by a new measure, fP*, obtained by multiplying an affix’s fb value by its P* value. This effectively weights the affix’s contribution to the vocabulary’s growth rate by its statistical preference for high-frequency, and thus maximally semantically transparent, bases.

References

Aronoff, M. (1976). Word formation in generative grammar. Cambridge, MA: MIT Press.

Baayen, H. (1993). On frequency, transparency, and productivity. In G. E. Booij & J. van Marle (eds.), Yearbook of Morphology 1992, 181-208. Dordrecht: Kluwer Academic Publishers.

Hay, J. & Baayen, R. H. (2002). Parsing and productivity. Yearbook of Morphology 2001, 203-235. Dordrecht: Kluwer Academic Publishers.