Detail of contribution

Author: Mark JOHNSON

Title:
Language Acquisition as Statistical Inference


Abstract: Statistical approaches are often described by both opponents and proponents as antithetical to generative approaches to language, but are they really? This paper argues that statistical methods are in fact compatible with accounts of language that posit rich innate knowledge constraining complex compositional structures. Using a simple probabilistic context-free grammar as a motivating example, this paper shows how standard statistical techniques can infer the values of syntactic parameters and identify the syntactic categories of words given input strings alone. Bayesian priors can formalise the information innately available to a learner in advance of experience, and a Bayesian prior can bias a learner toward the correct analysis in situations where the input does not contain sufficient information. Interestingly, Bayesian priors can encode “soft” violable constraints as well as “hard” constraints on the grammar learned, providing a new way of understanding universal markedness preferences. The picture that emerges is one in which linguistic theory and statistical methods are complementary rather than antagonistic. Linguistic theory specifies the structures and parameters of variation of human language, while statistical inference provides a framework in which the acquisition of this variation (including the lexicon) can be studied. The paper ends with a survey of the state of the art. Statistical methods provide new perspectives on thorny issues, such as logical problems of language acquisition involving the lack of negative evidence. Even though current statistical methods are still rather blunt tools involving approximations and abstractions whose linguistic impact is largely unexplored, it is reasonable to expect that a deeper integration of linguistic theory and statistical inference will improve our understanding of language acquisition.
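
The idea that a prior can act as a "soft" bias, one that guides the learner when data is scarce but can be overridden by enough evidence, can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a single hypothetical binary parameter (the probability that a phrase is head-initial), a Beta prior over it, and made-up counts of observed phrases. It only shows the general behaviour of a conjugate Bayesian update.

```python
def posterior_mean(prior_a, prior_b, head_initial, head_final):
    """Posterior mean of P(head-initial) under a Beta(prior_a, prior_b) prior,
    after observing the given counts of head-initial and head-final phrases."""
    total = prior_a + prior_b + head_initial + head_final
    return (prior_a + head_initial) / total


# Hypothetical soft bias: the learner initially favours head-initial order.
soft_prior = (8.0, 2.0)

# With no input at all, the estimate is just the prior mean.
no_data = posterior_mean(*soft_prior, 0, 0)

# With a handful of contrary observations, the prior still dominates.
little_data = posterior_mean(*soft_prior, 1, 3)

# With abundant contrary input, the data overrides the soft bias.
much_data = posterior_mean(*soft_prior, 10, 990)

print(no_data, little_data, much_data)
```

A "hard" constraint would correspond to a prior that assigns zero probability to some grammars, which no amount of input can overturn; the Beta prior used here never does so, which is why the estimate can move all the way toward head-final order.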