Author: Athanasios KARASIMOS

Co-author(s): Ioannis KARAVIDAS

Predicting the unpredictable: the ALLOMANTIS experiment

Abstract:

Keywords: Maximum Entropy, Supervised Morphology Learning, Allomorphy, Derivation, Greek

Introduction

The computational treatment of allomorphy has remained a major challenge since the first systematic attempts to predict allomorphy with machine learning techniques (Rumelhart & McClelland 1986, Pinker & Prince 1988, Ling & Marinov 1993, among others). Our goals are to predict allomorphic changes in Greek nominal derivation and to show the essential contribution of various morphological, phonological and semantic features.

Our maximum entropy (ME) model is built on AMIS and benefits from linguistically motivated feature sets. Given a set of events as training data, the program estimates the parameters that maximize the likelihood of the training data. Diachronic research shows that allomorphy usually consists of relics of phonological and morphological rules and changes that are no longer active in a language, and in Greek in particular. We therefore assume that Greek words may “include” the information that a system with minimal supervision needs to predict whether a stem or word has allomorphs, and of which type. We used various features, such as Allomorphic Class, Inflectional Class, Syllables and Stress.

Our model was trained on a corpus of 4,677 inflected nouns. The training data contain inflected nouns (stem plus inflectional suffixes) that either are not derived from other words or have derivational suffixes that are synchronically morphologically opaque. To evaluate the effectiveness of the model, we created a test corpus of derived nouns. This second corpus contains 2,755 carefully selected nouns that cover the full range of features and all nominal derivational suffixes. We created ALLOMANTIS, a morphological prediction analyzer for nominal allomorphy, which takes as input the data imported from our training corpora into AMIS.
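The training setup described above can be pictured as a conditional maximum entropy classifier: each binary feature function pairs an attribute value (for instance a stress position) with a candidate allomorphy label, and the weights are chosen to maximize the log-likelihood of the training events. The following is a minimal sketch under stated assumptions, not AMIS itself. The toy records, the attribute names `infl_class`, `syllables` and `stress`, the labels `none`, `type_A` and `type_B`, and the helper names are all hypothetical. Plain batch gradient ascent with a Gaussian prior stands in for whatever estimator AMIS actually uses.

```python
import math
from collections import defaultdict

# Hypothetical toy training events (features -> label). The attribute names
# loosely echo the abstract's feature list (Inflectional Class, Syllables,
# Stress); all values and labels are invented for illustration.
TRAIN = [
    ({"infl_class": "IC1", "syllables": "2", "stress": "penult"}, "none"),
    ({"infl_class": "IC1", "syllables": "3", "stress": "ult"}, "none"),
    ({"infl_class": "IC2", "syllables": "2", "stress": "ult"}, "type_A"),
    ({"infl_class": "IC2", "syllables": "3", "stress": "antepenult"}, "type_A"),
    ({"infl_class": "IC3", "syllables": "3", "stress": "penult"}, "type_B"),
    ({"infl_class": "IC3", "syllables": "4", "stress": "antepenult"}, "type_B"),
]


def feature_functions(x, y):
    """Binary feature functions f_i(x, y): one per (attribute=value, label) pair."""
    return [(f"{attr}={value}", y) for attr, value in x.items()]


class MaxEnt:
    """Conditional ME model: p(y|x) = exp(sum_i w_i f_i(x, y)) / Z(x)."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.w = defaultdict(float)  # one weight per feature function

    def prob(self, x):
        scores = {y: sum(self.w[f] for f in feature_functions(x, y)) for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exp = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exp.values())  # normalizer Z(x)
        return {y: e / z for y, e in exp.items()}

    def predict(self, x):
        p = self.prob(x)
        return max(p, key=p.get)

    def fit(self, events, epochs=300, lr=0.5, sigma2=10.0):
        """Batch gradient ascent on the log-likelihood with a Gaussian prior."""
        n = len(events)
        for _ in range(epochs):
            grad = defaultdict(float)
            for x, y in events:
                for f in feature_functions(x, y):  # observed feature counts
                    grad[f] += 1.0
                for y2, p in self.prob(x).items():  # expected counts under the model
                    for f in feature_functions(x, y2):
                        grad[f] -= p
            for f in set(grad) | set(self.w):
                self.w[f] += lr * (grad[f] - self.w[f] / sigma2) / n
        return self


model = MaxEnt({y for _, y in TRAIN}).fit(TRAIN)
train_acc = sum(model.predict(x) == y for x, y in TRAIN) / len(TRAIN)
print(f"training accuracy: {train_acc:.2%}")
# Distribution over labels for an unseen (hypothetical) feature combination.
print(model.prob({"infl_class": "IC2", "syllables": "4", "stress": "penult"}))
```

The gradient is the familiar ME one: observed feature counts minus the counts the model expects, so training stops moving once the model reproduces the empirical feature statistics (up to the prior term).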
The overall accuracy of the model was 86.49%, with a failure rate of 13.51%. The upgraded version of ALLOMANTIS raised the rate of correct predictions to 91.43%, with failed cases falling to 8.47%. We will show that a (supervised) probabilistic model applied to a corpus of richly annotated words can extract basic principles that can serve as the keystone for a computational model that successfully processes the “unpredictable” and hard-to-handle phenomenon of allomorphy.
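On a fixed test set, accuracy and failure rate are complementary. Accuracy is the share of test nouns whose allomorphy is predicted correctly, and the failure rate is the share that is not. The following is a minimal sketch of that bookkeeping, assuming exactly one gold and one predicted label per test noun. The function names and the four-item label lists are hypothetical and are not the ALLOMANTIS test data. The per-label failure count is an illustrative extra that shows where errors concentrate.

```python
from collections import Counter


def accuracy_and_failure_rate(gold, predicted):
    """Return (accuracy %, failure rate %) over paired gold/predicted labels."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("expected two non-empty label sequences of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    accuracy = 100.0 * correct / len(gold)
    return accuracy, 100.0 - accuracy  # every item is either correct or failed


def failures_by_gold_label(gold, predicted):
    """Count mispredicted items per gold label, to see where a model fails."""
    return Counter(g for g, p in zip(gold, predicted) if g != p)


# Hypothetical gold and predicted labels for four test nouns.
gold = ["none", "type_A", "none", "type_B"]
predicted = ["none", "none", "none", "type_B"]
acc, fail = accuracy_and_failure_rate(gold, predicted)
print(f"accuracy {acc:.2f}%, failure rate {fail:.2f}%")  # accuracy 75.00%, failure rate 25.00%
print(failures_by_gold_label(gold, predicted))  # Counter({'type_A': 1})
```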