Author: Francesca STRIK LIEVERS

Co-author(s): Hongzhi XU, Hong Kong Polytechnic University, Hong Kong; Ge XU, Peking University, China

A Methodology for the Extraction of Lexicalized Synaesthesia from Corpora

Abstract: The automatic identification of metaphors in text is an extremely challenging task. We focus here on a specific type of metaphor: synaesthesia (e.g., warm colour). Studies based mainly on poetic language (starting from Ullmann 1957) suggest that synaesthetic transfers follow a precise directionality. However, a large-scale cross-linguistic study based on ordinary language is still missing, and such a study could reveal more about possible generalizations concerning the connections between different sensory modalities. Our purpose is to create a database of (more or less) lexicalized synaesthetic metaphors extracted from large corpora, which can be used for theoretical research on the topic. Lexicalized synaesthesia is rare: a (semi)automatic extraction procedure is therefore the only way to obtain enough instances for quantitative analysis.

In synaesthesia, a perception related to one sensory modality is described by lexical means usually associated with a different sensory modality (e.g., sweet voice describes hearing in terms of taste). To find synaesthesia in corpus data, we therefore extract sentences that contain two lexemes referring to two different sensory modalities. The first step is thus to compile a list of perception-related lexemes, divided by sensory modality (as we will show, this is not a trivial issue). At present, the study is based on English and Italian data. Using web corpora, we carried out two experiments:

- Method 1 extracts sentences containing two perception lexemes that (a) belong to two different sensory modalities and (b) are separated by at most one token.
  E.g.: The petals smell[SMELL] as sweet[TASTE] as if nothing had happened.
- Method 2 extracts sentences containing two perception lexemes that (a) belong to two different sensory modalities and (b) are in a dependency relation.
  E.g.: It begins with a lovely soft[TOUCH], almost eastern inspired sound[HEARING].

Method 2 proves more efficient for at least two reasons:
1) The percentage of “true” synaesthesiae among the potential synaesthesiae extracted is considerably higher, so less manual checking is needed.
2) Provided that a relevant dependency relation holds between the two lexemes, there is no distance constraint, so Method 2 finds instances that Method 1 would miss.

However, Method 1 can still be usefully applied in future research to languages for which dependency parsers are not available.
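The two candidate-extraction strategies can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation. The lexicon (`LEXICON`) is a tiny hypothetical sample. Sentences are given as hand-written, simplified dependency parses (each token stores a lemma and the index of its head) rather than real parser output. Method 1's window is read as "at most one intervening token". Method 2 accepts any direct head-dependent arc instead of a curated set of relevant relations. The names `Token`, `parse`, `perception_hits`, `method1` and `method2` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical toy lexicon mapping lemmas to sensory modalities;
# a real lexicon would be compiled per language and be much larger.
LEXICON = {
    "smell": "SMELL",
    "sweet": "TASTE",
    "soft": "TOUCH",
    "warm": "TOUCH",
    "sound": "HEARING",
    "voice": "HEARING",
    "colour": "SIGHT",
}


@dataclass
class Token:
    form: str   # surface form as it appears in the sentence
    lemma: str  # lemma used for lexicon lookup
    head: int   # index of the syntactic head (-1 for the root)


def parse(rows):
    """Build a sentence from (form, lemma, head) triples."""
    return [Token(form, lemma, head) for form, lemma, head in rows]


def perception_hits(tokens):
    """Return (index, modality) for every token whose lemma is in the lexicon."""
    return [(i, LEXICON[t.lemma]) for i, t in enumerate(tokens) if t.lemma in LEXICON]


def method1(tokens, max_gap=1):
    """Cross-modal pairs separated by at most `max_gap` intervening tokens."""
    pairs = []
    for (i, mod_i), (j, mod_j) in combinations(perception_hits(tokens), 2):
        # combinations() preserves list order, so i < j here
        if mod_i != mod_j and j - i - 1 <= max_gap:
            pairs.append((tokens[i].form, tokens[j].form))
    return pairs


def method2(tokens):
    """Cross-modal pairs linked by a direct head-dependent arc, at any distance."""
    pairs = []
    for (i, mod_i), (j, mod_j) in combinations(perception_hits(tokens), 2):
        if mod_i != mod_j and (tokens[i].head == j or tokens[j].head == i):
            pairs.append((tokens[i].form, tokens[j].form))
    return pairs


# Simplified, hand-written parses of the two example sentences
# (the first one truncated after "sweet").
SENT_1 = parse([
    ("The", "the", 1), ("petals", "petal", 2), ("smell", "smell", -1),
    ("as", "as", 4), ("sweet", "sweet", 2),
])
SENT_2 = parse([
    ("It", "it", 1), ("begins", "begin", -1), ("with", "with", 10),
    ("a", "a", 10), ("lovely", "lovely", 10), ("soft", "soft", 10),
    (",", ",", 10), ("almost", "almost", 8), ("eastern", "eastern", 9),
    ("inspired", "inspire", 10), ("sound", "sound", 1),
])

if __name__ == "__main__":
    for name, sent in [("SENT_1", SENT_1), ("SENT_2", SENT_2)]:
        print(name, method1(sent), method2(sent))
```

On these two sentences both methods retrieve the smell/sweet pair, but only Method 2 retrieves soft/sound, because the four intervening tokens exceed Method 1's window while "soft" attaches directly to "sound" in the parse.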