26.4260, Diss: English, Portuguese, Spanish, Applied Ling, Comp Ling: Jose De Lucca: ' PhraseNET: Detección y extracción automatizada de unidades fraseológicas'

The LINGUIST List via LINGUIST linguist at listserv.linguistlist.org
Tue Sep 29 19:59:25 UTC 2015


LINGUIST List: Vol-26-4260. Tue Sep 29 2015. ISSN: 1069 - 4875.

Subject: 26.4260, Diss: English, Portuguese, Spanish, Applied Ling, Comp Ling: Jose De Lucca: ' PhraseNET: Detección y extracción automatizada de unidades fraseológicas'

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org


Editor for this issue: Ashley Parker <ashley at linguistlist.org>
================================================================


Date: Tue, 29 Sep 2015 15:59:08
From: Jose De Lucca [JLDLME at HOTMAIL.COM]
Subject: PhraseNET: Detección y extracción automatizada de unidades fraseológicas

 
Institution: Universidad Politécnica de Valencia 
Program: Lenguas y Tecnologia 
Dissertation Status: Completed 
Degree Date: 2011 

Author: Jose De Lucca

Dissertation Title: PhraseNET: Detección y extracción automatizada de unidades
fraseológicas [PhraseNET: Automated Detection and Extraction of Phraseological Units]

Linguistic Field(s): Applied Linguistics
                     Computational Linguistics

Subject Language(s): English (eng)
                     Portuguese (por)
                     Spanish (spa)


Dissertation Director(s):
Maria Luisa Carrió Pastor

Dissertation Abstract:

The present thesis lies within the area of Information Extraction (IE). We investigate the effectiveness of PhraseNET, a software tool developed to detect and extract phraseological units from a corpus. We present the tools of this software through its interface, its linguistic features and its computational resources, together with the evaluation results obtained on a training corpus. Our main focus is on locutions and other phraseological units as classified by Corpas Pastor (1997).

The main topic of this doctoral dissertation is a problem that concerns translators and linguists: it is not easy to find the equivalents of phraseological units across two languages. We therefore consider it highly relevant to design and implement a tool able to detect variation in language, i.e. changes due to verb tense, number (plural), gender, etc. The tool we propose identifies the phraseological units of a textual corpus and looks for their equivalents in other languages; its novelty is that it detects the units even when their form varies in the text.

The core of the automatic phraseological unit extraction system is a corpus-based algorithm that produces a list of all the units found after a contrastive analysis against a dictionary of lexical patterns. The main advantage of this method over others is that it does not require highly specialized knowledge of phraseology.
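To make the idea of matching a corpus against a dictionary of lexical patterns more concrete, here is a minimal Python sketch. It is not PhraseNET's implementation: the lemma table `LEMMAS`, the pattern dictionary `PATTERNS` and the function names are hypothetical, and a real system would rely on a proper morphological analyser rather than a hand-written table. Storing each unit as a sequence of lemmas is what lets a single dictionary entry match inflected variants (tense, number).

```python
import re

# Hypothetical toy lemma table; a real system would use a morphological analyser.
LEMMAS = {
    "kicked": "kick", "kicks": "kick", "kicking": "kick",
    "spilled": "spill", "spills": "spill", "spilt": "spill",
    "beans": "bean",
}

# Hypothetical dictionary of lexical patterns: each phraseological unit is
# stored as a sequence of lemmas, so one entry covers its inflected variants.
PATTERNS = {
    "kick the bucket": ("kick", "the", "bucket"),
    "spill the beans": ("spill", "the", "bean"),
}


def lemmatize(text):
    """Lowercase, split into word tokens and map each token to its lemma."""
    tokens = re.findall(r"\w+", text.lower())
    return tokens, [LEMMAS.get(t, t) for t in tokens]


def extract_units(text):
    """Return (unit, start_token_index, surface_form) for every pattern match,
    ordered by position in the text."""
    tokens, lemmas = lemmatize(text)
    hits = []
    for unit, pattern in PATTERNS.items():
        n = len(pattern)
        for i in range(len(lemmas) - n + 1):
            # Compare lemmas, not surface tokens, so inflected forms still match.
            if tuple(lemmas[i:i + n]) == pattern:
                hits.append((unit, i, " ".join(tokens[i:i + n])))
    return sorted(hits, key=lambda h: h[1])
```

For example, `extract_units("He finally spilt the beans before he kicked the bucket.")` returns `[('spill the beans', 2, 'spilt the beans'), ('kick the bucket', 7, 'kicked the bucket')]`: both units are found although neither appears in its dictionary form.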

Nevertheless, this process entails some difficulties when adapted to extracting units from other languages, difficulties that are inherent to IE methodology. As a consequence, PhraseNET is constantly evolving and we regularly implement improvements to it.

The objectives of this study are threefold: first, to design a tool that detects phraseological units regardless of their linguistic (surface) form; second, to detect phraseological units in texts together with examples that identify their location in the corpus; and finally, to identify the same patterns in other languages.
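Locating each unit in the corpus together with an example of its use is essentially a concordance (keyword-in-context) task. The Python sketch below is only an illustration, not the thesis's tool: it assumes the corpus is a hypothetical dictionary mapping document IDs to raw text, uses exact case-insensitive token matching, and the `width` parameter (number of context tokens shown on each side) is likewise an assumption.

```python
import re


def concordance(corpus, unit, width=3):
    """Return (doc_id, token_index, left_context, match, right_context)
    for each occurrence of `unit` in a {doc_id: text} corpus."""
    target = unit.lower().split()
    n = len(target)
    lines = []
    for doc_id, text in corpus.items():
        tokens = re.findall(r"\w+", text.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                # Up to `width` tokens of context on each side of the match.
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + n:i + n + width])
                lines.append((doc_id, i, left, " ".join(tokens[i:i + n]), right))
    return lines
```

With `corpus = {"doc1": "In the end she had to bite the bullet and pay.", "doc2": "They refused to bite the bullet."}`, the call `concordance(corpus, "bite the bullet")` yields one line per occurrence, each recording the document, the token position and the surrounding context.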

Having designed the tool and described its components and uses, we conclude that PhraseNET can extract the following variations of phraseological units: morphological, syntactic, lexical, diatopic, diastratic and diaphasic variation; internal modifications (such as shortening or lengthening a unit by eliminating or adding components); and external modifications, at the periphery of the unit. We are aware that this study may leave out some aspects we have not addressed, but for now we have delimited the basic aspects of the tool in order to improve its features in the future.
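One generic way to tolerate internal modifications by addition of components is gap-tolerant matching: the components of a unit must appear in order, but a limited number of extra tokens may be inserted between them. The Python sketch below illustrates that technique only; it is an assumption, not the method described in the thesis, and the function name and `max_gap` parameter are hypothetical. It does not handle reduction (eliminated components).

```python
import re


def match_with_gaps(text, pattern, max_gap=1):
    """Find occurrences of `pattern` whose components appear in order,
    allowing up to `max_gap` inserted tokens between consecutive components.
    Returns a list of (start, end) token spans, end exclusive."""
    tokens = re.findall(r"\w+", text.lower())
    pattern = [p.lower() for p in pattern]
    spans = []
    for start, tok in enumerate(tokens):
        if tok != pattern[0]:
            continue
        pos = start
        for comp in pattern[1:]:
            # Look for the next component within the allowed window.
            window = tokens[pos + 1:pos + 2 + max_gap]
            if comp not in window:
                break
            pos = pos + 1 + window.index(comp)
        else:
            # Loop finished without a break: every component was found.
            spans.append((start, pos + 1))
    return spans
```

For instance, `match_with_gaps("Do not spill all the beans now", ("spill", "the", "beans"))` returns `[(2, 6)]`, covering the modified form "spill all the beans", whereas the same call with `max_gap=0` finds nothing.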



----------------------------------------------------------
LINGUIST List: Vol-26-4260	
----------------------------------------------------------






