[Corpora-List] PhD thesis: Parsing and Multi-Word Expressions.
Agata Savary
agata.savary at univ-tours.fr
Fri May 3 16:24:58 UTC 2013
The Computer Science Laboratory of the François Rabelais University in Tours, France
offers a PhD grant in Natural Language Processing.
***************
**PhD subject**
Parsing and Multi-Word Expressions
***********
** Profile **
- Master in computer science or computational linguistics
- Good knowledge of French and English; another language would be a plus
- Interest in linguistics and familiarity with language technology
- Capacity to work independently and as part of a team
***********
** Dates **
Application deadline: May 14, 2013 (or until filled)
Position starts: September 2013
***********
** Grant **
Amount: 1700-2000 € / month
Funding institution: French Ministry of Higher Education and Research
*************
** Contact **
agata.savary at univ-tours.fr
*****************
** Application **
- CV
- cover letter
- transcript of MSc and BSc grades
*************************
** Hosting Institution **
University: Université François Rabelais Tours (http://international.univ-tours.fr/welcome-international-265902.kjsp?RH=INTER&RF=INTER-EN)
Laboratory: Laboratoire d'informatique (LI) (http://li.univ-tours.fr/)
Research team: Databases and Natural Language Processing (BdTln), Campus in Blois
*****************
** Supervisors **
Prof. Denis Maurel (Université François Rabelais Tours)
Dr Agata Savary (Université François Rabelais Tours)
Dr Yannick Parmentier (Université d'Orléans)
**************************
** Scientific challenge **
This PhD thesis will be dedicated to fixed and semi-fixed Multi-Word Expressions (MWEs) such as "French fries", "random access memory", "to do one's
best", "to spill the beans", "to kick the bucket", etc. Despite a long-established tradition of linguistic studies dedicated to such expressions, they
remain one of the major challenges in bridging the gap between linguistic precision and computational efficiency in Natural Language Processing
(NLP) applications [Sag et al., 2002].
MWEs [Rayson et al., 2010] are prevalent in written and spoken corpora, covering up to 40% of all tokens in a natural language text. They are,
however, hard for NLP tools to detect, analyze and translate because of their heterogeneous properties at different levels of linguistic processing:
segmentation, lexicon, syntax, semantics, etc. Moreover, knowledge about MWEs is fragmented. For instance, lexicons of MWEs such as
compounds [Graliński et al., 2010], multi-word proper names [Tran and Maurel, 2006], complex terms [Savary et al., 2012], valence dictionaries,
lexicon-grammars [Tolone and Sagot, 2011], etc. are frequently created with no explicit links to grammar formalisms [Savary, 2008]. Thus, their
application to parsing is not always straightforward. Conversely, many existing grammars do not account for MWEs on a large scale, even if the
associated formalisms (HPSG, LFG, TAG, CCG, dependency grammars, etc.) allow for their representation.
*****************
** PhD Project **
This PhD thesis will aim at deepening the understanding of MWEs in order to overcome the above-mentioned challenges. It will address the
lexicon/grammar interface in order to propose hybrid - both knowledge-based and data-driven - methods for MWE representation and processing. The
following aspects will be studied:
- How to account for the fixed character of MWEs with respect to some linguistic phenomena on the one hand, and their similarities to regular
syntactic structures on the other hand [Grégoire, 2010]? French, English, Polish, and possibly other languages known by the PhD student will be addressed.
- In particular, how to represent, at the lexical level, phenomena most relevant to parsing, e.g., agreement, discontinuity and free word order?
- How should MWE lexicons be structured (e.g. within meta-grammar frameworks [Duchier et al., 2011b], [Crabbé et al., 2013]) in order
to be easily convertible and maximally reusable by different parsing formalisms?
- How to integrate MWE lexicons in probabilistic parsers [Nivre and Nilsson, 2004; Constant et al., 2012]?
- How to express semantics of MWEs?
*****************************
** International framework **
This PhD thesis will be integrated into *PARSEME* (PARsing and Multi-word Expressions), a European action funded by the COST program
(http://www.cost.eu/domains_actions/ict/Actions/IC1207). The Action's consortium gathers partners from 27 countries around scientific challenges
in the automatic processing of Multi-Word Expressions.
****************
** References **
Constant, M., Sigogne, A., and Watrin, P. (2012). Discriminative strategies to
integrate multiword expression recognition and parsing. In Proceedings of the 50th Annual Meeting
of the Association for Computational Linguistics: Long Papers - Volume 1, ACL’12, pages 204–212,
Stroudsburg, PA, USA. Association for Computational Linguistics.
Crabbé, B., Duchier, D., Gardent, C., Le Roux, J., and Parmentier, Y. (2013).
XMG: eXtensible MetaGrammar. Computational Linguistics, 39(3):1–38.
Duchier, D., Dao, T.-B.-H., and Parmentier, Y. (2013). Model-Theory and
Implementation of Property Grammar. Journal of Logic and Computation, pages 1–19. To appear.
Duchier, D., Parmentier, Y., and Petitjean, S. (2011b). Cross-framework
Grammar Engineering using Constraint-driven Metagrammars. In 6th International Workshop on
Constraint Solving and Language Processing (CSLP’11), pages 32–43, Karlsruhe, Germany.
Graliński, F., Savary, A., Czerepowicka, M., and Makowiecki, F. (2010). Computational Lexicography of Multi-Word Units: How Efficient Can It Be? In
Proceedings of the COLING-MWE’10 Workshop, Beijing, China.
Grégoire, N. (2010). DuELME: a Dutch electronic lexicon of multiword expressions.
Language Resources and Evaluation, 44(1-2).
Nivre, J. and Nilsson, J. (2004). Multiword Units in Syntactic Parsing. In
MEMURA 2004 - Methodologies and Evaluation of Multiword Units in Real-World Applications,
Workshop at LREC 2004, pages 39–46, Lisbon, Portugal.
Rayson, P., Piao, S., Sharoff, S., Evert, S., and Villada Moirón, B., editors (2010). Multiword expressions: hard going or plain sailing? Volume 44
of Language Resources and Evaluation. Springer.
Sag, I. A., Baldwin, T., Bond, F., Copestake, A., and Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of
CICLING’02. Springer.
Savary, A. (2008). Computational Inflection of Multi-Word Units. A contrastive study
of lexical approaches. Linguistic Issues in Language Technology, 1(2):1–53.
Savary, A., Zaborowski, B., Krawczyk-Wieczorek, A., and Makowiecki, F. (2012).
SEJFEK - a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units. In Proceedings
of Cognitive Aspects of the Lexicon (COGALEX-III), a Workshop at COLING 2012.
Tolone, E. and Sagot, B. (2011). Using Lexicon-Grammar tables for French
verbs in a large-coverage parser. In Vetulani, Z., editor, Human Language Technology. Challenges for
Computer Science and Linguistics. 4th Language and Technology Conference, LTC 2009, Poznań, Poland, November 6-8, 2009, Revised Selected Papers,
volume 6562 of Lecture Notes in Artificial Intelligence (LNAI), pages 183–191. Springer Verlag.
Tran, M. and Maurel, D. (2006). Prolexbase : Un dictionnaire relationnel
multilingue de noms propres. Traitement automatique des langues, 47(3):115–139.