<p>L</p>
<div class="gmail_quote">On Jul 30, 2012 12:00 PM, <<a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Today's Topics:<br>
<br>
1. Querying Dependency-Annotated Corpora (Niels Ott)<br>
2. Multilingual Machine Translation and Text Mining Position at<br>
the Joint Research Centre - European Commission (marco turchi)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Mon, 30 Jul 2012 10:58:51 +0200<br>
From: Niels Ott <<a href="mailto:nott@sfs.uni-tuebingen.de">nott@sfs.uni-tuebingen.de</a>><br>
Subject: [Corpora-List] Querying Dependency-Annotated Corpora<br>
To: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
<br>
Dear Corpora People,<br>
<br>
I spent some time googling for a tool that allows one to explore and query<br>
huge dependency-annotated corpora. By "huge" I'm referring to something<br>
as large as sDeWaC (~44M sentences), annotated the way MaltParser would<br>
do it automagically. I found no such tool.<br>
<br>
How do people search for things in dependency treebanks?<br>
<br>
Thanks for your time and help.<br>
<br>
Best<br>
<br>
Niels Ott<br>
<br>
<br>
--<br>
Niels Ott (M.A.), Computational Linguist<br>
SFB 833 "Bedeutungskonstitution", Projekt A4, Universität Tübingen<br>
<a href="http://www.sfs.uni-tuebingen.de/~nott" target="_blank">http://www.sfs.uni-tuebingen.de/~nott</a><br>
<br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Mon, 30 Jul 2012 11:28:34 +0200<br>
From: marco turchi <<a href="mailto:marco.turchi@gmail.com">marco.turchi@gmail.com</a>><br>
Subject: [Corpora-List] Multilingual Machine Translation and Text<br>
Mining Position at the Joint Research Centre - European Commission<br>
To: <a href="mailto:mt-list@eamt.org">mt-list@eamt.org</a><br>
Cc: moses-support <<a href="mailto:moses-support@mit.edu">moses-support@mit.edu</a>>, <a href="mailto:elsnet-list@elsnet.org">elsnet-list@elsnet.org</a>,<br>
<a href="mailto:dbworld@cs.wisc.edu">dbworld@cs.wisc.edu</a>, <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
<br>
On behalf of the Optima Team at the Joint Research Centre - European<br>
Commission<br>
<br>
==================================================================<br>
<br>
Please pass on to any potentially interested parties.<br>
Apologies for cross-posting.<br>
<br>
==================================================================<br>
<br>
The Optima Team at the Joint Research Centre - JRC - European Commission is<br>
currently looking for a Postdoctoral Researcher in the fields of<br>
Multilingual Machine Translation and Text Mining.<br>
<br>
The successful candidate will help improve and extend several text mining<br>
applications, but to a large extent s/he will work on improving and<br>
extending the JRC's in-house machine translation (MT) system ONTS (OPTIMA<br>
News Translation System). ONTS is a predominantly statistical MT system<br>
based on Moses, but it additionally makes use of JRC's in-house resources<br>
(e.g. lists of person names and their variants across different languages<br>
and scripts). To date, ONTS has been trained for 11 language pairs (all<br>
into English). See <a href="http://optima.jrc.it/Translate/" target="_blank">http://optima.jrc.it/Translate/</a> for a demo of the<br>
current status of ONTS and the publication "ONTS: OPTIMA News Translation<br>
System" (Turchi et al., EACL'2012) for a technical description of the work<br>
carried out so far.<br>
<br>
Possible research avenues related to improving the machine translation<br>
results include (a) finding and gathering more training data; (b)<br>
exploiting comparable news collections to improve the MT performance; and<br>
(c) investigating pre-processing techniques for morphologically complex<br>
languages. A further possible research avenue is (d) using MT<br>
technology to improve other text mining tools, e.g. event extraction,<br>
cross-lingual linking of related news or multilingual document<br>
categorisation. Much will be left to the initiative of the candidate as<br>
long as the efforts are targeted towards the objective of the OPTIMA action.<br>
<br>
The system within which the results will be deployed is implemented in Java<br>
as a set of servlets in Tomcat and the data processing chain makes<br>
extensive use of mark-up languages. Java programming skills and experience<br>
with mark-up languages are therefore required.<br>
<br>
Qualifications:<br>
<br>
- University degree in computational linguistics, computer science or<br>
related areas;<br>
- Doctoral degree in a similar discipline, or equivalent work experience<br>
of 5 years;<br>
- Good written and spoken English language skills are required. Given<br>
the strong focus on multilinguality in the work, at least passive knowledge<br>
of other languages is also required.<br>
- Programming skills in Java;<br>
- Hands-on experience with Moses or other, similar statistical machine<br>
translation engines;<br>
- Experience in an application-oriented setting would be beneficial;<br>
- Ability to write scientific publications;<br>
- Team player, proactive in research, as well as an ability to work<br>
independently and to communicate efficiently.<br>
<br>
Indicative duration: 36 months<br>
Preferred starting date: ASAP<br>
<br>
JRC site: Ispra, Italy<br>
<br>
**CLOSING DATE FOR APPLICATIONS: 16/09/2012 23:59 CET**<br>
<br>
Further Information: <a href="http://recruitment.jrc.ec.europa.eu/?type=GH" target="_blank">http://recruitment.jrc.ec.europa.eu/?type=GH</a><br>
Code: 2012-IPR-G-30-000-00481 - CAT 30 - ISPRA<br>
<br>
----------------------------------------------------------------------<br>
Send Corpora mailing list submissions to<br>
<a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:corpora-owner@uib.no">corpora-owner@uib.no</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of Corpora digest..."<br>
<br>
<br>
<br>
End of Corpora Digest, Vol 61, Issue 28<br>
***************************************<br>
</blockquote></div>