<p>L</p>

<div class="gmail_quote">On Jul 30, 2012 12:00 PM,  <<a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Today's Topics:<br>

<br>

   1.  Querying Dependency-Annotated Corpora (Niels Ott)<br>

   2.  Multilingual Machine Translation and Text Mining Position at<br>

      the Joint Research Centre - European Commission (marco turchi)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Mon, 30 Jul 2012 10:58:51 +0200<br>

From: Niels Ott <<a href="mailto:nott@sfs.uni-tuebingen.de">nott@sfs.uni-tuebingen.de</a>><br>

Subject: [Corpora-List] Querying Dependency-Annotated Corpora<br>

To: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>

<br>

Dear Corpora People,<br>

<br>

I spent some time googling for a tool that allows one to explore and query<br>

huge dependency-annotated corpora. By huge I am referring to something<br>

as large as sDeWaC (~44M sentences), annotated the way MaltParser would<br>

do it automagically. I found no such tool.<br>

<br>

How do people search for things in dependency treebanks?<br>

<br>

Thanks for your time and help.<br>

<br>

Best<br>

<br>

   Niels Ott<br>

<br>

<br>

--<br>

Niels Ott (M.A.), Computational Linguist<br>

SFB 833 "Bedeutungskonstitution", Projekt A4, Universität Tübingen<br>

<a href="http://www.sfs.uni-tuebingen.de/~nott" target="_blank">http://www.sfs.uni-tuebingen.de/~nott</a><br>

<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Mon, 30 Jul 2012 11:28:34 +0200<br>

From: marco turchi <<a href="mailto:marco.turchi@gmail.com">marco.turchi@gmail.com</a>><br>

Subject: [Corpora-List] Multilingual Machine Translation and Text<br>

        Mining Position at the Joint Research Centre - European Commission<br>

To: <a href="mailto:mt-list@eamt.org">mt-list@eamt.org</a><br>

Cc: moses-support <<a href="mailto:moses-support@mit.edu">moses-support@mit.edu</a>>, <a href="mailto:elsnet-list@elsnet.org">elsnet-list@elsnet.org</a>,<br>

        <a href="mailto:dbworld@cs.wisc.edu">dbworld@cs.wisc.edu</a>, <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>

<br>

On behalf of the Optima Team at the Joint Research Centre - European<br>

Commission<br>

<br>

==================================================================<br>

<br>

Please pass on to any potentially interested parties.<br>

Apologies for cross-posting.<br>

<br>

==================================================================<br>

<br>

The Optima Team at the Joint Research Centre - JRC - European Commission is<br>

currently looking for a Postdoctoral Researcher in the fields of<br>

Multilingual Machine Translation and Text Mining.<br>

<br>

The successful candidate will help improve and extend several text mining<br>

applications, but to a large extent s/he will work on improving and<br>

extending the JRC's in-house machine translation (MT) system ONTS (OPTIMA<br>

News Translation System). ONTS is a predominantly statistical MT system<br>

based on Moses, but it additionally makes use of JRC's in-house resources<br>

(e.g. lists of person names and their variants across different languages<br>

and scripts). To date, ONTS has been trained for 11 language pairs (all<br>

into English). See <a href="http://optima.jrc.it/Translate/" target="_blank">http://optima.jrc.it/Translate/</a> for a demo of the<br>

current status of ONTS and the publication "ONTS: OPTIMA News Translation<br>

System" (Turchi et al., EACL 2012) for a technical description of the work<br>

carried out so far.<br>

<br>

Possible research avenues related to improving the machine translation<br>

results include (a) finding and gathering more training data; (b)<br>

exploiting comparable news collections to improve the MT performance; and<br>

(c) investigating pre-processing techniques for morphologically complex<br>

languages. A further possible research avenue is (d) using MT<br>

technology to improve other text mining tools, e.g. event extraction,<br>

cross-lingual linking of related news or multilingual document<br>

categorisation. Much will be left to the initiative of the candidate as<br>

long as the efforts are targeted towards the objective of the OPTIMA action.<br>

<br>

The system within which the results will be deployed is implemented in Java<br>

as a set of servlets in Tomcat and the data processing chain makes<br>

extensive use of mark-up languages. Java programming skills and experience<br>

with mark-up languages are therefore required.<br>

<br>

Qualifications:<br>

<br>

   - University degree in computational linguistics, computer science or<br>

   related areas;<br>

   - Doctoral degree in a similar discipline, or equivalent work experience<br>

   of 5 years;<br>

   - Good written and spoken English language skills are required. Given<br>

   the strong focus on multilinguality in the work, at least passive knowledge<br>

   of other languages is also required;<br>

   - Programming skills in Java;<br>

   - Hands-on experience with Moses or other, similar statistical machine<br>

   translation engines;<br>

   - Experience in an application-oriented setting would be beneficial;<br>

   - Ability to write scientific publications;<br>

   - Team player, proactive in research, as well as an ability to work<br>

   independently and to communicate efficiently.<br>

<br>

Indicative duration: 36 months<br>

Preferred starting date: ASAP<br>

<br>

JRC site: Ispra, Italy<br>

<br>

**CLOSING DATE FOR APPLICATIONS: 16/09/2012 23:59 CET**<br>

<br>

Further Information: <a href="http://recruitment.jrc.ec.europa.eu/?type=GH" target="_blank">http://recruitment.jrc.ec.europa.eu/?type=GH</a><br>

Code: 2012-IPR-G-30-000-00481 - CAT 30 - ISPRA<br>


<br>

----------------------------------------------------------------------<br>

Send Corpora mailing list submissions to<br>

        <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:corpora-owner@uib.no">corpora-owner@uib.no</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of Corpora digest..."<br>

<br>

<br>


<br>

<br>

End of Corpora Digest, Vol 61, Issue 28<br>

***************************************<br>

</blockquote></div>