[Corpora-List] Corpora Digest, Vol 61, Issue 28

Maarten Janssen maartenpt at gmail.com
Mon Jul 30 16:11:11 UTC 2012


L
On Jul 30, 2012 12:00 PM, <corpora-request at uib.no> wrote:

> Today's Topics:
>
>    1.  Querying Dependency-Annotated Corpora (Niels Ott)
>    2.  Multilingual Machine Translation and Text Mining Position at
>       the Joint Research Centre - European Commission (marco turchi)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 30 Jul 2012 10:58:51 +0200
> From: Niels Ott <nott at sfs.uni-tuebingen.de>
> Subject: [Corpora-List] Querying Dependency-Annotated Corpora
> To: corpora at uib.no
>
> Dear Corpora People,
>
> I spent some time googling for a tool that allows one to explore and query
> huge dependency-annotated corpora. By huge I'm referring to something
> as large as sDeWaC (~44M sentences), annotated the way MaltParser would
> do it automagically. I found no such tool.
>
> How do people search for things in dependency treebanks?
>
> Thanks for your time and help.
>
> Best
>
>    Niels Ott
>
>
> --
> Niels Ott (M.A.), Computational Linguist
> SFB 833 "Bedeutungskonstitution", Projekt A4, Universität Tübingen
> http://www.sfs.uni-tuebingen.de/~nott
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 30 Jul 2012 11:28:34 +0200
> From: marco turchi <marco.turchi at gmail.com>
> Subject: [Corpora-List] Multilingual Machine Translation and Text
>         Mining Position at the Joint Research Centre - European Commission
> To: mt-list at eamt.org
> Cc: moses-support <moses-support at mit.edu>, elsnet-list at elsnet.org,
>         dbworld at cs.wisc.edu, corpora at uib.no
>
> On behalf of the Optima Team at the Joint Research Centre - European
> Commission
>
> ==================================================================
>
> Please pass onto any potentially interested parties.
> Apologies for cross-posting.
>
> ==================================================================
>
> The Optima Team at the Joint Research Centre - JRC - European Commission is
> currently looking for a Postdoctoral Researcher in the fields of
> Multilingual Machine Translation and Text Mining.
>
> The successful candidate will help improve and extend several text mining
> applications, but to a large extent s/he will work on improving and
> extending the JRC's in-house machine translation (MT) system ONTS (OPTIMA
> News Translation System). ONTS is a predominantly statistical MT system
> based on Moses, but it additionally makes use of JRC's in-house resources
> (e.g. lists of person names and their variants across different languages
> and scripts). To date, ONTS has been trained for 11 language pairs (all
> into English). See http://optima.jrc.it/Translate/ for a demo of the
> current status of ONTS and the publication "ONTS: OPTIMA News Translation
> System" (Turchi et al., EACL'2012) for a technical description of the work
> carried out so far.
>
> Possible research avenues related to improving the machine translation
> results include (a) finding and gathering more training data; (b)
> exploiting comparable news collections to improve the MT performance; and
> (c) investigating pre-processing techniques for morphologically complex
> languages. Further possible research avenues are related to using MT
> technology (d) to improve other text mining tools, e.g. event extraction,
> cross-lingual linking of related news or multilingual document
> categorisation. Much will be left to the initiative of the candidate as
> long as the efforts are targeted towards the objective of the OPTIMA
> action.
>
> The system within which the results will be deployed is implemented in Java
> as a set of servlets in Tomcat and the data processing chain makes
> extensive use of mark-up languages. Java programming skills and experience
> with mark-up languages are therefore required.
>
> Qualifications:
>
>    - University degree in computational linguistics, computer science or
>    related areas;
>    - Doctoral degree in a similar discipline, or equivalent work experience
>    of 5 years;
>    - Good written and spoken English language skills are required.  Given
>    the strong focus on multilinguality in the work, at least passive
>    knowledge of other languages is also required.
>    - Programming skills in Java;
>    - Hands-on experience with Moses or other, similar statistical machine
>    translation engines;
>    - Experience in an application-oriented setting would be beneficial;
>    - Ability to write scientific publications;
>    - Team player, proactive in research, as well as an ability to work
>    independently and to communicate efficiently.
>
> Indicative duration: 36 months
> Preferred starting date: ASAP
>
> JRC site: Ispra, Italy
>
> **CLOSING DATE FOR APPLICATIONS: 16/09/2012 23:59 CET**
>
> Further Information: http://recruitment.jrc.ec.europa.eu/?type=GH
> Code: 2012-IPR-G-30-000-00481 - CAT 30 - ISPRA
>
> ----------------------------------------------------------------------
> Send Corpora mailing list submissions to
>         corpora at uib.no
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mailman.uib.no/listinfo/corpora
> or, via email, send a message with subject or body 'help' to
>         corpora-request at uib.no
>
> You can reach the person managing the list at
>         corpora-owner at uib.no
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Corpora digest..."
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
> End of Corpora Digest, Vol 61, Issue 28
> ***************************************
>