21.221, Diss: Comp Ling: Junczys-Dowmunt: 'German Compound Nouns and their...'

linguist at LINGUISTLIST.ORG
Thu Jan 14 15:51:23 UTC 2010


LINGUIST List: Vol-21-221. Thu Jan 14 2010. ISSN: 1068 - 4875.

Subject: 21.221, Diss: Comp Ling: Junczys-Dowmunt: 'German Compound Nouns and their...'

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Monica Macaulay, U of Wisconsin-Madison  
Eric Raimy, U of Wisconsin-Madison  
Joseph Salmons, U of Wisconsin-Madison  
Anja Wanner, U of Wisconsin-Madison  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Di Wdzenczny <di at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 11-Jan-2010
From: Marcin Junczys-Dowmunt < junczys at amu.edu.pl >
Subject: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora
 

	
-------------------------Message 1 ---------------------------------- 
Date: Thu, 14 Jan 2010 10:49:33
From: Marcin Junczys-Dowmunt [junczys at amu.edu.pl]
Subject: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora

  


Institution: Adam Mickiewicz University 
Program: Linguistics Program 
Dissertation Status: Completed 
Degree Date: 2009 

Author: Marcin Junczys-Dowmunt

Dissertation Title: German Compound Nouns and their Polish Equivalents:
Automatic extraction, analysis and verification based on
parallel corpora 

Dissertation URL:  http://www.staff.amu.edu.pl/~junczys/index.php?title=Publications

Linguistic Field(s): Computational Linguistics

Subject Language(s): German, Standard (deu)
                     Polish (pol)


Dissertation Director(s):
Krzysztof Jassem
Jerzy Pogonowski

Dissertation Abstract:

We apply methods first used for statistical machine translation to the
automatic extraction and analysis of German compound nouns and their Polish
equivalents.
A large German-Polish parallel corpus is used as the main source of data.
In the course of this work, several new applications are developed and
described. With the help of these applications, a set of more than 140,000
unique German compound nouns is created, for which we are able to identify
more than 200,000 unique Polish counterparts in the corpus.
From this data we extract several subsets of equivalence pairs that have
been filtered either automatically or semi-automatically. Additionally, one
manually verified subset of equivalence pairs is created. These data sets
serve as reference material for the verification of several claims from
other works in contrastive linguistics that based their results on much
smaller amounts of data. In addition, we supply information about the
quantitative distribution of German compound nouns and their Polish
equivalents, which has not been provided in any earlier work.



