16.2137, Qs: Lexical Bundles; German Wordlist with Hyphenation

LINGUIST List linguist at linguistlist.org
Tue Jul 12 15:07:28 UTC 2005


LINGUIST List: Vol-16-2137. Tue Jul 12 2005. ISSN: 1068 - 4875.

Subject: 16.2137, Qs: Lexical Bundles; German Wordlist with Hyphenation

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Jessica Boynton <jessica at linguistlist.org>
================================================================  

We'd like to remind readers that the responses to queries are usually
best posted to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list. This policy was
instituted to help control the huge volume of mail on LINGUIST; so we
would appreciate your cooperating with it whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it
is usually a good idea to personally thank those individuals who have
taken the trouble to respond to the query.

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================  

1)
Date: 12-Jul-2005
From: Jennifer Eagleton < jenny at asian-emphasis.com >
Subject: Lexical Bundles 

2)
Date: 11-Jul-2005
From: Gregor Sieber < gregor.sieber at student.uni-tuebingen.de >
Subject: German Wordlist with Hyphenation - New Spelling 

	
-------------------------Message 1 ---------------------------------- 
Date: Tue, 12 Jul 2005 11:05:37
From: Jennifer Eagleton < jenny at asian-emphasis.com >
Subject: Lexical Bundles 
 

Editor's note: Apologies for the delay in posting.

I notice that all of the studies I have read on this topic have focussed on
4-word bundles and that they have all used what I would call large corpora,
i.e. many millions of words. The rationale seems to be that with 5-word bundles
you do not get enough to analyse and that with 3-word bundles there are
probably too many to handle.

I want to do a study of bundles on a specific corpus I have, which has only
600,000 words. To be able to work with large numbers of bundles, it would
therefore make sense to focus on 3-word bundles. I could do a study on 4-word
bundles, but the sample would be smaller.
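
To make the trade-off concrete, here is a minimal Python sketch of how the
number of 3-word and 4-word bundle types could be compared on one corpus. The
function names, the whitespace tokenisation, and the cut-off of 40 occurrences
per million words are all assumptions made for illustration, not values taken
from any particular study. At that assumed cut-off, a 600,000-word corpus would
require a bundle to occur at least 24 times (40 x 600,000 / 1,000,000).

```python
from collections import Counter


def count_bundles(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def frequent_bundles(tokens, n, per_million=40):
    """Return the bundles whose normalised frequency meets the cut-off.

    per_million is an assumed threshold chosen for illustration only.
    """
    counts = count_bundles(tokens, n)
    # Convert the per-million cut-off into a raw count for this corpus size.
    min_count = per_million * len(tokens) / 1_000_000
    return {bundle: c for bundle, c in counts.items() if c >= min_count}


# Toy data; a real corpus would be read and tokenised from file.
tokens = "on the other hand it is on the other hand we see".split()
for n in (3, 4):
    kept = frequent_bundles(tokens, n, per_million=0)
    print(n, "word bundles:", len(kept), "types")
```

Running the same comparison on the real corpus with one fixed cut-off would
show how much smaller the 4-word sample is than the 3-word one.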

So my question is, do people see any disadvantages in focusing on 3-word
bundles and, if so, what might they be?

Looking forward to hearing your responses.

- 
ON BEHALF OF PROF. JOHN FLOWERDEW
DEPARTMENT OF ENGLISH AND COMMUNICATION
CITY UNIVERSITY OF HONG KONG 

Linguistic Field(s): Text/Corpus Linguistics



	
-------------------------Message 2 ---------------------------------- 
Date: Tue, 12 Jul 2005 11:05:40
From: Gregor Sieber < gregor.sieber at student.uni-tuebingen.de >
Subject: German Wordlist with Hyphenation - New Spelling 

	

I am a BA student in computational linguistics at the University of
Tübingen. For my BA thesis I am working on finite state patterns for German
following the work of Gosse Bouma (for Dutch). I want to use machine
learning algorithms to improve the results of the FS approach. For this
reason I am looking for a word list in the new German orthography that
contains hyphenation points and could be used as training data for the
algorithm (TBL). The CELEX list, which would have been a good resource, is
still in the old orthography.

Thank you in advance for any hints about such a wordlist.

Best regards

Gregor Sieber 

Linguistic Field(s): Computational Linguistics


 



-----------------------------------------------------------
LINGUIST List: Vol-16-2137	

	


