16.2137, Qs: Lexical Bundles; German Wordlist with Hyphenation
LINGUIST List
linguist at linguistlist.org
Tue Jul 12 15:07:28 UTC 2005
LINGUIST List: Vol-16-2137. Tue Jul 12 2005. ISSN: 1068 - 4875.
Subject: 16.2137, Qs: Lexical Bundles; German Wordlist with Hyphenation
Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews (reviews at linguistlist.org)
Sheila Dooley, U of Arizona
Terry Langendoen, U of Arizona
Homepage: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Jessica Boynton <jessica at linguistlist.org>
================================================================
We'd like to remind readers that the responses to queries are usually
best posted to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list. This policy was
instituted to help control the huge volume of mail on LINGUIST; so we
would appreciate your cooperating with it whenever it seems appropriate.
In addition to posting a summary, we'd like to remind people that it
is usually a good idea to personally thank those individuals who have
taken the trouble to respond to the query.
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
===========================Directory==============================
1)
Date: 12-Jul-2005
From: Jennifer Eagleton < jenny at asian-emphasis.com >
Subject: Lexical Bundles
2)
Date: 11-Jul-2005
From: Gregor Sieber < gregor.sieber at student.uni-tuebingen.de >
Subject: German Wordlist with Hyphenation - New Spelling
-------------------------Message 1 ----------------------------------
Date: Tue, 12 Jul 2005 11:05:37
From: Jennifer Eagleton < jenny at asian-emphasis.com >
Subject: Lexical Bundles
Editor's note: Apologies for the delay in posting.
I notice that all of the studies I have read on this topic have focussed on
4-word bundles and that they have all used what I would call large corpora,
i.e. many millions of words. The rationale seems to be that with 5-word bundles
you do not get enough to analyse and that with 3-word bundles there are
probably too many to handle.
I want to do a study of bundles on a specific corpus I have, which has only
600,000 words. To be able to work with large numbers of bundles, it would
therefore make sense to focus on 3-word bundles. I could do a study on 4-word
bundles, but the sample would be smaller.
So my question is, do people see any disadvantages in focusing on 3-word bundles
and, if so, what might they be?
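To make the trade-off concrete, here is a minimal sketch of how 3-word and 4-word bundles could be counted and compared on a small corpus. It is an illustration only, not a method taken from the studies mentioned above. The function names, the whitespace tokenisation and the per-million-words cut-off are all assumptions introduced for the example.

```python
from collections import Counter

def count_bundles(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def frequent_bundles(tokens, n, min_per_million):
    """Keep bundles whose frequency per million words meets a cut-off.

    The cut-off is a hypothetical parameter; choose it to suit the study.
    """
    if not tokens:
        return {}
    counts = count_bundles(tokens, n)
    scale = 1_000_000 / len(tokens)  # converts raw counts to per-million rates
    return {b: c for b, c in counts.items() if c * scale >= min_per_million}

def raw_threshold(corpus_size, min_per_million):
    """Raw count that corresponds to a per-million cut-off in a given corpus."""
    return min_per_million * corpus_size / 1_000_000

if __name__ == "__main__":
    # Toy text standing in for the real corpus; tokenisation is naive.
    tokens = "on the other hand on the other side".split()
    for n in (3, 4):
        print(n, count_bundles(tokens, n).most_common(2))
    # An assumed cut-off of 20 per million in a 600,000-word corpus:
    print(raw_threshold(600_000, 20))
```

Running both bundle lengths through the same normalised cut-off gives a direct view of how many types survive at each length, which is the comparison the question turns on.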
Looking forward to hearing your responses.
-
ON BEHALF OF PROF. JOHN FLOWERDEW
DEPARTMENT OF ENGLISH AND COMMUNICATION
CITY UNIVERSITY OF HONG KONG
Linguistic Field(s): Text/Corpus Linguistics
-------------------------Message 2 ----------------------------------
Date: Tue, 12 Jul 2005 11:05:40
From: Gregor Sieber < gregor.sieber at student.uni-tuebingen.de >
Subject: German Wordlist with Hyphenation - New Spelling
I am a BA student in computational linguistics at the University of
Tübingen. For my BA thesis I am working on finite state patterns for German
following the work of Gosse Bouma (for Dutch). I want to use machine
learning algorithms to improve the results of the FS approach. For this
reason I am looking for a word list in the new German orthography that
contains hyphenation points and could be used as training data for the
algorithm (TBL). The CELEX list, which would have been a good resource, is
still in the old orthography.
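As a rough illustration of how such a word list could be turned into training data, the sketch below converts a hyphenated entry into the plain word plus one binary label per gap between letters. The entry format (a "-" at each hyphenation point) and the function name are assumptions for the example; a real list may mark breaks differently and may contain genuine hyphens that need separate handling.

```python
def hyphenation_labels(entry, sep="-"):
    """Turn a hyphenated entry into (word, labels).

    labels[i] is 1 if a hyphenation point follows letter i of the word,
    otherwise 0. There is one label per gap between adjacent letters.
    Assumes sep is used only to mark hyphenation points.
    """
    parts = entry.split(sep)
    word = "".join(parts)
    labels = [0] * max(len(word) - 1, 0)
    pos = 0
    for part in parts[:-1]:
        pos += len(part)
        if 0 < pos < len(word):  # ignore stray leading or trailing separators
            labels[pos - 1] = 1
    return word, labels

if __name__ == "__main__":
    # Illustrative entries in the new orthography (old: Zuk-ker, Fen-ster).
    for entry in ("Zu-cker", "Fens-ter"):
        print(hyphenation_labels(entry))
```

Each (word, labels) pair can then serve as one training instance, with the labels as the target a learner such as the TBL algorithm would be asked to reproduce.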
Thank you in advance for any hints about such a wordlist.
Best regards
Gregor Sieber
Linguistic Field(s): Computational Linguistics
-----------------------------------------------------------