[Corpora-List] Ratio of ambiguous tokens in Swedish, Danish and Norwegian
Hrafn Loftsson
HRAFN at ru.is
Thu Feb 15 13:19:57 UTC 2007
Hi everyone,
(It has been pointed out to me that, for some reason, my message to the
list appeared empty in some e-mail systems. Here is a second try:)
The paper J. Hajic (2000), "Morphological tagging: Data vs.
Dictionaries", reports percentages of ambiguous tokens for English
(38.65%), Czech (45.97%), Estonian (40.24%), Hungarian (21.58%),
Romanian (40.00%) and Slovene (38.01%), using an annotated version of
Orwell's 1984 novel for each of these languages.
I need the corresponding percentages for Swedish, Danish and
Norwegian, calculated using ANY corpus.
Does anyone have this info (and preferably a reference to a paper which
discusses the issue)?
Regards,
Hrafn Loftsson
Assistant professor
Department of Computer Science
School of Science and Engineering
Reykjavik University
Iceland