[Corpora-List] Ratio of ambiguous tokens in Swedish, Danish and Norwegian
Joakim Nivre
nivre at msi.vxu.se
Thu Feb 15 13:31:40 UTC 2007
Hi Hrafn,
You can find some statistics about Swedish in our article:
Nivre, J. and Grönqvist, L. (2001) Tagging a Corpus of Spoken Swedish.
International Journal of Corpus Linguistics 6(1), 47-78.
A pre-print is available from my home page at:
http://w3.msi.vxu.se/~nivre/research/publ.html
The percentage of ambiguous tokens we get for the Stockholm-Umeå corpus is
45.37. However, this is measured with the base tag set, consisting of only
23 tags. With the full tag set, containing some 150 tags, the percentage
will be higher. This is one of the reasons why it is very difficult to
compare these figures across languages and corpora. You will find more
details in the paper. (The first place to look is table 1.)
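To illustrate why the size of the tag set matters, here is a minimal Python
sketch. It assumes that a token counts as ambiguous when its word form occurs
with more than one tag in the corpus itself; the paper may define this
differently. The function name, the toy data and the fine-to-coarse tag
mapping are all invented for illustration and are not taken from the
Stockholm-Umeå corpus.

```python
from collections import defaultdict


def ambiguity_rate(tagged_tokens, tag_map=None):
    """Return the percentage of tokens whose word form has more than one tag.

    tagged_tokens: list of (word_form, tag) pairs.
    tag_map: optional dict mapping fine-grained tags to coarser ones
             (hypothetical; tags missing from the map are kept as-is).
    """
    tags_per_form = defaultdict(set)
    forms = []
    for form, tag in tagged_tokens:
        if tag_map is not None:
            tag = tag_map.get(tag, tag)
        forms.append(form)
        tags_per_form[form].add(tag)
    if not forms:
        return 0.0
    # A token is ambiguous if its form was seen with two or more tags.
    ambiguous = sum(1 for form in forms if len(tags_per_form[form]) > 1)
    return 100.0 * ambiguous / len(forms)


# Toy data with invented tags, for illustration only.
toy = [
    ("en", "DT"), ("man", "NN.SG"), ("ser", "VB.PRS"), ("en", "PN"),
    ("ser", "VB.PRS"), ("man", "NN.SG"), ("var", "VB.PRT"), ("var", "VB.IMP"),
]
# Hypothetical mapping from the fine tags above to a coarser base tag set.
coarse = {"NN.SG": "NN", "VB.PRS": "VB", "VB.PRT": "VB", "VB.IMP": "VB"}

fine_rate = ambiguity_rate(toy)            # 50.0
coarse_rate = ambiguity_rate(toy, coarse)  # 25.0
```

In this toy example the form "var" is ambiguous only under the fine tags,
so collapsing them into a coarser set lowers the figure from 50% to 25%.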
Best,
Joakim
On Thu, 15 Feb 2007, Hrafn Loftsson wrote:
> Hi everyone,
>
> (It has been pointed out to me that, for some reason, my message to the
> list appeared empty in some e-mail systems. Here is a second try:)
>
> The paper: "J. Hajic (2000) Morphological tagging: Data vs.
> Dictionaries", reports percentages of ambiguous tokens for English
> (38.65%), Czech (45.97%), Estonian (40.24%), Hungarian (21.58%),
> Romanian (40.00%) and Slovene (38.01%), using an annotated version of
> Orwell's 1984 novel for each of these languages.
>
> I need the corresponding percentages for Swedish, Danish and
> Norwegian, calculated using ANY corpora.
>
> Does anyone have this info (and preferably a reference to a paper which
> discusses the issue)?
>
> Regards,
>
> Hrafn Loftsson
> Assistant professor
> Department of Computer Science
> School of Science and Engineering
> Reykjavik University
> Iceland
>
==================================================================
Joakim Nivre
Växjö University                Uppsala University
School of Mathematics           Department of Linguistics
and Systems Engineering         and Philology
SE-35195 Växjö                  Box 635, SE-75126 Uppsala
Tel: +46 470 708992             Tel: +46 18 4717009
Fax: +46 470 84004              Fax: +46 18 4711094
E-mail: nivre at msi.vxu.se     E-mail: joakim.nivre at lingfil.uu.se
URL: http://www.msi.vxu.se/users/nivre
==================================================================