[Corpora-List] Fwd: Number of unique words in text for different languages

coffey at cli.unipi.it
Thu Aug 12 15:45:04 UTC 2010


Quoting Jim Fidelholtz <fidelholtz at gmail.com>:

> Hi all,
>
> As a disclaimer, I have not worked with any of the tokenizers. For the type
> of results originally reported, however, I do have a suggestion for a
> possible partial explanation, based on some experience with Spanish. There
> is a real stylistic rule in Spanish which makes speakers and especially
> writers avoid repeating the same 'content word' within the same or
> contiguous sentences or clauses, using instead a synonym or paraphrase.

... and the same is true for Italian.

Steve Coffey.

