[Corpora-List] Fwd: Number of unique words in text for different languages
    coffey at cli.unipi.it 
    Thu Aug 12 15:45:04 UTC 2010
    
    
  
Quoting Jim Fidelholtz <fidelholtz at gmail.com>:
> Hi all,
>
> As a disclaimer, I have not worked with any of the tokenizers. For the type
> of results originally reported, however, I do have a suggestion for a
> possible partial explanation, based on some experience with Spanish. There
> is a real stylistic rule in Spanish which makes speakers and especially
> writers avoid repeating the same 'content word' within the same or
> contiguous sentences or clauses, using instead a synonym or paraphrase.
... and the same is true for Italian.
Steve Coffey.