[Corpora-List] ANC Bigrams and Trigrams

Alex Murzaku lissus at gmail.com
Mon Feb 14 14:07:20 UTC 2005


I am working with pronouns in Albanian and I am using trigram data. I
would suppose that for anyone dealing with anaphora phenomena,
n-grams that span sentence/paragraph boundaries would be useful. "This"
[antecedent="useful"] would be true for any language... but I expect its
use to be limited, so perhaps it would make more sense to provide this
data separately.
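As a rough illustration of the two kinds of data under discussion, here is a minimal Python sketch. It assumes the text has already been tokenized and split into sentences; the function name `ngram_counts` and the toy sentences are hypothetical and are not part of any ANC tooling.

```python
from collections import Counter

def ngram_counts(sentences, n, span_boundaries):
    """Count n-grams over a list of tokenized sentences (hypothetical helper).

    If span_boundaries is True, the sentences are concatenated first, so
    n-grams may cross sentence boundaries; otherwise each sentence is
    counted on its own and no n-gram crosses a boundary.
    """
    counts = Counter()
    if span_boundaries:
        sequences = [[tok for sent in sentences for tok in sent]]
    else:
        sequences = sentences
    for seq in sequences:
        # Slide a window of length n over the sequence.
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

# Toy input: two already-tokenized sentences.
sentences = [["I", "saw", "it"], ["This", "was", "useful"]]
within = ngram_counts(sentences, 2, span_boundaries=False)
spanning = ngram_counts(sentences, 2, span_boundaries=True)
```

Here `within` holds four bigrams, while `spanning` additionally contains the cross-boundary bigram `("it", "This")`, which is exactly the kind of pair that matters for anaphora. Keeping the data separate, as suggested above, could amount to distributing the within-sentence counts plus the difference `spanning - within`.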


On Fri, 11 Feb 2005 14:42:18 -0500, Nancy Ide <ide at cs.vassar.edu> wrote:
> We are generating bigram and trigram data from the ANC First Release,
> which will very soon be available on the (new and improved) ANC
> website. We have a question for those who might be interested in this
> kind of data:  is it useful to generate the data for word pairs/triples
> that span sentence (or even paragraph) boundaries? Is there any
> advantage if we provide two sets of the bigram and trigram data, one
> that spans such boundaries and one that doesn't?
>
> Thanks,
> Nancy Ide
>
> =======================================================
>
> Nancy Ide
>
> Professor of Computer Science
> Vassar College
> Poughkeepsie, NY 12604-0520 USA
> Tel: +1 845 437-5988 Fax: +1 845 437-7498
> ide at cs.vassar.edu
>
> Chercheur Associé (Associate Researcher)
> Equipe Langue et Dialogue, LORIA/CNRS
> Campus Scientifique - BP 239
> 54506 Vandoeuvre-les-Nancy FRANCE
> Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
> ide at loria.fr
>
> =======================================================