[Corpora-List] ANC Bigrams and Trigrams

Nancy Ide ide at cs.vassar.edu
Fri Feb 11 19:42:18 UTC 2005


We are generating bigram and trigram data from the ANC First Release,
which will very soon be available on the (new and improved) ANC
website. We have a question for those who might be interested in this
kind of data: is it useful to generate the data for word pairs/triples
that span sentence (or even paragraph) boundaries? Would there be any
advantage in our providing two sets of bigram and trigram data, one
that spans such boundaries and one that doesn't?
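
To make the two options concrete, here is a minimal sketch in Python of
how the two sets of counts would differ. It is only an illustration, not
the procedure used for the ANC data: the function names, the toy
sentences, and the assumption that the text is already tokenized and
split into sentences are all hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(sentences, n, cross_boundaries):
    """Count n-grams over a list of tokenized sentences.

    cross_boundaries=True treats the text as one token stream, so
    n-grams may straddle a sentence break; False counts each
    sentence separately.
    """
    counts = Counter()
    if cross_boundaries:
        stream = [tok for sent in sentences for tok in sent]
        counts.update(ngrams(stream, n))
    else:
        for sent in sentences:
            counts.update(ngrams(sent, n))
    return counts


# Toy input (hypothetical): two already-tokenized sentences.
sentences = [["the", "cat", "sat"], ["it", "purred"]]

within = count_ngrams(sentences, 2, cross_boundaries=False)
across = count_ngrams(sentences, 2, cross_boundaries=True)

# Bigrams that exist only because a sentence boundary was ignored.
boundary_only = across - within
```

In this toy case the only extra bigram in the boundary-spanning set is
("sat", "it"), which pairs the last word of one sentence with the first
word of the next. The same comparison applies to trigrams (n=3) and to
paragraph boundaries if paragraphs are passed in place of sentences.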

Thanks,
Nancy Ide


=======================================================

Nancy Ide

Professor of Computer Science
Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 845 437-5988 Fax: +1 845 437-7498
ide at cs.vassar.edu

Chercheur Associe
Equipe Langue et Dialogue, LORIA/CNRS
Campus Scientifique - BP 239
54506 Vandoeuvre-les-Nancy FRANCE
Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
ide at loria.fr

=======================================================


