Corpora: ngram frequencies with intervening words?
Hristo Tanev
htanev at yahoo.co.uk
Tue Apr 24 05:05:10 UTC 2001
Dear All,
This topic is interesting, and it raises another question
for me.
Does anyone know of a study of non-strict grammars,
in which a gap is allowed between constituents?
I am thinking of grammars that tolerate POS-tagger errors.
Such grammars allow a bounded gap between
constituents, provided the gap contains none of a given
set of POS tags and punctuation marks.
For example, the rule
NP -> AP N
could be translated into the "fuzzy" rule
NP -> AP (X) N
where length(X) <= 2 and X does not contain V, N, A or ",".
This way, some POS-tagger errors inside the sequence X
could be ignored.
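To make the fuzzy rule concrete, here is a minimal Python sketch of how it could be matched. It assumes the input is already a flat sequence of symbols in which each adjective phrase has been chunked into a single "AP" symbol; the function name find_fuzzy_np, the max_gap parameter and the sample tag names (DET, ADV) are illustrative assumptions, not part of any existing grammar formalism or tool.

```python
# Assumption: tags X may not contain, taken from the rule above.
FORBIDDEN = frozenset({"V", "N", "A", ","})


def find_fuzzy_np(symbols, max_gap=2, forbidden=FORBIDDEN):
    """Return (ap_index, n_index) pairs matching NP -> AP (X) N.

    X is the run of symbols between AP and N; it may hold at most
    max_gap symbols, none of which may be in `forbidden`.
    """
    matches = []
    for i, symbol in enumerate(symbols):
        if symbol != "AP":
            continue
        # Try gap lengths 0, 1, ..., max_gap after the AP.
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j >= len(symbols):
                break
            if symbols[j] == "N":
                # Everything between i and j has already passed the check.
                matches.append((i, j))
                break
            if symbols[j] in forbidden:
                # A forbidden tag inside X blocks the match.
                break
    return matches


if __name__ == "__main__":
    # One stray ADV between AP and N is tolerated.
    print(find_fuzzy_np(["DET", "AP", "ADV", "N", "V"]))  # [(1, 3)]
    # A verb inside the gap blocks the rule.
    print(find_fuzzy_np(["AP", "V", "N"]))  # []
```

With max_gap=0 the function reduces to the strict rule NP -> AP N, so the same code can be used to compare strict and fuzzy matching on the same tagged text.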
I haven't read about such grammars, but that doesn't
mean they don't exist. I am also not 100%
convinced they are effective, but it is an interesting idea,
isn't it?
Best wishes,
Hristo Tanev
--- Bruce Lambert <lambertb at uic.edu> wrote:
> Greetings,
>
> In the simplest case, when we compute ngram word frequencies, we consider
> adjacent words as ngrams. But we may also want to know about pairs of words
> that occur within n words of one another. Is there a program out there to
> compute ngram frequencies allowing a variable-width window between the
> words in the bigram? Ideally, the program would allow the user to rank the
> bigrams not only by bigram frequency, but also by the frequency of the
> intervening word patterns. For example, in a database of eighth grade
> science lessons, the bigram "atom smallest" might occur several times in
> different contexts. I'd like output approximately as follows:
>
> atom smallest (3) (1 "was the") (2 "is the")
>
> Indicating that the bigram "atom smallest" with window size 2 occurred 3
> times total, once with the intervening words "was the" and twice with the
> intervening words "is the".
>
> I can think of a brute force way to do this myself, of course, but I'd
> rather not reinvent the wheel if I can avoid it.
>
> -bruce
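
The brute-force approach mentioned in the quoted message could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the text is already tokenized into a list of lowercase words, the window is counted in intervening words (max_gap), and sentence boundaries are ignored. The names gapped_bigrams and format_entry are hypothetical and do not come from any existing package.

```python
from collections import Counter, defaultdict


def gapped_bigrams(tokens, max_gap=2):
    """Count word pairs separated by 0..max_gap intervening words.

    Returns a dict mapping (left, right) to a Counter of the
    intervening word strings ("" for adjacent words).
    """
    counts = defaultdict(Counter)
    for i, left in enumerate(tokens):
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j >= len(tokens):
                break
            between = " ".join(tokens[i + 1:j])
            counts[(left, tokens[j])][between] += 1
    return counts


def format_entry(pair, patterns):
    """Render one pair in the style requested in the quoted message."""
    total = sum(patterns.values())
    # Least frequent intervening pattern first, ties broken alphabetically.
    ordered = sorted(patterns.items(), key=lambda kv: (kv[1], kv[0]))
    parts = " ".join(f'({n} "{p}")' for p, n in ordered)
    return f"{pair[0]} {pair[1]} ({total}) {parts}"


if __name__ == "__main__":
    text = ("the atom is the smallest unit . an atom is the smallest part . "
            "the atom was the smallest")
    counts = gapped_bigrams(text.split(), max_gap=2)
    pair = ("atom", "smallest")
    print(format_entry(pair, counts[pair]))
    # atom smallest (3) (1 "was the") (2 "is the")
```

Ranking pairs by total frequency is then a matter of sorting the dictionary by sum(patterns.values()). Memory grows with the number of distinct pairs, so for a large corpus the pairs of interest would probably need to be filtered first.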