[Corpora-List] Java-based chunk parser?

Edward Ivanovic edwardi at csse.unimelb.edu.au
Thu Nov 26 04:55:04 UTC 2009


Dear Colleagues,

I'm looking for a Java-based tool that will let me define a simple
grammar based on regular expressions to parse a given string.  For
example:

"2q34w-6q8w 5q8w-11q87w" (etc)

Could be parsed by the following rules:

A: \d+q
B: \d+w
C: <A><B>\-<A><B>
D: <C>( <C>)*

mixing regex with my own labels (A,B,C,D).  The actual syntax for the
rules isn't important.

Parsing D will then give me the groupings for C (of which there will
be two), and access to the other labels.
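
For illustration, here is a minimal sketch of how labelled rules like these
could be layered on top of plain java.util.regex. The class and method names
(LabelGrammar, addRule, matches, findAll) are hypothetical and not taken from
any existing library. The sketch assumes C is written as <A><B>\-<A><B> and D
as <C>( <C>)*, because in the example string each side of the hyphen is an A
followed by a B, and the last C has no trailing space. It also assumes rules
are added in dependency order, and it will misread regex syntax that itself
looks like <name>, such as named groups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands <LABEL> references into plain regex text.
final class LabelGrammar {
    // A label reference looks like <A>, <B>, <Foo> (letters only).
    private static final Pattern REF = Pattern.compile("<([A-Za-z]+)>");

    private final Map<String, String> expanded = new LinkedHashMap<>();
    private final Map<String, Pattern> compiled = new LinkedHashMap<>();

    // Assumption: a rule may only reference labels that were added earlier.
    void addRule(String label, String body) {
        Matcher m = REF.matcher(body);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String ref = expanded.get(m.group(1));
            if (ref == null) {
                throw new IllegalArgumentException("Unknown label: " + m.group(1));
            }
            // Wrap in a non-capturing group so quantifiers apply to the whole rule.
            m.appendReplacement(sb, Matcher.quoteReplacement("(?:" + ref + ")"));
        }
        m.appendTail(sb);
        expanded.put(label, sb.toString());
        compiled.put(label, Pattern.compile(sb.toString()));
    }

    // True if the whole input is an instance of the given label.
    boolean matches(String label, String input) {
        return compiled.get(label).matcher(input).matches();
    }

    // All non-overlapping substrings of the input that match the given label.
    List<String> findAll(String label, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = compiled.get(label).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        LabelGrammar g = new LabelGrammar();
        g.addRule("A", "\\d+q");
        g.addRule("B", "\\d+w");
        g.addRule("C", "<A><B>\\-<A><B>");
        g.addRule("D", "<C>( <C>)*");

        String input = "2q34w-6q8w 5q8w-11q87w";
        System.out.println(g.matches("D", input));   // prints true
        System.out.println(g.findAll("C", input));   // prints [2q34w-6q8w, 5q8w-11q87w]
    }
}
```

Because the labels are expanded textually, this recovers the groupings by
re-scanning the matched text with each label's pattern rather than by building
a parse tree, so it only suits small, non-recursive grammars like the one above.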

Something like the RegexpChunkParser in NLTK does this very well, but
I can't use Python for this (it needs to be Java), so I was hoping
someone would know of something before I write my own.

Many thanks,
Edward

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


