[Corpora-List] Java-based chunk parser?
Sérgio Matos
aleixomatos at ua.pt
Thu Nov 26 10:20:18 UTC 2009
Hi,
Maybe the UIMA Regular Expression Annotator is a good option for this:
http://incubator.apache.org/uima/sandbox.html#regex.annotator
Sérgio
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Edward Ivanovic
Sent: 26 November 2009 04:55
To: corpora at uib.no
Subject: [Corpora-List] Java-based chunk parser?
Dear Colleagues,
I'm looking for a Java-based tool that will let me define a simple
grammar based on regular expressions to parse a given string. For
example:
"2q34w-6q8w 5q8w-11q87w" (etc)
could be parsed by the following rules:
A: \d+q
B: \d+w
C: <A>\-<B>
D: (<C> )+
mixing regex with my own labels (A,B,C,D). The actual syntax for the
rules isn't important.
Parsing D will then give me the groupings for C (of which there will
be two), and access to the other labels.
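
For illustration, here is a minimal sketch of the rule-expansion idea using
only java.util.regex. It is not taken from UIMA or NLTK; the class name
LabelledRegexGrammar and its methods are hypothetical. Read literally, rule C
above expects a hyphen directly between a "q" token and a "w" token, which the
sample string does not contain, so the sketch assumes a slightly adjusted
grammar: an extra rule P for a "q" token followed by a "w" token, C as two P
joined by a hyphen, and D as space-separated C. Labels are resolved by textual
substitution, and sub-matches are recovered by re-scanning rather than by
building a parse tree.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands <LABEL> references into plain regex source.
final class LabelledRegexGrammar {
    // Matches a reference to a previously defined rule, e.g. <A>.
    private static final Pattern REF = Pattern.compile("<([A-Za-z]+)>");

    // Label -> fully expanded regex source, in definition order.
    private final Map<String, String> expanded = new LinkedHashMap<>();

    // Defines a rule; every <X> in the body must name an earlier rule.
    LabelledRegexGrammar rule(String label, String body) {
        Matcher m = REF.matcher(body);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String ref = expanded.get(m.group(1));
            if (ref == null) {
                throw new IllegalArgumentException("Undefined rule: " + m.group(1));
            }
            // Wrap in a non-capturing group so quantifiers apply to the whole rule.
            m.appendReplacement(sb, Matcher.quoteReplacement("(?:" + ref + ")"));
        }
        m.appendTail(sb);
        expanded.put(label, sb.toString());
        return this;
    }

    // True if the whole input is matched by the named rule.
    boolean matches(String label, String input) {
        return Pattern.compile(expanded.get(label)).matcher(input).matches();
    }

    // All non-overlapping substrings of the input matched by the named rule.
    List<String> findAll(String label, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(expanded.get(label)).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        LabelledRegexGrammar g = new LabelledRegexGrammar()
                .rule("A", "\\d+q")
                .rule("B", "\\d+w")
                .rule("P", "<A><B>")       // assumed extra rule, not in the original list
                .rule("C", "<P>\\-<P>")    // adjusted: hyphen joins two P
                .rule("D", "<C>( <C>)*");  // adjusted: no trailing space required

        String input = "2q34w-6q8w 5q8w-11q87w";
        System.out.println(g.matches("D", input));  // true
        System.out.println(g.findAll("C", input));  // [2q34w-6q8w, 5q8w-11q87w]
        // Labels inside one C grouping are found by re-scanning that substring.
        System.out.println(g.findAll("A", "2q34w-6q8w"));  // [2q, 6q]
    }
}
```

Because references are expanded textually, rules cannot be recursive, and a
real chunk parser would return nested spans with offsets instead of strings.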
Something like the RegexpChunkParser in NLTK does this very well, but
I can't use Python for this (it needs to be Java), so I was hoping
someone would know of something before I write my own.
Many thanks,
Edward
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora