[Corpora-List] Java-based chunk parser?

Alexandre Rafalovitch arafalov at gmail.com
Sat Dec 5 17:13:31 UTC 2009


ANTLR might be a good tool for that, though the learning curve might
be a little steep...

http://www.antlr.org/
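
For a grammar this small, plain java.util.regex may already be enough.
Below is a minimal sketch, not a tested library: it builds the quoted
rules by string concatenation and uses named groups as the labels
(named groups need Java 7 or later). Two things in it are my own
assumptions. First, the sample string only matches if C is read as
<A><B>-<A><B> rather than <A>-<B>. Second, I treat the space in D as a
separator, so no trailing space is required. The class name
ChunkSketch and the group names a1, b1, a2, b2 are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChunkSketch {
    // Rules A and B exactly as in the quoted message.
    static final String A = "\\d+q";
    static final String B = "\\d+w";

    // Assumption: C widened to <A><B>-<A><B> so the sample string matches.
    static final String C_PLAIN = A + B + "-" + A + B;

    // Same rule with named groups, so each label can be read back.
    static final Pattern C_NAMED = Pattern.compile(
            "(?<a1>" + A + ")(?<b1>" + B + ")-(?<a2>" + A + ")(?<b2>" + B + ")");

    // D: one or more C separated by single spaces (assumption: no
    // trailing space). Unnamed copies of C are used here because a
    // group name may only be defined once per pattern.
    static final Pattern D = Pattern.compile(
            "(?:" + C_PLAIN + ")(?: (?:" + C_PLAIN + "))*");

    // Returns one array per C chunk: { whole chunk, a1, b1, a2, b2 }.
    static List<String[]> parse(String input) {
        if (!D.matcher(input).matches()) {
            throw new IllegalArgumentException(
                    "input does not match rule D: " + input);
        }
        List<String[]> chunks = new ArrayList<>();
        // A repeated group only keeps its last match, so walk the
        // validated input with find() to collect every C.
        Matcher m = C_NAMED.matcher(input);
        while (m.find()) {
            chunks.add(new String[] {
                m.group(), m.group("a1"), m.group("b1"),
                m.group("a2"), m.group("b2")
            });
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String[] chunk : parse("2q34w-6q8w 5q8w-11q87w")) {
            System.out.println(Arrays.toString(chunk));
        }
    }
}
```

The two-step approach (validate with D, then iterate with C) is there
because Java keeps only the last capture of a repeated group. Once the
rules become recursive or need proper error reporting, a parser
generator such as ANTLR is probably the better fit.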

Regards,
    Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
- I think age is a very high price to pay for maturity (Tom Stoppard)


On Wed, Nov 25, 2009 at 11:55 PM, Edward Ivanovic
<edwardi at csse.unimelb.edu.au> wrote:
> Dear Colleagues,
>
> I'm looking for a Java-based tool that will let me define a simple
> grammar based on regular expressions to parse a given string.  For
> example:
>
> "2q34w-6q8w 5q8w-11q87w" (etc)
>
> Could be parsed by the following rules:
>
> A: \d+q
> B: \d+w
> C: <A>\-<B>
> D: (<C> )+
>
> mixing regex with my own labels (A,B,C,D).  The actual syntax for the
> rules isn't important.
>
> Parsing D will then give me the groupings for C (of which there will
> be two), and access to the other labels.
>
> Something like the RegexpChunkParser in NLTK does this very well, but
> I can't use Python for this (needs to be Java), so was hoping someone
> would know of something before I write my own.
>
> Many thanks,
> Edward
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
