[Corpora-List] corpora with regular expression engine (syntactic pattern)
Marco Baroni
marco.baroni at unitn.it
Sun Feb 24 13:43:25 UTC 2013
Dear Austina,
If I understand your question correctly, it has more to do with the query
engine you use to search the corpus than with the corpus itself
(assuming the corpus is POS-tagged).
Given a POS-tagged corpus (English and French ones are available, among
other places, here:
http://wacky.sslmit.unibo.it/doku.php?id=corpora), you can index it
with the IMS Open Corpus Workbench (http://cwb.sourceforge.net/). You
will then be able to issue queries expressed as regular expressions
over sequences of POS tags, e.g., things like:
VERB ART? ADJ* NOUN
(a verb optionally followed by an article, zero or more adjectives, and a noun)
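
To make the idea concrete, here is a minimal sketch in plain Python of what "a regular expression over a sequence of POS tags" means. It is only an illustration, not how CWB works internally: the function name find_pos_pattern, the toy sentence, and the coarse tag names are all assumptions made for the example. In CWB's own query language (CQP), assuming the corpus has a positional attribute called pos with these tag values, the pattern above would be written roughly as [pos="VERB"] [pos="ART"]? [pos="ADJ"]* [pos="NOUN"].

```python
import re

# Hypothetical tagged sentence: (word, POS) pairs using the coarse tags
# from the pattern above. A real corpus would use its own tagset.
tagged = [
    ("she", "PRON"), ("bought", "VERB"), ("a", "ART"), ("small", "ADJ"),
    ("red", "ADJ"), ("car", "NOUN"), ("yesterday", "ADV"),
]

def find_pos_pattern(tagged, pattern):
    """Return the word sequences whose tags match a pattern such as
    'VERB ART? ADJ* NOUN' (tags separated by spaces, optional ?, * or +)."""
    # Translate each "TAG<quantifier>" item into a regex group over a
    # string in which every tag is followed by exactly one space.
    parts = []
    for item in pattern.split():
        tag, quant = re.fullmatch(r"(\w+)([?*+]?)", item).groups()
        parts.append(f"(?:{re.escape(tag)} ){quant}")
    # The lookbehind forces a match to start at a tag boundary, so that
    # VERB cannot match inside a longer tag such as ADVERB.
    regex = re.compile(r"(?<!\S)" + "".join(parts))

    # Build the tag string and remember which token starts at which offset.
    offset_to_index = {}
    pieces = []
    offset = 0
    for i, (_, tag) in enumerate(tagged):
        offset_to_index[offset] = i
        pieces.append(tag + " ")
        offset += len(tag) + 1
    offset_to_index[offset] = len(tagged)
    tags = "".join(pieces)

    # Map every character-level match back to the matching words.
    results = []
    for m in regex.finditer(tags):
        start = offset_to_index[m.start()]
        end = offset_to_index[m.end()]
        results.append([word for word, _ in tagged[start:end]])
    return results

print(find_pos_pattern(tagged, "VERB ART? ADJ* NOUN"))
# prints [['bought', 'a', 'small', 'red', 'car']]
```

A dedicated engine such as CWB does the same kind of matching over an indexed corpus, which is what makes it practical on large collections.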
Hth,
Marco
>
> 2013/2/24 Olivier Austina <olivier.austina at gmail.com>
>
> Hi Matías,
> English, French or Romanian but any language is welcome. Thank you.
>
> Austina
>
>
> 2013/2/24 Matías Guzmán <mortem.dei at gmail.com>
>
> At least give us the language you want.
>
> Matías Guzmán Naranjo.
>
>
> 2013/2/24 Olivier Austina <olivier.austina at gmail.com>
>
> Hi,
>
> Is there a corpora which can be queried using Part Of Speech
> tags in a regular expression?
> --
> Regards
> Austina
>
>
> _______________________________________________
> UNSUBSCRIBE from this page:
> http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
> --
> Regards
> Austina
>
>
--
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco