[Corpora-List] corpora with regular expression engine (syntactic pattern)

Marco Baroni marco.baroni at unitn.it
Sun Feb 24 13:43:25 UTC 2013


Dear Austina,

If I understand your question correctly, it has more to do with the 
query engine you use to search the corpus than with the corpus itself 
(assuming it is POS-tagged).

Given a corpus with POS tags (English and French ones, for example, 
are available here: 
http://wacky.sslmit.unibo.it/doku.php?id=corpora), you can index it 
with the IMS Open Corpus Workbench (http://cwb.sourceforge.net/), and 
then you will be able to issue queries expressed as regular expressions 
over sequences of POS tags, e.g., things like:

VERB ART? ADJ* NOUN
(a verb optionally followed by an article, 0 or more adjectives, and a noun)
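
To make the idea concrete without installing anything, here is a minimal Python sketch of matching such a pattern over a sequence of POS tags. It is not how CWB works internally, only an illustration of the principle. The toy sentence, its tags, and the helper names (pos_pattern_to_regex, find_matches) are made up for the example, and the sketch assumes that tags contain no whitespace. In CQP itself the same query would look roughly like [pos="VERB"] [pos="ART"]? [pos="ADJ"]* [pos="NOUN"], assuming the corpus has a positional attribute called pos and uses these tag names; real tagsets will differ.

```python
import re

# Hypothetical hand-tagged toy sentence; tag names follow the pattern above.
tagged = [
    ("she", "PRON"), ("bought", "VERB"), ("a", "ART"), ("small", "ADJ"),
    ("red", "ADJ"), ("car", "NOUN"), ("and", "CONJ"), ("likes", "VERB"),
    ("music", "NOUN"),
]


def pos_pattern_to_regex(pattern):
    """Turn e.g. 'VERB ART? ADJ* NOUN' into a regex over a 'TAG TAG ... ' string."""
    parts = []
    for item in pattern.split():
        m = re.fullmatch(r"([^\s?*+]+)([?*+]?)", item)
        if m is None:
            raise ValueError(f"cannot parse pattern element: {item!r}")
        tag, quantifier = m.groups()
        # Each tag is followed by exactly one space in the encoded string.
        parts.append(f"(?:{re.escape(tag)} ){quantifier}")
    # (?<!\S) forces the match to begin at the start of a tag, not inside one.
    return re.compile(r"(?<!\S)" + "".join(parts))


def find_matches(tagged, pattern):
    """Return the word sequences whose tags match the pattern (non-overlapping)."""
    tags = "".join(tag + " " for _, tag in tagged)
    regex = pos_pattern_to_regex(pattern)
    results = []
    for m in regex.finditer(tags):
        # The number of spaces before an offset equals the number of tokens before it.
        first = tags.count(" ", 0, m.start())
        last = tags.count(" ", 0, m.end())
        results.append(" ".join(word for word, _ in tagged[first:last]))
    return results


if __name__ == "__main__":
    for hit in find_matches(tagged, "VERB ART? ADJ* NOUN"):
        print(hit)
```

A dedicated query engine is still the better choice for real corpora, since it indexes the data and lets you mix constraints on word forms, lemmas and tags in the same query.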

Hth,

Marco


>
> 2013/2/24 Olivier Austina <olivier.austina at gmail.com
> <mailto:olivier.austina at gmail.com>>
>
>     Hi Matías,
>     English, French or Romanian but any language is welcome. Thank you.
>
>     Austina
>
>
>     2013/2/24 Matías Guzmán <mortem.dei at gmail.com
>     <mailto:mortem.dei at gmail.com>>
>
>         At least give us the language you want.
>
>         Matías Guzmán Naranjo.
>
>
>         2013/2/24 Olivier Austina <olivier.austina at gmail.com
>         <mailto:olivier.austina at gmail.com>>
>
>             Hi,
>
>             Are there corpora which can be queried using Part Of Speech
>             tags in a regular expression?
>             --
>             Regards
>             Austina
>
>
>             _______________________________________________
>             UNSUBSCRIBE from this page:
>             http://mailman.uib.no/options/corpora
>             Corpora mailing list
>             Corpora at uib.no <mailto:Corpora at uib.no>
>             http://mailman.uib.no/listinfo/corpora
>
>
>
>
>
>     --
>     Regards
>     Austina
>
>


-- 
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco
