[Corpora-List] corpora with regular expression engine (syntactic pattern)

Mark Davies Mark_Davies at byu.edu
Mon Feb 25 21:49:06 UTC 2013


As long as others are listing online interfaces to large corpora that do regular expressions / wildcards, I might as well mention the BYU corpora (http://corpus.byu.edu).



For example, BYU-BNC (http://corpus.byu.edu/bnc) can do "[vh*] [v?n*] [a*] [jj*] [nn*]" in less than four seconds:



http://corpus.byu.edu/bnc/?c=bnc&q=21313156
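
For readers unfamiliar with this kind of query, here is a minimal sketch of what matching a sequence of wildcarded POS-tag slots amounts to. It is not how the BYU interface is implemented; the function name find_matches, the toy sentence, and its hand-assigned CLAWS-style tags are all assumptions made for illustration, and Python's fnmatch wildcards ("*" for any string, "?" for one character) are only assumed to approximate the interface's wildcards.

```python
from fnmatch import fnmatchcase


def find_matches(tagged, slots):
    """Return each run of words whose tags match the slot patterns in order.

    tagged: list of (word, tag) pairs.
    slots:  list of wildcard tag patterns, one per consecutive token.
    """
    n = len(slots)
    hits = []
    # Slide a window of n tokens over the sentence and test every slot.
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(fnmatchcase(tag, pat) for (_, tag), pat in zip(window, slots)):
            hits.append(" ".join(word for word, _ in window))
    return hits


# Hypothetical hand-tagged sentence; the tags are illustrative, not corpus output.
sentence = [
    ("they", "pphs2"), ("have", "vh0"), ("taken", "vvn"),
    ("a", "at1"), ("big", "jj"), ("step", "nn1"), (".", "."),
]

# The slots of the query above, with the surrounding brackets dropped.
slots = ["vh*", "v?n*", "a*", "jj*", "nn*"]

print(find_matches(sentence, slots))
```

A linear scan like this only shows the matching logic; answering such a query over a 100-million-word corpus in a few seconds presumably depends on indexing rather than scanning.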



And of course the interface also allows searches by synonyms, lemma, wildcards, alternates, customized word lists, and any combination of these.



MD



============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Gemma Boleda [gemma.boleda at upf.edu]
Sent: Monday, February 25, 2013 2:24 PM
To: Corpora at uib.no
Subject: Re: [Corpora-List] corpora with regular expression engine (syntactic pattern)

Hi Austina,

there are also a couple of online interfaces to corpora that allow POS queries with regular expressions, for example:

Serge Sharoff's "Leeds CQP" search interface (English corpora available, and also corpora for other languages): http://corpus.leeds.ac.uk/internet.html

UPF's interface to CUCWeb (Catalan corpus): http://ramsesii.upf.es/cgi-bin/cucweb/search-form.pl?lang=en_US

These two interfaces are based on the IMS Open Corpus Workbench that Marco Baroni mentioned; indeed, this tool provides a module to easily build web interfaces with its core corpus processor as a back-end.

Best,
Gemma.

--
Gemma Boleda
The University of Texas at Austin
http://gboleda.utcompling.com

