<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Olivier and Kevin,<br>
<br>
A recent alternative to Xkwic as a front end to the CQP search engine<br>
is TXM (<a class="moz-txt-link-freetext" href="http://sourceforge.net/projects/txm">http://sourceforge.net/projects/txm</a>):<br>
- it runs on Windows, Mac OS X and Linux<br>
- its graphical user interface is available in English, Russian and
French<br>
- it is also available as web portal software, letting you give<br>
online access to your own corpora with built-in access control<br>
- it embeds R, so you can apply any statistical model you can<br>
imagine to the results of CQP queries<br>
- it imports many data formats: raw Unicode text, XML, various<br>
flavours of TEI P5, Transcriber speech transcriptions, TMX aligned<br>
corpora, native CWB corpora...<br>
- it runs TreeTagger for you on the fly when importing corpora<br>
- it currently handles corpora of up to about 10 million words<br>
- it is free and open-source<br>
<br>
For more info:<br>
- in English see <a class="moz-txt-link-rfc2396E" href="http://wiki.tei-c.org/index.php/TXM"><http://wiki.tei-c.org/index.php/TXM></a><br>
- a whole one day introduction tutorial screencast (in French)
<a class="moz-txt-link-rfc2396E" href="http://txm.sourceforge.net/enregistrement_atelier_initiation_TXM_fr.html"><http://txm.sourceforge.net/enregistrement_atelier_initiation_TXM_fr.html></a><br>
- the scientific project background
<a class="moz-txt-link-rfc2396E" href="http://textometrie.ens-lyon.fr/?lang=en"><http://textometrie.ens-lyon.fr/?lang=en></a><br>
<br>
A last remark concerning the power of the CQP search engine.<br>
It combines two levels of regular expressions over words:<br>
- a first level on the attribute values of a single word: part-of-speech tags, word forms, lemmas...<br>
- a second level on sequences of words<br>
For example, the query [pos="V.*"]+<br>
matches a sequence of verbs of any length:<br>
- "V.*" is at the first level (<span style="white-space: pre;">Part
Of Speech tag value)</span>: any tag beginning with letter 'V'
(ignore sub-categories)<br>
- [...]+ is at the second level (sequence of words): two, three,
four... adjacent verbs<br>
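<br>
To make the two levels concrete, here is a minimal sketch in Python. It is not CQP itself nor anything from TXM, just an emulation of a query like [pos="V.*"]+ on a hypothetical toy list of (word, tag) pairs, using a hypothetical helper find_sequences: level 1 is an ordinary regex tested against each tag, and level 2 is a regex run over the string of per-token level-1 results.<br>
<pre>
```python
import re

# Hypothetical toy corpus: (word, pos) pairs. Tags follow the Penn Treebank
# style, where verb tags start with "V" (VBP, VBN, ...).
TOKENS = [
    ("The", "DT"), ("results", "NNS"), ("have", "VBP"), ("been", "VBN"),
    ("published", "VBN"), ("and", "CC"), ("discussed", "VBN"), (".", "."),
]


def find_sequences(tokens, pos_pattern):
    """Emulate a query like [pos="V.*"]+ on a list of (word, pos) pairs.

    Level 1: pos_pattern is tested against each token's tag. As in CQP,
    it must match the whole tag (fullmatch), so "V.*" means "starts with V".
    Level 2: the + over tokens becomes a regex over a string holding one
    character per token: "1" if the token matched level 1, "0" otherwise.
    """
    level1 = re.compile(pos_pattern)
    flags = "".join("1" if level1.fullmatch(pos) else "0" for _, pos in tokens)
    # Greedy "1+" over non-overlapping matches yields maximal runs of
    # adjacent matching tokens; map each run back to its words.
    return [
        [word for word, _ in tokens[m.start():m.end()]]
        for m in re.finditer("1+", flags)
    ]


print(find_sequences(TOKENS, "V.*"))
```
</pre>
Run as is, it prints [['have', 'been', 'published'], ['discussed']]. CQP's own choice among overlapping matches depends on its matching strategy setting, so its match boundaries may differ from the maximal runs returned by this sketch.<br>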
<br>
Best,<br>
Serge<br>
<br>
On 02/24/2013 08:02 PM, Kevin B. Cohen wrote:<br>
<span style="white-space: pre;">> Hi, Olivier,<br>
> <br>
> If you're OK with English, the tgrep and Xkwic programs will allow<br>
> you to do this. Both should work on a Mac. If you have trouble<br>
> using them, two of my students wrote nice tutorials for them this<br>
> past semester.<br>
> <br>
> Kev<br>
> <br>
> On Sun, Feb 24, 2013 at 5:29 AM, Olivier Austina <br>
> <a class="moz-txt-link-rfc2396E" href="mailto:olivier.austina@gmail.com"><olivier.austina@gmail.com></a> wrote:<br>
>> Hi,<br>
>> <br>
>> Is there a corpora which can be queried using Part Of Speech tags<br>
>> in a regular expression? -- Regards Austina<br>
>> <br>
>> <br>
>> _______________________________________________<br>
>> UNSUBSCRIBE from this page: <a class="moz-txt-link-freetext" href="http://mailman.uib.no/options/corpora">http://mailman.uib.no/options/corpora</a><br>
>> Corpora mailing list <a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
>> <a class="moz-txt-link-freetext" href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a><br>
>> <br>
> <br>
> </span><br>
<br>
<br>
</body>
</html>