<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Olivier and Kevin,<br>

    <br>

    A recent alternative to Xkwic, as a wrapper to the CQP search

    engine,<br>

    is TXM - <a class="moz-txt-link-freetext" href="http://sourceforge.net/projects/txm">http://sourceforge.net/projects/txm</a>:<br>

    - it runs on Windows, Mac OS X and Linux<br>

    - its graphical user interface is available in English, Russian and

    French<br>

    - it is also available as web portal software (allowing you to
    give online<br>
      access to your own corpora, with access control built in)<br>

    - it embeds the R software, so you can apply any statistical<br>
      model you can think of to CQP extractions<br>

    - it works hard to process all kinds of data formats: Unicode raw

    text,<br>

      XML, various flavours of TEI P5, Transcriber speech

    transcriptions,<br>

      TMX aligned corpora, native CWB...<br>

    - it runs TreeTagger for you on the fly when importing corpora<br>
    - it can currently handle corpora of up to 10 million words
    reasonably well<br>

    - it is free and open-source<br>

    <br>

    For more info:<br>

    - in English, see <a class="moz-txt-link-rfc2396E" href="http://wiki.tei-c.org/index.php/TXM">&lt;http://wiki.tei-c.org/index.php/TXM&gt;</a><br>

    - a full one-day introductory tutorial screencast (in French):
<a class="moz-txt-link-rfc2396E" href="http://txm.sourceforge.net/enregistrement_atelier_initiation_TXM_fr.html">&lt;http://txm.sourceforge.net/enregistrement_atelier_initiation_TXM_fr.html&gt;</a><br>

    - the scientific background of the project:
    <a class="moz-txt-link-rfc2396E" href="http://textometrie.ens-lyon.fr/?lang=en">&lt;http://textometrie.ens-lyon.fr/?lang=en&gt;</a><br>

    <br>

    One last remark on the power of the CQP search engine.<br>
    It combines two different levels of regular expressions over words:<br>

    - a first level on the <span style="white-space: pre;">part-of-speech
      tag values, word forms or lemmas...<br>
      - a second level on word sequences<br>
    </span>For example, an expression like: [pos="V.*"]+<br>

    matches any sequence of one or more adjacent verbs:<br>

    - "V.*" is at the first level (<span style="white-space: pre;">part-of-speech
      tag value)</span>: any tag beginning with the letter 'V'
    (ignoring sub-categories)<br>

    - [...]+ is at the second level (sequence of words): one, two,
    three... adjacent verbs<br>
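    <br>
    To make the two levels concrete, here is a small Python sketch that
    imitates the query on a toy tagged sentence.<br>
    It is not how CQP is implemented: the sentence, the tags and the
    find_runs helper are made up for illustration, and the sketch always
    reports maximal runs, whereas the exact span CQP returns depends on
    its matching strategy setting.<br>

```python
import re

# Toy tagged sentence as (word, pos) pairs; the tags are illustrative only.
tokens = [
    ("She", "PP"), ("might", "VM"), ("have", "VH"), ("gone", "VVN"),
    ("home", "NN"), ("and", "CC"), ("slept", "VVD"), (".", "SENT"),
]


def find_runs(tokens, pos_pattern):
    """Return each maximal run of adjacent words whose POS tag matches pos_pattern."""
    # First level: a regular expression applied to one tag value at a time.
    # fullmatch is used because the pattern has to cover the whole tag.
    tag_re = re.compile(pos_pattern)
    runs, current = [], []
    for word, pos in tokens:
        if tag_re.fullmatch(pos):
            # Second level: extend the current sequence, like [...]+
            current.append(word)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Rough counterpart of [pos="V.*"]+ on the toy sentence
verb_runs = find_runs(tokens, r"V.*")
print(verb_runs)  # [['might', 'have', 'gone'], ['slept']]
```

    Changing the pattern to "N.*" would pick out runs of nouns in the
    same way, which is the point of keeping the two levels separate.<br>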

    <br>

    Best,<br>

    Serge<br>

    <br>

    On 02/24/2013 08:02 PM, Kevin B. Cohen wrote:<br>

    <span style="white-space: pre;">> Hi, Olivier,<br>

      > <br>

      > If you're OK with English, the tgrep and Xkwic programs will

      allow<br>

      > you to do this. Both should work on a Mac. If you have

      trouble<br>

      > using them, two of my students wrote nice tutorials for them

      this<br>

      > past semester.<br>

      > <br>

      > Kev<br>

      > <br>

      > On Sun, Feb 24, 2013 at 5:29 AM, Olivier Austina <br>

      > <a class="moz-txt-link-rfc2396E" href="mailto:olivier.austina@gmail.com">&lt;olivier.austina@gmail.com&gt;</a> wrote:<br>

      >> Hi,<br>

      >> <br>

      >> Is there a corpora which can be queried using Part Of

      Speech tags<br>

      >> in a regular expression? -- Regards Austina<br>

      >> <br>

      >> <br>

      >> _______________________________________________

      UNSUBSCRIBE from<br>

      >> this page: <a class="moz-txt-link-freetext" href="http://mailman.uib.no/options/corpora">http://mailman.uib.no/options/corpora</a> Corpora

      mailing<br>

      >> list <a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a>

      <a class="moz-txt-link-freetext" href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a><br>

      >> <br>

      > <br>

      > </span><br>

    <br>

    <br>

  </body>

</html>