<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hmm, I'm not sure that's sensible. In this example you might end up

    extracting a date of "weekafter" if you removed the line breaks.<br>
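    <br>

    A rough sketch of the problem in Python. The sample string is made up for illustration, assuming the date in the example is wrapped over two lines as "week" / "after". Deleting the line breaks glues the two words together, whereas matching on the untouched text and then normalising the whitespace keeps them apart:<br>

```python
import re

# Hypothetical sample: a date value wrapped across two lines.
xml = "<entry><date>week\nafter</date></entry>"

# Delete every \r and \n first, then match on the flattened text.
flattened = re.sub(r"[\r\n]", "", xml)
stripped_date = re.search(r"<date>([^<]+)</date>", flattened).group(1)

# Match on the untouched text instead: [^<] also matches newlines,
# so the break survives and can be collapsed to a single space.
raw_date = re.search(r"<date>([^<]+)</date>", xml).group(1)
normalised_date = " ".join(raw_date.split())

print(repr(stripped_date))    # 'weekafter'
print(repr(normalised_date))  # 'week after'
```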

    <br>

    The moral of the story is: if you have XML, use an XML parser to

    extract the data. Yes, it might take slightly longer to start with,

    but the time spent will be repaid by not having to worry about

    weird edge cases and formatting issues.<br>
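    <br>

    For what it's worth, a minimal sketch of the parser route using Python's standard xml.etree.ElementTree. The function name extract_dates and the sample document are hypothetical, made up for illustration:<br>

```python
import xml.etree.ElementTree as ET


def extract_dates(xml_text):
    """Return the whitespace-normalised text of every <date> element."""
    root = ET.fromstring(xml_text)
    dates = []
    for element in root.iter("date"):
        # itertext() also yields text nested inside child elements.
        text = "".join(element.itertext())
        # Collapse line breaks and runs of spaces into single spaces.
        dates.append(" ".join(text.split()))
    return dates


# Hypothetical sample: one date wrapped over two lines, one with nested markup.
sample = "<doc><date>week\nafter</date><date>next <b>Monday</b></date></doc>"
print(extract_dates(sample))  # ['week after', 'next Monday']
```

    Because the text is gathered with itertext(), a date that contains other markup is still handled, which a pattern such as [^&lt;]+ cannot do.<br>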

    <br>

    Mark<br>

    <br>

    <div class="moz-cite-prefix">On 30/06/14 21:01, anders bjorkelund

      wrote:<br>

    </div>

    <blockquote

cite="mid:alpine.LFD.2.10.1406302159290.20850@brachvogel.ims.uni-stuttgart.de"

      type="cite">The only purpose of line breaks in XML is to increase

      human readability anyway. The first thing I do when I extract

      stuff from XML with regexps is to get rid of all \r and \n, then

      you don't have to think about that anyway (by substituting them

      with the empty string). Might be somewhat suboptimal, but

      typically speed isn't an issue anyway.

      <br>

      <br>

      anders

      <br>

      <br>

      On Mon, 30 Jun 2014, Matías Guzmán Naranjo wrote:

      <br>

      <br>

      <blockquote type="cite">[^&lt;] works for me in Python

        <br>

        <br>

        <br>

        2014-06-30 21:44 GMT+02:00 maxwell

        <a class="moz-txt-link-rfc2396E" href="mailto:maxwell@umiacs.umd.edu">&lt;maxwell@umiacs.umd.edu&gt;</a>:

        <br>

              On 2014-06-30 15:33, Phil Gooch wrote:

        <br>

                    On Mon, Jun 30, 2014 at 7:08 PM, Matías Guzmán

        Naranjo

        <br>

                    <a class="moz-txt-link-rfc2396E" href="mailto:mortem.dei@gmail.com">&lt;mortem.dei@gmail.com&gt;</a> wrote:

        <br>

        <br>

                          wouldn't just writing

        &lt;date&gt;.*?&lt;/date&gt; get me 'week after'?

        <br>

        <br>

        <br>

                    I'd go for

        <br>

        <br>

                    &lt;date&gt;[^&lt;]+&lt;/date&gt;

        <br>

        <br>

                    which will consume line breaks. Of course, this

        assumes that date only

        <br>

                    contains text and no other markup.

        <br>

        <br>

        <br>

        Again, my knowledge of grep is probably dated.  But I just tried

        the above, and it didn't work (it did not consume

        <br>

        line breaks, so it couldn't find things that were on two

        successive lines).  Are you using some command line

        <br>

        parameter on grep that allows it to search across successive

        lines?

        <br>

        <br>

           Mike Maxwell

        <br>

        <br>

        <br>

        <br>

        <br>

      </blockquote>


    </blockquote>

    <br>

  </body>

</html>