<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 30/06/14 19:08, Matías Guzmán

      Naranjo wrote:<br>

    </div>

    <blockquote

cite="mid:CAKrYe9meLFSGyCDgb7gmB23BbuX-FQutMg=gdh1x955pGQaMew@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>wouldn't just writing &lt;date&gt;.*?&lt;/date&gt; get me

          'week after'?<br>

        </div>

      </div>

    </blockquote>

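    That depends on which options your regex engine is using. By

    default, many of them don't let '.' match newline characters, so

    'week after' would only be found with an option that makes '.' match

    newlines as well (often called "dotall" or "single-line" mode).<br>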

    <br>
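    As a minimal sketch of that difference, assuming Python's standard
    re module (the thread does not name a language, so this is only one
    possible engine) and the sample sentence quoted further down:<br>

```python
import re

# Sample text from the thread, with the line break inside the last <date> element.
text = (
    "I am off to <place>London</place> <date>tomorrow</date>, and then\n"
    "<place>Cambridge</place> with <person>Mary</person> the <date>week\n"
    "after</date>."
)

pattern = r"<date>(.*?)</date>"

# Default behaviour: "." does not match "\n", so the element that spans
# two lines is missed.
default_matches = re.findall(pattern, text)

# With re.DOTALL, "." also matches "\n", so both elements are found.
dotall_matches = re.findall(pattern, text, flags=re.DOTALL)

print(default_matches)  # ['tomorrow']
print(dotall_matches)   # ['tomorrow', 'week\nafter']
```

    Note that the second match still contains the newline, so some
    whitespace clean-up would be needed afterwards.<br>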

    Mark<br>

    <br>

    <blockquote

cite="mid:CAKrYe9meLFSGyCDgb7gmB23BbuX-FQutMg=gdh1x955pGQaMew@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        I really can do everything I need with regular expressions. The

        question is more about what is easier in the long run.

        Sometimes I feel I'm writing too many 'for's and 'if's...<br>

        <div>

          <div>

            <div class="gmail_extra"><br>

              <br>

              <div class="gmail_quote">2014-06-30 18:16 GMT+02:00

                maxwell <span dir="ltr">&lt;<a moz-do-not-send="true"

                    href="mailto:maxwell@umiacs.umd.edu" target="_blank">maxwell@umiacs.umd.edu</a>&gt;</span>:<br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  <div class="">On 2014-06-30 10:13, Darren Cook wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      E.g. if your document looks like this, I'd rather

                      use a regex to find<br>

                      the proper nouns:<br>

                      <br>

                        I am off to &lt;place&gt;London&lt;/place&gt;

                      &lt;date&gt;tomorrow&lt;/date&gt;, and then<br>

                      &lt;place&gt;Cambridge&lt;/place&gt; with

                      &lt;person&gt;Mary&lt;/person&gt; the

                      &lt;date&gt;week<br>

                      after&lt;/date&gt;.<br>

                    </blockquote>

                    <br>

                  </div>

                  But if you wanted to find all the

                  &lt;date&gt;...&lt;/date&gt; elements, and the line

                  breaks are as shown, a regex by itself isn't going to

                  work (in particular, it won't find 'week after').  You

                  need a parser, or else you need to do some

                  normalization of the XML (making sure line breaks

                  don't occur inside the XML elements of interest).  And

                  if you're going to normalize the XML anyway, you might

                  be better off using an XML parser in the first place.<br>

                  <br>
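                  A minimal sketch of the parser route, assuming Python's
                  standard xml.etree.ElementTree (not something named in
                  this thread) and a hypothetical &lt;doc&gt; wrapper,
                  added only because the sample fragment has no single
                  root element:<br>

```python
import xml.etree.ElementTree as ET

# Sample fragment from the thread, line breaks as shown there.
fragment = (
    "I am off to <place>London</place> <date>tomorrow</date>, and then\n"
    "<place>Cambridge</place> with <person>Mary</person> the <date>week\n"
    "after</date>."
)

# Hypothetical <doc> root: a parser needs exactly one top-level element.
root = ET.fromstring("<doc>" + fragment + "</doc>")

# Collect the text of every <date> element and collapse internal
# whitespace, so the line break inside the element becomes one space.
dates = [" ".join("".join(el.itertext()).split()) for el in root.iter("date")]

print(dates)  # ['tomorrow', 'week after']
```

                  Here the line break inside the element needs no special
                  handling during matching; it is just whitespace in the
                  element's text.<br>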

                     Mike Maxwell

                  <div class="HOEnZb">

                    <div class="h5"><br>

                      <br>

                      _______________________________________________<br>

                      UNSUBSCRIBE from this page: <a

                        moz-do-not-send="true"

                        href="http://mailman.uib.no/options/corpora"

                        target="_blank">http://mailman.uib.no/options/corpora</a><br>

                      Corpora mailing list<br>

                      <a moz-do-not-send="true"

                        href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>

                      <a moz-do-not-send="true"

                        href="http://mailman.uib.no/listinfo/corpora"

                        target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

                    </div>

                  </div>

                </blockquote>

              </div>

              <br>

            </div>

          </div>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>