<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hmm, I'm not sure that's sensible. In this example you might end up
extracting a date of "weekafter" if you removed the line breaks.<br>
<br>
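To make that concrete, here is a minimal Python sketch. The sample
string is my own assumption of what the input looks like (a date
element whose text is split over two lines), and extract_dates is
just a hypothetical helper for illustration.<br>

```python
import re

# Assumed sample input: a <date> element whose text spans two lines.
xml = "<doc>\n<date>week\nafter</date>\n</doc>"

def extract_dates(text):
    """Return the text of every <date> element that holds no nested markup."""
    return re.findall(r"<date>([^<]+)</date>", text)

# Deleting \r and \n outright glues the two words together.
stripped = re.sub(r"[\r\n]", "", xml)
glued = extract_dates(stripped)

# Replacing each run of whitespace with one space keeps the word boundary.
collapsed = re.sub(r"\s+", " ", xml)
spaced = extract_dates(collapsed)

print(glued)   # ['weekafter']
print(spaced)  # ['week after']
```

So if you do go the regexp route, substituting a space rather than
the empty string at least avoids this particular problem.<br>
<br>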
The moral of the story is: if you have XML, use an XML parser to
extract the data. Yes, it might take slightly longer to start with,
but the time spent will be repaid by not having to worry about
weird edge cases and formatting issues.<br>
<br>
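As a rough sketch of the parser route, using Python's standard
xml.etree.ElementTree module. The sample document and the
date_texts helper are assumptions for illustration, not anything
taken from the original data.<br>

```python
import xml.etree.ElementTree as ET

# Assumed sample input: a <date> element whose text spans two lines.
xml = "<doc>\n<date>week\nafter</date>\n</doc>"

def date_texts(source):
    """Return the whitespace-normalised text of every <date> element."""
    root = ET.fromstring(source)
    results = []
    for element in root.iter("date"):
        # itertext() also collects text inside any nested child elements.
        raw = "".join(element.itertext())
        # split() with no arguments splits on any whitespace run, including
        # line breaks, so re-joining with a space normalises the text.
        results.append(" ".join(raw.split()))
    return results

dates = date_texts(xml)
print(dates)  # ['week after']
```

The parser does not care where the line breaks fall, and the same
code keeps working if the date element later gains nested markup.<br>
<br>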
Mark<br>
<br>
<div class="moz-cite-prefix">On 30/06/14 21:01, anders bjorkelund
wrote:<br>
</div>
<blockquote
cite="mid:alpine.LFD.2.10.1406302159290.20850@brachvogel.ims.uni-stuttgart.de"
type="cite">The only purpose of line breaks in XML is to increase
human readability. The first thing I do when I extract
stuff from XML with regexps is to get rid of all \r and \n (by
substituting them with the empty string), so you don't have to
think about them. Might be somewhat suboptimal, but
typically speed isn't an issue.
<br>
<br>
anders
<br>
<br>
On Mon, 30 Jun 2014, Matías Guzmán Naranjo wrote:
<br>
<br>
<blockquote type="cite">[^&lt;] works for me in Python
<br>
<br>
<br>
2014-06-30 21:44 GMT+02:00 maxwell
<a class="moz-txt-link-rfc2396E" href="mailto:maxwell@umiacs.umd.edu">&lt;maxwell@umiacs.umd.edu&gt;</a>:
<br>
On 2014-06-30 15:33, Phil Gooch wrote:
<br>
On Mon, Jun 30, 2014 at 7:08 PM, Matías Guzmán
Naranjo
<br>
<a class="moz-txt-link-rfc2396E" href="mailto:mortem.dei@gmail.com">&lt;mortem.dei@gmail.com&gt;</a> wrote:
<br>
<br>
Wouldn't just writing
&lt;date&gt;.*?&lt;/date&gt; get me 'week after'?
<br>
<br>
<br>
I'd go for
<br>
<br>
&lt;date&gt;[^&lt;]+&lt;/date&gt;
<br>
<br>
which will consume line breaks. Of course, this
assumes that date only
contains text and no other markup.
<br>
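A small Python sketch of the difference between the two patterns
discussed above, on an assumed two-line sample. This uses Python's
re module only; other regexp engines may treat the dot differently.<br>

```python
import re

# Assumed sample input: the element's text is split across two lines.
text = "<date>week\nafter</date>"

# By default "." does not match a newline, so the lazy-dot pattern fails here.
dot_default = re.findall(r"<date>.*?</date>", text)

# With re.DOTALL the dot matches newlines too, and the pattern succeeds.
dot_dotall = re.findall(r"<date>.*?</date>", text, re.DOTALL)

# The negated class [^<] matches any character except "<", newlines included.
negated = re.findall(r"<date>[^<]+</date>", text)

print(dot_default)  # []
print(dot_dotall)   # ['<date>week\nafter</date>']
print(negated)      # ['<date>week\nafter</date>']
```

<br>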
<br>
<br>
Again, my knowledge of grep is probably dated. But I just tried
the above, and it didn't work (it did not consume
line breaks, so it couldn't find things that were on two
successive lines). Are you using some command line
parameter on grep that allows it to search across successive
lines?
<br>
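One likely explanation is that the pattern is fine but a
line-oriented tool only ever shows it one line at a time. The
Python sketch below imitates that behaviour on an assumed sample;
it is a simulation of line-at-a-time matching, not grep itself.<br>

```python
import re

pattern = re.compile(r"<date>[^<]+</date>")

# Assumed sample input: the <date> element spans two successive lines.
text = "<doc>\n<date>week\nafter</date>\n</doc>\n"

# Line-at-a-time matching: each line is searched separately, so no single
# line ever contains both the opening and the closing tag.
per_line = [m.group(0) for line in text.splitlines() for m in pattern.finditer(line)]

# Whole-text matching: the pattern sees the line break and can cross it.
whole = [m.group(0) for m in pattern.finditer(text)]

print(per_line)  # []
print(whole)     # ['<date>week\nafter</date>']
```

<br>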
<br>
Mike Maxwell
<br>
<br>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>