[Corpora-List] XML parsers vs regex

Matías Guzmán Naranjo mortem.dei at gmail.com
Mon Jun 30 19:46:49 UTC 2014


[^<] works for me in Python.
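
A minimal sketch of what I mean, using only the standard re module. The sample string and the variable names are made up for illustration; the point is that a negated class like [^<] also matches newlines, while '.' does not unless re.DOTALL is set. (grep reads its input one line at a time by default, which would explain why the same pattern cannot span two lines there.)

```python
import re

# Hypothetical sample text: a <date> element whose content spans two lines.
text = "<p>See you the <date>week\nafter</date> next.</p>"

# [^<] matches any character except '<', and that includes '\n',
# so the match runs across the line break.
negated = re.findall(r"<date>[^<]+</date>", text)

# By default '.' does not match '\n', so the lazy pattern finds nothing here.
lazy = re.findall(r"<date>.*?</date>", text)

# With re.DOTALL, '.' also matches '\n' and the lazy pattern succeeds.
lazy_dotall = re.findall(r"<date>.*?</date>", text, flags=re.DOTALL)

print(negated)
print(lazy)
print(lazy_dotall)
```

As Phil says, the [^<]+ version assumes that date contains only text and no nested markup.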


2014-06-30 21:44 GMT+02:00 maxwell <maxwell at umiacs.umd.edu>:

> On 2014-06-30 15:33, Phil Gooch wrote:
>
>> On Mon, Jun 30, 2014 at 7:08 PM, Matías Guzmán Naranjo
>> <mortem.dei at gmail.com> wrote:
>>
>>> wouldn't just writing <date>.*?</date> get me 'week after'?
>>>
>>
>> I'd go for
>>
>> <date>[^<]+</date>
>>
>> which will consume line breaks. Of course, this assumes that date only
>> contains text and no other markup.
>>
>
> Again, my knowledge of grep is probably dated.  But I just tried the
> above, and it didn't work (it did not consume line breaks, so it couldn't
> find things that were on two successive lines).  Are you using some command
> line parameter on grep that allows it to search across successive lines?
>
>    Mike Maxwell
>