[Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus
Christopher Tribble
ctribble at clara.co.uk
Mon Oct 21 17:31:45 UTC 2013
Mike, an economical and pleasing solution!
Thanks
C:
--
Dr Christopher Tribble
EMAIL || ctribble at clara.co.uk
WEB || www.ctribble.co.uk
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Mike Scott
Sent: 21 October 2013 17:39
To: corpora at uib.no
Subject: Re: [Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus
The problem is WordSmith's handling of mark-up where there are multiple
attributes. Hitherto it has only been possible to search on one attribute
and, until today, you could only use a limited range of wildcards. As a
result of Peter's query, I have found a way of making a single asterisk
represent any attribute, just as it can represent a single word.
Thus
prevent* * from
will find (and previously found)
... preventing others from reaching ...
and now
<w * pos="V*">giv*
finds (from today's version, 6.0.161, onwards)
...<w id="123" pos="VV0">give ...
...<w id="1234" pos="VVZ">gives ...
etc.
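To make the new behaviour concrete, here is a minimal Python sketch of how a pattern such as <w * pos="V*">giv* could be turned into a regular expression. It is only an illustration, not WordSmith's code: the helper name pattern_to_regex and the exact character classes are assumptions. A lone asterisk inside a tag stands for one whole attribute (name="value"), a lone asterisk outside a tag stands for one word, and an asterisk attached to other characters matches a run of characters that cannot cross a space, quote or angle bracket.

```python
import re

# Illustrative sketch only, not WordSmith's implementation.
# The character classes below are hypothetical choices.
ATTRIBUTE = r'[^\s=<>"]+="[^"]*"'   # one whole attribute, e.g. id="2.5"
WORD = r"[^\s<>]+"                  # one whole word outside mark-up
WILD = r'[^\s"<>]*'                 # '*' attached to other characters


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a WordSmith-style search pattern into a compiled regex."""
    parts = []
    in_tag = False
    for token in pattern.split():
        if token.startswith("<"):
            in_tag = True  # we are now inside a tag such as <w ...>
        if token == "*":
            # A lone asterisk stands for one attribute inside a tag,
            # or for one word outside mark-up.
            parts.append(ATTRIBUTE if in_tag else WORD)
        else:
            # Escape the literal pieces and join them with the wildcard class.
            pieces = token.split("*")
            parts.append(WILD.join(re.escape(piece) for piece in pieces))
        if ">" in token:
            in_tag = False  # the tag closed inside this token
    # Pattern tokens may be separated by any amount of whitespace in the text.
    return re.compile(r"\s+".join(parts))


if __name__ == "__main__":
    line = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
    print(pattern_to_regex('<w * pos="V*">giv*').findall(line))
    print(pattern_to_regex("prevent* * from").findall("preventing others from reaching"))
```

On Peter's sample line the first call finds only the give token (the AT1 tag does not start with V), and the second call finds "preventing others from", so the lone asterisk behaves the same way inside and outside the mark-up.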
Georg's solution is to treat all mark-up as ordinary text, which will suit
some uses but not others, as he says. Another solution I considered was to
make it easy to remove unwanted mark-up (as opposed to all mark-up) using
WordSmith's Text Converter, but in the end it seemed better to make the lone
asterisk mean the same as it does outside the mark-up.
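The alternative of removing only the unwanted mark-up can be pictured with a short Python sketch. This is not WordSmith's Text Converter, whose settings are not described in this thread; the function name strip_attribute and its regular expressions are assumptions for illustration. It deletes one named attribute (id by default) from every tag and leaves the rest of the mark-up, including pos, untouched.

```python
import re

# Illustrative sketch only; not WordSmith's Text Converter.
TAG = re.compile(r"<[^<>]*>")  # any tag, e.g. <w id="2.5" pos="VV0"> or </w>


def strip_attribute(text: str, name: str = "id") -> str:
    """Remove name="..." from every tag, keeping all other mark-up."""
    attribute = re.compile(r"\s+" + re.escape(name) + r'="[^"]*"')
    # Only rewrite text inside tags, so running text is never altered.
    return TAG.sub(lambda tag: attribute.sub("", tag.group(0)), text)


if __name__ == "__main__":
    line = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
    print(strip_attribute(line))
```

After this step every <w> tag carries a single attribute, which, on Mike's description, is the situation earlier versions could already search.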
Cheers -- Mike
On 20/10/2013 21:40, Marko, Georg (georg.marko at uni-graz.at) wrote:
Dear Peter,
I probably misunderstand the question, but what happens if you delete the
"<*>" in "Mark-up to ignore"? It will probably make estimating distances
difficult, since all the pieces inside the tags are then counted as text, but
if you look for the core bit (e.g. the "VV0"), it should be found (at least it
was when I did a little test with the line you've given as a µ-corpus).
Simplistic solution, and probably not what you meant, but maybe...
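The trade-off with distances can be illustrated with a toy Python comparison. It assumes a plain whitespace tokenizer, which is not how WordSmith actually tokenizes text, and the helper names are hypothetical: when mark-up is treated as ordinary text, the VV0 tag can still be found, but fragments of the tags become extra tokens and push give and an apart.

```python
import re

# Toy illustration; a whitespace tokenizer stands in for WordSmith's own.
LINE = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
TAG = re.compile(r"<[^<>]*>")


def tokens_markup_as_text(text: str) -> list[str]:
    """Mark-up counts as text: tag fragments become tokens too."""
    return text.split()


def tokens_markup_ignored(text: str) -> list[str]:
    """Mark-up is ignored: tags are replaced by spaces before splitting."""
    return TAG.sub(" ", text).split()


def distance(tokens: list[str], first: str, second: str) -> int:
    """Positions between the first tokens containing each string
    (a crude substring match, adequate only for this toy line)."""
    i = next(k for k, tok in enumerate(tokens) if first in tok)
    j = next(k for k, tok in enumerate(tokens) if second in tok)
    return j - i


if __name__ == "__main__":
    as_text = tokens_markup_as_text(LINE)
    ignored = tokens_markup_ignored(LINE)
    print(any("VV0" in tok for tok in as_text))  # the tag is still searchable
    print(distance(as_text, "give", "an"), distance(ignored, "give", "an"))
```

On this line give and an are adjacent once the tags are ignored, but three tokens apart when the tag fragments count as text.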
Best
Georg
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Peter Saunders [peter.saunders at lang.ox.ac.uk]
Sent: Sunday, 20 October 2013 22:01
To: corpora at uib.no
Subject: [Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus
Dear All
Does anyone know how I can configure Wordsmith settings so that it will do
tag searches on a CLAWS 7 Pseudo XML tagged corpus? Here's a corpus line:
<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>
I think the id="*" attribute causes problems and I don't know how to strip
this part out of tag searches.
Best
Peter
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
--
Mike Scott
***
If you publish research which uses WordSmith, do let me know so I can
include it at
http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
***
University of Aston and Lexical Analysis Software Ltd.
mike.scott at aston.ac.uk
www.lexically.net