[Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus

Christopher Tribble ctribble at clara.co.uk
Mon Oct 21 17:31:45 UTC 2013


Mike – an economical and pleasing solution!

 

Thanks

 

C:

--

Dr Christopher Tribble

EMAIL  || ctribble at clara.co.uk

WEB    || www.ctribble.co.uk 

 

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Mike Scott
Sent: 21 October 2013 17:39
To: corpora at uib.no
Subject: Re: [Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML
corpus

 

The problem is WordSmith's handling of mark-up where there are multiple
attributes. Hitherto it has only been possible to search on one attribute
and, until today, you could only use a limited range of wildcards. As a
result of Peter's query, I have found a way of making a single asterisk
represent any attribute, just as it can represent a single word.
Thus 

prevent* * from
will find (and previously found) 
... preventing others from reaching ...

and now

<w * pos="V*>giv*
finds (from today's version (6.0.161) onwards)
...<w id="123" pos="VV0">give ...
...<w id="1234" pos="VV0">gives ...
etc.
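
For readers who want to try the idea outside WordSmith, here is a minimal
Python sketch of the same matching rule. It is not WordSmith's code or
algorithm; the function name tag_pattern_to_regex and the translation rules
are assumptions made for illustration. A lone asterisk inside the tag is read
as exactly one attribute, and an asterisk attached to other characters is read
as any run of characters within that token.

```python
import re

def tag_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified tag-search pattern into a regular expression.

    Assumed rules (illustrative only, not WordSmith's implementation):
      - inside the tag, a lone '*' stands for exactly one attribute, name="value";
      - inside the tag, a '*' attached to other characters matches any run of
        characters that are not whitespace or angle brackets;
      - after the tag, a '*' matches the rest of the word.
    """
    # Split at the first '>' into the tag part and the word part.
    tag, _, word = pattern.partition(">")
    parts = []
    for token in tag.split():
        if token == "*":
            parts.append(r'[\w-]+="[^"]*"')
        else:
            parts.append(re.escape(token).replace(r"\*", r"[^\s<>]*"))
    word_re = re.escape(word).replace(r"\*", r"\w*")
    # (?!\w) stops a pattern without a trailing '*' from matching a longer word.
    return re.compile(r"\s+".join(parts) + ">" + word_re + r"(?!\w)")

line = ('<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w> '
        '<w id="2.7" pos="VVZ">gives</w>')
hits = tag_pattern_to_regex('<w * pos="V*>giv*').findall(line)
print(hits)
```

On the sample line this reports the two verb tokens and skips the article,
whatever the id values are. Because the lone asterisk stands for exactly one
attribute in this sketch, a tag with no id attribute would need a pattern
without it.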

Georg's solution is to treat all mark-up as ordinary text, which will suit
some uses but not others, as he says. Another solution I considered was to
make it easy to remove unwanted mark-up (as opposed to all mark-up) using
WordSmith's Text Converter, but in the end it seemed better to make the lone
asterisk mean the same as it does outside the mark-up.
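
The alternative mentioned above, removing only the unwanted part of the
mark-up, can also be done as a preprocessing step before the corpus is loaded.
The following is a rough Python sketch, not a description of the Text
Converter; the helper name strip_attribute is hypothetical, and it assumes
attributes are always written as name="value" with double quotes.

```python
import re

def strip_attribute(text: str, name: str) -> str:
    """Remove every name="value" attribute, with its leading space, from tags."""
    attr = re.compile(r"\s+" + re.escape(name) + r'="[^"]*"')
    # Only rewrite inside <...> so ordinary running text is left untouched.
    return re.sub(r"<[^<>]+>", lambda m: attr.sub("", m.group(0)), text)

sample = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
print(strip_attribute(sample, "id"))
```

After this step each tag carries a single attribute, so a one-attribute tag
search applies. The cost is that the id values are lost from the converted
copy, so it is worth keeping the original files.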

Cheers -- Mike

 

On 20/10/2013 21:40, Marko, Georg (georg.marko at uni-graz.at) wrote:

Dear Peter,
 
I have probably misunderstood the question, but what happens if you delete
the "<*>" in "Mark-up to ignore"? It will probably make estimating distances
difficult, since all the pieces inside the tags are then counted as text, but
if you look for the core bit (the "VV0", for example) it should be there (at
least it was when I did a little test with the line you gave as a µ-corpus).
 
Simplistic solution, and probably not what you meant, but maybe...
 
Best
 
Georg
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Peter
Saunders [peter.saunders at lang.ox.ac.uk]
Sent: Sunday, 20 October 2013 22:01
To: corpora at uib.no
Subject: [Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus
 
Dear All
 
Does anyone know how I can configure Wordsmith settings so that it will do
tag searches on a CLAWS 7 Pseudo XML tagged corpus? Here's a corpus line:
 
<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>
 
I think the id="*"  parameter causes problems and I don't know how to strip
this part out of tag searches.
 
Best
 
Peter
 
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora





-- 
Mike Scott
 
***
If you publish research which uses WordSmith, do let me know so I can
include it at
http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
***
University of Aston and Lexical Analysis Software Ltd.
mike.scott at aston.ac.uk
www.lexically.net