[Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus

Mike Scott mike at lexically.net
Mon Oct 21 16:38:34 UTC 2013


The problem is WordSmith's handling of mark-up where there are multiple 
attributes. Hitherto it has only been possible to search on one 
attribute and, until today, you could only use a limited range of 
wildcards. As a result of Peter's query, I have found a way of making a 
single asterisk represent any attribute, just as it can represent a 
single word.
Thus

    *prevent* * from*
    will find (and previously found)
    /... preventing others from reaching .../

and now

    *<w * pos="V*>giv**
    finds (from today's version (6.0.161) onwards)
    /...<w id="123" pos="VV0">give .../
    /...<w id="1234" pos="VV0">gives .../
    //etc.
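
As a rough illustration of the wildcard behaviour described above, the following Python sketch translates such a search string into a regular expression. It is not WordSmith's actual implementation. The function name `pattern_to_regex` and the exact matching rules are assumptions made for the example: a lone asterisk stands for exactly one space-free item (a word, or an attribute inside a tag), and an asterisk attached to other characters stands for the rest of that item.

```python
import re

# Hypothetical helper, for illustration only: WordSmith's real matching
# logic is not shown in this thread and may differ.
ITEM = r"[^\s<>]"  # assumed: one character of a word or attribute


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified wildcard search into a compiled regex."""
    parts = []
    for token in pattern.split():
        if token == "*":
            # A lone asterisk matches exactly one whole item.
            parts.append(ITEM + "+")
        else:
            # An attached asterisk matches any run of item characters.
            pieces = (re.escape(piece) for piece in token.split("*"))
            parts.append((ITEM + "*").join(pieces))
    # Items in the search string are separated by whitespace in the text.
    return re.compile(r"\s+".join(parts))


line = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
match = pattern_to_regex('<w * pos="V*>giv*').search(line)
print(match.group(0) if match else "no match")
```

Under these assumptions the same function handles the plain-text search too: `prevent* * from` matches "preventing others from", because the lone asterisk consumes exactly one intervening word.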

Georg's solution is to treat all mark-up as ordinary text, which will 
suit some uses but not others, as he says. Another solution I considered 
was to make it easy to remove unwanted mark-up (as opposed to all 
mark-up) using WordSmith's Text Converter, but in the end it seemed 
better to make the lone asterisk mean the same as it does outside the 
mark-up.
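
For comparison, the alternative of removing only the unwanted mark-up can be sketched outside WordSmith in a few lines of Python. This is not the Text Converter, merely a minimal example that assumes the corpus format Peter quotes, with `id` attributes written as `id="..."` inside tags. The name `strip_id_attributes` is made up for the illustration.

```python
import re

# Matches an id="..." attribute together with the whitespace before it.
# Assumption: this sequence only ever occurs inside tags, not in the text.
ID_ATTRIBUTE = re.compile(r'\s+id="[^"]*"')


def strip_id_attributes(line: str) -> str:
    """Remove every id attribute, leaving the other mark-up untouched."""
    return ID_ATTRIBUTE.sub("", line)


line = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
print(strip_id_attributes(line))
```

After this step each tag carries a single attribute, so a search on `pos` alone no longer has to skip over `id`.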

Cheers -- Mike


On 20/10/2013 21:40, Marko, Georg (georg.marko at uni-graz.at) wrote:
> Dear Peter,
>
> I have probably misunderstood the question, but what happens if you delete the "<*>" in "Mark-up to ignore"? It will probably make estimating distances difficult, with all the pieces included in the tags here, but if you look for the core bit - the "VV0", e.g. - this should be there (at least it was when I did a little test with the line you've given as a µ-corpus).
>
> Simplistic solution, and probably not what you meant, but maybe...
>
> Best
>
> Georg
> ________________________________________
> From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Peter Saunders [peter.saunders at lang.ox.ac.uk]
> Sent: Sunday, 20 October 2013 22:01
> To: corpora at uib.no
> Subject: [Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus
>
> Dear All
>
> Does anyone know how I can configure Wordsmith settings so that it will do tag searches on a CLAWS 7 Pseudo XML tagged corpus? Here's a corpus line:
>
> <w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>
>
> I think the id="*"  parameter causes problems and I don't know how to strip this part out of tag searches.
>
> Best
>
> Peter
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

-- 
Mike Scott

***
If you publish research which uses WordSmith, do let me know so I can include it at
http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
***
University of Aston and Lexical Analysis Software Ltd.
mike.scott at aston.ac.uk
www.lexically.net
