<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
The problem is WordSmith's handling of mark-up where there are
multiple attributes. Hitherto it has only been possible to search on
one attribute and, until today, you could only use a limited range
of wildcards. As a result of Peter's query, I have found a way of
making a lone asterisk inside a tag stand for any single attribute,
just as it stands for any single word outside a tag.<br>
Thus <br>
<blockquote><b>prevent* * from</b><br>
will find (and previously found) <br>
<i>... preventing others from reaching ...</i><br>
</blockquote>
and now<br>
<blockquote><b>&lt;w * pos="V*"&gt;giv*</b><br>
finds, from today's version (6.0.161) onwards,<br>
<i>...&lt;w id="123" pos="VV0"&gt;give ...</i><br>
<i>...&lt;w id="1234" pos="VVZ"&gt;gives ...</i><br>
etc.<br>
</blockquote>
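<br>
As a rough illustration of the idea (a Python sketch, not WordSmith's own code), the search pattern can be thought of as being translated into a regular expression. The name pattern_to_regex and the exact matching rules are assumptions made for this example: a lone asterisk inside a tag is taken to match exactly one name="value" attribute, a lone asterisk outside a tag one whole word, and an asterisk attached to other characters the rest of that word or attribute value.<br>
<pre>
```python
import re

# Assumed building blocks (hypothetical, chosen for this illustration only).
ATTRIBUTE = r'[\w:.-]+="[^"]*"'  # exactly one name="value" pair
WORD = r'[^\s<>]+'               # exactly one whitespace-delimited word
PARTIAL = r'[^\s"<>]*'           # remainder of a word or attribute value


def pattern_to_regex(pattern):
    """Translate a search pattern with asterisks into a compiled regex."""
    # Split on asterisks that stand alone between two whitespace characters.
    pieces = re.split(r"(?<=\s)\*(?=\s)", pattern)
    out = []
    seen = ""
    for i, piece in enumerate(pieces):
        if i > 0:
            # More "<" than ">" so far means the lone asterisk sits inside a tag.
            inside_tag = seen.count("<") > seen.count(">")
            out.append(ATTRIBUTE if inside_tag else WORD)
        # Escape the literal text, then let attached asterisks match the
        # rest of the current word or attribute value.
        out.append(re.escape(piece).replace(r"\*", PARTIAL))
        seen += piece
    return re.compile("".join(out))


line = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
print(pattern_to_regex('<w * pos="V*">giv*').findall(line))
print(pattern_to_regex("prevent* * from").findall("preventing others from reaching"))
```
</pre>
Under these assumptions the lone asterisk skips over the id attribute whatever its value, so only the pos value and the word itself decide whether a line matches.<br>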
Georg's solution is to treat all mark-up as ordinary text, which
will suit some uses but not others, as he says. Another solution I
considered was to make it easy to remove unwanted mark-up (as
opposed to all mark-up) using WordSmith's Text Converter, but in the
end it seemed better to make the lone asterisk mean the same as it
does outside the mark-up.<br>
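<br>
For comparison, the alternative of removing only the unwanted mark-up could be sketched outside WordSmith roughly as follows. This is a Python illustration, not a description of the Text Converter, and the name strip_id_attributes is made up for the example. It assumes the id attribute is always written as id="..." with double quotes inside a &lt;w&gt; tag, as in Peter's corpus line.<br>
<pre>
```python
import re

# An id="..." attribute plus the whitespace before it, but only when it
# occurs inside a <w ...> opening tag (assumed corpus format).
ID_ATTRIBUTE = re.compile(r'(<w\b[^>]*?)\s+id="[^"]*"')


def strip_id_attributes(text):
    """Remove the id attribute from every <w> tag, keeping everything else."""
    return ID_ATTRIBUTE.sub(r"\1", text)


line = '<w id="2.5" pos="VV0">give</w> <w id="2.6" pos="AT1">an</w>'
print(strip_id_attributes(line))
```
</pre>
After such a step each tag carries a single attribute again, so the older one-attribute searches would apply, at the cost of keeping a second copy of the corpus.<br>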
<br>
Cheers -- Mike<br>
<br>
<br>
<div class="moz-cite-prefix">On 20/10/2013 21:40, Marko, Georg
(<a class="moz-txt-link-abbreviated" href="mailto:georg.marko@uni-graz.at">georg.marko@uni-graz.at</a>) wrote:<br>
</div>
<blockquote
cite="mid:F603C3481BBFBB448EF1400F3606509803FC84CEEB@ARTEMIS.pers.ad.uni-graz.at"
type="cite">
<pre wrap="">Dear Peter,
I probably misunderstand the question, but what happens if you delete the "&lt;*&gt;" in "Mark-up to ignore"? It will probably make estimating distances difficult, with all the pieces of the tags included, but if you look for the core bit (the "VV0", e.g.), this should be there (at least it was when I did a little test with the line you've given as a µ-corpus).
Simplistic solution, and probably not what you meant, but maybe...
Best
Georg
________________________________________
From: <a class="moz-txt-link-abbreviated" href="mailto:corpora-bounces@uib.no">corpora-bounces@uib.no</a> [<a class="moz-txt-link-abbreviated" href="mailto:corpora-bounces@uib.no">corpora-bounces@uib.no</a>] on behalf of Peter Saunders [<a class="moz-txt-link-abbreviated" href="mailto:peter.saunders@lang.ox.ac.uk">peter.saunders@lang.ox.ac.uk</a>]
Sent: Sunday, 20 October 2013 22:01
To: <a class="moz-txt-link-abbreviated" href="mailto:corpora@uib.no">corpora@uib.no</a>
Subject: [Corpora-List] Wordsmith tag searches of CLAWS 7 Pseudo XML corpus
Dear All
Does anyone know how I can configure Wordsmith settings so that it will do tag searches on a CLAWS 7 Pseudo XML tagged corpus? Here's a corpus line:
&lt;w id="2.5" pos="VV0"&gt;give&lt;/w&gt; &lt;w id="2.6" pos="AT1"&gt;an&lt;/w&gt;
I think the id="*" parameter causes problems and I don't know how to strip this part out of tag searches.
Best
Peter
</pre>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Mike Scott
***
If you publish research which uses WordSmith, do let me know so I can include it at
<a class="moz-txt-link-freetext" href="http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm">http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm</a>
***
University of Aston and Lexical Analysis Software Ltd.
<a class="moz-txt-link-abbreviated" href="mailto:mike.scott@aston.ac.uk">mike.scott@aston.ac.uk</a>
<a class="moz-txt-link-abbreviated" href="http://www.lexically.net">www.lexically.net</a>
</pre>
</body>
</html>