[Corpora-List] XML concordancing query
Ciarán Ó Duibhín
ciaran at oduibhin.freeserve.co.uk
Thu May 12 09:47:37 UTC 2011
Hi. I hope someone can save me doing a little research into concordance
programs! I'm looking for one that can do the following.
My XML-tagged text will contain many short strings tagged in a particular
way, for example <span class="ignore">xyz</span>
The thing about these tagged strings is that they are to be dropped when the
text is being divided into tokens. So, if the text contains
abc def g<span class="ignore">xyz</span>h <span class="ignore">pq</span>ijkl
then the tokens are: abc def gh ijkl and these should figure as
concordance headwords.
The ignored material should however be included in the contexts (without the
tagging), so this piece of text would give a token of "gxyzh" appearing
under type "gh", and a token of "pqijkl" appearing under type "ijkl".
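To make the requirement concrete, here is a minimal Python sketch of the tokenisation described above. It is only an illustration under some assumptions, not a feature of any existing concordance program: the function name `tokenize` and the regular expression are hypothetical, the markup is matched as a plain string rather than parsed as XML, ignored spans are assumed not to nest, and a token made up entirely of ignored material is dropped because it has no headword.

```python
import re

# Hypothetical pattern for the ignorable markup; \s+ allows the tag to be
# broken across lines between "span" and "class".
IGNORE = re.compile(r'<span\s+class="ignore">(.*?)</span>', re.DOTALL)


def tokenize(text):
    """Return (headword, context_form) pairs for each token in text."""
    # Step 1: strip the tags, remembering which characters were ignored.
    chars = []  # list of (character, is_ignored)
    pos = 0
    for m in IGNORE.finditer(text):
        chars.extend((c, False) for c in text[pos:m.start()])
        chars.extend((c, True) for c in m.group(1))
        pos = m.end()
    chars.extend((c, False) for c in text[pos:])

    # Step 2: split on whitespace that is NOT inside ignored material.
    tokens = []
    current = []
    for c, ignored in chars + [(" ", False)]:  # sentinel flushes last token
        if c.isspace() and not ignored:
            if current:
                headword = "".join(ch for ch, ig in current if not ig)
                context_form = "".join(ch for ch, ig in current)
                if headword:  # assumption: skip tokens with no headword
                    tokens.append((headword, context_form))
                current = []
        else:
            current.append((c, ignored))
    return tokens


sample = ('abc def g<span class="ignore">xyz</span>h '
          '<span class="ignore">pq</span>ijkl')
for headword, context_form in tokenize(sample):
    print(headword, context_form)
# prints:
# abc abc
# def def
# gh gxyzh
# ijkl pqijkl
```

Each pair gives the type to index under and the form to show in the context line, so "gxyzh" is filed under "gh" and "pqijkl" under "ijkl".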
Thanks for your help,
Ciarán Ó Duibhín.