[Corpora-List] XML concordancing query

Ciarán Ó Duibhín ciaran at oduibhin.freeserve.co.uk
Thu May 12 09:47:37 UTC 2011


Hi. I hope someone can save me some research into concordance programs!
I'm looking for one that can do the following.

My XML-tagged text will have many short strings tagged in a particular way, 
for example <span class="ignore">xyz</span>

The thing about these tagged strings is that they are to be dropped when the 
text is being divided into tokens.  So, if the text contains

           abc   def   g<span class="ignore">xyz</span>h   <span class="ignore">pq</span>ijkl

then the tokens are:   abc def gh ijkl   and these should figure as 
concordance headwords.

The ignored material should however be included in the contexts (without the 
tagging), so this piece of text would give a token of "gxyzh" appearing 
under type "gh", and a token of "pqijkl" appearing under type "ijkl".

Thanks for your help,

Ciarán Ó Duibhín.




