[Corpora-List] sorting OHG (non-ASCII) in PERL

Henning Reetz henning.reetz at uni-konstanz.de
Tue Feb 4 14:56:41 UTC 2003


Hi,

stupid question but perhaps the freaks can help me:

We're building a database of Old High German words. Obviously, there
are some characters that are not in ASCII (diacritics like stress
marks ' and carets ^) and characters that do not follow the 'normal'
sorting order (like 'uu' for 'w'). One possibility would be to recode
these characters (e.g. get rid of the diacritics for sorting and put
them back on in the output). But is there a more elegant and general
way (e.g. in case one would like to have a long 'e' after the short
'e', etc.), so that one could use it for other scripts as well? UTF
puts characters in an order that does not necessarily reflect the
'intuitive' sequence in a language. Is there a module to tell Perl
which sorting sequence one would like to use, or do I have to program
it myself?
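
To make the "program it myself" option concrete, here is a rough
sketch of what I have in mind. Each word is cut into sorting units
('uu' counts as one unit), each unit is replaced by its rank in a
hand-written list, and the words are sorted on those keys while the
original spelling is kept for the output. The collation list @order,
the helper names sort_key and sort_words, and the sample words are
all made up for illustration; the input is assumed to be lower case.

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal non-ASCII characters

# Hypothetical collation order, one sorting unit per entry (an assumption
# for illustration only): a long vowel follows its short counterpart, and
# the digraph 'uu' is a single unit placed in the slot of 'w'.
my @order = qw(a â b c d e ê f g h i î k l m n o ô p q r s t u û v uu x y z);

my %rank;
@rank{@order} = (1 .. scalar @order);

# Longest units first, so that 'uu' is tried before a single 'u'.
# All units are plain letters, so no regex escaping is needed.
my $unit = join '|', sort { length($b) <=> length($a) } @order;

# Build a key of fixed-width rank numbers; characters that are not in
# the list get rank 0 and therefore sort before everything else.
sub sort_key {
    my ($word) = @_;
    my @units = $word =~ /($unit|.)/gs;
    return join '', map { sprintf '%03d', exists $rank{$_} ? $rank{$_} : 0 } @units;
}

# Sort on the key, but hand back the original spellings untouched.
sub sort_words {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ sort_key($_), $_ ] } @_;
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' ', sort_words(qw(uuort zît bêr bizzan ber uz velt))), "\n";
# prints: ber bêr bizzan uz velt uuort zît
```

A plain sort on the same words would put 'bizzan' before 'bêr' and
'uuort' before both 'uz' and 'velt', so the table does change the
result. What I don't like is that every language needs its own
hand-made list, hence the question.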

Thanx for any hints.

Henning Reetz

--

Department of Linguistics
University of Konstanz
Fach D186
78457 Konstanz
Germany
email:	henning.reetz at uni-konstanz.de