[Corpora-List] sorting OHG (non-ASCII) in PERL

Thomas Schmidt thomas.schmidt at uni-hamburg.de
Tue Feb 4 15:31:26 UTC 2003


Dear Henning,

I don't think there is an easy solution to this. When you say that you use
diacritics, do you mean "ordinary" characters followed by a combining
diacritical mark (i.e. TWO chars), or the precomposed combinations of a
character and a diacritic (i.e. ONE char, e.g. 'e' with grave accent) that
are in Latin-Extended etc.? If the latter, you may be lucky and find a
locale that has the right sorting order for you; you could then tell Perl
to use that locale. If the former, you'd probably have to write your own
piece of code. Maybe these links will help you (they helped me with a
similar problem):

http://rf.net/~james/perli18n.html
http://www.sysarch.com/perl/sort_paper.html
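
For the "write your own piece of code" case, here is a minimal sketch, assuming the words are already decoded into Perl character strings and that the diacritics should simply be ignored when comparing. It decomposes each word with Unicode::Normalize, drops the combining marks, and sorts on the result, so the one-char and two-char encodings sort alike. The helper names strip_marks and sort_ignoring_marks and the sample words are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                                # string literals below contain UTF-8
use Unicode::Normalize qw(NFD);

# Hypothetical helper: build a sort key by decomposing the word and
# dropping combining marks, so precomposed and combining forms agree.
sub strip_marks {
    my ($word) = @_;
    my $key = NFD($word);                # e.g. U+00E9 becomes 'e' + U+0301
    $key =~ s/\p{Mn}//g;                 # remove nonspacing (combining) marks
    return lc $key;
}

# Schwartzian transform: compute each key once, sort on the key, and
# fall back to the original string so ties are ordered deterministically.
sub sort_ignoring_marks {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ strip_marks($_), $_ ] } @_;
}

# Mixed input: precomposed letters and one 'e' + combining acute.
my @words  = ("zît", "e\x{0301}ra", "berg", "êwa");
my @sorted = sort_ignoring_marks(@words);

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;                # berg, éra, êwa, zît
```

The original strings are returned untouched, so nothing has to be "put back" for the output; only the temporary keys lose their diacritics.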

Kind regards,

	Thomas

---------------------------------------
Thomas Schmidt
SFB 538 'Mehrsprachigkeit' Teilprojekt Z
Tel: ++ 49 (040) 42838-6425
Fax: ++ 49 (040) 42838-6116
http://www.rrz.uni-hamburg.de/exmaralda
http://www.rrz.uni-hamburg.de/SFB538/
---------------------------------------



-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Henning Reetz
Sent: Tuesday, 4 February 2003 15:57
To: corpora at hd.uib.no
Subject: [Corpora-List] sorting OHG (non-ASCII) in PERL


Hi,


stupid question but perhaps the freaks can help me:


We're building a database of Old High German words. Obviously, there are
some characters that are not in ASCII (diacritics like stress marks ' and
carets ^) and chars that do not follow the 'normal' sorting order (like 'uu'
for 'w'). One possibility would be to recode these chars (e.g. get rid of
the diacritics for sorting and put them back on in the output), but is there
a more elegant and general way (e.g. in case one would like to have a long
'e' after the short 'e' etc.), so that one could use it for other scripts as
well? (Unicode code point order does not necessarily reflect the
'intuitive' sequence in a language.) Is there a module to tell Perl which
sorting sequence one would like to use, or do I have to program it myself?
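
One way to program a user-defined sorting sequence by hand is sketched below, under stated assumptions: the alphabet is an invented example and not an attested OHG order, and the names collation_key and sort_custom are hypothetical. Each sorting unit is listed once in the desired order, multi-character units such as 'uu' are matched before single characters, and every word is turned into a string of rank numbers that plain cmp can compare.

```perl
use strict;
use warnings;
use utf8;                                # the alphabet below contains UTF-8

# Hypothetical collation sequence (an assumption for illustration):
# a long vowel follows its short counterpart, 'uu' ranks where 'w' would.
my @alphabet = qw(a â b c d e ê f g h i î k l m n o ô p q r s t u û uu z);

my %rank;
@rank{@alphabet} = (1 .. scalar @alphabet);

# Try longer units first so that 'uu' wins over two separate 'u's.
my $unit_re = join '|',
              map  { quotemeta($_) }
              sort { length($b) <=> length($a) } @alphabet;

# Turn a word into a string of rank characters. Characters that are not
# in the alphabet get rank 0 and therefore sort before everything else.
sub collation_key {
    my ($word) = @_;
    my $key = '';
    while ($word =~ /\G(?:($unit_re)|(.))/gs) {
        $key .= chr(defined $1 ? $rank{$1} : 0);
    }
    return $key;
}

# Sort on the precomputed keys and hand back the original words.
sub sort_custom {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ collation_key(lc $_), $_ ] } @_;
}

my @sorted = sort_custom("zît", "uuort", "êra", "erda", "tag", "urteil");

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;                # erda, êra, tag, urteil, uuort, zît
```

In this sketch 'ê' is a letter in its own right, so every word in 'ê' sorts after every word in 'e'. If the long vowel should only break ties between otherwise identical words, a second-level key would be needed. For other scripts only the @alphabet list changes. The Unicode::Collate module on CPAN is probably worth a look as a more general alternative.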


Thanks for any hints.


Henning Reetz


