[Corpora-List] sorting OHG (non-ASCII) in PERL

Lars Nygaard lars.nygaard at ilf.uio.no
Tue Feb 4 15:28:57 UTC 2003


This can be done with the "locale" pragma. It's all in the "perllocale"
documentation (perldoc perllocale).
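
A minimal sketch of what that could look like, not a tested recipe for
OHG. The locale name "de_DE.UTF-8", the sample words, and the helper names
locale_sort and ohg_key are assumptions made up for illustration. A locale
only gives you the collation of a language your system knows about, so the
second half shows the recoding idea from the question (sort on a stripped
key, print the original) for conventions like 'uu' as 'w'.

```perl
use strict;
use warnings;
use utf8;                                  # literals below are UTF-8
use POSIX qw(setlocale LC_COLLATE);
use Unicode::Normalize qw(NFD);

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample data, not taken from a real OHG database.
my @words = ("zît", "uuazzar", "erda", "êra", "era", "berg");

# Part 1: the "locale" pragma. Under "use locale", cmp follows the
# collation rules of the current LC_COLLATE locale.
sub locale_sort {
    my ($locale, @items) = @_;
    use locale;                            # lexically scoped to this sub
    setlocale(LC_COLLATE, $locale)
        or warn "locale $locale not available, keeping the current one\n";
    return sort { $a cmp $b } @items;
}

# Part 2: a hand-made sort key, for orderings no installed locale covers.
sub ohg_key {
    my ($word) = @_;
    my $key = lc $word;
    $key =~ s/uu/w/g;       # treat every 'uu' as 'w' (an assumption; a
                            # genuine double u would need extra rules)
    $key = NFD($key);       # split letters from their diacritics
    $key =~ s/\p{Mn}//g;    # drop the combining marks
    return $key;
}

# Sort on the key; on equal keys fall back to the original spelling,
# which puts plain 'e' before 'ê' here.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
             map  { [ ohg_key($_), $_ ] } @words;

# The locale name is an assumption; the result depends on your system.
print join(" ", locale_sort("de_DE.UTF-8", @words)), "\n";
print join(" ", @sorted), "\n";
```

The key-based part sorts the sample list as: berg era êra erda uuazzar zît.
The words themselves are never changed, only the keys they are compared by,
so nothing has to be put back on in the output.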

lars nygaard

At 15:56 04.02.2003 +0100, you wrote:
>stupid question but perhaps the freaks can help me:
>we're building a database of Old High German words. Obviously, there are
>some characters that are not in ASCII (diacritics like stress marks ' and
>carets ^) and chars that do not follow the 'normal' sorting order (like
>'uu' for 'w'). One possibility would be to recode these chars (e.g. get
>rid of the diacritics for sorting and put them back on in the output),
>but is there a more elegant and general way (e.g. in case one would like
>to have a long 'e' after the short 'e' etc.) so that one could use it for
>other scripts as well (UTF puts chars in an order that does not
>necessarily reflect the 'intuitive' sequence in a language). - Is there a
>module to tell Perl which sorting sequence one would like to use or do I
>have to program it myself?
>Thanx for any hints.
>Henning Reetz

larsnyg @ glossa.uio.no       22 84 40 42 (jobb)
http://folk.uio.no/larsnyg   90 63 23 19 (mobil)
