encodings (was Re: [Lexicog] sounds animals make)

William J Poser billposer at ALUM.MIT.EDU
Tue May 25 18:35:44 UTC 2004


I don't understand why James Kass' URL didn't work for Ken.
My browser is pointed there right now, and as far as I can tell,
I didn't mistype it in my previous message. Here it is pasted from
my browser: http://home.att.net/~jameskass/

As for input into email, there are actually two issues here.
The first is how to enter Unicode in general. The answer to that
depends on your operating system, what software you use, and what
Unicode ranges you are using. One way of doing it, which is useful
when you need the odd character, is using a character map. This is a
sort of browser that displays the various ranges and lets you click
on a character that you want and copy it to the clipboard. Obviously
you don't want to enter lots of material this way.

The other major way of doing it is via a keymap, that is, a file that
maps sequences of keystrokes to particular Unicode characters. At present
I use the free (not even shareware, totally free) Unicode editor
Yudit a lot (http://www.yudit.org). Yudit comes with keymaps for a bunch
of languages and writing systems, but it is not hard to roll your own.
Yudit uses the 12 function keys to select input methods so you can
switch rapidly if, say, you are writing in several different systems.
(You decide how to assign the function keys to keymaps - you're limited
to switching among 12 at any one point, but you can have available
as many keymaps as you like.) For instance, I made a keymap for the
Carrier syllabics which lets me type in the SIL-developed roman system
but causes the syllabics to appear on the display. Yudit also
allows you to enter characters numerically, which again is not convenient
when writing a lot but is handy when you need the odd character
or want to test things, and it supports the kinput method for entering
Sino-Japanese characters. In fact, it even has a freehand entry method
for Chinese characters in which you draw with the mouse and it recognizes
the character.
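
To make the keymap idea concrete, here is a minimal sketch in Python of mapping keystroke sequences to Unicode characters, plus numeric entry. The table entries and the function names (apply_keymap, by_number) are made up for illustration; this is not Yudit's keymap file format or its actual matching algorithm, only the general technique of preferring the longest matching sequence.

```python
# Hypothetical keymap: keystroke sequences -> Unicode characters.
# Illustrative entries only, not taken from any real Yudit keymap.
KEYMAP = {
    "a'": "\u00e1",  # a with acute accent
    "e'": "\u00e9",  # e with acute accent
    "n~": "\u00f1",  # n with tilde
}


def apply_keymap(keys, keymap=KEYMAP):
    """Convert typed keystrokes to text, preferring the longest match."""
    longest = max(len(k) for k in keymap)
    out = []
    i = 0
    while i < len(keys):
        # Try the longest candidate sequence first, then shorter ones.
        for size in range(min(longest, len(keys) - i), 0, -1):
            chunk = keys[i:i + size]
            if chunk in keymap:
                out.append(keymap[chunk])
                i += size
                break
        else:
            # No sequence starts here: pass the keystroke through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)


def by_number(code_point):
    """Numeric entry: turn a code point such as 0x00E9 into its character."""
    return chr(code_point)
```

With this table, typing cafe' would yield the word with an accented final e, and by_number(0x00E9) gives the same accented character directly.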

Yudit is available for MS Windows, and to the limited extent that I have
tried it, it seems to work the same as it does under Unix. I believe
that MS Word has some sort of support for keymaps, but as I don't
routinely use MS Windows I don't know much about it. I'm sure
other people on this list do.

One other comment: even if it isn't convenient for someone to use
Unicode, Unicode is still arguably the best interchange format.
So, for example, if you yourself are using a more specialized
encoding, translating it into Unicode before sending it out
makes it easier for other people to use.
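
As a rough illustration of translating a specialized encoding into Unicode before sending, the following Python sketch decodes bytes from Windows Codepage 1250 and re-encodes them as UTF-8. The helper name to_utf8 and the sample word are assumptions chosen for the example; any encoding that Python has a codec for could be substituted.

```python
def to_utf8(raw, source_encoding):
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    text = raw.decode(source_encoding)  # legacy bytes -> Unicode text
    return text.encode("utf-8")         # Unicode text -> UTF-8 bytes


# Sample data: a Polish place name stored in Windows Codepage 1250,
# where each character occupies a single byte.
legacy = "\u0141\u00f3d\u017a".encode("cp1250")

# The same text as UTF-8, where the three accented letters take two bytes each.
utf8 = to_utf8(legacy, "cp1250")
```

The recipient then only needs to know that the text is UTF-8, not which codepage the sender happened to use.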

The other issue with mail, which arises with anything that uses byte
values above decimal 127, is that email software has historically
not been 8-bit safe, meaning that it may gag on or distort anything
that contains bytes with values greater than 127 (that is, with the
high bit set). So it has been necessary, when sending Unicode or
Windows Codepage 1250, or images or sound files or programs, to
encode them for transmission into a form that uses only safe 7-bit
values, then decode them at the other end. We used to do that
manually (e.g. with uuencode in the Unix world, binhex on Macs, I don't
know what in the MS Windows world). More recently a lot of mail software
has handled this automatically. So, it may be necessary to send
Unicode as attachments. However, a few months ago I decided to test
the use of raw UTF-8 Unicode in email, and after successfully sending
myself messages in which I inserted UTF-8 into email without
any 7-bit encoding, began to correspond with a friend in Korean
(in hangul). We haven't had any problem. So my impression is that
the email system has become 8-bit safe, but I don't know if
that is universally true.
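
For anyone who wants to see what the 7-bit encoding step amounts to, here is a small Python sketch using the standard base64 and quopri modules. It is only an illustration of the idea, not what any particular mail program does internally; the helper is_7bit and the sample hangul text are assumptions made for the example.

```python
import base64
import quopri


def is_7bit(data):
    """True if every byte is below 128, i.e. no byte has the high bit set."""
    return all(b < 128 for b in data)


# A short greeting in hangul, encoded as raw UTF-8 bytes.
message = "\uc548\ub155\ud558\uc138\uc694".encode("utf-8")

# Two common ways of making the bytes safe for 7-bit transport.
b64 = base64.b64encode(message)      # base64
qp = quopri.encodestring(message)    # quoted-printable

# The receiving end reverses the encoding to recover the original bytes.
restored_from_b64 = base64.b64decode(b64)
restored_from_qp = quopri.decodestring(qp)
```

The raw UTF-8 bytes all have the high bit set, while both encoded forms contain only 7-bit values and decode back to the original.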

Bill
--
Bill Poser, Linguistics, University of Pennsylvania
http://www.ling.upenn.edu/~wjposer/ billposer at alum.mit.edu

