[Lexicog] Who Said What?

Damon Allen Davison allolex at GMAIL.COM
Tue Sep 28 10:15:28 UTC 2004


We do seem to enjoy our non-lexicography topics here. ;)

I wanted to make a few points. Since this is an English-language
mailing list, I don't think anyone should be offended by an
English-language encoding like ASCII, unless you are highly offended
by conflicting standards and advocacy of obsolete technology. ;)

Graphic representations of "foreign" character sets are problematic
because there are sundry and conflicting whatever-to-Latin
transcription systems out there.  One of the problems from a
computational point of view is that the markers used by many of these
systems are ambiguous. ' N ' may very well represent a tilde, but
' N ' also needs to be able to represent, well, ' N '. I might be able
to fish that out using a rule like: "if there is a majuscule N in the
middle of a word, it means that the previous letter has a tilde over
it". The problem with tricks like ' N ' is that there are words, and
especially acronyms, out there that have internal capitalization, so
any such rule will misfire on them. This all really does seem a bit
excessive, since everyone should at the very least have ISO-8859-1
support, and 8859-1 contains almost all Western European letters.
ASCII has long since been obsolete.
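
To make the ambiguity concrete, here is a rough Python sketch of the
"majuscule N in the middle of a word" rule. The exact convention is my
assumption (a capital N directly after a lowercase letter puts a tilde
on that letter), and decode_tilde_marker is just a name made up for
the illustration.

```python
import re
import unicodedata

COMBINING_TILDE = "\u0303"

def decode_tilde_marker(text):
    """Apply the assumed rule: a capital N right after a lowercase
    letter means "put a tilde on that letter"."""
    # Swap the marker for a combining tilde, then let NFC normalization
    # compose it with the preceding letter where a composed form exists.
    marked = re.sub(r"(?<=[a-z])N", COMBINING_TILDE, text)
    return unicodedata.normalize("NFC", marked)

# Works for the case the convention was invented for.
print(decode_tilde_marker("espanNol"))

# Misfires on a name with internal capitalization: the N of "McNeil"
# is taken for a marker and the name is mangled.
print(decode_tilde_marker("McNeil"))
```

The first call gives the intended word; the second silently eats a
real letter, and nothing in the input tells the program which reading
was meant.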

What everyone should be using is Unicode; the 8-bit encoding, UTF-8,
will do for now. I can't think of a *NIX, Windows, or Apple system out there
which doesn't support it.  There are fonts, free and otherwise, to be
had. Modern systems often come with Unicode support out of the box.
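
A small Python sketch of the difference between the three encodings
mentioned so far. The sample strings are my own picks, and try_encode
is a helper invented for the example.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot
    represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# "año" fits in ISO-8859-1 but not ASCII; "ŋ" (eng, U+014B) fits in
# neither, but UTF-8 handles both.
for word in ("año", "ŋ"):
    for encoding in ("ascii", "iso-8859-1", "utf-8"):
        print(repr(word), encoding, try_encode(word, encoding))
```

ISO-8859-1 covers the first word in one byte per letter, but only
UTF-8 can carry both.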

If your mail reader doesn't support Unicode, then you should perhaps
consider upgrading to another mail client.  For those of you in love
with console-based programs, there is mutt, which is *far* superior to
Pine. Personally, I work on several different Unix-like systems
(Solaris, FreeBSD, OpenBSD, and Linux for anyone who cares) and use
mutt, Thunderbird, and Gmail to keep track of my mail. All of my
servers have IMAP support, so there is no problem switching between
mail clients.  Gmail is good for lists for three reasons. Firstly,
I've noticed that it's good for me to have separation between daily
personal communication and list activity. Secondly, Gmail supports
threading, and lastly, Gmail has plenty of capacity for years of very
active lists. (Incidentally, I have a few Gmail invitations, so mail me if
you want one.) I don't really like the idea of web mail that much, but
it does have one major advantage: all modern browser GUIs support
Unicode.

For lexicography, the advantage of the Unicode standard should be
obvious. If you can represent everything you need using a combination
of two standards, XML and Unicode, then you can be assured that your
hard work will either be directly usable or easily translatable to
another standard in the future. If you start out with a standard that
represents more than you need, it's simple to go to a system that
represents only what you need, but if your own standard turns out not
to be explicit enough, it is very hard to go back afterward and add
the missing bits and pieces.
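
As a rough illustration of that last point, here is a Python sketch
using only the standard library. The element and attribute names
(entry, headword, gloss, lang) are made up for the example and do not
come from any particular dictionary schema.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Build a tiny dictionary entry (hypothetical schema).
entry = ET.Element("entry", {"lang": "es"})
ET.SubElement(entry, "headword").text = "año"
ET.SubElement(entry, "gloss", {"lang": "en"}).text = "year"

# Serialize to UTF-8 bytes and read it back: nothing is lost.
xml_bytes = ET.tostring(entry, encoding="utf-8")
restored = ET.fromstring(xml_bytes)
headword = restored.findtext("headword")
print(headword)

# Going from "more than you need" to "only what you need" is easy:
# decompose the letters and drop whatever ASCII cannot hold.
ascii_only = (
    unicodedata.normalize("NFKD", headword)
    .encode("ascii", "ignore")
    .decode("ascii")
)
print(ascii_only)
```

The downgrade is one line, but there is no way to get from the
ASCII-only form back to the original headword without consulting the
richer source.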

I realize this is a bit more than two cents...

Damon

