Corpora: Re: Typing accents in Windows & "optionality" of diacritic marks

Trond Trosterud trond.trosterud at hum.uit.no
Fri Apr 20 10:46:51 UTC 2001


>If you can tell me how to type them in Word under Windows on a US
>English keyboard I would appreciate it.

In most cases you do not need to go to the symbol window.

On your control panel (from the MS menu) you will find an icon called
keyboards. Go there, look for the keyboard layouts you want, and select
them. A small language abbreviation then appears in the bottom right
corner, telling you which keyboard layout is active. The same goes for the
Mac (starting from the Apple menu). I use 3 different keyboard layouts on
my Mac (Norwegian, Finnish and Sámi); the US keyboard is available as well.

There is one possible (but unlikely) obstacle: if Microsoft has decided
that US OS users should be protected from the temptation to look at other
languages (i.e. if this handy mechanism is not available in US versions of
the OS), I suggest you do something about it: start a campaign or whatever.

It is a long-standing difference between PC and Mac that the former does
not let you easily access the characters of your code table that belong to
other keyboard layouts, whereas the latter gives you the whole 8-bit code
table. Thus, the Mac had a multilingual approach from the very beginning,
as opposed to the monolingually designed PC. Even PC users can access
accented letters without changing keyboard layout, though. At least my
Norwegian PC keyboard layout gives me access to Spanish and other accented
vowels (acute, grave, diaeresis, circumflex, tilde + vowel) via AltGr + the
dead keys D12 and E12. I can only hope that colleagues in Los Angeles are
equally well provided for, sitting there with their US keyboards.

Since genuine 7-bit systems are really rare, what monolingual English users
need to do to get access to all the Western European languages (save the
Gaelic ones; those need other measures) is to configure their e-mail system
to use the 8-bit code table ISO/IEC 8859-1, or Latin 1. Thus, Geoffrey
Sampson's otherwise fine defense of linguistic rights is headed by the
rather sad message "X-Sun-Charset: US-ASCII", which translates to "English
and Indonesian only" (the only two languages on earth for which US-ASCII is
enough, and if you don't accept writing "role" for "rôle", you are left
with Indonesian). Well, if he can read my Latin 1, it is OK, of course.
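
The difference between the two code tables can be sketched in a few lines
of Python 3. This is only an illustration under stated assumptions: the
helper name `fits` is hypothetical, and the sample words are taken from
this message. It simply asks whether every character of a word exists in a
given charset.

```python
def fits(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Sample words written with escapes so the source itself stays 7-bit clean.
words = [
    "role",        # plain ASCII
    "r\u00f4le",   # o with circumflex
    "r\u00e5ne",   # a with ring above
    "Troms\u00f8", # o with stroke
]

for word in words:
    # One line per word: the word, then whether US-ASCII and Latin 1 can hold it.
    print(word, fits(word, "ascii"), fits(word, "iso-8859-1"))
```

Only the first word survives US-ASCII, while all four fit in Latin 1.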

Russian, Japanese, Eastern European etc. users may have problems with Latin
1 (but with proper e-mail client settings the text will come through). That
is one reason why ISO/IEC 10646, or Unicode, was invented. And here, Win9x
and above have the lead over the Macintosh (OS X will bridge the gap). In
Win9x or above, you can read every letter of every language. By going to
the symbol window you will be able to insert the relevant characters, in a
cumbersome way, but you will not be able to make keyboards for character
collections that do not have an 8-bit MS code table. Receiving information,
but not producing it, is the somewhat Orwellian style.
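
Why Latin 1 falls short for these users, and why a Unicode encoding does
not, can be shown with a minimal Python 3 sketch. The helper name `missing`
and the three sample words are assumptions chosen for illustration; UTF-8
is used here as one common encoding of Unicode.

```python
def missing(text, charset):
    """Return the characters of text that charset cannot represent."""
    result = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            result.append(ch)
    return result


samples = {
    "Russian": "\u0440\u0443\u0441\u0441\u043a\u0438\u0439",  # "russkij" in Cyrillic
    "Japanese": "\u65e5\u672c\u8a9e",                         # "nihongo" in kanji
    "Polish": "\u0142\u00f3d\u017a",                          # l-stroke, o-acute, d, z-acute
}

for language, word in samples.items():
    # Characters outside Latin 1 are listed; the UTF-8 list stays empty.
    print(language, missing(word, "iso-8859-1"), missing(word, "utf-8"))
```

In the Polish sample the o with acute is covered by Latin 1, but the other
two accented letters are not, so partial coverage is no help either.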

Then an important note on the "optionality of diacritic marks". This is
nonsense. The ring above my Norwegian a is just as optional as the bar
across the English l. Thus, Norwegian "rane" and "råne" are as distinct as
English "tie" and "lie". It is true, though, that you can read a Norwegian
text without the Norwegian letters, just as you can read an English text by
exchanging all i-s with y (tri it for iourselves). But we would rather not.
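
What is lost when diacritics are treated as optional can be sketched in
Python 3. This is an illustration only: the function name `strip_marks` is
hypothetical, and it relies on Unicode canonical decomposition (NFD), which
splits a letter such as a-ring into a base letter plus a combining mark.

```python
import unicodedata


def strip_marks(text):
    """Decompose text and drop all combining marks (ring, circumflex, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Pairs of distinct words that differ only by a diacritic.
pairs = [
    ("rane", "r\u00e5ne"),  # a versus a with ring above
    ("role", "r\u00f4le"),  # o versus o with circumflex
]

for plain, marked in pairs:
    # The words differ as written, but become identical once marks are dropped.
    print(plain, marked, plain != marked, strip_marks(marked) == plain)
```

Two different words collapse into one string, which is exactly the
information a corpus cannot afford to lose.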

I cannot but hope that a majority of my colleagues will find this trivial.
What has proven not to be trivial, though, are the efforts to standardise text
encoding in corpora. Since this is a corpus list, where many obviously are
stuck in ASCII, I strongly urge you to encode your corpora with Unicode.
There you simply will find the letters you need, from Chuvash via IPA to
African clicks and Cherokee. And your readers will be able to read your
corpus as well.
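
Moving an existing 8-bit corpus to Unicode is mostly a matter of decoding
with the old charset and encoding again. The following Python 3 sketch
assumes the legacy text is in ISO/IEC 8859-1 and that UTF-8 is the target;
the function name `transcode_to_utf8` and the sample characters are
hypothetical choices for illustration.

```python
def transcode_to_utf8(raw, source_charset="iso-8859-1"):
    """Decode legacy bytes with source_charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# A word as it would be stored in a Latin 1 corpus file (one byte per letter).
legacy = "r\u00e5ne".encode("iso-8859-1")
converted = transcode_to_utf8(legacy)
print(legacy, converted)

# Characters far outside any 8-bit Western table also round-trip in UTF-8.
samples = [
    "\u04d1",  # Cyrillic a with breve, used in Chuvash
    "\u0259",  # schwa, used in IPA
    "\u01c3",  # click letter
    "\u13e3",  # a Cherokee syllable
]
for ch in samples:
    print(ch, ch.encode("utf-8").decode("utf-8") == ch)
```

For a real corpus the same decode and encode step would be applied to the
bytes read from each file, once the source charset is known for certain.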


-------------------------------------------------------------------
Trond Trosterud                                     t +47 7764 4763
Det humanistiske fakultet                           h +47 7767 3639
N-9037 Universitetet i Tromsø, Noreg                f +47 7764 4239
Trond.Trosterud at hum.uit.no           http://www.hum.uit.no/a/trond/
-------------------------------------------------------------------


