build a font for your endangered language...

Andrew Cunningham lang.support at GMAIL.COM
Sat May 17 00:04:32 UTC 2008


Custom encodings are one problem, and should be avoided.

But the reality is that many languages aren't supported by core operating
system fonts, which introduces all sorts of difficulties when using
web services such as email.

Custom Unicode fonts have their uses, and many languages rely on them,
but they only really work if people have downloaded them.

And you need a certain amount of IT knowledge to override the fonts
used by various web services by default.


Andrew

2008/5/17 William J Poser <wjposer at ldc.upenn.edu>:
> Keola Donaghy 'uk'uneisguz:
>
>>Aloha We created and used our custom fonts back in 1994 and are
>>still slowly trying to wean ourselves from them and switch
>>completely to Unicode.
>
> Actually, I think this confirms what I have been saying: using
> custom fonts is NOT a problem, except in cases like cell phones
> and work machines over which users have no control, where you
> can't install them.
>
> The problem is not custom fonts, it is custom ENCODINGS.
>
> Since I think some people may not be clear on the distinction,
> let me explain. Text in a computer really consists of a sequence
> of character codes, which are non-negative integers. The computer
> doesn't really store an "a" - it stores a number which by convention
> is associated with the character "a". Once upon a time, in the days
> of "dumb terminals" and fixed-encoding keyboards, this was all hard-wired.
> When you pressed the "a" key on your keyboard it sent a certain small
> integer to your computer, and when the computer sent that same small
> integer to the terminal, the terminal displayed the corresponding
> glyph. Nowadays it is possible to program what codes are generated by
> particular keyboard events and what glyphs are displayed, but
> the basic principle is the same: text consists of a sequence of
> numbers.
>
> What until recently was by far the most common encoding was ASCII,
> in which "a" has the character code 97. (Character codes are normally
> given in hexadecimal but I'll translate into decimal here.) "b" is
> 98, "c" is 99. "A" is 65, "B" is 66, "C" is 67, etc. So, if you
> have an ASCII-encoded font containing glyphs for the roman alphabet,
> sending the code 98 to the display will select the glyph for "b"
> and display it.
>
> For other languages there are other encodings. If, for example,
> you use the ARMSCII-7 encoding (as you might if you were
> Armenian) and send the code 98 to the display, you get the
> Armenian capital letter cha instead of the letter "b".
>
> Until recently, at best there was a single standard for each language
> and writing system, so that everybody would be on the same wavelength
> within that language and writing system. Fonts for Armenian or
> Russian or Hebrew or whatever would be encoded according to the
> standard for that language. Then things would be simple so long
> as you were using that language, but would get messy if, say,
> you needed to use Armenian and English in the same document, or
> wanted to write in Russian on a machine set up for Hebrew.
> Furthermore, in many cases there were multiple encodings for the
> same writing system. Sometimes, every font had its own idiosyncratic
> encoding. (The champions seem to be the Ethiopians, who had over
> 40 known encodings for Amharic.)
>
> In this situation, where every font potentially uses its own
> encoding, for other people to use your font it isn't sufficient
> for them to install it - their software has to understand its
> encoding.
>
> With much current software, so long as your font uses a well-known
> encoding, the software can use it because it contains or knows how
> to look up information about the encoding. Your browser, for example,
> almost certainly (a) attempts to detect the encoding of the web page
> it displays and (b) allows you to tell it what encoding to use (in case
> it fails to guess correctly - this happens with some frequency, in part
> because many web pages lie about their encoding and the browser accepts
> the lie). But if you have a truly idiosyncratic encoding in your font,
> software may not know what to do with it.
>
> What Unicode does is unify all writing systems into a single encoding.
> In Unicode "b" and Armenian capital cha do not compete for the
> same codepoint. Instead, "b" is 98 as in ASCII and Armenian capital
> cha is 1353. With everything included in a single encoding, you can
> mix writing systems easily within a single document and use one writing
> system on a system set up for another.
>
> So, if you create your own font but use Unicode as the encoding,
> so long as people are able to install your font they should have no
> problem using it. What you should not do is create fonts that use
> your own idiosyncratic encoding.
>
> One of the uses of FontForge is in fact reencoding an existing font.
> You can see an example of this at:
> http://billposer.org/Linguistics/Computation/Reencoding/HowTo.html
> The examples used in this tutorial are based on a real task.
> I wanted to be able to use Linear B and at the time could only
> find a font that used an idiosyncratic encoding. So I took that
> font and changed the encoding to Unicode.
>
> Bill
>
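
The numbers in Bill's explanation can be checked with a few lines of
Python. This is only a rough sketch: Python's standard library has no
ARMSCII-7 codec, so LEGACY_ARMENIAN below is a hypothetical one-entry
table built from the single mapping quoted above (98 is capital cha),
and decode() is an illustrative helper, not part of any library.

```python
import unicodedata

# Character codes quoted in the message (decimal); Unicode keeps the ASCII values.
assert ord("a") == 97 and ord("b") == 98 and ord("A") == 65

# Hypothetical one-entry excerpt of a legacy Armenian code table (assumption:
# only the mapping mentioned in the message, 98 -> capital cha, is included).
LEGACY_ARMENIAN = {98: "\u0549"}

def decode(codes, table):
    """Turn stored codes into Unicode text; codes missing from the table are read as ASCII."""
    return "".join(table.get(code, chr(code)) for code in codes)

codes = [98]                                 # the same stored number...
as_ascii = decode(codes, {})                 # ...read as ASCII
as_legacy = decode(codes, LEGACY_ARMENIAN)   # ...read with the legacy table

print(as_ascii, as_legacy)                          # b Չ
print(ord(as_legacy), unicodedata.name(as_legacy))  # 1353 ARMENIAN CAPITAL LETTER CHA
```

The stored number is the same in both cases; only the table used to
interpret it differs. In Unicode the two letters simply have different
numbers (98 and 1353), so no table has to be guessed.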

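The point about web pages that "lie" about their encoding can be seen
the same way. The sketch below is a small illustration using Python's
built-in codecs, with a made-up sample word standing in for a page: the
bytes are written as UTF-8 and then read once with the right label and
once with a wrong one.

```python
# The same bytes, read under two different encoding labels.
data = "café".encode("utf-8")    # sample text (an assumption), stored as UTF-8 bytes

right = data.decode("utf-8")     # the label matches the bytes
wrong = data.decode("latin-1")   # the label is wrong, yet no error is raised

print(right)  # café
print(wrong)  # cafÃ©
```

Nothing fails outright with the wrong label; the reader just sees the
wrong characters, which is why a mislabelled page is easy to publish
and awkward to diagnose.
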


-- 
Andrew Cunningham
Vicnet Research and Development Coordinator
State Library of Victoria
Australia

andrewc at vicnet.net.au
lang.support at gmail.com


