build a font for your endangered language...

Andrew Cunningham lang.support at GMAIL.COM
Sat May 17 00:05:37 UTC 2008


Wouldn't this just be represented by "n" followed by U+0301, that is,
an "n" and a combining acute accent?
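
As a minimal sketch of what that means in practice (assuming Python 3 and its standard unicodedata module; the variable names are only for illustration), the two-code-point sequence and the precomposed letter U+0144 are equivalent under NFC normalization:

```python
import unicodedata

# "n" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible letter.
decomposed = "n\u0301"

# NFC normalization composes the pair into the single precomposed code point.
composed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+006E', 'U+0301']
print([f"U+{ord(ch):04X}" for ch in composed])    # ['U+0144']
print(unicodedata.name(composed))                 # LATIN SMALL LETTER N WITH ACUTE
```

Whether the combined form displays well still depends on the font and the rendering software, which this sketch does not test.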

Andrew

2008/5/17 Heather Souter <hsouter at gmail.com>:
> Taanshi,
>
> I am wondering how to go about creating a new character and how to go about
> getting a Unicode codepoint for it.  What I have in mind is an <n> with
> a</> over it....  Any advice or suggestions?
>
> Kihchi-maarsii!
>
> Eekoshi.
> Heather Souter
> Michif Language Activist and
> Community Linguist
>
> On Fri, May 16, 2008 at 5:04 PM, William J Poser <wjposer at ldc.upenn.edu>
> wrote:
>>
>> Keola Donaghy 'uk'uneisguz:
>>
>> >Aloha. We created and used our custom fonts back in 1994 and are
>> >still slowly trying to wean ourselves from them and switch
>> >completely to Unicode.
>>
>> Actually, I think this confirms what I have been saying: using
>> custom fonts is NOT a problem, except in cases like cell phones
>> and work machines over which users have no control, where you
>> can't install them.
>>
>> The problem is not custom fonts, it is custom ENCODINGS.
>>
>> Since I think some people may not be clear on the distinction,
>> let me explain. Text in a computer really consists of a sequence
>> of character codes, which are non-negative integers. The computer
>> doesn't really store an "a" - it stores a number which by convention
>> is associated with the character "a". Once upon a time, in the days
>> of "dumb terminals" and fixed-encoding keyboards, this was all hard-wired.
>> When you pressed the "a" key on your keyboard it sent a certain small
>> integer to your computer, and when the computer sent that same small
>> integer to the terminal, the terminal displayed the corresponding
>> glyph. Nowadays it is possible to program what codes are generated by
>> particular keyboard events and what glyphs are displayed, but
>> the basic principle is the same: text consists of a sequence of
>> numbers.
>>
>> Until recently, by far the most common encoding was ASCII,
>> in which "a" has the character code 97. (Character codes are normally
>> given in hexadecimal but I'll translate into decimal here.) "b" is
>> 98, "c" is 99. "A" is 65, "B" is 66, "C" is 67, etc. So, if you
>> have an ASCII-encoded font containing glyphs for the roman alphabet,
>> sending the code 98 to the display will select the glyph for "b"
>> and display it.
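
The point that text is just a sequence of numbers can be seen directly. This is a small sketch assuming Python 3; the sample string is arbitrary:

```python
# Each character of a string corresponds to an integer character code.
text = "abc ABC"
codes = [ord(ch) for ch in text]
print(codes)        # [97, 98, 99, 32, 65, 66, 67]

# Encoding as ASCII stores exactly those integers, one per byte.
raw = text.encode("ascii")
print(list(raw))    # [97, 98, 99, 32, 65, 66, 67]

# Going the other way: the code 98 selects "b".
print(chr(98))      # b

# The same code in hexadecimal, the notation normally used.
print(hex(97))      # 0x61
```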
>>
>> For other languages there are other encodings. For example, if
>> you use the ARMSCII-7 encoding (as you might if you were
>> Armenian), sending the code 98 to the display gives you the
>> Armenian capital letter cha instead of the letter "b".
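
The same effect can be reproduced with encodings that Python ships with. As far as I know the standard library has no ARMSCII-7 codec, so this sketch swaps in Latin-1, KOI8-R and Windows-1251 and uses the code 225 rather than 98; the principle is the same, one stored number and three different characters:

```python
import unicodedata

# A single stored code: 225 decimal, 0xE1 hexadecimal.
raw = bytes([0xE1])

# The same number means a different character under each encoding.
as_latin1 = raw.decode("latin-1")
as_koi8r = raw.decode("koi8_r")
as_cp1251 = raw.decode("cp1251")

for label, ch in [("latin-1", as_latin1), ("koi8_r", as_koi8r), ("cp1251", as_cp1251)]:
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
# latin-1 U+00E1 LATIN SMALL LETTER A WITH ACUTE
# koi8_r U+0410 CYRILLIC CAPITAL LETTER A
# cp1251 U+0431 CYRILLIC SMALL LETTER BE
```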
>>
>> Until recently, at best there was a single standard for each language
>> and writing system, so that everybody would be on the same wavelength
>> within that language and writing system. Fonts for Armenian or
>> Russian or Hebrew or whatever would be encoded according to the
>> standard for that language. Things were then simple so long
>> as you were using that language, but got messy if, say,
>> you needed to use Armenian and English in the same document, or
>> wanted to write in Russian on a machine set up for Hebrew.
>> Furthermore, in many cases there were multiple encodings for the
>> same writing system. Sometimes, every font had its own idiosyncratic
>> encoding. (The champions seem to be the Ethiopians, who had over
>> 40 known encodings for Amharic.)
>>
>> In this situation, where every font potentially uses its own
>> encoding, for other people to use your font it isn't sufficient
>> for them to install it - their software has to understand its
>> encoding.
>>
>> With much current software, so long as your font uses a well-known
>> encoding, the software can use it because it contains or knows how
>> to look up information about the encoding. Your browser, for example,
>> almost certainly (a) attempts to detect the encoding of the web page
>> it displays and (b) allows you to tell it what encoding to use (in case
>> it fails to guess correctly - this happens with some frequency, in part
>> because many web pages lie about their encoding and the browser accepts
>> the lie). But if you have a truly idiosyncratic encoding in your font,
>> software may not know what to do with it.
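
What a wrong guess about the encoding looks like can be sketched as follows (assuming Python 3; the word "café" is just a convenient sample, not taken from any real page):

```python
original = "café"

# The page is actually stored as UTF-8: "é" becomes the two bytes 0xC3 0xA9.
stored = original.encode("utf-8")
print(list(stored))   # [99, 97, 102, 195, 169]

# Software that wrongly assumes Latin-1 turns each byte into its own character.
wrong = stored.decode("latin-1")
print(wrong)          # cafÃ©

# Decoding with the encoding that was really used recovers the text.
right = stored.decode("utf-8")
print(right)          # café
```

The bytes are identical in both cases; only the assumed encoding differs.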
>>
>> What Unicode does is unify all writing systems into a single encoding.
>> In Unicode "b" and Armenian capital cha do not compete for the
>> same codepoint. Instead, "b" is 98 as in ASCII and Armenian capital
>> cha is 1353. With everything included in a single encoding, you can
>> mix writing systems easily within a single document and use one writing
>> system on a system set up for another.
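
The two codepoints mentioned above can be checked with a short sketch (assuming Python 3 and the standard unicodedata module; the mixed sample string is made up for illustration):

```python
import unicodedata

latin_b = "b"
armenian_cha = "\u0549"  # 1353 decimal

# In Unicode the two letters have separate codepoints and do not collide.
print(ord(latin_b), ord(armenian_cha))   # 98 1353
print(unicodedata.name(armenian_cha))    # ARMENIAN CAPITAL LETTER CHA

# Both writing systems can sit in one string and survive a round trip
# through a single Unicode encoding such as UTF-8.
mixed = "b and " + armenian_cha
encoded = mixed.encode("utf-8")
print(encoded.decode("utf-8") == mixed)  # True
```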
>>
>> So, if you create your own font but use Unicode as the encoding,
>> so long as people are able to install your font they should have no
>> problem using it. What you should not do is create fonts that use
>> your own idiosyncratic encoding.
>>
>> One of the uses of FontForge is in fact reencoding an existing font.
>> You can see an example of this at:
>> http://billposer.org/Linguistics/Computation/Reencoding/HowTo.html
>> The examples used in this tutorial are based on a real task.
>> I wanted to be able to use Linear B and at the time could only
>> find a font that used an idiosyncratic encoding. So I took that
>> font and changed the encoding to Unicode.
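
The idea behind re-encoding is a lookup table from old codes to Unicode codepoints. The sketch below is not FontForge and not the procedure from the tutorial; it applies the same kind of table to text typed with a legacy font, in plain Python 3. The table itself is invented for illustration: the three pairings are hypothetical, although U+10000 to U+10002 do lie in the Unicode Linear B Syllabary block.

```python
# Hypothetical legacy layout: assume the old font put Linear B signs on the
# slots that ASCII uses for "a", "b" and "c". These pairings are invented.
LEGACY_TO_UNICODE = {
    0x61: 0x10000,
    0x62: 0x10001,
    0x63: 0x10002,
}


def reencode(legacy_bytes: bytes) -> str:
    """Translate legacy codes to Unicode, leaving unmapped codes unchanged."""
    return "".join(chr(LEGACY_TO_UNICODE.get(code, code)) for code in legacy_bytes)


converted = reencode(b"abc a")
print([f"U+{ord(ch):04X}" for ch in converted])
# ['U+10000', 'U+10001', 'U+10002', 'U+0020', 'U+10000']
```

A real conversion would need the complete table for the particular legacy font, which has to come from the font itself or its documentation.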
>>
>> Bill
>
>



-- 
Andrew Cunningham
Vicnet Research and Development Coordinator
State Library of Victoria
Australia

andrewc at vicnet.net.au
lang.support at gmail.com


