build a font for your endangered language...

Fri May 16 23:30:10 UTC 2008

Taanshi,

I am wondering how to go about creating a new character and how to go about
getting a Unicode codepoint for it.  What I have in mind is an <n> with
a </> over it...  Any advice or suggestions?

Kihchi-maarsii!

Eekoshi.
Heather Souter
Michif Language Activist and
Community Linguist

On Fri, May 16, 2008 at 5:04 PM, William J Poser <wjposer at ldc.upenn.edu>
wrote:

> Keola Donaghy 'uk'uneisguz:
>
> >Aloha. We created and used our custom fonts back in 1994 and are
> >still slowly trying to wean ourselves from them and switch
> >completely to Unicode.
>
> Actually, I think this confirms what I have been saying: using
> custom fonts is NOT a problem, except in cases like cell phones
> and work machines over which users have no control, where you
> can't install them.
>
> The problem is not custom fonts, it is custom ENCODINGS.
>
> Since I think some people may not be clear on the distinction,
> let me explain. Text in a computer really consists of a sequence
> of character codes, which are non-negative integers. The computer
> doesn't really store an "a" - it stores a number which by convention
> is associated with the character "a". Once upon a time, in the days
> of "dumb terminals" and fixed-encoding keyboards, this was all hard-wired.
> When you pressed the "a" key on your keyboard it sent a certain small
> integer to your computer, and when the computer sent that same small
> integer to the terminal, the terminal displayed the corresponding
> glyph. Nowadays it is possible to program what codes are generated by
> particular keyboard events and what glyphs are displayed, but
> the basic principle is the same: text consists of a sequence of
> numbers.
>
> What until recently was by far the most common encoding was ASCII,
> in which "a" has the character code 97. (Character codes are normally
> given in hexadecimal but I'll translate into decimal here.) "b" is
> 98, "c" is 99. "A" is 65, "B" is 66, "C" is 67, etc. So, if you
> have an ASCII-encoded font containing glyphs for the roman alphabet,
> sending the code 98 to the display will select the glyph for "b"
> and display it.
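To make the "text is a sequence of numbers" point concrete, here is a minimal Python sketch. The sample string and variable names are only illustrative; ord() and chr() are Python built-ins that map a character to its code and back.

```python
# Text is stored as numbers: look up the code for each character.
text = "abc ABC"
codes = [ord(ch) for ch in text]
print(codes)

# Going the other way: turn the numbers back into characters.
roundtrip = "".join(chr(n) for n in codes)
print(roundtrip)
```

The first line printed contains 97, 98, 99 for "abc" and 65, 66, 67 for "ABC", matching the ASCII values quoted above.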
>
> For other languages there are other encodings. If, for example,
> you use the ARMSCII7 encoding (which you might have done if you
> were an Armenian) and send the code 98 to the display, instead
> of the letter "b" you would get the Armenian capital letter cha.
>
> Until recently, at best there was a single standard for each language
> and writing system, so that everybody would be on the same wavelength
> within that language and writing system. Fonts for Armenian or
> Russian or Hebrew or whatever would be encoded according to the
> standard for that language. Then things would be simple so long
> as you were using that language, but would get messy if, say,
> you need to use Armenian and English in the same document, or
> wanted to write in Russian on a machine set up for Hebrew.
> Furthermore, in many cases there were multiple encodings for the
> same writing system. Sometimes, every font had its own idiosyncratic
> encoding. (The champions seem to be the Ethiopians, who had over
> 40 known encodings for Amharic.)
>
> In this situation, where every font potentially uses its own
> encoding, for other people to use your font it isn't sufficient
> for them to install it - their software has to understand its
> encoding.
>
> With much current software, so long as your font uses a well-known
> encoding, the software can use it because it contains or knows how
> to look up information about the encoding. Your browser, for example,
> almost certainly (a) attempts to detect the encoding of the web page
> it displays and (b) allows you to tell it what encoding to use (in case
> it fails to guess correctly - this happens with some frequency, in part
> because many web pages lie about their encoding and the browser accepts
> the lie). But if you have a truly idiosyncratic encoding in your font,
> software may not know what to do with it.
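A small sketch of why the encoding matters, assuming Python 3 and its bundled codecs. The byte value 0xE1 is an arbitrary choice for illustration; the four encoding names are standard Python codec names for Western European, two Russian, and a Hebrew legacy encoding.

```python
# One stored byte, interpreted under four different legacy encodings.
raw = bytes([0xE1])

encodings = ["latin-1", "cp1251", "koi8_r", "iso8859_8"]
decoded = {name: raw.decode(name) for name in encodings}

for name, ch in decoded.items():
    # Show the character and the Unicode codepoint it corresponds to.
    print(f"{name:10} -> {ch!r} (U+{ord(ch):04X})")
```

The same number comes out as a different letter each time, which is what happens when software guesses, or is told, the wrong encoding.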
>
> What Unicode does is unify all writing systems into a single encoding.
> In Unicode "b" and Armenian capital cha do not compete for the
> same codepoint. Instead, "b" is 98 as in ASCII and Armenian capital
> cha is 1353. With everything included in a single encoding, you can
> mix writing systems easily within a single document and use one writing
> system on a system set up for another.
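The two codepoints mentioned above can be checked with a short Python sketch. It assumes Python 3 with the standard unicodedata module; the variable names are illustrative, and UTF-8 is just one possible way to store Unicode text as bytes.

```python
import unicodedata

latin_b = "b"
armenian_cha = "\u0549"  # written as an escape so no Armenian font is needed

# In Unicode the two letters have separate codepoints.
for ch in (latin_b, armenian_cha):
    print(ord(ch), unicodedata.name(ch))

# Both scripts can sit in one string and survive a round trip through bytes.
mixed = latin_b + armenian_cha
stored = mixed.encode("utf-8")
restored = stored.decode("utf-8")
print(restored == mixed)
```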
>
> So, if you create your own font but use Unicode as the encoding,
> so long as people are able to install your font they should have no
> problem using it. What you should not do is create fonts that use
> your own idiosyncratic encoding.
>
> One of the uses of FontForge is in fact reencoding an existing font.
> You can see an example of this at:
> http://billposer.org/Linguistics/Computation/Reencoding/HowTo.html
> The examples used in this tutorial are based on a real task.
> I wanted to be able to use Linear B and at the time could only
> find a font that used an idiosyncratic encoding. So I took that
> font and changed the encoding to Unicode.
>
> Bill
>