
Jordan Lachler jordanlachler at GMAIL.COM
Wed Feb 27 08:51:34 UTC 2008


Hi James,

As I'm sure you know, we have the same issue with the underlined-g in Haida
that you do in Tlingit.  Our solution has been to use the small capital G
U+0262 plus the combining macron below U+0331 to render lower-case
g-underlined, and then the regular capital G plus the combining macron below
for the upper-case g-underlined.  This eliminates the descender issue
altogether, and has the added benefit of making the g-underline look
strikingly different from the plain g, which is handy since the two sounds
have no relationship to each other beyond the fact that both are stops. (I
think Haida was "assigned" the g-underline because that character was
already in use for Tlingit, even though it's for a totally different sound.)
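A minimal sketch of the Haida convention just described, assuming Python 3 and only its standard unicodedata module. The language and the names below are illustrative choices, not part of anyone's actual toolchain. It spells out both forms as code-point sequences and shows that NFC normalization leaves each one as a base letter plus U+0331, because Unicode has no precomposed small-capital G or capital G with line below.

```python
import unicodedata

MACRON_BELOW = "\u0331"                   # U+0331 COMBINING MACRON BELOW
HAIDA_G_LOWER = "\u0262" + MACRON_BELOW   # small capital G + macron below
HAIDA_G_UPPER = "G" + MACRON_BELOW        # capital G + macron below


def describe(s: str) -> list:
    """List each code point in s as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


for label, seq in (("lower", HAIDA_G_LOWER), ("upper", HAIDA_G_UPPER)):
    # NFC cannot compose these: no precomposed character exists for either.
    nfc = unicodedata.normalize("NFC", seq)
    print(label, len(nfc), describe(nfc))
```

One side effect worth knowing: default case mapping lower-cases G + U+0331 to plain g + U+0331, not to the small-capital form. Automatic lower-casing therefore will not reproduce this convention.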

We've been using that combination with the Aboriginal Sans Unicode font and
it works pretty nicely in almost all renderings. On some machines, I've
seen the combining macron drift a little too far to the right, but even
then the underline is still much easier to spot than when it's mashed
together with the descender of the regular lower-case g.

In our experience, it took speakers and students a week or two to get used
to seeing it that way... now after three or four years people don't even
seem to notice it much at all (or, they're being scrupulous in not
mentioning it to me, either way).  Of course, when people handwrite Haida,
many/most folks still use a regular lower-case g and then draw a line
through or under the descender (especially if they're writing in cursive).

Anyway, just another option to consider.

Háw'aa (Haida: thank you),

Jordan

On Tue, Feb 26, 2008 at 11:16 PM, James Crippen <jcrippen at gmail.com> wrote:

> On Tue, Feb 26, 2008 at 1:09 PM, William J Poser <wjposer at ldc.upenn.edu> wrote:
> > Regarding the Apachean characters that "are not directly supported" by
> >  Unicode, I can't speak for Mia but when I've heard such things before
> >  it usually means that Unicode does not provide a single codepoint for
> >  the character.
>
> A similar situation exists for several Northwest Coast languages,
> particularly the one I work with, Tlingit. With these languages the
> issue is the combining macron below U+0331 (in plain text, ˍ), which
> was developed back in the Bad Old Days of typewriters, when backspace
> and overstrike were a convenient way of extending the Latin alphabet.
> The diacritic is available in a few precomposed characters in the
> Latin Extended Additional range (U+1E00 to U+1EFF), namely with B/b,
> D/d, K/k, L/l, N/n, R/r, T/t, and Z/z, as well as h (but not H!). For
> Tlingit the popular orthography requires the combining macron under
> G/g and X/x as well as K/k. Since the first two pairs aren't
> precomposed, most fonts display them unacceptably badly, if they
> include the (admittedly obscure) U+0331 diacritic at all.
>
> I tested all of the fonts in a default Windows XP installation and
> found that only MS Sans Serif and Tahoma support the
> aforementioned characters, and they display U+0331 halfway between the
> intended glyph and the following glyph. This is probably a font
> problem, but it could also be the renderer; I'm not absolutely certain.
> Lucida Sans Unicode gets U+0331 right, but lacks the precomposed
> U+1E34 and U+1E35 (K/k with line below). SIL's fonts Doulos,
> Charis, and Gentium all work correctly, but their line heights are
> unpleasantly large for some people and they don't work well as system
> fonts.
>
> Apple does a much better job of this in Mac OS X 10.5, supporting the
> diacritic in a number of fonts. They did a particularly good job with
> Lucida Grande, which is the standard system font. In addition,
> Helvetica, Courier, Geneva, American Typewriter, and Bradley Hand work
> as expected. I had little or no trouble creating a keyboard
> layout and using it for my daily work.
>
> Unfortunately, I can't reliably use Unicode-encoded Tlingit in email
> or documents that I intend to share with others, since I can't ensure
> that the characters will be even close to viewable for them. As it
> stands, all the computer users I know who write Tlingit use underline
> markup instead, but this of course breaks in any operation that
> doesn't preserve markup, such as copy-paste from web browsers.
>
> A thornier problem I've encountered is how to deal with the
> unfortunate combination of U+0331 macron-below and Latin small
> letter g. All the fonts I've seen display the diacritic below the
> descender. The orthography designers clearly intended an underscore
> overstruck on the descender of the g, as can be seen in a few Tlingit
> works out there. In my experience the diacritic is often rendered so
> low that it's invisible, chopped off by the following line.
>
> Unicode provides a character that, on the face of it, should solve
> this problem, namely U+01E5 Latin small letter g with stroke. IIRC in
> the Unicode standards documents it appears with the stroke through the
> descender, and hence looks something like what Tlingit users expect.
> However, not only does this character have even less support in the
> font world (several of the aforementioned fonts lack it), but
> font developers have also placed the stroke through the right stem of
> the letter, or even through the upper bowl. Its capital counterpart
> U+01E4 has a stroke through the stubby arm of the G, rendering it
> unacceptable as a proper case pair.
>
> To cope with all of this, in my documents I've chosen to use U+1E21
> Latin small letter g with macron (above) as the lowercase form, and
> U+0047 U+0331 (G macron-below) as the uppercase form. This of course
> breaks case pairing, but it displays properly in most cases, avoids
> the disappearing diacritic, and works well enough for my purposes. I'm
> not afraid of transcoding all of my documents at some point in the
> future, but I wouldn't wish that on anyone else so I haven't
> promulgated a keyboard layout for either OS to anyone yet.
>
> One idea I had was to design an OpenType font family with
> alternate forms for U+01E4 and U+01E5 that have the proper shape for
> the Tlingit orthography. It's a great idea, and although I could
> probably hack this into a free font family myself, there's no way that
> I'll find the time to actually do so. I thought of asking SIL to try
> implementing it, but never made a coherent proposal.
>
> >  It is possible in principle to request the addition of codepoints for
> >  such compound characters to Unicode. However, the Unicode Consortium is
> >  not thrilled by such requests. As I understand it, they don't like to
> >  clutter things up by encoding additional characters unnecessarily. In the
> >  cases in which they have done so, the motivation was reportedly
> >  consistency with previous character sets. (That is, if an existing
> >  encoding had a single codepoint for a character, Unicode also has a
> >  single codepoint for it in order to simplify conversion between the older
> >  encoding and Unicode.)
>
> This rationale of theirs originally didn't bother me, but given the
> huge increase in codespace from the supplementary planes (reached in
> UTF-16 via surrogate pairs), and the recent addition of a set of
> mathematical alphabets in italic, bold, bold italic, script,
> blackletter, blackboard bold, sans serif, bold sans, oblique sans,
> bold oblique sans, fixed width, bold Greek, italic Greek, bold italic
> Greek, sans bold Greek, and sans bold italic Greek, I no longer
> comprehend their resistance to additional precomposed Latin forms.
>
> Anyway, that's my rant. I'm working on a paper on Tlingit
> orthographies which addresses these issues and more. Hopefully I will
> be able to present it at the LSA summer meeting, to which my
> department (University of Hawai'i at Mānoa) has graciously offered to
> send me. This discussion has reminded me that I need to finish the
> abstract and paper for it, and figure out some way to sensibly and
> coherently explain these sorts of Unicode problems to linguists out
> there developing orthographies.
>
> Aatlein gunalchéesh yee yei jinéiyi (Tlingit: thank you very much, all of you who do this work),
> James Crippen
>
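The precomposed-versus-combining distinction in the message above can be checked mechanically. The following sketch assumes Python 3 and its standard unicodedata module; the helper name precomposed is hypothetical and only for illustration. For each of the three Tlingit letter pairs, it asks NFC normalization whether base letter + U+0331 collapses into a single precomposed code point.

```python
import unicodedata
from typing import Optional

MACRON_BELOW = "\u0331"  # U+0331 COMBINING MACRON BELOW


def precomposed(base: str) -> Optional[str]:
    """Return the precomposed character for base + U+0331, or None.

    NFC composes the pair into one code point only when Unicode
    encodes a canonical precomposed form (e.g. U+1E35 for k).
    """
    nfc = unicodedata.normalize("NFC", base + MACRON_BELOW)
    return nfc if len(nfc) == 1 else None


# Letters the Tlingit popular orthography underlines, per the message above.
for base in "GgXxKk":
    cp = precomposed(base)
    if cp is None:
        print(f"{base} + U+0331: no precomposed form (two code points)")
    else:
        print(f"{base} + U+0331 -> U+{ord(cp):04X} {unicodedata.name(cp)}")
```

Under these assumptions, K/k compose to U+1E34/U+1E35, while G/g and X/x stay as two code points. Those two-code-point sequences are exactly the cases where font support decides how the text looks.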


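The case-pairing trade-off between U+01E5/U+01E4 (g with stroke) and the U+1E21 / G + U+0331 workaround described above can also be sketched in Python 3. The sketch uses the built-in str.upper() and str.lower(), which apply Unicode's default case mappings. The helper names code_points and round_trips are hypothetical, for illustration only.

```python
def code_points(s: str) -> str:
    """Format s as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def round_trips(lower: str, upper: str) -> bool:
    """True if default case mapping sends each form onto the other."""
    return lower.upper() == upper and upper.lower() == lower


PAIRS = {
    "g with stroke (U+01E5 / U+01E4)": ("\u01e5", "\u01e4"),
    "workaround (U+1E21 / G + U+0331)": ("\u1e21", "G\u0331"),
}

for label, (lower, upper) in PAIRS.items():
    print(f"{label}: round-trips={round_trips(lower, upper)}; "
          f"upper({code_points(lower)}) = {code_points(lower.upper())}; "
          f"lower({code_points(upper)}) = {code_points(upper.lower())}")
```

Under these assumptions, the stroked pair round-trips cleanly. The workaround does not: U+1E21 upper-cases to U+1E20 (G with macron above), and G + U+0331 lower-cases to g + U+0331.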

-- 
Jordan


More information about the Ilat mailing list