entity names for VN o+ and u+

Peter Constable Peter_Constable at sil.org
Mon Apr 30 18:45:35 UTC 2001


On 04/30/2001 07:19:25 AM Waruno wrote:

>But HTML the way I
>know it, doesn't provide for the possibility of placing them over some
>letter character. I'm perhaps a bit behind the times?

You need to understand the relationships among HTML, Unicode, client
software, and "smart font" rendering. HTML uses Unicode as its
character set, and expects that precomposed characters will be used in a
document where available. This does not place any limitation on what can be
encoded, however: Unicode allows for dynamic composition of arbitrary
combinations of base characters and combining diacritics, and since HTML
supports Unicode it also supports this dynamic composition.
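For the two Vietnamese letters in the subject line, this equivalence can be checked with Python's standard `unicodedata` module. This is a minimal sketch; Python is used only for illustration, and `horn_forms` is a hypothetical helper name. Precomposed o-horn (U+01A1) and u-horn (U+01B0) are canonically equivalent to a base letter followed by COMBINING HORN (U+031B):

```python
import unicodedata

COMBINING_HORN = "\u031b"

def horn_forms(base: str) -> tuple[str, str]:
    """Hypothetical helper: return (decomposed, precomposed) forms
    of `base` followed by COMBINING HORN."""
    decomposed = base + COMBINING_HORN
    # NFC composes base + combining mark into one code point where one exists
    precomposed = unicodedata.normalize("NFC", decomposed)
    return decomposed, precomposed

for base in ("o", "u"):
    decomposed, precomposed = horn_forms(base)
    print(
        [f"U+{ord(c):04X}" for c in decomposed],
        "->",
        f"U+{ord(precomposed):04X}",
        unicodedata.name(precomposed),
    )
# ['U+006F', 'U+031B'] -> U+01A1 LATIN SMALL LETTER O WITH HORN
# ['U+0075', 'U+031B'] -> U+01B0 LATIN SMALL LETTER U WITH HORN
```

Both spellings denote the same text; which one a document uses is a choice of representation, not of content.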

If you have tried viewing documents containing such combinations and been
dissatisfied with the results, that is the fault of the client software
(browser, OS, fonts) and not of either HTML or of Unicode. Issues related
to complex-script rendering are assumed by Unicode and HTML to be handled
by the client software, typically by means of a "smart font" rendering
system, such as Apple's AAT or MS / Adobe's OpenType (or maybe some day
also SIL's Graphite). These rendering systems are designed to handle
behaviours that are typical of scripts like Arabic or Devanagari:
contextual glyph shaping, reordering of glyphs, ligation, and diacritic
positioning. In a system that is properly implemented, you should be able
to have in your document any combination of base characters and combining
diacritics, and even also precomposed base/diacritic characters, and get
all the diacritics stacked up in the correct position without requiring any
special markup or any special action on the part of the author or user.
Alas, such systems are still being developed, and I am not aware of any web
browser or even OS that currently provides this level of support for
arbitrary Latin diacritic combinations. Progress is being made, however,
and such support should be commonly available in a year or two.
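Drawing the stacked marks is the renderer's job, but which sequences count as "the same letter" is defined by Unicode itself. As a sketch (Python for illustration only; `canonically_equal` is a hypothetical name), the Vietnamese letter "ờ" (o with horn and grave) can be encoded fully precomposed, partly precomposed, or fully decomposed with the two diacritics in either order. Canonical reordering puts the horn (combining class 216) before the grave (class 230), so all four forms are canonically equivalent and a conforming renderer should display them identically:

```python
import unicodedata

# Ways an author might encode Vietnamese "ờ" (o with horn and grave)
variants = {
    "precomposed": "\u1edd",               # U+1EDD
    "o-horn + grave": "\u01a1\u0300",      # precomposed U+01A1 + combining grave
    "o + horn + grave": "o\u031b\u0300",   # fully decomposed
    "o + grave + horn": "o\u0300\u031b",   # diacritics in the other order
}

def canonically_equal(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical NFD forms
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

reference = variants["precomposed"]
for label, text in variants.items():
    print(f"{label:18} {canonically_equal(text, reference)}")
```

Every line should print `True`.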



>I find, that none of the reasons outweigh the main handicap of precomposed
>representation: illegibility for browsers which do not have the required
>special font.

In fact, one of the reasons for requiring the use of precomposed forms in
HTML whenever available is that they are more likely to be supported in
client software than are the decomposed equivalents.
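If text arrives in decomposed form, it can be converted to precomposed form before publishing by applying Unicode Normalization Form C (NFC), which uses a precomposed character wherever Unicode defines one. A minimal Python sketch, assuming Python 3.8 or later for `unicodedata.is_normalized`; `prepare_for_html` is a hypothetical helper, not part of any HTML toolkit:

```python
import unicodedata

def prepare_for_html(text: str) -> str:
    """Hypothetical helper: return `text` in NFC, so that precomposed
    characters are used wherever Unicode defines them."""
    if unicodedata.is_normalized("NFC", text):
        return text  # already uses precomposed forms where available
    return unicodedata.normalize("NFC", text)

# The Vietnamese word "người", first forced into decomposed (NFD) form
decomposed = unicodedata.normalize("NFD", "ng\u01b0\u1eddi")
cleaned = prepare_for_html(decomposed)
print(len(decomposed), len(cleaned))  # 8 5
```

The decomposed string carries separate code points for the horns and the grave; after NFC, "ư" and "ờ" are single code points again.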


>This transgresses against the basic idea of Internet:
>maximal worldwide unimpeded contact. Precomposed representations create
>additional boundaries cutting up the WW-web into script-specific
>provinces.

That's not at all true. Precomposed representations merely mark the limit
of what legacy technologies can handle; going beyond that limit requires
support for Unicode and smart-font rendering technologies.


>Composed representation
>could perhaps encourage the numerous variants of Indic script in South
>and Southeast Asia to try and develop a common basic charset.

They don't need to. There is a universal character set, Unicode, that
already supports all of them adequately.


>> If you specify your charset to be UTF-8, then you don't need to use
>> entities for any character at all. Any software that is compliant with HTML
>> 4.0 is supposed to be able to handle it.
>
>wow, news to me.  I'll try to read up on this UTF-8 thing.

You should read the relevant portion of the HTML 4 spec
(http://www.w3.org/TR/html4/charset.html). The XHTML 1.0 spec is actually
the current spec, and is an application of XML. As explained in the XML
spec, the character set used for XML is also the Unicode character set, and
the default encoding is also a Unicode encoding (either UTF-8 or UTF-16;
UTF-8 is more common).
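To make the difference concrete, here is a small Python sketch (the `html` module and the `xmlcharrefreplace` codec error handler are used only for illustration). In a UTF-8 document the letters are stored directly as ordinary bytes; a document restricted to ASCII would need numeric character references instead, and a parser recovers the same characters either way:

```python
import html

text = "\u01a1 \u01b0"  # o-horn and u-horn

# With charset=UTF-8 the characters go into the file directly as bytes
print(text.encode("utf-8"))                       # b'\xc6\xa1 \xc6\xb0'

# In an ASCII-only file, numeric character references are needed instead
print(text.encode("ascii", "xmlcharrefreplace"))  # b'&#417; &#432;'

# Either way, an HTML parser recovers the same characters
print(html.unescape("&#x1A1; &#x1B0;") == text)   # True
```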

You should also become familiar with the Unicode Standard -- see
www.unicode.org.



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable at sil.org>


