Ancestor-descendant distance

Sean Crist kurisuto at
Mon Aug 23 00:39:56 UTC 1999

On Fri, 13 Aug 1999 ECOLING at wrote:

> We now have an international standard computer Code, Unicode,
> which contains most of the characters needed for transliteration
> (Latin-standard-based letters) and for phonetic transcription (IPA).
> It would be useful to try to establish a standard for Comparative
> Data sets, into which all existing computer data sets can be translated,
> so that the massive sets of data can be made available for studies
> such as this.

I agree totally.  We're on exactly the same wavelength here.

I've looked into this a little and have tried to educate myself about
SGML, which would be an obvious candidate for marking up the data sets.  I
don't know whether there is an established SGML tag set for marking up
dictionaries; if there is, it would probably make sense to start with it
and extend it with whatever additional tags we need to represent cognate
relationships across languages, etc.

If anyone on this list has any experience using SGML for such a purpose,
please write to me, because I'll need to tackle this problem before long!

