Tools for on-line etymological dictionaries [was Re: Ancestor-descendant distance]

Jon Patrick jonpat at staff.cs.usyd.edu.au
Mon Sep 6 02:01:21 UTC 1999


[ moderator changed Subject: header ]

    Date:       Sun, 22 Aug 1999 20:39:56 -0400 (EDT)
    From:       Sean Crist <kurisuto at unagi.cis.upenn.edu>

    On Fri, 13 Aug 1999 ECOLING at aol.com wrote:

    > We now have an international standard computer Code, Unicode,
    > which contains most of the characters needed for transliteration
    > (Latin-standard-based letters) and for phonetic transcription (IPA).
    > It would be useful to try to establish a standard for Comparative
    > Data sets, into which all existing computer data sets can be translated,
    > so that the massive sets of data can be made available for studies
    > such as this.

    I agree totally.  We're on exactly the same wavelength here.

    I've looked into this a little and have tried to educate myself about
    SGML, which would be an obvious candidate for marking up the data sets.  I
    don't know if there are any specific standard sets of SGML tags for
    marking up dictionaries; if there are, it would probably make sense to
    start with such a tag set, and extend it with whatever additional tags we
    need to represent cognations between languages, etc.

    If anyone on this list has any experience using SGML for such a purpose,
    please write to me, because I'll need to be tackling this problem before
    much longer!

The Text Encoding Initiative (TEI) has created a standard set of markup tags
for paper dictionaries. We have used them to convert paper dictionaries to
databases; however, I believe they will need to be enhanced before they can be
used to create etymological dictionaries or databases for historical
linguistics.  That will take collaborative work among the people interested in
the problem.
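
As a rough illustration only, here is a minimal sketch of what a dictionary
entry marked up with TEI-style tags might look like, and how the etymological
part could be pulled out of it. The element names (entry, form, orth, gramGrp,
pos, etym, lang, mentioned) are taken from the TEI dictionary tag set; the
entry content, the ENTRY and etymons names, and the idea of pairing each
language label with the form that follows it are assumptions made for this
example, not part of any agreed standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry. The tag names follow the TEI dictionary tag set;
# the content is made up for illustration.
ENTRY = """
<entry>
  <form><orth>father</orth></form>
  <gramGrp><pos>n</pos></gramGrp>
  <etym>
    cf. <lang>Lat.</lang> <mentioned>pater</mentioned>,
    <lang>Skt.</lang> <mentioned>pitar-</mentioned>
  </etym>
</entry>
"""

def etymons(entry_xml):
    """Pair each <lang> label with a <mentioned> form, by position.

    Assumption: inside <etym>, labels and forms strictly alternate.
    Real printed etymologies are often far less regular than this.
    """
    etym = ET.fromstring(entry_xml).find("etym")
    langs = [e.text for e in etym.findall("lang")]
    forms = [e.text for e in etym.findall("mentioned")]
    return list(zip(langs, forms))

print(etymons(ENTRY))
```

The pairing by position works only because this made-up entry is so regular.
Nothing in the markup itself states that the two forms are cognates of the
headword, which is the kind of information an enhanced tag set would need to
carry explicitly.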

Jon
______________________________________________________________
The meaning of your communication is the response you get


