Indian and other Asian languages listed in ISO 639: feedback sought

John Clews Emeet at SESAME.DEMON.CO.UK
Wed Feb 2 18:52:23 UTC 2000


VYAKARAN: South Asian Languages and Linguistics Net
Editors:  Tej K. Bhatia, Syracuse University, New York
          John Peterson, University of Munich, Germany
Details:  Send email to listserv at listserv.syr.edu and say: INFO VYAKARAN
Subscribe:Send email to listserv at listserv.syr.edu and say:
          SUBSCRIBE VYAKARAN FIRST_NAME LAST_NAME
          (Substitute your real name for first_name last_name)
Archives: http://listserv.syr.edu

Dear list members,

I am a member of the Joint Advisory Committee on ISO 639: Codes for
the representation of names of languages (abbreviated to "ISO 639:
language codes" in the discussion below).

This committee meets on 17-18 February 2000 in Washington DC, and I
would be grateful for any information from list members that would
highlight major gaps in the coverage of Indian and other Asian
languages in the list below.

If you are interested in the background to ISO 639, read section 1:
if not, and you would like to comment on the codes, and the languages
that have been coded, and any omissions or errors, go to section 2.


1. ISO 639: language codes

ISO 639 is one of many international standards developed by groups of
experts in many countries. This section provides a simplified view.

Put simply, ISO 639's job is to provide simple codes that can be
embedded in (mainly computerised) information systems, allowing those
systems to identify the language in use, or even to enable useful
things like font switching, e.g. on Internet web sites.

There are older 2-letter codes, used, it could be said, mainly in
older "legacy" systems. The 3-letter codes (mostly identical to the
codes used by the Library of Congress, and in many libraries) have
seen the largest growth, and allow for greater expansion. The two
sets are currently listed in two separate parts of ISO 639,
respectively ISO [WD] 639[-1] and ISO 639-2.
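
To make the relationship between the two sets concrete, here is a
minimal Python sketch. It holds a handful of rows copied from the
reference list in section 3 and looks up the 3-letter code(s) for a
2-letter code. The names SAMPLE_CODES and three_letter_codes are
illustrative assumptions, not part of ISO 639. Where the list shows
two 3-letter forms separated by a slash (e.g. tib/bod), both are kept.

```python
# Illustrative sample only: rows copied from the reference list in section 3.
# Key: 2-letter code; value: (tuple of 3-letter codes, English name).
SAMPLE_CODES = {
    "hi": (("hin",), "Hindi"),
    "bn": (("ben",), "Bengali"),
    "ta": (("tam",), "Tamil"),
    "bo": (("tib", "bod"), "Tibetan"),
    "my": (("bur", "mya"), "Burmese"),
}


def three_letter_codes(two_letter):
    """Return the 3-letter code(s) for a 2-letter code, or () if unknown."""
    entry = SAMPLE_CODES.get(two_letter.lower())
    return entry[0] if entry else ()


print(three_letter_codes("hi"))  # ('hin',)
print(three_letter_codes("BO"))  # ('tib', 'bod')
print(three_letter_codes("zz"))  # ()
```

A real system would load the full table rather than a hand-typed
sample; the point here is only that one 2-letter code may correspond
to more than one 3-letter form.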

The Internet Engineering Task Force's specification RFC 1766
recommends the use of ISO 639 codes in Internet uses.
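
As a rough illustration of what this looks like in practice, the
Python sketch below checks that a string has the shape of an RFC 1766
language tag (a primary tag plus optional subtags, each of 1 to 8
ASCII letters, joined by hyphens) and returns the primary tag when it
is two letters long, which RFC 1766 reads as an ISO 639 code. The
function name primary_iso639_code is an assumption made for this
example, and the sketch checks only the shape of the tag, not whether
the code is actually assigned in ISO 639.

```python
import re

# RFC 1766 tag syntax: a primary tag followed by zero or more subtags,
# each made of 1 to 8 ASCII letters and separated by hyphens.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def primary_iso639_code(tag):
    """Return the 2-letter primary tag in lower case, or None.

    Under RFC 1766 a 2-letter primary tag is read as an ISO 639 code;
    "i" (IANA-registered) and "x" (private use) are not ISO 639 codes.
    This does NOT verify that the code is assigned in ISO 639.
    """
    if not TAG_RE.fullmatch(tag):
        return None
    primary = tag.split("-")[0].lower()
    return primary if len(primary) == 2 else None


print(primary_iso639_code("hi"))         # hi
print(primary_iso639_code("en-GB"))      # en
print(primary_iso639_code("x-klingon"))  # None
```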

We have also been in discussion with the Summer Institute of
Linguistics (SIL) who use a different (and much larger) set of
3-letter codes in their Ethnologue, codes which have also been used
in some Internet situations.

NB: (a) If you use Ethnologue codes as well as, or instead of, codes
from ISO 639, I would be glad to hear of it, and to know in what
circumstances you use those codes.

(b) If you just use the Ethnologue, or some other reference source
where those Ethnologue codes are used, just for reference (i.e. the
reference source is important, not the codes themselves) that's a
different question.

I'd be grateful if you can distinguish (a) from (b) in any replies on
that point. However, regarding (b) it may be useful to know which
other publications/web sites use Ethnologue codes.


2. Opportunity for feedback

The list below is my own handy reference list based on my own
compilation of 3-letter codes from ISO 639-2 and Library of Congress
codes, and the 2-letter codes in ISO WD 639-1. Errors are likely to
be my own, rather than in ISO 639, though I have been fairly careful.

Some of the notes (especially those in square brackets, or using
asterisks) are just for my own reference, and relate only to
information on different editions of ISO 639, and can be ignored.

I'd be particularly interested to know of obvious omissions, or
errors in naming, or where predominant use of language names has
changed.

There are also some fairly basic "genetic codes", with entries for
"xxxx languages" or "xxxx languages (other)". Again, if some language
groups seem to have been omitted altogether I would be glad to know.

In any of the columns, any information including asterisks (*) or
question marks (?) is essentially my own, and for my own use.

If possible, could you embed your comments within my quoted table,
unless your comment is very simple and takes only a few lines: that
will enable me to align comments.

I am primarily interested in omissions, and in language names, but if
you want to suggest a particularly useful 3-letter code, it may be
helpful (although I think that the default approach is likely to be
"use the first three letters of the name in English, or French, or in
the local spelling of the name" if known, allowing for transcription
or transliteration conventions).
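
The default approach described above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
function names candidate_code and first_free_code are invented for
the example, and real assignments are made by the committee, not by
rule. The sample values come from the reference list in section 3,
where "ger" and "deu" both appear for German, and where "kon" is
already used for Kongo while Konkani has "kok".

```python
def candidate_code(name):
    """Return the first three ASCII letters of a name, lower-cased.

    Returns None if the name has fewer than three ASCII letters.
    """
    letters = [c.lower() for c in name if c.isascii() and c.isalpha()]
    return "".join(letters[:3]) if len(letters) >= 3 else None


def first_free_code(names, taken):
    """Try each spelling in turn; return the first candidate not in `taken`."""
    for name in names:
        code = candidate_code(name)
        if code and code not in taken:
            return code
    return None


print(candidate_code("Konkani"))                        # kon
print(first_free_code(["German", "Deutsch"], {"ger"}))  # deu
print(first_free_code(["Konkani"], {"kon"}))            # None
```

The last call returns None because "kon" is already taken, which is
exactly the kind of case where a suggested code from list members
would be useful.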

Please could you reply direct to me at <Emeet at sesame.demon.co.uk> and
not on the list to avoid too many large repetitive emails
overcrowding the traffic on the South Asian Linguists list at
<VYAKARAN at LISTSERV.SYR.EDU>

I'll post a summary of your comments to <VYAKARAN at LISTSERV.SYR.EDU>
in due course.

If you could reply within a week of reading this email, that is
likely to leave sufficient time to feed such information into the
ISO 639 Joint Advisory Committee meeting on 17-18 February 2000.


3. Handy reference list

Here's the list: I look forward to your comments!

[ Tip: Use a monospace font like Courier for the chart below ]


------------------------------------------------------------
  LC  ISO 639-2   ISO 639-1  Language name in English
------------------------------------------------------------
  --- --- ---     (aj)       Abaza
      abk          ab        Abkhazian
  --- --- ---     (ad)       Adyge
      ace                    Achinese
      ach                    Acoli
      ada                    Adangme
      aar          aa        Afar
      afh                    Afrihili
      afr          af        Afrikaans
      afa                    Afro-Asiatic (Other)
      aka          ak        Akan
      akk                    Akkadian
      alb/sqi *    sq        Albanian
      ale                    Aleut
      alg                    Algonquian languages
  --- --- ---     (an)       Aragonese
      tut                    Altaic (Other)
      amh          am        Amharic
      apa                    Apache languages
      ara          ar        Arabic
      arc                    Aramaic
      arp                    Arapaho
      arn                    Araucanian (Mapuche)
      arw                    Arawak
      arm/hye *    hy        Armenian
  --- --- ---     (vl)       Aromanian; Arumanian
      art                    Artificial (Other)
  --- --- ---     (ae)       Arvanite
      asm          as        Assamese
  --- --- ---     (au)       Asturian
      ath                    Athapascan languages
      aus                    Australian languages
      map                    Austronesian (Other)
      ava          av        Avaric
      ave         (fv)       Avestan
      awa                    Awadhi
      aym          ay        Aymara
      aze          az        Azerbaijani
      ban                    Balinese
  --- --- ---     (bq)       Balkar
      bat                    Baltic (Other)
      bal                    Baluchi
      bam          bm        Bambara
      bai                    Bamileke languages
      bad                    Banda
      bnt                    Bantu (Other)
      bas                    Basa
      bak          ba        Bashkir
      baq/eus *    eu        Basque
      btk                    Batak (Indonesia)
      bej                    Beja
      bel                    Belarusian [was Byelorussian]
      bem                    Bemba
      ben          bn        Bengali
      ber                    Berber (Other)
      bho                    Bhojpuri
      bih          bh        Bihari
      bik                    Bikol
      bin                    Bini
      bis          bi        Bislama
  --- --- ---     (bs)       Bosnian
      bra                    Braj
      bre          br        Breton
      bug                    Bugis (Buginese)
      bul          bg        Bulgarian
      bua                    Buriat
      bur/mya *    my        Burmese
      cad                    Caddo
      car                    Carib
      cat          ca        Catalan
      cau                    Caucasian (Other)
      ceb                    Cebuano
      cel                    Celtic (Other)
      cai                    Central American Indian (Other)
      chg                    Chagatai
      cmc                    Chamic languages
      cha                    Chamorro
  --- --- ---     (??)       Chamorro
      che         (nx)       Chechen
      chr         (jl)       Cherokee
      chy                    Cheyenne
      chb                    Chibcha
  --- --- ---     (ch)       Chichewa; Chewa
      chi/zho *    zh        Chinese
      chn                    Chinook jargon
      chp                    Chipewyan
      cho                    Choctaw
      chu         (sj)       Church Slavic (Old Church Slavonic)
  tru chk                    Chuukese
      chv         (cv)       Chuvash
      cop                    Coptic
      cor          kw        Cornish
      cos          co        Corsican
      cre          cr        Cree
      mus                    Creek
      cpe                    Creoles & Pidgins, English
      cpf                    Creoles & Pidgins, French
      cpp                    Creoles & Pidgins, Portuguese
      crp                    Creoles & Pidgins (Other)
      scr/hrv *    hr        Croatian (Serbo-Croat, Latin)
      cus                    Cushitic (Other)
      cze/ces *    cs        Czech
      dak                    Dakota
      dan          da        Danish
  --- --- ---     (dg)       Dargwa
      day                    Dayak
      del                    Delaware
      din                    Dinka
      div          dv        Divehi
      doi                    Dogri
      dgr                    Dogrib
      dra                    Dravidian (Other)
      dua                    Duala
      dut/nld *    nl        Dutch
      dum                    Dutch, Middle (ca. 1050-1350)
      dyu                    Dyula
      dzo          dz        Dzongkha
      efi         (ef)       Efik
      egy                    Egyptian (Ancient)
      eka                    Ekajuk
      elx                    Elamite
      eng          en        English
      enm                    English, Middle (ca. 1100-1500)
      ang                    English, Old (ca. 450-1100)
  --- --- ---     (er)       Erzya Mordvin
  esp epo          eo        Esperanto
  esk ---          --     ** Eskimo (Other) (not in 639-2)
      est          et        Estonian
  eth ---          --     ** Ethiopic [languages] (not in 639-2)
      ewe          ee        Ewe
      ewo                    Ewondo
      fan                    Fang
      fat                    Fanti
  far fao          fo        Faroese
      fij          fj        Fijian
      fin          fi        Finnish
      fiu                    Finno-Ugrian (Other)
      fon                    Fon
  --- --- ---     (fp)       Franco-Provençal
      fre/fra *    fr        French
      frm                    French, Middle (ca. 1400-1600)
      fro                    French, Old (842- ca. 1400)
  fri fry          fy        Frisian
  --- --- ---     (??)       Frisian, East; Sater Frisian
  --- --- ---     (fn)       Frisian, North (fn! - also in Persian, Old)
      fur         (fu)       Friulian
      ful          ff        Fulah
      gaa                    Ga
  gae gla          gd        Gaelic, Scots [* were gae/gdh]
  iri gle          ga        Gaelic, Irish [* were gai/iri]
  max glv          gv        Gaelic, Manx
  --- --- ---     (gg)       Gagauz
  gag glg          gl        Gallegan (Galician - used in Spain)
      lug          lg        Ganda
      gay                    Gayo
      gba                    Gbaya
  eth gez                    Geez
      geo/kat *    ka        Georgian
      ger/deu *    de        German
      gmh                    German, Middle High
      goh                    German, Old High (ca. 750-1050)
      gem                    Germanic (Other)
  --- --- ---     (??)       German, Low; Low German
      gil                    Gilbertese
      gon                    Gondi
      gor                    Gorontalo
      got                    Gothic
      grb                    Grebo
      grc                    Greek, Ancient (to 1453)
      gre/ell *    el        Greek, Modern (1453-)
      kal          kl        Greenlandic (Kalaallisut)
  gua grn          gn        Guarani
      guj          gu        Gujarati
      gwi                    Gwich'in
      hai                    Haida
      hau          ha        Hausa
      haw                    Hawaiian
      heb          he    *** Hebrew [Infoterm, 1989: iw deprecated?]
      her          oh        Herero
      hil                    Hiligaynon
      him                    Himachali
      hin          hi        Hindi
      hmo         (??)       Hiri Motu, Motu
      hit                    Hittite
      hmn                    Hmong
      hun          hu        Hungarian
      hup                    Hupa
      iba                    Iban
      ice/isl *    is        Icelandic
      ibo          ig        Igbo
      ijo                    Ijo
      ilo                    Iloko
      inc                    Indic (Other)
      ine                    Indo-European (Other)
      ind          id    *** Indonesian [Infoterm, 1989: in deprecated?]
  --- --- ---     (ng)       Ingush
  int ina          ia        Interlingua [*                           ]
      ile          ie        Interlingue [*note similar language name ]
      iku          iu        Inuktitut [Infoterm, 1989]
      ipk          ik        Inupiaq (was Inupiak)
      ira                    Iranian (Other)
      sga                    Irish, Old (to 900)
      mga                    Irish, Middle (900 - 1200)
      iro                    Iroquoian languages
  --- --- ---     (rx)       Istro-Romanian
      ita          it        Italian
      jpn          ja        Japanese
      jav/jaw *    jv/jw *   Javanese [jw (Jawi?) now deprecated??]
      jrb                    Judeo-Arabic
      jpr                    Judeo-Persian
  --- --- ---     (qb)       Kabardian
      kab                    Kabyle
      kac                    Kachin
      kal                    Kalaallisut [renamed]
  --- --- ---     (xl)       Kalmyk
      kam                    Kamba
      kan          kn        Kannada
      kau          kr        Kanuri
  --- --- ---     (qc)       Karachay
  --- --- ---     (qr)       Karaim
      kaa                    Kara-Kalpak
  --- --- ---     (kj)       Karelian, North (Other Karelian too?)
      kar                    Karen
      kas          ks        Kashmiri
  --- --- ---     (??)       Kashubian
      kaw                    Kawi
      kaz          kk        Kazakh
      kha                    Khasi
  cam khm          km    **  Khmer (LC was once "cam")
      khi                    Khoisan (Other)
      kho                    Khotanese
  --- --- ---     (ki)       Kikuyu; Gikuyu
      kik                    Kikuyu
      kmb                    Kimbundu
      kin          rw        Kinyarwanda
      kir          ky        Kirghiz
  --- --- ---     (kv)       Komi
      kom                    Komi
      kon          kg        Kongo
      kok                    Konkani
      kor          ko        Korean
  kus kos                    Kosraean
      kpe                    Kpelle
      kro                    Kru
      kua          ok        Kuanyama  [Kwanyama in 639-2]
  --- --- ---     (qm)       Kumyk
      kum                    Kumyk
      kur          ku        Kurdish
      kru                    Kurukh
      kut                    Kutenai
  --- --- ---     (ld)       Ladin
      lad                    Ladino
  --- --- ---     (ly)       Ladino
      lah                    Lahnda
  --- --- ---     (lk)       Lak
      lam                    Lamba
      lao          lo        Lao
      lat          la        Latin
      lav          lv        Latvian
      ltz          lb        Letzeburgesch
      lez         (le)       Lezghian
      lin          ln        Lingala
      lit          lt        Lithuanian
  --- --- ---     (li)       Livonian
      loz                    Lozi
      lub          lu        Luba-Katanga
      lua                    Luba-Lulua
      lui                    Luiseno
      lun                    Lunda
      luo                    Luo (Kenya and Tanzania)
      lus                    Lushai
      mac/mkd *    mk        Macedonian [*** mak earlier? NB Makasar]
      mad                    Madurese
      mag                    Magahi
      mai                    Maithili
      mak                    Makasar
  mla mlg          mg        Malagasy
      may/msa *    ms        Malay
      mal          ml WD1*   Malayalam
      mlt          mt        Maltese
      mdr                    Mandar
      man         (md)       Mandingo
      mni                    Manipuri
      mno                    Manobo languages
      mao/mri *    mi        Maori
      mar          mr        Marathi
      chm          --        Mari
  --- --- ---     (mj)       Mari, Meadow
  --- --- ---     (mm)       Mari, Mountain
      mah         (??)       Marshall (Marshallese)
      mwr                    Marwari
      mas                    Masai
      myn                    Mayan languages
      men                    Mende
      mic                    Micmac
      min                    Minangkabau
  --- --- ---     (??)       Mingrelian
      mis                    Miscellaneous (Other)
      moh                    Mohawk
  --- --- ---     (mh)       Moksha Mordvin
      mol          mo        Moldavian
      mkh                    Mon-Khmer (Other)
      lol                    Mongo (Mongo-Nkundu)
      mon          mn        Mongolian
      mos                    Mossi (Moore (?) in LC list)
      mul                    Multiple languages
      mun                    Munda languages
      nah                    Nahuatl (LC listed earlier as Aztec)
  --- --- ---     (ke)       Nama
      nau          na        Nauru
      nav         (dn)       Navajo (Navaho)
      nde          nd   *    Ndebele, North [nd = N. assumed]
      nbl                    Ndebele, South
      ndo          on        Ndonga
  --- --- ---     (nt)       Nenets
      nep          ne        Nepali
      new                    Newari
      nia                    Nias
      nic                    Niger-Kordofanian (Other)
      ssa                    Nilo-Saharan (Other)
      niu                    Niuean
  --- --- ---     (nh)       Nogai (Noghay)
      non                    Norse, Old
      nai                    North American Indian (Other)
  --- nor          no        Norwegian
  --- nno         (nn)       Norwegian - Nynorsk
  --- --- ---     (nb)       Norwegian - Bokmål
      nub                    Nubian languages
      nym                    Nyamwezi
      nya         (ny)       Nyanja
      nyn                    Nyankole
      nyo                    Nyoro
      nzi                    Nzima
  lan oci                    Occitan (Langue d'Oc) (LC: post-500)
      oji          oj        Ojibwa
      ori          or        Oriya
  gal orm          om    **  Oromo (LC differs)
      osa                    Osage
      oss         (ir)       Ossetic (Ossetian)
      oto                    Otomian languages
      pal                    Pahlavi
      pau                    Palauan
      pli         (pv)       Pali
      pam                    Pampanga
      pag                    Pangasinan
      pan          pa        Panjabi
      pap                    Papiamento
      paa                    Papuan-Australian (Other)
  --- --- ---     (fm)       Persian, Middle
      per/fas *    fa        Persian
      peo         (fn!)  *** Persian, Old (ca 600 - 400 B.C.) (fn dupe!)
      phi                    Philippine (Other)
      phn                    Phoenician
      pol          pl        Polish
      pon                    Ponape (was this Pohnpeian too ???)
      por          pt        Portuguese
      pra                    Prakrit languages
      pro         (pi)       Provencal, Old (to 1500) (-1500 in ISO 639-1?)
      pus          ps        Pushto
      que          qu        Quechua
      raj                    Rajasthani
      rap                    Rapanui
      rar                    Rarotongan
     (qaa-qtz)              (Reserved for local use)
      roh          rm        Rhaeto-Romance
      roa                    Romance (Other)
      rum/ron *    ro        Romanian
  --- --- ---     (ry)       Romany; Romani
      rom                    Romany
      run          rn        Rundi
      rus          ru        Russian
  --- --- ---     (??)       Ruthenian (Rusyn, Rusinian, Lemko)
      sal                    Salishan languages
      sam                    Samaritan Aramaic
  lap smi          se        Sami languages
  --- --- ---     (sy)       Sami, Inari
  --- --- ---     (sz)       Sami, Kildin
  --- --- ---     (sx)       Sami, Lule
  --- --- ---     (ds?)      Sami, Northern
  --- --- ---     (sb)       Sami, Skolt
  --- --- ---     (sp)       Sami, Southern
  sao smo          sm        Samoan
      sad                    Sandawe
      sag          sg        Sango
      san          sa        Sanskrit
      sat                    Santali
      srd         (sc)       Sardinian
      sas                    Sasak
      sco         (ll)       Scots, Lowlands (Lallans)
      sel                    Selkup
      sem                    Semitic (Other)
      scc/srp *    sr        Serbian (Serbo-Croat, Cyrillic)
      srr                    Serer
      shn                    Shan
  sho sna          sn        Shona
      sid                    Sidamo
      bla                    Siksika
      snd          sd        Sindhi
  snh sin          si        Sinhalese (Singhalese)
  sgn ---          --        Sign languages [* not expanded further]
      sit                    Sino-Tibetan (Other)
      sio                    Siouan languages
      den                    Slave (Athapascan language)
      sla                    Slavic (Other)
      slo/slk *    sk        Slovak
      slv          sl        Slovenian
      sog                    Sogdian
      som          so        Somali
      son                    Songhai
      snk                    Soninke
      wen ---      --        Sorbian languages (Wendish?)
  --- --- ---     (sf)       Sorbian, Lower
  --- --- ---     (??)       Sorbian, Upper
      nso                    Sotho, Northern
  sso sot          st        Sotho, Southern
      sai                    South American Indian (Other)
      spa --- *    es        Spanish [* were spa/esl; "esp" later!!!!]
      sun          su        Sundanese
      suk                    Sukuma
      sux                    Sumerian
      sus                    Susu
      swa          sw        Swahili
  swz ssw          ss        Swati (Swazi, Siswati, ?Siswant?)
      swe --- *    sv        Swedish [* ISO 639-2/T sve deprecated??]
      syr                    Syriac
  --- --- ---     (tb)       Tabasaran
  tag tgl          tl        Tagalog
      tah         (??)       Tahitian
      tai                    Tai (Other)
  taj tgk          tg        Tajik
      tmh                    Tamashek
      tam          ta        Tamil
  tar tat          tt        Tatar
      tel          te        Telugu
      ter                    Tereno (Terena)
      tet                    Tetum
      tha          th        Thai
      tib/bod *    bo        Tibetan
      tig                    Tigre
      tir          ti        Tigrinya
      tem                    Timne (Temne)
      tiv                    Tiv
      tli                    Tlingit
      tpi                    Tok Pisin
      tkl                    Tokelau
      tog          to        Tonga (Nyasa)
      ton                    Tonga (Tonga Islands)
  tru ---          --     ** Truk  (???????????)
      tsi                    Tsimshian
      tso          ts        Tsonga
  tsw tsn          tn        Tswana
      tum                    Tumbuka
      tur          tr        Turkish
      ota                    Turkish, Ottoman (1500 - 1928)
      tuk          tk        Turkmen
      tvl                    Tuvalu
      tyv                    Tuvinian
      twi          tw        Twi
  --- --- ---     (um)       Udmurt
      uga                    Ugaritic
      uig          ug        Uighur [Infoterm, 1989]
      ukr          uk        Ukrainian
      umb                    Umbundu
      und                    Undetermined
      urd          ur        Urdu
      uzb          uz        Uzbek
      vai                    Vai
  --- --- ---     (??)       Valencian
      ven          ve        Venda
  --- --- ---     (vp)       Veps
      vie          vi        Vietnamese
      vol          vo        Volapük
      vot                    Votic
      wak                    Wakashan languages
  --- --- ---     (wl)       Walloon
      wal                    Walamo
      war                    Waray
      was                    Washo
      wel/cym *    cy        Welsh
      wol          wo        Wolof
      xho          xh        Xhosa
      sah                    Yakut
      yao                    Yao
      yap                    Yap (Yapese)
  --- --- ---     (yy)       Yi
      yid          yi    *** Yiddish [Infoterm, 1989: ji now deprecated?]
      yor          yo        Yoruba
      ypk                    Yupik languages
      znd                    Zande
      zap                    Zapotec
      zen                    Zenaga
      zha          za        Zhuang [Infoterm, 1989]
      zul          zu        Zulu
      zun                    Zuni
------------------------------------------------------------
 *    highlights changes
 ***  deprecations etc.
 (  ) tentative, mainly in ISO 639-1 draft
------------------------------------------------------------
***  In the web page "Code for the Representation of the Names of
     Languages. From ISO 639, revised 1989", there is a note on
     "Changes made December 20, 1997, based upon information in
     the following note from a member of the W3C HTML group":

     "In 1989, the ISO 639 Registration Authority changed a number of
     codes as follows (the quote is taken from RFC 1766):

     The following codes have been added in 1989 (nothing later):
     ug (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang),
     he (Hebrew, *** replacing iw), yi (Yiddish, *** replacing ji),
     and id (Indonesian, replacing in)."

     3-letter dash codes ( --- ) and 2-letter dash codes ( -- ) in
     the table above mark places where there appears to be no code
     in the other code sources.

     In several cases, information on alternative language names is
     my own, assumed from comparing various lists.

John Clews

2 February 2000

                        END OF DOCUMENT

----------------------------------------------------------------

Best regards

John Clews

--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: 0171 412 7826 (day); 0171 272 8397 (evening); 01423 888 432 (w/e)
Email: Scripts at sesame.demon.co.uk

Committee Chair of  ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
 Technologies: European Localization Requirements
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets


