Indian and other Asian languages listed in ISO 639: feedback sought
John Clews
Emeet at SESAME.DEMON.CO.UK
Wed Feb 2 18:52:23 UTC 2000
VYAKARAN: South Asian Languages and Linguistics Net
Editors: Tej K. Bhatia, Syracuse University, New York
John Peterson, University of Munich, Germany
Details: Send email to listserv at listserv.syr.edu and say: INFO VYAKARAN
Subscribe: Send email to listserv at listserv.syr.edu and say:
SUBSCRIBE VYAKARAN FIRST_NAME LAST_NAME
(Substitute your real name for first_name last_name)
Archives: http://listserv.syr.edu
Dear list members. I am a member of the Joint Advisory Committee on
ISO 639: Codes for representation of names of languages (abbreviated
to ISO 639: language codes in further discussion below).
This committee meets on 17-18 February 2000 in Washington DC, and I
would be grateful for any information from list members that would
highlight major gaps in the coverage of Indian and other Asian
languages in the list below.
If you are interested in the background to ISO 639, read section 1:
if not, and you would like to comment on the codes, and the languages
that have been coded, and any omissions or errors, go to section 2.
1. ISO 639: language codes
ISO 639 is one of many international standards developed by groups of
experts in many countries. This section provides a simplified view.
Put simply, ISO 639's job is to provide simple codes that can be
embedded in (mainly computerised) information systems that can allow
these information systems to highlight language use, or even to
enable useful things like font switching or similar, e.g. on Internet
web sites.
There are older 2-letter codes, used, it could be said, mainly in
older "legacy" systems. 3-letter codes (mainly identical with codes
used by the Library of Congress, and in many libraries) have seen the
largest growth, and allow for greater expansion. Actually the two
sets are currently listed in two separate parts of ISO 639,
respectively in ISO [WD] 639[-1] and ISO 639-2.
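As a rough illustration of how the two sets relate, here is a minimal
Python sketch. The handful of entries is copied from the reference list
in section 3; the dictionary and function names are hypothetical, chosen
only for this example, and a real system would load the full tables.

```python
# A handful of rows taken from the reference list in section 3.
# Each 2-letter code maps to its 3-letter code(s); where the list
# shows two 3-letter codes (e.g. "tib/bod"), both are kept.
TWO_TO_THREE = {
    "hi": ("hin",),
    "bn": ("ben",),
    "ta": ("tam",),
    "ur": ("urd",),
    "ne": ("nep",),
    "bo": ("tib", "bod"),
    "my": ("bur", "mya"),
}

# Reverse index: any 3-letter code back to its 2-letter code.
THREE_TO_TWO = {
    three: two for two, threes in TWO_TO_THREE.items() for three in threes
}


def to_three_letter(code):
    """Return the 3-letter code(s) for a 2-letter code, or None if unknown."""
    return TWO_TO_THREE.get(code.lower())


def to_two_letter(code):
    """Return the 2-letter code for a 3-letter code, or None if there is none."""
    return THREE_TO_TWO.get(code.lower())
```

Languages that have only a 3-letter code in the list (Dogri, "doi", for
example) simply have no entry in the reverse index, which is the
"greater expansion" point in practice.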
The Internet Engineering Task Force's specification RFC 1766
recommends the use of ISO 639 codes in Internet uses.
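For readers unfamiliar with RFC 1766, the sketch below shows roughly how
a language tag of that kind could be checked and split. It assumes the
RFC 1766 syntax (a primary tag plus optional subtags, each of 1 to 8
ASCII letters, joined by hyphens); the function name and the "kind"
labels are my own and are not part of any standard.

```python
import re

# RFC 1766 syntax: a primary tag and optional subtags, each 1 to 8
# ASCII letters, separated by hyphens.
_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def parse_language_tag(tag):
    """Split an RFC 1766 language tag into (primary, subtags, kind)."""
    if not _TAG.fullmatch(tag):
        raise ValueError(f"not a well-formed RFC 1766 tag: {tag!r}")
    primary, *subtags = tag.lower().split("-")
    if len(primary) == 2:
        kind = "iso639"    # 2-letter primary tags are ISO 639 codes
    elif primary == "i":
        kind = "iana"      # IANA-registered tag
    elif primary == "x":
        kind = "private"   # private use
    else:
        kind = "other"     # no meaning assigned by RFC 1766
    return primary, subtags, kind
```

For example, parse_language_tag("hi") gives ("hi", [], "iso639"). The
sketch checks only the shape of the tag, not whether the 2-letter code
is actually assigned in ISO 639.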
We have also been in discussion with the Summer Institute of
Linguistics (SIL) who use a different (and much larger) set of
3-letter codes in their Ethnologue, codes which have also been used
in some Internet situations.
NB: (a) if you use Ethnologue codes as well as, or instead of,
codes from ISO 639, I would be glad to hear, and to know in what
circumstances you use those codes.
(b) If you just use the Ethnologue, or some other reference source
where those Ethnologue codes are used, just for reference (i.e. the
reference source is important, not the codes themselves) that's a
different question.
I'd be grateful if you can distinguish (a) from (b) in any replies on
that point. However, regarding (b) it may be useful to know which
other publications/web sites use Ethnologue codes.
2. Opportunity for feedback
The list below is my own handy reference list based on my own
compilation of 3-letter codes from ISO 639-2 and Library of Congress
codes, and the 2-letter codes in ISO WD 639-1. Errors are likely to
be my own, rather than in ISO 639, though I have been fairly careful.
Some of the notes (especially those in square brackets, or using
asterisks) are just for my own reference, and relate only to
information on different editions of ISO 639, and can be ignored.
I'd be particularly interested to know of obvious omissions, or
errors in naming, or where predominant use of language names has
changed.
There are also some fairly basic "genetic codes", with entries such as
"xxxx languages" or "xxxx languages (other)." Again, if some language
groups seem to have been omitted altogether I would be glad to know.
In any of the columns, any information including asterisks (*) or
question marks (?) is essentially my own, and for my own use.
If possible could you embed your comments within my quoted table,
unless your comment is very simple on a few lines: that will enable
me to align comments.
I am primarily interested in omissions, and in language names, but if
you want to suggest a particularly useful 3-letter code, it may be
helpful (although I think that the default approach is likely to be
"use the first three letters of the name in English, or French, or in
the local spelling of the name" if known, allowing for transcription
or transliteration conventions).
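The "first three letters" default, and why it often needs adjusting, can
be sketched as follows. This is only an illustration under my own
assumptions: the function name is hypothetical, the name is taken to be
already romanised, and the set of codes in use is a three-item sample
from the reference list rather than the full set.

```python
def candidate_code(name, taken):
    """Return the first three letters of `name` as a lowercase code,
    or None if the name is too short or the code is already in `taken`."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if len(letters) < 3:
        return None
    code = "".join(letters[:3])
    return None if code in taken else code


# Sample of codes already in the reference list:
# san = Sanskrit, kon = Kongo, sin = Sinhalese.
taken = {"san", "kon", "sin"}

print(candidate_code("Tamil", taken))    # tam
print(candidate_code("Santali", taken))  # None: "san" is Sanskrit
print(candidate_code("Konkani", taken))  # None: "kon" is Kongo
```

The collisions match what the list already shows: Santali is coded
"sat" and Konkani "kok", because the obvious three-letter prefix was
in use for another language.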
Please could you reply direct to me at <Emeet at sesame.demon.co.uk> and
not on the list to avoid too many large repetitive emails
overcrowding the traffic on the South Asian Linguists list at
<VYAKARAN at LISTSERV.SYR.EDU>
I'll post a summary of your comments to <VYAKARAN at LISTSERV.SYR.EDU>
in due course.
If you could reply within a week of reading this email, that is
likely to leave sufficient time to feed such information into the
ISO 639 Joint Advisory Committee meeting on 17-18 February 2000.
3. Handy reference list
Here's the list: I look forward to your comments!
[ Tip: Use a monospace font like Courier for the chart below ]
------------------------------------------------------------
LC ISO 639-2 ISO 639-1 Language name in English
------------------------------------------------------------
--- --- --- (aj) Abaza
abk ab Abkhazian
--- --- --- (ad) Adyge
ace Achinese
ach Acoli
ada Adangme
aar aa Afar
afh Afrihili
afr af Afrikaans
afa Afro-Asiatic (Other)
aka ak Akan
akk Akkadian
alb/sqi * sq Albanian
ale Aleut
alg Algonquian languages
--- --- --- (an) Aragonese
tut Altaic (Other)
amh am Amharic
apa Apache languages
ara ar Arabic
arc Aramaic
arp Arapaho
arn Araucanian (Mapuche)
arw Arawak
arm/hye * hy Armenian
--- --- --- (vl) Aromanian; Arumanian
art Artificial (Other)
--- --- --- (ae) Arvanite
asm as Assamese
--- --- --- (au) Asturian
ath Athapascan languages
aus Australian languages
map Austronesian (Other)
ava av Avaric
ave (fv) Avestan
awa Awadhi
aym ay Aymara
aze az Azerbaijani
ban Balinese
--- --- --- (bq) Balkar
bat Baltic (Other)
bal Baluchi
bam bm Bambara
bai Bamileke languages
bad Banda
bnt Bantu (Other)
bas Basa
bak ba Bashkir
baq/eus * eu Basque
btk Batak (Indonesia)
bej Beja
bel Belarusian [was Byelorussian]
bem Bemba
ben bn Bengali
ber Berber (Other)
bho Bhojpuri
bih bh Bihari
bik Bikol
bin Bini
bis bi Bislama
--- --- --- (bs) Bosnian
bra Braj
bre br Breton
bug Bugis (Buginese)
bul bg Bulgarian
bua Buriat
bur/mya * my Burmese
cad Caddo
car Carib
cat ca Catalan
cau Caucasian (Other)
ceb Cebuano
cel Celtic (Other)
cai Central American Indian (Other)
chg Chagatai
cmc Chamic languages
cha Chamorro
--- --- --- (??) Chamorro
che (nx) Chechen
chr (jl) Cherokee
chy Cheyenne
chb Chibcha
--- --- --- (ch) Chichewa; Chewa
chi/zho * zh Chinese
chn Chinook jargon
chp Chipewyan
cho Choctaw
chu (sj) Church Slavic (Old Church Slavonic)
tru chk Chuukese
chv (cv) Chuvash
cop Coptic
cor kw Cornish
cos co Corsican
cre cr Cree
mus Creek
cpe Creoles & Pidgins, English
cpf Creoles & Pidgins, French
cpp Creoles & Pidgins, Portuguese
crp Creoles & Pidgins (Other)
scr/hrv * hr Croatian (Serbo-Croat, Latin)
cus Cushitic (Other)
cze/ces * cs Czech
dak Dakota
dan da Danish
--- --- --- (dg) Dargwa
day Dayak
del Delaware
din Dinka
div dv Divehi
doi Dogri
dgr Dogrib
dra Dravidian (Other)
dua Duala
dut/nld * nl Dutch
dum Dutch, Middle (ca. 1050-1350)
dyu Dyula
dzo dz Dzongkha
efi (ef) Efik
egy Egyptian (Ancient)
eka Ekajuk
elx Elamite
eng en English
enm English, Middle (ca. 1100-1500)
ang English, Old (ca. 450-1100)
--- --- --- (er) Erzya Mordvin
esp epo eo Esperanto
esk --- -- ** Eskimo (Other) (not in 639-2)
est et Estonian
eth --- -- ** Ethiopic [languages] (not in 639-2)
ewe ee Ewe
ewo Ewondo
fan Fang
fat Fanti
far fao fo Faroese
fij fj Fijian
fin fi Finnish
fiu Finno-Ugrian (Other)
fon Fon
--- --- --- (fp) Franco-Provençal
fre/fra * fr French
frm French, Middle (ca. 1400-1600)
fro French, Old (842- ca. 1400)
fri fry fy Frisian
--- --- --- (??) Frisian, East; Sater Frisian
--- --- --- (fn) Frisian, North (fn! - also in Persian, Old)
fur (fu) Friulian
ful ff Fulah
gaa Ga
gae gla gd Gaelic, Scots [* were gae/gdh]
iri gle ga Gaelic, Irish [* were gai/iri]
max glv gv Gaelic, Manx
--- --- --- (gg) Gagauz
gag glg gl Gallegan (Galician - used in Spain)
lug lg Ganda
gay Gayo
gba Gbaya
eth gez Geez
geo/kat * ka Georgian
ger/deu * de German
gmh German, Middle High
goh German, Old High (ca. 750-1050)
gem Germanic (Other)
--- --- --- (??) German, Low; Low German
gil Gilbertese
gon Gondi
gor Gorontalo
got Gothic
grb Grebo
grc Greek, Ancient (to 1453)
gre/ell * el Greek, Modern (1453-)
kal kl Greenlandic (Kalaallisut)
gua grn gn Guarani
guj gu Gujarati
gwi Gwich'in
hai Haida
hau ha Hausa
haw Hawaiian
heb he *** Hebrew [Infoterm, 1989: iw deprecated?]
her oh Herero
hil Hiligaynon
him Himachali
hin hi Hindi
hmo (??) Hiri Motu, Motu
hit Hittite
hmn Hmong
hun hu Hungarian
hup Hupa
iba Iban
ice/isl * is Icelandic
ibo ig Igbo
ijo Ijo
ilo Iloko
inc Indic (Other)
ine Indo-European (Other)
ind id *** Indonesian [Infoterm, 1989: in deprecated?]
--- --- --- (ng) Ingush
int ina ia Interlingua [* ]
ile ie Interlingue [*note similar language name]
iku iu Inuktitut [Infoterm, 1989]
ipk ik Inupiaq (was Inupiak)
ira Iranian (Other)
sga Irish, Old (to 900)
mga Irish, Middle (900 - 1200)
iro Iroquoian languages
--- --- --- (rx) Istro-Romanian
ita it Italian
jpn ja Japanese
jav/jaw * jv/jw * Javanese [jw (Jawi?) now deprecated??]
jrb Judeo-Arabic
jpr Judeo-Persian
--- --- --- (qb) Kabardian
kab Kabyle
kac Kachin
kal Kalaallisut [renamed]
--- --- --- (xl) Kalmyk
kam Kamba
kan kn Kannada
kau kr Kanuri
--- --- --- (qc) Karachay
--- --- --- (qr) Karaim
kaa Kara-Kalpak
--- --- --- (kj) Karelian, North (Other Karelian too?)
kar Karen
kas ks Kashmiri
--- --- --- (??) Kashubian
kaw Kawi
kaz kk Kazakh
kha Khasi
cam khm km ** Khmer (LC was once "cam")
khi Khoisan (Other)
kho Khotanese
--- --- --- (ki) Kikuyu; Gikuyu
kik Kikuyu
kmb Kimbundu
kin rw Kinyarwanda
kir ky Kirghiz
--- --- --- (kv) Komi
kom Komi
kon kg Kongo
kok Konkani
kor ko Korean
kus kos Kosraean
kpe Kpelle
kro Kru
kua ok Kuanyama [Kwanyama in 639-2]
--- --- --- (qm) Kumyk
kum Kumyk
kur ku Kurdish
kru Kurukh
kut Kutenai
--- --- --- (ld) Ladin
lad Ladino
--- --- --- (ly) Ladino
lah Lahnda
--- --- --- (lk) Lak
lam Lamba
lao lo Lao
lat la Latin
lav lv Latvian
ltz lb Letzeburgesch
lez (le) Lezghian
lin ln Lingala
lit lt Lithuanian
--- --- --- (li) Livonian
loz Lozi
lub lu Luba-Katanga
lua Luba-Lulua
lui Luiseno
lun Lunda
luo Luo (Kenya and Tanzania)
lus Lushai
mac/mkd * mk Macedonian [*** mak earlier? NB Makasar]
mad Madurese
mag Magahi
mai Maithili
mak Makasar
mla mlg mg Malagasy
may/msa * ms Malay
mal ml WD1* Malayalam
mlt mt Maltese
mdr Mandar
man (md) Mandingo
mni Manipuri
mno Manobo languages
mao/mri * mi Maori
mar mr Marathi
chm -- Mari
--- --- --- (mj) Mari, Meadow
--- --- --- (mm) Mari, Mountain
mah (??) Marshall (Marshallese)
mwr Marwari
mas Masai
myn Mayan languages
men Mende
mic Micmac
min Minangkabau
--- --- --- (??) Mingrelian
mis Miscellaneous (Other)
moh Mohawk
--- --- --- (mh) Moksha Mordvin
mol mo Moldavian
mkh Mon-Khmer (Other)
lol Mongo (Mongo-Nkundu)
mon mn Mongolian
mos Mossi (Moore (?) in LC list)
mul Multiple languages
mun Munda languages
nah Nahuatl (LC listed earlier as Aztec)
--- --- --- (ke) Nama
nau na Nauru
nav (dn) Navajo (Navaho)
nde nd * Ndebele, North [nd = N. assumed]
nbl Ndebele, South
ndo on Ndonga
--- --- --- (nt) Nenets
nep ne Nepali
new Newari
nia Nias
nic Niger-Kordofanian (Other)
ssa Nilo-Saharan (Other)
niu Niuean
--- --- --- (nh) Nogai (Noghay)
non Norse, Old
nai North American Indian (Other)
--- nor no Norwegian
--- nno (nn) Norwegian - Nynorsk
--- --- --- (nb) Norwegian - Bokmål
nub Nubian languages
nym Nyamwezi
nya (ny) Nyanja
nyn Nyankole
nyo Nyoro
nzi Nzima
lan oci Occitan (Langue d'Oc) (LC: post-500)
oji oj Ojibwa
ori or Oriya
gal orm om ** Oromo (LC differs)
osa Osage
oss (ir) Ossetic (Ossetian)
oto Otomian languages
pal Pahlavi
pau Palauan
pli (pv) Pali
pam Pampanga
pag Pangasinan
pan pa Panjabi
pap Papiamento
paa Papuan-Australian (Other)
--- --- --- (fm) Persian, Middle
per/fas * fa Persian
peo (fn!) *** Persian, Old (ca 600 - 400 B.C.) (fn dupe!)
phi Philippine (Other)
phn Phoenician
pol pl Polish
pon Ponape (was this Pohnpeian too ???)
por pt Portuguese
pra Prakrit languages
pro (pi) Provencal, Old (to 1500) (-1500 in ISO 639-1?)
pus ps Pushto
que qu Quechua
raj Rajasthani
rap Rapanui
rar Rarotongan
(qaa-qtz) (Reserved for local use)
roh rm Rhaeto-Romance
roa Romance (Other)
rum/ron * ro Romanian
--- --- --- (ry) Romany; Romani
rom Romany
run rn Rundi
rus ru Russian
--- --- --- (??) Ruthenian (Rusyn, Rusinian, Lemko)
sal Salishan languages
sam Samaritan Aramaic
lap smi se Sami languages
--- --- --- (sy) Sami, Inari
--- --- --- (sz) Sami, Kildin
--- --- --- (sx) Sami, Lule
--- --- --- (ds?) Sami, Northern
--- --- --- (sb) Sami, Skolt
--- --- --- (sp) Sami, Southern
sao smo sm Samoan
sad Sandawe
sag sg Sango
san sa Sanskrit
sat Santali
srd (sc) Sardinian
sas Sasak
sco (ll) Scots, Lowlands (Lallans)
sel Selkup
sem Semitic (Other)
scc/srp * sr Serbian (Serbo-Croat, Cyrillic)
srr Serer
shn Shan
sho sna sn Shona
sid Sidamo
bla Siksika
snd sd Sindhi
snh sin si Sinhalese (Singhalese)
sgn --- -- Sign languages [* not expanded further]
sit Sino-Tibetan (Other)
sio Siouan languages
den Slave (Athapascan language)
sla Slavic (Other)
slo/slk * sk Slovak
slv sl Slovenian
sog Sogdian
som so Somali
son Songhai
snk Soninke
wen --- -- Sorbian languages (Wendish?)
--- --- --- (sf) Sorbian, Lower
--- --- --- (??) Sorbian, Upper
nso Sotho, Northern
sso sot st Sotho, Southern
sai South American Indian (Other)
spa --- * es Spanish [* were spa/esl; "esp" later!!!!]
sun su Sundanese
suk Sukuma
sux Sumerian
sus Susu
swa sw Swahili
swz ssw ss Swati (Swazi, Siswati, ?Siswant?)
swe --- * sv Swedish [* ISO 639-2/T sve deprecated??]
syr Syriac
--- --- --- (tb) Tabasaran
tag tgl tl Tagalog
tah (??) Tahitian
tai Tai (Other)
taj tgk tg Tajik
tmh Tamashek
tam ta Tamil
tar tat tt Tatar
tel te Telugu
ter Tereno (Terena)
tet Tetum
tha th Thai
tib/bod * bo Tibetan
tig Tigre
tir ti Tigrinya
tem Timne (Temne)
tiv Tivi
tli Tlingit
tpi Tok Pisin
tkl Tokelau
tog to Tonga (Nyasa)
ton Tonga (Tonga Islands)
tru --- -- ** Truk (???????????)
tsi Tsimshian
tso ts Tsonga
tsw tsn tn Tswana
tum Tumbuka
tur tr Turkish
ota Turkish, Ottoman (1500 - 1928)
tuk tk Turkmen
tvl Tuvalu
tyv Tuvinian
twi tw Twi
--- --- --- (um) Udmurt
uga Ugaritic
uig ug Uighur [Infoterm, 1989]
ukr uk Ukrainian
umb Umbundu
und Undetermined
urd ur Urdu
uzb uz Uzbek
vai Vai
--- --- --- (??) Valencian
ven ve Venda
--- --- --- (vp) Veps
vie vi Vietnamese
vol vo Volapük
vot Votic
wak Wakashan languages
--- --- --- (wl) Walloon
wal Walamo
war Waray
was Washo
wel/cym * cy Welsh
wol wo Wolof
xho xh Xhosa
sah Yakut
yao Yao
yap Yap (Yapese)
--- --- --- (yy) Yi
yid yi *** Yiddish [Infoterm, 1989: ji now deprecated?]
yor yo Yoruba
ypk Yupik languages
znd Zande
zap Zapotec
zen Zenaga
zha za Zhuang [Infoterm, 1989]
zul zu Zulu
zun Zuni
------------------------------------------------------------
* highlights changes
*** deprecations etc.
( ) tentative, mainly in ISO 639-1 draft
------------------------------------------------------------
*** In the web page "Code for the Representation of the Names of
Languages. From ISO 639, revised 1989", there is a note on
"Changes made December 20, 1997, based upon information in
the following note from a member of the W3C HTML group":
"In 1989, the ISO 639 Registration Authority changed a number of
codes as follows (the quote is taken from RFC 1766):
The following codes have been added in 1989 (nothing later):
ug (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang),
he (Hebrew, *** replacing iw), yi (Yiddish, *** replacing ji),
and id (Indonesian, replacing in)."
3-letter dash codes ( --- ) in the table above (and also 2-letter
dash codes ( -- )) represent areas where there appears to be no code
in the other code sources.
In several cases, information on alternative language names is
my own, assumed from comparing various lists.
John Clews
2 February 2000
END OF DOCUMENT
----------------------------------------------------------------
Best regards
John Clews
--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: 0171 412 7826 (day); 0171 272 8397 (evening); 01423 888 432 (w/e)
Email: Scripts at sesame.demon.co.uk
Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of CEN/TC304: Information and Communications
Technologies: European Localization Requirements
Committee Member of the Foundation for Endangered Languages;
Committee Member of ISO/IEC/JTC1/SC2: Coded Character Sets