6.1278, Sum: Languages with No Between-word Delimiters

The Linguist List linguist at tam2000.tamu.edu
Wed Sep 20 16:02:10 UTC 1995


---------------------------------------------------------------------------
LINGUIST List:  Vol-6-1278. Wed Sep 20 1995. ISSN: 1068-4875. Lines:  152
 
Subject: 6.1278, Sum:  Languages with No Between-word Delimiters
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
 
Associate Editor:  Ljuba Veselinova <lveselin at emunix.emich.edu>
Assistant Editors: Ron Reck <rreck at emunix.emich.edu>
                   Ann Dizdar <dizdar at tam2000.tamu.edu>
                   Annemarie Valdez <avaldez at emunix.emich.edu>
 
Software development: John H. Remmers <remmers at emunix.emich.edu>
 
Editor for this issue: hdry at emunix.emich.edu (Helen Dry)
 
---------------------------------Directory-----------------------------------
1)
Date:  Tue, 19 Sep 1995 13:23:16 EDT
From:  fujii at mackay.cs.umass.edu (Hideo Fujii)
Subject:  2nd Summary: Languages with no between-word delimiters
 
---------------------------------Messages------------------------------------
1)
Date:  Tue, 19 Sep 1995 13:23:16 EDT
From:  fujii at mackay.cs.umass.edu (Hideo Fujii)
Subject:  2nd Summary: Languages with no between-word delimiters
 
 
Dear Colleagues,
 
This is the second summary on languages that have no delimiters (e.g.,
spaces) marking word boundaries.  Many people sent me valuable information.
I thank the following contributors:
 
	 Shanley Allen <allen at mpi.nl>
	 Rita Bhandari <bhandari at semlab1.sbs.sunysb.edu>
	 Doug Cooper <doug at chulkn.car.chula.ac.th>
	 Peter Daniels <pdaniels at press-gopher.uchicago.edu>
	 Stefan Frisch <frisch at babel.ling.nwu.edu>
	 Keith Goeringer <keg at violet.berkeley.edu>
	 Mark Hansell-Mai Hansheng <mhansell at carleton.edu>
	 Susantha Herath <herath at u-aizu.ac.jp>
	 Matthew Hurst <matth at cogsci.ed.ac.uk>
	 Wolfram Kahl <kahl at hermes.informatik.unibw-muenchen.de>
	 Jee Eun Kim <jeeeunk at microsoft.com>
	 Hiroaki Kitano <6500hiro at ucsbuxa.ucsb.edu>
	 Wenchao Li <wcli at vax.ox.ac.uk>
	 Stuart Luppescu <sl70 at musuko.spc.uchicago.edu>
	 Duncan MacGregor <aa735 at freenet.carleton.ca>
	 Stavros Macrakis <macrakis at osf.org>
	 Philippe Mennecier <ferry at cimrs1.mnhn.fr>
	 Boris Fridman Mintz <fridman at ucol.mx>
	 Nicholas Ostler <nostler at chibcha.demon.co.uk>
	 Peter Paul <Peter.Paul at arts.monash.edu.au>
	 Gnani Perinpanayagam <gnani at sun3.oulu.fi>
	 Ellen F. Prince <ellen at central.cis.upenn.edu>
	 Steve Seegmiller <SEEGMILLER at apollo.montclair.edu>
	 Dan I. Slobin <slobin at cogsci.Berkeley.EDU>
	 Jan-Olof Svantesson <Jan-Olof.Svantesson at ling.lu.se>
	 Allan C Wechsler <Wechsler at world.std.com>
 
 
I had trouble classifying languages into two groups: those that have
delimiters for words, and those that don't.  Some languages have no delimiters,
but their words are still separable by a superficial cue such as letter
form, as seen in Arabic.  Other languages are the opposite: they do have
delimiters, but further analysis is needed to get "reasonable" (I know
it's vague!) units, because words become so long from gluing elements
together, as in Tamil.
 
I understand that this gluing and typological agglutination (or
polysynthesis) are different matters.  But there may be some correlation
between them.  Could someone tell me the typological class
(agglutinating, polysynthetic, etc.) of the languages in the "NO" and
"Partly NO" groups?  Let's ignore the gray zone and consider only strong
or typical cases!
 
I got the impression that the Devanagari-based languages in the "Partly NO"
group are agglutinating languages.  Is that correct?
(I know Japanese is agglutinating, and Chinese is isolating.  Sanskrit
is inflecting, isn't it?)
 
So, I finally decided to classify the languages into four groups: "NO
delimiters", "Partly NO", "Virtually YES", and "YES, it has delimiters".
(I didn't count as YES or Virtually YES those languages that can be
segmented into morphemes character by character, as in Chinese, because I
wanted lexical units bigger than morphemes.)
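
As a minimal illustration of the distinction drawn above, the following
Python sketch compares splitting on spaces with splitting character by
character.  The sample strings and function names are illustrative
assumptions, not data from the contributors, and the sketch says nothing
about how real word segmentation is done.

```python
def split_on_spaces(text: str) -> list[str]:
    """Treat whitespace as the only word-boundary delimiter."""
    return text.split()


def split_per_character(text: str) -> list[str]:
    """Treat every non-space character as its own unit."""
    return [ch for ch in text if not ch.isspace()]


# Hypothetical sample sentences, chosen only for illustration.
english = "the cat sat"
chinese = "我喜欢读书"  # written without spaces between words

print(split_on_spaces(english))      # ['the', 'cat', 'sat']
print(split_on_spaces(chinese))      # ['我喜欢读书']
print(split_per_character(chinese))  # ['我', '喜', '欢', '读', '书']
```

Splitting on spaces recovers word-sized units for the English sample but
returns the Chinese sample as a single chunk, while splitting per character
returns units smaller than words (for example, the two-character word 喜欢
is broken in two).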
 
I will submit the final summary next time.  If you find errors in this list,
or have any special comments, please send a message directly to me.
I am especially worried about misclassification between "Partly NO" and "YES".
 
Here is the list:
=======================================================================
Q: Does the language have word-boundary delimiters?
  A.[NO]:(3) Chinese, Japanese, Tibetan
 
  B.[Partly NO - Words delimited, but need analysis to reach lexical level]
   (7)	 Devanagari-based:
	   Burmese, Khmer, Lao(?), Malayalam(?), Sanskrit, Tamil, Thai
 
  C.[Virtually YES - Easily distinguishable by character form]
   (8)	 Arabic-based:
	   Arabic, Dari, Kurdish(*1*), Malay, Pashto, Persian(Farsi),
	   Sindhi, Urdu
 
  D.[YES]: (133)
   Latin/Greek-based:
     (89)  Acholi, Afrikaans, Akan(Twi), Balinese, Bambara, Bantu, Basque,
	   Berber, Breton, Buluba-Lulua, Caddoan, Catalan, Chikaranga,
	   Chippewa(Ojibwa), Choctaw, Cree, Croatian, Czech, Dakota(Sioux),
	   Danish, Dutch, English, (Esperanto), Estonian, Ewe, Fijian,
	   Filipino, Finnish, Flemish, French, Fulani(Fulbe), Gaelic,
	   Gaelic, German, Greek, Guarani, Harari, Hausa, Hawaiian,
	   Hungarian, Icelandic, Igbo, Indonesian, Iroquoian, Italian,
	   Javanese, Kanuri, Khasi, Kongo, Lappish, Latvian, Lithuanian,
	   Lu-Ganda, Makua, Malagasy, Malay, Maltese, Mandingo, Maori,
	   Mapudungu, Masai, Moldavian(*2*), Nyanja, Nama, Navajo, Norwegian,
	   Polish, Portuguese, Quechua, Rhaeto-Romanic, Romanian, Romany,
	   Samoan, Sundanese, Sangs, Slovak, Slovene, Somali, Spanish,
	   Swahili, Swedish, Tagalog, Turkish, Turkmen(*2*), Uzbek(*2*),
	   Vietnamese, Welsh, Yoruba, Zulu
   Cyrillic-based:
     (26)  Avar, Azerbaijani, Bashkir, Belorussian, Bulgarian, Buryat,
	   Chechen, Chuvash, Kabardian, Kalmyk, Kazakh, Kirghiz, Komi, Mari,
	   Macedonian, Mongolian(*3*), Nivkh, Russian, Ossetian,
	   Serbian, Serbo-Croat, Tajik, Tatar, Udmurt, Ukrainian,
	   Yakut
   Hebrew:
      (3)  Hebrew(modern), Ladino(Judeo-Spanish), Yiddish
   Devanagari-based:
      (7)  Assamese, Bengali, Hindi, Nepali, Telugu, Sinhalese
   Others:
      (7)  Amharic(Ethiopian)(?), Armenian(modern), Cherokee, Georgian,
	   Inuktitut(Eskimo), Korean, Punjabi
   ?  (1): Manchu
 
*1* Kurdish also uses Cyrillic, Roman and Armenian.
*2* Moldavian, Turkmen, Uzbek used Cyrillic until recently.
*3* Mongolian has both Uigur-derived script and Cyrillic as official.
 
 
The following are languages for which I don't have data yet:
  Buginese, 		Kannada, 		Kashmiri,
  Lahnda,		Marathi
 
 ==============================================================================
- Hideo Fujii
  U. of Massachusetts
------------------------------------------------------------------------
LINGUIST List: Vol-6-1278.


