6.1302, Sum: Languages with no between-word delimiters (final sum)

The Linguist List linguist at tam2000.tamu.edu
Fri Sep 22 22:01:18 UTC 1995


---------------------------------------------------------------------------
LINGUIST List:  Vol-6-1302. Fri Sep 22 1995. ISSN: 1068-4875. Lines:  174
 
Subject: 6.1302, Sum: Languages with no between-word delimiters (final sum)
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
 
Associate Editor:  Ljuba Veselinova <lveselin at emunix.emich.edu>
Assistant Editors: Ron Reck <rreck at emunix.emich.edu>
                   Ann Dizdar <dizdar at tam2000.tamu.edu>
                   Annemarie Valdez <avaldez at emunix.emich.edu>
 
Software development: John H. Remmers <remmers at emunix.emich.edu>
 
Editor for this issue: dizdar at tam2000.tamu.edu (Ann Dizdar)
 
---------------------------------Directory-----------------------------------
1)
Date:  Fri, 22 Sep 1995 08:52:51 EDT
From:  fujii at mackay.cs.umass.edu (Hideo Fujii)
Subject:  Final SUM/Q: languages with no between-word delimiters
 
---------------------------------Messages------------------------------------
1)
Date:  Fri, 22 Sep 1995 08:52:51 EDT
From:  fujii at mackay.cs.umass.edu (Hideo Fujii)
Subject:  Final SUM/Q: languages with no between-word delimiters
 
 
Dear LINGUISTs & NLPASIAns,
 
Thank you very much for sending so much valuable information.  This is
the final summary of the languages that don't have delimiters between
'words'.  I was tempted to send all the comments, but I gave up because of
their sheer volume.  For the same reason, I have not listed the individual
languages that do have delimiters [YES group].  If you would like to keep
that list at hand, please consult the previous summary (LINGUIST Vol-6-1269).
To bring it up to date as I did, just 1) remove Flemish, 2) add West-Frisian
(according to Henk Wolf; thank you) to [YES]: Latin/Greek variations; and
3) move Turkish, Kazakh(??), Azerbaijani(??), Uzbek(??), Kirghiz(??) and
Turkmen(??) from [YES] to [Partly NO].  Here, (??) marks a language "closely
akin" to Turkish, which produces VERY long words by agglutination with
no spaces inside them (also from Henk Wolf).
 
I don't have any definite conclusions, but I observed the following:
 
  1)  Delimiter-less languages are a minority among the world's languages.
      The [NO] group in particular is very rare: there are only 3 languages
      (Chinese, Japanese, Tibetan), 2% of the 158 languages.  If we include
      the [Partly NO] group (like many Indian languages), they make up 6% of
      the total 158 languages ["(?)" is counted as 0.5].
      (Chinese, Japanese and Tibetan all have scripts that were
      traditionally written top to bottom, and still are to some extent.
      This may be a factor behind this result, but I'm not sure....)
 
  2)  There is NO strong correlation between delimiter-less-ness and
      language typology.  We can observe various types of languages
      in the [NO]/[Partly NO] groups, e.g., agglutinating (Japanese, Tamil,
      Turkish, etc.), isolating (Chinese), and inflectional (Sanskrit, etc.).
      (What about polysynthetic languages??)
      Also, the same language type can be found in both [YES] and
      [(Partly) NO], e.g., Sanskrit ([Partly NO]) and Russian ([YES]) for
      inflectional languages.  The same holds for agglutinating and
      isolating languages, which appear in [NO] as well as in [YES] (or at
      least [Partly NO]).  For example, Chinese [NO] and Vietnamese [YES];
      Japanese [NO] and Hungarian [YES] (or Tamil [Partly NO]).
 
  3)  So it is NOT quite right to say (as I have often heard) that
      "language L does not have spaces between words because
      L is agglutinating."
 
  4)  Latin/Greek- and Cyrillic-based languages are the large majority
      (70% of our list: 54% Latin/Greek, 16% Cyrillic), and they use a
      space as the delimiter between words.  There seem to be no exceptions
      among modern languages (but there are many among classical/medieval
      languages).
 
 
Several people pointed out that some languages have 'moderately' long
(verbal/nominal) compounds (e.g. German, Dutch, West-Frisian), as opposed
to the languages above with VERY long words.
I am not sure whether these compounds are non-lexical, i.e., productive and
semantically transparent (syntactic compounds).
 
Could someone tell me whether German and the others rely mainly on syntactic
compounds to make words "pretty long", or whether these are mostly lexical
compounds?  Also, if you know of other languages in our list with this
property (i.e., prominent syntactic compounds), please just send me the name
of the language.  I will post a summary of this new question as a separate
topic.
 
There are still many (?) items in our list, so if you know about any of
them, please let me know (I will wait as long as it takes...).  I will
send an addendum to this summary some time later.
 
Finally, I want to express my sincere gratitude to the 37 contributors
who made this final summary possible.  Their names are listed at the end
of this summary.  (I hope I didn't miss anyone.  If I did, I sincerely
apologize.)
 
Hideo Fujii
University of Massachusetts
    at Amherst
 
 
SUMMARY: Languages Without Delimiters Between 'Words'
(158 languages in total)
==========================================================
Q: Does the language have word-boundary delimiters?
  A.[NO]:(3) Chinese, Japanese, Tibetan
 
  B.[Partly NO - Words delimited, but need analysis to reach lexical level]: (14)
     Latin/Greek Variations:
       Turkish, Turkmen(*2*)(??), Uzbek(*2*)(??)
     Cyrillic-based:
       Azerbaijani(??), Kazakh(??), Kirghiz(??)
     Devanagari Variations:
       Burmese, Khmer, Lao(?), Sanskrit, Thai
     Others:
       Kannada(?), Malayalam(?), Tamil

  C.[Virtually YES - Easily distinguishable by character form]: (10)
     Arabic Variations:      (10)

  D.[YES]: (131)
     Latin/Greek Variations: (86)
     Cyrillic Variations:    (25)
     Hebrew Variations:      ( 3)
     Devanagari Variations:  ( 8)
     Others:                 ( 9)

*1* Kurdish also uses Cyrillic, Roman and Armenian.
*2* Moldavian, Turkmen, Uzbek and Mongolian used Cyrillic until recently
    (or still use it).
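The shares quoted in observations 1 and 4 can be re-derived from the group
counts in the table above.  The Python sketch below is only an illustration,
not part of the original survey: the group contents and counts are copied
from the table, while the weight given to "(??)" entries is an assumption,
exposed as the hypothetical parameter `double_q_weight`, since the text
only says that "(?)" is counted as 0.5.

```python
# Sketch: recompute the percentages in the summary from the table's counts.
# Assumption: "(??)" entries get a caller-chosen weight (double_q_weight);
# the summary itself only states that "(?)" entries count as 0.5.

TOTAL = 158

# Group A ([NO]) and group B ([Partly NO]), copied from the table.
# Marker "" = confirmed, "?" = uncertain, "??" = "closely akin" to Turkish.
NO_GROUP = {"Chinese": "", "Japanese": "", "Tibetan": ""}
PARTLY_NO_GROUP = {
    "Turkish": "", "Turkmen": "??", "Uzbek": "??",
    "Azerbaijani": "??", "Kazakh": "??", "Kirghiz": "??",
    "Burmese": "", "Khmer": "", "Lao": "?", "Sanskrit": "", "Thai": "",
    "Kannada": "?", "Malayalam": "?", "Tamil": "",
}
ARABIC_VIRTUALLY_YES = 10  # group C
YES_BY_SCRIPT = {"Latin/Greek": 86, "Cyrillic": 25, "Hebrew": 3,
                 "Devanagari": 8, "Others": 9}  # group D

# Sanity check: the four groups add up to the 158 languages surveyed.
assert (len(NO_GROUP) + len(PARTLY_NO_GROUP) + ARABIC_VIRTUALLY_YES
        + sum(YES_BY_SCRIPT.values())) == TOTAL


def percent(count, total=TOTAL):
    """Share of the surveyed languages, in percent, to one decimal place."""
    return round(100 * count / total, 1)


def delimiterless_shares(double_q_weight=0.5):
    """Return (% in group A, % in groups A and B together), weighted."""
    weights = {"": 1.0, "?": 0.5, "??": double_q_weight}
    no = sum(weights[m] for m in NO_GROUP.values())
    partly = sum(weights[m] for m in PARTLY_NO_GROUP.values())
    return percent(no), percent(no + partly)


if __name__ == "__main__":
    print(delimiterless_shares())                      # (1.9, 8.2)
    print(percent(YES_BY_SCRIPT["Latin/Greek"]),
          percent(YES_BY_SCRIPT["Cyrillic"]))          # 54.4 15.8
```

The 2%, 54% and 16% figures match the quoted ones.  With the final table,
the combined [NO] + [Partly NO] share comes out at about 8% (about 10% if
"(??)" counts as 1.0) rather than the 6% in observation 1; 6% is what the
same rule gives without the six Turkic languages, (3 + 6.5) / 158, so that
figure may predate their move to [Partly NO].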
 
List of Contributors
====================
Shanley Allen <allen at mpi.nl>
the Babesther <han at minerva.cis.yale.edu>
Rita Bhandari <bhandari at semlab1.sbs.sunysb.edu>
Doug Cooper <doug at chulkn.car.chula.ac.th>
Peter Daniels <pdaniels at press-gopher.uchicago.edu>
Boris Fridman Mintz <fridman at ucol.mx>
Stefan Frisch <frisch at babel.ling.nwu.edu>
Hideo Fujii <fujii at mackay.cs.umass.edu>
Keith Goeringer <keg at violet.berkeley.edu>
Henry Groover <hgroover at qualitas.com>
Mark Hansell-Mai Hansheng <mhansell at carleton.edu>
Susantha Herath <herath at u-aizu.ac.jp>
Matthew Hurst <matth at cogsci.ed.ac.uk>
Hiroaki Kitano <6500hiro at ucsbuxa.ucsb.edu>
Wolfram Kahl <kahl at hermes.informatik.unibw-muenchen.de>
Jee Eun Kim <jeeeunk at microsoft.com>
Wenchao Li <wcli at vax.ox.ac.uk>
Stuart Luppescu <sl70 at musuko.spc.uchicago.edu>
Greg Lyons <lcgal at mahidol.ac.th>
Duncan MacGregor <aa735 at freenet.carleton.ca>
Stavros Macrakis <macrakis at osf.org>
James Magnuson <magnuson at psych.rochester.edu>
Mark A. Mandel <Mark at ccgate.dragonsys.com>
Alec McAllister <ECL6TAM at lucs-01.novell.leeds.ac.uk>
Philippe Mennecier <ferry at cimrs1.mnhn.fr>
Nicholas Ostler <nostler at chibcha.demon.co.uk>
Peter Paul <Peter.Paul at arts.monash.edu.au>
Gnani Perinpanayagam <gnani at sun3.oulu.fi>
Ellen F. Prince <ellen at central.cis.upenn.edu>
Steve Seegmiller <SEEGMILLER at apollo.montclair.edu>
Dan I. Slobin <slobin at cogsci.Berkeley.EDU>
Achim Stenzel <achim at tiger.toppoint.de>
Jan-Olof Svantesson <Jan-Olof.Svantesson at ling.lu.se>
Joseph Tomei <jtomei at lilim.ilcs.hokudai.ac.jp>
Shravan Vasishth <shravan at lisa.lang.osaka-u.ac.jp>
Allan C Wechsler <Wechsler at world.std.com>
Henk Wolf <H.A.Y.Wolf at stud.let.ruu.nl>
------------------------------------------------------------------------
LINGUIST List: Vol-6-1302.


