Date:  Wed, 26 Aug 1998 11:44:07 +0800 (SST)
From:  Ji Donghong <dhji at krdl.org.sg>
Subject:  POS is well-formed or not well-formed?

Some time ago I posted two queries (section 1 of the summary below)
about parts of speech based on syntactic distribution. I am very
grateful to the researchers listed in section 2, who replied to the
queries. Typical answers are listed in section 3, and some of the
references they mentioned in section 4. In section 5 I present my
personal conclusion on the problem, just for your information. To
help researchers who are not familiar with Chinese see more clearly
why I posed the queries, I list an open question as the first
question in section 6. The other question in section 6 may also be
of interest.

Thank you very much.

With best regards,

Ji Donghong

                         SUM: WHAT'S BEHIND PART-OF-SPEECH?

1. Queries


Query A:

In Chinese there are few affixes by which words can be classified
into categories such as nouns, verbs or adjectives. As a result, even
the best-known Chinese dictionary, the Modern Chinese Dictionary,
still gives no POS information for Chinese words.

   Some linguists have proposed that Chinese words be classified as
nouns, verbs, adjectives, etc. purely on the basis of their
grammatical distribution, by which they mean the words' ability to
combine with other words.

  My questions are:

1) Can grammatical distribution alone be used to determine the POS of
words?

2) Are there similar problems in other languages? How are they solved
there?

Query B:

Several days ago I posted the query "what's behind part-of-speech?",
and so far more than 10 researchers have replied to me. I would now
like to pose another query on the topic before presenting a summary:

Q: Is part-of-speech based on syntactic distribution a WELL-FORMED
concept?

Any comments or information will be highly appreciated.


2. Researchers who replied


Adam Kilgarriff
Geoffrey Sampson
Marcia Haag
Philip Resnik
Sun Honglin
Joseph Davis
Christopher Hogan
Frantisek Cermak
Waruno Mahdi
Atro Voutilainen
Rob Freeman
Víctor Vázquez Martínez
Bingfu Lu
Alex Murzaku
Alexis Manaster Ramer
Lua Kim Teng
Earl Herrick
Xu Jie
Guo Jin
Dan Maxwell
Elaine Jones
Anne-Line Graedler
Steven Schaufele
Robin Sackmann


3. Typical answers


1) Some doubted whether categories such as N, V, ADJ, etc. are good
analytic categories for Chinese, suggesting that they may be
inappropriate imports from the West.

2) Some pointed out that grammatical distribution or function is the
standard, or primary, way to classify POS. The reasons mentioned
include that the definition is clear and useful, or at least more so
than the alternatives.  Others proposed that, among all syntactic
means, syntactic valency be used to define POS.

3) Some argued that grammatical distribution should not be used to
determine lexical categories. The reasons mentioned include that there
are predicate nouns, attributive verbs, sentential subjects, etc.

4) Some pointed out that it is hardly surprising that grammarians have
had trouble classifying Chinese words into parts of speech.  The
reason is that the notion of "part-of-speech" is fraught with
difficulties in linguistics, to the extent that many western linguists
since 1900 have abandoned it altogether (though Chomsky did explicitly
reintroduce the ancient notion in 1957 in his generative grammar).

5) Some replied to the queries indirectly, pointing out that the fact
that POS disambiguation can be done on the basis of linguistically
motivated contextual rules suggests that parts of speech are
syntactically motivated (or syntactically definable).

6) Some pointed out that POS is not a particularly well-formed
concept, at least not in the sense that universally accepted,
unambiguous classes can be defined: no labelling will be objective and
absolute, and even the classical interpretations are uncertain. The
reasons mentioned include that when you assign POS, you are
partitioning a continuum of association behaviour.  Further, they held
that for language processing systems POS is a misleading concept, and
that we are better off thinking about the continuous reality of
syntactic associativity, rather than trying to label it and pretend it is

7) Some pointed out that the ultimate criterion for POS should be
meaning. The reasons mentioned include that although syntactic
features are very limited, the combination of these features is, if
not infinite, a huge

8) Some pointed out that, outside of phonetics perhaps, there seems to
be no concept in linguistics that is well-defined enough that, given a
language, we can mechanically identify instances of that concept. They
also pointed out that linguistic concepts, whether part-of-speech,
subject, or anything else, come into existence when someone describes
one or a small number of languages and produces a term which refers to
a fairly (though not always precisely) well-defined set of entities in
that language or those languages. The same person, or more likely
others, then try to use the same term for entities in some other
language(s) which SEEM to have something in common with those in the
original language(s).

9) Some pointed out that POS may be taken somewhat for granted by the
linguistics community: linguists come to the task of defining POS with
a preconceived notion of what it is they want to define, and then seek
criteria that support these ideas. They also pointed out that for a
given set of parts of speech, it is quite possible to find
distributional evidence that picks out that set and nothing else, but
that the set is by no means unique, and that many other possible sets
may be supported by the data.

10) Some pointed out that even today the parts of speech, as they are
taught in school to English-speaking children, are an illogical, messy
list: two of them, the Noun and the Verb, have semantic definitions
masquerading as grammatical/syntactic definitions, and the others have
more or less syntactic definitions in terms of the Noun and the Verb.

11) Some pointed out that there is nothing wrong with defining POS on
the basis of grammatical functions; rather, the problem is that we
always have a pre-defined POS system, and distribution is then invoked
merely as a means to justify that system, which is very subjective.


4. References mentioned


Zhao Yuanren, A Grammar of Spoken Chinese.

Ferdinand de Saussure, Cours de linguistique generale;

Otto Jespersen, The Philosophy of Grammar;

Edward Sapir, Language.

Ellen Contini-Morava's "Introduction" and William Diver's "Theory" in
Contini-Morava and Goldberg's volume "Meaning as Explanation: Advances
in Linguistic Sign Theory," Mouton de Gruyter, 1995.

Schachter, Paul.  _Parts-of-speech Systems_.  In Language typology and
syntactic description. Timothy Shopen, ed.  Cambridge: Cambridge
University Press, 1992, pp. 3--61.

Radford, Andrew. Transformational Grammar: A First Course. Cambridge:
Cambridge University Press, 1992.

Gabelentz, Georg von der, 1886, "Zur chinesischen Sprache und zur
allgemeinen Grammatik", _Internationale Zeitschrift für allgemeine
Sprachwissenschaft_ 3:92-109 (see there p. 100).

Le Van Ly, 1948, _Le parler vietnamien. Esquisse d'une grammaire
vietnamienne_.  Paris: Huong Anh.

Martini, Francois, 1950, "L'opposition nom et verbe en vietnamien et
en siamois", _Bulletin de la Societe de Linguistique de Paris_

Trnka, Bohumil, 1966, "On the Basic Categories of Syntagmatic
Morphology", _Travaux Linguistiques de Prague_ 2:165-169.

Mahdi, Waruno, 1993, "Distinguishing Homonymic Word Forms in
Indonesian", pp. 181-218 in Ger P. Reesink (ed.) _Topics in
Descriptive Austronesian Linguistics_, Semaian 11. Leiden: Vakgroep
Talen en Culturen van ZO Asien en Oceanie.

Rygaloff, A., 1958, "La classe nominale en chinois:
determine/indetermine", _Bulletin de la Societe de Linguistique de
Paris_ 53:306-315.

Hinrich Schütze, "Dimensions of Meaning"

Chu, Fa-Kao; "Word classes in classical Chinese"; in Proceedings of
the IXth Congress of linguistics; The Hague 196, p. 594.

Hagege, Claude; "Le probleme linguistique de prepositions et la
solution chinoise"; Louvain, Peeters, 1975.

Sasse, Hans-Jurgen; "Syntactic categories and sub-categories" in
J. Jacobs et al.; "Syntax. Ein internationales Handbuch
zeitgenössischer Forschung", Walter de Gruyter, Berlin, 1994.

1995 On the subject of Malagasy imperatives. Oceanic Linguistics 34:

1994 On the origin of the term 'ergative'. Sprachtypologie und
Universalienforschung 47(3): 207-210.

1993 Malagasy and the subject/topic issue. Oceanic Linguistics 31:

1992 On intensional vs. extensional grammatical categories. Papers
from the Second Annual Meeting of the Southeast Asian Linguistics
Society (ed. Karen L. Adams and Thomas John Hudak), 201-212. Tempe,
AZ: Arizona State University Program for Southeast Asian Studies.

What's a topic in the Philippines? Papers from the First Annual
Meeting of the Southeast Asian Linguistics Society (ed. Martha Ratliff
and Eric Schiller), 271-291. Arizona State University Program for
Southeast Asian Studies Monograph Series.

1988 What about Lisu?  Languages of the Tibeto-Burman Area 11(2):

Karen L. Adams and AMR. Some questions of topic/focus choice in
Tagalog.  Oceanic Linguistics 27: 79-101.

James D. McCawley's 1992 paper "Justifying Part-of-Speech Assignments
in Mandarin Chinese", _Journal of Chinese Linguistics_ vol 20, no. 2,
pp. 211-245.

Sadock (1990) "Parts of speech in Autolexical Syntax", in McCawley
(1988) The Syntactic Phenomena of English.

Vonen, Arnfinn Muruvik. 1997. Parts of Speech and Linguistic Typology.
Open Classes and Conversion in Russian and Tokelau. (Acta Humaniora
No. 22).  Oslo: Universitetsforlaget. (ISBN 82-00-12685-4)

Sackmann, Robin, 1996, The problem of "adjectives" in Mandarin
Chinese, in Sackmann, Robin (ed.) Theoretical linguistics and
grammatical description.  Amsterdam etc.: John Benjamins Publishing
Co. pp. 257-275.


5. Personal conclusion


My personal conclusion is that POS based on syntactic distribution is
not a well-formed concept. The reasons are as follows:

1) Non-operable:

  For a word of a given language, what is its syntactic distribution?
There seems to be no clear definition. The most natural model of the
syntactic distribution of a word may be the set of contexts in which
the word can occur; however, we cannot list them all in any sense.

2) Non-deterministic:

  Even if we can select, for whatever reason, a definite set of
distributional evidence (e.g., contexts, functions or co-occurrences)
as the criteria for defining the POS system of a language, there will
be a great many possible classes, and a great many possible
classifications of the whole vocabulary.  We seem to have no
principled reason to choose one particular classification among them
as the POS system for the language under consideration.

3) Non-provable or non-justifiable:

  Even if we can select a particular classification as the POS system,
for whatever reason, there seems to be no sense in which we can say
that the selected POS system is correct or incorrect. The deeper
reason for this problem may be that distributional theories of POS do
not address WHAT (what are the parts of speech, e.g., nouns, verbs,
etc., of a language?) but only HOW (how to construct a POS system for
a language?), or at least they equate WHAT with HOW and do not
distinguish between them.  Thus it may be difficult to justify a POS
system for a language, or to compare different POS systems for a
language in any significant sense.
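
Point 2) can be illustrated with a small sketch. The Python below is
only an illustration under stated assumptions: the phrase list is a
hypothetical toy corpus, and `classify` is a made-up helper that puts
two words in the same class when they share exactly the same
neighbours. Taking left neighbours or right neighbours as the
evidence gives two different classifications of the same words.

```python
# Hypothetical toy corpus of two-word phrases (not taken from real data).
PHRASES = [
    ("make", "plan"), ("make", "product"),
    ("develop", "plan"), ("develop", "product"),
    ("plan", "product"), ("good", "plan"),
]

def classify(phrases, evidence):
    """Group words whose neighbour sets are identical.

    evidence="left":  a word's distribution is the set of words seen before it.
    evidence="right": a word's distribution is the set of words seen after it.
    """
    contexts = {}
    for first, second in phrases:
        # Every word gets an entry, even if its context set stays empty.
        contexts.setdefault(first, set())
        contexts.setdefault(second, set())
        if evidence == "left":
            contexts[second].add(first)
        else:
            contexts[first].add(second)
    classes = {}
    for word, ctx in contexts.items():
        classes.setdefault(frozenset(ctx), []).append(word)
    # Sort for a stable, comparable result.
    return sorted(sorted(members) for members in classes.values())

by_left = classify(PHRASES, "left")
by_right = classify(PHRASES, "right")
print(by_left)
print(by_right)
```

Here the left-neighbour evidence puts "good" in one class with "make"
and "develop", while the right-neighbour evidence separates it, and
nothing in the data says which partition is the POS system.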


6. Open questions

1) Suppose that we are given a language which is just like English but
without any affixes, e.g., -ment, -ing, -ed, -tion, -sion, etc., so
that the following are all possible phrases in the language: make
develop; develop country; develop product, etc. The problem is: how do
we determine the distribution-based POS system for this language?
(The case is roughly like that of Chinese.)

2) If POS based on distribution is not well-formed, what influence
might this have on the syntactic theories built on POS?
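
For the first question above, the three phrases can be turned into a
distributional profile mechanically. The sketch below is a minimal
illustration, not a proposed method; `positional_profile` is a
hypothetical helper name. In this tiny sample "country" and "product"
have identical profiles, while "develop" occurs both after another
word and before one.

```python
# The three phrases given in the first open question.
PHRASES = ["make develop", "develop country", "develop product"]

def positional_profile(phrases):
    """Map each word to the set of (position, neighbour) pairs it occurs in."""
    profile = {}
    for phrase in phrases:
        first, second = phrase.split()
        # The first word stands before its neighbour, the second after it.
        profile.setdefault(first, set()).add(("before", second))
        profile.setdefault(second, set()).add(("after", first))
    return profile

profile = positional_profile(PHRASES)
for word in sorted(profile):
    print(word, sorted(profile[word]))
```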

