[Corpora-List] About Part of Speech in English and Chinese

Xiao, Zhonghua z.xiao at lancaster.ac.uk
Wed Nov 4 11:06:26 UTC 2009


Dear all,

I think it is important to keep the lexical and syntactic levels separate for consistent analysis in context. Part of speech is a lexical-level phenomenon, so both instances of "church" in "This is an old church with a tall church tower" are tagged as nouns. Modification is a syntactic-level phenomenon. Nouns, like adjectives, can modify other nouns (as in traditional grammar); that is, nouns and adjectives can share the function of noun modifier. In this way, "woman" in examples such as "a pretty woman" and "a woman doctor" is always analysed as a noun.
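
To make the two-level separation concrete, here is a minimal sketch in which each token carries a lexical-level tag and, separately, a syntactic-level function. The tag names ("NOUN", "ADJ") and function labels ("head", "modifier") are illustrative assumptions, not the labels of any particular tagset or parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    pos: str       # lexical level: word class (assumed label set)
    function: str  # syntactic level: role in this context (assumed label set)


# Hand-annotated fragment of "This is an old church with a tall church tower".
tokens = [
    Token("old", "ADJ", "modifier"),
    Token("church", "NOUN", "head"),
    Token("tall", "ADJ", "modifier"),
    Token("church", "NOUN", "modifier"),
    Token("tower", "NOUN", "head"),
]


def values_by_form(tokens, attribute):
    """Collect, for each word form, the set of values seen for one attribute."""
    seen = {}
    for token in tokens:
        seen.setdefault(token.form, set()).add(getattr(token, attribute))
    return seen


pos_seen = values_by_form(tokens, "pos")
functions_seen = values_by_form(tokens, "function")
```

Under this annotation "church" has a single POS tag but two different functions, which is the distinction drawn above.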

The situation in Chinese is more complex, because there is a rather loose connection between word classes at the lexical level and their grammatical functions at the syntactic level: even adjectives and nouns can be used as predicates in this language, and even verbs can be used directly as subjects and in other roles that are usually filled by nouns or non-finite verbs in English (because there is no such thing as a non-finite verb in Chinese). That's why some popular Chinese taggers have used POS tags such as VN (verbs used as nouns in context), AD (adjectives used directly as adverbials in context), and AN (adjectives with a nominal function in context).

Richard Xiao

 


________________________________

From: corpora-bounces at uib.no on behalf of Linas Vepstas
Sent: Mon 02/11/2009 16:12
To: Mike Maxwell
Cc: corpora
Subject: Re: [Corpora-List] About Part of Speech in English and Chinese



2009/11/2 Linas Vepstas <linasvepstas at gmail.com>:
> 2009/11/2 Mike Maxwell <maxwell at umiacs.umd.edu>:
>> a rule like
>>   NP --> Det Adj* N+
>> (that's a flat version; one might have intermediate levels of structure).
>>  This analysis would account for the following distinction:
>>   a tall church tower
>>  *a church tall tower
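
As an illustration of the flat rule quoted above, the following sketch checks a sequence of POS tags against NP --> Det Adj* N+ using a regular expression. The function name and the hand-assigned tags are assumptions made for this example; it covers only the flat version of the rule, with no intermediate structure.

```python
import re

# Flat rule NP --> Det Adj* N+, written over a space-joined tag sequence.
NP_PATTERN = re.compile(r"Det( Adj)*( N)+")


def matches_flat_np(tags):
    """Return True if the whole tag sequence fits Det Adj* N+."""
    return NP_PATTERN.fullmatch(" ".join(tags)) is not None


# Hand-assigned tags (an assumption, not tagger output).
tall_church_tower = ["Det", "Adj", "N", "N"]   # a tall church tower
church_tall_tower = ["Det", "N", "Adj", "N"]   # *a church tall tower

accepted = matches_flat_np(tall_church_tower)
rejected = matches_flat_np(church_tall_tower)
```

The first sequence fits the rule and the second does not, which is the contrast the rule is meant to capture.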

I failed to make my intended point.  Suppose, for a moment, that
almost all churches had an architectural component called a
"tall tower", so that guidebooks and architectural digests might
sometimes talk about the "tall tower" of a church.  Were this the case,
then it would be just fine to talk about "a church tall tower", because
everyone would know that a "tall tower" was the primary semantic
entity, and so "church" would simply be a _nn noun-modifier to this
semantic entity.  So then, in this example, one could validly say
"church tall tower"  in those cases where one had to distinguish
between  the "tall tower" of a church, and the "tall tower" of something
else.

That my artificial example occurs in real life is witnessed in biomedical
literature, where there are many "architectural" structures whose names
have the form "adj-noun" and which must then be further refined
by additional modifiers, which may be nouns or adjectives.
Examples below.

> We extracted human umbilical vein endothelial cells.
> We extracted smooth muscle myosin heavy chain protein.
> We extracted peripheral blood mononuclear cells.
> It is located on the nuclear envelope inner membrane.
> We extracted simian virus large T-antigen.
> We collected HTLV-I infected T-cells.

To correctly parse these sentences, one must isolate the primary
or dominant semantic entity (e.g. "endothelial cells") and then
search for modifiers (e.g. "human umbilical vein").  Writing rules
for this is not easy.
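
One way to picture the head-isolation step is the sketch below, which assumes a hand-built lexicon of known multi-word entity names (here just "endothelial cells" and the hypothetical "tall tower"). The lexicon, the function name, and the longest-suffix strategy are all assumptions for illustration; building such a lexicon is exactly the hard part, and this does not solve it.

```python
# Hypothetical lexicon of multi-word names treated as single semantic entities.
KNOWN_ENTITIES = {
    ("endothelial", "cells"),
    ("tall", "tower"),
}


def split_head(words, known_entities=KNOWN_ENTITIES):
    """Split a noun phrase into (modifiers, head).

    The head is the longest suffix found in the lexicon; if no suffix is
    known, fall back to treating the last word alone as the head.
    """
    for start in range(len(words)):
        suffix = tuple(words[start:])
        if suffix in known_entities:
            return words[:start], list(suffix)
    return words[:-1], words[-1:]


modifiers, head = split_head("human umbilical vein endothelial cells".split())
church_modifiers, church_head = split_head("church tall tower".split())
```

With "tall tower" in the lexicon, "church" comes out as a modifier of that entity, matching the artificial example above.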

--linas

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


