shallow parsing
Peter G. Liu
petergangliu at
Tue Apr 3 14:54:34 UTC 2001
In our constraint-based chart parser I used an HPSG-like grammar format
(which can be seen as a shallow version of HPSG).
1. The shallow issue
It is shallow mainly in the following sense: it does not support extensive
DAG embedding, and it has no mechanism for long-distance dependencies (no
gaps, no fillers). Although syntactic construction goes as high as the S
node and also builds trees for WH and yes/no questions, only phrase-level
categories such as NP, PP, VP, ADJP and ADVP are used extensively (in our
case, to provide morphological and syntactic information for information
extraction).
Basically the grammar format is a mixture of PATR-II and HPSG. It is
HPSG-like in the following sense: it handles complements and adjuncts the
way it is normally done with the X-bar schema, and it supports the head
principle and the principle of semantic contribution. It does not support
the subcategorization principle, but there might be an easy way to
implement this principle to some extent.
The system has about 100 rules (some of them lexical rules). As a shallow
parser supporting information extraction, it has processed about ten
thousand emails so far.
2. Examples:
a) The modified DAG structure
The feature structure (a simplified HPSG format):
[ phon: _                            ]
[ syn:  [ head:   [ cat:       _ ] ] ]
[       [         [ [feature]: _ ] ] ]
[       [         [ subtype:   _ ] ] ]
[       [ subcat: _                ] ]
[ sem:  [ pred:    _ ]               ]
[       [ modlist: _ ]               ]
[       [ arglist: _ ]               ]
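One way to picture the matrix above in code is as nested dictionaries addressed by dotted paths, the same notation the rules below use (e.g. `V.syn.subcat`). This is a minimal Python sketch under our own assumptions: `None` stands for an unbound `_` slot, extra head features such as `tense` or `person` fill the `[feature]` position, and the helper names `new_sign`, `get_path` and `set_path` are hypothetical, not part of the parser described here.

```python
from copy import deepcopy
from typing import Any

# Hypothetical encoding of the simplified matrix; None plays the role of "_".
TEMPLATE: dict[str, Any] = {
    "phon": None,
    "syn": {
        "head": {"cat": None, "subtype": None},  # other head features go here too
        "subcat": None,
    },
    "sem": {"pred": None, "modlist": None, "arglist": None},
}


def new_sign(**head_features: Any) -> dict[str, Any]:
    """Return a fresh copy of the template with some head features filled in."""
    sign = deepcopy(TEMPLATE)
    sign["syn"]["head"].update(head_features)
    return sign


def get_path(sign: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'syn.head.cat' down the nested dicts."""
    node: Any = sign
    for key in path.split("."):
        node = node[key]
    return node


def set_path(sign: dict[str, Any], path: str, value: Any) -> None:
    """Bind the slot at a dotted path, creating missing intermediate dicts."""
    *parents, last = path.split(".")
    node = sign
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


verb = new_sign(cat="V", tense="present")
set_path(verb, "phon", "put")
set_path(verb, "syn.subcat", "v2ppdir")
print(get_path(verb, "syn.head.cat"), get_path(verb, "syn.subcat"))  # V v2ppdir
```

Copying the template for every new sign keeps edges in the chart from sharing (and accidentally mutating) each other's structure.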
b) Rule examples (for simplicity, we just take the string as the base
predicate, e.g. "book" for BOOK(x))
# 100 VP V NP PP
> = V
>V.syn.subcat = %v2ppdir
> = N
> = acc
> = PREP
>VP.syn.head < V.syn.head
>VP.sem.pred < {pred, V.phon}
>VP.sem.arglist < {theme, NP.phon}
>VP.sem.arglist < {destination, PP.sem.arglist}
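Read procedurally, rule `# 100 VP V NP PP` above first tests its daughters (`=`) and then builds the mother (`<`), copying the verb's head features upward (the head principle) and collecting `pred` and `arglist` entries (semantic contribution). The Python sketch below encodes only the fully visible lines of that rule. It assumes that `%` marks a constant, that `<` on `arglist` appends a pair, and that the PP's `sem.arglist` is simplified to the string of its object; the parser's real `<` semantics may differ, and the function names are hypothetical.

```python
from copy import deepcopy
from typing import Any, Optional

Sign = dict[str, Any]


def get_path(signs: dict[str, Sign], path: str) -> Any:
    """Resolve a path such as 'V.syn.subcat' against signs keyed by rule label."""
    label, *keys = path.split(".")
    node: Any = signs[label]
    for key in keys:
        node = node[key]
    return node


def apply_vp_rule(v: Sign, np: Sign, pp: Sign) -> Optional[Sign]:
    """Sketch of rule '# 100 VP V NP PP'; returns None if a daughter test fails."""
    signs = {"V": v, "NP": np, "PP": pp}
    # '=' line: test on a daughter (only the fully visible test is encoded).
    if get_path(signs, "V.syn.subcat") != "v2ppdir":
        return None
    vp: Sign = {"phon": None, "syn": {}, "sem": {"arglist": []}}
    # VP.syn.head < V.syn.head: the mother takes over the head daughter's head features.
    vp["syn"]["head"] = deepcopy(get_path(signs, "V.syn.head"))
    # Assumption: NP and PP saturate the verb, so the VP's subcat is empty.
    vp["syn"]["subcat"] = []
    # Semantic contribution: predicate from the verb, arguments from the complements.
    vp["sem"]["pred"] = ("pred", get_path(signs, "V.phon"))
    vp["sem"]["arglist"].append(("theme", get_path(signs, "NP.phon")))
    vp["sem"]["arglist"].append(("destination", get_path(signs, "PP.sem.arglist")))
    return vp


put = {"phon": "put", "syn": {"head": {"cat": "V", "tense": "present"}, "subcat": "v2ppdir"}, "sem": {}}
it = {"phon": "it", "syn": {"head": {"cat": "N", "case": "acc"}}, "sem": {}}
on_pp = {"phon": "on your commercial", "syn": {"head": {"cat": "PREP"}},
         "sem": {"arglist": "your commercial"}}
vp = apply_vp_rule(put, it, on_pp)
print(vp["sem"]["pred"], vp["sem"]["arglist"])
# ('pred', 'put') [('theme', 'it'), ('destination', 'your commercial')]
```

Keeping tests and structure-building in two separate phases means a failed test never leaves a half-built mother edge in the chart.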
# 100 S VP
> = V
>VP.syn.head.tense = %present
>VP.syn.head.person = not!per3
>VP.syn.head.subtype = not!infinitive
>VP.syn.subcat = nil
>S.syn.head < VP.syn.head
>S.syn.head.person < per2
>S.syn.head.ctype < command
>S.sem < VP.sem
>S.sem.arglist < {agent, theMessageReceiver}
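The imperative rule adds negated values (`not!per3`, `not!infinitive`) and builds an S with no overt subject, filling the agent role with the message receiver. The following Python sketch shows one possible reading of its tests. The conventions that `not!x` also succeeds on an unbound slot, that `%` marks a constant and that `nil` matches an unbound or empty subcat list are our assumptions, as is the function name `apply_imperative_rule`.

```python
from copy import deepcopy
from typing import Any, Optional

Sign = dict[str, Any]


def matches(actual: Any, spec: str) -> bool:
    """Test one '=' constraint value under assumed conventions:
    'not!x' succeeds unless the slot is bound to x, '%x' is the constant x,
    and 'nil' matches an unbound slot or an empty list."""
    if spec.startswith("not!"):
        return actual != spec[len("not!"):]
    if spec == "nil":
        return actual in (None, [])
    return actual == spec.lstrip("%")


def apply_imperative_rule(vp: Sign) -> Optional[Sign]:
    """Sketch of rule '# 100 S VP': a subjectless VP becomes an imperative S."""
    head = vp["syn"]["head"]
    tests = [
        (head.get("tense"), "%present"),
        (head.get("person"), "not!per3"),
        (head.get("subtype"), "not!infinitive"),
        (vp["syn"].get("subcat"), "nil"),
    ]
    if not all(matches(actual, spec) for actual, spec in tests):
        return None
    # S.syn.head < VP.syn.head and S.sem < VP.sem (copied, not shared).
    s: Sign = {"syn": {"head": deepcopy(head)}, "sem": deepcopy(vp["sem"])}
    s["syn"]["head"]["person"] = "per2"   # S.syn.head.person < per2
    s["syn"]["head"]["ctype"] = "command"  # S.syn.head.ctype < command
    # The implicit subject is the addressee of the message.
    s["sem"]["arglist"].append(("agent", "theMessageReceiver"))
    return s


vp = {
    "syn": {"head": {"cat": "V", "tense": "present", "subtype": "not!base"}, "subcat": []},
    "sem": {"pred": ("pred", "put"),
            "arglist": [("theme", "it"), ("destination", "your commercial")]},
}
s = apply_imperative_rule(vp)
print(s["syn"]["head"]["person"], s["sem"]["arglist"][-1])
# per2 ('agent', 'theMessageReceiver')
```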
c) Syntactic construction and semantic composition (parser output)
(0 - 5) S : VP .
Rule number: 32
Edge number: 249
Category: S
cat V
num _
person per2
case nom
aux _
tense present
subtype not!base
s-type command
subcat []
sem (pred, put)
arglist (theme, it)(destination, your commercial)(agent,
phon [<S> , [<VP> , [<V> , [<V> , put]], [<NP> , [<PRON> , it]], [<PP> ,
[<PREP> , on], [<NP> , [<DET> , your], [<NBAR> , [<N> , commercial]]]]]]
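The `phon` value in the output is a bracketed derivation tree whose yield spans positions 0 to 5. As an illustration only, the Python sketch below rebuilds that tree from hypothetical `(category, children)` tuples and prints it in the same bracket style; the tuple encoding and the helper names `render` and `words` are our own, not the parser's data structures.

```python
from typing import Union

# A tree node is (category, children); a leaf child is just the word string.
Tree = tuple[str, list[Union["Tree", str]]]


def render(node: Union[Tree, str]) -> str:
    """Print a tree in the bracketed style of the parser output above."""
    if isinstance(node, str):
        return node
    label, children = node
    return f"[<{label}> , " + ", ".join(render(c) for c in children) + "]"


def words(node: Union[Tree, str]) -> list[str]:
    """Collect the terminal words left to right (the edge's yield)."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


parse: Tree = ("S", [
    ("VP", [
        ("V", [("V", ["put"])]),
        ("NP", [("PRON", ["it"])]),
        ("PP", [
            ("PREP", ["on"]),
            ("NP", [("DET", ["your"]), ("NBAR", [("N", ["commercial"])])]),
        ]),
    ]),
])
print(render(parse))
print(len(words(parse)))  # 5, matching the span (0 - 5)
```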
More information about the HPSG-L
mailing list