[Corpora-List] Re: similarity

Eric Atwell eric at comp.leeds.ac.uk
Tue Sep 9 21:10:27 UTC 2003


Marco,

I apologise: I (mis-)parsed "Time flies like an arrow"
as <Imperative-verb> <object-noun-phrase> <adverbial-phrase/clause>,
then looked in the corpus for another sentence with this structure,
and found "Select the text you want to protect".

You are right, "Scrolling changes the display but..."
IS closer in grammatical structure to an alternative parse,
<subject-noun-phrase> <verb> <object-noun-phrase>
(You are welcome to look at the rival parses/taggings of this instead!)
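
To make the comparison concrete, here is a minimal sketch in Python. It
assumes each sentence has already been reduced by hand to a flat sequence
of constituent labels. The labels are the two patterns named above; the
function names and the use of difflib's ratio as a "nearness" score are
illustrative assumptions, not anything the AMALGAM parsers do.

```python
from difflib import SequenceMatcher

# The two rival clause patterns named in this message.
IMPERATIVE = ["imperative-verb", "object-noun-phrase", "adverbial-phrase"]
DECLARATIVE = ["subject-noun-phrase", "verb", "object-noun-phrase"]

# Hand-assigned patterns for the two corpus sentences discussed here
# (an assumption for illustration, not parser output).
CORPUS = {
    "Select the text you want to protect": IMPERATIVE,
    "Scrolling changes the display but ...": DECLARATIVE,
}


def structure_similarity(a, b):
    """Return a 0..1 score for how much two label sequences overlap in order."""
    return SequenceMatcher(None, a, b).ratio()


def nearest(target, corpus):
    """Return the corpus sentence whose pattern scores highest against target."""
    return max(corpus, key=lambda s: structure_similarity(target, corpus[s]))


print(nearest(IMPERATIVE, CORPUS))
print(nearest(DECLARATIVE, CORPUS))
```

Which sentence counts as "nearest" depends entirely on which parse of
"Time flies like an arrow" you start from, which is the point at issue.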

Some linguists might say that a sentence can have more than one structure.
I prefer the "pro-corpus" stance that a sentence should not be parsed
in isolation but that even for sentence-syntax you need to take
context into account; and that a sentence will have only one parse
depending on context (except for comparatively rare cases of deliberate
ambiguity, e.g. in jokes/puns).

But this may be leading discussion away from "similarity"...

Eric Atwell


On Tue, 9 Sep 2003, Marco Antonio Esteves da Rocha wrote:

>
>
> Hello, all, this is very likely to be a linguist's statement of
> ignorance as to how an automatic POS tagger works. Even worse (for a
> linguist), it may mean ignorance of the meaning assigned to the phrase
> 'grammatical structure'. But I am really curious to know why *select the
> text you want to protect* is in any way similar to *Time flies like an
> arrow...* any more than, say, *Scrolling changes the display but does
> not move the insertion point.* (selected from the sample of sentences in
> AMALGAM). In fact, I believe the idea of similarity and the various
> degrees of nearness, in terms of grammatical structure, may actually
> prove useful for enhancing automatic parsing and POS tagging. So it may
> be worth discussing.
>
> Marco Rocha
>
> Eric Atwell wrote:
>
>  > Peet,
>  >
>  > The AMALGAM project at Leeds University collected a "MULTI-TREEBANK",
>  > a sample of sentences annotated with 24 rival parsing and PoS-tagging schemes,
>  > see http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-parsed.html
>  > and http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-tagged.html
>  >
>  > Parse trees as raw output of 10 rival parsers:
>  > Alice, DESPAR, ENGCG, Principar, Link, RANLP, Carroll/Briscoe Shallow Parser,
>  > WordPerfect's Grammatik, Tosca, Sextant;
>  >
>  > Parse trees representing 4 English corpus parsing schemes:
>  > UPenn, ICE, POW Systemic-Functional Bracketed, POW S-F Numerical
>  >
>  > PoS-tagged text representing 10 English corpus PoS-tagging schemes:
>  > Brown, ICE, LLC, LOB, UNIX Parts, POW, SEC, UPenn, BNC-C5, and BNC-C6.
>  >
>  > The sample sentences were from software manuals (tho the PoS-tagged samples
>  > were extended to also include BBC radio and London teenager sentences), see
>  > http://www.comp.leeds.ac.uk/amalgam/amalgam/corpus/tagged/raw/ipsm_raw.html
>  >
>  > [note: IF YOU HAVE A PARSER/TAGGER, PLEASE VOLUNTEER TO PARSE/TAG THESE
>  > SENTENCES AND DONATE THE OUTPUT TO THE MULTITREEBANK FOR ALL TO SHARE!]
>  >
>  > Unsurprisingly, the sample does not include your example "Time flies like..."
>  > - the nearest (in grammatical structure) I could find in the sample was:
>  > "Select the text you want to protect."
>  >
>  >
>  > Alice:
>  > (SENT (SENT-MOD (UNK-CAT "Select") (NP (DET "the") (NOUN "text")))
>  > (SENT (VP-ACT (NP "you") (V-TR "want")) (NP NULL-PHON))) (SENT-MOD
>  > (UNK-CAT "to") (NP "protect"))
>  >
>  > DESPAR:
>  > VB   select 1  --> 8  -
>  > DT      the 2  --> 3  [
>  > NN     text 3  --> 1  + OBJ
>  > PP      you 4  --> 5  " SUB
>  > VBP    want 5  --> 3  ]
>  > TO       to 6  --> 7  -
>  > VB  protect 7  --> 5  -
>  > .         . 8  --> 0  -
>  >
>  > ENGCG:
>  > "<Select>"
>  > "select" <*> <SVO> <SV> <P/for> V IMP VFIN @+FMAINV
>  > "<the>"
>  > "the" <Def> DET CENTRAL ART SG/PL @DN>
>  > "<text>"
>  > "text" N NOM SG @OBJ
>  > "<you>"
>  > "you" <NonMod> PRON PERS NOM SG2/PL2 @SUBJ
>  > "<want>"
>  > "want" <SVOC/A> <SVO> <SV> <P/for> V PRES -SG3 VFIN @+FMAINV
>  > "<to>"
>  > "to" INFMARK> @INFMARK>
>  > "<protect>"
>  > "protect" <SVO> V INF @-FMAINV
>  > "<$.>"
>  >
>  > Principar:
>  > (
>  >  (Select	~ V_NP	*)
>  >  (the	~ Det	< text	spec)
>  >  (text	~ N	> Select	comp1)
>  >  (you	~ N	< want	subj)
>  >  (want	~ V_CP	> text	rel)
>  >  (to	~ I	> want	comp1)
>  >  (protect	~ V_NP	> to	pred)
>  >  (.	)
>  > )
>  >
>  > Link:
>  > parse not found
>  >
>  > RANLP:
>  > (VP/NP select
>  >  (N2+/DET1a the
>  >   (N2-
>  >    (N1/INFMOD
>  >     (N1/RELMOD1 (N1/N text)
>  >      (S/THATLESSREL (S1a (N2+/PRO you) (VP/NP want (TRACE1 E)))))
>  >     (VP/TO to (VP/NP protect (TRACE1 E)))))))
>  >
>  > Carroll/Briscoe Shallow Parser:
>  > parse not found
>  >
>  > WordPerfect's Grammatik:
>  > SENTENCE
>  >    |- CLAUSE 1
>  >    |    |- VERB ---------------- Select
>  >    |    |- DIRECT-OBJECT ------- the text
>  >    |- CLAUSE 2 - RELATIVE
>  >         |- SUBJECT ------------- you
>  >         |- VERB ---------------- want
>  >         |- DIRECT-OBJECT ------- {the text}
>  >         |- VERB-Infinitive ----- to protect
>  >         |- --------------------- .
>  >
>  > Tosca:
>  > parse not found
>  >
>  > Sextant:
>  > VP  101 Select         select         INF       0 0
>  > NP    2 the            the            DET       1 1  2 (text) DET
>  > NP*   2 text           text           NOUN      2 1  0 (select) DOBJ
>  > NP*   3 you            you            PRON      3 0
>  > VP  102 want           want           INF       4 0
>  > VP  102 to             to             TO        5 0
>  > VP  102 protect        protect        INF       6 1  3 (you) SUBJ
>  > --    0 .              .              .         7 0
>  >
>  > UPenn:
>  > ( (S
>  >     (NP-SBJ (-NONE- *) )
>  >     (VP (VB select)
>  >       (NP
>  >         (NP (DT the) (NN text) )
>  >         (SBAR
>  >           (WHNP-1 (-NONE- 0) )
>  >           (S
>  >             (NP-SBJ-2 (PRP you) )
>  >             (VP (VBP want)
>  >               (S
>  >                 (NP-SBJ (-NONE- *-2) )
>  >                 (VP (TO to)
>  >                   (VP (VB protect)
>  >                     (NP (-NONE- *T*-1) )))))))))
>  >     (. .) ))
>  >
>  > ICE:
>  > PU CL(main,montr,imp)
>  >  VB VP(trans,imp)
>  >   MVB V(trans,imp) {select}
>  >  OD NP()
>  >   DT DTP()
>  >    DTCE ART(def) {the}
>  >   NPHD N(com,sing) {text}
>  >   NPPO CL(depend,montr,pres)
>  >    SU NP()
>  >     NPHD PRON(pers) {you}
>  >    VB VP(montr,pres)
>  >     MVB V(montr,pres) {want}
>  >    OD CL(depend,montr,infin)
>  >     TO PRTCL(to) {to}
>  >     VB VP(montr,infin)
>  >      MVB V(montr,infin) {protect}
>  >  PUNC PUNC(per) {.}
>  >
>  > POW Systemic-Functional Bracketed:
>  > [Z
>  >     [CL
>  >         [M select]
>  >         [C
>  >             [NGP
>  >                 [DD the]
>  >                 [H text]
>  >                 [Q
>  >                     [CL
>  >                         [S
>  >                             [NGP
>  >                                 [HP you]
>  >                             ]
>  >                         ]
>  >                         [M want]
>  >                         [C
>  >                             [CL
>  >                                 [I to]
>  >                                 [M protect]
>  >                             ]
>  >                         ]
>  >                     ]
>  >                 ]
>  >             ]
>  >         ]
>  >         [E .]
>  >     ]
>  > ]
>  >
>  > POW S-F Numerical:
>  > Z CL 1 M select 1 C NGP 2 DD the 2 H text 2 Q CL 3 S NGP HP you 3 M want
>  > 3 C CL 4 I to 4 M protect 1 E .
>  >
>  > Brown:
>  > select/VB
>  > the/AT
>  > text/NN
>  > you/PPSS
>  > want/VB
>  > to/TO
>  > protect/VB
>  > ./.
>  >
>  > ICE:
>  > select/V(montr,infin)
>  > the/ART(def)
>  > text/N(com,sing)
>  > you/PRON(pers)
>  > want/V(montr,pres)
>  > to/PRTCL(to)
>  > protect/V(montr,imp)
>  > ./PUNC(per)
>  >
>  > LLC:
>  > select/VA+0
>  > the/TA
>  > text/NC
>  > you/RC
>  > want/VA+0
>  > to/PD
>  > protect/VA+0
>  > ./.
>  >
>  > LOB:
>  > select/VB
>  > the/ATI
>  > text/NN
>  > you/PP2
>  > want/VB
>  > to/TO
>  > protect/VB
>  > ./.
>  >
>  > UNIX Parts:
>  > select/adj
>  > the/art
>  > text/noun
>  > you/pron
>  > want/verb
>  > to/verb
>  > protect/verb
>  > ./.
>  >
>  > POW:
>  > select/P
>  > the/DD
>  > text/H
>  > you/HP
>  > want/M
>  > to/I
>  > protect/M
>  > ./.
>  >
>  > SEC:
>  > select/VB
>  > the/ATI
>  > text/NN
>  > you/PP2
>  > want/VB
>  > to/TO
>  > protect/VB
>  > ./.
>  >
>  > UPenn:
>  > select/VB
>  > the/DT
>  > text/NN
>  > you/PRP
>  > want/VBP
>  > to/TO
>  > protect/VB
>  > ./.
>  >
>  > BNC-C5:
>  > Select/VVB
>  > the/AT0
>  > text/NN1
>  > you/PNP
>  > want/VVB
>  > to/TO0
>  > protect/VVI
>  > ./PUN
>  >
>  > BNC-C6:
>  > Select/VV0
>  > the/AT
>  > text/NN1
>  > you/PPY
>  > want/VV0
>  > to/TO
>  > protect/VVI
>  > ./YSTP
>  >
>  >
>  >
>  >
>  > On Sat, 30 Aug 2003, peetm wrote:
>  >
>  >
>  >>Hi,
>  >>
>  >>
>  >>
>  >>I'm really interested in seeing alternative mark-ups of the following
>  >>sentence:
>  >>
>  >>
>  >>
>  >>"Time flies like an arrow whereas fruit flies like a banana"
>  >>
>  >>
>  >>
>  >>I know that 'accurate' is entirely subjective - and down to the tagger - but
>  >>- I'd like to see samples of mark-ups produced by this sentence, 'accurate'
>  >>or not (preferably with an explanation of the mark-up used:
>  >>methodology/tag set - or with links to the same).
>  >>
>  >>
>  >>
>  >>I'm especially interested in any mark-up that produces some hierarchical
>  >>XML-type output.
>  >>
>  >>
>  >>
>  >>So, if anyone feels like providing me with examples - PLEASE DO SO!
>  >>
>  >>
>  >>
>  >>Many thanks,
>  >>
>  >>
>  >>
>  >>peetm
>  >>
>  >>
>  >>
>  >>email: peet.morris at clg.ox.ac.uk
>  >>
>  >>
>  >>
>  >>addr: Computational Linguistics Group
>  >>
>  >>      University of Oxford
>  >>
>  >>      The Clarendon Institute
>  >>
>  >>      Walton Street
>  >>
>  >>      Oxford
>  >>
>  >>      OX1 2HG
>  >>
>  >>
>  >>
>  >
>

--
Eric Atwell, Senior Lecturer, Computer Vision and Language research group
Distributed Multimedia Systems MSc Tutor & SOCRATES/JYA Tutor
School of Computing, University of Leeds, LEEDS LS2 9JT
TEL: 0113-3435761  MOBILE: 0775-1039104 FAX: 0113-3435468
WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric at comp.leeds.ac.uk
Visit http://www.computingLEEDS.ac.uk - our newsletter for industry