[Corpora-List] Markup Examples Needed!
Eric Atwell
eric at comp.leeds.ac.uk
Wed Sep 3 11:18:10 UTC 2003
Peet,
The AMALGAM project at Leeds University collected a "MULTI-TREEBANK",
a sample of sentences annotated with 24 rival parsing and PoS-tagging schemes,
see http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-parsed.html
and http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-tagged.html
Parse trees as raw output of 10 rival parsers:
Alice, DESPAR, ENGCG, Principar, Link, RANLP, Carroll/Briscoe Shallow Parser,
WordPerfect's Grammatik, Tosca, Sextant;
Parse trees representing 4 English corpus parsing schemes:
UPenn, ICE, POW Systemic-Functional Bracketed, POW S-F Numerical;
PoS-tagged text representing 10 English corpus PoS-tagging schemes:
Brown, ICE, LLC, LOB, UNIX Parts, POW, SEC, UPenn, BNC-C5, and BNC-C6.
The sample sentences were from software manuals (though the PoS-tagged samples
were extended to also include BBC radio and London teenager sentences), see
http://www.comp.leeds.ac.uk/amalgam/amalgam/corpus/tagged/raw/ipsm_raw.html
[note: IF YOU HAVE A PARSER/TAGGER, PLEASE VOLUNTEER TO PARSE/TAG THESE
SENTENCES AND DONATE THE OUTPUT TO THE MULTITREEBANK FOR ALL TO SHARE!]
Unsurprisingly, the sample does not include your example "Time flies like..."
- the nearest (in grammatical structure) I could find in the sample was:
"Select the text you want to protect."
Alice:
(SENT (SENT-MOD (UNK-CAT "Select") (NP (DET "the") (NOUN "text")))
(SENT (VP-ACT (NP "you") (V-TR "want")) (NP NULL-PHON))) (SENT-MOD
(UNK-CAT "to") (NP "protect"))
DESPAR:
VB select 1 --> 8 -
DT the 2 --> 3 [
NN text 3 --> 1 + OBJ
PP you 4 --> 5 " SUB
VBP want 5 --> 3 ]
TO to 6 --> 7 -
VB protect 7 --> 5 -
. . 8 --> 0 -
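A minimal Python sketch for reading the DESPAR rows above into head/dependent pairs. The column reading is an assumption inferred from this one sample, not from DESPAR documentation: PoS tag, word, token position, an arrow, the position of the head (0 seemingly meaning "no head"), a chunk marker, and an optional relation label. The names `Token` and `parse_despar` are hypothetical.

```python
from typing import NamedTuple, Optional

# The DESPAR output quoted above, copied verbatim.
DESPAR_OUTPUT = '''\
VB select 1 --> 8 -
DT the 2 --> 3 [
NN text 3 --> 1 + OBJ
PP you 4 --> 5 " SUB
VBP want 5 --> 3 ]
TO to 6 --> 7 -
VB protect 7 --> 5 -
. . 8 --> 0 -
'''


class Token(NamedTuple):
    tag: str
    word: str
    index: int
    head: int                # assumed: position of the head token, 0 = none
    marker: str              # assumed: chunk-boundary marker
    relation: Optional[str]  # e.g. OBJ or SUB, when present


def parse_despar(text: str) -> list[Token]:
    """Split each non-empty row on whitespace into the assumed columns."""
    tokens = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag, word, index, arrow, head, marker = fields[:6]
        if arrow != "-->":
            raise ValueError(f"unexpected row layout: {line!r}")
        relation = fields[6] if len(fields) > 6 else None
        tokens.append(Token(tag, word, int(index), int(head), marker, relation))
    return tokens


tokens = parse_despar(DESPAR_OUTPUT)
by_index = {t.index: t for t in tokens}
for t in tokens:
    head_word = by_index[t.head].word if t.head in by_index else "ROOT"
    label = f" ({t.relation})" if t.relation else ""
    print(f"{t.word} -> {head_word}{label}")
```

Under this reading, "text" depends on "select" as OBJ and "you" depends on "want" as SUB, while "select" itself attaches to the final full stop, which is the only token with head 0.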
ENGCG:
"<Select>"
"select" <*> <SVO> <SV> <P/for> V IMP VFIN @+FMAINV
"<the>"
"the" <Def> DET CENTRAL ART SG/PL @DN>
"<text>"
"text" N NOM SG @OBJ
"<you>"
"you" <NonMod> PRON PERS NOM SG2/PL2 @SUBJ
"<want>"
"want" <SVOC/A> <SVO> <SV> <P/for> V PRES -SG3 VFIN @+FMAINV
"<to>"
"to" INFMARK> @INFMARK>
"<protect>"
"protect" <SVO> V INF @-FMAINV
"<$.>"
Principar:
(
(Select ~ V_NP *)
(the ~ Det < text spec)
(text ~ N > Select comp1)
(you ~ N < want subj)
(want ~ V_CP > text rel)
(to ~ I > want comp1)
(protect ~ V_NP > to pred)
(. )
)
Link:
parse not found
RANLP:
(VP/NP select
  (N2+/DET1a the
    (N2-
      (N1/INFMOD
        (N1/RELMOD1 (N1/N text)
          (S/THATLESSREL (S1a (N2+/PRO you) (VP/NP want (TRACE1 E)))))
        (VP/TO to (VP/NP protect (TRACE1 E)))))))
Carroll/Briscoe Shallow Parser:
parse not found
WordPerfect's Grammatik:
SENTENCE
|- CLAUSE 1
| |- VERB ---------------- Select
| |- DIRECT-OBJECT ------- the text
|- CLAUSE 2 - RELATIVE
|- SUBJECT ------------- you
|- VERB ---------------- want
|- DIRECT-OBJECT ------- {the text}
|- VERB-Infinitive ----- to protect
|- --------------------- .
Tosca:
parse not found
Sextant:
VP 101 Select select INF 0 0
NP 2 the the DET 1 1 2 (text) DET
NP* 2 text text NOUN 2 1 0 (select) DOBJ
NP* 3 you you PRON 3 0
VP 102 want want INF 4 0
VP 102 to to TO 5 0
VP 102 protect protect INF 6 1 3 (you) SUBJ
-- 0 . . . 7 0
UPenn:
( (S
    (NP-SBJ (-NONE- *) )
    (VP (VB select)
      (NP
        (NP (DT the) (NN text) )
        (SBAR
          (WHNP-1 (-NONE- 0) )
          (S
            (NP-SBJ-2 (PRP you) )
            (VP (VBP want)
              (S
                (NP-SBJ (-NONE- *-2) )
                (VP (TO to)
                  (VP (VB protect)
                    (NP (-NONE- *T*-1) )))))))))
    (. .) ))
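The UPenn bracketing above can be read mechanically, since every constituent is a parenthesised label followed by either sub-constituents or a word. The following is a minimal Python sketch of such a reader, not any official Treebank tool; the helper names `tokenize`, `parse` and `leaves` are made up for illustration, and treating `-NONE-` nodes as empty elements to skip is an assumption based on this sample.

```python
import re

# The UPenn parse quoted above (whitespace is not significant).
PENN_TREE = """
( (S
    (NP-SBJ (-NONE- *) )
    (VP (VB select)
      (NP
        (NP (DT the) (NN text) )
        (SBAR
          (WHNP-1 (-NONE- 0) )
          (S
            (NP-SBJ-2 (PRP you) )
            (VP (VBP want)
              (S
                (NP-SBJ (-NONE- *-2) )
                (VP (TO to)
                  (VP (VB protect)
                    (NP (-NONE- *T*-1) )))))))))
    (. .) ))
"""


def tokenize(text):
    """Split into '(' , ')' and runs of other non-space characters."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens):
    """Recursive descent; returns (label, children), children being subtrees or words."""
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""                      # the outermost wrapper has no label
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree, keep_empty=False):
    """Yield (word, tag) pairs left to right, skipping -NONE- elements by default."""
    label, children = tree
    for child in children:
        if isinstance(child, str):
            if keep_empty or label != "-NONE-":
                yield (child, label)
        else:
            yield from leaves(child, keep_empty)


tagged = list(leaves(parse(tokenize(PENN_TREE))))
print(" ".join(f"{word}/{tag}" for word, tag in tagged))
```

For this sample the printed line should be `select/VB the/DT text/NN you/PRP want/VBP to/TO protect/VB ./.`, which is the same tagging as the UPenn PoS-tagged version further down.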
ICE:
PU CL(main,montr,imp)
VB VP(trans,imp)
MVB V(trans,imp) {select}
OD NP()
DT DTP()
DTCE ART(def) {the}
NPHD N(com,sing) {text}
NPPO CL(depend,montr,pres)
SU NP()
NPHD PRON(pers) {you}
VB VP(montr,pres)
MVB V(montr,pres) {want}
OD CL(depend,montr,infin)
TO PRTCL(to) {to}
VB VP(montr,infin)
MVB V(montr,infin) {protect}
PUNC PUNC(per) {.}
POW Systemic-Functional Bracketed:
[Z
  [CL
    [M select]
    [C
      [NGP
        [DD the]
        [H text]
        [Q
          [CL
            [S
              [NGP
                [HP you]
              ]
            ]
            [M want]
            [C
              [CL
                [I to]
                [M protect]
              ]
            ]
          ]
        ]
      ]
    ]
    [E .]
  ]
]
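Since the original request was for hierarchical XML-type output, here is a rough Python sketch that turns the POW bracketed parse above into XML using the standard-library `xml.etree.ElementTree`. It is only an illustration, not part of the AMALGAM tools: the function name `brackets_to_xml` is invented, and using the POW labels directly as element names is an assumption that happens to work here because Z, CL, M, C, NGP and so on are all valid XML names.

```python
import re
import xml.etree.ElementTree as ET

# The POW Systemic-Functional bracketed parse quoted above.
POW_TREE = """
[Z [CL [M select]
       [C [NGP [DD the] [H text]
               [Q [CL [S [NGP [HP you]]]
                      [M want]
                      [C [CL [I to] [M protect]]]]]]]
       [E .]]]
"""


def brackets_to_xml(text):
    """Convert '[LABEL child ... ]' brackets into nested XML elements."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}")
        elem = ET.Element(tokens[pos + 1])   # label becomes the element name
        pos += 2
        words = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                elem.append(node())
            else:
                words.append(tokens[pos])
                pos += 1
        pos += 1                             # consume ']'
        if words:
            elem.text = " ".join(words)
        return elem

    root = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return root


root = brackets_to_xml(POW_TREE)
ET.indent(root)                              # pretty-printing; needs Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

The same approach would need a label-to-name mapping for schemes whose labels are not valid XML names (for example `-NONE-` or `.` in the UPenn parse), in which case the label could go into an attribute instead.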
POW S-F Numerical:
Z CL 1 M select 1 C NGP 2 DD the 2 H text 2 Q CL 3 S NGP HP you 3 M want
3 C CL 4 I to 4 M protect 1 E .
Brown:
select/VB
the/AT
text/NN
you/PPSS
want/VB
to/TO
protect/VB
./.
ICE:
select/V(montr,infin)
the/ART(def)
text/N(com,sing)
you/PRON(pers)
want/V(montr,pres)
to/PRTCL(to)
protect/V(montr,imp)
./PUNC(per)
LLC:
select/VA+0
the/TA
text/NC
you/RC
want/VA+0
to/PD
protect/VA+0
./.
LOB:
select/VB
the/ATI
text/NN
you/PP2
want/VB
to/TO
protect/VB
./.
UNIX Parts:
select/adj
the/art
text/noun
you/pron
want/verb
to/verb
protect/verb
./.
POW:
select/P
the/DD
text/H
you/HP
want/M
to/I
protect/M
./.
SEC:
select/VB
the/ATI
text/NN
you/PP2
want/VB
to/TO
protect/VB
./.
UPenn:
select/VB
the/DT
text/NN
you/PRP
want/VBP
to/TO
protect/VB
./.
BNC-C5:
Select/VVB
the/AT0
text/NN1
you/PNP
want/VVB
to/TO0
protect/VVI
./PUN
BNC-C6:
Select/VV0
the/AT
text/NN1
you/PPY
want/VV0
to/TO
protect/VVI
./YSTP
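The tagged versions above all use a word/TAG layout, so they can be lined up token by token. A small Python sketch, using four of the ten schemes copied from above; `parse_tagged` is a hypothetical helper, and splitting at the last slash is an assumption that holds for these samples.

```python
# Four of the tagged versions quoted above, one token per whitespace-separated item.
TAGGED = {
    "Brown": "select/VB the/AT text/NN you/PPSS want/VB to/TO protect/VB ./.",
    "LOB": "select/VB the/ATI text/NN you/PP2 want/VB to/TO protect/VB ./.",
    "UPenn": "select/VB the/DT text/NN you/PRP want/VBP to/TO protect/VB ./.",
    "BNC-C5": "Select/VVB the/AT0 text/NN1 you/PNP want/VVB to/TO0 protect/VVI ./PUN",
}


def parse_tagged(text):
    """Return a list of (word, tag) pairs, splitting each item at its last '/'."""
    pairs = []
    for item in text.split():
        word, sep, tag = item.rpartition("/")
        if not sep:
            raise ValueError(f"no tag separator in {item!r}")
        pairs.append((word, tag))
    return pairs


parsed = {name: parse_tagged(text) for name, text in TAGGED.items()}
schemes = list(parsed)
words = [word.lower() for word, _ in parsed[schemes[0]]]

# All schemes must tokenise the sentence identically for the columns to line up.
for name in schemes:
    if [word.lower() for word, _ in parsed[name]] != words:
        raise ValueError(f"{name} tokenises the sentence differently")

print("word".ljust(10) + "".join(name.ljust(8) for name in schemes))
for i, word in enumerate(words):
    tags = [parsed[name][i][1] for name in schemes]
    print(word.ljust(10) + "".join(tag.ljust(8) for tag in tags))
```

Laid out this way, one difference is easy to see: Brown and LOB give "select", "want" and "protect" the same VB tag, whereas UPenn marks "want" as VBP and BNC-C5 marks "protect" as VVI. Comparing tag strings literally across schemes says little beyond that, since the tagsets draw their distinctions differently.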
On Sat, 30 Aug 2003, peetm wrote:
> Hi,
>
> I'm really interested in seeing alternative mark-ups of the following
> sentence:
>
> "Time flies like an arrow whereas fruit flies like a banana"
>
> I know that 'accurate' is entirely subjective - and down to the tagger - but
> I'd like to see samples of mark-ups produced for this sentence, 'accurate'
> or not (preferably with an explanation of the mark-up used:
> methodology/tag set - or with links to the same).
>
> I'm especially interested in any mark-up that produces some hierarchical
> XML-type output.
>
> So, if anyone feels like providing me with examples - PLEASE DO SO!
>
> Many thanks,
>
> peetm
>
> email: peet.morris at clg.ox.ac.uk
>
> addr: Computational Linguistics Group
>       University of Oxford
>       The Clarendon Institute
>       Walton Street
>       Oxford
>       OX1 2HG
--
Eric Atwell, CVL: Computer Vision and Language research group
Distributed Multimedia Systems MSc Tutor & SOCRATES/JYA Tutor
School of Computing, University of Leeds, LEEDS LS2 9JT
TEL: 0113-3435761 MOBILE: 0775-1039104 FAX: 0113-3435468
WWW: http://www.comp.leeds.ac.uk/eric EMAIL: eric at comp.leeds.ac.uk
Visit http://www.computingLEEDS.ac.uk - our newsletter for industry