[Corpora-List] Tree-Structured Named Entities corpora ?

Yoann Dupont yoa.dupont at gmail.com
Mon Dec 16 19:01:50 UTC 2013


Dear all,

Thank you all very much for pointing me to these corpora.

Best regards,


2013/12/12 Kathrin Beck <kathrin.beck at uni-tuebingen.de>

> Dear Yoann,
>
> The TüBa-D/Z (Tübingen Treebank of Written German;
> http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html)
> is a manually annotated treebank of approximately 85,000 sentences. It
> contains five subclasses of Named Entities; nested Named Entities are
> annotated as well:
> 17,386 GPE (geo-political entities)
> 5,380 LOC (locations)
> 30,181 PER (persons)
> 18,262 ORG (organisations)
> 3,594 OTH (other, e.g. movie titles)
>
> Examples of the annotation scheme are [PER Bill Clinton] and
> [ORG [GPE New York] Times].
>
> Kind regards,
>
> Kathrin Beck
>
>
> On 09.12.2013 at 12:00, corpora-request at uib.no wrote:
>
> > Message: 7
> > Date: Mon, 9 Dec 2013 11:29:54 +0100
> > From: Yoann Dupont <yoa.dupont at gmail.com>
> > Subject: [Corpora-List] Tree-Structured Named Entities corpora ?
> > To: corpora at uib.no
> >
> > Greetings Corpora-List,
> >
> > I am currently looking for corpora with tree-structured named entities.
> >
> > A simple example of tree structuring would be a person who has a first
> > and last name: "Barack Obama" is a person whose first name is "Barack"
> > and whose last name is "Obama". The corresponding parse would then be:
> > (PER (NAME.FIRST Barack) (NAME.LAST Obama))
> > Another example would be geographical addresses.
> >
> > I know of some corpora that fit this definition: the SemEval-2007 Task 9
> > corpora (tree-structured NEs in Spanish and Catalan) and the GENIA corpus
> > (tree-structured NEs for biomedical entities in English).
> >
> > Does anyone know of other tree-structured NE corpora?
> >
> > Thank you kindly in advance,
> >
> > --
> > Yoann DUPONT
>
> -----------------
> Kathrin Beck
>
> Project Administrator CLARIN-D
> Dept. of Computational Linguistics
> University of Tübingen
> Wilhelmstr. 19/ 2.22
> 72074 Tübingen
> Germany
>
> Tel.: +49-7071-29-73970
> Fax:  +49-7071-29-5214
> E-Mail: kbeck at sfs.uni-tuebingen.de,
> kathrin.beck at uni-tuebingen.de
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
Yoann DUPONT