[Corpora-List] Tree-Structured Named Entities corpora ?

Kathrin Beck kathrin.beck at uni-tuebingen.de
Thu Dec 12 15:39:57 UTC 2013


Dear Yoann,

The TüBa-D/Z (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html) is a manually annotated treebank of approximately 85,000 sentences. It annotates Named Entities in five subclasses, and nested Named Entities are annotated as well:
17,386 GPE (geo-political entities)
5,380 LOC (locations)
30,181 PER (persons)
18,262 ORG (organisations)
3,594 OTH (other, e.g. movie titles)

Examples of the annotation scheme are: [PER Bill Clinton]; [ORG [GPE New York] Times]
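As a rough illustration of how such nested annotations can be processed, here is a minimal Python sketch that reads the bracketed notation used in the examples above into a tree and lists every entity with its nesting depth. It assumes exactly this inline bracket notation, which is only a way of writing the examples and not the file format the treebank is distributed in; the names Entity, parse and spans are hypothetical. The bracket characters are parameters, so the same parser also accepts round-bracket trees such as the (PER (NAME.FIRST Barack) (NAME.LAST Obama)) example in the quoted question.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Entity:
    """A (possibly nested) named entity: a label plus tokens and sub-entities."""
    label: str
    children: List[Union["Entity", str]] = field(default_factory=list)

    def text(self) -> str:
        # Surface string of the entity, including the words of nested entities.
        return " ".join(c if isinstance(c, str) else c.text() for c in self.children)


def parse(s: str, open_ch: str = "[", close_ch: str = "]") -> List[Union[Entity, str]]:
    """Parse inline bracket notation like '[ORG [GPE New York] Times]'.

    Hypothetical helper: assumes the token right after an opening bracket
    is the entity label and that words are separated by whitespace.
    """
    # Pad brackets with spaces so that split() separates them from words.
    tokens = s.replace(open_ch, f" {open_ch} ").replace(close_ch, f" {close_ch} ").split()
    root: List[Union[Entity, str]] = []
    stack: List[Entity] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == open_ch:
            if i + 1 >= len(tokens) or tokens[i + 1] in (open_ch, close_ch):
                raise ValueError("opening bracket without a label")
            ent = Entity(label=tokens[i + 1])
            (stack[-1].children if stack else root).append(ent)
            stack.append(ent)  # descend into the new entity
            i += 2
            continue
        if tok == close_ch:
            if not stack:
                raise ValueError("unbalanced closing bracket")
            stack.pop()  # the current entity is complete
        else:
            (stack[-1].children if stack else root).append(tok)
        i += 1
    if stack:
        raise ValueError("unbalanced opening bracket")
    return root


def spans(nodes: List[Union[Entity, str]], depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, label, surface text) for every entity, outermost first."""
    for n in nodes:
        if isinstance(n, Entity):
            yield depth, n.label, n.text()
            yield from spans(n.children, depth + 1)


print(list(spans(parse("[ORG [GPE New York] Times]"))))
print(list(spans(parse("(PER (NAME.FIRST Barack) (NAME.LAST Obama))", "(", ")"))))
```

For the ORG example this yields the outer ORG span "New York Times" at depth 0 and the embedded GPE span "New York" at depth 1; a stack is used so that arbitrarily deep nesting is handled without recursion during parsing.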

Kind regards,

Kathrin Beck


On 09.12.2013 at 12:00, corpora-request at uib.no wrote:

> Message: 7
> Date: Mon, 9 Dec 2013 11:29:54 +0100
> From: Yoann Dupont <yoa.dupont at gmail.com>
> Subject: [Corpora-List] Tree-Structured Named Entities corpora ?
> To: corpora at uib.no
> 
> Greetings Corpora-List,
> 
> I am currently looking for corpora with tree-structured named entities.
> 
> A simple example of tree structuring would be a person with a first
> and last name: "Barack Obama" is a person whose first name is "Barack" and
> last name is "Obama". A parse would then be: (PER (NAME.FIRST Barack)
> (NAME.LAST Obama))
> Another example would be geographical addresses.
> 
> I know of some corpora that fit this definition: the SemEval-2007 Task 9
> corpora (tree-structured NEs in Spanish and Catalan) and the GENIA corpus
> (tree-structured NEs for biomedical entities in English).
> 
> Do any of you know of other tree-structured NE corpora?
> 
> Thank you kindly in advance,
> 
> -- 
> Yoann DUPONT

-----------------
Kathrin Beck

Project Administrator CLARIN-D
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19/ 2.22
72074 Tübingen
Germany

Tel.: +49-7071-29-73970
Fax:  +49-7071-29-5214
E-Mail: kbeck at sfs.uni-tuebingen.de,
kathrin.beck at uni-tuebingen.de


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
