[Corpora-List] HeidelTime 1.8: cross-domain temporal tagger for 11 languages

Jannik Strötgen jannik.stroetgen at gmail.com
Tue Dec 16 09:32:51 UTC 2014

Dear list members,

We are happy to announce the release of version 1.8 of our multilingual, 
cross-domain, and easy-to-extend temporal tagger HeidelTime. [1]

With the new version, Croatian resources were added - 
developed and kindly provided by Luka Skukan et al. (University of 
Zagreb). [2] Furthermore, the Italian resources were significantly 
improved in the context of the EVALITA-2014 EVENTI task. [3] Finally, we 
have made some processing speed and stability improvements affecting the 
UIMA kit and standalone versions.

HeidelTime now supports 11 languages (in alphabetical order):
Arabic, Chinese, Croatian, Dutch, English, French, German, Italian, 
Russian, Spanish, and Vietnamese.

HeidelTime distinguishes between news-style documents and 
narrative-style documents (e.g., Wikipedia articles) in all languages. 
For English, colloquial texts (e.g., tweets and SMS) and scientific 
articles (e.g., clinical trials) are also supported.

HeidelTime is available at Google Code [1] as a UIMA component and as a 
Java standalone version. If you want to try it out quickly, there is also 
an online demo. [4]

In addition to HeidelTime itself, the UIMA HeidelTime kit contains 
several collection readers and CAS consumers (mainly for processing 
temporally annotated corpora) as well as analysis engines wrapping 
several part-of-speech taggers to perform linguistic preprocessing in 
all supported languages.

Any kind of feedback is highly appreciated!

Best regards,
The HeidelTime Team


[2] Luka Skukan, Goran Glavaš, and Jan Šnajder (2014): HeidelTime.Hr: 
Extracting and Normalizing Temporal Expressions in Croatian. In 
Proceedings of the 9th Language Technologies Conference, pages 99-103.

[3] Giulio Manfredi, Jannik Strötgen, Julian Zell, and Michael Gertz 
(2014): HeidelTime at EVENTI: Tuning Italian Resources and Addressing 
TimeML's Empty Tags. In Proceedings of the 4th International Workshop 
EVALITA-2014, pages 39-43.

[4] http://heideltime.ifi.uni-heidelberg.de/heideltime/

Jannik Strötgen
Institute of Computer Science
Database Systems Research Group
Im Neuenheimer Feld 348
69120 Heidelberg
Phone: +49 (0) 6221 / 54-5709
eMail: stroetgen at informatik.uni-heidelberg.de
www:   http://dbs.ifi.uni-heidelberg.de/
