[Corpora-List] HeidelTime 1.8: cross-domain temporal tagger for 11 languages
Jannik Strötgen
jannik.stroetgen at gmail.com
Tue Dec 16 09:32:51 UTC 2014
Dear list members,
We are happy to announce the release of version 1.8 of our multilingual,
cross-domain, and easy-to-extend temporal tagger HeidelTime. [1]
The new version adds Croatian resources, developed and kindly provided
by Luka Skukan et al. (University of Zagreb). [2] Furthermore, the
Italian resources were significantly
improved in the context of the EVALITA-2014 EVENTI task. [3] Finally, we
have made some processing speed and stability improvements affecting the
UIMA kit and standalone versions.
HeidelTime now supports 11 languages (in alphabetical order):
Arabic, Chinese, Croatian, Dutch, English, French, German, Italian,
Russian, Spanish, and Vietnamese.
In addition, HeidelTime distinguishes between news-style documents and
narrative-style documents (e.g., Wikipedia articles) in all languages.
For English, colloquial texts (e.g., tweets and SMS) and scientific
articles (e.g., clinical trials) are also supported.
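As a rough illustration of the language and domain combinations listed above, the following minimal Python sketch encodes them as a lookup. It is built only from this announcement; the names LANGUAGES, DOMAINS_ALL_LANGUAGES, DOMAINS_ENGLISH_ONLY, and is_supported are hypothetical and are not part of HeidelTime's own API or configuration.

```python
# Hypothetical support matrix derived from the announcement text only.
# None of these names come from HeidelTime itself.
LANGUAGES = {
    "Arabic", "Chinese", "Croatian", "Dutch", "English", "French",
    "German", "Italian", "Russian", "Spanish", "Vietnamese",
}

# Domains the announcement says are available in all languages.
DOMAINS_ALL_LANGUAGES = {"news", "narrative"}

# Domains the announcement lists for English only.
DOMAINS_ENGLISH_ONLY = {"colloquial", "scientific"}


def is_supported(language: str, domain: str) -> bool:
    """Return True if the announcement lists this language/domain pair."""
    if language not in LANGUAGES:
        return False
    if domain in DOMAINS_ALL_LANGUAGES:
        return True
    return language == "English" and domain in DOMAINS_ENGLISH_ONLY
```

For example, is_supported("Croatian", "news") holds, while is_supported("Croatian", "colloquial") does not, since colloquial and scientific text are listed for English only.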
HeidelTime is available at Google Code [1] as a UIMA component and as a
Java standalone version. If you want to briefly test it, there is also
an online demo. [4]
In addition to HeidelTime itself, the UIMA HeidelTime kit contains
several collection readers and CAS consumers (mainly for processing
temporally annotated corpora) as well as analysis engines wrapping
several part-of-speech taggers to perform linguistic preprocessing in
all supported languages.
Any kind of feedback is highly appreciated!
Best regards,
The HeidelTime Team
http://code.google.com/p/heideltime/
https://twitter.com/HeidelTime
[1] http://code.google.com/p/heideltime/wiki/Downloads
[2] Luka Skukan, Goran Glavaš, and Jan Šnajder (2014): HeidelTime.Hr:
Extracting and Normalizing Temporal Expressions in Croatian. In
Proceedings of the 9th Language Technologies Conference, pages 99-103.
(http://nl.ijs.si/isjt14/proceedings/isjt2014_17.pdf)
[3] Giulio Manfredi, Jannik Strötgen, Julian Zell, and Michael Gertz
(2014): HeidelTime at EVENTI: Tuning Italian Resources and Addressing
TimeML's Empty Tags. In Proceedings of the 4th International Workshop
EVALITA-2014, pages 39-43.
(http://dbs.ifi.uni-heidelberg.de/fileadmin/Team/jannik/publications/2014_EVALITA_ManfrediEtAl.pdf)
[4] http://heideltime.ifi.uni-heidelberg.de/heideltime/
--
Jannik Strötgen
Institute of Computer Science
Database Systems Research Group
Im Neuenheimer Feld 348
69120 Heidelberg
Germany
Phone: +49 (0) 6221 / 54-5709
eMail: stroetgen at informatik.uni-heidelberg.de
www: http://dbs.ifi.uni-heidelberg.de/