[Corpora-List] JULIE Lab BioNLP Tool Suite

Joachim Wermter wermter at coling-uni-jena.de
Sun Jun 24 10:03:36 UTC 2007


The JULIE Lab at Jena University (www.julielab.de) offers a variety of NLP
tools suited for processing biomedical text data. All tools are available
under the open-source Common Public License (CPL). Here are some
highlights:

* Our tools are Apache UIMA (http://incubator.apache.org/uima) compatible
and will be offered as UIMA-PEAR components (and some of them as
stand-alone components). It is thus possible to configure a complete
UIMA-based NLP pipeline.

* Our tool suite contains text preprocessing components (sentence
splitter, tokenizer), morpho-syntactic components (POS tagger, phrase
chunker, syntactic parser) and semantic components (acronym resolver,
named entity recognizer).

* Our tool suite also contains a MEDLINE Reader and a
Lucene (http://lucene.apache.org) Indexer for UIMA annotations, thus
providing the facilities needed for semantic information retrieval.

* Most tools are based on machine learning and come with models
trained on available biomedical corpora, in particular:
  PennBioIE: http://bioie.ldc.upenn.edu
  Genia: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA
Our entity recognizer is trained on PennBioIE's oncology data.

* We also offer UIMA wrappers and ML-based models for the
OpenNLP (http://opennlp.sourceforge.net) tool suite (OpenNLP sentence
splitter, tokenizer, POS tagger, phrase chunker, syntactic parser).

* Our tool suite will be continuously updated. Therefore, we need your
feedback!
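
For readers unfamiliar with the pipeline idea mentioned above, here is a
minimal sketch in plain Python of how chained components can add stand-off
annotations to a shared document. It is only an illustration: it does not
use the UIMA API or any JULIE Lab component, and every name in it
(Document, Annotation, sentence_splitter, tokenizer, run_pipeline) is
hypothetical. The splitting rules are naive regular expressions, not the
machine-learning models described in this announcement.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Sentence" or "Token"
    begin: int  # character offset into the document text, inclusive
    end: int    # character offset into the document text, exclusive


@dataclass
class Document:
    """Hypothetical shared container: raw text plus stand-off annotations."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        # Return a new list so components can append while iterating.
        return [a for a in self.annotations if a.kind == kind]

    def covered_text(self, annotation):
        return self.text[annotation.begin:annotation.end]


def sentence_splitter(doc):
    # Naive rule (an assumption for this sketch): a sentence starts at a
    # non-space character and runs up to the next '.', '!' or '?'.
    for m in re.finditer(r"[^\s.!?][^.!?]*[.!?]", doc.text):
        doc.annotations.append(Annotation("Sentence", m.start(), m.end()))


def tokenizer(doc):
    # Depends on the sentence splitter: tokens are found inside each sentence
    # and their offsets are shifted back to document coordinates.
    for sent in doc.select("Sentence"):
        for m in re.finditer(r"\w+|[^\w\s]", doc.covered_text(sent)):
            doc.annotations.append(
                Annotation("Token", sent.begin + m.start(), sent.begin + m.end())
            )


def run_pipeline(text, components):
    # Each component reads earlier annotations and adds its own.
    doc = Document(text)
    for component in components:
        component(doc)
    return doc


doc = run_pipeline(
    "BRCA1 is a gene. It is mutated in tumors.",
    [sentence_splitter, tokenizer],
)
for sent in doc.select("Sentence"):
    print(doc.covered_text(sent))
```

The point of the design is that later components (a POS tagger, a named
entity recognizer) would only need the shared document and the annotation
types produced upstream, so components can be swapped or reordered in
configuration.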

If you use our components in an application of yours, please do
acknowledge our lab.

The JULIE Lab team.

------------------------------------------
Jena University Language and Information Engineering (JULIE) Lab
+49-3641-944324
+49-3641-944321
http://www.julielab.de


