[Corpora-List] JEX - A freely available multi-label categorisation tool trained for 22 languages

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Wed May 16 10:57:23 UTC 2012


WHAT IS JEX

 

The JRC EuroVoc Indexer JEX <http://langtech.jrc.ec.europa.eu/Eurovoc.html> is pre-trained multi-label categorisation software that assigns categories from the large-scale, wide-coverage EuroVoc Thesaurus <http://eurovoc.europa.eu/>, which consists of thousands of categories. JEX is distributed together with its training data (twenty to forty thousand documents per language). It has been trained for 22 languages on mostly parallel text (texts and their professionally produced translations). You can re-train JEX with your own documents, and even with your own categorisation scheme. JEX provides a graphical user interface (GUI), a command-line option for batch processing, and an API.
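
To make the term concrete: in multi-label categorisation, each document can receive several categories at once, usually as a ranked list. The Python sketch below is a toy illustration only. It is not JEX's algorithm or API, and the category labels, term weights and the function name assign_categories are all invented for the example.

```python
from collections import Counter

# Hypothetical category profiles: label -> {term: weight}.
# These labels and weights are made up; they are not taken from EuroVoc or JEX.
PROFILES = {
    "fishing policy": {"fish": 2.0, "quota": 1.5, "vessel": 1.0},
    "air transport": {"airline": 2.0, "airport": 1.5, "passenger": 1.0},
    "consumer protection": {"passenger": 1.0, "refund": 2.0, "consumer": 2.0},
}


def assign_categories(text, profiles=PROFILES, top_k=2):
    """Return up to top_k (label, score) pairs, best score first."""
    # Naive whitespace tokenisation; a real system would do far more.
    counts = Counter(text.lower().split())
    scores = {}
    for label, profile in profiles.items():
        # Score = sum of term weights times term frequency in the document.
        score = sum(weight * counts[term] for term, weight in profile.items())
        if score > 0:
            scores[label] = score
    # Sort by descending score, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

With these invented profiles, assign_categories("airline must refund the passenger") returns two labels for the one document, "air transport" and "consumer protection", which is what distinguishes multi-label from single-label classification.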

 

 

DOWNLOAD JEX  –  LANGUAGE COVERAGE

 

Languages:  Pre-trained for 22 languages, but trainable for many more:

 

            Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek,

            Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, 

            Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish.

            

Language families: Germanic, Romance, Slavic, Hellenic, Finno-Ugric, Baltic and Semitic.

 

URL:        http://langtech.jrc.ec.europa.eu/Eurovoc.html

 

Creator:    European Commission – Joint Research Centre (JRC <http://langtech.jrc.ec.europa.eu/>)

 

 

WHAT JEX CAN BE USED FOR

                

JEX can be used fully automatically or as an interactive tool to support professional librarians in their work. 

 

JEX also has many potential uses in the field of Computational Linguistics because it is highly multilingual and lends itself to cross-lingual tasks:

 

•          Use for multilingual classification experiments, e.g. to test the impact of different document representations (n-grams, lemmas, POS, word-sense disambiguation, …) across different languages and language families;

 

•          Use as input to other text mining applications, e.g. to:

           •       Detect document translations (Pouliquen et al. 2004);

           •       Detect cross-lingual plagiarism (Potthast et al. 2010);

           •       Link related documents across languages (Pouliquen et al. 2008);

           •       Support lexical choice in Machine Translation;

           •       Rank sentences in topic-specific summarisation;

           •       …
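
Several of the uses listed above rest on the same idea: because the assigned categories come from one shared thesaurus, they can serve as a language-independent document representation. The Python sketch below illustrates that idea under stated assumptions. The descriptor identifiers (c1, c2, c3), the scores and the function name cosine are hypothetical, and this is not the method of the cited papers, only a generic cosine comparison of category vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors given as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[key] * b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical categoriser output: descriptor id -> score.
# The ids and scores are invented for illustration.
doc_en = {"c1": 0.9, "c2": 0.4}   # an English document
doc_fr = {"c1": 0.8, "c2": 0.5}   # a French document on the same topic
doc_de = {"c3": 0.7}              # a German document on another topic

print(cosine(doc_en, doc_fr))  # high: candidate translation or related text
print(cosine(doc_en, doc_de))  # 0.0: no descriptors in common
```

A pair of documents in different languages with a high similarity score would be a candidate for being translations of each other or otherwise related; a threshold would have to be chosen empirically.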

 

 

MORE INFORMATION

 

At http://langtech.jrc.ec.europa.eu/ you will find more information on the JRC’s multilingual language technology activities, download links for the JRC EuroVoc Indexer JEX, and a page pointing to further freely available multilingual resources. For details on JEX and its performance, see the following publication, which you may also wish to cite:

 

Steinberger Ralf, Mohamed Ebrahim & Marco Turchi (2012).
JRC EuroVoc Indexer JEX - A freely available multi-label categorisation tool.
Proceedings of the 8th International Conference on Language Resources and Evaluation
(LREC'2012), Istanbul, 21-27 May 2012.
Available at: http://langtech.jrc.ec.europa.eu/Documents/2012_LREC-JEX-final.pdf

 

 

Ralf Steinberger, Mohamed Ebrahim & Marco Turchi
European Commission - Joint Research Centre (JRC)
21027 Ispra (VA), Italy

URL – Applications: http://emm.newsbrief.eu/overview.html

URL – The science behind them: http://langtech.jrc.ec.europa.eu/




