[Corpora-List] NewsExplorer: multilingual news analysis with cross-lingual news links

Ralf Steinberger ralf.steinberger at jrc.it
Wed Sep 13 15:04:29 UTC 2006


Please excuse multiple postings.

 

URL:        http://press.jrc.it/NewsExplorer/
LANGUAGES:  Arabic, Dutch, English, Estonian, Farsi, French, German,
            Italian, Portuguese, Russian, Slovene, Spanish, Swedish.
COUNTRIES:  Austria, Belgium, France, Germany, Italy, Netherlands,
            Spain, United Kingdom, United States.
VOLUME:     Approx. 15,000 news articles analysed every day.
            News on approx. 500,000 distinct names.
WEB USAGE:  Currently approximately 300,000 hits per day.

 

 

NewsExplorer is a publicly accessible, fully automatic news aggregation and
analysis system that makes use of various text analysis and visualisation
tools. NewsExplorer allows users to navigate the news across languages and
over time, to access articles via named persons and organisations, and to
get an overview of developments via visual time lines. NewsExplorer, which
was developed entirely at the European Commission's Joint Research Centre
(JRC) in Ispra (Italy), is currently available in 13 languages and also
distinguishes country-specific news. Apart from the seamless integration of
various information extraction tools, its major novel features are its high
degree of multilinguality and its ability to cross language borders.

 

NewsExplorer is fully automatic and will thus make mistakes. The news
analysis is bottom-up and free of any political or other preconceptions.
The following text analysis tools are part of NewsExplorer:

 

- Document clustering.
- Geo-coding, including disambiguation of homographic place names.
- Name recognition (persons and - to some extent - organisations).
- Approximate matching and automatic merging of name variants,
  monolingually and across languages
  (e.g. http://press.jrc.it/NewsExplorer/entities/en/23.html).
- Daily calculation of weighted relations between persons,
  based on their co-occurrence in millions of news articles.
- Identification of quotes by and about people.
- Automatic linking of names to the Wikipedia encyclopaedia.
- Detection of major new topics every day, week and month.
- Tracking of ongoing topics over time ('stories').
- Linking of news on the same subject across languages.
- Various visualisation tools:
  - Location of news in the world.
  - Biggest daily news clusters per language over time (time line).
  - Development of individual stories over time.
  - Relations between persons and organisations.
- More to come ...
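
To give a rough idea of what "weighted relations between persons, based on
their co-occurrence" can mean in practice, here is a minimal Python sketch.
It is an illustration only and not the NewsExplorer implementation: the
function name cooccurrence_weights, the input format (one list of person
names per article) and the Jaccard-style normalisation are all assumptions
made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(articles):
    """Return a weight for every pair of names seen in the same article.

    `articles` is an iterable of name lists, one list per article
    (hypothetical input format, assumed for this sketch).
    The weight is a Jaccard-style ratio (an assumption, not the
    NewsExplorer formula): articles mentioning both names divided by
    articles mentioning at least one of them.
    """
    name_counts = Counter()   # number of articles mentioning each name
    pair_counts = Counter()   # number of articles mentioning both names

    for names in articles:
        unique = sorted(set(names))  # count each name once per article
        name_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    weights = {}
    for (a, b), both in pair_counts.items():
        either = name_counts[a] + name_counts[b] - both
        weights[(a, b)] = both / either
    return weights


if __name__ == "__main__":
    # Placeholder names, not real data.
    sample = [
        ["Person A", "Person B"],
        ["Person A", "Person B", "Person C"],
        ["Person A"],
    ]
    for pair, weight in sorted(cooccurrence_weights(sample).items()):
        print(pair, round(weight, 3))
```

Each pair is stored with its two names in alphabetical order, so a relation
is counted once regardless of the order in which the names appear in an
article. Normalising by the number of articles that mention either name
keeps very frequently mentioned persons from dominating every relation; a
production system would likely use a different weighting.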

 

An overview of the system is given in the following article (for more
detailed publications on individual tools and applications, see
http://langtech.jrc.it/):

 

   Ralf Steinberger, Bruno Pouliquen, Camelia Ignat.
   Navigating multilingual news collections
         using automatically extracted information.
   Journal of Computing and Information Technology
         CIT 13, 2005, 4, 257-264.
   Available at: http://cit.zesoi.fer.hr/browseIssue.php?issue=23

 

NewsExplorer receives its news articles from the JRC's Europe Media Monitor
(publicly available on the NewsBrief page http://press.jrc.it/), which
continually crawls about 1,000 news sites in 30 different languages.
NewsBrief detects breaking news, roughly classifies all articles, and sends
out email summaries.

 

NewsExplorer and NewsBrief have been developed as a service to the European
Commission and other EU institutions, as well as for the wider public.

 

 

"Helping to unify Europe - One language at a time."

 

 

 

European Commission - Joint Research Centre (JRC, http://www.jrc.it/)
IPSC - SeS - Language Technology
21020 Ispra (VA), Italy
URL: http://langtech.jrc.it/


