[Corpora-List] NewsExplorer: multilingual news analysis with cross-lingual news links
Ralf Steinberger
ralf.steinberger at jrc.it
Wed Sep 13 15:04:29 UTC 2006
Please excuse multiple postings.
URL: <http://press.jrc.it/NewsExplorer/>
LANGUAGES: Arabic, Dutch, English, Estonian, Farsi, French, German,
Italian, Portuguese, Russian, Slovene, Spanish, Swedish.
COUNTRIES: Austria, Belgium, France, Germany, Italy, Netherlands,
Spain, United Kingdom, United States.
VOLUME: Approx. 15,000 news articles analysed every day.
News on approx. 500,000 distinct names.
WEB USAGE: Currently approximately 300,000 hits per day.
NewsExplorer is a publicly accessible, fully automatic news aggregation and
analysis system that makes use of various text analysis and visualisation
tools. NewsExplorer allows users to navigate the news across languages and
over time, to access articles via named persons and organisations, and to
get an overview of developments via visual time lines. NewsExplorer, which
was entirely developed at the European Commission's Joint Research Centre
(JRC) in Ispra (Italy), currently covers 13 languages and also distinguishes
country-specific news. Apart from the seamless integration of various
information extraction tools, its major novel features are its high
multilinguality and the ability to cross language borders.
NewsExplorer is fully automatic and will thus make mistakes. The news
analysis is bottom-up and free of any political or other preconceptions.
The following text analysis tools are part of NewsExplorer:
- Document clustering.
- Geo-coding, including disambiguation of homographic place names.
- Name recognition (persons and - to some extent - organisations).
- Approximate matching and automatic merging of name variants,
  monolingually and across languages
  (e.g. http://press.jrc.it/NewsExplorer/entities/en/23.html).
- Daily calculation of weighted relations between persons,
  based on their co-occurrence in millions of news articles.
- Identification of quotes by and about people.
- Automatic linking of names to the Wikipedia encyclopaedia.
- Detection of major new topics every day, week and month.
- Tracking of ongoing topics over time ('stories').
- Linking of news on the same subject across languages.
- Various visualisation tools:
  - Location of news in the world.
  - Biggest daily news clusters per language over time (time line).
  - Development of individual stories over time.
  - Relations between persons and organisations.
- More to come ...
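
To give a rough idea of what the co-occurrence-based relations between
persons in the list above could look like, here is a minimal sketch in
Python. It is not the NewsExplorer implementation: the function name
cooccurrence_weights, the sample names and the choice of the Dice
coefficient as weight are all assumptions made for illustration, since the
actual weighting formula is not described in this announcement.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(articles):
    """Return a weight for every pair of names that share an article.

    `articles` is an iterable of collections of person names, one
    collection per article (hypothetical input format). The weight is the
    Dice coefficient, an assumed stand-in for the real weighting scheme.
    """
    name_counts = Counter()   # number of articles mentioning each name
    pair_counts = Counter()   # number of articles mentioning both names
    for names in articles:
        unique = sorted(set(names))          # count each name once per article
        name_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): 2 * n / (name_counts[a] + name_counts[b])
        for (a, b), n in pair_counts.items()
    }


# Made-up sample data: three articles and the names found in each.
articles = [
    ["Alice Example", "Bob Sample"],
    ["Alice Example", "Bob Sample", "Carol Test"],
    ["Carol Test"],
]
weights = cooccurrence_weights(articles)
for pair, weight in sorted(weights.items()):
    print(pair, round(weight, 2))
```

Pairs of names that always appear together receive a weight of 1.0, while
pairs that rarely share an article receive a weight close to 0. A daily
update would simply recompute the counts over the enlarged collection.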
An overview of the system is given in the following article (for more
detailed publications on individual tools and applications, see
http://langtech.jrc.it/):
Ralf Steinberger, Bruno Pouliquen, Camelia Ignat.
Navigating multilingual news collections
using automatically extracted information.
Journal of Computing and Information Technology
CIT 13, 2005, 4, 257-264.
Available at: http://cit.zesoi.fer.hr/browseIssue.php?issue=23
NewsExplorer receives its news articles from the JRC's Europe Media Monitor
(publicly available on the NewsBrief page http://press.jrc.it/), which
continually crawls about 1,000 news sites in 30 different languages.
NewsBrief detects breaking news, roughly classifies all articles, and sends
out email summaries.
NewsExplorer and NewsBrief have been developed as a service to the European
Commission and other EU institutions, as well as for the wider public.
"Helping to unify Europe - One language at a time."
European Commission - Joint Research Centre (JRC, http://www.jrc.it/)
IPSC - SeS - Language Technology
21020 Ispra (VA), Italy
URL: http://langtech.jrc.it/