Corpora: Automatic Language Detection (Web Documents)

Arno Scharl scharl at wu-wien.ac.at
Sun Aug 26 22:42:48 UTC 2001


Dear CORPORA subscribers,

In order to extend the functionality of a prototype for analyzing the textual
content of Web-based information systems (see the preceding publication alert
on "Evolutionary Web Development"), we are currently developing a component
that automatically detects which language a document is written in. We would
therefore be interested in

(a) general papers or books on automatic language detection (based on
words, n-grams, etc.);
(b) lists of the most common or typical words in specific languages.
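To illustrate how lists like those in (b) could drive detection, here is a minimal Python sketch. It assumes tiny hand-picked lists of frequent function words; the lists, the names `COMMON_WORDS`, `tokenize` and `detect_language`, and the scoring rule are all hypothetical and not part of the prototype mentioned above. Each language scores one point per token found in its list, and the highest-scoring language wins.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lists of very frequent function words.
# A real component would derive much longer lists from a reference corpus.
COMMON_WORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that", "it",
                "for", "with", "on", "was", "are", "this", "be"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine",
               "zu", "den", "mit", "von", "sich", "auf", "für"},
    "french": {"le", "la", "les", "et", "est", "des", "un", "une",
               "du", "en", "que", "pour", "dans", "pas", "au"},
}


def tokenize(text):
    # Lower-case, then keep runs of Unicode letters only (no digits or "_")
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(text):
    """Return the language whose common-word list matches the most tokens,
    or None if no token occurs in any list."""
    counts = Counter(tokenize(text))
    scores = {
        lang: sum(n for word, n in counts.items() if word in words)
        for lang, words in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `detect_language("Der Hund ist nicht auf dem Sofa")` selects `"german"`. The same scoring loop could instead count character n-grams, which avoids relying on the text containing any of the listed words.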

Please reply to me personally and I'll post a summary of the responses to
the list.

Thank you
& best regards,
~ Arno Scharl

------------------------------------------------------------------------------
DDr. Arno Scharl, Associate Professor
Information Systems Department
Vienna University of Economics & Business Administration
Augasse 2-6, A-1090 Vienna, Austria
email: scharl at wu-wien.ac.at
tel: ++(43) 1-31336-4444; fax: ++(43) 1-31336-746
------------------------------------------------------------------------------
(c) 2000 Springer London:
EVOLUTIONARY WEB DEVELOPMENT
http://webdev.wu-wien.ac.at/
------------------------------------------------------------------------------


