[Corpora-List] Query about the (dual) language of web pages

Michael Maxwell maxwell at umiacs.umd.edu
Tue Oct 9 16:16:35 UTC 2007


> Everyone is aware that some languages/cultures (e.g. Swedish,
> Finnish) tend to have alternative webpages in English, while others
> (e.g. Arabic) are much less likely to.
> Does anyone have any reliable figures on how often these
> parallel corpora (in English) appear for different
> (source) languages? At the moment I am interested in:
> Japanese, Chinese, Korean, Spanish, Portuguese, French, German,
> Italian, Arabic

...and I would be interested in similar figures for all languages. (In
my case the parallel text doesn't need to be in English; it might be,
e.g., in Spanish or Russian.)

Another thought: is there any place that actually tracks these sorts of
pages?  I know Phil Resnik was collecting some of this in the past
(http://umiacs.umd.edu/~resnik/strand/), but I don't believe he is
actively doing so now.

    Mike Maxwell
    CASL/ U MD


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora