[Corpora-List] Most common non-Romance, non-Germanic words in English

Darren Cook darren at dcook.org
Wed Apr 9 23:17:43 UTC 2014


Trying again - I keep hitting the spam filter, so I'll try splitting my
response up!

> If not, I suppose I could produce one myself easily enough by taking a
> raw frequency list (such as Adam Kilgarriff's BNC lemma counts),
> querying each entry in a machine-readable dictionary which provides
> etymological information, and filtering appropriately.  But that
> presupposes that such a dictionary exists.  Does anyone know of a
> suitable freely available dictionary for this task?  

One approach would be to gather lists of the words of interest:
 http://en.wikipedia.org/wiki/List_of_English_words_of_Arabic_origin
 http://en.wikipedia.org/wiki/List_of_English_words_of_Japanese_origin
 http://en.wikipedia.org/wiki/List_of_English_words_of_Chinese_origin
etc.
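
Once you have those lists, ranking them against a frequency list is a
simple lookup. A rough sketch, assuming the frequency list has been
loaded into a dict and each Wikipedia list into a set (the counts and
word sets below are toy placeholders, not real BNC figures, and
rank_loanwords is just a name I made up):

```python
from typing import Dict, List, Set, Tuple


def rank_loanwords(
    freq: Dict[str, int],
    origin_lists: Dict[str, Set[str]],
) -> List[Tuple[str, str, int]]:
    """Return (word, origin, count) for each listed word found in freq,
    most frequent first (ties broken alphabetically)."""
    rows = []
    for origin, words in origin_lists.items():
        for word in words:
            key = word.lower()  # frequency lists are usually lower-cased
            if key in freq:
                rows.append((key, origin, freq[key]))
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows


# Toy placeholder data, for illustration only.
freq = {"the": 1000, "house": 200, "tea": 60, "sugar": 40, "tsunami": 5}
origin_lists = {
    "Arabic": {"sugar", "Admiral"},
    "Chinese": {"tea"},
    "Japanese": {"tsunami", "tempura"},
}

ranked = rank_loanwords(freq, origin_lists)
for word, origin, count in ranked:
    print(f"{count:6d}  {word}  ({origin})")
```

Words that appear in a list but not in the frequency data are silently
dropped, so you may want to log those separately.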

As most English words do come from the Romance or Germanic languages,
this is not an impossible task, though you may need to filter further
based on your exact criteria. E.g. tempura entered English from
Japanese, but entered Japanese from Portuguese. Admiral comes from a
French word which in turn comes from an Arabic word; which does that
count as?
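
One way to make that criterion explicit is to store each word's
borrowing chain and choose whether the immediate donor or the oldest
known source decides. A minimal sketch, assuming the chains are entered
by hand (the language set is an incomplete placeholder and
is_outside_romance_germanic is a hypothetical helper):

```python
from typing import List

# Placeholder set; extend it to match your own definition.
ROMANCE_GERMANIC = {
    "French", "Portuguese", "Spanish", "Italian", "Latin",
    "German", "Dutch", "Old English", "Old Norse",
}


def is_outside_romance_germanic(chain: List[str], criterion: str = "immediate") -> bool:
    """chain lists donor languages from the most recent to the oldest known.

    criterion="immediate" looks at the language English borrowed from;
    criterion="ultimate" looks at the oldest language in the chain.
    """
    if not chain:
        raise ValueError("chain must contain at least one language")
    if criterion == "immediate":
        language = chain[0]
    elif criterion == "ultimate":
        language = chain[-1]
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    return language not in ROMANCE_GERMANIC


# The two examples from this message.
chains = {
    "tempura": ["Japanese", "Portuguese"],
    "admiral": ["French", "Arabic"],
}

for word, chain in chains.items():
    for criterion in ("immediate", "ultimate"):
        print(word, criterion, is_outside_romance_germanic(chain, criterion))
```

With these chains, tempura counts under the "immediate" criterion only
and admiral under the "ultimate" criterion only, so the choice changes
the resulting list.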

Darren
