[Corpora-List] Most common non-Romance, non-Germanic words in English

Tristan Miller miller at ukp.informatik.tu-darmstadt.de
Tue Apr 8 12:56:41 UTC 2014


Dear all,

I'm interested in finding the most frequent words in English which do
not have an origin in any Romance or Germanic language.  Does anyone
know if such a list is available anywhere?

If not, I suppose I could produce one myself easily enough by taking a
raw frequency list (such as Adam Kilgarriff's BNC lemma counts),
querying each entry in a machine-readable dictionary which provides
etymological information, and filtering appropriately.  But that
presupposes that such a dictionary exists.  Does anyone know of a
suitable freely available dictionary for this task?  Since I'd need to
automatically query many thousands of words, I'd want something that I
can download for offline use and access through an API.  I could try
accessing an offline dump of Wiktionary using the JWKTL API, though I
suspect Wiktionary's etymological coverage is too spotty.
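
The filtering step described above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not a tested
pipeline: the function name, the family labels, and the plain-dict
"etymology" lookup are all hypothetical stand-ins for whatever a real
machine-readable dictionary would provide, and the counts in the demo are
invented placeholders rather than BNC figures.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical family labels; a real dictionary would need its own mapping
# from source languages (e.g. Old French, Old Norse) to families.
EXCLUDED_FAMILIES = {"germanic", "romance"}


def non_romance_germanic(
    freq_list: Iterable[Tuple[str, int]],
    etymology: Dict[str, str],
    top_n: Optional[int] = None,
) -> List[Tuple[str, int, str]]:
    """Return (lemma, count, family) rows for lemmas whose recorded origin
    lies outside the excluded families, most frequent first."""
    kept = []
    for lemma, count in freq_list:
        family = etymology.get(lemma.lower())
        if family is None:
            # No etymology recorded: skip rather than guess.
            continue
        if family.lower() in EXCLUDED_FAMILIES:
            continue
        kept.append((lemma, count, family))
    kept.sort(key=lambda row: row[1], reverse=True)
    return kept if top_n is None else kept[:top_n]


if __name__ == "__main__":
    # Invented counts, for illustration only.
    toy_freq = [("the", 100), ("very", 50), ("robot", 7), ("tea", 5)]
    toy_etym = {"the": "Germanic", "very": "Romance",
                "robot": "Slavic", "tea": "Sinitic"}
    for row in non_romance_germanic(toy_freq, toy_etym):
        print(row)
```

Lemmas with no etymology entry are dropped silently here; with a sparsely
covered source it may be more useful to collect them separately so that
the coverage gap can be measured.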

Regards,
Tristan

-- 
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/
