Hi Kim,<div><br></div><div>You can use the IcePaHC corpus to extract these frequencies. Although it is a historical corpus, it spans the 12th to 21st centuries, so you could restrict yourself to the texts from, say, the 19th to 21st centuries, which represent the modern language well. IcePaHC is a free resource.</div>
<div><br></div><div>Note that the corpus is lemmatized and that, in addition to the treebank format, the main download includes formats that are more convenient for your purpose.</div><div><br></div><div><a href="http://www.linguist.is/icelandic_treebank/Download">http://www.linguist.is/icelandic_treebank/Download</a></div>
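<div><br></div><div>As a rough sketch of how a lemma frequency list could be built from the lemmatized data: the snippet below assumes a hypothetical tab-separated export with one token per line (year, word form, tag, lemma). That layout and the function name top_lemmas are illustrative assumptions, not the actual IcePaHC file format, so the parsing step would need adapting to whichever download format you pick.</div>

```python
from collections import Counter


def top_lemmas(lines, year_from=1800, n=5000):
    """Return the n most frequent lemmas from texts dated year_from or later.

    Assumes a hypothetical layout (not the real IcePaHC format):
    one token per line, tab-separated as year, word form, tag, lemma.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        year, _word, _tag, lemma = parts
        # Keep only tokens from the modern part of the corpus.
        if year.isdigit() and int(year) >= year_from:
            counts[lemma.lower()] += 1
    return counts.most_common(n)
```

<div>Counting lemmas rather than word forms matters for Icelandic, since a single lemma can surface in many inflected forms; raising or lowering year_from changes how much of the historical material is included.</div>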
<div><br></div><div>Unfortunately, it does not have English glosses, and I don't have any ideal solution for that, but you might get something useful by looking words up in this list:</div><div><a href="http://linguist.is/dictionary">http://linguist.is/dictionary</a></div>
<div>(it uses a different tagset, and is quite limited, but it is also a free resource)</div><div><br></div><div>The two tagsets you would be interested in are described in these pages:</div><div><a href="http://www.linguist.is/icelandic_treebank/Tagset">http://www.linguist.is/icelandic_treebank/Tagset</a></div>
<div><a href="http://linguist.is/icelandic_treebank/IFD_Tagset">http://linguist.is/icelandic_treebank/IFD_Tagset</a></div><div><br></div><div>There is an LREC paper on IcePaHC:</div><div><a href="http://www.lrec-conf.org/proceedings/lrec2012/summaries/440.html">http://www.lrec-conf.org/proceedings/lrec2012/summaries/440.html</a></div>
<div><br></div><div>If you have any questions regarding IcePaHC, feel free to email me or any other member of the IcePaHC project.</div><div><br></div><div>Best,</div><div>Anton</div><div><br><br><div class="gmail_quote">
On Mon, Nov 19, 2012 at 9:05 AM, Thommy Mayer <span dir="ltr"><<a href="mailto:thommy.mayer@gmail.com" target="_blank">thommy.mayer@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Kim,<br>
<br>
You could also check the "Frequency Dictionary Icelandic" from the<br>
Leipzig Wortschatz group or contact Uwe Quasthoff directly for the<br>
relevant data (<a href="mailto:quasthoff@informatik.uni-leipzig.de">quasthoff@informatik.uni-leipzig.de</a> ).<br>
<br>
Quasthoff, Uwe, Sabine Fiedler, Erla Hallsteinsdóttir (ed.). 2012.<br>
Frequency Dictionary Icelandic (Íslensk tíðniorðabók). Band 3 der<br>
Reihe Frequency Dictionaries. Universitätsverlag, 109 S. (+CD-ROM).<br>
<br>
Regards,<br>
Thomas<br>
<br>
---------------------------------------------------------------------------<br>
Thomas Mayer<br>
Research Unit "Quantitative Language Comparison"<br>
Forschungszentrum Deutscher Sprachatlas<br>
Philipps-Universität Marburg<br>
Hermann-Jacobsohn-Weg 3<br>
35032 Marburg<br>
<br>
Current address:<br>
Geschwister Scholl Platz 1<br>
80539 München, Germany<br>
Office: Schellingstraße 9, Raum 301<br>
Tel: <a href="tel:%2B49%2089%202180%206144" value="+498921806144">+49 89 2180 6144</a><br>
---------------------------------------------------------------------------<br>
<br>
<br>
2012/11/19 Kim Witten <<a href="mailto:kimwitten@gmail.com">kimwitten@gmail.com</a>>:<br>
<div class="HOEnZb"><div class="h5">> Hi Corpora Subscribers,<br>
> I'm wondering if somebody might be able to point me in the direction to find a simple list of the 5,000 most frequent words in Icelandic, from any (relatively current, non-historical) Icelandic corpus? With English gloss would be even better, but it's not necessary. Thanks!<br>
> -Kim<br>
> ---<br>
> Kim Witten, PhD candidate<br>
> Language & Linguistic Science<br>
> University of York, UK<br>
> <a href="mailto:kaw522@york.ac.uk">kaw522@york.ac.uk</a><br>
> <a href="http://www.MePhiD.com" target="_blank">www.MePhiD.com</a><br>
><br>
><br>
> _______________________________________________<br>
> UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
> Corpora mailing list<br>
> <a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
> <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><a href="http://www.linguist.is/" target="_blank">www.linguist.is</a><br>tel: 215-350-7215<br>
</div>