[Corpora-List] frequency lists: Hungarian
Viktor Tron
v.tron at ed.ac.uk
Fri Apr 9 13:19:48 UTC 2004
Hello,
As for Hungarian, I think I can help.
For instance, I can send you a file with around 18,000 prefixless verb
stems, with the following fields:
1. rank (of text frequency)
2. frequency count in a web corpus of about 10 million tokens.
3. verb stem <dictionary form, i.e., present tense 3sg indefinite-obj>
4. same as 3?
5. number of alternative stems (not very informative)
6. number of different prefixes the stem occurs with
7. number of suffixes (i.e., suffix clusters) the stem occurred with
8. orthographic family size: the number of all different verbal wordforms
that are derived from this stem
(any combination of added prefixes, suffixes, and capitalization patterns)
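
To make the record layout above concrete, here is a minimal Python sketch
of reading one line of such a file. The actual file format is not stated
in this message, so the tab separator, the field names, and the numbers
in the sample line are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VerbStemRecord:
    """One row of the verb stem list; field names are hypothetical."""
    rank: int                # 1. rank by text frequency
    frequency: int           # 2. frequency count in the web corpus
    stem: str                # 3. verb stem (dictionary form)
    stem_repeat: str         # 4. described only as "same as 3?"
    n_alt_stems: int         # 5. number of alternative stems
    n_prefixes: int          # 6. number of different prefixes
    n_suffix_clusters: int   # 7. number of suffix clusters
    family_size: int         # 8. orthographic family size


def parse_record(line: str, sep: str = "\t") -> VerbStemRecord:
    """Parse one line, assuming eight fields separated by `sep`."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    rank, freq, stem, stem_repeat, n_alt, n_pre, n_suf, family = fields
    return VerbStemRecord(
        rank=int(rank),
        frequency=int(freq),
        stem=stem,
        stem_repeat=stem_repeat,
        n_alt_stems=int(n_alt),
        n_prefixes=int(n_pre),
        n_suffix_clusters=int(n_suf),
        family_size=int(family),
    )


# Sample line with invented numbers, not taken from the real data.
sample = "1\t100\tvan\tvan\t2\t5\t40\t300\n"
record = parse_record(sample)
print(record.stem, record.frequency, record.family_size)
```

If the real file uses a different separator, it can be passed as the
sep argument; the field order follows the numbered list above.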
If you need lists where the different prefixed versions are not stripped
(this might make sense, since different prefixed versions of the same
alleged stem often have very different meanings), or more specific
details, etc., just write to me.
Disclaimer: the data and counts are obtained automatically, and therefore
the actual counts might be erroneous due to some systematic ambiguities.
The basic pattern, however, I reckon is reliable.
If you use this data, please refer to the Szoszablya project
www.szoszablya.hu
Best
Viktor Tron
+------------------------------------------------------------------+
|Viktor Tron v.tron at ed.ac.uk|
|3fl Rm8 2 Buccleuch Pl EH8 9LW Edinburgh Tel +44 131 650 4414|
|European Postgraduate College www.coli.uni-sb.de/egk|
|School of Informatics www.informatics.ed.ac.uk|
|Theoretical and Applied Linguistics www.ling.ed.ac.uk|
| @ University of Edinburgh, UK www.ed.ac.uk|
|Dept of Computational Linguistics www.coli.uni-sb.de|
| @ Saarland University (Saarbruecken, Germany) www.uni-saarland.de|
|use LINUX and FREE Software www.linux.org|
+------------------------------------------------------------------+
On Fri, 9 Apr 2004 14:04:56 +0200, Milena Slavcheva <milena at lml.bas.bg> wrote:
> Dear Corpora List Members,
>
> I am looking for downloadable lists of frequently used verbs in:
> - French;
> - Hungarian;
> - German.
>
> I would be grateful if you could provide me with information about such
> resources.
>
> Best regards,
>
> Milena Slavcheva
>
> Milena Slavcheva
>
> Linguistic Modeling Laboratory
> Institute for Parallel Processing
> Bulgarian Academy of Sciences
> 25A, Acad. G. Bonchev St.
> 1113 Sofia, Bulgaria
>
> Phone: (+359 2) 979 2812
> Fax: (+359 2) 70 72 73
> E-mail: milena at lml.bas.bg