[Corpora-List] frequency lists: Hungarian

Viktor Tron v.tron at ed.ac.uk
Fri Apr 9 13:19:48 UTC 2004


Hello,

As for Hungarian, I think I can help.
For instance, I can send you a file with around 18,000 prefixless verb stems,
with the following fields:

1. rank (of text frequency)
2. frequency count in a web corpus of about 10 million tokens.
3. verb stem <dictionary form, i.e., present tense 3sg indefinite-obj>
4. same as 3?
5. number of alternative stems (not very informative)
6. number of different prefixes the stem occurs with
7. number of suffixes (i.e., suffix clusters) the stem occurred with
8. orthographic family size: the number of all different verbal wordforms
   that are derived from this stem (any combination of added prefixes,
   suffixes, and capitalization patterns)
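
For anyone who wants to load such a list programmatically, here is a minimal
sketch of a reader for the eight fields above. The on-disk layout is not
specified in this message, so the tab separator, the one-record-per-line
layout, and all names (VerbStemRecord, parse_records, stem_repeat) are
assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class VerbStemRecord:
    """One line of the verb stem list; field order follows the list above."""
    rank: int               # 1. rank of text frequency
    frequency: int          # 2. frequency count in the web corpus
    stem: str               # 3. verb stem (dictionary form)
    stem_repeat: str        # 4. apparently the same as field 3
    n_alt_stems: int        # 5. number of alternative stems
    n_prefixes: int         # 6. number of different prefixes
    n_suffix_clusters: int  # 7. number of suffix clusters
    family_size: int        # 8. orthographic family size


def parse_records(lines: Iterable[str], sep: str = "\t") -> Iterator[VerbStemRecord]:
    """Yield one record per non-empty line.

    Assumes eight fields per line separated by `sep` (a tab by default);
    this layout is a guess, not something the message states.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(sep)
        if len(parts) != 8:
            raise ValueError(
                f"line {line_no}: expected 8 fields, got {len(parts)}"
            )
        rank, freq, stem, stem_repeat, n_alt, n_pre, n_suf, family = parts
        yield VerbStemRecord(
            rank=int(rank),
            frequency=int(freq),
            stem=stem,
            stem_repeat=stem_repeat,
            n_alt_stems=int(n_alt),
            n_prefixes=int(n_pre),
            n_suffix_clusters=int(n_suf),
            family_size=int(family),
        )
```

With a file opened in text mode (UTF-8 or whatever encoding the file actually
uses), list(parse_records(f)) would give the records in rank order, and they
could then be filtered, for example, by frequency or by family size.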

If you need lists where the different prefixed versions are not stripped
(this might make sense, since different prefixed versions of the same
alleged stem often have very different meanings), or more specific details,
etc., just write to me.

Disclaimer: the data and counts are obtained automatically, and therefore
the actual counts might be erroneous due to some systematic ambiguities.
The basic pattern, however, I reckon is reliable.

If you use this data, please refer to the Szoszablya project
www.szoszablya.hu

Best
Viktor Tron
+------------------------------------------------------------------+
|Viktor Tron                                        v.tron at ed.ac.uk|
|3fl Rm8 2 Buccleuch Pl EH8 9LW Edinburgh      Tel +44 131 650 4414|
|European Postgraduate College               www.coli.uni-sb.de/egk|
|School of Informatics                     www.informatics.ed.ac.uk|
|Theoretical and Applied Linguistics              www.ling.ed.ac.uk|
| @ University of Edinburgh, UK                        www.ed.ac.uk|
|Dept of Computational Linguistics               www.coli.uni-sb.de|
| @ Saarland University (Saarbruecken, Germany) www.uni-saarland.de|
|use LINUX and FREE Software                          www.linux.org|
+------------------------------------------------------------------+



On Fri, 9 Apr 2004 14:04:56 +0200, Milena Slavcheva <milena at lml.bas.bg>
wrote:

> Dear Corpora List Members,
>
> I am looking for downloadable lists of frequently used verbs in:
> - French;
> - Hungarian;
> - German.
>
> I would be grateful if you could provide me with information about such
> resources.
>
> Best regards,
>
> Milena Slavcheva
>
> Milena Slavcheva
>
> Linguistic Modeling Laboratory
> Institute for Parallel Processing
> Bulgarian Academy of Sciences
> 25A, Acad. G. Bonchev St.
> 1113 Sofia, Bulgaria
>
> Phone: (+359 2) 979 2812
> Fax:      (+359 2) 70 72 73
> E-mail:   milena at lml.bas.bg


