[Corpora-List] Serbian resources wanted
Vlado Keselj
vlado at cs.dal.ca
Sat Feb 23 00:09:03 UTC 2013
Hi Martin,
Resources prepared in our paper:
Vlado Keselj and Danko Sipka. A Suffix Subsumption-based Approach
to Building Stemmers and Lemmatizers for Highly Inflectional
Languages with Sparse Resources. In INFOTHECA, Journal of
Informatics and Librarianship, No 1-2, Volume IX, May 2008.
are available at:
http://web.cs.dal.ca/~vlado/nlp/2007-sr/
Among other resource files, they include lists of lemmatized words:
list-l: 47489 lemmas (0.47 MB)
list-w: 675140 word-forms (7.3 MB)
list-w-l: 696454 word-form/lemma pairs (14.6 MB)
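
A minimal sketch of how the word-form/lemma pairs might be used to turn a list of word-forms into lemma counts. It assumes, without confirmation from the files themselves, that list-w-l holds one whitespace-separated pair per line with the word-form first; the function names and the toy entries in the usage note are hypothetical. Since there are more pairs (696454) than word-forms (675140), some forms must map to more than one lemma, so the sketch splits the count of an ambiguous form evenly among its candidate lemmas.

```python
from collections import Counter, defaultdict


def load_pairs(lines):
    """Build a word-form -> set-of-lemmas map.

    Assumed format (not verified): 'wordform lemma' per line.
    """
    form_to_lemmas = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        form, lemma = parts
        form_to_lemmas[form.lower()].add(lemma.lower())
    return form_to_lemmas


def lemma_frequencies(tokens, form_to_lemmas):
    """Count lemmas for a sequence of word-form tokens."""
    counts = Counter()
    for token in tokens:
        form = token.lower()
        lemmas = form_to_lemmas.get(form)
        if not lemmas:
            # Unknown form: fall back to counting the form itself.
            counts[form] += 1
            continue
        # Ambiguous form: share one occurrence among all candidate lemmas.
        share = 1 / len(lemmas)
        for lemma in lemmas:
            counts[lemma] += share
    return counts
```

For example, with the toy pairs "kuce kuca", "kuci kuca", "sam jesam" and "sam sam", the tokens kuce, kuci, sam would be counted as two occurrences of kuca and half an occurrence each of jesam and sam. In practice the lines would come from iterating over the opened list-w-l file.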
Regards,
Vlado
On Fri, 22 Feb 2013, Adam Kilgarriff wrote:
> Hi Martin,
>
> we have a Serbian corpus in the Sketch Engine so all she needs to do is
> upload her corpus and then run 'keywords' to compare hers with the
> reference.
>
> The one that is currently available is not lemmatised, so comparisons there
> would be wordform-based. However, we are lemmatising and POS-tagging a newer,
> bigger dataset (courtesy of Nikola Ljubešić) as we speak, so we can make that
> available too; then she can get key lemmas. If you or she ask, we can make
> a big sample of the lemmatised material available at a day or two's notice.
>
> Best
>
> Adam
>
>
> On 22 February 2013 15:39, Martin Wynne <martin.wynne at it.ox.ac.uk> wrote:
>
> > I would like to pose a question on behalf of a student who would like to
> > generate keywords by comparing her corpus of contemporary online personal
> > ads in Serbian with a reference corpus.
> >
> > Does anyone know of any freely available wordlists for the modern Serbian
> > language? Ideally, we'd like a lemma frequency list generated from a
> > general reference corpus, although lists from various other text types
> > could be useful. We'd be interested if there is a corpus available to use
> > as well.
> >
> > Many thanks for any help.
> >
> >
> > --
> > Martin Wynne
> > IT Services, University of Oxford
> > Oxford e-Research Centre
> > Faculty of Linguistics, Philology and Phonetics
> >
> > martin.wynne at it.ox.ac.uk
> >
> >
> >
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
>
>
>
> --
> ========================================
> Adam Kilgarriff <http://www.kilgarriff.co.uk/>
> adam at lexmasterclass.com
> Director Lexical Computing Ltd <http://www.sketchengine.co.uk/>
>
> Visiting Research Fellow University of Leeds <http://leeds.ac.uk>
>
> *Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
>
> *DANTE: a lexical database for English* <http://www.webdante.com>
> ========================================
>