[Corpora-List] How to develop a lexicon

Dom Widdows widdows at google.com
Tue Apr 7 15:13:25 UTC 2009


Dear Farhad,

I think one of the first questions that would leap to mind for many
people on this list is which language pairs you are interested in, and
what parallel / comparable corpora you have available.

If you have parallel corpora, there are many methods for extracting
term pairs, and implementations are fairly readily available nowadays (e.g., we
have some support for this in SemanticVectors, see
http://code.google.com/p/semanticvectors/wiki/BilingualModels).
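To make the idea of extracting term pairs from a parallel corpus concrete, here is a minimal sketch of one classic and very simple technique: scoring candidate word pairs by the Dice coefficient of how often they co-occur in aligned sentence pairs. This is not how SemanticVectors works (it builds vector models). The function name `dice_term_pairs`, its thresholds and the tiny English-Italian corpus are hypothetical, chosen for illustration only, and the corpus is assumed to be already sentence-aligned and whitespace-tokenizable.

```python
from collections import Counter
from itertools import product


def dice_term_pairs(aligned_pairs, min_count=2, min_score=0.5):
    """Rank candidate (source, target) word pairs by the Dice coefficient of
    their sentence-level co-occurrence in a sentence-aligned corpus.

    Hypothetical helper: naive whitespace tokenization, no stop-word filtering.
    """
    src_freq = Counter()  # number of source sentences containing each word
    tgt_freq = Counter()  # number of target sentences containing each word
    joint = Counter()     # number of aligned pairs containing both words
    for src, tgt in aligned_pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))

    results = []
    for (s, t), c in joint.items():
        if c < min_count:
            continue  # too rare to trust
        score = 2 * c / (src_freq[s] + tgt_freq[t])
        if score >= min_score:
            results.append((s, t, score))
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda x: (-x[2], x[0], x[1]))
    return results


# Toy, hand-made English-Italian sentence pairs (illustrative only).
corpus = [
    ("the house", "la casa"),
    ("the red house", "la casa rossa"),
    ("a red car", "una macchina rossa"),
    ("the car", "la macchina"),
]
result = dice_term_pairs(corpus)
for s, t, score in result:
    print(f"{s}\t{t}\t{score:.2f}")
```

On this toy data the correct pairs score highest, but spurious pairs such as "the"/"casa" also pass the threshold. On real data one would typically add a one-to-one constraint (e.g. competitive linking), stop-word handling, and far more text.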

In general, over recent decades many approaches have changed from
looking at questions like "give me a list of all English verbs and
their conjugations" to questions like "given a sample of the data
you're interested in working with, give me a list of prevalent English
verbs and their conjugations".
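As a rough illustration of the "given a sample of the data" approach, the sketch below collects the most frequent verb lemmas in a sample together with the inflected forms actually observed for each. It assumes the sample has already been POS-tagged with Penn Treebank tags and lemmatized by some upstream tool; the function `prevalent_verbs` and the small hand-tagged sample are hypothetical.

```python
from collections import Counter, defaultdict

# Penn Treebank verb tags: base, past, gerund, past participle,
# non-3rd-person present, 3rd-person singular present.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def prevalent_verbs(tagged_tokens, top_n=10):
    """Return the top_n most frequent verb lemmas in a tagged sample, each
    with its frequency and the most common observed form for each tag.

    Hypothetical helper: input is an iterable of (word, tag, lemma) triples.
    """
    lemma_freq = Counter()
    forms = defaultdict(lambda: defaultdict(Counter))  # lemma -> tag -> form counts
    for word, tag, lemma in tagged_tokens:
        if tag in VERB_TAGS:
            lemma_freq[lemma] += 1
            forms[lemma][tag][word.lower()] += 1

    result = []
    for lemma, freq in lemma_freq.most_common(top_n):
        paradigm = {tag: c.most_common(1)[0][0]
                    for tag, c in sorted(forms[lemma].items())}
        result.append((lemma, freq, paradigm))
    return result


# Tiny hand-tagged sample (illustrative only).
sample = [
    ("She", "PRP", "she"), ("walks", "VBZ", "walk"), ("home", "NN", "home"),
    ("They", "PRP", "they"), ("walked", "VBD", "walk"), ("and", "CC", "and"),
    ("ran", "VBD", "run"), ("We", "PRP", "we"), ("are", "VBP", "be"),
    ("walking", "VBG", "walk"), ("It", "PRP", "it"), ("is", "VBZ", "be"),
]
result = prevalent_verbs(sample, top_n=3)
for lemma, freq, paradigm in result:
    print(lemma, freq, paradigm)
```

Forms not attested in the sample (here, for instance, a past participle of "walk") are simply missing, so gaps would still need to be filled by rule or by hand.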

Best wishes,
Dominic

On Sun, Apr 5, 2009 at 7:17 AM, Eros Zanchetta <eros.zanchetta at gmail.com> wrote:
> Dear Farhad,
>
> if you're building a lexicon from scratch, you might be interested in
> the paper:
>
> Eros Zanchetta and Marco Baroni (2005) Morph-it! A free corpus-based
> morphological resource for the Italian language, proceedings of Corpus
> Linguistics 2005, University of Birmingham, Birmingham, UK
> (http://sslmit.unibo.it/~eros/downloads/Morph-it.pdf).
>
> It describes a method for the rapid creation of a lexicon using a
> mixture of corpus-based techniques and manual checking. We created an
> Italian lexicon, but the method may be applied to other languages too.
>
> Best,
> Eros Zanchetta
>
> Farhad Atghiaee wrote:
>> Dear members,
>>
>> I am currently developing a lexicon for a machine translation system.
>> If anybody knows of a helpful resource or dataset, I would appreciate it.
>> For example, suppose we want to gather all English verbs and their
>> conjugations: is there a resource for that?
>>
>> Regards
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
