[Corpora-List] Re: Dictionary Creation Software

Leonel Ruiz Miyares (Centro Ling. Aplicada) leonel at lingapli.ciges.inf.cu
Thu Sep 19 11:23:18 UTC 2002


On 18 Sep 02, at 16:03, Ramesh Krishnamurthy wrote:

> Dear Dr De Lucca
>
> I have drawn up a checklist from my 15 years' experience in
> corpus-based computational lexicography. I hope this helps.
>
> If you are going to create software for the whole process from raw
> data to publishing of a dictionary/reference book, I think these would
> be my requirements. Every process should be automated to the maximum,
> with allowance for human intervention or input of preferences.
>
> 1. for monolingual dictionaries, a large corpus of L1
> 2. for bilingual dictionaries, a large corpus of L1 and L2, with
>    pointers in both directions to find suggested equivalent words and
>    phrases
> 3. lemmatized frequency lists, to decide which words are important
>    enough to include in the dictionary, and which forms are
>    significant, etc
> 4. based on the frequency lists, a spelling checker, giving variant
>    spellings
> 5. pronunciation, with regional variations; concordanced tone units
>    to hear word pronunciation in context
> 6. statistics for regional variations
> 7. statistics for genre distribution: is the wordform used in all
>    types of text, or mainly in speech, mainly in newspapers, mainly in
>    novels, etc
> 8. grammar - wordclass identification, colligation, grammar patterns
>    (valency, complementation, etc); with frequencies, regional
>    variations, and genre-distribution
> 9. collocation: individual collocates, lexical phrases, etc; with
>    frequencies, regional variations, and genre-distribution
> 10. semantics - hypernyms, hyponyms, synonyms (i.e. thesaurus),
>     antonyms
> 11. pragmatics - any relevant information
> 12. selected examples for each point from 3 onwards; large corpora
>     yield hundreds or thousands of examples, so
> 13. spoken data: typical speaker, context, interlocutor, etc
> 14. concordancer to allow access to raw data and ability to check the
>     information given from point 3 onwards
> 15. automatic cut-and-paste to dictionary or reference book database
> 16. customizable database templates for reference books
> 17. validation routines to ensure database entry fields contain
>     correct information and are in correct sequence
> 18. ability to interrogate database on any field or subfield, to
>     count entries, check that editorial policies have been followed,
>     check cross-references, check that examples contain the headword,
>     etc
> 19. automatic conversion from database to typesetting formats -
>     columnation, page numbering, headers and footers, widows and
>     orphans, typefaces, etc
> 20. progress monitoring - which processes have been completed (e.g.
>     compilation, editing, proofreading), which words have been done,
>     who did them, when, etc
>
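
Points 3 (lemmatized frequency lists) and 14 (concordancer) in the
checklist above can be illustrated with a very small Python sketch.
It is only an illustration under stated assumptions: the LEMMAS table,
the function names and the one-line corpus are all hypothetical, and a
real system would rely on a proper lemmatizer and a large corpus.

```python
import re
from collections import Counter

# Hypothetical toy lemma table (an assumption for this sketch); a real
# project would use a full morphological lexicon or lemmatizer.
LEMMAS = {"dictionaries": "dictionary", "words": "word", "forms": "form"}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())

def lemma_frequencies(tokens, lemmas=LEMMAS):
    """Count tokens after mapping each wordform to its lemma (point 3)."""
    return Counter(lemmas.get(tok, tok) for tok in tokens)

def concordance(tokens, lemma, lemmas=LEMMAS, width=3):
    """Return keyword-in-context lines for all forms of `lemma` (point 14)."""
    lines = []
    for i, tok in enumerate(tokens):
        if lemmas.get(tok, tok) == lemma:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

corpus = "Dictionaries list words. A dictionary entry gives the forms of a word."
tokens = tokenize(corpus)
freqs = lemma_frequencies(tokens)

# Most frequent lemmas first, then the contexts for one of them.
for lemma, count in freqs.most_common(3):
    print(lemma, count)
for line in concordance(tokens, "word", width=2):
    print(line)
```

Keeping the concordancer keyed on the lemma rather than the wordform
means the lexicographer can check every count in the frequency list
against the raw contexts behind it.
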
> All the tools should be flexible, to allow users to cater for local
> variations in any feature, from orthographic form (capitalization,
> punctuation, contractions, etc) to size of field in the databases,
> etc.
>
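
The validation routines of points 17 and 18 (required fields filled
in, examples containing the headword) might look something like the
following. This is a rough sketch, not a real dictionary database
schema: the Entry class, its field names and the validate function are
hypothetical, and it assumes single-word headwords.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Entry:
    """Hypothetical dictionary entry record (field names are assumptions)."""
    headword: str
    wordclass: str
    examples: list = field(default_factory=list)
    forms: list = field(default_factory=list)  # inflected forms of the headword

REQUIRED_FIELDS = ("headword", "wordclass")

def validate(entry):
    """Return a list of problems found in one entry (empty list = valid)."""
    problems = []
    # Point 17: required fields must not be empty.
    for name in REQUIRED_FIELDS:
        if not getattr(entry, name).strip():
            problems.append(f"missing {name}")
    # Point 18: each example must contain the headword or one of its forms.
    accepted = {entry.headword.lower(), *(f.lower() for f in entry.forms)}
    for ex in entry.examples:
        words = set(re.findall(r"[^\W\d_]+", ex.lower()))
        if not words & accepted:
            problems.append(f"example lacks headword: {ex!r}")
    return problems

good = Entry("run", "verb", ["She runs every day."], ["runs", "ran"])
bad = Entry("run", "", ["He walked home."])
print(validate(good))
print(validate(bad))
```

Returning a list of problems rather than stopping at the first one
lets an editor see everything wrong with an entry in a single pass.
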
> Best wishes
> Ramesh
>
> Ramesh Krishnamurthy
> Consultant, Collins Cobuild and Bank of English Corpus;
> Honorary Research Fellow, Centre for Corpus Linguistics, University of
> Birmingham; Honorary Research Fellow, Computational Linguistics
> Research Group, University of Wolverhampton.
>
>
> ----- Original Message -----
> From: delucca at nilc.icmc.usp.br
> To: corpora at hd.uib.no
> Cc: delucca at usp.br
> Subject: [Corpora-List] Dictionary Creation Software
>
> Dear Colleagues,
>
> We are a team of researchers in Computational Linguistics and, at the
> present time, we are working on building software tools for making
> dictionaries.
>
> We would like to hear from those who have experience of compiling
> dictionaries and vocabularies about the following: WHAT you would
> like, need, and hope for from Dictionary Creation Software.
> What types of tools would be essential for making dictionaries,
> vocabularies, and any other type of reference work? A concordancer? A
> spelling checker? A pronunciation tool?
>
> We look forward to hearing from you with great interest.
>
> Thank you very much in advance for your advice.
>
> Sincerely
>
>
>
>
> J.L. DeLucca, PhD
>
> Interinstitutional Center for Research and Development in
> Computational Linguistics (NILC) Sao Paulo University


