[Corpora-List] Transliteration/Romanization tool for Modern Greek

Alberto Barrón Cedeño albarron at lsi.upc.edu
Wed Feb 27 16:17:55 UTC 2013


Dear Isabella,

you could try the ICU project:

http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Transliterator.html
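
In ICU4J the usual entry point is Transliterator.getInstance("Greek-Latin"), followed by a call to transliterate() on the text. As a rough, dependency-free illustration of what such a tool does, here is a minimal table-based sketch in Python. It is not ICU's rule set and does not follow any official standard: the mapping tables and the function names (strip_accents, romanize) are my own assumptions for illustration, and only one digraph is handled.

```python
import unicodedata

# Hypothetical, simplified mapping tables (assumptions for illustration only;
# they do not reproduce ICU's Greek-Latin rules or any official standard).
DIGRAPHS = {"ου": "ou"}
LETTERS = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def strip_accents(text):
    """Remove combining marks (tonos, dialytika) via NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def romanize(text):
    """Map Greek letters to Latin ones; other characters pass through."""
    plain = strip_accents(text)
    out = []
    i = 0
    while i < len(plain):
        pair = plain[i:i + 2].lower()
        if pair in DIGRAPHS:
            latin, step = DIGRAPHS[pair], 2
        else:
            ch = plain[i]
            latin, step = LETTERS.get(ch.lower(), ch), 1
        # Carry over capitalisation from the first source character.
        if plain[i].isupper():
            latin = latin.capitalize()
        out.append(latin)
        i += step
    return "".join(out)


print(romanize("γαλάζιο"))   # galazio
print(romanize("Ελληνικά"))  # Ellinika
```

A real tool such as ICU handles many more cases (context-dependent digraphs, all-caps words, punctuation), so this should only be read as a sketch of the general idea.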

Best regards,
Alberto

Alberto Barrón Cedeño (Ph.D.)
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
http://www.lsi.upc.edu/~albarron

On 27/02/13 17:06, Isabella Chiari wrote:
> Dear Corpora list members,
> on behalf of a colleague, I am asking for your help in finding a
> transliteration/romanization tool for Modern Greek texts.
> Is there anything available (free or for purchase)?
> Thank you in advance for your help,
> Isabella
>
>
> On 25 Feb 2013, at 16:23, Gill Philip <g.philip.polidoro at gmail.com> wrote:
>
>> Although it has its critics and its weak points, a pretty good point
>> of reference is Berlin & Kay 1969. Their listing of colour words
>> actually refers to existence in languages: if a language has a "blue"
>> colour term, then it already has black, white, red, green & yellow:
>> no language (in their study) can have, e.g. "pink" if it doesn't
>> already have "blue".
>>
>> Anyway, as a rough guide, their order is (Berlin and Kay 1969: 4)
>> white & black
>> red
>> yellow & green
>> blue
>> brown
>> pink / purple / grey / orange
>>
>> When I looked at colour words in English and Italian, I got these
>> figures (freq. per million)
>>
>> ENGLISH (Bank of English, circa 2003)
>> white (316) & black (294)
>> red (182)
>> green (139), brown (136), blue (122)
>> grey (63)
>> yellow (51)
>> pink (37) & purple (15)
>> orange (35)
>>
>> ITALIAN (CORIS, circa 2003)
>> White (Bianco, 308)
>> Red (Rosso, 267) and Black (Nero, 265)
>> Green (Verde, 176)
>> Blue (=143: Azzurro, 85 plus Blu, 58)
>> Pink (Rosa, 90), Yellow (Giallo, 82), Grey (Grigio, 63)
>> Purple (Viola, 22)
>> Brown (Marrone, 13)
>> Orange (Arancione, 9)
>>
>> They're not an exact match with B&K's sequencing, but you can see the
>> basic principle at work. Black, white and red are clearly more common
>> than the other colours; blue and green are similar in frequency; pink
>> & purple form another group. I should mention, though, that this is a
>> fairly crude measure, and not based on POS-tagged data. There are
>> problems with homographs, e.g. "orange" is also the fruit in English
>> (but not in Italian); Brown is a surname in English (and was the name
>> of the then Chancellor, subsequently Prime Minister, so cropped up
>> disproportionately in the data).
>>
>> This data comes from my long-forgotten PhD dissertation "Collocation
>> and Connotation": I believe it's still hanging around on the web
>> somewhere.
>>
>> hope this helps,
>> Gill
>>
>> On 25 February 2013 14:31, H.A.E Viethen <H.A.E.Viethen at uvt.nl> wrote:
>>
>>     Hi,
>>
>>     we are looking for a way to estimate the relative frequency of colour
>>     terms in different languages, in particular Greek and Dutch. So for
>>     example, we'd like to know how frequent the term 'rood' (red) is in
>>     Dutch compared to the term 'roze' (pink), or how the frequencies of
>>     the terms 'ble' and 'galázio' compare in Greek.
>>
>>     We only need ballpark figures, the kind of thing one might estimate
>>     with hit counts in web searches, although having slightly more
>>     reliable numbers than that would be nice. In any case, many Greek
>>     colour terms are derived from common nouns for objects in the natural
>>     environment and usually even spelled the same. This makes it difficult
>>     to distinguish the use of a word as a colour term from its use as a
>>     common noun.
>>
>>     Does anyone know of a resource (paper, website, anything) that might
>>     readily list relative frequencies for colour terms in Greek and Dutch?
>>     Alternatively, can anyone point us to a POS-tagged corpus of Greek or
>>     Dutch which would be suitable for counting the use of colour terms?
>>
>>     Many thanks,
>>
>>     Jette Viethen
>>     Tilburg University
>>
>>     _______________________________________________
>>     UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>     Corpora mailing list
>>     Corpora at uib.no
>>     http://mailman.uib.no/listinfo/corpora
>>
>>
>>
>>
>> -- 
>> *********************************
>> Dr. Gill Philip
>> Università degli Studi di Macerata
>> Dipartimento di Scienze della Formazione, dei Beni Culturali, e del
>> Turismo
>> Piazzale L. Bertelli
>> Contrada Vallebona
>> 62100 Macerata
>> Italy
