[Corpora-List] Help Regarding Cognates Identification

Tue Sep 28 12:41:59 UTC 2010

I understand the definition of "cognate" to be about the history of words,
not just about similarities in the surface form. There are at least three
ways that words can come to be similar across languages:

1) words have a common ancestry: language A and language B have words that
can be traced back to the same root word in some ancestor language. These
cases can be interesting, like étoile and star, or routine, like night and
nacht. In the interesting one, a systematic process has happened that makes
the letter before the t turn up as e-acute in French but s in English.

2) language A borrows the word from language B.

3) an accident happens. The words in language A and language B look the
same, but there is no common ancestry and no borrowing.

Most people call pairs that fit under case (1) cognates, and the other two
"false cognates". It is a very interesting problem to write programs to
detect and take advantage of systematic sound correspondences like the
star/étoile thing. Kondrak has worked on this extensively.
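
As a rough illustration of what detecting systematic correspondences can
mean, the sketch below aligns a few English/German pairs letter by letter
with plain edit distance and counts which mismatched letters recur. It is
not Kondrak's method, and it works on spelling rather than phonetic
transcription. The word pairs other than night/nacht, and the names `align`
and `correspondences`, are assumptions made for this example.

```python
from collections import Counter


def align(a: str, b: str) -> list[tuple[str, str]]:
    """Return one minimum-edit alignment as (char_a, char_b) pairs.

    A '-' on either side marks a gap (insertion or deletion).
    """
    n, m = len(a), len(b)
    # dist[i][j] = unit-cost edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete a[i-1]
                dist[i][j - 1] + 1,        # insert b[j-1]
                dist[i - 1][j - 1] + sub,  # match or substitute
            )

    # Trace back from the bottom-right corner, preferring diagonal moves.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def correspondences(word_pairs):
    """Count aligned letter pairs that differ, summed over all word pairs."""
    counts = Counter()
    for a, b in word_pairs:
        for x, y in align(a, b):
            if x != y:
                counts[(x, y)] += 1
    return counts


# Illustrative English/German pairs; only night/nacht comes from the message.
PAIRS = [("night", "nacht"), ("light", "licht"),
         ("right", "recht"), ("sight", "sicht")]

counts = correspondences(PAIRS)
for (x, y), k in counts.most_common():
    print(f"{x} ~ {y}: {k}")
```

On these four pairs the g ~ c correspondence recurs in every word, while the
vowel mismatches each occur once. A recurring mismatch of this kind is the
sort of signal a real system would look for, using phonetic transcriptions
and far more data.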

There was a good workshop on computational approaches to this stuff at ACL
2007:

@InProceedings{nerbonne-ellison-kondrak:2007:CompHistPhon,
  author    = {Nerbonne, John  and  Ellison, T. Mark  and  Kondrak, Grzegorz},
  title     = {Computing and Historical Phonology},
  booktitle = {Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology},
  month     = {June},
  year      = {2007},
  address   = {Prague, Czech Republic},
  publisher = {Association for Computational Linguistics},
  pages     = {1--5},
  url       = {http://www.aclweb.org/anthology/W/W07/W07-1301}
}

Personally, I wouldn't want to call borrowings cognates, and I would tend to
see references to named entities as similar to borrowings, because very
often the borrowed word is unchanged, or changed only as much as necessary
to make it minimally acceptable phonologically. So the Japanese word
"sekkusu" is just the way the language borrows the English word "sex", not
the trace of some historically exciting process.
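
To put rough numbers on the gap between surface similarity and shared
history, here is a small sketch that scores the three example pairs from
this message with normalized edit distance. The function names and the
choice of measure are assumptions for illustration only; nothing here
detects cognates.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def surface_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when nothing lines up."""
    # Normalize so that an accented letter such as e-acute is one character.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# The relation labels follow the message's own description of each pair.
EXAMPLES = [
    ("night", "nacht", "common ancestry"),
    ("étoile", "star", "common ancestry"),
    ("sex", "sekkusu", "borrowing"),
]

if __name__ == "__main__":
    for a, b, relation in EXAMPLES:
        print(f"{a:>7} / {b:<8} {relation:<16} {surface_similarity(a, b):.2f}")
```

On this measure night/nacht scores 0.6, while étoile/star, which share an
ancestor, score lower than sex/sekkusu, which is a borrowing. String
similarity alone does not track the history of the words.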

On Tue, Sep 28, 2010 at 1:23 AM, Padmini priyadharsini <padminipriyadharsini at gmail.com> wrote:

> Hi All,
>
> Kindly let me know the available tools and techniques used for
> cognate identification.
>
> I will be summarizing all the replies to the list.
>
> Thanks,
> Padmini
>
> --
> Life is beautiful and enjoy its simplicity :)
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

-- 
Chris Brew, Ohio State University