Corpora: language boundaries + code switching

D C Souter cs at scs.leeds.ac.uk
Thu Jul 20 10:08:50 UTC 2000


There were only a few responses to my query about language boundary
identification, but here's a summary:

Clive

-----------------------------------------------------------------

1. David Elworthy, Microsoft:
From davidelw at microsoft.com Tue Jul  4 13:12:02 2000

I have an MPhil student who is working on this issue, so if you can wait a
couple of months until he has finished his project, he may be able to help.
Essentially he is taking an idea I presented for language identification,
which appeared at the 6th VLC workshop in 1998. In this, you compute a
confidence range for each language you know about after each word of the
input and wait until one range diverges from all the others. The approach my
student is trying is to see whether the ranges move back closer together
again, indicating that the evidence supporting the choice of language is
becoming weaker.
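
A minimal sketch of the idea described above, in Python. It is not the
method from the workshop paper: the toy word-probability models, the FLOOR
and Z constants, and the choice of "mean log-probability plus or minus Z
standard errors" as the confidence range are all assumptions made for
illustration. A language is chosen once its range lies wholly above every
other range; a possible boundary is flagged when the ranges overlap again.

```python
import math

# Toy per-language word probabilities (hypothetical; a real system would
# estimate these from training corpora).
MODELS = {
    "en": {"the": 0.3, "cat": 0.2, "is": 0.2, "on": 0.2, "mat": 0.1},
    "de": {"die": 0.3, "katze": 0.2, "ist": 0.2, "auf": 0.2, "matte": 0.1},
}
FLOOR = 1e-4  # probability given to words a model has not seen (assumed)
Z = 1.0       # half-width of the range, in standard errors (assumed)


def interval(scores):
    """Return (low, high): mean log-probability plus/minus Z standard errors."""
    n = len(scores)
    if n < 2:
        return (-math.inf, math.inf)  # one word gives no usable range
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    half = Z * math.sqrt(var / n)
    return (mean - half, mean + half)


def leader(scores_by_lang):
    """Return the language whose range lies wholly above all others, or None."""
    ranges = {lang: interval(s) for lang, s in scores_by_lang.items()}
    best = max(ranges, key=lambda lang: ranges[lang][0])
    others_high = max(hi for lang, (lo, hi) in ranges.items() if lang != best)
    return best if ranges[best][0] > others_high else None


def find_boundaries(words):
    """Return (labels, boundaries) for a list of lower-cased words.

    labels[i] is the language chosen after word i (or None while undecided);
    boundaries holds the positions where a chosen language lost its lead.
    """
    scores = {lang: [] for lang in MODELS}
    decided = None
    labels, boundaries = [], []
    for i, word in enumerate(words):
        for lang, model in MODELS.items():
            scores[lang].append(math.log(model.get(word, FLOOR)))
        current = leader(scores)
        if decided is not None and current != decided:
            # The ranges overlap again (or another language now leads):
            # flag a possible boundary and restart from the current word.
            boundaries.append(i)
            scores = {lang: [s[-1]] for lang, s in scores.items()}
            decided = None
            current = None
        elif current is not None:
            decided = current
        labels.append(current)
    return labels, boundaries


if __name__ == "__main__":
    text = "the cat is on the mat die katze ist auf die matte"
    print(find_boundaries(text.split()))
```

Because the evidence for the old language has to be outweighed before the
ranges overlap, the flagged position trails the true switch point; a real
system would need to back up from the flag to locate the boundary itself.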

-- David

------------------------------------------------------------------

2. Simon Arnfield, Reading University:
From llsarnf at reading.ac.uk Tue Jul  4 14:17:31 2000

Hi Clive,
I don't know if this is what you're after - I just know someone who worked
on it. Don't even know if data is available. Anyway here it is:

The LIDES Database - Language Interaction Data Exchange System
Building a database of codeswitching and language interaction data
http://www.ling.lancs.ac.uk/staff/ruthanna/lipps/lipps.htm

Simon

-------------------------------------------------------------------

3. Gregor Erbach, Vienna Telecommunications Research Centre:
From erbach at ftw.at Tue Jul  4 19:25:55 2000

Clive,
at what level would you want the language boundaries to be?
Paragraph, sentence, word, phrase, morph?

In German, we have English loan words with German derivational
and inflectional suffixes, like "downloaden", "gedownloadet"
etc.
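
To make the morph-level case concrete, here is a small Python sketch that
splits such a word into a German prefix, an English stem and a German
suffix. The stem list and affix lists are hypothetical and far from
complete; the point is only that a single word can contain a language
boundary.

```python
# Hypothetical mini-lexicon of English stems borrowed into German.
ENGLISH_STEMS = {"download", "upload", "chat", "scan"}
# A few German verbal endings, tried in this order (simplified).
GERMAN_SUFFIXES = ("en", "et", "st", "e", "t")
GERMAN_PREFIX = "ge"  # past-participle prefix


def split_loanword(word):
    """Return (prefix, stem, suffix) if the word is built on a known
    English stem, optionally with German affixes; otherwise None."""
    word = word.lower()
    for prefix in (GERMAN_PREFIX, ""):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in GERMAN_SUFFIXES + ("",):
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)]
            if stem in ENGLISH_STEMS:
                return (prefix, stem, suffix)
    return None


if __name__ == "__main__":
    for w in ("downloaden", "gedownloadet", "gehen"):
        print(w, split_loanword(w))
```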

best,
  Gregor

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Gregor Erbach                               gregor.erbach at ftw.at
Forschungszentrum Telekommunikation Wien FTW   Tel. +43/699-10389005
(Vienna Telecommunications Research Centre)    Fax: +43/1/5052830-99
Maderstrasse 1/9, A-1040 Wien, Austria     http://speech.ftw.at/~gor

-------------------------------------------------------------------

4. Petek Kurtboke:
From PtKur at netscape.net Wed Jul  5 00:37:21 2000

Clive, you're into what 5 generations of language contact researchers haven't
been able to answer! Good luck and have a look at 'A corpus-driven study of
Turkish-English language contact in Australia' at
<http://www.vicnet.net.au/~petek/thesis/> for a window on what the 6th
generation should be doing!

Cheers
Petek

--------------------------------------------------------------------


