[Corpora-List] Linguistics, corpus linguistics, and diglossia
Mike Maxwell
maxwell at umiacs.umd.edu
Wed Dec 15 23:40:25 UTC 2010
I was talking this afternoon with a lexicographer who is working on
western Panjabi (the variety--or varieties--spoken in Pakistan, and
written in a Perso-Arabic script). She was saying that corpus
linguistics was exactly the wrong way to build a dictionary of
colloquial Panjabi, because of a somewhat diglossic situation: the
written/standardized language is not what most people speak.
There are of course many diglossic situations around the world,
particularly where a single "language" has been written
for centuries or millennia. I put "language" in scare quotes because of
course all languages will have changed over that period of time, to the
point of non-mutual intelligibility (if you can find any 2000 year old
speakers :-)).
At any rate, this certainly matters if you're trying to do
dictionaries--or any other study of the spoken or colloquial language,
or non-standard dialects. I don't recall seeing much discussion of the
issues of doing corpus linguistics in diglossic languages, the following
being one exception:
@article{fonseca2003radical,
title={{On the radical difference between the subject personal
pronouns in written and spoken European French}},
author={Fonseca-Greber, B. and Waugh, L.R.},
journal={Language and Computers},
volume={46},
number={1},
pages={225--240},
issn={0921-5034},
year={2003},
publisher={Rodopi}
}
They resort to some small corpora of transcribed spoken French, and
remark that they know about some usages that are not attested in these
corpora.
--
Mike Maxwell
maxwell at umiacs.umd.edu
"A library is the best possible imitation, by human beings,
of a divine mind, where the whole universe is viewed and
understood at the same time... we have invented libraries
because we know that we do not have divine powers, but we
try to do our best to imitate them." --Umberto Eco