[Corpora-List] Linguistics, corpus linguistics, and diglossia

Mike Maxwell maxwell at umiacs.umd.edu
Wed Dec 15 23:40:25 UTC 2010


I was talking this afternoon with a lexicographer who is working on 
western Panjabi (the variety--or varieties--spoken in Pakistan, and 
written in a Perso-Arabic script).  She was saying that corpus 
linguistics was exactly the wrong way to build a dictionary of 
colloquial Panjabi, because of a somewhat diglossic situation: the 
written/standardized language is not what most people speak.

Diglossia is of course common around the world, particularly where a 
single "language" has been written for centuries or millennia.  I put 
"language" in scare quotes because of course any language will have 
changed over that period of time, to the point of non-mutual 
intelligibility (if you can find any 2000-year-old speakers :-)).

At any rate, this certainly matters if you're trying to build 
dictionaries--or to do any other study of the spoken or colloquial 
language, or of non-standard dialects.  I don't recall seeing much 
discussion of the problems of doing corpus linguistics on diglossic 
languages; the following is one exception:
@article{fonseca2003radical,
   title={{On the radical difference between the subject personal pronouns in written and spoken European French}},
   author={Fonseca-Greber, B. and Waugh, L.R.},
   journal={Language and Computers},
   volume={46},
   number={1},
   pages={225--240},
   issn={0921-5034},
   year={2003},
   publisher={Rodopi}
}
They resort to some small corpora of transcribed spoken French, and 
remark that they know about some usages that are not attested in these 
corpora.
-- 
	Mike Maxwell
	maxwell at umiacs.umd.edu
         "A library is the best possible imitation, by human beings,
         of a divine mind, where the whole universe is viewed and
         understood at the same time... we have invented libraries
         because we know that we do not have divine powers, but we
         try to do our best to imitate them." --Umberto Eco
