From HubeyH at Mail.Montclair.edu Sat Nov 27 17:27:28 1999
From: HubeyH at Mail.Montclair.edu (H. Mark Hubey)
Date: Sat, 27 Nov 1999 12:27:28 -0500
Subject: [language] What exactly are allophones?
Message-ID:

<><><><><><><><><><><><>--This is the Language List--<><><><><><><><><><><><><>

I sent this to the author. It is just as well putting it here.

I have followed the exchange on phones, allophones and phonemes with interest. Of course, I have to disagree with almost everyone :-). (Just kidding.) The reason is that there are at least three separate issues:

(1) the "theoretical", or what "ought-to-be" (deontology);
(2) what really exists in various forms;
(3) some kind of distillation of (2), which we may call the "practical", or "what-is" (ontology).

What I see in the literature, ranging from acoustics to phonology and from articulatory phonetics to phonemics, is the following.

1. We have some acoustic signal which we can record electrically and display. For concreteness, let us call this some kind of an <i>. (I purposely do not use [.] or /./.) We may record other samples of this, so this is a stochastic process and it should be treated as such. Here there is no reason to invent a new name; instead we should use the words that already exist. This would be called a sample function, first because it is a sample (as noted above) and second because it is a function of time. It should be noted that it should also carry an identifying subscript, so that it is written something like <i>_n, because there will be many of them. So <i>_n is the nth sample function, which is a real acoustic signal, not a set. As it stands this signal is practically useless. We compute some derived statistics from it. After massaging the signal in this way, we may think of putting this sample function into some set. It is here that problems begin.

2. Suppose we extract N statistics from this signal.
Like other signals of this type, this sample function will then be representable as a point in N-dimensional space. A cloud of such points in this N-space will occupy a hypervolume; equivalently, we may think of it as a set of points in this N-space. We can then think of cutting up this hyperspace into hypervolumes according to some rules. It is here that allophones, phones, and phonemes start to get confused. Some authors would immediately write this and refer to it as "phone [i]". What are they referring to? Here we have two choices: (a) how it should be done and (b) how it is done.

We can generate (or pretend that we have generated) a great many of these sample functions, have a great many people listen to them, and ask them to categorize them somehow into groups. But they will categorize them according to the "sounds" of their language. Suppose we pretend that we have conducted such an experiment, in which hundreds of thousands of people speaking thousands of languages have listened to these sounds and categorized them (according to the "sounds" of their languages, which is naturally what we'd expect). Then we can cut up this hypervolume into smaller hypervolumes in many different ways, at least as many ways as there are languages. These hypervolume divisions are then (more or less) the phonemes of the various languages. In practice it might be slightly more complicated.

We cannot do this and have not done it. Instead we pretend that the outcome of such an experiment, were it attempted, has already been figured out by people who are presumably experienced and competent to do this. So the division is really made by linguists, some of whom have conducted experiments, some of whom have read of such experiments, some of whom know a little about acoustics, some a little about articulation, etc. These pass as the equivalent of this hypothetical experiment.
Even if we were to perform such an experiment, the judgements of the subjects would be colored by their education. So these would be denoted by a symbol like /i/_n. (The n is a subscript; to be clear and exact we would have to specify which language it is a phoneme of. In practice the subscript is implicit most of the time.)

3. We have yet another way to divide up this hyperspace. We can divide it up into smaller pieces such that any of the language-specific hyperspaces could be expressed in terms of the union (i.e. the set-theoretic union, which will be like a concatenation of these spaces) of these smaller hyperspaces. In principle, then, we would never have to create any smaller objects than these. We should denote these as [i]. Note that both [i] and /i/ refer to a whole set of phones. But in many books the word "phone" refers both to a set like [i] and to a specific realization (i.e. a sample function), which should be denoted by <i>_n, tagged with a subscript because there are many of these. Since this division is not made with respect to any language (theoretically at least), these are some kind of absolute quantities (not relative). It should then be possible to describe any phoneme of any language in terms of unions of these phones.

Now when someone says "allophone", which phone is meant? If the phone is meant in the sense of [i], then it means something different than if it is meant in the sense of <i>_n (please note the subscript, i.e. <i>_n is not a set but a sample realization, a token or instantiation in the terminology of computer science). The literature has both meanings, but usually what is meant (presumably) is that these allophones are really different phones in the sense of [.], not <.>. In other words (as explained in 3), a phoneme /&/ in some language might actually consist of the concatenation of the hyperspaces for two phones like [&] and [#]. This is how some authors and many linguists mean it.
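To make the set picture concrete, here is a minimal sketch in Python of the three levels: a token (a sample function reduced to N statistics), a phone in the sense of [.] (a language-independent cell of the N-space), and a phoneme in the sense of /./ (a language-specific union of such cells). Everything in it is an assumption made up for illustration: N = 2, the centroid coordinates, the nearest-centroid rule used to carve out the cells, the two languages "A" and "B", and the function names. None of it comes from any real data set or from the literature.

```python
import math

# Hypothetical phone cells [.], each summarized by a centroid in a
# 2-statistic space (N = 2). The coordinates are invented.
PHONE_CENTROIDS = {
    "[i]": (0.3, 2.3),
    "[&]": (0.7, 1.7),
    "[#]": (0.6, 1.1),
}

# Hypothetical languages: each phoneme /./ is a union (here, a set) of
# phone cells. Language A merges [&] and [#]; language B keeps them apart.
PHONEMES = {
    "A": {"/i/": {"[i]"}, "/&/": {"[&]", "[#]"}},
    "B": {"/i/": {"[i]"}, "/&/": {"[&]"}, "/#/": {"[#]"}},
}


def phone_of(token):
    """Assign one token (a point in N-space) to the nearest phone cell."""
    return min(
        PHONE_CENTROIDS,
        key=lambda phone: math.dist(token, PHONE_CENTROIDS[phone]),
    )


def phoneme_of(token, language):
    """Find the phoneme of `language` whose union of cells contains the token."""
    phone = phone_of(token)
    for phoneme, phones in PHONEMES[language].items():
        if phone in phones:
            return phoneme
    raise ValueError(f"{phone} belongs to no phoneme of language {language}")


def allophones(phoneme, language):
    """Allophones in the [.] sense: the phone cells making up one phoneme."""
    return sorted(PHONEMES[language][phoneme])


# Two hypothetical tokens, i.e. two sample functions after feature extraction.
token_1 = (0.68, 1.60)
token_2 = (0.62, 1.15)

for language in ("A", "B"):
    print(language, phoneme_of(token_1, language), phoneme_of(token_2, language))
print(allophones("/&/", "A"))
```

In this sketch the two tokens fall into different phone cells, yet language A files both under /&/ while language B separates them. The call `allophones("/&/", "A")` returns cells, not tokens, which corresponds to the reading of "allophone" in the sense of [.] rather than of individual sample functions.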
Here one starts getting into difficulties about what a phoneme is or should be.

--
Sincerely,
M. Hubey
hubeyh at mail.montclair.edu
http://www.csam.montclair.edu/~hubey

---<><><><><><><><><><><><><><>----Language----<><><><><><><><><><><><><><><>
Copyrights and "Fair Use": http://www.templetions.com/brad//copyright.html

"This means that if you are doing things like comment on a copyrighted work, making fun of it, teaching about it or researching it, you can make some limited use of the work without permission. For example you can quote excerpts to show how poor the writing quality is. You can teach a course about T.S. Eliot and quote lines from his poems to the class to do so. Some people think fair use is a wholesale licence to copy if you don't charge or if you are in education, and it isn't. If you want to republish other stuff without permission and think you have a fair use defence, you should read the more detailed discussions of the subject you will find through the links above."

You are currently subscribed to language as: language at listserv.linguistlist.org
To unsubscribe send a blank email to leave-language-4283Y at csam-lists.montclair.edu