Dear Benjamin,

Thank you for your detailed explanation of Khalkha Mongolian.
As you pointed out, it will be impossible to make a fine-grained distinction between newer and older xenophones in lesser-studied languages.
Most grammars I consult simply say that “this phoneme appears in loanwords” or the like, without mentioning when they were introduced, how “nativized” they are, and so on.
So for the database, for the sake of consistency, I only aim to make a binary distinction between what appears in native words and what only appears in loanwords.
Ideally, a multi-layered approach like the one you mentioned would be much better, but it would not be realistic to do so with 700+ languages.
So it will remain one of the limitations of my database.

Dear Ian,

Interesting question. With Khalkha Mongolian phonology, the distinction usually starts one step earlier:

1. regular words
2. ideophones (including /kʰ/ and /pʰ/)
3. xenophones (including /f/, /ɬ/)

When dealing with Khalkha Mongolian phonology, category 2 is quite important, since it also means that these phones are realized more consistently when they occur in loanwords. The inclusion of /f/, on the other hand, greatly depends on a speaker's command of foreign languages and/or social status. For most of my own social contacts, I would want to exclude it, since they articulate it as [pʰ]. On the other hand, there is also /ɬ/, which is restricted to a small number of old Tibetan loanwords but is produced by all speakers, and one also occasionally encounters devoiced /ɮ/ as [ɬ]. There's not enough research on phonetics to be more specific about this last point. But if I were to present a general phoneme system, I would tend to include /ɬ/ but exclude /f/.

Not to mention word-initial /ɮ/, which a historical linguist would take as a xenophone, but which is perfectly normal synchronically: native speakers are no longer aware that words starting with /ɮ/ are all loans. I suspect this would be different for word-initial /w/, which tends to feature in quite transparent loans and which, at least in some of these conventionalized old loans, is consistently realized as [w], though the adaptation of foreign [w] as Mongolian /p/ can probably still be observed here and there. I don't think I can understand this point properly without actually compiling loanword data from my own materials and reviewing it. The rule preventing word-initial /r/ (usually through epenthetic vowels) is still largely in place, though in some contexts actually articulating word-initial /r/ might be a socially stratifying feature. There is not enough research on socio-phonology (but Sender Dovchin touches on this topic here and there in her research and should perhaps be taken into account).

As long as we try to abstract to a general phonemic system, I would perhaps want to include all phonemes that are by and large common to the community, thus /kʰ/, /pʰ/ and /ɬ/. But it should still be made clear that these have a special status. At the same time, I would want to exclude /f/, though there would be parts of society for which it might have to be included. Besides groups defined by their education, it would probably also be part of the phoneme inventory of all Khalkha speakers in China, since Mandarin /f/ is omnipresent there and even people with a low command of Mandarin would probably be able to use it (though I don't have actual empirical data showing that this guess is correct).

The issue is: since you are dealing with 700 languages, how do you intend to make competent assessments of the kind I just gave for Khalkha? A good database would probably contain nuanced explanations of all peripheral phonemes, possibly along with pre-made categorizations (at least "old xenophone" and "new xenophone"), so that the database user could make decisions based on her needs. For Khalkha, most (though not all) of what I just wrote is present in Svantesson et al.'s "Phonology of Mongolian", but I doubt that you have as much detailed info on the majority of your languages. Still, being as detailed as possible on such points and then letting the user decide which categories to include and exclude is probably what I would advise.
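
To illustrate the kind of categorization suggested above, here is a minimal sketch in Python. It is only one possible layout, not a description of any existing database: the names `Status`, `Phoneme`, `KHALKHA_PERIPHERAL` and `select` are hypothetical, and labelling /f/ a "new xenophone" is an assumption made for the example (the notes are taken from the discussion above).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical status labels for phonemes in one language."""
    NATIVE = "native"
    IDEOPHONE = "ideophone"
    OLD_XENOPHONE = "old xenophone"
    NEW_XENOPHONE = "new xenophone"


@dataclass(frozen=True)
class Phoneme:
    symbol: str
    status: Status
    note: str = ""  # free-text explanation shown to the database user


# Peripheral phonemes of Khalkha as discussed in this thread.
# Assumption: /f/ is filed under NEW_XENOPHONE purely for illustration.
KHALKHA_PERIPHERAL = [
    Phoneme("kʰ", Status.IDEOPHONE, "occurs in ideophones"),
    Phoneme("pʰ", Status.IDEOPHONE, "occurs in ideophones"),
    Phoneme("ɬ", Status.OLD_XENOPHONE,
            "few old Tibetan loanwords; produced by all speakers"),
    Phoneme("f", Status.NEW_XENOPHONE,
            "depends on foreign-language command and social status"),
]


def select(phonemes, include):
    """Return the symbols whose status is in the user's chosen set."""
    return [p.symbol for p in phonemes if p.status in include]


# A user who wants community-wide phonemes but not /f/:
print(select(KHALKHA_PERIPHERAL, {Status.IDEOPHONE, Status.OLD_XENOPHONE}))
```

With such a layout, a binary native/loanword view remains available by collapsing the two xenophone labels, while users who need the finer distinction can still filter on it.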

