ELL: definitions for various terms re: minority languages
mgunn at ucd.ie
Tue Mar 16 18:35:29 UTC 1999
Date: Tue, 16 Mar 1999 18:35:29 +0000
From: Marion Gunn <mgunn at ucd.ie>
Organization: ucd <http://listserv.heanet.ie/>
To: endangered-languages-l at carmen.murdoch.edu.au
Subject: Re: ELL: definitions for various terms re: minority languages
Sorry for sending your entire msg (below) back to the list, Jeff, but as
many terms as poss. should be listed together, and you have missed out a
Because there is so little agreement on the proper terms, it is clear
that many are deeply disliked. Most-hated term among European natives
right now is that once-fashionable label LUL (Lesser-Used Languages) --
which in EU official circles is now read as no more than an extended
anagram of USELESS (to be treated accordingly).
SCL (Smaller Community Languages), SLC (Smaller Linguistic Communities)
and LCL (Less Common Languages) are all terms which are much better
liked, but much less used, if you will pardon that rather pathetic
With best wishes,
Jeff ALLEN wrote:
> At 09:57 15/03/99 -0500, Tom Tehan <tsc_msea at SIL.ORG> wrote:
> > I thought I would submit my question to the whole e-list for
> > discussion because I think many subscribers would have interesting
> > thoughts to add. The question has to do with what is a minority
> > language/group, a vernacular language, a threatened language/group, or
> > an endangered language.
> The following terms are those that are used quite a bit right now, with my
> definitions. I acknowledge that my definitions may not suit the needs of
> everyone on this list, but they do allow me to distinguish between the
> different sociolinguistic factors at work in my research.
> * high-density languages: languages for which there are abundant on-line
> electronic resources/data. (For ex. English, French)
> * low-density languages: languages for which there are very few, if
> any, on-line electronic resources/data.
> (the density terminology tends to be used by the military)
> * sparse-data languages: equivalent to definition of low-density, but it is
> * less(er)-common(ly) taught languages: all languages other than French,
> English, Spanish and German.
> * lesser-used languages: takes into account the 1st and 2nd (3rd, 4th)
> languages that are written and spoken in the world to indicate what
> languages are used to what extent by how many people. The lesser-used are
> those that rank lower on that scale in comparison with those that
> are most-used.
> * low-diffusion languages: (I just came across the term the other day and
> still haven't worked on a definition for it).
> * vernacular languages: this tends to be used in sociolinguistic circles to
> refer to traditional oral languages.
> * high language: In Ferguson terms, the politically and culturally dominant
> language in a diglossic situation.
> * low language: In Ferguson terms, the politically and culturally
> subordinate language in a diglossic situation.
> * neglected languages: those languages that could be developed in some way,
> but there are political, financial, economic, etc. factors that are blocking
> the advancement of such development work.
> * endangered languages: languages that risk disappearing in the coming
> * official language: well this is clear.
> * national language: any language that is spoken by a significant part of
> the population. This is a difficult term to quantify.
> * language vs. dialect vs. patois
> (It all depends on your framework of thinking. The definition of
> language and dialect is completely different if you are a theoretical
> linguist, a dialectologist, a socio- or ethnolinguist, or simply a
> non-linguistics oriented person. I have learned to define my terminology
> with respect to my audience. When I change audiences, I often have to
> modify my definitions to adapt to their way of viewing the world).
> And here are some excerpts from various recent papers (as you can tell, I
> have a very good electronic collection of materials).
> Taken from my recent paper:
> ALLEN, Jeffrey. 1998a. Lexical variation in Haitian Creole and orthographic
> issues for Machine Translation (MT) and Optical Character Recognition (OCR)
> applications. Paper presented at the workshop on Embedded MT systems of the
> Association for Machine Translation in the Americas (AMTA) conference,
> Philadelphia, 28 October 1998.
> In this paper, several sociolinguistic and psycholinguistic variables
> pertinent to an adequate linguistic analysis of Haitian Creole are
> presented in order to resolve issues in the development of natural language
> processing (NLP) systems -- including machine translation, speech
> recognition and optical character recognition -- for this language.
> Consideration is taken with regard to the standard vs. non-standard status
> of the language being analyzed for NLP system development. Such
> extra-linguistic factors in 'vernacular' languages (e.g., Haitian Creole)
> must be evaluated in order to sufficiently provide processing techniques in
> systems for issues of linguistic variation that permeate the entire lexicon
> of such languages.
> Section 1. Standard vs. Vernacular Languages
> The Croatian language is an example of both a) a "low-density" language --
> i.e., language with little or no accessible on-line data -- and b) a less
> commonly taught language. Haitian Creole (henceforth HC) is a + b, yet
> it presents an additional set of issues because it is also c) a
> "vernacular" language that is in the process of standardization and
> normalization. A vernacular language is defined as an "'everyday spoken
> language or languages of a community, as contrasted with a standard or
> official language' -- generally, a 'Low' as opposed to a 'High' variety in
> Ferguson's (1959) terms" (Tabouret-Keller et al. 1997, p. 6).
>  References on Less Commonly Taught Languages (LCTLs):
>  HC is an exception in the general definition of vernaculars because
> the language was officialized and given equal status with French in 1986.
> Despite this decree, literacy and education in HC in Haiti are very limited.
> Only one higher education institution (Université Caraïbe) offers classes
> taught in HC. HC therefore continues to reflect the status of other Creole
> and vernacular languages.
> Taken from:
> LENZO, Kevin, HOGAN, Christopher, and Jeffrey ALLEN. 1998.
> Rapid-Deployment Text-to-Speech in the DIPLOMAT System. Poster presented
> at the International Conference on Spoken Language Processing. 30 November
> - 4 December 1998, Sydney, Australia.
> Section on data collection:
> Collection -- The difficulty of text corpus collection varies by language.
> For the case of Korean, the collection of texts is a straightforward
> process, since information written in Korean is abundant and available from
> current resources on the Internet. For this case, texts were obtained from
> Internet broadcasting sources and the selected material did not pose any
> significant difficulty for Korean speakers.
> The task is significantly more difficult for languages that are not widely
> taught, such as Haitian Creole (Allen and Hogan, 1998, Decrozant and Voss,
> 1998), because they are "low-density" languages, and there are few
> available documents in electronic form. Finding electronic texts written in
> Creole required about five months of part-time research on the Internet, in
> addition to contacting dozens of non-governmental organizations and
> literacy institutes worldwide that eventually provided electronic versions
> of their texts.
> It is possible to scan and correct texts from paper documents, but our
> experience for Croatian and Haitian Creole was similar to that of
> (Decrozant and Voss, 1998) in that current OCR software packages provide
> poor recognition accuracy on less commonly taught languages for which
> customized character recognition has not been specifically developed. Our
> Creole corpus includes all types of text (e.g., novels, political speeches,
> language learning books, literacy primers, religious texts, etc.) that have
> been collected from all available resources whereas the Korean corpus
> remains in domain with abundant amounts of text.
> Taken from:
> Decrozant, Lisa and Clare Voss. 1999. In ELRA Newsletter. Vol 4 issue 1;
> January 1999, Paris: European Language Resources Association. pp. 10-11.
> As researchers tasked with evaluating machine translation (MT) tools
> for military linguists in the field, we must often work with "less commonly
> taught languages" (LCTLs) for which little readily available on-line text
> exists. While many linguistic resources needed for MT evaluation are
> commonly found in electronic form for the major languages of commerce
> (English, French, Japanese, etc.), this is typically not the case for LCTLs.
> In this brief note, we describe our recent effort transforming
> hardcopy parallel, sentence-aligned text into on-line form.
> The LCTL we discuss here is thus a "low-density" or a "low-diffusion"
> language, in that few linguistic resources are available on-line.
> My colleague Christopher Hogan chose the term "minority language" for
> Haitian Creole in a recent paper:
> Hogan, Christopher (1998) Embedded Spelling Correction for OCR with an
> Application to Minority Languages. Paper presented at Workshop on Embedded
> MT Systems, in conjunction with the AMTA 98 conference. 28 October 1998,
> Langhorne, Pennsylvania.
> I tend to agree more with the terminology in the following two articles:
> SOMERS, Harold. Language Resources and Minority Languages. In Language
> Today. Number 5, 1998. Nottingham, UK: Language Publications
> Ltd. pp. 20-24.
> Paul Baker, Tony McEnery, Mark Sebba, Lou Burnard. Minority Language
> Engineering. In ELRA Newsletter. Vol.3, Number 4, November 1998
> where Minority languages tend to be multiple in a country where there is a
> different majority official language. In the UK, English is the official,
> majority language, but there are many, many minority languages (East and
> West Indian, Mid-East, etc).
> This message will certainly stir up some discussion on the topic.
> Jeff ALLEN - Directeur Technique
> European Language Resources Association (ELRA) &
> European Language Resources Distribution Agency (ELDA)
> (Agence Européenne de Distribution des Ressources Linguistiques)
> 55, rue Brillat-Savarin
> 75013 Paris FRANCE
> Tel: (+33) (0) 18.104.22.168.33 - Fax: (+33) (0) 22.214.171.124.30
> mailto:jeff at elda.fr
> Endangered-Languages-L Forum: endangered-languages-l at carmen.murdoch.edu.au
> Web pages http://carmen.murdoch.edu.au/lists/endangered-languages-l/
> Subscribe/unsubscribe and other commands: majordomo at carmen.murdoch.edu.au