Excluding Basque data
    roslyn frank 
    roz-frank at uiowa.edu
       
    Sun Mar  5 00:07:59 UTC 2000
    
    
  
Since the topic of criteria for inclusion and exclusion of Basque data has
surfaced again on the list, perhaps the following information will be of
interest.
Larry Trask Fri. Oct. 15 1999 writes:
ECOLING at aol.com writes:
>Concerning Larry Trask's list of criteria for potential candidates
>for early Basque vocabulary lists:
> [on the choice of cut-off date]
[LA]
>>The details in the paragraph above suggest to me OBVIOUSLY
>>if you want early Basque, you use 1700 in preference to 1600,
>>because the 16th-century materials are so limited in content.
>>It is always possible to study any differences between 16th and 17th century
>>equivalent grammatical morphemes, forms of the same words, etc.,
>>where those are attested in both centuries, but obviously much
>>non-religious vocabulary will be systematically disfavored by the earlier
>>cut-off date.
[LT]
>Well, I've already explained that I am prepared to consider 1700 rather than
>1600. But I don't think the choice is obvious. My impression at this stage -
>which might prove to be wrong, of course - is that most of the words that meet
>my other criteria are already attested by 1600, and so, if possible, I'd
>prefer to use the more restrictive early date.
I've prepared the following background summary in order to aid those
members of the list who might be somewhat unfamiliar with the source
materials for Basque. Hopefully, the summary also will bring into focus
some of the problems that inevitably arise when one attempts to choose a
"more restrictive early date" for the cut-off, whether that be set at 1600,
1700 or even 1800.
1) There is essentially no Medieval Basque 'literature' of any kind. Hence,
by 1500 what we have accumulated are some epitaphs, a few song fragments
(verses taken from the oral traditions) stuck into things written in
Romance, place names, proper names, e.g., in legal documents, and a couple
of very short word lists compiled by non-Basque speakers. That's it.
Whoops, I left out the marginal notes in the _Glosas Emilienses_ and a
word or short phrase here and there inserted into poetic works written in
Romance, e.g., in the poetry of Gonzalo de Berceo. He was a priest from La
Rioja who wrote in Castilian. Although he might not have been a
Basque-speaker himself, he apparently was at least passingly familiar with
the language. He was known by the name of the village where he was born,
Berceo (in La Rioja), while he received his education at the Monastery of
San Millan in Araba. This was a zone that we can assume was bilingual at
that time (13th century) since it was located near the Castilian-Basque
linguistic border. Curiously, as was the case with the author of the first
example of Castilian, i.e., the _Glosas Emilienses_, the first medieval
poet writing in Castilian whose name we know, also seems to have had some
knowledge of Basque. For example, as Larry Trask mentions in his book
(1997:45), Berceo "spattered his poems with vasconisms, such as <don
Bildur> (a personification of <bildur> 'fear'), <zatico> (a derivative of
western <zati> 'period of time') and <gabe> 'without'."  In contrast to the
author of the _Glosas Emilienses_ who was clearly a Basque speaker, it is
more difficult to determine the level of fluency that Berceo had in the
language.
2) The Renaissance legacy.
a) The first book in Euskera: Poems.
It wasn't until 1545 that the first book in Euskera was published
consisting of a short collection of poems, many of which are dedicated to
the Virgin Mary (and to women in general). And then there is one poem
dealing with the writer's personal problems with certain ecclesiastical and
civil authorities. In addition to the poems the book also has a very brief
introductory section in prose of about one page in length. The work called
_Linguae Vasconum Primitiae_ has a medieval ring to it. And since the
author was from Low Navarre, that is the dialect represented. The author is
Bernard Dechepare (his name has other spellings), a priest who at one point
in his life became an 'arcipreste'. The overall length of the sample can
be appreciated if one keeps the following fact in mind: that in the recent
facsimile edition of the work (1995), the Spanish translation of the Basque
original runs from page 107 to page 127.
b) The second book: The New Testament (a Protestant version).
Next we have Johannes Leizarraga, another religious man. Actually he was a
Protestant minister who ended up in charge of a group of translators who
would produce the first Basque translation of the New Testament (1571), a
work which ended up, not unexpectedly, quite plagued with Latinisms. The
team of 'Wycliffs' worked for Juana de Albret, Queen of Navarre and Dame of
Bearne. Bearne was the location where the translators worked and resided.
Firmly committed to Calvinism, Leizarraga himself was from Lapurdi
(Labourd) although he opted to use a mixture of the three northern Basque
dialects in the translation. When published it was accompanied by _ABC edo
Christinoen Instructionea_, a catechism that contained the principles of
the Calvinist Creed, and _Kalendarea_, calendrical tables to be used to
calculate the moveable feasts.
c) Three collections of sayings.
Then the century ends with three collections of proverbs and sentences
written as far as I know in Bizkaian which at that time was more similar to
the Lapurdin dialect than it is today. Two of the collections weren't
published until several centuries later.
The name of the author of only two of the three collections is known, i.e.,
Esteban de Garibay of Mondragon, chronicler of Philip II. The latter
writer also included a fragment from a Basque ballad in his _Memorias_.
Leaving aside the proper names and repeated elements found in that
fragment, the text contains a total of 21 words.
d) Although not strictly a literary work, there is also a letter from an
individual residing in the New World to someone back home (I don't
remember exactly who the recipient was). It is a couple of pages long as I
recall, and was written by a man born in Durango, Bizkaia. He was Fray Juan
de Zumarraga, first Bishop of Mexico. The letter is dated 1537.
Criteria.
Larry Trask has stated that he would prefer 1600 as the cut-off date for
"early attestation".  As one can see from the information given above, if
the cut off date is set at 1600, it's pretty slim pickins.
And furthermore, if one applies Larry Trask's other criterion to the same
corpus, namely, that for an item to be included in his list it must be
recorded early in all dialects or most of the dialects, we're fried
(although this might not be the way that Larry intends the dialectal
criterion to be applied to the data). The database in question simply
doesn't provide a wide sampling. In other words, serious difficulties would
arise if one were to apply the second criterion of widespread dialectal use
to items attested prior to 1600: the two books mentioned above and three
collections of proverbs do not cover all the dialects. Thus, a strict
application of Larry's second rule would actually eliminate all the words
found in these works. Again, he may assume that if the word is attested
early (prior to 1600) in one or more of the northern dialects that would be
sufficient.
Earlier Larry Trask has stated that in his opinion, when finished his list
would end up containing only 200 "native" Basque words. He may have stated
'a couple hundred' (sorry I don't have the exact citation). By my
calculations, there might be even fewer, unless he means that he would
include a word if it can be attested prior to 1600 in one dialect and then
rediscovered in the nineteenth and twentieth centuries in four of the five
dialects.
Thus, there is the question of how the sample is skewed because of the
following facts:
1) that many works are translations from Latin by clergymen;
2) that when the works are not translations they are nonetheless books or
treatises written by priests about religious themes; and
3) during the 16th and 17th centuries the works represent primarily one
dialectal zone, Lapurdi.
With respect to the first point, we note that both of the major books from
the 16th century were written by members of the clergy. In the 17th century
there were 36 editions (note: these are editions, not necessarily new
books) whose authors with one exception, that of Oihenart, were priests; in
the 18th century all the authors were members of the clergy except two,
Barrutia and Etcheberri. Hence, 90% of the works published in these three
centuries were by clergymen; this compares with 6% of the works written in
French for the same period. In terms of the authors themselves, 28% of the
French books were authored by members of the nobility; another 28% by
members of the clergy; and 66% by members of the Third Estate. In contrast,
90% of the books in Euskera were written by the clergy and 10% by members
of the Third Estate.
In terms of the dialect distribution, in the 17th century, of the 36
editions of books over 100 pages, 32 were written in Lapurdin (for a
dialect population estimated at 30,000), 1 in Low Navarrese and 3 in
Zuberoan. None of them were written in any of the southern dialects (for a
discussion of these statistics, cf. Ibon Sarasola, _Historia social de la
literatura vasca_, Madrid: Akal, pp. 35-55).
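To make the imbalance concrete, the dialect figures just cited can be turned into percentages. What follows is only a minimal sketch in Python; the variable names are illustrative and are not taken from Sarasola, and the counts are simply the ones quoted in the paragraph above.

```python
# 17th-century editions of books over 100 pages, by dialect,
# using the counts quoted above from Sarasola (32, 1 and 3 out of 36).
editions_17th = {"Lapurdin": 32, "Low Navarrese": 1, "Zuberoan": 3}

total = sum(editions_17th.values())

# Percentage share of each dialect, rounded to one decimal place.
shares = {dialect: round(100 * n / total, 1)
          for dialect, n in editions_17th.items()}

for dialect, pct in shares.items():
    print(f"{dialect}: {editions_17th[dialect]} of {total} ({pct}%)")
```

On these figures Lapurdin alone accounts for roughly 89% of the editions, and the southern dialects for none.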
So if my understanding is correct, Larry Trask's "more restrictive early
date" would admit one translation of the New Testament with a catechism and
tables for calculating moveable feasts, 1 short book of poems, a letter
from Mexico, and three brief collections of proverbs as his data base to
which he would add the miscellaneous citations, epigraphs, songs, place
names, proper names, a couple of very short word lists compiled by
non-Basques and some random words and phrases found in works written in
Romance prior to 1700.
Should such a database be considered a representative sample?
Keep reading, there are more surprises!!
List members unfamiliar with the highly oral nature of Basque culture might
be surprised to know that the date set for the beginning of Basque
literature is 1879. And to give people a better idea of just how few texts
there really are I would like to reproduce (actually summarize) information
in the form of three charts. In their original form the charts also
indicate which dialects the works were written in, but I've left that
information out. The cut off date for the statistical tabulation is 1879.
To qualify as a work, the text had to be a non-periodical and at least 48
pages long, a standard definition taken from UNESCO.
The time periods covered are:
XVIb 1545-1599
XVIIa 1600-1649
XVIIb 1650-1699
XVIIIa 1700-1749
XVIIIb 1750-1799
XIXa 1800-1849
XIXb 1850-1879
Chart #1.
 In chart #1 we have listed the number of works published in Euskera for
each half-century period, whether written originally in that language or as
a translation. The numbers on the right don't include re-editions of the
same work.
XVIb 3
XVIIa 7
XVIIb 13
XVIIIa 17
XVIIIb 43
XIXa 47
XIXb 64
Total 194
Chart #2
In chart #2 we have a list of all works written and published originally in
Euskera, i.e., those that are not translations.
XVIb 1
XVIIa  6
XVIIb 6
XVIIIa 5
XVIIIb 24
XIXa 26
XIXb 33
Total 101
Chart #3
In chart #3 we can appreciate more fully the dearth of non-religious works
meeting the above minimal UNESCO criteria. Stated differently, the
following list contains the tabulation of secular works written originally
in Euskera up to 1879.
XVIb 1
XVIIa 0
XVIIb 1
XVIIIa 1
XVIIIb 1
XIXa 4
XIXb 4
Total 12
For a full discussion of the data and these charts, cf. Sarasola
1976:179-183.
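For anyone who wants to check the arithmetic, or to see how many works fall on the early side of a given cut-off date, the three charts can be tabulated as follows. This is a minimal sketch in Python: the counts are those given in the charts, while the names (PERIODS, works_up_to, and so on) are hypothetical and serve only for illustration.

```python
# Half-century periods used in the three charts (XVIb = 1545-1599, etc.).
PERIODS = ["XVIb", "XVIIa", "XVIIb", "XVIIIa", "XVIIIb", "XIXa", "XIXb"]

# Chart #1: all works published in Euskera (originals and translations).
all_works = [3, 7, 13, 17, 43, 47, 64]
# Chart #2: works written originally in Euskera (no translations).
original_works = [1, 6, 6, 5, 24, 26, 33]
# Chart #3: secular works written originally in Euskera.
secular_original = [1, 0, 1, 1, 1, 4, 4]


def works_up_to(counts, last_period):
    """Sum the counts from the first period through last_period inclusive."""
    idx = PERIODS.index(last_period)
    return sum(counts[: idx + 1])


# A cut-off of 1600 keeps only period XVIb; a cut-off of 1700 keeps
# everything through XVIIb.
for cutoff, last in [(1600, "XVIb"), (1700, "XVIIb")]:
    print(
        f"cut-off {cutoff}: "
        f"{works_up_to(all_works, last)} works in all, "
        f"{works_up_to(original_works, last)} original, "
        f"{works_up_to(secular_original, last)} secular original"
    )

# Totals for the whole period up to 1879, matching the chart totals.
print(sum(all_works), sum(original_works), sum(secular_original))
```

On the chart figures, a 1600 cut-off leaves 3 works meeting the 48-page criterion (1 of them an original, and that one secular), while a 1700 cut-off leaves 23 works, of which 13 are originals and only 2 are secular.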
I believe that these three charts help explain some of the reasons why Jon
Patrick and I have repeatedly argued in favor of including Azkue's
dictionary as a legitimate and necessary addition to any database for
Euskera. In addition, Azkue was meticulous in noting the dialect, even
indicating the name of the village, in which he collected the item.
Moreover, he utilized some 150 Basque texts as part of his database and he
indicates precisely which text each item comes from, citing the entire
sentence in which it occurred so that the reader has the contextualization
of the entry.
In conclusion, keeping in mind the question of whether religious texts,
primarily translations, are appropriate (or the best) data sources for our
purposes, I would like to quote a few short passages from Jon Juaristi's
book _Historia de la literatura vasca_ from the series _Historia critica de
la literatura hispanica_ (Madrid: Taurus, 1987).
Speaking of the production during the 16th century, he states that the fact
the texts were written in Euskera at all can be attributed to the religious
struggle of that time that was being waged between Protestants and the
forces of the Counter-Reformation:
"Los primeros textos en euskera (al menos, los primeros que tuvieron una
entidad digna de tenerse en cuenta) fueron obras de caracter religioso para
la evangelizacion de un pueblo sin escritura, libros escritos por clerigos
para servir de apoyo a la labor pastoral de otros clerigos. Unamuno pudo
comparar con toda justicia la literatura vasca de los siglos XVI al XIX con
la literatura guarani que los jesuitas promovieron en sus Reducciones del
Paraguay: 'Los catecismos de doctrina cristiana se escribian en vascuence,
pero no para que los ninos los aprendiesen leyendolos, sino para que los
curas se los ensenasen de viva voz. Porque no se debe perder de vista que
el vascuence no ha sido letra escrita por el pueblo (Unamuno 1920).' En
efecto, como sucedia con los textos religiosos guaranies, los libros
euskericos rara vez se dejaban en manos de aquellos a cuya edificacion y
adoctrinamiento iban destinados (Juaristi 1987:13-14)."
In short, from my point of view, given the nature of the facts set out
above, to assign the cut-off point for the database at 1600 is not
particularly logical; and it would be only slightly more logical to assign
the cut-off to 1700. This is particularly so if we keep in mind that our
aim is to reconstruct a stage of the language roughly 2000 years earlier,
i.e., prior to its first contacts with the Roman invaders who entered the
Peninsula in 218 BC.  Considering the intended purpose of the database, it
is unlikely that any changes that the language would have undergone in the
hundred-year period, i.e., from 1599-1699, would affect the outcome of the
study in any significant way.
By the way, is there a chronological cut-off point for words in Romance
languages? Or in Slavic? Just curious.
Ondo ibili,
Roz
    
    