[Lingtyp] A list of 50 basic sentences
alex.francois.cnrs at gmail.com
Sun May 9 11:42:37 UTC 2021
As various people have pointed out, the idea of standardized questionnaires
for grammatical elicitation is far from new, and there have been many
valuable attempts already.
A rich selection of such questionnaires can be found on the homepage of the
CNRS Fédération *Typologie & Universaux du Langage* (TUL)
http://tulquest.huma-num.fr/en/node/9. The Questionnaire project, headed
by Aimée Lahaussois (CNRS-HTL), has produced a dedicated website and also
a special volume of LD&C, with plenty of relevant discussion:
- Aimée Lahaussois & Marine Vuillermet (eds.), *Methodological Tools for
Linguistic Description and Typology*. Special issue of *Language
Documentation & Conservation* 16, 155-196.
My own contribution to this volume was precisely motivated (among other
things) by the idea of creating parallel corpora for grammatical
elicitation and comparison.
I thus proposed *conversational questionnaires* as a new tool:
- François, Alexandre. 2019. A proposal for conversational questionnaires.
In Aimée Lahaussois & Marine Vuillermet (eds.), *Methodological Tools
for Linguistic Description and Typology*.
Special issue of *Language Documentation & Conservation* 16.
(I sent this chapter to Ian yesterday in an offline message, and I just saw
that Ian sent it to the list – thanks Ian!)
A word of explanation:
I've used similar questionnaires in the field in Vanuatu and the Solomons
since 2003, and they've allowed me to efficiently collect data for
descriptive and comparative purposes. (Of course, in combination with the
recording of spontaneous speech in each language.) Some of my
questionnaires, or similar ones inspired by the method, have also been
successfully tested by students and colleagues in other locations.
One of the main concerns behind my proposal was the observation that
syntactic questionnaires based on isolated sentences (of the type "The bird
was eaten by the dog.") typically result in contrived and unnatural
responses, with a high risk of a translation bias, and uncertainty whether
the proposed translation is indeed communicatively / pragmatically
equivalent to the original. The conversational questionnaires I propose
focus on *idiomaticity*, which is one of my principal subjects of interest
in linguistics. The idea is to elicit data using snippets of
conversation instead of isolated sentences, so as to emulate the natural
ecology of linguistic utterances, which after all is always dialogical.
E.g. Dialogue 3 (p. 183, "Seeing the doctor") includes this sort of exchange:
14. B – Does it hurt during the day? or only at night?
15. A – Mostly at night. I don’t know why.
16. Doctor, I’m a bit worried: what is going on?
17. B – Did you eat anything particular lately?
18. A – Hm, let me remember… No, I don’t think so.
19. Oh wait, actually yes I did!
20. Last week, my child came back from the forest
with some strange fruit I had never seen.
21. He gave them to me, for me to try.
22. B – Did you?
23. A – Yes I did. Actually I liked it, it was sweet. I ate many of them.
24. But then, I became sick after that.
This method is not perfect of course, and some of the issues usually raised
by elicitation in general are still relevant; in terms of idiomaticity,
of course nothing beats actual spontaneous speech! But then, spontaneous
speech is, by essence, difficult to channel into a system of parallel
corpora.
The chapter discusses some of the issues and weighs the pros and cons of
conversational questionnaires. On pp. 171 sqq. (“Conversational
questionnaires as parallel corpora”) I specifically discuss the possible
contribution of such questionnaires to linguistic typology. Thus, if
dialogue D3 above were to be translated into many languages, sentence #20
could be taken as a data point for relative clauses (*some strange fruit I
had never seen*); #21 would elicit one type of purposive clause; etc.
As I wrote yesterday to Ian in my offline message:
"By working with entire 'conversations' (albeit scripted ones), you
increase the chance that responses will be more idiomatic; and also, you give
your consultant more time to mentally elaborate a situation that may be
conducive to the sort of construction you're looking for.
For example, a passive voice is very difficult to elicit in an isolated
sentence, and I really advise my students against trying that; but a
dialogue allows you to construct a whole situation where some participants
are pragmatically backgrounded vs. others foregrounded, etc; so you can
create the typical discourse context in which a passive voice would be
idiomatically used, if it exists in the language. An isolated sentence
will never give you that."
The corpus of dialogues, of which I propose five samples in the chapter,
would be meant to grow as more and more linguists elaborate similar
questionnaires along the same principles. We could create a database of
such questionnaires: the database would grow as we add more
questionnaires, and also as we add more and more translations into
languages of the world.
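
As a rough illustration of how such a database might be organised, here is a minimal Python sketch. It is only one possible layout, not something taken from the chapter: the class names (`Turn`, `Questionnaire`), the construction labels and the language names are all hypothetical, and the translation strings are placeholders rather than real data. Each numbered turn of a dialogue records which constructions it is meant to elicit, and translations are stored per language, so that one construction can be compared across languages.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    """One numbered line of a scripted dialogue."""
    number: int
    speaker: str
    text: str
    # Constructions the turn is meant to elicit (labels are illustrative).
    targets: list[str] = field(default_factory=list)
    # Language name -> translation supplied by a consultant.
    translations: dict[str, str] = field(default_factory=dict)


@dataclass
class Questionnaire:
    """A scripted dialogue plus its translations (hypothetical layout)."""
    title: str
    turns: list[Turn] = field(default_factory=list)

    def add_translation(self, number: int, language: str, text: str) -> None:
        """Attach a translation of turn `number` in `language`."""
        for turn in self.turns:
            if turn.number == number:
                turn.translations[language] = text
                return
        raise KeyError(f"no turn numbered {number}")

    def data_points(self, target: str) -> list[tuple[int, str, str]]:
        """All (turn number, language, translation) triples for one construction."""
        return [
            (turn.number, language, text)
            for turn in self.turns
            if target in turn.targets
            for language, text in sorted(turn.translations.items())
        ]


# Two turns from Dialogue 3, tagged as suggested in the message above.
doctor = Questionnaire(
    title="Seeing the doctor",
    turns=[
        Turn(20, "A",
             "Last week, my child came back from the forest "
             "with some strange fruit I had never seen.",
             targets=["relative clause"]),
        Turn(21, "A", "He gave them to me, for me to try.",
             targets=["purposive clause"]),
    ],
)

# Placeholder strings only: no real language data is implied here.
doctor.add_translation(20, "Language X", "<translation of turn 20>")
doctor.add_translation(20, "Language Y", "<translation of turn 20>")
doctor.add_translation(21, "Language X", "<translation of turn 21>")
```

Calling `doctor.data_points("relative clause")` would then return the translations of turn 20 in every language recorded so far, which is the kind of cross-linguistic data point described above.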
LaTTiCe <http://www.lattice.cnrs.fr/en/alexandre-francois/> — CNRS–
Australian National University
<https://www.ae-info.org/ae/Member/François_Alexandre> – Academia.edu
Personal homepage <http://alex.francois.online.fr/>
On Sun, 9 May 2021 at 10:38, Martin Haspelmath <martin_haspelmath at eva.mpg.de> wrote:
> This is a great project – I have long thought that someone should come up
> with a standard list of "comparative sentence meanings" of this sort,
> analogous to the "comparative word meanings" as found in Swadesh-type lists
> (the Concepticon brings together 353 lists of this kind).
> Parallel-word studies have enormously profited from standard word meaning
> lists, and likewise, parallel-text studies will profit from standard
> sentence meanings in many languages.
> We already have quite a few parallel texts, but they are typically
> unglossed – so if linguists started to collect such "mini parallel sentence
> sets" (also from smaller languages), this would be very useful, I think.
> Of course, there are many individual ways in which Ian Joo's list could be
> improved, and there are also many larger issues (of the sort mentioned by
> Sandra). But maybe we should think of these as limitations that are
> inherent in the method, not as problems that make the method unsuitable. So
> I think it would be nice if a project like this got off the ground. (It was
> actually suggested to me by Michael Cysouw over 15 years ago, and I have
> kept thinking about something of this sort on and off.)
> On 09.05.21 at 09:25, Sandra Auderset wrote:
> Hi Ian,
> Following up on Hartmut, Yunfan and others, I have some questions:
> - What do you do with variation? I’m not familiar with the languages
> you work on, but ’Tense Future’ in German could be translated as “Ich werde
> morgen gehen” or “Ich gehe morgen”. The latter would be more frequent in
> spoken language, but you might get the former because of translation bias.
> Would you include both? You say that your method accounts for the choice of
> the speaker, but again I wonder if this isn’t just translation bias.
> - You say that this method has the advantage of including more
> frequently observed features. I wonder how you know whether that’s the case
> or not? Do you mean in spoken or written language? As Yunfan pointed out,
> with 50 sentences you might easily miss some common features.
> - How do you standardize the glosses? For example, how do you decide
> whether something should be glossed as ‘be’ or copula? That seems important
> to me, since glossing is very subjective and you might inadvertently bias
> the whole calculation. Especially since you already wrote up the conclusion.
> - Lastly, I find it odd that Example 2) is calculated as having
> distance 1. To me, there are two differences: presence/absence of
> nominative and the presence/absence of a copula. How do you determine that
> the copula is in the same slot as the nominative for calculation?
> * Sandra Auderset <https://sauderset.github.io/>*
> PhD Candidate | [she/her]
> Department of Linguistic and Cultural Evolution
> MPI for Evolutionary Anthropology
> Department of Linguistics
> University of California Santa Barbara
> On Saturday, May 08, 2021 at 19:16, Hartmut Haberland <hartmut at ruc.dk> wrote:
> Dear Ian,
> I have a few comments.
> I was wondering about
> Is it a good idea to use ‘genitive’? Would ‘possessive’ not be better?
> Also I wonder about languages like Finnish which express contrast between
> definiteness and indefiniteness by word order:
> Auto on kadulla. ‘*The car* is in the street.’
> Kadulla on auto. ‘There is *a car* in the street.’ (-ulla is inessive
> Also think of Italian
> La macchina è rotta.
> È rotta la macchina.
> both ‘The car is broken’, but are answers to different questions (Where is
> your car?, Why are you late?, resp.); same (SV vs. VS) in Greek. How would
> you get these results?
> Best, Hartmut
> *From:* Lingtyp <lingtyp-bounces at listserv.linguistlist.org> *On behalf of* JOO, Ian
> *Sent:* 8 May 2021 15:08
> *To:* LINGTYP <lingtyp at listserv.linguistlist.org>
> *Subject:* [Lingtyp] A list of 50 basic sentences
> Dear all,
> I am trying to make a list of 50 basic sentential meanings.
> The goal is to make parallel corpora of different languages based on this
> list of sentences.
> Each sentence on the list serves to check whether a language has a given
> grammatical feature, and if so, in what form the language expresses it.
> When creating each sentence, I tried to limit its vocabulary to basic
> words that are found in most languages, avoiding culture-specific words.
> I would appreciate it if you could have a look at the attached file and
> advise what I should add/remove/modify.
> From Hong Kong,
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
> Martin Haspelmath
> Max Planck Institute for Evolutionary Anthropology
> Deutscher Platz 6
> D-04103 Leipzig
> https://www.shh.mpg.de/employees/42385/25522