28.1878, Review: Afroasiatic; General Ling; Text/Corpus Ling; Typology: Vanhove, Mettouchi, Caubet (2015)

The LINGUIST List linguist at listserv.linguistlist.org
Thu Apr 20 16:21:32 UTC 2017


LINGUIST List: Vol-28-1878. Thu Apr 20 2017. ISSN: 1069 - 4875.

Subject: 28.1878, Review: Afroasiatic; General Ling; Text/Corpus Ling; Typology: Vanhove, Mettouchi, Caubet (2015)

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                       Fund Drive 2017
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Clare Harshey <clare at linguistlist.org>
================================================================


Date: Thu, 20 Apr 2017 12:21:26
From: Zoe Bartliff [0908450b at student.gla.ac.uk]
Subject: Corpus-based Studies of Lesser-described Languages

 
Discuss this message:
http://linguistlist.org/pubs/reviews/get-review.cfm?subid=36105117


Book announced at http://linguistlist.org/issues/26/26-2704.html

EDITOR: Amina Mettouchi
EDITOR: Martine Vanhove
EDITOR: Dominique Caubet
TITLE: Corpus-based Studies of Lesser-described Languages
SUBTITLE: The CorpAfroAs corpus of spoken AfroAsiatic languages
SERIES TITLE: Studies in Corpus Linguistics 68
PUBLISHER: John Benjamins
YEAR: 2015

REVIEWER: Zoe Bartliff, University of Glasgow

Reviews Editor: Helen Aristar-Dry

SUMMARY

The volume ‘Corpus-based Studies of Lesser-described Languages: The CorpAfroAs
corpus of spoken AfroAsiatic languages,’ edited by Amina Mettouchi, Martine
Vanhove, and Dominique Caubet, is a companion text to the online CorpAfroAs
corpus project. This corpus contains one-hour-long samples of recorded speech
from twelve AfroAsiatic languages: Kabyle, Tamashek (Berber), Hausa, Bata and
Zaar (Chadic), Afar, Beja, Gawwada, Ts'amakko (Cushitic), Wolaitta (Omotic),
Moroccan and Libyan Arabic, Juba-Arabic, Hebrew (Semitic). These have been
transcribed and annotated with the primary intention of allowing examination
of their prosodic and morphosyntactic features. This project was a pilot
corpus designed to provide a model for other similar projects as well as
allowing for the creation of a more user-friendly and efficient version of the
traditional annotation programs ELAN and Toolbox. The volume of collected
essays reviewed here was written to accompany the project and was designed to
provide what are in essence extensive accompanying notes to the corpus’
construction as well as giving the initial findings from analysing the
languages themselves.

Divided into five parts, this volume covers a wide spectrum of content. There
is initially an extensive introduction to both the history of the project and
the volume itself. Written by the editors of the book (Mettouchi, Vanhove and
Caubet), the Preface commences with a description of the lamentable lack of
available Afro-Asiatic language sound files and, among those that exist, the
lack of systematic annotation. This was the impetus behind the creation of
CorpAfroAs. From this point there is a brief overview of what makes the corpus
unique,
namely the fact that it provided a homogenised model for the creation of such
a corpus. The languages chosen are deliberately diverse and representative of
all features of Afro Asiatic languages to enable the most comprehensive model
possible. Also included is an overview of the glossing system, the choice to
focus upon prosody and morphosyntactic features and finally a breakdown of the
later sections of the book each of which takes a different focus.

The first and second parts of the volume concentrate on analysing samples of
the corpus, but there is some discussion concerning the challenges of
transcription faced within the project. ‘Representation of Speech in
CorpAfroAs: Transcriptional Strategies and prosodic units’ by Shlomo Izre’el
and Mettouchi, for example, commences the volume with discussion on the
comprehensive and varying layers of transcription used within the corpus and
places these within the context of the corpus, explaining their use and
relevance. There is a distinct focus upon the tx tier (symbolic association or
phonetic transcription) designed to faithfully represent the speech and the
mot tier (morphosyntactic representation) as it is these units which most
faithfully represent the prosodic value of the sample. An extensive section of
this chapter is devoted to the explanation and demonstration of prosody of
differing levels within a scattered sampling from the corpus. This is intended
to be a survey of the corpus rather than a detailed study. Little attention is
paid to the glossing tiers of transcription as this is the focus of a later
chapter. 

Bernard Caron in ‘Tone and intonation’ offers to the volume a more focused
investigation into intonation within tonal languages. This chapter
demonstrates above all else the potential uses of the corpus, as Caron
conducts analysis of previously unavailable language samples, namely those
from Zaar.  Zaar is revealed as a mixed language with regard to intonation; it
possesses both internal intonation and peripheral intonation. 

In Part Two, Caron, Cécile Lux, Stefano Manfredi and Christophe Pereira offer
a similar intonation-centred analysis of Zaar, Tamasheq, Juba Arabic and
Tripoli Arabic in the chapter titled ‘Intonation of topic and focus.’ This is
a more correlational study which treats each language individually in terms of
the
relational aspects between prosody and structural and semantic content of
speech. Unlike Caron’s chapter, which is purely demonstrative of the
opportunities for analysis offered by CorpAfroAs, this chapter offers valuable
conclusions as to the tonal tendencies of the language family. Of particular
note is the evidence of shared patterns of intonation for thetic speech and
the bipartite division with interrogatives whereby the informational structure
is either intonational or morpho-syntactic in nature. 

The fourth chapter, ‘Quotative constructions and prosody in some Afro-Asiatic
languages; towards a typology’ by Il-Il Malibert and Martine Vanhove, focuses
upon four genetically varied languages of the corpus (Beja, Zaar, Juba Arabic
and Modern Hebrew) with regard to the prosodic values of direct and indirect
reported speech. Adapting existing systems of analysis to suit the annotation
system of the corpus, Malibert and Vanhove provide a tentative model for the
appearance of prosody within reported speech.

In Part Three, the focus of the volume shifts away from prosody and towards
the issues raised by glossing and the cross linguistic analysis that this
allows. Chapter Five, ‘Glossing in Semitic languages; a comparison of Moroccan
Arabic and Modern Hebrew’, by Ángeles Vicente, Malibert and Alexandrine
Barontini proposes a universal model for glossing morphological analysis to
allow all readers, not solely those familiar with a language, to understand
fully the purpose and meaning of individual morphemes. This is accomplished
first with an overview of historical glossing for each of the chosen languages
before progressing to the application of the proposed model to the languages
and those within the extended family tree.

The next chapter ‘From the Leipzig Glossing rules to the GE and RX lines’ by
Bernard Comrie returns to the focus of the volume, the CorpAfroAs project.
This chapter is the glossing equivalent to the opening chapter of the book and
focuses upon the adaptation of the standard Leipzig method of glossing to the
tiered approach used within CorpAfroAs. The chapter commences with an overview
of the tradition and importance of glossing as a manner of making a text
accessible to an audience. Discussion then develops on to the requirements of
the project and Afro-Asiatic languages as a whole, which require greater
flexibility and more categorical variety than that permitted by the Leipzig
Glossing Rules particularly with regard to the retrieval of grammatical
categories from the corpus. The chapter entitled ‘Cross linguistic
comparability in CorpAfroAs’ by Mettouchi, Graziano Savà and Mauro Tosco puts
these glossing practices to the test with a cross-linguistic comparison of
‘ventive’ extensions, gender and case endings within the corpus. The tiered
glossing proves an effective method for the retrieval of such grammatical
features for analysis.
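
To make the idea of retrieval from tiered glossing concrete, the following is
a minimal sketch in Python. It is not the CorpAfroAs search tool. The tier
names tx, mot, ge and rx follow those mentioned in this review, but the word
forms, the glosses, the "-" and "." separators and the function name
find_gloss are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Unit:
    """One annotated unit with aligned tiers (hypothetical layout)."""
    tx: str          # transcription of the whole unit
    mot: List[str]   # one entry per word
    ge: List[str]    # one gloss string per word, aligned with mot
    rx: List[str]    # one category label per word, aligned with mot


def find_gloss(units: List[Unit], label: str) -> List[Tuple[int, str, str]]:
    """Return (unit index, word, gloss) for every word whose gloss has `label`."""
    hits = []
    for index, unit in enumerate(units):
        for word, gloss in zip(unit.mot, unit.ge):
            # Treat both "-" and "." as separators between gloss elements.
            parts = gloss.replace(".", "-").split("-")
            if label in parts:
                hits.append((index, word, gloss))
    return hits


# Invented placeholder data, not taken from the corpus.
units = [
    Unit(tx="w1 w2", mot=["w1", "w2"], ge=["come-VEN", "woman.F"], rx=["V", "N"]),
    Unit(tx="w3", mot=["w3"], ge=["go-PFV"], rx=["V"]),
]

ventive_hits = find_gloss(units, "VEN")
```

Because every tier is aligned word by word, a search for a single gloss label
returns the matching word along with its position, which is the kind of
retrieval the tiered approach is meant to support.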

Further evidence of the effectiveness of the CorpAfroAs corpus for
cross-linguistic comparability is provided by Zygmunt Frajzyngier and
Mettouchi’s paper ‘Functional domains and cross-linguistic comparability.’ In
this paper the focus is specifically on overcoming the difficulties faced
within cross linguistic analysis in choosing the proper object for comparison.
This chapter presents an approach not currently utilised by the CorpAfroAs
corpus, in that it transfers the data into a database and assigns functional
domains and subdomains which are applicable across languages. It intends to
shift the approach of such studies away from the use of universal categories
towards those which are actually encoded within the grammatical system of a
specific language. 

The fourth part of the volume analyses the phenomena of code-switching and
borrowing along with the implications that these have for the creation and use
of the corpus. Manfredi, Marie-Claude Simeone-Senelle and Tosco in ‘Language
contact, borrowing and codeswitching’ discuss the difficulties of glossing
these two phenomena. Innovatively, they also utilise prosodic analysis to
bring forward new conclusions concerning the identification of such
characteristics within Afro-Asiatic languages and, most notably, the
distinction between code-switching and borrowing. 

The final and most technical chapter examines the creation of ‘ELAN-CorpA:
lexicon-aided annotation in ELAN’ and is written by the software developer on
the team, Christian Chanard. Chanard discusses the limitations of existing
software for the creation of the corpus and then explains how the existing
features of Toolbox were integrated into the ELAN software to create a new
and tailored program specifically designed to transcribe spoken samples.
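
The general idea behind lexicon-aided annotation can be sketched as follows.
This is an illustrative Python outline only and does not reproduce how
ELAN-CorpA is implemented. The lexicon layout, the placeholder forms and the
function names suggest_glosses and record_choice are assumptions.

```python
from typing import Dict, List


def suggest_glosses(words: List[str],
                    lexicon: Dict[str, List[str]]) -> List[List[str]]:
    """For each word, return the glosses already stored in the lexicon.

    An empty list means the word is unknown and must be glossed by hand.
    """
    return [list(lexicon.get(word, [])) for word in words]


def record_choice(lexicon: Dict[str, List[str]], word: str, gloss: str) -> None:
    """Store the annotator's gloss so it can be suggested next time."""
    candidates = lexicon.setdefault(word, [])
    if gloss not in candidates:
        candidates.append(gloss)


# Invented placeholder lexicon, not taken from the corpus.
lexicon = {"w1": ["come-VEN"], "w2": ["woman.F", "wife.F"]}

suggestions = suggest_glosses(["w1", "w2", "w3"], lexicon)

# The annotator glosses the unknown word once; later units can reuse it.
record_choice(lexicon, "w3", "go-PFV")
```

The point of such a design is that each manual decision enriches the lexicon,
so repeated forms need to be glossed by hand only once.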

EVALUATION

This volume, although describing a unique and valuable project, is quite
frustrating to read. The primary source of this frustration originates from
the feeling that the volume attempts to integrate both linguistic analysis and
analysis of the technological tools used to create the corpus. This is an
ambitious goal which unfortunately would have been better accomplished across
two separate volumes. As it stands the linguistic analysis feels not only
lacking but disorganised, and the discussion of the technologies used seems
fragmented to the point of incomprehension. It is possible to see that
attempts have been made to structure the volume so that the linguistic
analysis appears first, so as to demonstrate the value and extent of the
corpus, and then, in the latter half of the book to examine the
technicalities. Practically, however, the authors of the early sections are
required to include elements of the technological analysis in their
discussions. This leads to a slightly repetitive and confused development of
the volume. Each of the authors in Parts 1 and 2 devotes a section of their
chapter to describing, for example, the units utilised within prosodic
analysis, or other concepts which are universal to the volume as a whole. It
would perhaps have been a wiser editing decision to include an index for such
terms or even a further introductory chapter which describes them and ensures
consistency throughout. 

Equally frustrating is that although this book attempts to stand alone as a
critique and analysis of the CorpAfroAs project, it is essential that the book
and the corpus are viewed together. It is exceptionally difficult to follow
the studies contained
within the volume without regular reference to the corpus itself, particularly
with regard to the sound samples provided. Even this option, however, is
denied in ‘Intonation of topic and focus’, where one section uses data that is
not part of the CorpAfroAs project, or indeed accessible anywhere, as it is
personal data collected by one of the researchers.

These aspects, however exasperating they make the volume to read, do not
detract from the value of the text and the project. CorpAfroAs is one of the
few readily accessible sources for spoken Afro-Asiatic languages and in
addition is a well-planned and wholly comprehensive model for the glossing of
speech samples. This is something that has for a long time been lacking from
the field of corpus linguistics and as such this project as a whole and the
volume here reviewed are invaluable advancements to modern linguistic studies.
Throughout the volume the contributors make an effort to suggest further paths
of investigation, demonstrating that the field of corpus-based Afro-Asiatic
studies and the CorpAfroAs project are both in their early stages and ripe for
further academic attention.


ABOUT THE REVIEWER

I am a PhD candidate at the University of Glasgow. My thesis aims to examine
the interaction between Latin and Welsh during the Medieval period through the
use of corpus and comparative linguistics.





 

