27.989, Review: Computational Ling; Text/Corpus Ling; Translation: Fantinuoli, Zanettin (2015)

The LINGUIST List via LINGUIST linguist at listserv.linguistlist.org
Thu Feb 25 19:00:29 UTC 2016


LINGUIST List: Vol-27-989. Thu Feb 25 2016. ISSN: 1069 - 4875.

Subject: 27.989, Review: Computational Ling; Text/Corpus Ling; Translation: Fantinuoli, Zanettin (2015)

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Sara Couture <sara at linguistlist.org>
================================================================


Date: Thu, 25 Feb 2016 14:00:02
From: Daria Dayter [coocho at gmail.com]
Subject: New directions in corpus-based translation studies

 


Book announced at http://linguistlist.org/issues/26/26-2729.html

EDITOR: Claudio Fantinuoli
EDITOR: Federico Zanettin
TITLE: New directions in corpus-based translation studies
SERIES TITLE: Translation and Multilingual Natural Language Processing
PUBLISHER: Language Science Press
YEAR: 2015

REVIEWER: Daria Dayter, Universität Basel

Reviews Editor: Robert Arthur Cote

SUMMARY

This volume, entitled “New directions in corpus-based translation studies” and
edited by Claudio Fantinuoli and Federico Zanettin, is a collection of six
papers on different aspects of the corpus-based methodology in translation
studies. The authors report on their own efforts within this relatively new
field, which explains the focus on know-how, custom-made corpus software,
innovative annotation, and the search for new types of research questions. The
book is based on presentations from “Corpus-based translation studies”, a
panel held during the 7th Congress of the European Society of Translation
Studies in 2013. Its origin is evident throughout the collection, as most of
the papers give detailed accounts of works in progress, concentrating on
methodological decisions rather than on systematic analysis or an overview of
quantified results, which are promised to follow as the projects unfold. This
is not to say that the collection is unsuccessful. As anyone who works in
corpus-based translation studies (CBTS) knows, technological solutions are
often ad-hoc, and the type of research carried out is sometimes constrained by
the tools available to the researcher. Even more importantly, corpus studies
so far have mostly addressed staple questions related to counting and
contrasting microlinguistic features with the aim of finding S- or
T-universals (Chesterman 2004). Here, the authors set out to explore less
conventional territory armed with corpus tools (e.g. how translators form,
reject and confirm hypotheses during the translation process, investigated
with the help of a keystroke corpus in Serbina et al., this volume). The scope
of investigation includes seven European languages: Basque, Dutch, German,
Greek, Italian, Spanish, and English. Because of the innovative nature of the
collection, it will be of interest to scholars and advanced students in
translation and interpreting studies. It should also attract the
attention of corpus linguists, for it demonstrates the potential applications
of corpus methods in previously uncharted territory and covers new corpus
design and corpus software.

The first chapter, “Creating and using multilingual corpora in translation
studies” by Claudio Fantinuoli and Federico Zanettin, takes a welcome detour
from the established format of an introduction to an edited volume. Instead of
giving a summary of subsequent chapters, the editors identify the main issues
in CBTS that appear in every contribution. These issues predictably lie in the
areas of corpus design, annotation and alignment, and corpus analysis. In a
terminological aside, the editors propose to settle the debate surrounding the
terms “parallel/comparable corpus” by treating them as a function of corpus
architecture. In that case, a parallel corpus is a corpus where “two or more
components are aligned, that is, are subdivided into compositional and
sequential units (of differing extent and nature) which are linked and can
thus be retrieved as pairs (or triplets, etc.)” (p. 4). A comparable corpus,
in turn, is a corpus whose components are compared on the basis of assumed
similarity. The papers in this collection make use of both parallel and
comparable corpora, sometimes drawing on existing monolingual corpora to verify
their results. The diversity of datasets and annotations (from automatic to
fully manual tagging) is reflected in the range of analyses offered by the
contributors, from theta theory to critical discourse analysis.
Recognising the achievements of the volume, Fantinuoli and Zanettin call for
“a stronger tie between technical expertise and sound methodological practice”
(p. 9) to continue to move CBTS forward.

The second chapter, “Development of a keystroke logged translation corpus”, is
written by Tatiana Serbina, Paula Niemietz, and Stella Neumann and focuses on
the process of translation. To this end, Serbina et al. collected three
sub-corpora in an experimental setting: an original English corpus of
popular-science texts on physics and their translations into German by two
distinct groups of participants, professional translators and domain
specialists. During the
experiment, the Translog software recorded all the keystrokes and mouse clicks
made by the translator and the length of pauses between them. Serbina et al.
also designed a custom alignment tool that enabled them first to align the
target keystrokes to tokens, and then to link these tokens to alignment units
consisting of source-target token pairs. The result was a richly
annotated corpus that allowed the researchers to identify several intermediate
products of translation, juxtapose them with the final version, and formulate
hypotheses about the translator’s thought process. In addition, the
presence of the intermediate versions enabled the researchers to explain
mistakes in the final translation by reasons other than a lack of
competence in the target language or simple typos. For example, an incorrect
agreement marker on the indefinite article in the phrase “eine dünnes Blatt”
is ascribed to the fact that the preceding version of the translation
contained another noun phrase in the same position, namely “eine dünne
Alufolie”, where the feminine forms “eine dünne” had been the correct choice
(p. 23). Serbina et al. conclude with an outlook on the next steps in the
project: expanding the corpus and adding eye-tracking data to complement the
keystroke logs.
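The review does not describe how the intermediate versions were extracted from the Translog logs. Purely as an illustration, the sketch below replays a simplified keystroke log and stores a snapshot of the text typed so far before every long pause, so that each snapshot is a candidate intermediate product. The log format (typed characters and backspaces only, cursor always at the end, no mouse events), the `KeyEvent` record, the `replay` and `typed` functions and the 2-second pause threshold are all hypothetical assumptions, not Serbina et al.'s tool; the timings in the usage example are invented to mimic the “eine dünne Alufolie” case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyEvent:
    """One entry of a hypothetical, simplified keystroke log."""
    time_ms: int  # timestamp in milliseconds
    key: str      # a typed character, or "BACKSPACE"


def replay(events: List[KeyEvent], pause_threshold_ms: int = 2000) -> List[str]:
    """Rebuild the target text keystroke by keystroke.

    Assumes the cursor always sits at the end of the text. Before any pause
    longer than pause_threshold_ms, the text typed so far is stored as a
    candidate intermediate version; the last item is the final version.
    """
    text: List[str] = []
    snapshots: List[str] = []
    last_time: Optional[int] = None
    for event in events:
        if last_time is not None and event.time_ms - last_time > pause_threshold_ms:
            snapshots.append("".join(text))
        if event.key == "BACKSPACE":
            if text:
                text.pop()
        else:
            text.append(event.key)
        last_time = event.time_ms
    snapshots.append("".join(text))
    return snapshots


def typed(s: str, start_ms: int, step_ms: int = 100) -> List[KeyEvent]:
    """Hypothetical helper: turn a string into evenly spaced keystrokes."""
    return [KeyEvent(start_ms + i * step_ms, ch) for i, ch in enumerate(s)]


# Invented log: a noun phrase is typed, a long pause follows, the noun is
# deleted and replaced, but the feminine article is left untouched.
log = (typed("eine dünne Alufolie", 0)
       + [KeyEvent(5000 + i * 100, "BACKSPACE") for i in range(9)]
       + typed("s Blatt", 6000))
print(replay(log))  # ['eine dünne Alufolie', 'eine dünnes Blatt']
```

Real keystroke logs also record cursor movements, selections and mouse clicks, so an actual reconstruction has to track the cursor position rather than assume appending at the end.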

Chapter 3 by Effie Mouka, Ioannis E. Saridakis, and Angeliki Fotopoulou,
“Racism goes to the movies: A corpus-driven study of cross-linguistic racist
discourse annotation and translation analysis”, is based on the PhD project of
the first author. Mouka et al. conducted a critical discourse analysis of the
choices made when translating racial slurs in film subtitles from English into
Greek and Spanish. They used the categories of Appraisal Theory (attitude,
graduation, engagement) to describe each slur and to categorise each
translation choice as mitigating the original, intensifying it, or maintaining
the same force. The corpus on which the study is based consists of nine hours
of film material annotated with the ELAN and GATE platforms. The four American
films and one British film chosen by the authors are all feature films in the
drama genre, with plots revolving around racism and interracial relations
(p. 42). Mouka et al. raise an important concern about the inherent
multimodality of film data and, especially, the shift in the mode of the
message across the three sub-corpora: while the source-language subtitles are
text transcribed from an oral medium, the target subtitles are written. In
addition to the subtitles corpus, the authors
used enTenTen12, GkWaC, and esTenTen11 as reference corpora for English, Greek
and Spanish. The findings, which reflect the cultural sensitivity towards
heterophobic discourse that has developed in Greece and Spain as a result of
their “first frontier” status in the recent influx of refugees, are said to
demonstrate “the role of translation in the diachronic development of the
sociolinguistic dimension of racism” (p. 65).

The fourth chapter in the collection is “Building a trilingual parallel corpus
to analyse literary translations from German into Basque” by Naroa Zubillaga,
Zuriñe Sanz, and Ibon Uribarri. Given the minority status of Basque, certain
issues specific to this target language made the corpus compilation especially
difficult. For example, it is rare to find a book translated directly from
German into Basque without Spanish as a bridge language. In addition, there
are very few translators who work with the German-Basque language pair, and
until recently, no German-Basque dictionaries were even available (p. 78).
Zubillaga et al. explain that although they initially created a Spanish
subcorpus only for those Basque target texts that had not been translated
directly from German, they now plan to complement every German-Basque
alignment pair with the Spanish text. The findings of their translation
analysis also underscore the special status of Basque. The interference of Spanish, the
dominant language of the translators, makes itself known in the Basque
translations in the form of literal translations of Spanish idioms. The
standardising influence of Basque, a language which is rarely used outside of
official domains, is evident in the downtoning of offensive language. At the
current stage, however, the authors see the creation of the parallel corpus
and the accompanying tools as their main achievements. This impressive
undertaking (the corpus comprises 5.5 million words) involved the digitisation
of hundreds of books. Tagging and alignment were carried out with
TRACE-Aligner, a program developed specifically for this corpus, and then
refined manually. The release of the corpus for general use has
unfortunately been delayed indefinitely because of the inevitable copyright
issues with the literary works.

Chapter 5 by Ekaterina Lapshinova-Koltunski, “Variation in translation:
Evidence from corpora”, compares the end product of human and machine
translations. Lapshinova-Koltunski extracted English source texts and their
German translations by professional translators from the CroCo corpus.
She then supplemented these data with translations by inexperienced human
translators using computer-aided translation tools, as well as with the output
of rule-based and statistical machine translation systems. The resulting
material was tokenized, lemmatized, tagged for part of speech, and segmented
into syntactic chunks and sentences. To compare the different types of
translation, the author draws on the well-known features of translationese:
explicitation, simplification, normalisation, and convergence. These are
operationalized through a number of microlinguistic features that can easily
be retrieved from the tagged corpus. For instance, simplification is measured
through lexical density and type-token ratio, whereas explicitation is defined
as the proportion of nominal phrases filled with pro-forms vs. full nominal
phrases. Interestingly, the translation produced by the rule-based system was
so poor that including its results in the overall discussion makes little sense.
Overall, Lapshinova-Koltunski finds that the feature of convergence is the
only one visible in all the texts, and it shows no significant variation among
translation methods. As the editors remark, although the features of
translationese have been extensively tested before, this paper stands out as
“one of the first investigations which compares corpora obtained through
different methods of translation to test a theoretical hypothesis rather than
to evaluate the performance of machine translation systems” (p. 7).
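For readers less familiar with the two simplification measures mentioned above, here is a minimal sketch of how they are commonly computed. It assumes a corpus that is already tokenized and tagged with a coarse, hypothetical tagset (punctuation removed), and it treats nouns, verbs, adjectives and adverbs as content words; the tag names, the function names and the sample sentence are illustrative assumptions, and Lapshinova-Koltunski's exact definitions may differ.

```python
from typing import List, Tuple

# Hypothetical coarse tagset; which tags count as content words is an assumption.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

Tagged = List[Tuple[str, str]]  # (token, part-of-speech tag), punctuation removed


def type_token_ratio(tagged: Tagged) -> float:
    """Distinct word forms (case-insensitive) divided by the number of tokens."""
    tokens = [token.lower() for token, _ in tagged]
    return len(set(tokens)) / len(tokens)


def lexical_density(tagged: Tagged) -> float:
    """Share of content-word tokens among all tokens."""
    content = sum(1 for _, tag in tagged if tag in CONTENT_TAGS)
    return content / len(tagged)


sample = [("The", "DET"), ("cat", "NOUN"), ("saw", "VERB"), ("the", "DET"), ("dog", "NOUN")]
print(type_token_ratio(sample))  # 0.8
print(lexical_density(sample))   # 0.6
```

Lower values on both measures in translations than in comparable originals are usually read as a sign of simplification. Because the type-token ratio falls as texts get longer, such comparisons are normally made on samples of equal size.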

The contribution by Steven Doms, “Non-human agents in subject position:
Translation from English into Dutch: A corpus-based translation study of
‘give’ and ‘show’” forms the sixth chapter of the collection. Doms
investigates the choices that translators make when confronted with a
fundamental typological difference between two languages. The difference in
question is the constraint against non-human subjects in the agent role in Dutch.
In English, of course, such subjects are perfectly acceptable, as the example
demonstrates: “Studies in animals have shown reproductive toxicity […]” (p.
116). The author uses the Dutch Parallel Corpus to extract sentences that
contain the verbs “give” and “show” in the English source text and then cleans
the data manually according to a number of criteria, e.g. filtering out the
phrasal verbs and idioms, keeping only sentences with an agent as the
subject, etc. Following D’haeyere (2010), Doms assigns the Dutch translations
to three categories: (1) the non-human subject retained in the agent role;
(2) avoidance of a non-human agent through changes to the sentence; and (3)
the original non-human agent not translated. In Doms’ corpus, when choosing to
avoid a non-human agent, the translators either introduced a human agent in
Dutch, used a non-agentive subject (theme, recipient, possessor), or
replaced the original verb “give/show” with another one. The results show,
however, that in the majority of cases (57.2%), the translators
retain the non-human agent, thus introducing English interference into Dutch
texts.

The collection concludes with Gianluca Pontrandolfo’s contribution
“Investigating judicial phraseology with COSPE: A contrastive corpus-based
study.” This chapter is based on a custom-made corpus of criminal judgements,
COSPE, which contains 6 million tokens in English, Spanish, and Italian. This
contribution stands out from the rest of the volume because COSPE is not a
parallel but a comparable corpus, i.e. the texts are not translations
of each other but simply representative of the same legal genre. To query the
corpus, Pontrandolfo resorted to a variety of analytical steps, ranging from
corpus-driven to corpus-based. On the corpus-driven end of the continuum, he
looked at n-grams and collocations of common legal terms. On the corpus-based
end, he investigated complex prepositions and lexical doublets/triplets, both
of which are characteristic of the judicial genre. To establish the importance
of the investigated features for legal judgements, Pontrandolfo used the
BNC, CORIS/CODIS and CREA as reference corpora for English, Italian, and
Spanish respectively. The findings confirmed that although there were some
differences between the three sub-corpora, “phraseology is indeed a key
lexico-syntactic feature of this genre and it is part of judges’ idiosyncratic
drafting conventions” (p. 152).
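The corpus-driven end of this continuum can be illustrated with a simple n-gram count. The sketch below is not Pontrandolfo's procedure, which the review does not detail; it assumes a pre-tokenized, lowercased sub-corpus and lists recurrent word sequences above a frequency threshold, a common first step in spotting candidate formulaic expressions. The function name and the example tokens are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def recurrent_ngrams(tokens: List[str], n: int, min_freq: int = 2) -> List[Tuple[str, int]]:
    """Return n-grams occurring at least min_freq times, most frequent first."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(gram), freq) for gram, freq in counts.most_common() if freq >= min_freq]


tokens = "in the light of the evidence and in the light of the law".split()
print(recurrent_ngrams(tokens, 4))  # [('in the light of', 2), ('the light of the', 2)]
```

Raw frequency is only a starting point: collocation studies usually rank candidates with an association measure and, as in this chapter, check their salience against a general reference corpus.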

EVALUATION

Once again, shortly after the appearance of Straniero Sergio & Falbo’s (2012)
“Breaking ground in corpus-based interpreting studies”, Italian scholarship
has announced its intention to stay on the cutting edge of CBTS. The main
strength of “New directions in corpus-based translation studies” is that it
reports on the most current, ongoing research that readers would not normally
have access to unless they attend thematic conferences. It is also the first
volume in the new series “Translation and Multilingual Natural Language
Processing” launched by Language Science Press, which promises to be a
thoughtful forum dedicated to empirical and interdisciplinary investigation of
translation. The contributions tie in well with one another thanks to their
methodological unity, which makes the book interesting to researchers who are
currently working on or planning to undertake a project in quantitative
translation studies. The only paper that somewhat breaks the pattern is
Pontrandolfo’s project on phraseology in legal texts, because it uses a
comparable rather than a parallel corpus. As a result, it is oriented more
towards developing a teaching resource than towards answering fundamental
questions about translation, and ultimately it is not well situated within a
translation studies framework.

The volume illustrates the trend in translation and interpreting studies to
shift attention from the product of translation towards its process, a
research objective which until now has mainly interested cognitive
linguists. In this vein, Serbina et al.’s paper proposes an excellent way to
query the translation process on the level of observable tokens that can be
compiled into a corpus. Zubillaga et al.’s work on a corpus of parallel
German/Spanish/Basque translations feeds into this research strand from a
different direction, giving corpus analysts insight into the influence of the
intermediate language version and of the translator’s dominant language.

I recognise a further value in the language combinations chosen by the
authors. It is especially cheering to see European minority languages, such as
Basque, investigated within translation studies and through a corpus lens.
Similarly, the papers dealing with major languages such as Spanish,
Italian, and Greek (Mouka et al., Pontrandolfo) fill a gap in corpus-based
studies of societally relevant topics, e.g. heterophobic language and legal
judgements, which to date have mostly been English-based (cf., for example,
Baker et al. 2013 on the representation of Islam in the British press).

The work-in-progress nature of these papers, however, also gives rise to
certain drawbacks. Given that quantitative analysis is the key strength of a
corpus approach, it would have been desirable to see some systematic,
quantified results, which the authors withhold because their projects are
still ongoing (see Serbina et al., Mouka et al., Zubillaga et al.). Some
methodological decisions are skimmed over, although they appear quite critical
to the study design. For example, Lapshinova-Koltunski’s paper makes one
wonder about the reliability of corpus-based findings when the chosen
operationalisation of the analytical categories is questionable. Is it
justifiable to define normalisation solely through the proportion of nominal
to verbal phrases? The author does remark that these definitions have
limitations. It seems to me, however, that the limitations are too severe to
justify talking of global categories of translationese, and it would have been
more appropriate to speak of individual linguistic features instead.

Finally, the quick production process, which brings the articles to the reader
in double time, resulted in some language and formatting issues. Nevertheless,
they do not affect readability or understanding in any important way. On the
whole, “New directions in corpus-based translation studies” is an excellent
publication that gives us a window into the ongoing research in CBTS and
undoubtedly deserves the attention of translation scholars, among them those
interested in literary translation, machine translation, legal translation,
and corpus design. The book can also serve as supplementary reading for
courses in translation studies to bring students up to date on the state
of translation research; however, students would need to consult a more
introductory text to familiarize themselves with the basics.

REFERENCES

Baker, Paul, Costas Gabrielatos & Tony McEnery. 2013. Discourse analysis and
media attitudes. The representation of Islam in the British press. Cambridge:
Cambridge UP.

Chesterman, Andrew. 2004. “Hypotheses about translation universals.” In
Hansen, Gyde, Kirsten Malmkjær & Daniel Gile (eds.), Claims, changes and
challenges in translation studies. Selected contributions from the EST
Congress, Copenhagen 2001, 1-13. Amsterdam: Benjamins.

D’haeyere, Laurence. 2010. Non-prototypical agents with proto-agent requiring
predicates: A corpus study of their translation from English into Dutch. Gent:
Hogeschool Gent.

Straniero Sergio, Francesco & Caterina Falbo (eds.). 2012. Breaking ground in
corpus-based interpreting studies. Bern: Peter Lang.


ABOUT THE REVIEWER

Daria Dayter is a postdoctoral researcher at the University of Basel,
Switzerland. Her habilitation project is a corpus-based investigation of
simultaneous interpreting in the Russian-English language pair. Daria Dayter's
other research interests include pragmatics of CMC, youth language, and
teaching applications of new media.




