26.3511, Review: Computational Ling; Translation: Chan (2014)

Wed Aug 5 17:37:41 UTC 2015

LINGUIST List: Vol-26-3511. Wed Aug 05 2015. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org

Editor for this issue: Sara Couture <sara at linguistlist.org>
================================================================

Date: Wed, 05 Aug 2015 13:37:24
From: Onna Nelson [onna.nelson at gmail.com]
Subject: Routledge Encyclopedia of Translation Technology

Discuss this message:
http://linguistlist.org/pubs/reviews/get-review.cfm?subid=36034117

Book announced at http://linguistlist.org/issues/25/25-4631.html

EDITOR: Sin-Wai Chan
TITLE: Routledge Encyclopedia of Translation Technology
PUBLISHER: Routledge (Taylor and Francis)
YEAR: 2014

REVIEWER: Onna Adele Nelson, University of California, Santa Barbara

Review's Editor: Helen Aristar-Dry

SUMMARY

''The Routledge Encyclopedia of Translation Technology'' begins with a brief introduction to translation technology as it relates to computer science, pedagogy, and linguistics. Translation technology is defined broadly to include both machine translation and computer-aided translation, as well as the skills and tools related to them. The editor, Chan Sin-wai, argues that there have been no major reference works published on the subject since 2004. Therefore, this first edition of ''The Routledge Encyclopedia of Translation Technology'' is a major contribution to a rapidly growing and rapidly changing field. As a general reference book, its target audience includes creators of translation technologies (computer scientists, computational/corpus linguists, and others) as well as users of translation technologies (international corporations, foreign language teachers, and others).

Part I, ''General issues in translation technology'', covers the history of the field, defines key concepts, and introduces various approaches to machine translation (MT) and computer-aided translation (CAT). 

Chapter 1, ''The development of translation technology: 1967-2013'' by Chan Sin-wai, reviews the history of the field. The field began with the first conference on machine translation in 1952 and the first demonstration of Russian-to-English machine translation in 1954. By the 1960's, it had become clear that ''fully automatic high-quality machine translation'' was several decades from being realized, and research shifted towards computer-aided translation, which would be ''better, quicker, and cheaper'' (Chan, 2015: 4). By the mid-1980's, the limitations of computer memory and storage space posed less of a problem for machine translation, and both commercial and academic researchers took up machine translation research again. As personal computers, word processing, and the Internet developed in the 1990's, machine translation went through a period of rapid growth. This growth spread to more and more languages throughout the 2000's and continues today.

Chapter 2, ''Computer-aided translation: Major concepts'' by Chan Sin-wai, defines the key components of computer-aided translation systems. These include ''simulativity,'' ''emulativity,'' ''productivity,'' ''compatibility,'' ''controllability,'' ''customizability,'' and ''collaborativity.'' This chapter makes use of several diagrams, flowcharts, tables, and other visuals to aid in defining these concepts.

Chapter 3, ''Computer-aided translation: Systems'' by Ignacio Garcia, describes computer-aided translation (CAT) systems as translation memory (TM) databases. These CAT systems match new words and phrases to previous translations, stored in memory as references, to suggest new translations. These matches can either be exact, verbatim matches or ''fuzzy'' matches, based on probabilities. The chapter then describes several practical features of CAT systems, such as quality assurance controls, features for specific projects such as localization, management of large databases, and integration with other applications such as websites or word processing software.
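The distinction between exact and ''fuzzy'' matches can be illustrated with a minimal sketch. It is not taken from the chapter: the toy memory, the function name lookup, and the 0.75 threshold are all assumptions made for illustration, and where the summary above speaks of probabilities, the sketch substitutes a plain string-similarity ratio from Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: source segment -> stored translation.
TM = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Open the file menu.": "Ouvrez le menu Fichier.",
}

def lookup(segment, memory, threshold=0.75):
    """Return (match_type, stored_source, translation, score), or None.

    The threshold is an illustrative assumption, not a value from the book.
    """
    # Exact match: the new segment is stored verbatim in the memory.
    if segment in memory:
        return ("exact", segment, memory[segment], 1.0)

    # Fuzzy match: pick the stored segment most similar to the new one.
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= threshold:
        return ("fuzzy", best_source, memory[best_source], best_score)
    return None  # nothing similar enough; translate from scratch

exact_hit = lookup("Press the power button.", TM)
fuzzy_hit = lookup("Press the red power button.", TM)
```

In this sketch a fuzzy hit only proposes the stored translation as a starting point; the translator would still need to revise it to account for the differing words.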

Chapter 4, ''Computer-aided translation: Translator training'' by Lynne Bowker, is targeted at translators and translation companies who may be interested in learning about the benefits of CAT. It discusses a variety of software options for translators and makes suggestions for training translators to use them. This chapter also considers some of the difficulties translators and translator trainers face when learning these technologies.

Chapter 5, ''Machine translation: General'' by Liu Qun and Zhang Xiaojun, reviews the history of MT again, starting in the 1950's through the 2000's. It then provides a brief overview of various approaches to MT, including the interlingua approach, rule-based machine translation, example-based machine translation, statistical machine translation, and hybrid systems. The chapter also briefly covers methods of translation quality and accuracy evaluation.

Chapter 6, ''Machine translation: History of research and applications'' by W. John Hutchins, again reviews the history of MT. Unlike previous chapters, this review focuses on the theoretical background behind research decisions and gives more technical details about each major accomplishment made in the field. This chapter also delves into the details of corpus-based translation and modern speech translation applications.

Chapters 7 through 11 provide brief overviews of five different approaches to machine translation: ''Example-based machine translation,'' ''Open-source machine translation technology,'' ''Pragmatics-based machine translation,'' ''Rule-based machine translation,'' and ''Statistical machine translation.'' Each approach has its own pros and cons, which are discussed in the corresponding chapter.

Chapter 12, ''Evaluation in machine translation and computer-aided translation'' by Kit Chunyu and Billy Wong Tak-ming, explores methods for quantifying the effectiveness of translation technology systems. It again begins with a brief definition of terms, a brief history of the field, and a brief overview of various systems. It then recommends general evaluative criteria to use when comparing two or more translation technology systems, such as error analysis and the use of an intelligibility scale by human or automatic judges.

Chapter 13, ''The teaching of machine translation: The Chinese University of Hong Kong as a case study'' by Cecilia Wong Shuk Man, begins with a history of the field before detailing a case study in the teaching of translation technology. The chapter outlines the curriculum for two courses, including a list of topics covered, an overview of learning outcomes, and hands-on activities for each. 

Part II, ''The national/regional developments of translation technology'', discusses the history of the field in China, Canada, Hong Kong, Japan, South Africa, the Netherlands, and the United States.

Chapters 14 through 23 each provide a brief historical overview of the field as it developed in various countries. Early history in particular varied widely from country to country, but since the invention of the Internet, collaboration has become more commonplace. Each country also faced unique challenges based on its native language(s) and the languages it chose to target.

Part III, ''Specific topics in translation technology,'' focuses more on academic and technical issues encountered by people working in the field, such as corpora selection and part-of-speech tagging. 

Chapter 24, ''Alignment'' by Lars Ahrenberg, discusses the process of aligning the words and sentences from one language with the words and sentences of another language. Languages rarely have a one-to-one mapping as there are differences in word order, affixes, and the number of words it takes to express any given concept. Whole sentences, words, and individual morphemes may need to be aligned for proper translation to take place. This chapter discusses some of the common notation used in alignment and several of the statistical methods, models, and other algorithms used when aligning texts.

Chapter 25, ''Bitext'' by Alan K Melby, Arle Lommel, and Lucia Morado Vazquez, discusses the use of bitext as a tool for translation. Similar to alignment or glossing, the translator takes chunks the size of words or phrases and places the original and the translation side-by-side to create a bilingual text, or ''bitext.'' Bitext is especially useful in computer-aided translation, as a human translator works with small chunks of language data with the help of machine translation tools. Bitext is also useful since it can be presented in a number of visual ways. This chapter includes appendices which demonstrate the use of bitext in localization systems.

Chapter 26, ''Computational lexicography'' by Zhang Yihua, describes the process of compiling digital bilingual dictionaries using computational methods. The chapter briefly describes some of the advantages of computer-aided lexicography, such as data storage, and then delves into the use of corpora and databases to generate dictionaries. 

Chapter 27, ''Concordancing'' by Federico Zanettin, describes the use of contextual information in machine translation. A concordance usually provides the language surrounding a given word or phrase. Computer-generated concordances help translators understand the kinds of syntactic and pragmatic patterns in which a word occurs. This chapter also briefly covers the use of regular expressions to aid in searching corpora for concordances.
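As a rough illustration of what a regular-expression concordance search looks like, the following sketch prints each match with a window of surrounding text (a keyword-in-context display). The function name concordance, the sample text, and the 30-character window are assumptions chosen for this example and do not come from the chapter.

```python
import re

def concordance(text, pattern, width=30):
    """Return one keyword-in-context line per regex match in text."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Up to `width` characters of context on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical one-paragraph "corpus" for demonstration only.
sample = ("The translator checked the translation memory. "
          "Each translated segment was stored, and later translations reused it.")

# Match any word beginning with "translat" (translator, translation, ...).
hits = concordance(sample, r"\btranslat\w*")
for line in hits:
    print(line)
```

A single pattern thus retrieves several related word forms at once, which is the kind of search a plain keyword lookup cannot do.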

Chapter 28, ''Controlled language'' by Rolf Schwitter, describes the use of restricted language to aid in language learning and in translation. A controlled language is a simpler subset of a language from which more complex concepts can be built. For example, the controlled language Basic English ''consists of only 850 words'' (p. 451). One method of translation is to first translate the source language into its basic form, and then translate it into the basic form of the target language. This can ''limit lexical ambiguity'' (p. 453). The chapter goes on to discuss several examples of controlled languages and their various uses. 

Chapter 29, ''Corpus'' by Li Lan, discusses the history of corpora, which are bodies of texts. Machine-readable corpora became popular tools for linguists starting in the 1990's and today there are dozens of corpora ranging in size from a few hundred thousand words to billions of words. This chapter lists some of the larger corpora commonly used in various languages, then describes how they are used to inform translation technology, including both qualitative and quantitative research.

Chapter 30, ''Editing in translation technology'' by Christophe Declercq, looks into several aspects of editing. Machine translation and even computer-aided translation are not always perfect, and their output must typically go through a quality assurance review, as well as a proofreading and editing process. This chapter covers some of the practical aspects of using software to edit, proofread, and ensure quality.

Chapter 31, ''Information retrieval and text mining'' by Kit Chunyu and Nie Jian-Yun, discusses how information is stored in databases and retrieved through a process called text mining. Large corpora, including the World Wide Web, are indexed and people can use search engines to submit queries to find information which has been indexed. Probabilistic models help these databases sort through all the data to find the information which the user is most likely to be searching for. This chapter discusses the mathematical formulae behind these models at length before discussing how these models can be improved via user feedback. The chapter then discusses how this data storage and retrieval is relevant in cross-language information retrieval, particularly as it relates to bilingual data, translation data, and even other applications such as DNA sequencing or medical data.

Chapter 32, ''Language codes and language tags'' by Sue Ellen Wright, discusses the codes and tags which tell a machine what language a particular text is written in. The chapter covers the history and usage of several different competing standards for language codes. 

Chapter 33, ''Localization'' by Keiran J. Dunne, covers the topic of converting digital content, such as user interfaces or websites, into versions which are appropriate for different regions of the world, including translating the language or dialect, considering cultural differences, and considering fonts necessary for various scripts, character sets, and glyphs. The localization process often involves a machine translation step. This chapter details the process of localization using special software, such as Scribble, which aids translators in converting a program from one language to another.

Chapter 34, ''Natural language processing'' by Olivia Kwong Oi Yee, discusses natural language processing (NLP), which uses human languages to aid in human-computer interaction, as with voice recognition, speech-to-text, and text-to-speech software. Machine translation is cited as one of the most well-known applications of NLP. This chapter discusses the importance of corpora as well as statistical and machine learning algorithms. It then demonstrates common tasks carried out by NLP researchers, such as pre-processing, word sense disambiguation, and alignment. The author then demonstrates how these tasks can be applied to machine translation.

Chapter 35, ''Online translation'' by Federico Gaspari, discusses the role of translators on the web, as well as the role of the web in translation. This chapter takes a historical approach, describing developments from the early 1990's until the present, including resources like online dictionaries, social media, parallel corpora, and other databases. This chapter also describes some of the tools available online for translators such as the Google Translator Toolkit, as well as marketplaces for translators to find freelance work. Finally, the author discusses some of the most recent developments in online translation, such as massive collaborative and crowdsourced projects like Wikipedia.

Chapter 36, ''Part-of-speech tagging'' by Felipe Sánchez-Martínez, discusses some of the difficulties machine translation systems have with identifying the part-of-speech (PoS) of words, particularly unknown words. The chapter covers several popular approaches to automatic PoS-tagging, including hidden Markov models, maximum entropy models, and support vector machines. Automatic tagging is helpful for machine translation systems, since whether a word is used as a noun or a verb can change its definition and therefore its translation.

Chapter 37, ''Segmentation'' by Freddy Y. Y. Choi, describes the process of dividing a large body of text into smaller, ''topically coherent parts'' (p. 605), much like chapters in a book. Segmenting a text into smaller chunks based on their topics can improve the accuracy of automatic translation. Segmentation involves clustering texts into groups based on metrics such as semantic similarity, which itself requires normalization and other pre-processing.

Chapter 38, ''Speech translation'' by Lee Tan, describes the process of taking speech input in one language and producing speech output in another language. The history of speech translation is covered in this chapter, followed by a discussion of the major issues in speech translation. Speech-to-speech translation combines speech recognition, machine translation, and speech synthesis, each of which has its own challenges.

Chapter 39, ''Technological strides in subtitling'' by Jorge Díaz Cintas, discusses the multimedia nature of the internet and the importance of subtitling and translating audiovisual media such as television programs. This chapter provides a brief overview of subtitling software, including crowdsourced subtitling and translation efforts on the world wide web. The author suggests machine translation technology can assist with subtitling and translating the massive amount of data on websites like YouTube.

Chapters 40 through 42 all deal with various databases and management systems that aid creators of machine translation systems and users of computer-aided translation systems.

Lastly, ''The Routledge Encyclopedia of Translation Technology'' ends with an index of major key terms and concepts to help direct the reader to pages of interest.

EVALUATION

Overall, this is a good general reference book to have on hand in companies and academic departments that deal with translation, cross-linguistic communication, computational linguistics, or translation technologies. Part III in particular has the most practical material, although Parts I and II may be of interest to many researchers. However, it should be noted that each entry is a longer work, and this book would therefore not be ideal as a quick reference guide. 

With fifty different contributors, ''The Routledge Encyclopedia of Translation Technology'' provides a diverse set of perspectives on this growing field. However, because there are so many different voices, there is a lack of continuity across chapters. For example, although the book claims to be aimed at ''general readers who are interested in knowing, learning, and using new concepts and skills of translation technology'' (Chan, 2015: xxviii), some of the chapters assume substantial familiarity with the field, including its acronyms and jargon. The chapters themselves are rather lengthy for a reference work, about 10 to 25 pages each, with extensive references and sometimes appendices as well. The book is overall more akin to an edited volume or collection of journal articles than an encyclopedia. The advantage of this is that each article could be read on its own without reference to the others. However, there is repeated information throughout the book. For example, several chapters repeat general information about the history of the field, despite Chapter 1 being dedicated to this topic. In a collection of articles, this would be expected. However, one expects encyclopedia entries to be concise, focused works.

The chapters in Part I are aimed at the layperson interested in the history, theory, and general applications of translation technology. These chapters help define key terms and frame the field of translation technology in terms of other related fields such as computational linguistics or translation studies. Part II is aimed at people interested in particular languages or countries, with a distinct emphasis on the history of the field as well as lists of resources relevant to specific locales. These chapters would be most useful for people working with particular languages or seeking to work on a particular localization project. Most of the chapters in Part III are aimed at people who are either actively creating or actively using machine translation technologies. Chapters 2, 3, 4, 30, 33, 35, 39, 40, 41, and 42 would be especially useful for translators who seek to work with technology. The remaining chapters in Part III, along with chapters 7 through 11, would be especially useful for computational linguists or others who wish to create or improve these technologies.

ABOUT THE REVIEWER

Onna Nelson is a PhD Candidate at the University of California, Santa Barbara. She is interested in research that applies corpus linguistic methods to areas such as emerging phenomena on social media, linguistic typology, and cognitive linguistics.
