16.1724, Review: Corpus Ling: Barnbrook et al. (2004)



LINGUIST List: Vol-16-1724. Tue May 31 2005. ISSN: 1068 - 4875.

Subject: 16.1724, Review: Corpus Ling: Barnbrook et al. (2004)

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Naomi Ogasawara <naomi at linguistlist.org>
================================================================  

What follows is a review or discussion note contributed to our 
Book Discussion Forum. We expect discussions to be informal and 
interactive; and the author of the book discussed is cordially 
invited to join in. If you are interested in leading a book 
discussion, look for books announced on LINGUIST as "available 
for review." Then contact Sheila Dooley at collberg at linguistlist.org. 

===========================Directory==============================  

1)
Date: 27-May-2005
From: Mikhail Mikhailov < Mihail.Mihailov at uta.fi >
Subject: Meaningful Texts 

	
-------------------------Message 1 ---------------------------------- 
Date: Tue, 31 May 2005 16:51:16
From: Mikhail Mikhailov < Mihail.Mihailov at uta.fi >
Subject: Meaningful Texts 
 

EDITORS: Barnbrook, Geoff; Danielsson, Pernilla; Mahlberg, Michaela
TITLE: Meaningful Texts
SUBTITLE: The Extraction of Semantic Information from Monolingual 
and Multilingual Corpora
SERIES: Corpus and Discourse
PUBLISHER: Continuum International Publishing Group Ltd
YEAR: 2004
Announced at http://linguistlist.org/issues/15/15-3578.html


Mikhail Mikhailov, School of Modern Languages and Translation 
Studies, University of Tampere, Finland

[This review contains ISO-8859-2 (Latin 2) and Cyrillic characters, and 
is best viewed using Unicode encoding.  -- Eds.]

DESCRIPTION/SUMMARY

This volume is an edited collection of papers on corpus linguistics and 
corpus-based studies. Many of the papers were originally presented at the 
5th and 6th TELRI seminars, held in Ljubljana, Slovenia (2000) and 
Bansko, Bulgaria (2001). The papers present research on material from 
different languages, the topics discussed vary considerably, and 
different approaches are used. All in all, there are 21 papers in the 
volume plus an introduction. The book is divided into two parts: part I is 
devoted to monolingual corpora, and part II deals with multilingual 
corpora. 

PART I. MONOLINGUAL CORPORA

1. Extracting concepts from dynamic legislative text collections (Gaël 
Dias, Sara Madeira, and José Gabriel Pereira Lopes), pp. 5-16.
This paper discusses the problems of automated extraction of multiword 
terms from legal texts. The software developed by the authors is used 
for processing a dynamic collection of raw texts in Portuguese. The 
SENTA (Software for the Extraction of N-ary Textual Associations) 
module extracts multiword combinations which are likely to be terms. 
Both contiguous and non-contiguous terms can be extracted. The basic 
principle is similar to that of most research in the field: the observed 
frequency of co-occurrence of the elements of a string is compared with 
the statistically expected frequency. A web-based interface for the 
module has been developed.
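
To illustrate the general idea behind such measures, here is a minimal 
Python sketch comparing the observed frequency of a contiguous bigram 
with the frequency expected under independence; it is a generic 
illustration with a simple ratio score of my own choosing, not the 
authors' SENTA implementation:

    from collections import Counter

    def bigram_termhood(tokens):
        # Score contiguous bigrams by how much more often they occur
        # than would be expected if the two words were independent.
        # Generic sketch; not the SENTA algorithm itself.
        n = len(tokens)
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        scores = {}
        for (w1, w2), observed in bigrams.items():
            expected = unigrams[w1] * unigrams[w2] / n  # count expected under independence
            scores[(w1, w2)] = observed / expected      # > 1 means over-represented
        return scores

    tokens = "the court of appeal ruled that the court of appeal erred".split()
    for pair, score in sorted(bigram_termhood(tokens).items(), key=lambda kv: -kv[1])[:3]:
        print(pair, round(score, 2))

Real systems replace the simple ratio with a proper association measure 
and handle non-contiguous combinations as well, but the comparison of 
observed and expected frequencies is the common core.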

2. A diachronic genre corpus: problems and findings from the 
DIALAYMED-Corpus (DIAchronic Multilingual Corpus of LAYman-
oriented MEDical texts) (Eva Martha Eckkrammer), pp. 17-30.
The paper is concerned with the issues of compiling diachronic 
corpora. The necessity of including texts from different chronological 
periods presents many difficulties for the compiler: 1) genuine oral data 
is not available for the early periods, 2) there are problems in text 
classification and sampling, 3) texts of certain genres are lacking for 
certain periods. The corpus presented in the paper is DIALAYMED, a 
multilingual diachronic corpus of medical information texts (self-
counseling texts). The corpus comprises six languages (Spanish, 
French, Italian, Portuguese, German, English) and is divided into 
seven periods, from the Late Middle Ages to the 21st century. 
DIALAYMED can be used both for the study of changes within the 
individual languages of the corpus and for cross-cultural research.

3. Word meaning in dictionaries, corpora and the speaker's mind 
(Christiane Fellbaum with Lauren Delfs, Susanne Wolff and Martha 
Palmer), pp. 31-38.
The authors of the paper point out the importance of merging 
dictionaries and text corpora. Semantic tagging based on dictionary 
definitions is one possible solution to the problem. It is clear that 
manual semantic annotation of a large text corpus is an enormously 
difficult and expensive task, and automated semantic tagging is much 
needed. Nevertheless, it is important first to study the results produced 
by human tagging. It has been found that there is a rather high rate of 
disagreement between different human annotators. Most probably, the 
reason lies in traditional dictionary models of meaning representation. 
The authors claim it is unlikely that human or machine annotators can 
successfully perform one-to-one mapping of word senses for 
polysemous words. They suggest it would be more reasonable to aim 
at selecting clusters of senses, or broader senses, for contexts 
where the meaning is unclear. 

4. Extracting meaning from text (Gregory Grefenstette), pp. 38-47.
There are two approaches to the automated extraction of meaning from 
text: the first aims at imitating human understanding, the second is 
based on statistical methods without reference to knowledge 
representation. This paper considers the second approach. Automated 
analysis of word lists and text structure can provide the user with 
answers to various important questions: What kind of text is this? What 
other texts are like this? What is the text about? How good is the text? 
Annotated corpora can supply linguists with data on morphology and 
syntax.

5. Translators at work: a case study of electronic tools used by 
translators in industry (Riitta Jääskeläinen and Anna Mauranen), pp. 
48-53.
This paper reviews the use of software tools by Finnish translators. 
The study was part of the international project SPIRIT (Supporting 
Peripheral Industries with Realistic Applications of Internet-based 
Technology). It was found that most translators use basic tools 
like electronic dictionaries and the Internet. Terminology management 
software, translation memories, and corpus tools were virtually 
unknown to them. An experiment with a group of in-house translators 
showed that members of this profession are rather conservative and 
prefer familiar software. Jääskeläinen and Mauranen suggest that 
there should be more cooperation between the developers of software 
for translators and the end-users.

6. Extracting meteorological contexts from the newspaper corpus of 
Slovenian (Primož Jakopin), pp. 54-61.
In this paper Jakopin presents methods for identifying weather 
forecasts in a corpus of newspaper texts and for extracting significant 
data from such contexts. Weather forecast texts were relatively easy 
to extract automatically from the corpus because of their fixed 
length and standard headings. A quantitative study of the 
meteorological texts shows that quite a few lexemes appear mostly in 
weather forecasts and not in the other texts of the corpus. However, 
word bigrams prove to be much more interesting: eight two-word 
terms were extracted which occur in meteorological texts with a 
probability of over 99 per cent. Finally, there are clichéd 
sentences in weather forecasts that have a rather high frequency and 
occur mostly in meteorological texts.
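
The 99 per cent figure can be read as a conditional probability: the 
share of a bigram's occurrences in the whole corpus that falls inside 
the meteorological subcorpus. A minimal sketch, assuming this is how 
the rate was computed (the paper's exact procedure may differ):

    from collections import Counter

    def subcorpus_specific_bigrams(met_tokens, corpus_tokens, threshold=0.99):
        # Share of each bigram's corpus-wide occurrences that fall inside
        # the meteorological subcorpus; keep bigrams above the threshold.
        # met_tokens is assumed to be part of corpus_tokens.
        met = Counter(zip(met_tokens, met_tokens[1:]))
        total = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        return {bg: met[bg] / total[bg]
                for bg in met
                if total[bg] and met[bg] / total[bg] > threshold}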

7. The Hungarian possibility suffix "-hat/-het" as a dictionary entry 
(Ferenc Kiefer), pp. 62-69.
This paper discusses the problems of the lexicographical description of 
modal words, with the Hungarian possibility suffix as an example. 
Kiefer demonstrates that the entry from a traditional dictionary is 
inadequate both theoretically and descriptively. A small text corpus 
gives much more information about the use of the possibility suffix and 
the kinds of possibility it expresses. Still, the author argues that a good 
entry cannot be based exclusively on corpus material; a good theory is 
needed for interpreting usage examples. In the case of the Hungarian 
suffix, a clear distinction between different kinds of modality (epistemic, 
deontic, circumstantial, boulomaic, and dispositional), together with a 
distinction between semantic and pragmatic function, would help to 
develop a more consistent lexicographical entry.

8. Dictionaries, corpora and word-formation (Simon Krek, Vojko 
Gorjanc and Marko Stabej), pp. 70-82.
Like the previous one, this paper is concerned with issues of 
lexicographical description. The authors study the principles of 
presenting English adverbs derived from adjectives and ending in -
ly. The data from dictionaries was compared with data from the BNC 
and the Google search engine. Many -ly adverbs are registered as run-on 
entries, although derivatives do not always take on all the meanings of 
their bases. Sometimes high-frequency derivatives are promoted to 
headwords. Compilers of bilingual dictionaries often have to seek 
different solutions because of the necessity of providing translation 
equivalents, which often reveal differences in meaning 
between the two words. The authors suggest that corpora and the 
Internet can give lexicographers additional data, which would help in 
evaluating the importance of lexical items and deciding on their 
status in the dictionary.

9. Hidden culture: using the British National Corpus with language 
learners to investigate collocational behaviour, wordplay and culture-
specific references (Dominic Stewart), pp. 83-95.
Stewart shows ways of using corpora in language learning as a 
source of culture-specific information. Some kinds of information are 
fairly difficult to obtain from traditional dictionaries and encyclopedias. It 
is particularly difficult to find the clue to a wordplay in which some 
elements of an idiom have been replaced by other words (e.g. special 
queue <= special brew). Stewart suggests looking up recurring 
collocates in corpora. The method seems to work well even when the 
retained elements are high-frequency words (as in the example above). 
The use of corpora even makes it possible to look up idioms by 
structural pattern.

10. Language as an economic factor: the importance of terminology 
(Wolfgang Teubert), pp. 96-106.
Teubert focuses on the importance of terminology in the modern world. 
Standardization of terminology is very important for the development of 
technologies, as many projects are carried out by international teams. 
Although English is used more and more as a lingua franca, the 
development of national scientific discourse and national terminologies 
remains part of technological progress. Therefore, there remains a 
great need for updating multilingual terminology banks and collecting 
multilingual text corpora. Special attention should be paid to 'soft 
terminology', i.e. new terms which have already become part of 
discourse but are not yet standardized. The development of knowledge 
extraction technologies would help to 'filter out' such terms and compile 
lists of them.

11. Lemmatization and collocational analysis of Lithuanian nouns 
(Andrius Utka), pp. 107-114.
In this paper, issues of lemmatization are discussed. On the one hand, 
lemmatization is a very useful procedure for bringing together all the 
word forms of a lexeme; on the other hand, it is sometimes criticized 
because important information on the individual constituents of the 
lemma becomes unavailable. The Lithuanian language is heavily 
inflected, which makes the use of a lemmatized text corpus much more 
convenient. Nevertheless, the researcher should not forget about the 
different forms of a word and their usage. A case study of the 
word "teisybė" ('truth') demonstrates that different forms have different 
frequencies and different collocations. Thus, analysis of the lemma 
alone gives only a generalized profile, while studying each separate 
form gives more precise information on the usage of the word.

12. Challenging the native-speaker norm: a corpus-driven analysis of 
scientific usage (Geoffrey Williams), pp. 115-130.
Williams centers on the problem of non-native-speaker English. More 
and more researchers whose native language is not English submit 
their papers in English, while the proportion of native speakers of 
English is declining. The situation in technical writing is probably the 
most difficult. A case study of the use of the relative 
pronouns "which/that" in a corpus of plant biology research articles has 
shown that in many cases an avoidance strategy is chosen, i.e. the 
writers tend to use simple constructions, avoiding relative clauses. 
Williams emphasizes the importance of compiling specialized corpora 
as well as learner corpora, which would help to improve the level of 
technical writing.

PART II. MULTILINGUAL CORPORA
13. Chinese-English translation database: extracting units of 
translation from parallel texts (Chang Baobao, Pernilla Danielsson and 
Wolfgang Teubert), pp. 131-142.
This paper examines methods of extracting translation equivalents 
from parallel texts. The research is carried out on a Chinese-English 
parallel corpus of about 17 million running words per language. 
Translation correspondences detected by software should be 
unambiguous, which is why the authors suggest that the best solution 
is to seek correspondences between multiword units. The texts of the 
corpus are therefore chunked into multiword units. Both the chunking 
and the search for equivalents are done using statistical techniques. 
Four different statistical scores were tested (MI, Dice, log-likelihood, 
chi-square), and it was found that log-likelihood and chi-square 
achieved better accuracy than the other two coefficients. The precision 
and recall of the software were improved by checking syntactic 
patterns.
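
The four scores are standard association measures computed from a 
2x2 contingency table of co-occurrence counts. The textbook formulas 
are sketched below; the variable names and toy numbers are mine, and 
the authors' exact definitions may differ in detail:

    import math

    def association_scores(a, f_s, f_t, n):
        # a   = aligned segments containing both units
        # f_s = segments containing the source unit, f_t = the target unit
        # n   = total number of aligned segments
        b, c = f_s - a, f_t - a
        d = n - f_s - f_t + a
        observed = [a, b, c, d]
        expected = [f_s * f_t / n, f_s * (n - f_t) / n,
                    (n - f_s) * f_t / n, (n - f_s) * (n - f_t) / n]
        mi = math.log2(a * n / (f_s * f_t))      # (pointwise) mutual information
        dice = 2 * a / (f_s + f_t)               # Dice coefficient
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        ll = 2 * sum(o * math.log(o / e)         # log-likelihood (G2)
                     for o, e in zip(observed, expected) if o > 0)
        return {"MI": mi, "Dice": dice, "chi2": chi2, "LL": ll}

    print(association_scores(a=30, f_s=40, f_t=50, n=10000))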

14. Abstract noun collocations: their nature in a parallel English-Czech 
corpus (Frantisek Cermák), pp. 143-153.
Cermák shows in his article that there are differences in functioning 
between abstract and concrete nouns. A contrastive analysis of 
abstract nouns in Orwell's "1984" and its Czech translation was 
performed. It was found that there is no direct correspondence 
between items of the source and target texts: verbs can be translated 
as nouns, nouns as adjectives, etc. The study of the verbal collocational 
patterns of ACTION, EMOTION and LANGUAGE abstract nouns 
demonstrates the following tendencies: 1) inchoative verbs were the 
most typical collocates for all three groups of abstract nouns, 2) 
terminative verbal collocations were the least typical, and the 
LANGUAGE nouns seem to avoid the terminative phase, and 3) there 
is a certain asymmetry between English and Czech noun collocations.

15. Parallel corpora and translation studies: old questions, new 
perspectives? Reporting "that" in Gepcolt: a case study (Dorothy 
Kenny), pp. 154-165.
Comparable corpora are used extensively in translation studies 
nowadays; the main issue of current research is the language of 
translations in contrast to authentic language (see e.g. Baker 1993, 
Laviosa 1998, cf. Mauranen and Jantunen 2005). Kenny shows in her 
paper that it is difficult to explain findings from comparable corpora 
using only the texts of translations, without comparing them to the 
original texts. That is why parallel corpora should be used together with 
comparable ones. A cross-language comparison of the use of the 
optional German connective "dass" and the optional English 
connective "that" in a German-English parallel corpus of literary texts 
(Gepcolt) demonstrates that it is difficult to claim a direct influence of 
the source text on the language of the translation. 

16. Structural derivation and meaning extraction: a comparative study 
of French/Serbo-Croatian parallel texts (Cvetana Krstev and Dusko 
Vitas), pp. 166-178.
This article shows the importance of structural derivation in Serbo-
Croatian and the necessity of taking it into account in linguistic 
software applications. Traditional lemmatization (grouping only the 
inflectional forms of the same lexeme) narrows the results of a 
search. The authors suggest expanding the inflectional classes 
of nouns so that various kinds of structural derivation (diminutives, 
augmentatives, feminine forms, possessive adjectives, etc.) are also 
included in augmented entries. This improves search results in 
parallel corpora and makes the search for translation equivalents more 
accurate.

17. Noun collocations from a multilingual perspective (Ruta 
Marcinkeviciene), pp. 179-187.
The topic of this paper is close to that of Cermák's paper in this 
volume. Marcinkeviciene studies a parallel concordance of the English 
noun "memory" in Orwell's "1984" and six translations of the novel. 
Special attention is paid to the verbal collocations of "memory" and its 
equivalents in the other languages. The research demonstrates that 
translators in most cases preserve the collocational patterns of the 
target language rather than trying to keep the collocations of the 
source language in the translation.

18. Studies of English-Latvian legal texts for Machine Translation 
(Inguna Skadina), pp. 188-195.
The paper deals with the study of ambiguous words in parallel corpora. 
The aim of the research is to find methods of improving the quality 
of machine translation. A study of parallel contexts for several Latvian 
words yielded new translation equivalents not registered in the 
dictionaries; some of these equivalents appeared to be quite frequently 
used. The author suggests that parallel corpora of specialized texts 
are a very valuable source of data for terminology databases and 
machine translation systems. The corpus-based approach is also one 
of the ways to improve the quality of printed dictionaries.

19. The applicability of lemmatization in translation equivalents 
detection (Marko Tadic, Sanja Fulgosi and Kresimir Sojat), pp. 196-
207.
This paper outlines a process for the automated extraction of 
translation equivalents from a Croatian-English parallel corpus. The 
current version of the software is based exclusively on statistical 
methods (pointwise mutual information), although the use of linguistic 
filters is planned for later stages. The algorithm extracts one-to-one 
equivalent pairs; generation of other kinds of equivalent pairs 
(1-2, 2-1, 2-2, ...) is also possible, but the problem of the very large 
number of combinations (combinatorial explosion) remains to be 
solved. The algorithm was tested on both non-lemmatized and 
lemmatized material, and the hypothesis that searching for translation 
equivalents in lemmatized texts is more effective for inflected 
languages like Croatian was confirmed.
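
As a rough illustration of the statistical core of such an approach, the 
sketch below ranks candidate one-to-one pairs from aligned sentence 
pairs by pointwise mutual information, with an optional token-to-lemma 
dictionary standing in for a real lemmatizer; the pipeline described in 
the paper is considerably more elaborate:

    import math
    from collections import Counter
    from itertools import product

    def pmi_pairs(aligned, lemmatize=None, min_count=3):
        # aligned: list of (source_tokens, target_tokens) sentence pairs.
        # lemmatize: optional token -> lemma dict (a stand-in for a real
        # lemmatizer); folding forms together raises pair counts for
        # inflected languages like Croatian.
        norm = (lambda t: lemmatize.get(t, t)) if lemmatize else (lambda t: t)
        src_c, tgt_c, pair_c = Counter(), Counter(), Counter()
        n = len(aligned)
        for src, tgt in aligned:
            src_set = {norm(t) for t in src}
            tgt_set = {norm(t) for t in tgt}
            src_c.update(src_set)
            tgt_c.update(tgt_set)
            pair_c.update(product(src_set, tgt_set))  # the combinatorial explosion lives here
        return sorted(((s, t, math.log2(c * n / (src_c[s] * tgt_c[t])))
                       for (s, t), c in pair_c.items() if c >= min_count),
                      key=lambda x: -x[2])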

20. Cognates: free rides, false friends or stylistic devices? A corpus-
based comparative study (Spela Vintar and Silvia Hansen-Schirra), pp. 
208-221.
Vintar and Hansen-Schirra study cognate words (like EN "sport" vs. 
GE "Sport") in English-German and English-Slovene parallel corpora. 
The research demonstrated that the percentage of cognates in Slovene 
translations from English is quite close to that in translations from 
English into German. However, a comparison with texts originally 
written in German and Slovene shows that the percentage of cognates 
in Slovene translated texts is slightly lower than in original Slovene 
texts, while German translations contain twice as many cognates as 
original German texts. The phenomenon is most likely caused by purist 
tendencies in Slovene, a language of only two million speakers, and by 
the openness of the German language to linguistic influences. A 
comparison of the frequencies of cognates and their 'native' synonyms 
in Slovene and German reference corpora confirms the hypothesis.
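
Cognate detection of this kind usually rests on a string-similarity 
measure. As an illustration only (the paper's actual criterion is not 
given in this summary), here is the longest common subsequence ratio, 
a measure often used for the purpose (cf. Tiedemann 2003):

    def lcsr(w1, w2):
        # Longest common subsequence ratio: length of the longest common
        # subsequence divided by the length of the longer word.
        m, n = len(w1), len(w2)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m):
            for j in range(n):
                dp[i + 1][j + 1] = (dp[i][j] + 1 if w1[i] == w2[j]
                                    else max(dp[i][j + 1], dp[i + 1][j]))
        return dp[m][n] / max(m, n)

    print(lcsr("sport", "Sport".lower()))  # 1.0: the EN/GE pair from the paper
    print(lcsr("sport", "šport"))          # 0.8: lowered only by orthography (EN/SL)

As the second example shows, a naive measure penalizes pairs that 
differ only in orthographic convention; see my comment on this paper 
in the next section.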

21. Trilingual corpus and its use for the teaching of reading 
comprehension in French (Xu Xunfeng and Régis Kawecki), pp. 222-
228.
The paper examines the possibilities of using parallel corpora in 
language teaching. An online English-French-Chinese parallel corpus 
was used in teaching reading comprehension. An experiment with 
three groups of students in Hong Kong showed that the reading 
comprehension skills of the test group improved significantly after six 
weeks of reading trilingual texts on the Web. It is planned to develop 
this learning tool further: a comprehension test and an online 
concordancer will be added.

CRITICAL EVALUATION

The issues discussed in this volume have received a great deal of 
attention in the research of the past decade. A strength of the book is 
that it includes the work of different scholars working with different 
languages; in fact, the papers in this book deal with twelve 
languages. The papers in the volume are fairly short, and most of them 
present the results of case studies. However, the articles are interesting 
to read, and the methods introduced are applicable to different linguistic 
phenomena. I read with special interest the papers by Christiane 
Fellbaum et al. (3), Dominic Stewart (9), Chang Baobao et al. (13), 
Dorothy Kenny (15), Cvetana Krstev and Dusko Vitas (16), Ruta 
Marcinkeviciene (17), and Spela Vintar and Silvia Hansen-Schirra (20).

GENERAL COMMENTARY

1. I understand that it is extremely difficult to bring these very 
diverse contributions together under one title. Still, the title of the 
volume is rather misleading. One would expect this to be a collection 
of papers on automated text processing, semantic tagging, 
disambiguation, translation memories, etc. Unfortunately, only half of 
the articles can be considered to study the issues of "The 
Extraction of Semantic Information from Monolingual and Multilingual 
Corpora"; many are rather loosely related to the subject (e.g. 
chapters 2, 5, 7, 8). The title "Meaningful Texts" without the subtitle 
would have been ambiguous enough to cover all the contributions to 
the volume.

2. The division of the volume into two parts does not seem to work 
well. The editors themselves admit that the same questions are 
discussed in both the 'monolingual' and the 'multilingual' parts, e.g. 
lemmatization and noun collocations (p. 1). Furthermore, the 
DIALAYMED corpus presented in the paper by Eckkrammer is a 
multilingual corpus, and I do not quite understand why that article is 
placed in the first part of the book. The paper by Wolfgang Teubert is 
not devoted exclusively to monolingual corpora either. 

3. The arrangement of the papers leaves the impression of a 
random order. Both Cermák (14) and Marcinkeviciene (17) study 
noun collocations; Chang Baobao et al. (13), Skadina (18), and Tadic 
et al. (19) discuss translation equivalents in parallel corpora. Why did 
the editors not place papers dealing with closely related problems one 
after another? After studying the table of contents once again, I 
realized that the order is alphabetical. In that case, the best solution 
would have been to arrange ALL the papers of the book in 
alphabetical order, without dividing it into two parts.

4. Some papers in the volume reference each other (Marcinkeviciene 
=> Cermák), but the cross-referencing does not seem to have been 
carried out consistently.

COMMENTARY ON SPECIFIC PAPERS

Gaël Dias, Sara Madeira, and José Gabriel Pereira Lopes:
It is not quite clear how effective the method is or what percentage of 
noise the software produces.

Andrius Utka:
The facts about the different frequencies and collocations of the 
different forms of the same word are very interesting and important. 
However, I do not understand why one should give up lemmatization 
just because of this. To my mind, the researcher should combine the 
study of the lexeme with that of its different forms. Besides, 
lemmatization and tagging would help to filter out homonymous forms. 
Thus, studying raw text would be a step back; it is better to improve 
the mark-up of the corpora.

Chang Baobao, Pernilla Danielsson and Wolfgang Teubert:
"Translation Equivalent Pair (TEP): a Translation Equivalent Pair is 
composed of both a source-language Translation Unit and a target-
language Translation Unit, which are mutual translations" (p. 133). I 
am not quite sure that bidirectional equivalence is common even in 
terminology. If the TEPs are extracted from Chinese-English corpus, 
they will be Chinese-English TEPs, not English-Chinese as well. For 
obtaining English-Chinese TEPs one would need English-Chinese 
parallel corpus. I am pretty sure the lists of TEP's obtained from 
English-Chinese and Chinese-English corpora would be different. In 
this respect, the idea of existence of '_mutual_ translations' is rather 
misleading and simplistic.

Cvetana Krstev and Dusko Vitas:
I completely agree with the authors that including derivatives in the 
dictionary entry is very important for the automated text analysis of 
Slavonic languages. However, only noun derivatives are discussed; it 
would be interesting at least to mention verbal and adjectival 
derivation as well (e.g. verbal aspect pairs in Russian present a very 
serious problem for word alignment). Besides, although the problem of 
word alignment for "baron" is solved very elegantly in the paper, it 
would be interesting to discuss the possibilities of word-suffix 
alignment too. Sometimes diminutives and other derivational suffixes 
have explicit correspondences in the translation, and sometimes they 
have to be ignored; e.g. the Russian diminutive 
noun "berezka" 'little birch' can be translated into English 
as "birch", "pretty birch" or "little birch".

Marko Tadic, Sanja Fulgosi and Kresimir Sojat:
The method for extracting translation equivalents introduced in the 
paper seems to generate, at the first stage, many 'impossible' pairs 
like 'article-verb' or 'conjunction-noun', which are of course filtered out 
at later stages but still slow down the process considerably. 
Generating 'reasonable' translation pairs from the very beginning by 
employing linguistic filters would help to avoid the combinatorial 
explosion. The use of stopwords would be an easy and robust solution.

Spela Vintar and Silvia Hansen-Schirra:
The principles of the automated search for cognates formulated in the 
paper look rather simplified; differences in orthographic traditions 
should be taken into account (see e.g. Tiedemann 2003: 50-51).
"According to Baker (1996), translations should be longer than 
originally produced texts in the target language or in the source 
language. The evidence for this tendency may, for example, be found 
in the text length (number of words of the individual texts)" (p. 212). I 
think the idea was originally formulated by Nida and Taber (Nida & 
Taber 1974: 163). Still, it is not quite clear how one can compare the 
lengths of source and target texts written in different languages. For 
example, there are fewer words in translations from English into 
Russian (because there are no articles in Russian); translations from 
English and Russian into Finnish (no articles, few prepositions, plus 
compound words in Finnish) also tend to be 'shorter'. Character counts 
can also be misleading, because word lengths differ from language to 
language. Thus, although the heuristic seems quite reasonable, 
it is not possible to prove the explicitation tendency simply by 
comparing the word or character counts of source and target texts 
(Mikhailov 2003: 165-174).

Xu Xunfeng and Régis Kawecki:
It would be interesting to know what kind of teaching methods were 
used. Or was it just reading parallel texts?

Finally, I noticed some misprints in the volume, e.g. Russian examples 
in the paper by Marcinkeviciene (pp. 180-185). [Examples omitted; see
http://cf.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/pubs/reviews/get-review.cfm?SubID=54914]
 
To sum up, the book can be recommended to those who are 
interested in corpus linguistics and corpus-based translation 
studies, especially if their research is concerned with Slavonic or Baltic 
languages.

REFERENCES

Baker, Mona (1993) Corpora in Translation Studies: An Overview and 
Some Suggestions for Future Research. Target 7(2): 223-43.

Laviosa, Sara (1998) The English Comparable Corpus: A Resource 
and a Methodology, in Bowker, Lynne, Cronin, Michael et al (eds.) 
Unity in Diversity? Current Trends in Translation Studies. Manchester: 
St. Jerome.

Mikhailov, Mikhail (2003) Parallel'nye korpusa xudozhestvennyx 
tekstov: principy sostavlenija i vozmozhnosti primenenija v 
lingvisticheskix i perevodovedcheskix issledovanijax (Parallel corpora 
of literary texts: principles of compilation and use in linguistics and 
translation studies, in Russian) Acta Universitatis Tamperensis, 956. 
Acta Electronica Universitatis Tamperensis, 280. University of 
Tampere 2003.

Mauranen, Anna & Jarmo Jantunen, eds. (2005) Käännössuomeksi. 
Tutkimuksia suomennosten kielestä. Tampere University Press.

Nida, E. A. & Taber, C. R. (1974) The Theory and Practice of 
Translation. Leiden: E.J. Brill.

Tiedemann, Jörg (2003) Recycling translations: Extraction of lexical 
data from parallel corpora and their application in natural language 
processing. Uppsala: Acta Universitatis Upsaliensis. 
http://publications.uu.se/theses/abstract.xsql?dbid=3791. 

ABOUT THE REVIEWER

Mikhail Mikhailov is a senior lecturer at the School of Modern 
Languages and Translation Studies, University of Tampere, Finland. 
His main research interests lie in parallel corpora and corpus-based 
translation studies. He is currently working on methods of studying 
Russian-Finnish parallel texts.





-----------------------------------------------------------
LINGUIST List: Vol-16-1724