28.5267, Review: Cognitive Science; Discourse Analysis; Semantics; Syntax; Text/Corpus Linguistics: Pęzik, Waliński (2016)

The LINGUIST List linguist at listserv.linguistlist.org
Tue Dec 12 20:13:14 UTC 2017


LINGUIST List: Vol-28-5267. Tue Dec 12 2017. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Clare Harshey <clare at linguistlist.org>
================================================================


Date: Tue, 12 Dec 2017 15:13:10
From: Franka Kermer [franka.kermer at uef.fi]
Subject: Language, Corpora and Cognition

 
Discuss this message:
http://linguistlist.org/pubs/reviews/get-review.cfm?subid=36312037


Book announced at http://linguistlist.org/issues/28/28-372.html

EDITOR: Piotr Pęzik
EDITOR: Jacek Tadeusz Waliński
TITLE: Language, Corpora and Cognition
SERIES TITLE: Łódź Studies in Language
PUBLISHER: Peter Lang AG
YEAR: 2016

REVIEWER: Franka Kermer, University of Eastern Finland

REVIEWS EDITOR: Helen Aristar-Dry

SUMMARY 

“Language, Corpora and Cognition”, edited by Piotr Pęzik and Jacek Tadeusz
Waliński, appears in Peter Lang’s prolific series “Łódź Studies in Language”,
edited by Barbara Lewandowska-Tomaszczyk and Łukasz Bogucki. “Language,
Corpora and Cognition” is the 51st publication in the series and contains a
collection of fifteen papers written by different authors. Most of the
contributions were presented at the 9th international conference on
“Practical Applications of Language Corpora” (PALC 2014), held at the
University of Łódź, Łódź, Poland, 20–22 November 2014; three further papers
(Chapters 13–15) come from doctoral students who were invited to contribute
at a later stage. The edited volume’s main focus is to explore how
theoretical predictions about the relationship between language structure and
cognition correspond to findings from empirical linguistic data. The
contributions in this book attempt to evaluate various aspects of linguistic
structure, ranging from syntax, semantics, and morphology to phraseology,
with the tools and methodologies of corpus linguistics.

Chapter 1: Gradience in cognitive scanning: participle modifiers in Polish and
English (Barbara Lewandowska-Tomaszczyk)

In the opening article, Lewandowska-Tomaszczyk attempts to explain the regular
patterns of pre- and post-nominal modifying participial constructions in
Polish and English on the basis of Langacker’s account of cognitive scanning
processes. She proposes that the occurrence of these regular patterns is
linked to the aspectual systems of both languages, particularly to
differences in the nature of the cognitive scanning process, which in Polish
is of a partially gradient nature, whereas English maintains a more rigid
distinction between sequential and summary scanning. Taking
Langacker’s theoretical account of dynamicity and construal as the point of
departure for analysis and interpretation, Lewandowska-Tomaszczyk discusses
Polish and English speakers’ conceptualization of events described in terms of
atemporal relationships. The linguistic data extracted from two reference
corpora, the British National Corpus and the National Corpus of Polish, reveal
that postnominal present participle modifiers involve saliently marked
sequential scanning, whereas in the case of prenominal past participle
modifiers the scanning process possesses a lower gradient character. 

Chapter 2: Experimental applications of dependency-based phraseology
extraction (Piotr Pęzik)

This chapter reports on a study that tested the benefit of using a
dependency-based method of extracting phraseological units from large corpus
data. It is well-attested that prefabricated, conventionalised language chunks
play a central role in language reception and production. To explore the
nature and types of linguistic prefabrication, new techniques are needed to
extract phraseological units from large naturally-occurring data sets. In his
article, Pęzik aims to ascertain the benefit of “dependency-based
phraseology extraction”; he assesses the method’s usefulness by extracting
and aggregating phraseological units, analysing data from large reference
corpora, and building Automatic Combinational Dictionaries from large
corpus data. The main goal of his study is to show that in order to detect
lexico-grammatical variability in prefabricated linguistic data, one first
needs to detect recurrent phraseological units in large corpora. The novelty
in this approach lies in the extraction of recurrent subtrees consisting of
more than two lexical items of a sentence dependency tree. 

Chapter 3: Computational distributional semantics and free associations: a
comparison of two word-similarity models in a study of synonyms and lexical
variants (Marcin Tatjewski, Mirosław Bańko, Adrianna Kucińska and Joanna
Rączaszek-Leonardi)

Tatjewski, Bańko, Kucińska and Rączaszek-Leonardi’s research focuses on the
comparison of two methods for measuring and evaluating word meaning
similarity. One method, which is at the heart of distributional semantics, is
known as Correlated Occurrence Analogue to Lexical Semantics; the other,
commonly used in the field of psychology, relies on free association data
provided by informants. The main goal of this research was to evaluate the
semantic proximity of word pairs, specifically of lexical loans and native
synonyms, in two languages, Polish and Czech. The results confirmed the
authors’ hypothesis: both methods yielded the same results and were
correlated at a statistically
significant level. This outcome implies that both computational semantic
analyses performed on large corpus data and experimental techniques are
equally suited for exploring the organisation of lexical semantic
representations at the cognitive level. 

Chapter 4: Grammars or corpora? Who should we trust? Empirical analysis of
morphological doubletism in Croatian (Dario Lečić) 

In this chapter, Lečić builds on previous research that investigates the
status quo and present-day usage of morphological doublets in Croatian.
Slavonic languages, such as Croatian, abound in examples of morphological
doubletism. Using data from three different sources, Lečić explores whether
two morphological variants in word stems and word endings exhibit the same
degree of conventionality, i.e. whether two competing forms have the same
status in the speaker’s mental grammar or not. Doubletism of stems in Croatian
appears in possessive pronouns and verbs, while doubletism of endings
encompasses the singular form of masculine nouns, the genitive plural of feminine
nouns and adjectives. Results of the comparison between corpus data and native
speakers’ questionnaire material showed a positive correlation between the
form’s frequency in the corpus and the acceptability rating by the native
speakers. The results rendered by the analysis of grammar reference works show
that their explanations cannot fully account for the richness of competing
variants in word stems and endings. 

Chapter 5: Figurative dimensions of health: a corpus-illustrated study
(Adamina Korwin-Szymanowska and Jacek Tadeusz Waliński)

Korwin-Szymanowska and Waliński report on a study which aims at mapping
conceptual metaphors of health taking into account that our conceptions of
health tend to be discussed figuratively in terms of our embodied physical
experience. Based on works on metaphorical representation in thought and
language, this study explores the figurative dimensions of health through the
lens of cognitive linguistics and corpus linguistics. Their research employs
two different reference corpora for English, the British National Corpus and
the Corpus of Contemporary American English, which were searched for all
lexical items that would specify the state of health. They found that the
dimensions along the UP-DOWN and STRONG-WEAK scales appear to be the prevailing
conceptual domains in the conceptualisation of health, which suggests that
people’s conceptual representations of health arise from embodied experience.
  

Chapter 6: “Justice with an attitude?” – towards a corpus-based description of
evaluative phraseology in judicial discourse (Stanisław Goźdź-Roszkowski)

Goźdź-Roszkowski’s paper investigates the applicability of a corpus-based
phraseology perspective to identify and examine evaluative meanings in
judicial discourse. Specifically, this study brings together the descriptive
framework of local grammar with the methodological workbench of corpus
linguistics in order to explore the role of grammatical patterns in
expressions of attitudinal meanings. According to Goźdź-Roszkowski, studying
opinions and attitudes in expressions from the perspective of local grammar is
particularly fruitful for patterning and identifying words which share
evaluative meanings. The material employed for this study is drawn from the
highly domain-specific genre of the United States Supreme Court opinions. The
analysis revealed that judges tend to employ certain linguistic
cues to signal their evaluation of arguments put forward by other legal
interactants. Furthermore, two grammatical patterns, the v-link + ADJ + that
pattern (e.g., “The court is correct that many mental diseases…”) and the
v-link + ADJ + to-infinitive pattern (e.g., “It is quite wrong to invite state
court judges to discount…”), were found to be useful diagnostics for
identifying their prototypical evaluative function.

Chapter 7: Using time to express remoteness in space: A corpus-based study of
distance representations for motion medium in the National Corpus of Polish
(Jacek Tadeusz Waliński)

In Chapter Seven of this volume, Waliński examines the conceptions of
space-time relations in the semantic context of motion events from the
perspective of data obtained from the National Corpus of Polish. The author’s
assumption that the perception of space is inextricably connected to the
perception of time is tested by verifying how frequently spatial distance is
expressed in temporal terms. The domain of motion events, particularly the
semantic attribute of the motion medium, is well suited for exploring
the interplay between temporal and spatial representations. Results indicate
that Polish speakers express motion-framed distance in both spatial and
temporal terms, with spatial representations being used more
frequently. This outcome lends further support to previous work on the
mutual relationship between mental conceptions of space and time.

Chapter 8: Avenues for Research on Informal Spoken Czech Based on Available
Corpora (Petra Klimešová, Zuzana Komrsková, Marie Kopřivová and David Lukeš)

In their study, Klimešová, Komrsková, Kopřivová and Lukeš attempt to show how
spoken corpora can be utilised for addressing a broad range of research topics
and revising prior findings based primarily on written discourse. To this end,
Klimešová et al. explore linguistic cues typical of spontaneous spoken
language on the one hand, and compare these distinct features with features
used in other discourse types, namely formal spoken and written discourse, on
the other. The data used to demonstrate these features come from corpora of
casual spoken communication in Czech. The results confirm the authors’
hypothesis in that certain lexical fillers, phonetic variants and grammatical
phenomena are inherent to casual spoken language. Furthermore, their results
confirm the relevance of employing corpora of informal spoken language as a
source of data as they facilitate the systematic study of a wide range of
discursive, sociolinguistic and linguistic features.  

Chapter 9: Introducing a corpus of non-native Czech with automatic annotation
(Alexandr Rosen)

Rosen discusses the need for, and use of, automated annotation tools
originally developed for native Czech for annotating a corpus of texts written
by non-native learners of Czech. The growing number of learner corpora has led
to a shift from annotating corpora manually to developing automated annotation
methods and tools targeting non-native language. Common annotation tools for
native language include taggers, lemmatizers, and spelling and grammar
checkers. The author introduces a corpus consisting of a collection of
transcribed essays written by students of Czech between 2009 and 2011. The
analysis shows, for example, that the results achieved by the spelling and
grammar checker Korektor were sufficiently good to justify the use of this
tool in the annotation of non-native corpus data. The author concludes that
automated annotation tools and manual annotation of non-standard language
can complement each other in achieving the best results.
      

Chapter 10: Corpus-based Analysis of Czech Units Expressing Mental States and
Their Polish Equivalents. Identification of Meaning and Establishing Polish
Equivalents Referring to Different Theories (Elżbieta Kaczmarska)

Kaczmarska’s study focuses on polysemous mental state verbs in Czech and the
extent to which different linguistic theories can predict equivalents of these
verbs in Polish. The main objective of this study is to build an effective
algorithm for the selection of equivalents by applying methods of various
linguistic approaches. The pairs of equivalents are drawn from the parallel
corpus InterCorp. The research showed that case grammar and cognitive grammar
do not offer effective tools to predict pairs of equivalents in the proposed
algorithm. However, the frameworks of pattern grammar and valence analysis
provided powerful and promising methods for analysing word combinations and
equivalents. The author’s overarching objective is to show that the proposed
algorithm can be utilised in machine translation tools in the future.

Chapter 11: Problem solving in English and Polish: A cognitive corpus-based
study of selected metaphorical conceptualizations (Marcin Trojszczak) 

In Chapter 11 of this edited volume, Trojszczak examines selected aspects of
metaphorical conceptualisations of problem solving shared between English and
Polish speakers. Specifically, this study sets out to address how aspects of
speakers’ linguistic expressions of problem solving give insight into the ways
in which problem solving as a mental activity is metaphorically conceptualised
in different languages. The material employed for this study is obtained from
the British National Corpus and the National Corpus of Polish. The author
approaches expressions related to the activity of problem solving from the
perspective of cognitive corpus-based semantics, which combines the
theoretical perspective of conceptual metaphor theory and the methodological
tools related to corpus linguistics. The analysis suggests that speakers of
English and Polish employ common underlying metaphorical expressions when
describing the activity of problem solving. Among the shared conceptual
metaphors of problem solving are ABSTRACT OBJECTS ARE PHYSICAL OBJECTS and
MENTAL ACTIVITY IS A PHYSICAL ACTIVITY. As the author rightly observes, the
results obtained in this study pave the way for researching parallelism in
metaphorical representations across other languages.  

Chapter 12: Corpus Linguistics for Critical Discourse Analysis. What can we do
better? (Victoria Kamasa)

In her paper, Kamasa critically reviews thirty research papers published
between 2002 and 2013 in which techniques related to corpus linguistics were
used for some form of critical discourse analysis. The main goal of this
review was to analyse the methods employed in the studies and to
pinpoint some of the shortcomings associated with corpus-assisted critical
discourse analysis, so as to help researchers avoid methodological
pitfalls. Corpus linguistics was expected to address key points of
criticism of critical discourse analysis, such as the decontextualisation of
analysed texts or the pivotal role of the researcher’s intuition. The review
shows that some of these problems can be tackled by paying more attention to
the research design and statistical analysis. For example, larger sets of
texts and rigorous statistical tools may contribute to addressing
the decontextualisation problem, while the use of frequency and statistical
scores can prevent bias in corpus-supported critical discourse analysis.

Chapter 13: Towards quantitative and qualitative characterisation of various
types of dialogue: interviews vs. Panel Discussions (Dorota Pierścińska)

Pierścińska’s doctoral research aims at exploring and characterising
quantitative parameters of two types of dialogue, interviews and panel
discussions, to specify the underlying perception of these two genres.
Pierścińska’s research employs two different reference corpora, which were
examined for frequent lexemes, keywords and 4-word clusters. The patterns
selected for the analysis were expected to serve particular functions in the
interviews and panel discussions, which in turn would serve as the basis of a
more general characterisation of the two genres. The results are in line with
the prediction put forward by the author, demonstrating that interviews and
panel discussions differ in that interviews are more verbal and
spontaneous, while panel discussions are more grammaticalised, structured and
well-organised. 

Chapter 14: Standardisation in safety data sheets? A corpus-assisted study
into the problems of translating safety documents (Aleksandra Beata Makowska)

Makowska’s paper analyses material safety data sheets in order to pinpoint
possible shortcomings and challenges related to the
translation process and terminology used in these materials. The author
collected ninety-three safety data sheets, containing 720,000 words,
published in three languages: English, Polish and German. The overall purpose
of this
doctoral research is to provide a basis for a higher level of standardisation
in the process of translating safety data instructions, as well as to put
forward general descriptions related to the actual translation process that
would aid the translator. The author interprets the results of the study as
suggesting that experienced and qualified translators should be involved in
the translation process to avoid translation problems with terminology,
general language and meaning of the communicated message.      

Chapter 15: Lexical bundles in English medical texts (Monika Betyna)

In the last chapter, Betyna’s doctoral research attempts to describe the
discourse function and use of frequently-used word combinations, i.e., lexical
bundles, in a topic-oriented corpus of medical texts. The main objective of
the investigation is to create an inventory of the most frequent lexical
bundles in medical texts and uncover their structural and functional
properties. The research material comprises a corpus of one hundred online
articles concerning a highly specific topic in medicine. The analysis revealed
that words such as ‘ulcers’ and ‘diabetic’ occurred frequently in the texts.
Furthermore, the lexical bundle ‘oxygen therapy’ made up 18% of the most
frequent lexical bundles and phraseological units in the corpus. The author
rightly states that the usage of such vocabulary indicates that these texts
are written for a very specific group of experts who are familiar with the use
of a highly specific register. 

EVALUATION

“Language, Corpora and Cognition” is a straightforward account of current
practices and approaches to studying the link between linguistic phenomena and
conceptual representation with the help of corpus linguistics methodology. The
edited volume brings together a variety of papers that apply diverse corpus
linguistic methodologies to various aspects of cognitive science and cognitive
linguistics. As the introduction states, it is a valuable contribution and a
step forward in the understanding of how the use of empirical data can inform
theoretical predictions about the relationship between language and cognition.
The original studies, the varied topics discussed and the rich quantitative
data will capture the interest of university professionals and students alike.

The volume is well-organised and well-edited, and almost every chapter
provides goals, theoretical context, methods, and results in a straightforward
manner. The articles deal with data from corpora of spoken and written
English, Polish, Czech, Croatian and German. The articles are well-written and
concise, with the focus on the most central aspects of the respective topic. 

The book has, however, some weaknesses. The first minor criticism concerns the
introductory chapter written by the editors, which could have outlined and
justified the limits of the book, referred to other contributions to the
field and provided more context. Another minor drawback concerns the
background sections of some articles, which present effective if rather
densely written overviews of the respective topics and theoretical frameworks
(e.g., Chapter One). While these accounts show the expertise of the authors,
these passages may be slightly heavy for some readers, especially novice
researchers. Lastly, it should be noted that nine out of fifteen
contributions deal with data from corpora of Slavic languages. As it is the
broad spectrum of linguistic phenomena discussed in this volume which makes
this book a valuable contribution, contributions discussing corpus data from
other languages would have been appreciated. These limitations, however, do
not diminish the relevance, validity and value of the book.


ABOUT THE REVIEWER

Franka Kermer received her Ph.D. in English Language and Culture from the
University of Eastern Finland in 2015 with a thesis entitled “A Cognitive
Grammar Approach to Tense and Aspect Teaching in the L2 Context”. Her research
interests are primarily concerned with cognitive linguistics, particularly
cognitive grammar, and second language acquisition. Her current post-doctoral
research focuses on cross-linguistic differences and influence from the
perspective of cognitive grammar and cognitive sociolinguistics.




