35.2647, Review: Python Programming for Linguistics and Digital Humanities: Weisser (2024)
The LINGUIST List
linguist at listserv.linguistlist.org
Mon Sep 30 21:05:07 UTC 2024
LINGUIST List: Vol-35-2647. Mon Sep 30 2024. ISSN: 1069 - 4875.
Subject: 35.2647, Review: Python Programming for Linguistics and Digital Humanities: Weisser (2024)
Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Joel Jenkins <joel at linguistlist.org>
================================================================
Date: 01-Oct-2024
From: David Karaj [davidmkaraj at gmail.com]
Subject: Computational Linguistics: Weisser (2024)
Book announced at https://linguistlist.org/issues/35.901
AUTHOR: Martin Weisser
TITLE: Python Programming for Linguistics and Digital Humanities
SUBTITLE: Applications for Text-Focused Fields
PUBLISHER: Wiley
YEAR: 2024
REVIEWER: David Karaj
SUMMARY
Python Programming for Linguistics and Digital Humanities:
Applications for Text-Focused Fields by Martin Weisser is an
introduction to Python programming written with students of
linguistics, digital humanities and social sciences in mind, that is,
for students with no background in computer science. The main purpose
of the book is to introduce the basic concepts of programming and the
programming language Python to those with no prior experience in
programming in order to equip them with the skills essential to the
processing and analysis of various texts and the quantification and
visualization of the data for research purposes. The book is
accompanied by a website that contains the code discussed in the book
and sample texts that can be used as corpora for the exercises.
Python Programming for Linguistics and Digital Humanities is divided
into 12 chapters whose titles are largely self-explanatory. The
chapters are further subdivided into short sections. Chapter 1,
Introduction, briefly explains what programming is and covers the
necessary technicalities, including the installation of Python and
various code editors. Chapters 2 and 3, Programming Basics I and
Programming Basics II, respectively, introduce various data types (such
as strings, floats, booleans, etc.) and basic functions. Chapter 4,
Intermediate Strings Processing, explains different methods of string
manipulation. Chapter 5, Working with Stored Data, covers methods for
accessing, processing and manipulating files. Chapter 6, Recognising
and Working with Language Patterns, explores how to identify and
process different kinds of language patterns using Python. Chapter 7,
Developing Modular Programs, delves deeper into structuring one's
programs and explains how to reuse code for different tasks; it also
discusses dictionaries. Chapter 8, Word Lists, Frequencies and
Ordering, allows students to quantify the studied patterns through
frequency lists and tables. Chapter 9, Interacting with Data and Users
Through GUIs, discusses basic graphical user interfaces (GUIs) and
widgets. Chapter 10, Web Data and Annotations, delves into markup
languages (including a very brief introduction to HTML) and explores
how to extract text from web pages. Chapter 11, Basic Visualisation,
focuses on the visualisation of analyzed data in the form of graphs.
Chapter 12, Conclusion, very briefly summarizes the content of the
book chapter by chapter. Finally, the book includes an Appendix
containing the program code discussed across its chapters, as well as
an Index. Each chapter contains numerous exercises pertaining to the
content introduced in the preceding section, and each concludes with a
Discussion section that once again breaks down the introduced concepts
and explains the exercises from that chapter.
EVALUATION
With Python Programming for Linguistics and Digital Humanities:
Applications for Text-Focused Fields, the author introduces Python
programming to a group of students who have long been considered less
likely to employ programming in their studies and research. While the
field of digital humanities has been gaining popularity in recent
years, and linguists have increasingly recognized the importance of
digital literacy and the usefulness of basic programming skills for
analyzing and presenting data, good introductions to programming
geared towards audiences from the humanities are sorely lacking.
Weisser fills this niche with a friendly, step-by-step introduction.
Unlike other manuals of this kind, the book does not overwhelm
students with superfluous introductions to programming. While some
readers would appreciate a minimal theoretical background on
programming as such, I believe that a detailed introduction filled
with technical details could simply discourage many students of the
humanities. After the author explains some necessary technicalities
(such as installing Python, plus a few words about various text
editors), students have no choice but to dive in and start writing
code right away. Each chapter briefly introduces new concepts and
provides exercises to practise them before students move on to new
material.
New concepts are explained concisely and clearly. Again, the author
does not delve into unnecessarily confusing technicalities but gives
students only the information that is indispensable at their current
stage of learning. This way, readers do not get bored and can
immediately apply their newly acquired knowledge. Importantly, the
exercises involve only small amounts of code (usually a few lines),
which keeps them accessible and avoids the visual overload that is a
common problem in introductory programming books. Thanks to this
design, students can identify potential mistakes right away.
Furthermore, the exercises build on each other, which helps students
develop the habit of reusing code they have written previously rather
than writing a new program for each task. The exercises focus on
practical tasks useful for working with text, particularly for
linguists. Early on, students are introduced to basic tasks useful for
morphological analysis, such as sorting words and breaking them down
into smaller elements, for instance extracting affixes or finding
specific morphemes. They are also introduced to methods of
manipulating and formatting the resulting output. These tasks allow
learners to perform quick keyword and collocation analyses, compile
frequency lists and build simple concordances. More advanced chapters
show learners how to interact with this data through GUI widgets,
which gives their programs more structure and makes them more
approachable for beginners. Methods of data visualization are also
introduced, allowing learners to present the results of their work in
the form of graphs to aid statistical analysis. Finally, the sample
texts on the companion website are a nice addition: they give learners
material to work with right away, so they do not need to look for data
in external sources for the purpose of the exercises.
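For readers who have not seen the book, the kind of task described
above can be illustrated with a minimal sketch in plain Python
(standard library only): a frequency list, a naive suffix search and a
simple keyword-in-context (KWIC) concordance. This is not code from
Weisser's book; the sample text and the function names `tokenize`,
`frequency_list`, `words_with_suffix` and `concordance` are
hypothetical illustrations chosen for this review.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for one of the companion-website corpora.
SAMPLE = (
    "The reader reads the readings. Readers who read widely "
    "become better readers, and reading becomes easier."
)


def tokenize(text):
    """Lower-case the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens, n=5):
    """Return the n most frequent tokens as (word, count) pairs.

    Ties keep the order in which words were first encountered.
    """
    return Counter(tokens).most_common(n)


def words_with_suffix(tokens, suffix):
    """Return the sorted word types ending in `suffix` (pure string matching, no morphology)."""
    return sorted({t for t in tokens if t.endswith(suffix) and len(t) > len(suffix)})


def concordance(tokens, keyword, width=3):
    """Return KWIC lines with `width` tokens of left and right context around each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    print(frequency_list(tokens, 2))        # [('the', 2), ('readers', 2)]
    print(words_with_suffix(tokens, "er"))  # ['better', 'easier', 'reader']
    for line in concordance(tokens, "readers", width=2):
        print(line)
```

Note that the naive suffix search also returns "easier" and "better"
for -er, which is exactly the kind of over-matching a linguist would
then need to filter out by hand or with more refined patterns.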
Moreover, as mentioned above, each chapter of the book concludes with
a Discussion section that wraps up the covered material and expands on
the exercises. This gives students a chance to revise the material,
correct their mistakes and, possibly, understand where those mistakes
come from. The Discussion sections are perhaps the strongest point of
this textbook, as they allow students to internalize their knowledge
through the exercises and “discuss” them despite the lack of a
teacher. These sections are placed at the end of each chapter even
though they refer to specific exercises, so the best way to make the
most of them is probably to go back and forth between a given exercise
and the corresponding Discussion. It might be a good idea to move
these sections directly under the corresponding exercises so that
students do not need to turn to the end of the chapter to find the
commentary on a specific exercise.
Writing an introductory programming textbook for an audience with no
background in computer science is not an easy task, as authors, who
often have a strong programming background, tend to overwhelm students
with unnecessary details. Weisser's introduction guides those taking
their first steps in programming in an extremely clear way, focusing
on the tasks that linguists and students of the humanities are most
likely to be interested in. Given the above, it is difficult to point
out any major drawbacks. Python Programming for Linguistics and
Digital Humanities is a well-written, gentle introduction. From a
purely technical point of view, I did not spot any typographical
errors, which, especially in the Python code, could otherwise be a
source of problems for students. Since I received a digital copy of
the book, I could not assess how convenient it would be to consult a
physical copy while working on the code and switching between the
exercises and the Discussion sections, as mentioned above. While the
book seems to be intended for self-learners, I do not see why it could
not be used in the classroom; I am confident that a skilled instructor
could adapt the material for classroom use. If I may make a
suggestion, it would be useful to give self-learners more guidance on
how to use the book: how best to approach the material and how much
time to dedicate to each section. Similarly, the exercises would
benefit from more extensive guidance and comments, as they are very
often limited to step-by-step instructions. Stating the objectives,
expected results and possible real-life applications of the code
written in a given exercise could help students understand what a
given program is good for. Nevertheless, after working through the
book, learners should be able to write programs for simple
morphological analysis and for extracting statistics from a given
text.
The material gets progressively more difficult: while the first few
chapters can be covered relatively quickly, the intermediate material
requires significantly more time and effort, so some self-learners
would appreciate additional guidance on how to organize their
learning. On a similar note, the textbook could benefit from one or
two revision chapters to give students a breather and slow down the
overall pace. I also wish more attention were dedicated to various
Python libraries and where to find them. The author mentions only
Matplotlib and Pandas, yet libraries such as NLTK are extremely useful
tools, and it would benefit students to become familiar with them from
the beginning. Such libraries can also ease the learning experience
and encourage students to explore more advanced tools for more
in-depth data extraction, analysis and processing. An additional
section on Python libraries would be an excellent opportunity to
introduce more exercises oriented towards specific tasks. The book
runs to 288 pages, so expanding on some aspects of language processing
with Python would not necessarily “overload” it. Finally, some
pointers to the next steps in programming would be a useful addition:
resources for intermediate programming for humanities research are
lacking, and learners might need additional guidance on how to take
their skills further.
In sum, Python Programming for Linguistics and Digital Humanities
makes a good introduction to Python programming for those who do not
have any background in computer science but may want to facilitate
their research and analyses using this versatile and relatively
easy-to-learn programming language. The textbook is free of major
drawbacks, and I am confident that its clear structure and numerous
exercises will encourage students to explore the possibilities of
programming for research in the humanities, particularly linguistics.
The author has shown that programming can be learned with relative
ease, and I hope that the book will prompt more universities to teach
Python outside of computer science courses.
ABOUT THE REVIEWER
David M. Karaj obtained his PhD in Linguistics at the University of
Pavia, Italy. His main research interests include syntax, linguistic
typology, computational linguistics and valency-changing phenomena.
------------------------------------------------------------------------------
********************** LINGUIST List Support ***********************
Please consider donating to the Linguist List to support the student editors:
https://www.paypal.com/donate/?hosted_button_id=87C2AXTVC4PP8
LINGUIST List is supported by the following publishers:
Bloomsbury Publishing http://www.bloomsbury.com/uk/
Brill http://www.brill.com
Cambridge University Press http://www.cambridge.org/linguistics
De Gruyter Mouton https://cloud.newsletter.degruyter.com/mouton
Equinox Publishing Ltd http://www.equinoxpub.com/
European Language Resources Association (ELRA) http://www.elra.info
John Benjamins http://www.benjamins.com/
Language Science Press http://langsci-press.org
Lincom GmbH https://lincom-shop.eu/
Multilingual Matters http://www.multilingual-matters.com/
Narr Francke Attempto Verlag GmbH + Co. KG http://www.narr.de/
Oxford University Press http://www.oup.com/us
Wiley http://www.wiley.com
----------------------------------------------------------
LINGUIST List: Vol-35-2647
----------------------------------------------------------