29.3469, Books: From Lexical Functional Grammar to Enhanced Universal Dependencies: Patejuk, Przepiórkowski
The LINGUIST List
linguist at listserv.linguistlist.org
Mon Sep 10 21:03:37 UTC 2018
LINGUIST List: Vol-29-3469. Mon Sep 10 2018. ISSN: 1069 - 4875.
Subject: 29.3469, Books: From Lexical Functional Grammar to Enhanced Universal Dependencies: Patejuk, Przepiórkowski
Moderator: linguist at linguistlist.org (Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté)
Homepage: https://linguistlist.org
Please support the LL editors and operation with a donation at:
https://funddrive.linguistlist.org/donate/
Editor for this issue: Jeremy Coburn <jecoburn at linguistlist.org>
================================================================
Date: Mon, 10 Sep 2018 17:03:30
From: Adam Przepiórkowski [adamp at ipipan.waw.pl]
Subject: From Lexical Functional Grammar to Enhanced Universal Dependencies: Patejuk, Przepiórkowski
Title: From Lexical Functional Grammar to Enhanced Universal Dependencies
Subtitle: Linguistically informed treebanks of Polish
Publication Year: 2018
Publisher: Institute of Computer Science, Polish Academy of Sciences
http://nlp.ipipan.waw.pl/
Book URL: http://nlp.ipipan.waw.pl/Bib/pat:prz:18:book.pdf
Author: Agnieszka Patejuk
Author: Adam Przepiórkowski
Electronic:
ISBN: 9788363159269
Pages: 263
Price: U.K. £ 0
Comment: Open Access (CC BY-NC-SA 4.0)
Abstract:
Syntactically annotated corpora, or ‘treebanks’, are among the most
heterogeneous kinds of linguistic resources. They differ not only in the
general kind of approach they adopt (constituency or dependency), but also in
the number of representation levels they assume (often one, but sometimes two
or more) and in the extent to which they follow an established linguistic
theory (if at all). Also, even within one kind of approach, the representation
of a particular phenomenon may differ widely between treebanks.

In treebank development, there is a clear tension between theoretical accuracy
within a treebank and utilitarian consistency between treebanks of the same or
different languages. On the one hand, utterances should be annotated with
linguistically accurate and precise descriptions, and one way to achieve this
is by following a specific linguistic theory, one with a well-defined
terminology, good formal background and a body of carefully justified analyses
of many phenomena of typologically diverse languages. An example of such a
theory is Lexical Functional Grammar (LFG). However, LFG is not the only
theory of this kind, and even within one theory, similar phenomena may receive
very different representations, reflecting different traditions or different
weights assigned to pieces of evidence supporting one or another analysis. So
this theoretically oriented approach to treebank development inevitably leads
to the creation of treebanks with very diverse annotation schemes, which are
often comprehensible only to a limited number of followers of a given
linguistic theory.

On the other hand, especially in the context of multilingual natural language
processing (NLP), treebanks should ideally follow a common annotation scheme,
one that is intelligible to a much broader group of treebank consumers than
professional linguists working within a given theory. Moreover, similar
phenomena and constructions should receive analogous representations, even if
there are subtle – from the point of view of practical applications –
differences suggesting dissimilar analyses. A recent attempt at such a
comprehensive syntactic annotation scheme is Universal Dependencies (UD;
http://universaldependencies.org/). As a practical solution, UD aims at
providing a maximally simple syntactic representation, one that is useful for
various NLP applications, even if at the cost of linguistic precision.

This monograph presents two treebanks of Polish which follow the two
approaches, as well as the procedure of converting one to the other. Part I
presents an LFG treebank, Part II describes the procedure of converting this
LFG structure bank to a UD treebank, and Part III offers a stand-alone
presentation of the resulting UD treebank of Polish.
Linguistic Field(s): Linguistic Theories; Syntax; Text/Corpus Linguistics
Subject Language(s): Polish (pol)
Language Family(ies): West Slavic
Written In: English (eng)
See this book announcement on our website:
http://linguistlist.org/pubs/books/get-book.cfm?BookID=129953