24.492, Books: Index Structures for the Exploration of Natural Language Corpora: Goller
linguist at linguistlist.org
Mon Jan 28 15:49:01 UTC 2013
LINGUIST List: Vol-24-492. Mon Jan 28 2013. ISSN: 1069 - 4875.
Subject: 24.492, Books: Index Structures for the Exploration of Natural Language Corpora: Goller
Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Veronika Drake, U of Wisconsin Madison
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
<reviews at linguistlist.org>
Homepage: http://linguistlist.org
Editor for this issue: Rebekah McClure <rebekah at linguistlist.org>
================================================================
Date: Mon, 28 Jan 2013 10:48:15
From: Ulrich Lueders [lincom.europa at t-online.de]
Subject: Index Structures for the Exploration of Natural Language Corpora: Goller
Title: Index Structures for the Exploration of Natural Language Corpora
Series Title: Linguistic Resources for Natural Language Processing 06
Publication Year: 2013
Publisher: Lincom GmbH
http://www.lincom-shop.eu
Book URL: http://www.lincom-shop.eu/
Author: Johannes Goller
Paperback: ISBN: 9783862884087 Pages: 140 Price: Europe EURO 64.80
Abstract:
This study describes the development of a large-scale corpus query system – a
specialized search engine used to perform advanced types of pattern search,
especially for patterns used by linguists interested in discovering syntactic
phenomena in large corpora.
Beginning with a review of traditional search engine algorithms, the main
focus then shifts to suffix arrays, a data structure that has been available
since 1987, but is not commonly used in large-scale search engines for various
technical reasons.
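As a rough illustration of the data structure named above, the following minimal Python sketch builds a token-level suffix array and looks up a phrase by binary search. It is not taken from the book: the function names `build_suffix_array` and `find_occurrences` and the toy corpus are assumptions made for this example, and the naive sort-based construction is a stand-in for the far more efficient construction algorithms a large-scale system would need.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return the start positions of all token suffixes in lexicographic order.

    Naive construction by sorting whole suffixes: fine for a toy corpus,
    far too slow and memory-hungry for a real one.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], sa: List[int], pattern: List[str]) -> List[int]:
    """Return the sorted corpus positions at which `pattern` occurs.

    All suffixes that begin with `pattern` form one contiguous block of the
    suffix array, so two binary searches locate the block's boundaries.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-token prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-token prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


# Hypothetical toy corpus, tokenized on whitespace.
tokens = "the cat sat on the mat near the cat".split()
sa = build_suffix_array(tokens)
print(find_occurrences(tokens, sa, ["the", "cat"]))  # → [0, 7]
```

Each lookup costs a logarithmic number of comparisons in the corpus size, which is what makes the structure attractive for phrase queries once the construction cost has been paid.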
Recently developed algorithms are considered in this study as the starting
point for a new attempt to re-introduce the suffix array as a data structure
of practical value to corpus-linguistic research. One of the key findings is a
technique that combines several suffix arrays using indexed bit vectors and
enables the searching of layers of meta information, such as part-of-speech
information and semantic labels, in parallel to searching the text. A set of
algorithms operating on that data structure is presented, enabling
sophisticated pattern matching, such as gap-matching and gap-filling, as well
as improved methods of concordance generation. The final chapters present
practical examples of how the new system is used to make linguistically
relevant discoveries in real corpora.
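The abstract does not spell out how the layers are combined, so the following Python sketch only illustrates the general idea of matching a pattern against a word layer and a part-of-speech layer at the same time, with a one-token gap. It deliberately swaps in a simpler technique than the one the book describes: plain bit vectors (Python integers) combined by shift-and-AND, with no suffix arrays and no rank index. The names `build_bitmasks` and `match_pattern`, the layer names, and the tagged example sentence are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# A constraint is (layer name, required value); None stands for a one-token gap.
Constraint = Optional[Tuple[str, str]]


def build_bitmasks(layers: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Map each (layer, value) pair to an int whose bit p is set
    when corpus position p carries that value on that layer."""
    masks: Dict[Tuple[str, str], int] = {}
    for name, values in layers.items():
        for pos, value in enumerate(values):
            key = (name, value)
            masks[key] = masks.get(key, 0) | (1 << pos)
    return masks


def match_pattern(masks: Dict[Tuple[str, str], int], n: int,
                  pattern: List[Constraint]) -> List[int]:
    """Return the start positions at which every constraint of `pattern` holds."""
    if len(pattern) > n:
        return []
    # Every start position that leaves room for the whole pattern is a candidate.
    result = (1 << (n - len(pattern) + 1)) - 1
    for offset, constraint in enumerate(pattern):
        if constraint is None:
            continue  # a gap accepts any token
        # Shift so that bit p answers: does position p + offset satisfy the constraint?
        result &= masks.get(constraint, 0) >> offset
    return [p for p in range(n) if (result >> p) & 1]


# Hypothetical two-layer corpus: words and part-of-speech tags, aligned by position.
words = ["the", "old", "man", "saw", "the", "dog"]
tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
masks = build_bitmasks({"word": words, "pos": tags})

# "the", then any one token, then a noun.
hits = match_pattern(masks, len(words), [("word", "the"), None, ("pos", "NN")])
print(hits)                          # → [0]
print([words[p + 1] for p in hits])  # gap fillers → ['old']
```

Reading off the tokens at the gap positions, as in the last line, is one simple way to see what gap-filling would return for a match.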
Linguistic Field(s): Computational Linguistics
Text/Corpus Linguistics
Written In: English (eng)
See this book announcement on our website:
http://linguistlist.org/pubs/books/get-book.cfm?BookID=64039