24.492, Books: Index Structures for the Exploration of Natural Language Corpora: Goller

linguist at linguistlist.org
Mon Jan 28 15:49:01 UTC 2013


LINGUIST List: Vol-24-492. Mon Jan 28 2013. ISSN: 1069 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
         Monica Macaulay, U of Wisconsin Madison
         Rajiv Rao, U of Wisconsin Madison
         Joseph Salmons, U of Wisconsin Madison
         Anja Wanner, U of Wisconsin Madison
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Rebekah McClure <rebekah at linguistlist.org>
================================================================  


Date: Mon, 28 Jan 2013 10:48:15
From: Ulrich Lueders [lincom.europa at t-online.de]
Subject: Index Structures for the Exploration of Natural Language Corpora: Goller



Title: Index Structures for the Exploration of Natural Language Corpora 
Series Title: Linguistic Resources for Natural Language Processing 06  

Publication Year: 2013 
Publisher: Lincom GmbH
	   http://www.lincom-shop.eu
	

Book URL: http://www.lincom-shop.eu/ 


Author: Johannes Goller

Paperback: ISBN: 9783862884087, Pages: 140, Price: Europe EURO 64.80


Abstract:

This study describes the development of a large-scale corpus query system – a
specialized search engine used to perform advanced types of pattern search,
especially for patterns used by linguists interested in discovering syntactic
phenomena in large corpora.

Beginning with a review of traditional search engine algorithms, the main
focus then shifts to suffix arrays, a data structure that has been available
since 1987, but is not commonly used in large-scale search engines for various
technical reasons.

Recently developed algorithms are considered in this study as the starting
point for a new attempt to re-introduce the suffix array as a data structure
of practical value to corpus-linguistic research. One of the key findings is a
technique that combines several suffix arrays using indexed bit vectors and
enables the searching of layers of meta information, such as part-of-speech
information and semantic labels, in parallel to searching the text. A set of
algorithms operating on that data structure is presented, enabling
sophisticated pattern matching, such as gap-matching and gap-filling, as well
as improved methods of concordance generation. The final chapters present
practical examples of how the new system is used to make linguistically
relevant discoveries in real corpora.
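
For readers unfamiliar with suffix arrays, the following is a minimal
illustrative sketch in Python and is not taken from the book. It builds a
token-level suffix array by naive sorting and looks up a phrase by binary
search. The function names, the toy sentence, and the part-of-speech tags are
all assumptions made for illustration. The text layer and the tag layer are
searched separately and combined with a plain set intersection; the indexed
bit vector technique described in the abstract is not reproduced here.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(tokens):
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction by sorting slices; fine for a toy corpus,
    far too slow for a large one.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens, suffix_array, phrase):
    """Return sorted start positions where `phrase` occurs in `tokens`.

    Truncating every suffix to len(phrase) keeps the sorted order, so all
    matches form one contiguous block that binary search can locate.
    The truncated prefixes are materialised here only for readability.
    """
    m = len(phrase)
    prefixes = [tokens[i:i + m] for i in suffix_array]
    lo = bisect_left(prefixes, phrase)
    hi = bisect_right(prefixes, phrase)
    return sorted(suffix_array[lo:hi])


# Hypothetical toy corpus with a parallel part-of-speech layer.
tokens = "the cat sat on the mat and the cat slept".split()
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN",
        "CONJ", "DET", "NOUN", "VERB"]

word_index = build_suffix_array(tokens)
tag_index = build_suffix_array(tags)

# Search each layer on its own, then keep positions matching in both.
word_hits = find_phrase(tokens, word_index, ["the", "cat"])
tag_hits = find_phrase(tags, tag_index, ["DET", "NOUN", "VERB"])
both = sorted(set(word_hits) & set(tag_hits))

print(word_hits)  # [0, 7]
print(tag_hits)   # [0, 7]
print(both)       # [0, 7]
```

Because one index is kept per layer, a query can mix word forms and tags.
The set intersection used above is only a stand-in for whatever combination
method a real system would use.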
 



Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics


Written In: English  (eng)

See this book announcement on our website: 
http://linguistlist.org/pubs/books/get-book.cfm?BookID=64039





