23.4752, Diss: Comp Ling/ Discourse Analysis/ Pragmatics/ Text/Corpus Ling/ English: Leidner: 'Toponym Resolution in Text...'

linguist at linguistlist.org
Wed Nov 14 15:32:51 UTC 2012


LINGUIST List: Vol-23-4752. Wed Nov 14 2012. ISSN: 1069 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
Monica Macaulay, U of Wisconsin Madison
Rajiv Rao, U of Wisconsin Madison
Joseph Salmons, U of Wisconsin Madison
Anja Wanner, U of Wisconsin Madison
       <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Lili Xia <lxia at linguistlist.org>
================================================================  


Date: Wed, 14 Nov 2012 10:32:18
From: Jochen Leidner [leidner at acm.org]
Subject: Toponym Resolution in Text: Annotation, evaluation and applications of spatial grounding of place names

Institution: University of Edinburgh 
Program: School of Informatics 
Dissertation Status: Completed 
Degree Date: 2007 

Author: Jochen L. Leidner

Dissertation Title: Toponym Resolution in Text: Annotation, evaluation and
applications of spatial grounding of place names 

Dissertation URL:  http://www.era.lib.ed.ac.uk/handle/1842/1849

Linguistic Field(s): Computational Linguistics
                     Discourse Analysis
                     Pragmatics
                     Text/Corpus Linguistics

Subject Language(s): English (eng)


Dissertation Director(s):
Bonnie Webber
Claire Grover

Dissertation Abstract:

Background:  Spatial and temporal expressions refer to events in
space-time, and the grounding of events is a precondition for
reasoning. Thus, automatic grounding can improve many applications
such as automatic map drawing and question answering (e.g., for
questions like 'How far is London from Edinburgh?'). Whereas temporal
grounding has received considerable attention, robust spatial
grounding has long been neglected.  I define the task of automatic
Toponym Resolution as computing the mapping from instances of names
for places as found in a text to a representation of the extensional
semantics of the location referred to, such as a geographic
latitude/longitude footprint.  The mapping between names and locations
is referentially ambiguous: London can refer to the capital of the UK,
to London, Ontario, Canada, or to other Londons on earth.
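
As a minimal illustration of the ambiguity described above, the
following Python sketch uses a hypothetical toy gazetteer. The names
TOY_GAZETTEER, haversine_km and distances_to are assumptions made for
this example and are not taken from the thesis; the coordinates are
approximate. Once a referent has been chosen, a question like 'How far
is London from Edinburgh?' reduces to a great-circle distance
computation.

```python
import math

# Hypothetical toy gazetteer: each name maps to its candidate referents,
# given as (description, latitude, longitude). Coordinates are approximate.
TOY_GAZETTEER = {
    "London": [
        ("London, England, UK", 51.5074, -0.1278),
        ("London, Ontario, Canada", 42.9849, -81.2453),
    ],
    "Edinburgh": [
        ("Edinburgh, Scotland, UK", 55.9533, -3.1883),
    ],
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_to(name, other):
    """Distance from each candidate referent of `name` to the first
    listed referent of `other`, keyed by candidate description."""
    _, olat, olon = TOY_GAZETTEER[other][0]
    return {
        desc: haversine_km(lat, lon, olat, olon)
        for desc, lat, lon in TOY_GAZETTEER[name]
    }
```

Calling distances_to("London", "Edinburgh") gives one distance per
candidate London, and the two answers differ by thousands of
kilometres. This is why the referent has to be resolved before any
spatial reasoning can be trusted.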

Objective: I investigate how referentially ambiguous spatial named
entities can be grounded, or resolved, with respect to an extensional
coordinate model robustly on open-domain news text.

Method:  While a small number of previous attempts have been made to
solve the toponym resolution problem, these were either not evaluated,
or evaluation was done by manual inspection of system output instead
of curating a reusable reference corpus.  Since the relevant
literature is scattered across several fields (libraries, information
retrieval, natural language processing) and descriptions of algorithms
are mostly given in informal prose, I attempt to systematically
describe them and aim at a reconstruction in a uniform, semi-formal
pseudo-code notation for easier re-implementation.  A systematic
comparison leads to an inventory of heuristics and other sources of
evidence.  In order to carry out a comparative evaluation procedure,
an evaluation resource is required. Unfortunately, to date no gold
standard has been curated in the research community. To this end, a
reference gazetteer and an associated novel reference corpus with
human-labeled referent annotation are created.  These are subsequently
used to benchmark a selection of the reconstructed algorithms and a
novel re-combination of the heuristics cataloged in the inventory.  I
then compare the performance of the same toponym resolution (TR)
algorithms under three different conditions, namely applying them to
the output of human named entity annotation, automatic annotation
using an existing Maximum Entropy sequence tagging model, and a naive
toponym lookup procedure in a gazetteer.
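
The third condition, naive gazetteer lookup, can be sketched as
follows. This is only an illustration under stated assumptions: the
gazetteer entries, the two-token limit and the function name
naive_toponym_lookup are hypothetical and do not reproduce the
procedure or the reference gazetteer used in the thesis.

```python
import re

# Hypothetical gazetteer names, lower-cased (an assumption for this sketch).
GAZETTEER_NAMES = {"london", "edinburgh", "new york"}
MAX_NAME_TOKENS = 2  # longest gazetteer name, counted in tokens


def naive_toponym_lookup(text):
    """Return (start_token, end_token, surface_form) for each gazetteer
    match, preferring the longest match at every token position."""
    tokens = re.findall(r"\w+", text)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(MAX_NAME_TOKENS, 0, -1):
            window = tokens[i:i + n]
            candidate = " ".join(window)
            if len(window) == n and candidate.lower() in GAZETTEER_NAMES:
                matches.append((i, i + n, candidate))
                i += n  # skip past the matched name
                break
        else:
            i += 1  # no gazetteer name starts at this token
    return matches
```

A lookup of this kind marks every string that happens to appear in the
gazetteer. It cannot tell a place name from a person or organisation
with the same name, which is one reason to compare it against human
and model-based named entity annotation.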

Evaluation:  The algorithms implemented in this thesis are evaluated
in an intrinsic or component evaluation. To this end, I define a
task-specific matching criterion to be used with traditional Precision
and Recall evaluation metrics.
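
The abstract does not spell out the matching criterion, so the
following Python sketch substitutes a simple hypothetical one: a
system answer counts as correct if it refers to a toponym instance
present in the gold standard and both coordinates agree within a small
tolerance. The function name, the tolerance parameter tol_deg and the
instance keys are assumptions for illustration only.

```python
def precision_recall(gold, system, tol_deg=0.01):
    """Precision and recall of resolved toponym instances.

    gold, system: dicts mapping an instance id (for example a
    (document, token_offset) pair) to a (latitude, longitude) tuple.
    Matching criterion (an assumption of this sketch): same instance id
    and both coordinates within tol_deg degrees of the gold values.
    """
    correct = sum(
        1
        for key, (lat, lon) in system.items()
        if key in gold
        and abs(gold[key][0] - lat) <= tol_deg
        and abs(gold[key][1] - lon) <= tol_deg
    )
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Under this criterion, a system that resolves only some of the gold
instances but gets all of them right has perfect precision and reduced
recall, while a system that picks the wrong London loses on both.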

Main Contributions: The major contributions of this thesis are as follows: 
- a new reference corpus in which instances of location named entities
  have been manually annotated with spatial grounding information for
  populated places;
- a new method and implemented system to resolve toponyms that is
  capable of robustly processing unseen text (open-domain online
  newswire text) and grounding toponym instances in an extensional
  model using longitude and latitude coordinates and hierarchical path
  descriptions, and a comparison between a replicated method as described
  in the literature, which functions as a baseline, and a novel
  algorithm based on minimality heuristics; and
- an empirical analysis of the relative utility of various heuristic
  biases and other sources of evidence. 
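
The abstract names minimality heuristics without defining them. One
plausible reading, assumed here purely for illustration, is spatial
minimality: among all combinations of candidate referents, prefer the
one whose locations lie closest together. The brute-force Python
sketch below follows that assumption; the function names and the
sum-of-pairwise-distances cost are hypothetical and are not claimed to
be the algorithm of the thesis.

```python
import itertools
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def resolve_by_spatial_minimality(candidates):
    """Pick one referent per toponym so that the sum of pairwise
    distances between the chosen referents is minimal.

    candidates: dict mapping a toponym to a list of (lat, lon) tuples.
    Returns a dict mapping each toponym to its chosen (lat, lon), or an
    empty dict if some toponym has no candidates at all.
    """
    names = list(candidates)
    best, best_cost = None, float("inf")
    # Exhaustive search: exponential in the number of ambiguous toponyms.
    for combo in itertools.product(*(candidates[n] for n in names)):
        cost = sum(
            haversine_km(p, q) for p, q in itertools.combinations(combo, 2)
        )
        if cost < best_cost:
            best, best_cost = combo, cost
    return dict(zip(names, best)) if best is not None else {}
```

With approximate coordinates, a text mentioning London and Edinburgh
resolves London to the UK capital, whereas a text mentioning London
and Toronto resolves it to London, Ontario. The exhaustive search is
only practical for a handful of ambiguous toponyms per document.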





