[Corpora-List] Author+'s plans for books

Copperman, Max Max.Copperman at knova.com
Wed Mar 15 18:12:36 UTC 2006


Discourse structure theory may be an appropriate tool for this job.
However, Rhetorical Structure Theory is unlikely to be the discourse
structure theory that helps.  It's rather ad hoc (and I'm being
charitable here).  I'd look at work by Livia Polanyi and work on
Discourse Representation Theory. Someone actually familiar with the
field could probably make stronger recommendations.
 
Max Copperman

________________________________

From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Alexander Schutz
Sent: Wednesday, March 15, 2006 9:30 AM
To: D.G.Damle
Cc: CORPORA
Subject: Re: [Corpora-List] Author+'s plans for books



	I am trying to learn ontologies from text.  Evaluation is a
problem, since if you ask people to read the text and then to evaluate
the automatically generated ontology, every reader's concept structure
may be different.  The variation amongst readers may be too great!  

In my opinion, it will be extremely helpful to restrict the number of
concepts (or the choice of concepts in general). It is not so obvious
what you are trying to achieve: 
Evaluating the learned concepts of a system against a gold standard?
Then, on which kind of corpus did you conduct your experiments? I assume
it is a domain specific corpus (of textbooks). In that case it would be
quite easy to agree on a subset of certain concepts for that domain, and
restrict the domain experts (readers) to refer only to elements of this
subset while evaluating your system.
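
To illustrate the effect of such a restriction, here is a minimal Python
sketch. Everything in it is an assumption made for illustration: the
reader names, the concept labels, the agreed subset, and the choice of
mean pairwise Jaccard overlap as the agreement measure.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two concept sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_agreement(judgments, allowed=None):
    """Average Jaccard overlap over all pairs of readers.

    If `allowed` is given, each reader's concepts are first restricted
    to that agreed subset.
    """
    sets = []
    for concepts in judgments.values():
        concepts = set(concepts)
        if allowed is not None:
            concepts &= set(allowed)
        sets.append(concepts)
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical judgments from three readers of the same text.
readers = {
    "reader1": {"cell", "membrane", "life"},
    "reader2": {"cell", "membrane", "biology"},
    "reader3": {"cell", "osmosis", "nature"},
}
# Hypothetical subset of domain concepts agreed on beforehand.
agreed = {"cell", "membrane", "osmosis", "nucleus"}

unrestricted = mean_pairwise_agreement(readers)
restricted = mean_pairwise_agreement(readers, allowed=agreed)
```

With this toy data the readers overlap more once their judgments are
limited to the agreed subset, since the idiosyncratic concepts drop out.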


	It is also difficult to have such an ontology marked by domain
experts.  What the domain experts know about the domain may not be
reflected in the text and so recall is particularly difficult.  Also,
evaluators may not be willing to read large texts.

Evaluation in ontology learning is a pain in the neck, and your problem
with precision will by far outweigh your recall problem. Just imagine
that your goal is to *learn* ontology concepts (or relations). What if
your system is learning something new (i.e. something not contained in
the gold standard, or in your agreed-upon subset of concepts)? It will
then contribute to your precision error.
On the other hand, if you decide to compose your gold standard of all
the possible concepts in the whole world (just to make sure your system
will not run into precision problems described above), there will be
loads of concepts that you miss, because they are not contained in the
text (which accounts for the recall problem you described). Yes,
evaluation of ontology learning is a dilemma.
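
A small Python sketch of that trade-off, under stated assumptions: the
concept names and both gold standards are made up, and a learned concept
counts as correct only if it matches a gold concept exactly.

```python
def precision_recall(learned, gold):
    """Score learned concepts against a gold standard by exact match."""
    learned, gold = set(learned), set(gold)
    hits = learned & gold
    precision = len(hits) / len(learned) if learned else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical system output; "ion channel" is a genuinely new concept.
learned = {"cell", "membrane", "osmosis", "ion channel"}

# Narrow gold standard: the new concept is missing, so it is counted
# as a precision error even though it may be correct.
narrow_gold = {"cell", "membrane", "osmosis", "nucleus"}

# Broad gold standard: it covers the new concept, but also concepts
# that never occur in the text, which the system cannot find.
broad_gold = narrow_gold | {"ion channel", "ribosome", "enzyme", "mitosis"}

p_narrow, r_narrow = precision_recall(learned, narrow_gold)
p_broad, r_broad = precision_recall(learned, broad_gold)
```

Against the narrow gold standard the new concept lowers precision;
against the broad one precision recovers, but recall drops because of
the gold concepts that are not in the text.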

The fact that evaluators may not be willing to read large texts is in my
opinion not a problem of ontology learning and there is a lot you can do
to assure the loyalty of your evaluators (hint hint).


	Does the ontology defined by the author(s) of a large text
constitute a more objective yardstick?  Do authors have a list of
concepts and possibly some notion of structure about the text they set
out to create? (I am thinking particularly of textbooks).  Do any
authors commit something like a concept structure to paper or a computer
document before they write the text?  Alternatively, is it likely that an
author could retrospectively construct such a plan, notwithstanding the
issues of memory lapses etc.?

To be honest I have not written any textbook but I would like to think
that before I write a larger chunk of text (say a paper), I have a
certain structure (and the containing concepts so to speak) in mind
before I actually start writing.


	Do any authors have such plans and the texts they wrote using
those plans in an electronic form which they would be happy to make
available for research?  What do list members who write textbooks do?

If you speak of text planning, then maybe discourse and text theory is
the right thing for you, such as Rhetorical Structure Theory:

@Article{thompson-mann87,
   Author="Thompson, Sandra A. and Mann, William C.",
   Title="Rhetorical Structure Theory: A framework for the analysis of texts",
   Journal="IPrA Papers in Pragmatics",
   Volume=1,
   Number=1,
   Pages="79-105",
   Abstract="One of the foundation papers of RST.",
   Year=1987}

-- 
Alexander Schutz
Student of Computational Linguistics
University of Saarland, Germany 