[Corpora-List] language-mention in corpora
Shomir Wilson
shomir at gmail.com
Sat Feb 13 21:32:34 UTC 2010
Hi,
I am a Computer Science PhD student at the University of Maryland, and I
am working on a project involving the use-mention distinction in natural
language. In particular, I want to identify patterns that indicate the
occurrence of language-mention, and I was wondering if anyone is aware
of existing corpora that either tag this phenomenon or are likely to be
rich with instances of it.
I have a draft of a definition for what I'm calling "sentential"
language-mention, where the mentioned linguistic entity (or entities) is
referred to within the same sentence in which it occurs. For the moment, to
keep this problem tractable, I am focusing on that particular form of
the phenomenon.
The definition: For T a token or a set of tokens in a sentence, if T
refers to a property of the token T or the type of T, then T is an
instance of language-mention.
Here, a token can be a letter, sound, word, phrase, or entire sentence.
A property might be its spelling, pronunciation, original source (in
the case of quotation), meaning (for a variety of interpretations of
that term), or another aspect for which language is shown or demonstrated.
Here are some examples of this I pulled from Wikipedia, with the
mentioned language in quote marks (one of a few orthographic cues that
hint at but do not guarantee the phenomenon occurs):
-"Submerged forest" is a term used to describe the remains of trees
(especially tree stumps) which have been submerged by marine
transgression, i.e. sea level rise. (The phrase is mentioned to
elucidate its meaning)
-He also introduced the modern notation for the trigonometric functions,
the letter "e" for the base of the natural logarithm (now also known
as Euler's number) ... (The symbol 'e' is mentioned to clarify what it
represents)
-James Breckenridge Speed (middle name sometimes spelled "Breckinridge")
(1844-1912) was a successful businessman in Louisville, Kentucky and an
important philanthropist. (The spelling of the name inside of quote
marks is mentioned)
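To illustrate how the quote-mark cue might be used to collect candidate
instances, here is a minimal sketch. It assumes straight double quotes
only, and the name quoted_candidates is hypothetical. Each match is only
a candidate, since quote marks hint at but do not guarantee
language-mention (they also mark ordinary quotation, scare quotes, and
titles).

```python
import re

# Assumption: straight double quotes only; spans do not nest or cross lines.
QUOTED = re.compile(r'"([^"\n]+)"')

def quoted_candidates(sentence):
    """Return the text of each double-quoted span in the sentence.

    These are candidates for language-mention, not confirmed instances.
    """
    return [m.group(1) for m in QUOTED.finditer(sentence)]

if __name__ == "__main__":
    examples = [
        '"Submerged forest" is a term used to describe the remains of trees.',
        'James Breckenridge Speed (middle name sometimes spelled "Breckinridge")',
    ]
    for s in examples:
        print(quoted_candidates(s))
```

A filter like this would over-generate, so its output would still need
manual review or further cues (for example, nearby words such as "term"
or "spelled") before anything is counted as language-mention.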
There is probably much ground for debate on this (as I've already
encountered offline!), but any help would be greatly appreciated.
I've searched the list archives and found little on this topic (aside
from my previous, less descriptive post to this list, made about 18
months ago). Please reply directly to my email address
(shomir at umd.edu), and I'll recirculate any findings on the list.
Thanks,
Shomir Wilson
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora