[Corpora-List] language-mention in corpora

Shomir Wilson shomir at gmail.com
Sat Feb 13 21:32:34 UTC 2010


Hi,

I am a Computer Science PhD student at the University of Maryland, and I 
am working on a project involving the use-mention distinction in natural 
language.  In particular, I want to identify patterns that indicate the 
occurrence of language-mention, and I was wondering if anyone is aware 
of existing corpora that either tag this phenomenon or are likely to be 
rich with instances of it.

I have a draft of a definition for what I'm calling "sentential" 
language-mention, where the mentioned linguistic entity (or entities) is 
referred to inside the same sentence in which it occurs.  For the moment, to 
keep this problem tractable, I am focusing on that particular form of 
the phenomenon.

The definition: For T a token or a set of tokens in a sentence, if T 
refers to a property of the token T or the type of T, then T is an 
instance of language-mention.

Here, a token can be a letter, sound, word, phrase, or entire sentence. 
A property might be its spelling, pronunciation, original source (in 
the case of quotation), meaning (for a variety of interpretations of 
that term), or another aspect for which language is shown or demonstrated.

Here are some examples of this I pulled from Wikipedia, with the 
mentioned language in quote marks (one of a few orthographic cues that 
hint at but do not guarantee the phenomenon occurs):

-"Submerged forest" is a term used to describe the remains of trees 
(especially tree stumps) which have been submerged by marine 
transgression, i.e. sea level rise.  (The phrase is mentioned to 
elucidate its meaning)
-He also introduced the modern notation for the trigonometric functions, 
the letter "e" for the base of the natural logarithm (now also known 
as Euler's number) ... (The symbol 'e' is mentioned to clarify what it 
represents)
-James Breckenridge Speed (middle name sometimes spelled "Breckinridge") 
(1844-1912) was a successful businessman in Louisville, Kentucky and an 
important philanthropist.  (The spelling of the name inside of quote 
marks is mentioned)
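
To make the orthographic cue concrete, here is a minimal sketch in Python that pulls double-quoted spans out of a sentence as candidate instances of language-mention.  The function name quoted_candidates and the regular expression are my own illustrative assumptions, not an established tool, and the example sentences are abbreviated versions of the ones above.  As noted, quote marks only hint at the phenomenon, so every candidate would still need to be checked against the definition.

```python
import re

# Assumption: only straight double quotes are treated as a cue here.
# Other cues (italics, single quotes) are deliberately left out.
QUOTED = re.compile(r'"([^"]+)"')


def quoted_candidates(sentence):
    """Return the text of each double-quoted span in the sentence.

    These are candidates only: quote marks hint at language-mention
    but do not guarantee it.
    """
    return QUOTED.findall(sentence)


# Abbreviated versions of the Wikipedia examples, plus one sentence
# with no quoted span at all.
examples = [
    '"Submerged forest" is a term used to describe the remains of trees.',
    'James Breckenridge Speed (middle name sometimes spelled '
    '"Breckinridge") was a successful businessman.',
    'He was an important philanthropist in Louisville, Kentucky.',
]

for sentence in examples:
    print(quoted_candidates(sentence))
```

A pattern this simple will over-generate (scare quotes, titles, reported speech), which is one reason a corpus with hand-tagged instances would be so useful.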

There is probably much ground for debate on this (as I've already 
encountered offline!), but any help would be greatly appreciated.

I've searched the list archives and found little on this topic (aside 
from my previous, less descriptive post to this list, made about 18 
months ago).  Please reply directly to my email address 
(shomir at umd.edu), and I'll recirculate any findings on the list.

Thanks,


Shomir Wilson

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


