[Corpora-List] Quotable Statistics on Unstructured Data on the WWW

Jim Fidelholtz fidelholtz at gmail.com
Sun Dec 8 05:42:42 UTC 2013


Hi Mike et al.,

One supposes that ambiguity is a principal cause of contradictions. I
forget who proved it (about a century or more ago), but *all* structured
entities of any complexity whatsoever (e.g., theories) are mathematically
*guaranteed* to produce contradictions! Also, one thing that early (1960s)
attempts at automatic semantic analysis of sentences (e.g., the Harvard
project of that time) showed was that apparently innocuous sentences
turned out to be surprisingly multiply ambiguous: not quite as many readings
as the product of the number of meanings of each word in the sentence,
but still a rather large number even for short sentences. Unambiguous
sentences are extremely difficult to produce, much less find. So if even
designed languages ineluctably lead to contradictions, good luck with
relational databases!
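To make the multiplicative count concrete, here is a minimal sketch. It assumes we already know how many senses each word has (the counts in the example are invented for illustration, not taken from any dictionary or from the Harvard project) and computes the naive upper bound on readings: the product of the per-word sense counts. Real sentences have fewer readings than this bound, since syntax and context rule out many combinations.

```python
from math import prod


def max_readings(sense_counts: list[int]) -> int:
    """Naive upper bound on a sentence's lexical readings.

    Multiplies the number of senses of each word together. This is an
    illustrative assumption, not a model of any particular analyzer.
    """
    return prod(sense_counts)


# Hypothetical sense counts for a five-word sentence.
counts = [1, 3, 4, 2, 5]
print(max_readings(counts))  # 120
```

The point of the sketch is the growth rate: adding one more word with three senses triples the bound, which is why even short sentences can have a surprising number of candidate readings.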

[Note: I'm aware I skipped several steps in the above 'proof', but just
sayin'...]

Jim

James L. Fidelholtz
Posgrado en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO


On Fri, Dec 6, 2013 at 3:04 PM, maxwell <maxwell at umiacs.umd.edu> wrote:

> On 2013-12-06 15:47, Otto Lassen wrote:
>
>> Whether texts are structured or unstructured data depends on their origin.
>>
>
> I think a cross-cutting problem, and perhaps a more easily quantified (but
> maybe still useful) one, is that of ambiguity.  Structured data is often
> designed to avoid ambiguity.  (Structured data may provide an explicit
> representation of ambiguities, but the explicit representation should not
> in itself be ambiguous.)
>
> I'm sure someone will come up with counter-examples, but relational
> databases and XML documents are both designed to be unambiguously parseable
> (given a database schema or an XML schema).  So were blueprints, if anyone
> remembers those.  Natural language, otoh, is inherently (and often
> exceedingly) ambiguous.  So are Necker cubes.
>
> So it might be helpful (if possible) to rephrase the question to ask how
> much data is potentially ambiguous, and at what level (syntactically,
> morphologically, lexically, semantically, pragmatically).  By "potentially"
> ambiguous, I mean in principle; a particular instance of a natural language
> sentence might be syntactically unambiguous, but natural language in
> general is syntactically ambiguous.  I suppose anything is _pragmatically_
> ambiguous.
>
>    Mike Maxwell
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>