[Corpora-List] Quotable Statistics on Unstructured Data on the WWW

Kristian Kankainen kristian at eki.ee
Sun Dec 8 07:10:44 UTC 2013


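It was the mathematician Kurt Gödel who proved the relevant result in 1931 
with his two incompleteness theorems: any consistent formal system rich 
enough to express elementary arithmetic contains statements it can neither 
prove nor disprove, and cannot prove its own consistency (read more on 
Wikipedia). It is a very good point to bring into the semiosphere of 
linguistics.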

As for this discussion of unstructured vs. structured data, I defend 
specialised terminologies (so shoot me too), but I do confess that 
collisions between terminologies are a source of headaches as well as a 
force that drives the field forward.

Kristian


On 08.12.2013 07:42, Jim Fidelholtz wrote:
> Hi Mike et al.,
>
> One supposes that ambiguity is a principal cause of contradictions. I 
> forget who proved it (about a century or more ago), but *all* 
> structured entities of any complexity whatsoever (e.g., theories) are 
> mathematically *guaranteed* to produce contradictions! Also, one thing 
> early (1960s) attempts at automatic semantic analysis of sentences 
> (e.g., the Harvard project at that time) showed was that apparently 
> inoffensive sentences turned out to be surprisingly multiply 
> ambiguous: not quite the number of meanings of each word in a sentence 
> multiplied together across all the words, but still a rather large 
> number even for short sentences. Unambiguous sentences are extremely 
> difficult to produce, much less find. So if even designed languages 
> ineluctably lead to contradictions, good luck with relational databases!
>
> [Note: I'm aware I skipped several steps in the above 'proof', but 
> just sayin' ...]
>
> Jim
>
> James L. Fidelholtz
> Posgrado en Ciencias del Lenguaje
> Instituto de Ciencias Sociales y Humanidades
> Benemérita Universidad Autónoma de Puebla, MÉXICO
>
>
> On Fri, Dec 6, 2013 at 3:04 PM, maxwell <maxwell at umiacs.umd.edu> wrote:
>
>     On 2013-12-06 15:47, Otto Lassen wrote:
>
>         Whether texts are structured or unstructured data depends on
>         their origin.
>
>
>     I think a cross-cutting problem, and perhaps a more easily
>     quantified (but maybe still useful) one, is that of ambiguity.
>     Structured data is often designed to avoid ambiguity.
>     (Structured data may provide an explicit representation of
>     ambiguities, but the explicit representation should not in itself
>     be ambiguous.)
>
>     I'm sure someone will come up with counter-examples, but
>     relational databases and XML documents are both designed to be
>     unambiguously parseable (given a database schema or an XML
>     schema).  So were blueprints, if anyone remembers those.  Natural
>     language, otoh, is inherently (and often exceedingly) ambiguous.
>     So are Necker cubes.
>
>     So it might be helpful (if possible) to rephrase the question to
>     ask how much data is potentially ambiguous, and at what level
>     (syntactically, morphologically, lexically, semantically,
>     pragmatically).  By "potentially" ambiguous, I mean in principle;
>     a particular instance of a natural language sentence might be
>     syntactically unambiguous, but natural language in general is
>     syntactically ambiguous.  I suppose anything is _pragmatically_
>     ambiguous.
>
>        Mike Maxwell
>
>
>     _______________________________________________
>     UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>     Corpora mailing list
>     Corpora at uib.no
>     http://mailman.uib.no/listinfo/corpora
