[Corpora-List] Quotable Statistics on Unstructured Data on the WWW
Kristian Kankainen
kristian at eki.ee
Sun Dec 8 07:10:44 UTC 2013
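It was the mathematician Kurt Gödel who proved it in 1931 with his two
incompleteness theorems (read more on Wikipedia), although strictly they
show that a consistent formal system rich enough for arithmetic is
incomplete, not that it must contradict itself. It is a very good
point to bring into the semiosphere of linguistics.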
As for this discussion on unstructured vs structured data, I defend
specialised terminologies (so shoot me too), but I do confess that
collisions between terminologies are a headache-inducing yet
progress-driving force.
Kristian
08.12.2013 07:42, Jim Fidelholtz wrote:
> Hi Mike et All,
>
> One supposes that ambiguity is a principal cause of contradictions. I
> forget who proved it (about a century or more ago), but *all*
> structured entities of any complexity whatsoever (e. g., theories) are
> mathematically *guaranteed* to produce contradictions! Also, one thing
> early (1960s) attempts at automatic semantic analysis of sentences
> (e.g., the Harvard project at that time) showed was that apparently
> inoffensive sentences turned out to be surprisingly multiply
> ambiguous--not quite the product of the number of meanings of each
> word in the sentence, but still a rather large number for even short
> sentences. Unambiguous
> sentences are extremely difficult to produce, much less find. So if
> even designed languages ineluctably lead to contradictions, good luck
> with relational databases!
>
> [Note: I'm aware I skipped several steps in the above 'proof', but
> just sayin' ...]
>
> Jim
>
> James L. Fidelholtz
> Posgrado en Ciencias del Lenguaje
> Instituto de Ciencias Sociales y Humanidades
> Benemérita Universidad Autónoma de Puebla, MÉXICO
>
>
> On Fri, Dec 6, 2013 at 3:04 PM, maxwell <maxwell at umiacs.umd.edu>
> wrote:
>
> On 2013-12-06 15:47, Otto Lassen wrote:
>
> Whether texts are structured or unstructured data depends on their
> origin.
>
>
> I think a cross-cutting problem, and perhaps a more easily
> quantified (but maybe still useful) one, is that of ambiguity.
> Structured data is often designed to avoid ambiguity.
> (Structured data may provide an explicit representation of
> ambiguities, but the explicit representation should not in itself
> be ambiguous.)
>
> I'm sure someone will come up with counter-examples, but
> relational databases and XML documents are both designed to be
> unambiguously parseable (given a database schema or an XML
> schema). So were blueprints, if anyone remembers those. Natural
> language, otoh, is inherently (and often exceedingly) ambiguous.
> So are Necker cubes.
>
> So it might be helpful (if possible) to re-phrase the question to
> ask how much data is potentially ambiguous, and at what level
> (syntactically, morphologically, lexically, semantically,
> pragmatically). By "potentially" ambiguous, I mean in principle;
> a particular instance of a natural language sentence might be
> syntactically unambiguous, but natural language in general is
> syntactically ambiguous. I suppose anything is _pragmatically_
> ambiguous.
>
> Mike Maxwell
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora