[Corpora-List] Quotable Statistics on Unstructured Data on the WWW

Jim Fidelholtz fidelholtz at gmail.com
Mon Dec 9 16:33:19 UTC 2013


Hi, Kristian,

Right. Thanks. As a mathematician, it's kind of embarrassing to forget
Gödel! Actually, I should go back and read his proofs, though perhaps I
should look up a translation into English (my originally lousy German is
now quite rusty from disuse!). I don't have anything in principle against
specialized terminologies (and, besides, unlike many Americans I don't own
any guns and don't have any plans to, so you are safe from me!), especially
as long as you take care to define your terms, and if it's a book you're
doing, include a glossary.

Jim

James L. Fidelholtz
Posgrado en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO


On Sun, Dec 8, 2013 at 1:10 AM, Kristian Kankainen <kristian at eki.ee> wrote:

>  It was the mathematician Kurt Gödel who proved it in 1931 with his two
> incompleteness theorems (read more on Wikipedia). It is a very good point
> to bring out into the semiosphere of linguistics.
>
> As for this discussion of unstructured vs. structured data, I defend
> specialised terminologies (so shoot me too), but I do confess that collisions
> between terminologies are a headache-inducing but progress-driving force.
>
> Kristian
>
>
> 08.12.2013 07:42, Jim Fidelholtz kirjutas:
>
> Hi Mike et All,
>
>  One supposes that ambiguity is a principal cause of contradictions. I
> forget who proved it (about a century or more ago), but *all* structured
> entities of any complexity whatsoever (e. g., theories) are mathematically
> *guaranteed* to produce contradictions! Also, one thing early (1960s)
> attempts at automatic semantic analysis of sentences (e.g., the Harvard
> project at that time) showed was that apparently inoffensive sentences
> turned out to be surprisingly multiply ambiguous--not quite the product of
> the numbers of meanings of all the words in the sentence, but still a
> rather large number of readings for even short
> sentences. Unambiguous sentences are extremely difficult to produce, much
> less find. So if even designed languages ineluctably lead to
> contradictions, good luck with relational databases!
>
>  [Note: I'm aware I skipped several steps in the above 'proof', but
> just sayin' ...]
>
>  Jim
>
> James L. Fidelholtz
> Posgrado en Ciencias del Lenguaje
> Instituto de Ciencias Sociales y Humanidades
> Benemérita Universidad Autónoma de Puebla, MÉXICO
>
>
> On Fri, Dec 6, 2013 at 3:04 PM, maxwell <maxwell at umiacs.umd.edu> wrote:
>
>> On 2013-12-06 15:47, Otto Lassen wrote:
>>
>>> Whether texts are structured or unstructured data depends on their origin.
>>>
>>
>>  I think a cross-cutting problem, and perhaps a more easily quantified
>> (but maybe still useful) one, is that of ambiguity.  Structured data is
>> often designed to avoid ambiguity.  (Structured data may provide an
>> explicit representation of ambiguities, but the explicit representation
>> should not in itself be ambiguous.)
>>
>> I'm sure someone will come up with counter-examples, but relational
>> databases and XML documents are both designed to be unambiguously parseable
>> (given a database schema or an XML schema).  So were blueprints, if anyone
>> remembers those.  Natural language, otoh, is inherently (and often
>> exceedingly) ambiguous.  So are Necker cubes.
>>
>> So it might be helpful (if possible) to re-phrase the question to ask how
>> much data is potentially ambiguous, and at what level (syntactically,
>> morphologically, lexically, semantically, pragmatically).  By "potentially"
>> ambiguous, I mean in principle; a particular instance of a natural language
>> sentence might be syntactically unambiguous, but natural language in
>> general is syntactically ambiguous.  I suppose anything is _pragmatically_
>> ambiguous.
>>
>>    Mike Maxwell
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>