[Corpora-List] Quotable Statistics on Unstructured Data on the WWW

Michele Filannino michele.filannino at cs.manchester.ac.uk
Fri Dec 6 13:18:41 UTC 2013


Up! :)


On Fri, Dec 6, 2013 at 12:52 PM, Reinhard Rapp <reinhardrapp at gmx.de> wrote:

> Dear Daniel,
>
> Please don't take this personally! Adam just pointed out, in a sharp way,
> that linguists and engineers view this differently, and that he prefers
> the linguists' view. It is just one of many examples where different
> communities working on similar topics look at things quite differently
> (e.g. the semantic web community and the computational linguistics
> community). It is Adam's privilege to be able to describe such matters in a
> very concise and entertaining way. Let's not discourage him from doing so! He
> always has very interesting things to say! Political correctness is boring!
>
> Kind regards,
>
> Reinhard
>
>
> -----Original Message----- From: Daniel Gerber
> Sent: Friday, December 6, 2013 1:12 PM
> To: Adam Kilgarriff
> Cc: corpora at uib.no
> Subject: Re: [Corpora-List] Quotable Statistics on Unstructured Data on
> the WWW
>
>
> Hallo Adam,
>
> On 06.12.2013, at 12:45, Adam Kilgarriff <adam at lexmasterclass.com> wrote:
>
>  I always squirm when I hear text referred to as unstructured data.
>> (Daniel - I see you do too, from the '(semi-)'.)    It feels like a
>> teenager declaring everyone over 25 old.
>>
>
> How do you see text, then? Yes, I typically refer to text as
> unstructured, tables and the like as semi-structured, and databases as
> structured.
> I’m sorry that you feel so offended by my understanding. But your
> reply neither answers my question nor helps me understand a
> different point of view any better.
>
>  Adam
>>
>> (PS - I first came across it in the IBM-promoted UIMA, where the U stands
>> for "unstructured", so the inventors of that acronym should be shot. Not
>> sure if the initiative is still ongoing.)
>>
>
> I think you should apologize to the people you want to see shot. I can’t
> believe that someone (especially someone with a scientific background like
> yours) would express himself in such a manner.
>
> Daniel
>
>
>>
>>
>> On 6 December 2013 08:48, Daniel Gerber
>> <dgerber at informatik.uni-leipzig.de> wrote:
>> Hi,
>> I’m looking for quotable statistics on the distribution of
>> structured vs. (semi-)unstructured data on the web.
>> So far I have only found some blog posts about Big Data statistics, or
>> presentations that claim a 15%/85% split but fail to cite the
>> sources for this claim.
>>
>> Any help would be greatly appreciated,
>> Daniel
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>>
>> --
>> ========================================
>> Adam Kilgarriff                  adam at lexmasterclass.com
>> Director                                    Lexical Computing Ltd
>> Visiting Research Fellow                 University of Leeds
>> Corpora for all with the Sketch Engine
>>                         DANTE: a lexical database for English
>> ========================================
>>
>
>
>



-- 
Michele Filannino

CDT PhD student in Computer Science
Room IT301 - IT Building
The University of Manchester
http://www.cs.man.ac.uk/~filannim/
filannim at cs.manchester.ac.uk