[Corpora-List] Quotable Statistics on Unstructured Data on the WWW

Adam Kilgarriff adam at lexmasterclass.com
Fri Dec 6 13:23:09 UTC 2013


There's phrase structure and dependency structure and morphological
structure and text structure and rhetorical structure and semantic structure.



On 6 December 2013 12:12, Daniel Gerber
<dgerber at informatik.uni-leipzig.de> wrote:

> Hallo Adam,
>
> On 06.12.2013, at 12:45, Adam Kilgarriff <adam at lexmasterclass.com> wrote:
>
> > I always squirm when I hear text referred to as unstructured data.
> > (Daniel - I see you do too, from the '(semi-)'.) It feels like a
> > teenager declaring everyone over 25 as old.
>
> What do you see text as, then? Yes, I typically refer to text as
> unstructured, tables and so on as semi-structured, and databases as
> structured.
> I’m sorry that you feel greatly offended by my understanding. But your
> reply does not answer my question nor does it help me to understand a
> different point of view any better.
>
> > Adam
> >
> > (PS - I first came across it in the IBM-promoted UIMA, the U is
> > unstructured, so the inventors of that acronym should be shot. Not sure if
> > the initiative is ongoing.)
>
> I think you should apologize to the people you want to be shot. I can’t
> believe that someone (especially someone with a scientific background like
> yours) expresses himself in such a manner.
>
> Daniel
>
> >
> >
> >
> > On 6 December 2013 08:48, Daniel Gerber <dgerber at informatik.uni-leipzig.de> wrote:
> > Hi,
> > I’m searching for any quotable statistics for the distribution of
> > structured vs. (semi-)unstructured data on the web.
> > So far I could only find some blog posts about Big Data statistics or
> > presentations which claim a 15%-85% distribution but forget to cite the
> > sources for this claim.
> >
> > Any help would be greatly appreciated,
> > Daniel
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
> >
> >
> > --
> > ========================================
> > Adam Kilgarriff                  adam at lexmasterclass.com
> > Director                                    Lexical Computing Ltd
> > Visiting Research Fellow                 University of Leeds
> > Corpora for all with the Sketch Engine
> >                         DANTE: a lexical database for English
> > ========================================
>
>


-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
Corpora for all with the Sketch Engine <http://www.sketchengine.co.uk>
DANTE: a lexical database for English <http://www.webdante.com>
========================================