[Corpora-List] Quotable Statistics on Unstructured Data on the WWW
Adam Kilgarriff
adam at lexmasterclass.com
Fri Dec 6 11:45:17 UTC 2013
I always squirm when I hear text referred to as unstructured data.
(Daniel - I see you do too, from the '(semi-)'.) It feels like a
teenager declaring everyone over 25 as old.
Adam
(PS - I first came across it in the IBM-promoted UIMA, where the U stands
for "unstructured", so the inventors of that acronym should be shot. Not
sure if the initiative is ongoing.)
On 6 December 2013 08:48, Daniel Gerber
<dgerber at informatik.uni-leipzig.de> wrote:
> Hi,
> I’m searching for any quotable statistics for the distribution of
> structured vs. (semi-)unstructured data on the web.
> So far I could only find some blog posts about Big Data statistics or
> presentations which claim a 15%-85% distribution but forget to quote the
> sources for this claim.
>
> Any help would be greatly appreciated,
> Daniel
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================