[Corpora-List] Quotable Statistics on Unstructured Data on the WWW

Trevor Jenkins trevor.jenkins at suneidesis.com
Fri Dec 6 13:17:55 UTC 2013


On 6 Dec 2013, at 12:12, Daniel Gerber <dgerber at informatik.uni-leipzig.de> wrote:

> Hallo Adam,
> 
> On 06.12.2013, at 12:45, Adam Kilgarriff <adam at lexmasterclass.com> wrote:
> 
>> I always squirm when I hear text referred to as unstructured data.   (Daniel - I see you do too, from the '(semi-)'.)    It feels like a teenager declaring everyone over 25 as old.
> 
> How do you see text, then? Yes, I typically refer to text as being unstructured, tables and so on as semi-structured, and databases as structured.

Can't speak for Adam Kilgarriff, but I see text as structured, with individual glyphs forming words, words forming sentences, sentences forming paragraphs, paragraphs forming chapters, chapters forming books. And a variety of similar structures.

I see databases and their internal tables as over-restrictively based on a highly biased perception of data and information. Relational databases are not the only solution. I worked for many years as a "database" consultant who could just as easily recommend a text database as a relational, hierarchical or network solution. One of these database organisations may be a better "fit" in a particular situation, but relationalism is /not/ the panacea certain software suppliers sell it to be.

Regards, Trevor.

<>< Re: deemed!

