[Corpora-List] The genre of the Web

Paula Chesley pchesley at buffalo.edu
Mon Sep 19 03:05:23 UTC 2005


Marina Santini at Brighton has done a lot of work concerning genres and 
the Web. Her website is at:

http://www.itri.brighton.ac.uk/~Marina.Santini/

I'm sure she'd know many of the quantitative figures you're
interested in.

Cheers,
Paula


Mark Davies wrote:
> I'm looking for publications or URLs that look at the genre of the web in quantitative terms.
>  
> In other words, if one looks at the four major genres/registers SPOKEN, FICTION, NEWSPAPER, ACADEMIC, most would probably agree that the web is more like NEWSPAPER and ACADEMIC than it is SPOKEN or FICTION, although there are certainly bits and pieces of all of these genres/registers on the web.
>  
> I imagine that something like the following has already been done, but it would seem that a person could look at the frequency of 50-60 words or phrases in the major genres/registers of the BNC, for example, and then compare this to the frequency of the same words and phrases on the Web.  In quantitative terms, the web would be "most like" the register with the highest correlation coefficient. 
>  
> Three notes: 
> 1) A BNC-based site like VIEW [http://view.byu.edu] allows users to quickly compare the frequency in different registers [use "Charts" on the VIEW site]. 
> 2) This assumes we can abstract away from the basic methodological problem of calculating frequencies from the web -- an issue that has been discussed in a number of threads here on CORPORA.
> 3) This is a very simplistic, lexically oriented comparison, with no attempt to look at syntactic features, etc.
>  
> On the other hand, does it even make sense to try to relate the overall genre orientation of the web to one of these four or five discrete genres?  Would it be better to simply refer to it as a mix of GENRE1 + GENRE2?  Going even further, does it make sense to even try to relate the web to pre-defined genres, rather than perhaps just referring to it as its own "Web" register?
>  
> Thanks in advance,
>  
> Mark Davies
>  
> =================================================
> Mark Davies
> Assoc. Prof., Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> http://davies-linguistics.byu.edu
> 
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ================================================= 
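
The comparison described in the quoted message (correlating the frequencies of a fixed list of 50-60 words or phrases across BNC registers with their frequencies on the web, then picking the register with the highest correlation coefficient) can be sketched roughly as follows. This is a minimal illustration in Python, assuming the frequencies have already been normalized (e.g. per million words) and setting aside the web-counting problems in note 2. The function names and all numbers below are hypothetical: the toy frequencies are invented for illustration and are not BNC, VIEW, or web counts.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a sequence has zero variance")
    return cov / (sx * sy)


def rank_registers(web_freqs, register_freqs):
    """Hypothetical helper: correlate the web frequencies with each register's
    frequencies over the web's word list and return (register, r) pairs,
    most similar register first. Every register dict must contain every
    word in web_freqs (otherwise a KeyError is raised)."""
    words = sorted(web_freqs)  # fixed word order shared by all vectors
    web = [web_freqs[w] for w in words]
    scores = []
    for register, freqs in register_freqs.items():
        reg = [freqs[w] for w in words]
        scores.append((register, pearson(web, reg)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy per-million frequencies, INVENTED for illustration only.
registers = {
    "SPOKEN":    {"yeah": 3000, "she": 4000, "said": 2500, "however": 100, "data": 20},
    "FICTION":   {"yeah": 200, "she": 9000, "said": 5000, "however": 150, "data": 5},
    "NEWSPAPER": {"yeah": 30, "she": 2000, "said": 4000, "however": 400, "data": 150},
    "ACADEMIC":  {"yeah": 5, "she": 500, "said": 300, "however": 800, "data": 600},
}
web = {"yeah": 100, "she": 1500, "said": 1500, "however": 500, "data": 400}

for register, r in rank_registers(web, registers):
    print(f"{register:10s} r = {r:+.3f}")
```

Pearson's r on raw frequencies can be dominated by a handful of very frequent words, so correlating log frequencies, or using a rank correlation such as Spearman's, may be a more robust choice for this kind of comparison.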

-- 
Two-thirds of the population of New Orleans is [was] black. More than a
quarter of the city lives [lived] in poverty.
        --"From Margins of Society to Center of the Tragedy",
        NY Times, 2 Sept. 2005


