[Corpora-List] Corpus heterogeneity

Adam Kilgarriff adam at lexmasterclass.com
Wed Nov 7 07:27:43 UTC 2012


Also

   - Adam Kilgarriff. Comparing Corpora
   <http://kilgarriff.co.uk/Publications/2001-K-CompCorpIJCL.pdf>. 2001.
   *International Journal of Corpus Linguistics* 6 (1): 1-37.
   - Reprinted in *Corpus Linguistics: Critical Concepts in Linguistics.*
   Teubert and Krishnamurthy, editors. Routledge. 2007.

(This has work on the topic from back in the 20th century; I think it
stands up OK. We are currently reviewing the definition of 'corpus
heterogeneity' given there and implementing an improved version for
viewing in the Sketch Engine. In brief, the new definition builds on a
definition of corpus similarity: heterogeneity is "the similarity between
the two most different parts". We cluster documents to identify the two
most different parts.)
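
To make the definition concrete, here is a minimal sketch in Python. It is
not the Sketch Engine implementation. Everything in it is an assumption made
for illustration: cosine similarity over raw word counts stands in for the
corpus similarity measure, and the clustering is a crude two-way split seeded
with the least similar pair of documents. The function names are
hypothetical. Note that the score is a similarity, so a lower value means a
more heterogeneous corpus.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two word-frequency Counters.

    Assumed stand-in for a corpus similarity measure.
    """
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def two_most_different_parts(docs):
    """Split documents (lists of tokens) into two parts.

    Crude clustering, assumed for illustration: seed the two parts with the
    least similar pair of documents, then attach every other document to the
    seed it resembles more. Needs at least two documents.
    """
    counts = [Counter(d) for d in docs]
    i, j = min(
        combinations(range(len(counts)), 2),
        key=lambda p: cosine(counts[p[0]], counts[p[1]]),
    )
    part_a, part_b = Counter(counts[i]), Counter(counts[j])
    for k, c in enumerate(counts):
        if k in (i, j):
            continue
        if cosine(c, counts[i]) >= cosine(c, counts[j]):
            part_a.update(c)
        else:
            part_b.update(c)
    return part_a, part_b


def heterogeneity(docs):
    """Similarity between the two most different parts of the corpus."""
    part_a, part_b = two_most_different_parts(docs)
    return cosine(part_a, part_b)


mixed = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["stock", "market", "fell"],
    ["stock", "market", "rose"],
]
same = [["a", "b"], ["a", "b"]]
print(heterogeneity(mixed), heterogeneity(same))
```

The toy corpus `mixed` splits into two parts with no shared vocabulary, so
its score is at the bottom of the scale, while `same` scores at the top.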

Adam

On 6 November 2012 15:33, Stefan Th. Gries <stgries at gmail.com> wrote:

> Dear Alexander
>
> Please see: Gries, Stefan Th. 2006. Exploring variability within and
> between corpora: some methodological considerations. Corpora 1(2).
> 109-151.
>
> Cheers,
> STG
> --
> Stefan Th. Gries
> -----------------------------------------------
> University of California, Santa Barbara
> http://www.linguistics.ucsb.edu/faculty/stgries
> -----------------------------------------------
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>

*DANTE: a lexical database for English* <http://www.webdante.com>
========================================