[Corpora-List] corpus homogeneity

A.DeRoeck A.Deroeck at open.ac.uk
Tue Sep 14 13:05:41 UTC 2004


And we've done some further work, using Adam's measure as a starting point, in:

A. De Roeck, A. Sarkar and P. Garthwaite. "Frequent Term Distribution
Measures for Dataset Profiling". Proceedings of LREC, pp. 1647-1651.
Lisbon. A longer description of the work is also available as a
technical report:

Technical Report Number 2004/07  
Title:  Defeating the Homogeneity Assumption: some findings on the
distribution of very frequent terms  
Author(s):  A. De Roeck, A. Sarkar, P. Garthwaite  

It is available here:
http://computing-reports.open.ac.uk/index.php/2004/200407

Anne

> -----Original Message-----
> From: owner-corpora at lists.uib.no 
> [mailto:owner-corpora at lists.uib.no] On Behalf Of Adam Kilgarriff
> Sent: 13 September 2004 17:18
> To: 'Cormac O'Brien'; corpora
> Subject: RE: [Corpora-List] corpus homogeneity
> 
> 
> Cormac,
> 
> 	No software to offer, but an easy-to-implement measure 
> is defined in my "Comparing Corpora", Int Jnl of Corpus 
> Linguistics, 6 (1) 2001 Pp 1-37, also ITRI-01-15 available at 
> http://www.itri.brighton.ac.uk/techreports/ 
> 
> Adam
> 
> 
> -----Original Message-----
> From: owner-corpora at lists.uib.no 
> [mailto:owner-corpora at lists.uib.no] On Behalf Of Cormac 
> O'Brien
> Sent: 07 September 2004 09:50
> To: corpora at hd.uib.no
> Subject: [Corpora-List] corpus homogeneity
> 
> 
> Hi,
> 
> Does anyone have a program for testing corpus homogeneity? 
> I'd be very grateful.
> 
> Cormac
> 
> -----------------------------------------
> Cormac O'Brien
> Postgraduate Student (M.Sc. by research)
> Computational Linguistics Group
> Trinity College, Dublin
> 
> Tel: 00353 1 608 2866


