[Corpora-List] corpus homogeneity
A.DeRoeck
A.Deroeck at open.ac.uk
Tue Sep 14 13:05:41 UTC 2004
And we've done some further work, using Adam's as a starting point, in
A. De Roeck, A. Sarkar and P. Garthwaite. "Frequent Term Distribution
Measures for Dataset Profiling". Proceedings of LREC, pp 1647-1651.
Lisbon. A longer description of the work is also available as a technical
report:
Technical Report Number 2004/07
Title: Defeating the Homogeneity Assumption: some findings on the
distribution of very frequent terms
Author(s): A. De Roeck, A. Sarkar, P. Garthwaite
Available here:
http://computing-reports.open.ac.uk/index.php/2004/200407
Anne
> -----Original Message-----
> From: owner-corpora at lists.uib.no
> [mailto:owner-corpora at lists.uib.no] On Behalf Of Adam Kilgarriff
> Sent: 13 September 2004 17:18
> To: 'Cormac O'Brien'; corpora
> Subject: RE: [Corpora-List] corpus homogeneity
>
>
> Cormac,
>
> No software to offer, but an easy-to-implement measure
> is defined in my "Comparing Corpora", Int Jnl of Corpus
> Linguistics, 6 (1) 2001 Pp 1-37, also ITRI-01-15 available at
> http://www.itri.brighton.ac.uk/techreports/
>
> Adam
>
>
> -----Original Message-----
> From: owner-corpora at lists.uib.no
> [mailto:owner-corpora at lists.uib.no] On Behalf Of Cormac
> O'Brien
> Sent: 07 September 2004 09:50
> To: corpora at hd.uib.no
> Subject: [Corpora-List] corpus homogeneity
>
>
> Hi,
>
> Does anyone have a program for testing corpus homogeneity?
> I'd be very grateful.
>
> Cormac
>
> -----------------------------------------
> Cormac O'Brien
> Postgraduate Student (M.Sc. by research)
> Computational Linguistics Group
> Trinity College, Dublin
>
> Tel: 00353 1 608 2866