[Corpora-List] (no subject)
Martin Reynaert
reynaert at uvt.nl
Thu Jul 19 23:11:31 UTC 2012
Dear Amanda,
I have a sneaky feeling you may be interested in what are called
'vocabulary growth curves'. In which case, do check out:
http://zipfr.r-forge.r-project.org/ .
If that proves to be too much all of a sudden, do check out:
R. H. Baayen (2001). Word Frequency Distributions. Kluwer, Dordrecht.
You will want to go on to his more recent publications after that ;0)
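An empirical vocabulary growth curve is V(N): the number of distinct word types seen after reading the first N tokens. Each point where V(N) rises marks a word that has not occurred earlier. The following is a minimal Python sketch of the observed curve only, assuming a crude lowercase `\w+` tokenizer; the helper names `tokenize` and `vocabulary_growth` are hypothetical and are not part of zipfR. It does not attempt the statistical modelling and extrapolation that zipfR and Baayen (2001) cover.

```python
import re


def tokenize(text):
    """Hypothetical, crude tokenizer: lowercase, keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def vocabulary_growth(tokens, step=1):
    """Return (N, V(N)) pairs: N tokens read so far, V(N) distinct types among them.

    A point is recorded every `step` tokens, plus one for the final token.
    """
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if n % step == 0 or n == len(tokens):
            curve.append((n, len(seen)))
    return curve


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the cat"
    for n, v in vocabulary_growth(tokenize(text), step=4):
        print(n, v)
```

On the toy sentence (13 tokens, 7 types) this prints the pairs `4 4`, `8 6`, `12 7` and `13 7`.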
Best,
Martin
On 07/20/2012 12:43 AM, Amanda wrote:
> Dear all,
>
> Does anyone know of any existing (and available) software that can
> automatically:
>
> 1. Compare every two consecutive texts in a corpus; and
> 2. List every new word that occurs in the latter text?
>
> Or do you know of any papers about that?
>
> Thank you for your help!
>
> All the best.
> Amanda
>
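The two steps in the request above can be sketched in a few lines of Python. This is a minimal sketch, not an existing tool: it assumes each text is a plain string, uses a hypothetical crude tokenizer (lowercase, runs of word characters), and treats a "new word" as a word type absent from the previous text. The `cumulative` flag is an added, hypothetical option that compares against all earlier texts combined instead, which is closer to the vocabulary-growth view.

```python
import re


def tokenize(text):
    """Hypothetical, crude tokenizer: lowercase, keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def new_words(texts, cumulative=False):
    """For each text after the first, list word types absent from a reference
    vocabulary, in order of first appearance.

    cumulative=False: reference is the immediately preceding text only
                      (the literal "every two consecutive texts" reading).
    cumulative=True:  reference is all earlier texts combined.
    Returns a list of (previous_index, current_index, new_word_list) tuples.
    """
    results = []
    earlier = set()  # union of all types seen in texts[0 .. i-1]
    for i, text in enumerate(texts):
        tokens = tokenize(text)
        if i > 0:
            reference = earlier if cumulative else set(tokenize(texts[i - 1]))
            fresh, listed = [], set()
            for tok in tokens:
                if tok not in reference and tok not in listed:
                    listed.add(tok)
                    fresh.append(tok)
            results.append((i - 1, i, fresh))
        earlier.update(tokens)
    return results


if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "The dog sat on the cat.",
        "A mat and a cat met.",
    ]
    for prev_idx, cur_idx, words in new_words(corpus):
        print(f"text {prev_idx} -> text {cur_idx}: {words}")
```

With the default pairwise comparison the third text yields `['a', 'mat', 'and', 'met']`; with `cumulative=True`, `mat` drops out because it already occurred in the first text. Real corpora would need a proper tokenizer and probably lemmatization before "new word" becomes meaningful.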
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> corpora-request at uib.no
> Sent: 19 July 2012 11:00
> To: corpora at uib.no
> Subject: Corpora Digest, Vol 61, Issue 17
>
> Today's Topics:
>
> 1. Seeking corpus for academic domain (Lushan Han)
> 2. Re: Seeking corpus for academic domain (Lushan Han)
> 3. English confusables (Carter, Simon)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 18 Jul 2012 11:03:56 -0400
> From: Lushan Han <lushan1 at umbc.edu>
> Subject: [Corpora-List] Seeking corpus for academic domain
> To: corpora at uib.no
>
> Dear all,
>
> I am looking for a very large corpus (> 1 billion words) built for the academic
> domain, mainly describing universities, projects, conferences, papers, authors,
> etc. I will compute statistics from it for use in building a query
> system on structured data for the academic domain.
>
> Does anyone know such a corpus? Any information will be appreciated.
>
>
> Thanks,
>
> Lushan Han
>
> ------------------------------
>
> Message: 2
> Date: Wed, 18 Jul 2012 15:30:20 -0400
> From: Lushan Han <lushan1 at umbc.edu>
> Subject: Re: [Corpora-List] Seeking corpus for academic domain
> To: corpora at uib.no
>
> A smaller corpus (e.g. millions of words) would also be very helpful
> to me. Please let me know if you happen to know of one.
>
> Thanks,
>
> Lushan
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 19 Jul 2012 09:28:03 +0000
> From: "Carter, Simon" <S.C.Carter at uva.nl>
> Subject: [Corpora-List] English confusables
> To: "corpora at uib.no" <corpora at uib.no>
>
> Dear Corpora List,
>
> I was wondering if there is a list of English confusables somewhere?
>
> Thanks,
>
> Simon
>
> ----------------------------------------------------------------------
> Send Corpora mailing list submissions to
> corpora at uib.no
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://mailman.uib.no/listinfo/corpora
> or, via email, send a message with subject or body 'help' to
> corpora-request at uib.no
>
> You can reach the person managing the list at
> corpora-owner at uib.no
>
> When replying, please edit your Subject line so it is more specific than
> "Re: Contents of Corpora digest..."
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
> End of Corpora Digest, Vol 61, Issue 17
> ***************************************
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora