[Corpora-List] (no subject)

Martin Reynaert reynaert at uvt.nl
Thu Jul 19 23:11:31 UTC 2012


Dear Amanda,

I have a sneaky feeling you may be interested in what are called
`vocabulary growth curves'. If so, do check out:
http://zipfr.r-forge.r-project.org/ .
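
For a first feel of the idea before diving into the package above, here is a minimal sketch in Python. Python is my own choice here; the package at the link above is for R. The function names (tokenize, new_words_per_text, vocabulary_growth) and the crude tokenizer are purely illustrative assumptions, not part of any existing tool. The sketch lists, for each pair of consecutive texts, the words that appear in the later text but not in the earlier one, and also tracks how the cumulative vocabulary grows over the corpus.

```python
import re


def tokenize(text):
    # Deliberately crude (an assumption for illustration): lowercase the text
    # and treat every run of word characters as one token.
    return re.findall(r"\w+", text.lower())


def new_words_per_text(texts):
    """For each text after the first, list word types absent from the previous text."""
    result = []
    previous = set(tokenize(texts[0])) if texts else set()
    for text in texts[1:]:
        current = set(tokenize(text))
        result.append(sorted(current - previous))
        previous = current
    return result


def vocabulary_growth(texts):
    """Return (tokens so far, distinct types so far) after each text in order."""
    seen = set()
    n_tokens = 0
    curve = []
    for text in texts:
        tokens = tokenize(text)
        n_tokens += len(tokens)
        seen.update(tokens)
        curve.append((n_tokens, len(seen)))
    return curve


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat ran fast", "a dog ran"]
    print(new_words_per_text(corpus))
    print(vocabulary_growth(corpus))
```

Plotting the second value of each pair against the first gives a rough vocabulary growth curve. A real study would need proper tokenization and probably lemmatization, which this sketch does not attempt.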

If that proves to be too much all of a sudden, do check out:

H. Baayen (2001). Word frequency distributions. Kluwer, Dordrecht.

You will want to go on to his more recent publications after that ;0)

Best,

Martin

On 07/20/2012 12:43 AM, Amanda wrote:
> Dear all,
>
>     Does anyone know of any existing (and available) software that can
> automatically:
>
>     1. Compare every two consecutive texts in a corpus; and
>     2. List every new word that occurs in the latter text?
>
>     Or do you know any papers about that?
>
>     Thank you for your help!
>
> All the best.
> Amanda
>
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> corpora-request at uib.no
> Sent: 19 July 2012 11:00
> To: corpora at uib.no
> Subject: Corpora Digest, Vol 61, Issue 17
>
> Today's Topics:
>
>    1.  Seeking corpus for academic domain (Lushan Han)
>    2. Re:  Seeking corpus for academic domain (Lushan Han)
>    3.  English confusables (Carter, Simon)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 18 Jul 2012 11:03:56 -0400
> From: Lushan Han <lushan1 at umbc.edu>
> Subject: [Corpora-List] Seeking corpus for academic domain
> To: corpora at uib.no
>
> Dear all,
>
> I am looking for a very large corpus (> 1 billion words) for the academic
> domain, mainly describing universities, projects, conferences, papers,
> authors, etc. I will compute statistics from it, to be used in building a
> query system on structured data for the academic domain.
>
> Does anyone know such a corpus? Any information will be appreciated.
>
>
> Thanks,
>
> Lushan Han
>
> ------------------------------
>
> Message: 2
> Date: Wed, 18 Jul 2012 15:30:20 -0400
> From: Lushan Han <lushan1 at umbc.edu>
> Subject: Re: [Corpora-List] Seeking corpus for academic domain
> To: corpora at uib.no
>
> A smaller corpus (e.g. millions of words) would also be very helpful
> to me.  Please let me know if you happen to know of one.
>
> Thanks,
>
> Lushan
>
> On Wed, Jul 18, 2012 at 11:03 AM, Lushan Han <lushan1 at umbc.edu> wrote:
>
>> Dear all,
>>
>> I am looking for a very large corpus (> 1 billion words) for the
>> academic domain, mainly describing universities, projects, conferences,
>> papers, authors, etc. I will compute statistics from it, to be used in
>> building a query system on structured data for the academic domain.
>>
>> Does anyone know such a corpus? Any information will be appreciated.
>>
>>
>> Thanks,
>>
>> Lushan Han
>>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 19 Jul 2012 09:28:03 +0000
> From: "Carter, Simon" <S.C.Carter at uva.nl>
> Subject: [Corpora-List] English confusables
> To: "corpora at uib.no" <corpora at uib.no>
>
> Dear Corpora List,
>
> I was wondering if there is a list of English confusables somewhere?
>
> Thanks,
>
> Simon
>
>
> ----------------------------------------------------------------------
> Send Corpora mailing list submissions to
> 	corpora at uib.no
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://mailman.uib.no/listinfo/corpora
> or, via email, send a message with subject or body 'help' to
> 	corpora-request at uib.no
>
> You can reach the person managing the list at
> 	corpora-owner at uib.no
>
> When replying, please edit your Subject line so it is more specific than
> "Re: Contents of Corpora digest..."
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
> End of Corpora Digest, Vol 61, Issue 17
> ***************************************
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora



