[Corpora-List] (no subject)

Ulugbek Nurmukhamedov un3 at nau.edu
Fri Jul 20 02:54:35 UTC 2012


Or something like this: http://www.lextutor.ca/text_lex_compare/

You enter two texts and the software will indicate overlapping words as
well as unique words.
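
If you would rather script the comparison than paste texts into the web
form, the same idea can be sketched in a few lines of Python. This is only
a rough illustration, not Lextutor's own code: the function names, the
simple regex tokenizer, and the sample sentences are all my assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def compare_texts(first, second):
    """Return (shared, only_first, only_second) as sorted word lists."""
    a, b = set(tokenize(first)), set(tokenize(second))
    return sorted(a & b), sorted(a - b), sorted(b - a)


def new_words_per_text(texts):
    """For each consecutive pair, list words in the later text
    that do not occur in the text immediately before it."""
    return [compare_texts(prev, curr)[2] for prev, curr in zip(texts, texts[1:])]


# Hypothetical three-text corpus, in order.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A dog chased the cat.",
]

shared, only_first, only_second = compare_texts(corpus[0], corpus[1])
print(shared)                      # ['on', 'sat', 'the']
print(only_second)                 # ['dog', 'log']
print(new_words_per_text(corpus))  # [['dog', 'log'], ['a', 'cat', 'chased']]
```

Note that this compares word forms only; if you want "cat" and "cats" to
count as the same word, you would need to lemmatize the texts first.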

B

On Thu, Jul 19, 2012 at 4:11 PM, Martin Reynaert <reynaert at uvt.nl> wrote:

> Dear Amanda,
>
> I have a sneaky feeling you may be interested in what is called
> `vocabulary growth curves'. In which case: do check out:
> http://zipfr.r-forge.r-project.org/ .
>
> If that proves to be too much all of a sudden, do check out:
>
> H. Baayen (2001). Word frequency distributions. Kluwer, Dordrecht.
>
> You will want to go on to his more recent publications after that ;0)
>
> Best,
>
> Martin
>
> On 07/20/2012 12:43 AM, Amanda wrote:
> > Dear all,
> >
> >     Does anyone know of any existing (and available) software that can
> > automatically:
> >
> >     1. Compare every two consecutive texts in a corpus; and
> >     2. List every new word that occurs in the latter text?
> >
> >     Or do you know any papers about that?
> >
> >     Thank you for your help!
> >
> > All the best.
> > Amanda
> >
> > -----Original Message-----
> > From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> > corpora-request at uib.no
> > Sent: July 19, 2012 11:00
> > To: corpora at uib.no
> > Subject: Corpora Digest, Vol 61, Issue 17
> >
> > Today's Topics:
> >
> >    1.  Seeking corpus for academic domain (Lushan Han)
> >    2. Re:  Seeking corpus for academic domain (Lushan Han)
> >    3.  English confusables (Carter, Simon)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Wed, 18 Jul 2012 11:03:56 -0400
> > From: Lushan Han <lushan1 at umbc.edu>
> > Subject: [Corpora-List] Seeking corpus for academic domain
> > To: corpora at uib.no
> >
> > Dear all,
> >
> > I am looking for a very large corpus (> 1 billion words) made for the
> > academic domain, mainly describing universities, projects, conferences,
> > papers, authors, etc. I will compute statistics from it, which will be
> > used in building a query system on structured data for the academic domain.
> >
> > Does anyone know such a corpus? Any information will be appreciated.
> >
> >
> > Thanks,
> >
> > Lushan Han
> > ------------------------------
> >
> > Message: 2
> > Date: Wed, 18 Jul 2012 15:30:20 -0400
> > From: Lushan Han <lushan1 at umbc.edu>
> > Subject: Re: [Corpora-List] Seeking corpus for academic domain
> > To: corpora at uib.no
> >
> > A corpus of smaller size (e.g. millions of words) would also be very
> > helpful to me. Please let me know if you happen to know of one.
> >
> > Thanks,
> >
> > Lushan
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Thu, 19 Jul 2012 09:28:03 +0000
> > From: "Carter, Simon" <S.C.Carter at uva.nl>
> > Subject: [Corpora-List] English confusables
> > To: "corpora at uib.no" <corpora at uib.no>
> >
> > Dear Corpora List,
> >
> > I was wondering if there is a list of English confusables somewhere?
> >
> > Thanks,
> >
> > Simon
> >
> > ----------------------------------------------------------------------
> > Send Corpora mailing list submissions to
> >       corpora at uib.no
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >       http://mailman.uib.no/listinfo/corpora
> > or, via email, send a message with subject or body 'help' to
> >       corpora-request at uib.no
> >
> > You can reach the person managing the list at
> >       corpora-owner at uib.no
> >
> > When replying, please edit your Subject line so it is more specific than
> > "Re: Contents of Corpora digest..."
> >
> >
> > _______________________________________________
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
> >
> > End of Corpora Digest, Vol 61, Issue 17
> > ***************************************
> >
> >
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora



-- 
Ulugbek Nurmukhamedov,
Northern Arizona University,
GSAAL page - http://www.cal.nau.edu/gsaal/

Be not content with stories of those who went before you. Go forth and
create your own story (Mawlana al-Rumi)