[Corpora-List] (no subject)

Thu Jul 19 22:43:46 UTC 2012

Dear all,

    Does anyone know of any existing (and available) software that can
automatically:

    1. Compare every two consecutive texts in a corpus; and
    2. List every new word that occurs in the latter text?

    Or do you know of any papers on this?
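
    To make the task concrete, here is a minimal Python sketch of the two
steps. It is not an existing tool. It assumes that "new" means "absent from
the immediately preceding text", that the texts are English, and that a simple
lowercase regex split is an acceptable tokenizer. The names tokenize and
new_words_per_pair are hypothetical and used only for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and extract word tokens.

    Assumption: runs of ASCII letters and apostrophes count as words.
    """
    return re.findall(r"[a-z']+", text.lower())


def new_words_per_pair(texts):
    """For each consecutive pair of texts, list the words of the latter
    text that do not occur in the former, in order of first appearance."""
    results = []
    for previous, current in zip(texts, texts[1:]):
        seen = set(tokenize(previous))
        new_words = []
        for word in tokenize(current):
            if word not in seen:
                seen.add(word)  # report each new word only once
                new_words.append(word)
        results.append(new_words)
    return results


corpus = [
    "the cat sat on the mat",
    "the cat chased a mouse",
    "a mouse ate the cheese",
]
for i, words in enumerate(new_words_per_pair(corpus), start=2):
    print(f"text {i}: {words}")
```

    If "new" should instead mean "not seen in any earlier text", the set of
seen words could be carried across iterations rather than rebuilt for each
pair.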

    Thank you for your help!

All the best.
Amanda

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
corpora-request at uib.no
Sent: 19 July 2012 11:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 61, Issue 17

Today's Topics:

   1. Seeking corpus for academic domain (Lushan Han)
   2. Re: Seeking corpus for academic domain (Lushan Han)
   3. English confusables (Carter, Simon)

----------------------------------------------------------------------

Message: 1
Date: Wed, 18 Jul 2012 11:03:56 -0400
From: Lushan Han <lushan1 at umbc.edu>
Subject: [Corpora-List] Seeking corpus for academic domain
To: corpora at uib.no

Dear all,

I am looking for a very large corpus (> 1 billion words) for the academic
domain, mainly describing universities, projects, conferences, papers,
authors, etc. I will compute statistics from it, to be used in building a
query system on structured data for the academic domain.

Does anyone know of such a corpus? Any information will be appreciated.

Thanks,

Lushan Han

------------------------------

Message: 2
Date: Wed, 18 Jul 2012 15:30:20 -0400
From: Lushan Han <lushan1 at umbc.edu>
Subject: Re: [Corpora-List] Seeking corpus for academic domain
To: corpora at uib.no

A smaller corpus (e.g. millions of words) would also be very helpful to me.
Please let me know if you happen to know of one.

Thanks,

Lushan

On Wed, Jul 18, 2012 at 11:03 AM, Lushan Han <lushan1 at umbc.edu> wrote:

> Dear all,
>
> I am looking for a very large corpus (> 1 billion words) for the
> academic domain, mainly describing universities, projects, conferences,
> papers, authors, etc. I will compute statistics from it, to be used in
> building a query system on structured data for the academic domain.
>
> Does anyone know of such a corpus? Any information will be appreciated.
>
>
> Thanks,
>
> Lushan Han
>

------------------------------

Message: 3
Date: Thu, 19 Jul 2012 09:28:03 +0000
From: "Carter, Simon" <S.C.Carter at uva.nl>
Subject: [Corpora-List] English confusables
To: "corpora at uib.no" <corpora at uib.no>

Dear Corpora List,

I was wondering if there is a list of English confusables somewhere?

Thanks,

Simon

----------------------------------------------------------------------
Send Corpora mailing list submissions to
	corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
	corpora-request at uib.no

You can reach the person managing the list at
	corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific than
"Re: Contents of Corpora digest..."

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

End of Corpora Digest, Vol 61, Issue 17
***************************************

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora