[Corpora-List] List of combinations( adjective+noun and verb+prep+noun ) with statistic (~1.2M)

Christian Chiarcos christian.chiarcos at web.de
Tue Nov 8 07:16:17 UTC 2011


Dear Tural,

this looks like an interesting resource, and it complements similar data sets
obtained from other corpora. I would only ask you to define a license for
using it (Creative Commons Attribution?) and a way to refer to it (a
proper URL? a paper? your thesis?). Otherwise, people cannot be
sure whether they can use your data in a legally safe way, or how to
refer to it in their publications (your data certainly contains some
noise, and I wouldn't take responsibility for it).

At least for me, these points (especially the first) represent severe
obstacles to working with your data.

Best,
Christian

On Mon, 07 Nov 2011 09:56:16 +0100, Tural Gurbanov <madcat1991 at gmail.com>
wrote:

> Hello everyone!
> During my master's degree work I extracted combinations of the form
> verb+preposition+noun and adjective+noun from a Reuters news dump.
> As a result I got nearly 1.2M unique combinations, together with the
> number of times each combination occurs.
>
> The results have been uploaded here:
> http://zalil.ru/31971263
> http://zalil.ru/31971306
> http://zalil.ru/31971423
>
> If you are looking for something like this, you are welcome to take it
> (I can guarantee the syntactic coherence of the words in every
> combination).
>
> In return, I would like you to check a small fragment of the resulting
> combinations (500-1000 combinations) for correctness, because I do not
> have enough knowledge of English to make a good assessment.
> And, if it is not a secret, tell us what problems you are going to tackle
> with this data. You do not need to describe the solution, just why you
> need it. I need this for my thesis review.
