[Corpora-List] Question: Citing Linguistic Corpora

Adam Kilgarriff adam at lexmasterclass.com
Thu Mar 7 07:36:36 UTC 2013


Dear Morteza,

Yes, you definitely should cite the corpus.

Your POS-tagger is likely to have failings that stem from the
characteristics of the corpus it was trained on.  Readers should be able to
assess it in that light, with an account of how the corpus was prepared
available to them.

Sometimes there is no obvious way to cite the corpus.  Sometimes a URL is
best (which is what I do, for example, for the BNC, as the website is
long-lived and has full and good documentation, and the only alternative is
a technical report that no one is actually going to track down).  As a
producer of corpora, I aim to write them up in a paper that is easy to find
and to read and serves as a reference.

 Adam

On 7 March 2013 06:27, M. Rezaei <mrezaeis at mehr.sharif.ir> wrote:

> Dear all,
> Salam.
> Suppose I use a text corpus and I extract some statistical information
> from it or I train a POS tagger based on it. Well, I have used the corpus,
> but I have not directly used the paper which describes it, i.e. I have not
> quoted a paragraph from the paper in my research. Is there any standard
> style for citing the corpus itself, as a data set? Is it a good idea to do
> so? What about the corpus authors, do they prefer users to cite their paper
> rather than the corpus itself?
> Looking forward to receiving your responses.
> Best Regards
> Morteza Rezaei
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================