[Corpora-List] Question: Citing Linguistic Corpora

M. Rezaei mrezaeis at mehr.sharif.ir
Thu Mar 7 08:29:30 UTC 2013


Dear Adam, Eric, and Marc
Thank you for your responses.

Suppose I use the Corpus of Contemporary Arabic for some NLP or corpus
linguistics purpose. Would it be strange to cite it as follows:
Al-Sulaiti, L., & Atwell, E. S. (2006). Corpus of Contemporary Arabic
(CCA). Leeds, UK: University of Leeds. Retrieved from
http://www.comp.leeds.ac.uk/eric/latifa/research.htm

instead of:
Al-Sulaiti, L., & Atwell, E. S. (2006). The design of a corpus of
contemporary Arabic. *International Journal of Corpus Linguistics*, *11*(2),
135-171.

?

On Thu, Mar 7, 2013 at 11:33 AM, Marc Brysbaert <marc.brysbaert at ugent.be> wrote:

> Hi,
>
> Researchers get most credit for their work when it is published in a
> journal that features in ISI or Scopus, as it is then used for all types of
> metrics (whether you like this or not). From my own experience, I’ve
> noticed that it is not so easy, however, to get manuscripts on corpora (or
> word frequency lists) published, even though they are well cited. Does
> anyone have a list of ISI journals that publish information on corpora?
> Thus far I have published most of my findings in Behavior Research Methods,
> but this is aimed at a psychological audience (and hence will only accept
> papers that are interesting for them).
>
> Best, marc
>

On Thu, Mar 7, 2013 at 11:04 AM, Eric Atwell <E.S.Atwell at leeds.ac.uk> wrote:

> Morteza,
>
> This question is timely in the UK where we are preparing for REF.
> Whatever Corpus Linguists may think, the wider academic world expects
> citations of published journal/conference papers or books. So, when a
> corpus is created, the developers should also publish a paper or book on
> the research undertaken to develop the corpus, and this is what you
> should cite. Even if you don't directly quote from the paper, you are
> citing the academic research idea embodied in the paper. Sometimes a
> corpus project can lead to several publications.
> It is good practice for creators of a corpus to nominate a specific paper
> which should be cited by users of the corpus, e.g. on the website where
> you get the corpus from. This helps people like you who want to know
> what to cite; and it helps the corpus creators to accumulate due credit
> for their work. For example, for REF, we nominate up to 4 key papers for
> assessment, so it helps if others cite these specific 4 papers.
>
> Eric Atwell, Leeds University

-- 
Eric Atwell, Associate Professor, Language research group,
 I-AIBS Institute for Artificial Intelligence and Biological Systems
 School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
 Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
 WWW: http://www.comp.leeds.ac.uk/eric
      http://www.comp.leeds.ac.uk/nlp
      http://www.comp.leeds.ac.uk/arabic

>
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf
> Of Adam Kilgarriff
> Sent: 07 March 2013 08:37
> To: M. Rezaei
> Cc: corpora at uib.no
> Subject: Re: [Corpora-List] Question: Citing Linguistic Corpora
>
> Dear Morteza,
>
> Yes, you definitely should cite the corpus.
>
> It is always likely that your POS-tagger will have failings because of
> characteristics of the corpus it was trained on.  People should be able to
> look at it in this light, with an account of how the corpus was prepared
> available to them.
>
> Sometimes there is no obvious way to cite the corpus.  Sometimes a URL is
> best (which is what I do for example for the BNC, as the website is
> long-life and with full and good documentation, and the only alternative is
> a technical report that no-one is actually going to track down).  As a
> producer of corpora, I aim to write them up in a paper that is easy to find
> and to read and serves as a reference.
>
>  Adam
>
-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk/>

Corpora for all with the Sketch Engine <http://www.sketchengine.co.uk/>
DANTE: a lexical database for English <http://www.webdante.com/>
========================================

> On 7 March 2013 06:27, M. Rezaei <mrezaeis at mehr.sharif.ir> wrote:
>
> Dear all,
>
> Salam.
>
> Suppose I use a text corpus and I extract some statistical information
> from it or I train a POS tagger based on it. Well, I have used the corpus,
> but I have not directly used the paper which describes it, i.e. I have not
> quoted a paragraph from the paper in my research. Is there any standard
> style for citing the corpus itself, as a data set? Is it a good idea to do
> so? What about the corpus authors, do they prefer users to cite their paper
> rather than the corpus itself?
>
> Looking forward to receiving your responses.
>
> Best Regards
>
> Morteza Rezaei
>