[Corpora-List] Question: Citing Linguistic Corpora

Marc Brysbaert marc.brysbaert at ugent.be
Thu Mar 7 08:52:42 UTC 2013


The International Journal of Corpus Linguistics is indexed in ISI, so it is
in the authors' (and the journal's) interest to have a citation to their
article. At the same time, readers are more likely to use the reference if
they feel they have easy access to it. So I'd go for something like the
following:

 

Al-Sulaiti, L., & Atwell, E. S. (2006). The design of a corpus of
contemporary Arabic. International Journal of Corpus Linguistics, 11(2),
135-171. Retrieved March 7, 2013, from
http://www.comp.leeds.ac.uk/eric/latifa/research.htm.

 

In this way you have the best of both worlds :)

 

Best, marc

 

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of M.
Rezaei
Sent: 07 March 2013 09:30
To: corpora at uib.no
Subject: Re: [Corpora-List] Question: Citing Linguistic Corpora

 

Dear Adam, Eric, and Marc

Thank you for your responses. 

 

Suppose I use the Corpus of Contemporary Arabic for some NLP or corpus
linguistics purpose. Would it be strange to cite it as follows:

Al-Sulaiti, L., & Atwell, E. S. (2006). Corpus of Contemporary Arabic (CCA).
Leeds, UK: University of Leeds. Retrieved from
http://www.comp.leeds.ac.uk/eric/latifa/research.htm

 

instead of:

Al-Sulaiti, L., & Atwell, E. S. (2006). The design of a corpus of
contemporary Arabic. International Journal of Corpus Linguistics, 11(2),
135-171.

 

?

 

On Thu, Mar 7, 2013 at 11:33 AM, Marc Brysbaert <marc.brysbaert at ugent.be>
wrote:

Hi,

 

Researchers get most credit for their work when it is published in a journal
that features in ISI or Scopus, as it is then counted in all types of metrics
(whether you like this or not). In my own experience, however, it is not
easy to get manuscripts on corpora (or word frequency lists) published, even
though they are well cited. Does anyone have a list of ISI journals that
publish information on corpora? Thus far I have published most of my
findings in Behavior Research Methods, but this is aimed at a psychological
audience (and hence will only accept papers that are interesting for them).

 

Best, marc

 

On Thu, Mar 7, 2013 at 11:04 AM, Eric Atwell <E.S.Atwell at leeds.ac.uk> wrote:

Morteza,

This question is timely in the UK where we are preparing for REF.
Whatever Corpus Linguists may think, the wider academic world expects
citations of published journal/conference papers or books. So, when a
corpus is created, the developers should also publish a paper or book on the
research undertaken to develop the corpus, and this is what you
should cite. Even if you don't directly quote from the paper, you are
citing the academic research idea embodied in the paper. Sometimes a corpus
project can lead to several publications.
It is good practice for creators of a corpus to nominate a specific paper
which should be cited by users of the corpus, e.g. on the website where
you get the corpus from. This helps people like you who want to know
what to cite, and it helps the corpus creators to accumulate due credit for
their work. For example, for REF we nominate up to 4 key papers for
assessment, so it helps if others cite these specific 4 papers.

Eric Atwell, Leeds University

-- 

Eric Atwell, Associate Professor, Language research group,
 I-AIBS Institute for Artificial Intelligence and Biological Systems
 School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
 Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
 WWW: http://www.comp.leeds.ac.uk/eric
      http://www.comp.leeds.ac.uk/nlp
      http://www.comp.leeds.ac.uk/arabic

 

 

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Adam Kilgarriff
Sent: 07 March 2013 08:37
To: M. Rezaei
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Question: Citing Linguistic Corpora

 

Dear Morteza,

 

Yes, you definitely should cite the corpus.

 

It is always likely that your POS-tagger will have failings because of
characteristics of the corpus it was trained on. People should be able to
look at it in this light, with an account of how the corpus was prepared
available to them.

 

Sometimes there is no obvious way to cite the corpus. Sometimes a URL is
best (which is what I do, for example, for the BNC, as the website is
long-lived with full and good documentation, and the only alternative is
a technical report that no one is actually going to track down). As a
producer of corpora, I aim to write them up in a paper that is easy to find
and to read and serves as a reference.

 

 Adam

-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk/>

Corpora for all with the Sketch Engine <http://www.sketchengine.co.uk/>
DANTE: a lexical database for English <http://www.webdante.com/>

======================================== 

 

On 7 March 2013 06:27, M. Rezaei <mrezaeis at mehr.sharif.ir> wrote:

Dear all,

Salam.

Suppose I use a text corpus and extract some statistical information from
it, or train a POS tagger on it. I have used the corpus, but I have not
directly used the paper which describes it, i.e. I have not quoted a
paragraph from the paper in my research. Is there any standard style for
citing the corpus itself, as a data set? Is it a good idea to do so? And do
the corpus authors prefer users to cite their paper rather than the corpus
itself?

Looking forward to receiving your responses.

Best Regards

Morteza Rezaei
