[Corpora-List] Question: Citing Linguistic Corpora

Eric Atwell E.S.Atwell at leeds.ac.uk
Thu Mar 7 09:00:49 UTC 2013


If you search on Scopus for "Corpus of Contemporary Arabic"
you find the IJCL paper, not the source website:

http://www.scopus.com/results/results.url?sort=plf-f&src=s&st1=Corpus+of+Contemporary+Arabic&sid=503D1412F0FBFFA5AE251576FBA8F943.WXhD7YyTQ6A7Pvk9AlA%3a170&sot=b&sdt=b&sl=36&s=TITLE%28Corpus+of+Contemporary+Arabic%29&origin=searchbasic&txGid=503D1412F0FBFFA5AE251576FBA8F943.WXhD7YyTQ6A7Pvk9AlA%3a17

So, I would prefer you to cite the journal paper, as the standard
across other academic disciplines is to cite journal papers.

Of course, you could always cite both!

Eric Atwell, Leeds University


PS maybe we should set up a web-page for corpus linguists
to support each other in the run-up to REF, 
PleaseCiteMyRefPapers.html - where UK corpus linguists
can post their 4 REF paper references, so that we can crowd-source
REF paper citations by citing each other when we write new papers!


On Thu, 7 Mar 2013, M. Rezaei wrote:

> Dear Adam, Eric, and Marc
> Thank you for your responses. 
> 
> Suppose I use the Corpus of Contemporary Arabic for some NLP or corpus
> linguistics purpose, would it be strange to cite it as follows:
> Al-Sulaiti, L., & Atwell, E. S. (2006). Corpus of Contemporary Arabic (CCA).
> Leeds, UK: University of Leeds. Retrieved from
> http://www.comp.leeds.ac.uk/eric/latifa/research.htm
> 
> instead of:
> Al-Sulaiti, L., & Atwell, E. S. (2006). The design of a corpus of
> contemporary Arabic. International Journal of Corpus Linguistics, 11(2),
> 135-171.
> 
> ?
> 
> On Thu, Mar 7, 2013 at 11:33 AM, Marc Brysbaert <marc.brysbaert at ugent.be>
> wrote:
>
>       Hi,
>
>        
>
>       Researchers get most credit for their work when it is published
>       in a journal that features in ISI or Scopus, as it is then used
>       for all types of metrics (whether you like this or not). From my
>       own experience, I’ve noticed that it is not so easy, however, to
>       get manuscripts on corpora (or word frequency lists) published,
>       even though they are well cited. Does anyone have a list of ISI
>       journals that publish information on corpora? Thus far I have
>       published most of my findings in Behavior Research Methods, but
>       this is aimed at a psychological audience (and hence will only
>       accept papers that are interesting for them).
>
>        
>
>       Best, marc
> 
> 
> On Thu, Mar 7, 2013 at 11:04 AM, Eric Atwell <E.S.Atwell at leeds.ac.uk> wrote:
>       Morteza,
>
>       This question is timely in the UK where we are preparing for REF.
>       Whatever Corpus Linguists may think, the wider academic world expects
>       citations of published journal/conference papers or books. So, when a
>       corpus is created, the developers should also publish a paper or
>       book on the research undertaken to develop the corpus, and this is
>       what you should cite. Even if you don't directly quote from the paper,
>       you are citing the academic research idea embodied in the paper.
>       Sometimes a corpus project can lead to several publications.
>       It is good practice for creators of a corpus to nominate a specific
>       paper which should be cited by users of the corpus, e.g. on the
>       website where you get the corpus from. This helps people like you who
>       want to know what to cite; and it helps the corpus creators to
>       accumulate due credit for their work. For example for REF, we nominate
>       up to 4 key papers for assessment, so it helps if others cite these
>       specific 4 papers.
>
>       Eric Atwell, Leeds University
> 
> -- 
> Eric Atwell, Associate Professor, Language research group,
>  I-AIBS Institute for Artificial Intelligence and Biological Systems
>  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
>  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
>  WWW: http://www.comp.leeds.ac.uk/eric
>       http://www.comp.leeds.ac.uk/nlp
>       http://www.comp.leeds.ac.uk/arabic
>        
>
>        
>
>       From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On
>       Behalf Of Adam Kilgarriff
>       Sent: 07 March 2013 08:37
>       To: M. Rezaei
>       Cc: corpora at uib.no
>       Subject: Re: [Corpora-List] Question: Citing Linguistic Corpora
> 
>  
> 
> Dear Morteza,
> 
>  
> 
> Yes, you definitely should cite the corpus.
> 
>  
> 
> It is always likely that your POS-tagger will have failings because of
> characteristics of the corpus it was trained on.  People should be
> able to look at it in this light, with an account of how the corpus
> was prepared, available to them.
> 
>  
> 
> Sometimes there is no obvious way to cite the corpus.  Sometimes a URL
> is best (which is what I do, for example, for the BNC, as the website is
> long-lived and has full and good documentation, and the only
> alternative is a technical report that no-one is actually going to
> track down).  As a producer of corpora, I aim to write them up in a
> paper that is easy to find and to read and serves as a reference.
> 
>  
> 
>  Adam
> 
> -- 
> ========================================
> Adam Kilgarriff                  adam at lexmasterclass.com
> Director                         Lexical Computing Ltd
> Visiting Research Fellow         University of Leeds
> Corpora for all with the Sketch Engine
> DANTE: a lexical database for English
> ========================================
> 
>  
> 
> On 7 March 2013 06:27, M. Rezaei <mrezaeis at mehr.sharif.ir> wrote:
> 
> Dear all,
> 
> Salam.
> 
> Suppose I use a text corpus and I extract some statistical information
> from it or I train a POS tagger based on it. Well, I have used the
> corpus, but I have not directly used the paper which describes it, i.e.
> I have not quoted a paragraph from the paper in my research. Is there
> any standard style for citing the corpus itself, as a data set? Is it
> a good idea to do so? What about the corpus authors, do they prefer
> users to cite their paper rather than the corpus itself?
> 
> Looking forward to receiving your responses.
> 
> Best Regards
> 
> Morteza Rezaei
> 
> 
>

-- 
Eric Atwell, Associate Professor, Language research group,
  I-AIBS Institute for Artificial Intelligence and Biological Systems
  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
  WWW: http://www.comp.leeds.ac.uk/eric
       http://www.comp.leeds.ac.uk/nlp
       http://www.comp.leeds.ac.uk/arabic