[Corpora-List] Question: Citing Linguistic Corpora

Paul Thompson Paul.Thompson at manchester.ac.uk
Thu Mar 7 09:36:51 UTC 2013


For specific domains, it is certainly possible to publish corpora in ISI journals.

For example, I have published a couple of articles in BMC Bioinformatics about annotated corpora in the biomedical domain, and there are several other articles that I know of in this journal that describe annotated corpora.

Thompson, P., Iqbal, S. A., McNaught, J. and Ananiadou, S. (2009). Construction of an annotated corpus to support biomedical information extraction. In: BMC Bioinformatics, 10:349
http://www.biomedcentral.com/1471-2105/10/349

Thompson, P., Nawaz, R., McNaught, J. and Ananiadou, S. (2011). Enriching a biomedical event corpus with meta-knowledge annotation. In: BMC Bioinformatics, 12:393
http://www.biomedcentral.com/1471-2105/12/393

Best wishes,

Paul

On 7 Mar 2013, at 08:03, Marc Brysbaert wrote:

Hi,

Researchers get most credit for their work when it is published in a journal that features in ISI or Scopus, as it is then used for all types of metrics (whether you like this or not). From my own experience, I’ve noticed that it is not so easy, however, to get manuscripts on corpora (or word frequency lists) published, even though they are well cited. Does anyone have a list of ISI journals that publish information on corpora? Thus far I have published most of my findings in Behavior Research Methods, but this is aimed at a psychological audience (and hence will only accept papers that are interesting for them).

Best, marc

From: corpora-bounces at uib.no On Behalf Of Adam Kilgarriff
Sent: 07 March 2013 08:37
To: M. Rezaei
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Question: Citing Linguistic Corpora

Dear Morteza,

Yes, you definitely should cite the corpus.

It is always likely that your POS-tagger will have failings because of characteristics of the corpus it was trained on. People should be able to view the tagger in this light, with an account of how the corpus was prepared available to them.

Sometimes there is no obvious way to cite the corpus. Sometimes a URL is best (which is what I do, for example, for the BNC, as the website is long-lived and has full and good documentation, and the only alternative is a technical report that no-one is actually going to track down). As a producer of corpora, I aim to write them up in a paper that is easy to find and to read and serves as a reference.

 Adam

On 7 March 2013 06:27, M. Rezaei <mrezaeis at mehr.sharif.ir> wrote:
Dear all,
Salam.
Suppose I use a text corpus and I extract some statistical information from it, or I train a POS tagger based on it. I have used the corpus, but I have not directly used the paper that describes it, i.e. I have not quoted a paragraph from the paper in my research. Is there any standard style for citing the corpus itself, as a data set? Is it a good idea to do so? What about the corpus authors: do they prefer users to cite their paper rather than the corpus itself?
Looking forward to receiving your responses.
Best Regards
Morteza Rezaei

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora



--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>                 adam at lexmasterclass.com
Director                                    Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow                 University of Leeds <http://leeds.ac.uk>
Corpora for all with the Sketch Engine <http://www.sketchengine.co.uk>
                        DANTE: a lexical database for English <http://www.webdante.com>
========================================


--------

Paul Thompson
Research Associate
School of Computer Science
National Centre for Text Mining
Manchester Institute of Biotechnology
University of Manchester
131 Princess Street
Manchester
M1 7DN
UK
Tel: 0161 306 3091
http://personalpages.manchester.ac.uk/staff/Paul.Thompson/




