<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#663300" bgcolor="#FFFFFF">

    <div class="moz-text-plain" wrap="true" graphical-quote="true"

      style="font-family: -moz-fixed; font-size: 14px;" lang="x-unicode">

      <pre wrap="">

Dear Colleagues,

I followed the discussion about citing language resources, and it reinforced our

view that the resources our community uses need to be cited properly.

I will not go through the importance of doing this here; I am sure we all agree

it is critical and urgent.

We were about to announce an initiative to respond to such needs, so I am happy

to give an advance look at our plans.

Almost two years ago, at the FLaReNet meeting

(<a class="moz-txt-link-freetext" href="http://www.flarenet.eu/sites/default/files/S1_Choukri_Position_Paper.pdf">http://www.flarenet.eu/sites/default/files/S1_Choukri_Position_Paper.pdf</a>), I

made a proposal on the assignment of persistent and unique identifiers to

language resources.

The idea was to go beyond the current references via URLs (or DOIs) to ensure

that we have a truly permanent identifier that would cover all existing data

sets, including those not publicly available (or not available on the Internet).

The idea was discussed with major data centers managed by or serving the NLP

community, in particular ELRA, LDC and Oriental-Cocosda, and with some major

representatives of our field: AFNLP, ISCA, EAMT, etc.

We reviewed all possibilities (a paper about this was published at IJCNLP 2011)

and considered all options.

We looked at the current situation, where ELRA, LDC and other data centers

assign identifiers to the resources they distribute.

For example, ELRA-W0021 refers to ICE-GB (the British English component of the

International Corpus of English) and LDC2011T13 refers to Chinese Gigaword

Fifth Edition.

These are different "local" identifiers assigned by data centers, so we looked at

the possibility of using global identifiers beyond data centers, such as URI,

URN, IAN (International Article Number), PMID (life science and biomedical) and,

last but not least, ISBN.

Each of these identifiers has its advantages and drawbacks. The most common and

attractive one, ISBN, is also closely tied to "publishers", and in some

countries it implicitly refers to copyright law (with all its severe constraints).

After all these discussions, we concluded that we needed to adopt our own

identifier, an international number that represents language resources, and we

called it the International Standard Language Resource Number (ISLRN).

We discussed in detail whether such an identifier should bear some semantics in

its composition (to indicate that it refers to a textual or spoken corpus,

lexicon, ontology, video recordings, etc.). We agreed that such definitions are

still controversial, and hence agreed to keep the ISLRN neutral while able to

represent all language resources with a 13-digit format:

                         ISLRN:XXX-XXX-XXX-XXX-X

examples:

772-814-696-901-0 <a class="moz-txt-link-rfc2396E" href="http://www.islrn.org/resources/772-814-696-901-0/">&lt;http://www.islrn.org/resources/772-814-696-901-0/&gt;</a>

500-657-957-472-7 <a class="moz-txt-link-rfc2396E" href="http://www.islrn.org/resources/500-657-957-472-7/">&lt;http://www.islrn.org/resources/500-657-957-472-7/&gt;</a>

473-117-867-197-2 <a class="moz-txt-link-rfc2396E" href="http://www.islrn.org/resources/473-117-867-197-2/">&lt;http://www.islrn.org/resources/473-117-867-197-2/&gt;</a>

642-875-857-557-9 <a class="moz-txt-link-rfc2396E" href="http://www.islrn.org/resources/642-875-857-557-9/">&lt;http://www.islrn.org/resources/642-875-857-557-9/&gt;</a>
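
For tools that handle citations, the format above could be checked along the
lines of the following minimal Python sketch. It only tests the shape
XXX-XXX-XXX-XXX-X and builds a URL following the pattern of the example links
above. The names ISLRN_SHAPE, RESOLVER_PREFIX, is_islrn_shaped and resource_url
are hypothetical, and no meaning is assumed for any digit (in particular,
nothing here treats the last digit as a check digit).

```python
import re

# Shape taken from the format given above: XXX-XXX-XXX-XXX-X, 13 digits in total.
# ASCII digits only; no semantics are assumed for any group (assumption-free check).
ISLRN_SHAPE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{3}-[0-9]{3}-[0-9]")

# URL prefix copied from the example links; treating it as a general
# pattern for every ISLRN is an assumption of this sketch.
RESOLVER_PREFIX = "http://www.islrn.org/resources/"


def is_islrn_shaped(text):
    """Return True if text has exactly the XXX-XXX-XXX-XXX-X shape."""
    return ISLRN_SHAPE.fullmatch(text) is not None


def resource_url(islrn):
    """Build a resource URL for a well-formed ISLRN, mirroring the example links."""
    if not is_islrn_shaped(islrn):
        raise ValueError("not in XXX-XXX-XXX-XXX-X form: " + repr(islrn))
    return RESOLVER_PREFIX + islrn + "/"
```

With these definitions, resource_url("772-814-696-901-0") gives back the first
link listed above, while a data-center identifier such as "ELRA-W0021" is
rejected with a ValueError.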

ELDA, LDC, O-Cocosda and AFNLP (tbc, pending internal discussions) agreed to

run such a service on behalf of the community; other data centers will join once

the service is in operation. These will constitute the executive committee of

the ISLRN service.

The idea is to set up a web portal, run and moderated by these organizations,

where each owner/developer/producer of a language resource can request, free of

charge, an ISLRN for their resource. Those who need to reference such a resource

should refer to its ISLRN, as we have been doing with ISBNs since the mid-sixties

(see the (fake) examples below).

The executive committee will be steered by a steering committee in which all

major organizations will be represented; we are about to issue invitations for

a first general meeting next fall.

The structure we have suggested is:

(Steering Committee)

      |

      |

      |

      v

(executive committee)

      |

      |

      |

      v

(ISLRN service team)

The modus operandi is described in a short position paper, and the detailed

arguments for such an initiative have been published. Both papers are

available at:

The ISLRN proposal: 

<a class="moz-txt-link-freetext" href="http://docs.islrn.org/Proposal-ISLRN-workingpaper-v04.pdf">http://docs.islrn.org/Proposal-ISLRN-workingpaper-v04.pdf</a>

The paper from IJCNLP 2011 <a class="moz-txt-link-freetext" href="http://docs.islrn.org/ijcnlp2011-pid-v6.pdf">http://docs.islrn.org/ijcnlp2011-pid-v6.pdf</a>

We will make an official announcement by the end of March 2013, and requests for

ISLRNs will start in April 2013.

All the best

<b class="moz-txt-star">Khalid CHOUKRI, ELRA on behalf of the ISLRN Executive Committee<span class="moz-txt-tag"></span></b>

P.S.1: The ISLRN will make things easy. As in these examples from the previous

emails, whatever reference is used, impact factor and citation index will be

computed on the basis of the ISLRN:

Davies, Mark. (2008-) The Corpus of Contemporary American English: 450

million words, 1990-present. Available online at

<a class="moz-txt-link-freetext" href="http://corpus.byu.edu/coca/">http://corpus.byu.edu/coca/</a>. ISLRN 642-555-213-127-4

British National Corpus, Version 3 (BNC XML Edition). 2007. Distributed by

Oxford University Computing Services on behalf of the BNC Consortium.

<a class="moz-txt-link-abbreviated" href="http://www.natcorp.ox.ac.uk">www.natcorp.ox.ac.uk</a>;

ISLRN 143-765-223-127-3

Thompson, P., Iqbal, S. A., McNaught, J. and Ananiadou, S. (2009). Construction

of an annotated corpus to support biomedical information extraction. In: BMC

Bioinformatics, 10:349 <a class="moz-txt-link-freetext" href="http://www.biomedcentral.com/1471-2105/10/349/">http://www.biomedcentral.com/1471-2105/10/349/</a>; ISLRN 545-321-981-654-1

P.S.2: The idea of assigning ISLRNs to resources should not prevent us from

pushing our colleagues to use them ... we noticed that even when a good

reference exists, many authors do not use it.

</pre>

    </div>

    <div class="moz-text-plain" wrap="true" graphical-quote="true"

      style="font-family: -moz-fixed; font-size: 14px;" lang="x-unicode"></div>

  </body>

</html>