[Corpora-List] Corpora Digest, XLIFF Symposium

Mon May 30 12:46:25 UTC 2011

The 2nd XLIFF Symposium will take place on 28 September 2011 in Warsaw,
Poland. It is the main event of the pre-conference day of TM-Europe
2011. The symposium builds on the success of the 1st XLIFF Symposium,
which was held last September in Limerick, Ireland, as part of the 15th
Annual Internationalisation and Localisation Conference organised by the
LRC.

We welcome proposals that cover, but are not limited to, the
following topics:

    * The present state and future of XLIFF
    * XLIFF - What is missing
    * Analysis of XLIFF in commercial tools
    * XLIFF and other standards
    * XLIFF and the translation process
    * XLIFF, terminology and translation memory
    * XLIFF 2.0

In addition, we welcome proposals for short case studies and
suggestions for future development. During the Symposium we are
planning two sessions, each with three short presentations that
introduce new ideas for XLIFF or show examples of how XLIFF is being
used in practice.

We will also be hosting a question and answer session with 
representatives of the XLIFF technical committee.

If you are interested in the above topics and have knowledge and
experience to share with your peers, potential clients, suppliers, and
other industry experts, please save the date and submit a proposal for
a presentation or panel at the 2nd XLIFF Symposium.

The deadline for submitting proposals is 10 June 2011. Registration for
TM-Europe 2011 and the 2nd XLIFF Symposium will open in June 2011.

The programme committee for the 2nd XLIFF Symposium includes:

Bryan Schnabel (XLIFF TC Chair, Tektronix)
Yves Savourel (ENLASO Corporation)
David Filip (Localisation Research Centre)
Dimitra Anastasiou (University of Bremen)
Lucía Morado Vázquez (Localisation Research Centre)
Jesús Torres del Rey (University of Salamanca)
Peter Reynolds (TM-Global)

Please submit proposals for presentations and panels for the 2nd XLIFF
Symposium using the standard form at http://www.tm-europe.org/XLIFFSymposium.

For more information on the conference, please visit
www.tm-europe.org/xliff, and for information on the XLIFF technical
committee, visit
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff .

On 30.05.2011 12:00, corpora-request at uib.no wrote:
> Today's Topics:
>
>     1. Re:  Anonymization tools for patient record research methods
>        (Uzuner, Ozlem)
>     2. Re:  question about storage of corpora
>        (Fco. Mario Barcala Rodríguez)
>     3. Re:  question about storage of corpora (Damir Cavar)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 28 May 2011 01:39:12 -0400
> From: "Uzuner, Ozlem"<OUzuner at uamail.albany.edu>
> Subject: Re: [Corpora-List] Anonymization tools for patient record
> 	research	methods
> To: Eric Atwell<csc6ea at leeds.ac.uk>, "corpora at uib.no"
> 	<corpora at uib.no>
>
> Hi Eric,
> Here are a few leads from the i2b2 de-identification challenge in 2006:
>
>          Uzuner Ö, Luo Y, Szolovits P.  Evaluating the state-of-the-art in automatic de-identification.  J Am Med Inform Assoc. 2007; 14(5):550-63. http://www.jamia.org/cgi/content/abstract/14/5/550
>          Uzuner Ö, Sibanda T, Luo Y, Szolovits P.  A De-identifier for Medical Discharge Summaries.  International Journal Artificial Intelligence in Medicine. 2008; 42(1):13-35. www.aiimjournal.com/article/SO933-3657(07)00132-7/pdf
>          Hara K.  Applying a SVM based chunker and a text classifier to the deid challenge.  Online only at www.jamia.org
>          Wellner B, Huyck M, Mardis S, Aberdeen J, Morgan M, Peshkin L, Yeh A, Hitzeman J, Hirschman L.  Rapidly retargetable approaches to de-identification in medical records.  J Am Med Inform Assoc. 2007; 14(5):564-73. http://www.jamia.org/cgi/content/abstract/14/5/564
>          Szarvas Gy, Farkas R, Busa-Fekete R.  State-of-the-art anonymisation of medical records using an iterative machine learning framework.  J Am Med Inform Assoc. 2007; 14(5):574-80. http://www.jamia.org/cgi/content/abstract/M2441v1
>
> Thanks,
> Ozlem.
> ________________________________________
> From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Eric Atwell [csc6ea at leeds.ac.uk]
> Sent: Friday, May 27, 2011 6:12 PM
> To: corpora at uib.no
> Subject: [Corpora-List] Anonymization tools for patient record research methods
>
> We are investigating research methods for patient records.
> To be available for Corpus Linguistics analysis, patient records
> have to be anonymised, so individual patients cannot be identified.
> Can anyone point us at tools to (semi-)automate anonymization or
> deidentification of health text data (or any other text data)?
>
> I managed to find "deid" in Physionet
> http://www.physionet.org/physiotools/deid/
> Neamatullah I, Douglass M, Lehman LH, Reisner A, Villarroel M, Long WJ,
> Szolovits P, Moody GB, Mark RG, Clifford GD. Automated De-Identification
> of Free-Text Medical Records. BMC Medical Informatics and Decision
> Making, 2008, 8:32.
>
> and a survey:
> Ozlem Uzuner, Yuan Luo, Peter Szolovits. Evaluating the State-of-the-Art
> in Automatic De-identification. JAMIA Journal of the American Medical
> Informatics Association, 2007,14:550-563
>
> Thanks for any other recommendations.
>
> Eric Atwell, Senior Lecturer, Language research group,
>    I-AIBS Institute for Artificial Intelligence and Biological Systems
>    School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
>    Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
>    WWW: http://www.comp.leeds.ac.uk/arabic
>         http://www.comp.leeds.ac.uk/nlp
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 30 May 2011 11:39:52 +0200
> From: Fco. Mario Barcala Rodríguez 	<mario.barcala at mundo-r.com>
> Subject: Re: [Corpora-List] question about storage of corpora
> To: corpora at hd.uib.no
>
> Hi:
>
> We store the texts of our corpora as XML, and this has worked fine for
> us for more than a decade. We then build relational databases from them
> to support different search applications (http://corpus.cirp.es/corga
> and http://corpus.cirp.es/corgaetq).
>
> TEI (http://www.tei-c.org) or XCES (http://www.xces.org) can give you
> a starting point.
>
> We made a stylesheet adaptation of an XML editor to do part-of-speech
> annotation. It's not the best solution, but it has worked for us for
> years. For searching, we build an ad hoc relational database from the
> XML files.
>
> You can find all the details, and other related questions, in my PhD
> thesis. The full PDF (in Galician) and an extended summary of it
> (in English) can be downloaded from my home page:
>
> http://www.xente.mundo-r.com/barcala/publicacions_english.html
>
> Feel free to ask me any questions.
>
> Regards,
>
>    Mario Barcala
>
> On Fri, May 27, 2011 at 03:14:25PM +0200, Tine Lassen wrote:
>> Hi,
>> I am in the process of compiling a series of domain corpora, and once the
>> present text gathering phase is completed, of course I need to store the
>> texts somehow. The texts need to be annotated with e.g. parts of speech
>> and possibly phrase boundaries for term extraction purposes.
>> My questions are: Would it be wiser to store the texts as XML or in a
>> relational database format? Does a generally accepted corpus annotation
>> XML schema exist? And do tools for annotation of and search in such files
>> exist? How do you store your corpora?
>> Any thoughts or ideas regarding the questions are very welcome :)
>> Best,
>> Tine Lassen
>> Copenhagen Business School
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 30 May 2011 11:55:33 +0200
> From: Damir Cavar<dcavar at indiana.edu>
> Subject: Re: [Corpora-List] question about storage of corpora
> To: corpora at hd.uib.no
>
> Hi Tine,
>
> On May 27, 2011, at 3:14 PM, Tine Lassen wrote:
>
>> I am in the process of compiling a series of domain corpora, and once the present text gathering phase is completed, of course I need to store the texts somehow. The texts need to be annotated with e.g. parts of speech and possibly phrase boundaries for term extraction purposes.
>>
>> My questions are: Would it be wiser to store the texts as XML or in a relational database format?
>> Does a generally accepted corpus annotation XML schema exist? And do tools for annotation of and search in such files exist?
>> How do you store your corpora?
> TEI XML, edited with the oXygen XML editor, with the XML files stored in, for example, BaseX is the solution. At least, that is how we have done the editing and annotation so far for the Croatian Language Corpus (http://riznica.ihjj.hr/). I use BaseX for my own purposes, but I do plan to provide a new search front-end with BaseX as the backend. The current online search front-end of the CLC is a modified PhiloLogic that takes raw TEI XML files (see the link above for the interface).
>
> So why bother storing all that in relational DBs? Current XML databases are quite efficient and fast:
>
> TEI
> http://www.tei-c.org/
>
> PhiloLogic
> http://sites.google.com/site/philologic3/home
>
> BaseX
> http://basex.org/
>
>
> and the only commercial product in this list is:
>
> oXygen
> http://www.oxygenxml.com/
>
>
> best wishes
> DC
>
>
>
> --
> Dr. Damir Cavar
> http://web.me.com/dcavar/
> mobile +49 176 60928748
> office +49 7531 885357
> private (US): +1 (734) 330-2902
> FaceTime: dcavar at me.com
>
>
> End of Corpora Digest, Vol 47, Issue 31
> ***************************************
>

-- 
Dimitra Anastasiou, PhD
Computer Science/Language Sciences
University of Bremen
Bremen, Germany
www.d-anastasiou.com