Dear Sabine,<br><br>Thank you for your response and for your helpful hints. Our main goals in converting the treebanks into eGXL were:<br>1. to unify them by means of XML;<br>2.
to integrate the treebanks into a corpus management system in which not
only treebanks but also spoken and web corpora are stored and retrieved
by means of GXL (an XML-based graph representation format).<br>
<br>Restricting oneself to one specific format always requires additional
adaptations of the application whenever a new treebank has to be
handled. We therefore tried to select a format that is generic enough
to be reused and that is also suitable for treebanks. GXL is a generic
graph model that can represent any kind of corpus, since any sort of
relation can be expressed in terms of a graph. This makes GXL a useful
means for corpus retrieval. Treebanks can easily be mapped to it, since
trees are special cases of graphs. eGXL slightly modifies GXL in order
to account for the specifics of treebanks. We selected this format
because it meets both requirements: it is generic, and it is suitable
for treebanks.<br><br>In
my paper I do not provide a detailed comparison of eGXL with other
formats. However, CoNLL is referred to, albeit only indirectly, when
the treebanks are compared. Please send me a reference to your work,
which I failed to mention in this paper, and I will take it into
account in my future work.<br>
<br>Best regards,<br><br><div class="gmail_quote">On Feb 3, 2008 1:39 PM, Sabine Buchholz &lt;<a href="mailto:sabine.buchholz@crl.toshiba.co.uk">sabine.buchholz@crl.toshiba.co.uk</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Dear Olga,<br>I think uniform formats for treebanks are a good idea and therefore read<br>your announcement, Wiki page and article with interest. However, that raised<br>a lot of questions:<br>You clearly are aware of the CoNLL-X shared task on multilingual dependency<br>
parsing, as you link to its home page from your Wiki. For that task, 13<br>treebanks were converted to a uniform format, many of them among the 11 you<br>list. Our goal was probably different from yours, but<br>1) Why is that work not even mentioned in the paper, let alone compared to?<br>
2) What part of the analyses you did for the paper could you not have done<br>using the CoNLL-X format?<br>You even seem to have used the CoNLL-X version of some treebanks (e.g.<br>Dutch) as the basis of your eGXL conversion (the Dutch example in your paper<br>
is in CoNLL-X and not the original Alpino format).<br>3) Why did you choose to do that? The conversion from Alpino to CoNLL-X<br>format loses some information, so why not convert from the original format?<br>The same potentially applies to Swedish and Bulgarian.<br>
<br>With regard to your question about other treebanks to add to your database:<br>in addition to the remainder of the 13 CoNLL-X treebanks and the new ones<br>converted for the successor (the CoNLL 2007 shared task on dependency<br>
parsing), <a href="http://en.wikipedia.org/wiki/Treebank" target="_blank">http://en.wikipedia.org/wiki/Treebank</a> lists even more treebanks.<br>But you probably already know that, as you link to it from your Wiki...<br>Although I just noticed that the Romanian treebank you used is still missing<br>
from that list...<br><br>Looking forward to hearing from you,<br>kind regards,<br>Sabine Buchholz<br><div><div></div><div class="Wj3C7c"><br><br>----- Original Message -----<br>From: Olga Pustylnikov<br>To: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
Sent: Friday, February 01, 2008 9:31 AM<br>Subject: [Corpora-List] Announcement: Release of the Dependency<br>TreebankDatabase DTDB 1.0<br><br>Dear list members,<br>I'm happy to announce the release of DTDB 1.0, a Dependency Treebank<br>
DataBase. The database consists of treebanks for 11 languages, which are transformed into a<br>single representation format. This format is an XML-based graph model, and<br>it was designed to support the interoperability of existing corpora.<br>
The wiki <a href="http://ariadne.coli.uni-bielefeld.de/wikis/treebankwiki/" target="_blank">http://ariadne.coli.uni-bielefeld.de/wikis/treebankwiki/</a> presents<br>the treebanks and the unification format used. Details about the format are<br>
also described in:<br><a href="http://ariadne.coli.uni-bielefeld.de/pustylnikov/pdfs/acl07.1.0.pdf" target="_blank">http://ariadne.coli.uni-bielefeld.de/pustylnikov/pdfs/acl07.1.0.pdf</a><br>My question is: do other treebanks exist which are not part of the database?<br>
If you know of an existing treebank that should be transformed into the<br>unified format, please let me know.<br><br>--<br>Olga Pustylnikov<br><br>Universität Bielefeld<br>Fakultät für Linguistik und Literaturwissenschaft<br>
Universitätsstraße 25<br>D-33615 Bielefeld<br><br><a href="http://ariadne.coli.uni-bielefeld.de/pustylnikov/" target="_blank">http://ariadne.coli.uni-bielefeld.de/pustylnikov/</a><br><a href="mailto:olga.pustylnikov@uni-bielefeld.de">olga.pustylnikov@uni-bielefeld.de</a><br>
<br><br></div></div></blockquote></div><br><br clear="all"><br>-- <br>Olga Pustylnikov<br><br>Universität Bielefeld<br>Fakultät für Linguistik und Literaturwissenschaft<br>
Universitätsstraße 25<br>D-33615 Bielefeld<br><br><a href="http://ariadne.coli.uni-bielefeld.de/pustylnikov/">http://ariadne.coli.uni-bielefeld.de/pustylnikov/</a><br><a href="mailto:olga.pustylnikov@uni-bielefeld.de">olga.pustylnikov@uni-bielefeld.de</a>