<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--

/* Font Definitions */

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Tahoma;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Times New Roman","serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

p.MsoAcetate, li.MsoAcetate, div.MsoAcetate

        {mso-style-priority:99;

        mso-style-link:"Balloon Text Char";

        margin:0cm;

        margin-bottom:.0001pt;

        font-size:8.0pt;

        font-family:"Tahoma","sans-serif";}

span.EmailStyle18

        {mso-style-type:personal-reply;

        font-family:"Calibri","sans-serif";

        color:#1F497D;}

span.BalloonTextChar

        {mso-style-name:"Balloon Text Char";

        mso-style-priority:99;

        mso-style-link:"Balloon Text";

        font-family:"Tahoma","sans-serif";

        mso-fareast-language:EN-GB;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-family:"Calibri","sans-serif";

        mso-fareast-language:EN-US;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:72.0pt 72.0pt 72.0pt 72.0pt;}

div.WordSection1

        {page:WordSection1;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]--></head><body lang=EN-GB link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Dear Albert and Adam,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I can confirm that the EuroVoc descriptor (class) <b>labels are NOT used</b> in the classification process, meaning that the translation of the <i>thesaurus</i> <i>labels</i> themselves is irrelevant. Our experiments showed that using the labels during the classification process did not help, probably because class names such as ‘equality between men and women’ do not usually occur in the text. For English, only 31% of the names (labels) of the manually assigned descriptors actually occur in the documents they were assigned to. <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Impact of translation quality</span></b><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>: The quality of the text translation (and especially the <i>consistency</i> with which terms are translated) could indeed affect classification performance and help explain the differences. When the EU went from 15 to 25 (and then to 27) member states, large numbers of legal documents needed to be translated in a short time. And indeed, the five Slavic languages (all from ‘new’ member states) are among the weaker-performing ones. However, Hungarian and Lithuanian (also from ‘new’ member states) are in first and second position! 
For Maltese, I agree: being a Semitic language with strong Romance influences may make consistent term translation a challenge. <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Does anybody have comparative experience with document classification for Slavic, Finno-Ugric or Baltic languages? <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Maltese stop words:</span></b><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> Our classification tool JEX automatically weights words according to how specifically they occur in texts indexed with a given class, and thus automatically discards most high-frequency words. Using a manually compiled list of stop words (including words such as ‘paragraph’ and ‘decision’) nevertheless increases the performance. We did not use more Maltese stop words because we had never worked with the language and initially did not have any stop words for it. We later added 296 Maltese stop words and the performance (F1) increased from 0.4366 to 0.4500, which is still far below the values for the other languages. 
<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Any other possible explanations or experiences?<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Greetings,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Ralf<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> bertugatt@gmail.com [mailto:bertugatt@gmail.com] <b>On Behalf Of </b>Albert Gatt<br><b>Sent:</b> 02 June 2012 15:26<br><b>To:</b> Ralf Steinberger<br><b>Cc:</b> Adam Kilgarriff; corpora@uib.no<br><b>Subject:</b> Re: [Corpora-List] Q: Classification performance across languages and language families<o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Dear Ralf<o:p></o:p></p><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>I find the issue you've raised quite interesting and I too wonder why Maltese should behave so differently. Like Adam, I wondered about the quality of the thesaurus at first. Perhaps that's not the reason, as you suggest. 
But another reason -- also related to the relatively recent development of vocabularies in certain technical areas in Maltese (Malta being bilingual, most such technical areas were written about in English) -- might be inconsistencies and/or variation in the way the documents in your set were translated, which would also affect the distribution of lexical features and the reliability with which they are associated with particular categories. I am aware of an initiative in recent years among Maltese translation bureaux to standardise some of the translations of technical terms/phrases. (One of the problems seems to have been that, because Maltese is Semitic but has been heavily influenced by Romance, there is often more than one possible translation for a given term. Another problem is simply that translators, especially in the early days after Malta's accession to the EU, would have relied on circumlocution and similar "workarounds" before a vocabulary was gradually developed.) I guess the more recent the document collection, the fewer such inconsistencies it is likely to contain.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>I've also taken a look at your LREC paper, mainly at Table 1, where your precision/recall and other stats are reported. Here too, there are some things that I find surprising. 
For example, why are there only 6 elements in your stop-word list for Maltese, compared to much larger lists for many other languages?<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>albert<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal style='margin-bottom:12.0pt'><o:p> </o:p></p><div><p class=MsoNormal>On 2 June 2012 14:29, Ralf Steinberger &lt;<a href="mailto:ralf.steinberger@jrc.ec.europa.eu" target="_blank">ralf.steinberger@jrc.ec.europa.eu</a>&gt; wrote:<o:p></o:p></p><div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Dear Adam,</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Thanks for your proposal and for allowing me to clarify: EuroVoc is a <i>classification scheme</i> with exactly the same 6700 subject domain classes in all languages, i.e. each class has a numerical identifier and exactly <i>one class</i> <i>label</i> that has been translated into all 27 or so languages. 
Example EuroVoc categories are ‘nuclear materials’, ‘Austria’, ‘fishery management’, ‘xenophobia’, ‘budget’, ‘population statistics’, ...</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I cannot see how such a classification scheme would favour one language over another, especially as the documents are also parallel translations: they have the same content in all languages. EuroVoc is in no way comparable to a resource such as WordNet, which instead lists and organises the existing words of a language, with varying coverage. </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Greetings from Italy to the UK.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Ralf</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> </span><o:p></o:p></p><p 
class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> <a href="mailto:adam.kilgarriff@gmail.com" target="_blank">adam.kilgarriff@gmail.com</a> [mailto:<a href="mailto:adam.kilgarriff@gmail.com" target="_blank">adam.kilgarriff@gmail.com</a>] <b>On Behalf Of </b>Adam Kilgarriff<br><b>Sent:</b> 02 June 2012 14:13<br><b>To:</b> Ralf Steinberger<br><b>Cc:</b> <a href="mailto:corpora@uib.no" target="_blank">corpora@uib.no</a>; <a href="mailto:clef@dei.unipd.it" target="_blank">clef@dei.unipd.it</a>; <a href="mailto:ln@cines.fr" target="_blank">ln@cines.fr</a><br><b>Subject:</b> Re: [Corpora-List] Q: Classification performance across languages and language families</span><o:p></o:p></p><div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Ralf,<o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Please excuse my scepticism, but what about the simple hypothesis that it all depends on thesaurus quality? 
My hunch would be that it started from a Germanic language, hence good performance there, and that Slavic languages have been added more recently, so there have been fewer years for debugging/improving, and that there was a particularly inspired Hungarian translator!<o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Maltese has a special problem: it has never had a technical vocabulary, so there was nothing the Maltese thesaurus translators could do except make things up.<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>(Of course I'll be happy to have my hypothesis quashed by someone who knows the history of EuroVoc.)<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Adam<o:p></o:p></p></div><div><div><p class=MsoNormal style='mso-margin-top-alt:auto;margin-bottom:12.0pt'> <o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>On 2 June 2012 12:40, Ralf Steinberger &lt;<a href="mailto:ralf.steinberger@jrc.ec.europa.eu" target="_blank">ralf.steinberger@jrc.ec.europa.eu</a>&gt; wrote:<o:p></o:p></p><div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>A question and an invitation to discussion.<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;text-autospace:none'>We recently carried out <a href="http://langtech.jrc.ec.europa.eu/Documents/2012_LREC-JEX-final.pdf" target="_blank">multi-label categorisation 
experiments</a> on a mostly parallel set of documents in 22 languages, covering the language families Germanic, Romance, Slavic, Hellenic, Finno-Ugric, Baltic and Semitic. The document set is reasonably large (22K to 42K documents per language) and is indexed with the thousands of subject domain categories of the <a href="http://eurovoc.europa.eu/" target="_blank">EuroVoc thesaurus</a>. The performance across languages was rather uniform, with the exception of the outlier Maltese, which performed considerably less well. The languages covered are Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish. <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>To my great surprise, the highly inflected agglutinative language <b>Hungarian</b> produced the best results of all. The five Germanic languages ended up in the top ten positions, the five Slavic languages in the bottom half. The results for the other language families were less consistent. <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Q1:</b> Does anyone have an intuition as to how these results could be explained?<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Q2:</b> Has anyone run similar experiments with other types of classifiers or data? 
Are the results similar?<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>My initial expectation had been that highly inflected languages would perform less well and that feature space reduction using lemmatisation would improve the results. However, our experiments for Czech, English, Estonian and French (described in Ebrahim et al., forthcoming) showed the contrary, rather consistently for all four languages and language families: (1) lemmatisation reduces the performance and (2) adding part-of-speech (POS) information to the word form and/or to the lemma improves the performance. <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Q3:</b> Can we conclude that the sparser the feature space, the better the classification performance? <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Q4:</b> If that were the case, why did Slavic languages (and Maltese) perform less well in our experiments? 
<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;text-autospace:none'>I would be pleased if you could share your own experience and/or your opinions.<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;text-autospace:none'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;text-autospace:none'>The classification tool (<a href="http://langtech.jrc.ec.europa.eu/Eurovoc.html" target="_blank">JRC EuroVoc Indexer JEX</a>) and the multilingual document set can be downloaded from <a href="http://langtech.jrc.ec.europa.eu/Eurovoc.html" target="_blank">http://langtech.jrc.ec.europa.eu/Eurovoc.html</a>. Details of our experiments are given in the two papers below.<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;margin-left:36.0pt'>Steinberger, Ralf, Mohamed Ebrahim & Marco Turchi (2012). <strong><span style='font-family:"Calibri","sans-serif"'>JRC EuroVoc Indexer JEX - A freely available multi-label categorisation tool</span></strong>. Proceedings of the 8<sup>th</sup> International Conference on Language Resources and Evaluation (LREC'2012), Istanbul, 21-27 May 2012. (<a href="http://langtech.jrc.ec.europa.eu/Documents/2012_LREC-JEX-final.pdf" target="_blank" title="Reference publication for the JRC Eurovoc Indexer JEX">PDF</a>)<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;margin-left:36.0pt'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;margin-left:36.0pt'>Ebrahim, Mohamed, Maud Ehrmann, Marco Turchi & Ralf Steinberger (forthcoming). 
<strong><span style='font-family:"Calibri","sans-serif"'>Multi-label EuroVoc classification for Eastern and Southern EU Languages</span></strong>. In: Cristina Vertan & Walther v. Hahn: Multilingual processing in Eastern and Southern EU languages - Low-resourced technologies and translation. Cambridge Scholars Publishing, Cambridge, UK.<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Greetings,<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Ralf<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b><span style='font-size:9.0pt;color:#4A442A'>Ralf Steinberger</span></b><span style='font-size:9.0pt;color:#4A442A'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span lang=EN-US style='font-size:9.0pt;color:#4A442A'>European Commission – Joint Research Centre (JRC)</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span lang=EN-US style='font-size:9.0pt;color:#4A442A'>URL: <a href="http://langtech.jrc.ec.europa.eu/RS.html" target="_blank">http://langtech.jrc.ec.europa.eu/RS.html</a></span><span lang=EN-US style='font-size:9.0pt'> <span style='color:#4A442A'> </span></span><o:p></o:p></p></div></div><p class=MsoNormal style='mso-margin-top-alt:auto;margin-bottom:12.0pt'><br>_______________________________________________<br>UNSUBSCRIBE from this page: <a 
href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>Corpora mailing list<br><a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br><a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><o:p></o:p></p></div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><br><br clear=all><o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>-- <br>========================================<br><a href="http://www.kilgarriff.co.uk/" target="_blank">Adam Kilgarriff</a>                  <a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.com</a>                                             <br>Director                                    <a href="http://www.sketchengine.co.uk/" target="_blank">Lexical Computing Ltd</a>                <br>Visiting Research Fellow                 <a href="http://leeds.ac.uk" target="_blank">University of Leeds</a>     <o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><i><span style='color:#006600'>Corpora for all</span></i> with <a href="http://www.sketchengine.co.uk" target="_blank">the Sketch Engine</a>                 <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>                        <i><a href="http://www.webdante.com" target="_blank">DANTE: <span style='color:#009900'>a lexical database for English</span></a><span style='color:#009900'> </span>                 </i><o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>========================================<o:p></o:p></p></div></div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> 
<o:p></o:p></p></div></div></div></div></div></div></div></div><p class=MsoNormal><br><br clear=all><o:p></o:p></p><div><p class=MsoNormal><o:p> </o:p></p></div><p class=MsoNormal>-- <o:p></o:p></p><div><p class=MsoNormal>-----------------------------------------------------------------<o:p></o:p></p></div><div><p class=MsoNormal>Albert Gatt<o:p></o:p></p></div><div><p class=MsoNormal>Institute of Linguistics<o:p></o:p></p></div><div><p class=MsoNormal>Centre for Communication Technology Rm 402B<o:p></o:p></p></div><div><p class=MsoNormal>University of Malta<o:p></o:p></p></div><div><p class=MsoNormal>Tal-Qroqq Msida MSD2080<o:p></o:p></p></div><div><p class=MsoNormal>Malta<o:p></o:p></p></div><div><p class=MsoNormal> <o:p></o:p></p></div><div><p class=MsoNormal>tel: (+356) 2340 2150<o:p></o:p></p></div><div><p class=MsoNormal><a href="http://staff.um.edu.mt/albert.gatt/" target="_blank">http://staff.um.edu.mt/albert.gatt/</a><o:p></o:p></p></div><p class=MsoNormal><o:p> </o:p></p></div></div></body></html>