<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:PMingLiU;
panose-1:2 1 6 1 0 1 1 1 1 1;}
@font-face
{font-family:"Comic Sans MS";
panose-1:3 15 7 2 3 3 2 2 2 4;}
@font-face
{font-family:"\@PMingLiU";
panose-1:0 0 0 0 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:Arial;
color:navy;}
span.EmailStyle18
{mso-style-type:personal;
font-family:Arial;
color:navy;
font-weight:normal;
font-style:normal;
text-decoration:none none;}
@page Section1
{size:595.3pt 841.9pt;
margin:72.0pt 107.65pt 72.0pt 107.65pt;}
div.Section1
{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-GB link=blue vlink=purple>
<div class=Section1>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>Hello Daniel,<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'><o:p> </o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>You may also want to consider the hierarchically
classified HEP corpus. It is in English only (i.e. no German texts) and not about
computer science, but it is very well documented, is a good size, etc. You
can find it at:<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'><o:p> </o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'> <a
href="http://sinai.ujaen.es/wiki/index.php/HepCorpus#English_version">http://sinai.ujaen.es/wiki/index.php/HepCorpus#English_version</a><o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'><o:p> </o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>Arturo Montejo Ráez (amontejo AT ujaen.es) will be
happy to help you with any questions you may have. A useful feature of this
corpus is that Arturo has already produced a number of benchmark values for categorisation
with various methods.<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'><o:p> </o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>Ralf<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>
<div style='mso-element:para-border-div;border:none;border-top:solid windowtext 1.0pt;
padding:1.0pt 0cm 0cm 0cm'>
<p class=MsoNormal style='border:none;padding:0cm'><b><font size=1 color=gray
face="Comic Sans MS"><span style='font-size:8.0pt;font-family:"Comic Sans MS";
color:gray;font-weight:bold'>Ralf Steinberger</span></font></b><font size=1
color=gray face="Comic Sans MS"><span style='font-size:8.0pt;font-family:"Comic Sans MS";
color:gray'> (</span></font><font size=1 color=gray face="Comic Sans MS"><span
lang=DE style='font-size:8.0pt;font-family:"Comic Sans MS";color:gray'><a
href="mailto:Ralf.Steinberger@jrc.it" title="mailto:Ralf.Steinberger@jrc.it"><font
color=gray><span lang=EN-GB style='color:gray'><span
title="mailto:Ralf.Steinberger@jrc.it"><span
title="mailto:Ralf.Steinberger@jrc.it">Ralf.Steinberger@jrc.it</span></span></span></font></a></span></font><font
size=1 color=gray face="Comic Sans MS"><span style='font-size:7.5pt;font-family:
"Comic Sans MS";color:gray'>) <br>
European Commission - Joint Research Centre (JRC)<br>
IPSC - SeS - Language Technology (</span></font><font size=1
color=gray face="Comic Sans MS"><span style='font-size:8.0pt;font-family:"Comic Sans MS";
color:gray'><a href="http://langtech.jrc.it/" title="http://www.jrc.it/langtech"><font
size=1 color=gray><span style='font-size:7.5pt;color:gray'><span
title="http://www.jrc.it/langtech"><span title="http://www.jrc.it/langtech">http://langtech.jrc.it</span></span></span></font></a>,
<a href="http://press.jrc.it/NewsExplorer/" title="http://press.jrc.it/NewsExplorer/"><font
size=1 color=gray><span style='font-size:7.5pt;color:gray'><span
title="http://press.jrc.it/NewsExplorer/"><span title="http://press.jrc.it/NewsExplorer/">http://press.jrc.it/NewsExplorer</span></span></span></font></a></span></font><font
size=1 color=gray face="Comic Sans MS"><span style='font-size:7.5pt;font-family:
"Comic Sans MS";color:gray'>) <br>
T.P. 267, Via Fermi 1<br>
21020 Ispra (VA), Italy<br>
<br>
</span></font><font color=gray><span style='color:gray'><o:p></o:p></span></font></p>
</div>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'><o:p> </o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt'>-----Original Message-----<br>
<br>
</span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> I'm working on my master's thesis "Accurate
Hierarchical Classification <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> using NLP Techniques". I hope to improve the
accuracy of hierarchical <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> classification on English and German corpora by
using additional <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> information extracted with the aid of linguistic
tools.<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> I would like to ask where I can obtain corpora
which are already <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> classified in a hierarchy. I need several English
and German corpora. I <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> would prefer it if the topics of the corpora were
about linguistics or <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> computer science.<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> Regards & Thanks,<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> <o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'>> Daniel<o:p></o:p></span></font></p>
<p class=MsoPlainText><font size=2 color=navy face=Arial><span
style='font-size:10.0pt'><o:p> </o:p></span></font></p>
</div>
</body>
</html>