[Corpora-List] Resources concerning multilabel problem

radev at umich.edu radev at umich.edu
Fri Aug 18 13:42:58 UTC 2006


Look at this paper:

http://citeseer.ist.psu.edu/8956.html

Error-Correcting Output Coding for Text Classification (1999)
Adam Berger

See also:

http://citeseer.ist.psu.edu/19268.html



> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
> Behalf Of Cecilie Desiree Widsteen
> Sent: 18 August 2006 11:09
> To: Corpora list
> Subject: [Corpora-List] Resources concerning multilabel problem
>
> Hello all!
>
> I am looking for resources (articles, books, webpages) concerning the
> multilabel (multiclass?) problem in the context of text classification.
> By this I mean the fact that a document can be classified into more than
> one category. Especially w.r.t. supervised learning algorithms, where
> the documents in the training set may belong to multiple classes.
>
> Regards,
> --
> Cecilie Widsteen
> Institute for Informatics,
> University of Oslo
>
> Dear Cecilie,
>
> We have recently made available the JRC-Acquis corpus, which is a
> multilingual (21 languages) document collection multi-labelled according
> to the Eurovoc thesaurus and aligned at paragraph level for each of the
> 210 language pairs. You find it for download at:
>
>       http://langtech.jrc.it/JRC-Acquis.html
>
> Furthermore, in the 'Publications' section of our web site
> (http://langtech.jrc.it/#Publications), you find a number of papers on
> (typically multilingual) multi-label text categorisation applications
> (look mainly around the years 2002-2004), including the following:
>
>     Pouliquen Bruno, Ralf Steinberger & Camelia Ignat (2003). "Automatic
>     Annotation of Multilingual Text Collections with a Conceptual
>     Thesaurus". In: Proceedings of the Workshop "Ontologies and
>     Information Extraction" at the Summer School "The Semantic Web and
>     Language Technology - Its Potential and Practicalities"
>     (EUROLAN'2003). Bucharest, Romania, 28 July - 8 August 2003.
>     http://langtech.jrc.it/Documents/EuroLan-03_Pouliquen-Steinberger-et-al.pdf
>
> The text categorisation approach described in that paper is used as the
> major ingredient in our daily news analysis system NewsExplorer (freely
> accessible at http://press.jrc.it/NewsExplorer) to link related news
> across languages.
>
> I hope this helps. All the best,
>
> Ralf
>
> Ralf Steinberger (Ralf.Steinberger at jrc.it)
> European Commission - Joint Research Centre (JRC)
> IPSC - SeS - Language Technology (http://langtech.jrc.it,
> http://press.jrc.it/NewsExplorer)
> T.P. 267, Via Fermi 1
> 21020 Ispra (VA), Italy
> Tel: +39 0332 78-6271
> Fax: +39 0332 78-5154
> Secretary: +39 0332 78-5648 or 9478
>
> New URL: http://langtech.jrc.it. The previous address
> http://www.jrc.it/langtech will only be valid for a few more months.
>


-- 
Dragomir R. Radev                                         radev at umich.edu
Associate Professor of Information, Electrical Engineering and
Computer Science, and Linguistics, the University of Michigan, Ann Arbor
Phone: 734-615-5225   Fax: 734-764-2475    http://www.si.umich.edu/~radev


