[Corpora-List] CFP: ICML Workshop - Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining

Rayid Ghani rayid.ghani at accenture.com
Tue Mar 4 00:59:42 UTC 2003


CALL FOR PAPERS
ICML 2003 Workshop (Co-located with KDD 2003)
The Continuum from Labeled to Unlabeled Data in Machine Learning and
Data Mining
August 21, 2003. Washington, DC.
http://www.accenture.com/techlabs/icmlworkshop2003/

Important Dates
Papers Due: May 1, 2003
Notification: May 25, 2003
Final Version Due: June 10, 2003
Workshop: August 21, 2003

There is a spectrum of ways to use data in machine learning and data
mining. At one end is completely unsupervised learning or
clustering, and at the other end is supervised learning where the target
output is known for every instance.

This workshop aims to explore the space between these extremes, with
particular attention to a variety of real-world applications and
sources of labels. Techniques that have been proposed include learning
from unlabeled data with hints, learning from unlabeled and
positive-only labeled data, learning from distantly and noisily labeled
data, combining labeled and unlabeled data with cotraining, EM and other
semi-supervised techniques, and transductive learning, where the test
data is added as an additional source of unlabeled data. The possible
sources of labels and hints are also broad. Systematic hand-labeling,
labels acquired through active learning, and hints derived from domain
knowledge are among the techniques that may be used.

Papers addressing novel types of data, methods of diagnosing when
unlabeled data will help and when it will hinder, and applying
techniques across multiple application domains and multiple levels of
supervision are particularly encouraged. Papers discussing the
acquisition of labels from real-world experts in real-world data mining
problems are also encouraged. Data mining practitioners working on
real-world problems with large amounts of captured/stored data but a
high-cost labeling process are encouraged to submit problem descriptions
and possible solutions.

Workshop Format
The workshop will consist of both regular paper presentations and
debates.

Regular Papers
Regular papers can be up to eight pages, and may address work in
progress. Papers should be in the format required for ICML submissions.
The formatting instructions can be found at
http://www.hpl.hp.com/conferences/icml03/formats/index.html.

Problem Descriptions from Machine Learning/Data Mining Practitioners
Papers of one to two pages should describe a problem domain you have
encountered or dealt with where training data and/or labels are very
expensive or hard to obtain. The paper should present a problem
statement, give background on the domain, and list the sources and amount
of available training data. We hope these types of papers will encourage
participation from people working on practical applications where
unlabeled data can potentially be valuable but is not currently
utilized. We hope to devote a session in the workshop to discuss these
problems and brainstorm possible solutions and ways to use unlabeled
data for the problems posed in these papers.

Debate Position Papers
Two-page position papers on either side of the following topics are
solicited. Accepted papers will be published in the workshop
proceedings, and authors will be expected to debate their position.
Topics not on this list are also acceptable, if you can coherently argue
both sides, or can encourage a colleague to submit the opposing
position.

   - Unlabeled data is only useful when there are a large number of
redundant features.
   - Why doesn't the No Free Lunch Theorem apply when working with
unlabeled data?
   - Unlabeled data has to come from the same underlying distribution as
the labeled data.
   - Can unlabeled data be used in temporal domains?
   - Feature engineering is more important than algorithm design for
semi-supervised learning.
   - All the interesting problems in semi-supervised learning have been
identified.
   - Active learning is an interesting "academic" problem.
   - Active learning research without user interface design is only
solving half the problem.
   - Using unlabeled data in Data Mining is no different than using it
in Machine Learning.
   - Massive data sets pose problems when using current semi-supervised
algorithms.
   - Off-the-shelf data mining software incorporating labeled and
unlabeled data is a fantasy.
   - Unlabeled data is only useful when the classes are well separated.

Submissions should be sent by May 1, 2003 as PDF or PostScript files to
Rayid.Ghani at accenture.com.

Organizers
Rayid Ghani
Accenture Technology Labs, 161 N. Clark St, Chicago, IL 60601
rayid.ghani at accenture.com
+1 (312) 693-6653

Rosie Jones
Overture Services, 74 N. Pasadena Ave 3F, Pasadena, CA 91107
rosie.jones at overture.com
+1 (626) 229-8536

Chuck Rosenberg
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
chuck at cs.cmu.edu
+1 (412) 268-8078

Program Committee
Kristin Bennett, Rensselaer Polytechnic Institute
Mark Craven, University of Wisconsin
Zoubin Ghahramani, Gatsby Computational Neuroscience Unit, UCL
Sally Goldman, Washington University, St. Louis
Tony Jebara, Columbia University
Thorsten Joachims, Cornell University
Stefan Kremer, University of Guelph
Bing Liu, National University of Singapore
Andrew McCallum, University of Massachusetts
Ray Mooney, University of Texas, Austin
Ion Muslea, University of California, Irvine
Kamal Nigam, IntelliSeek
Ellen Riloff, University of Utah
Dale Schuurmans, University of Waterloo
Martin Szummer, Microsoft Research, Cambridge
Sarah Zelikovitz, City University of New York
Tong Zhang, IBM Research, Yorktown Heights

Rayid Ghani
Accenture Technology Labs
312-693-6653
www.accenture.com/techlabs/ghani


