[Corpora-List] Ph.D. position: Focused Web Search
Djoerd Hiemstra
hiemstra at cs.utwente.nl
Mon Aug 7 11:54:06 UTC 2006
Open Ph.D. position: Focused Web Search
The Database Group of the University of Twente has an opening for a
fully funded four-year PhD position in the Effort project. Effort is a
joint project with the Information and Language Processing Group of the
University of Amsterdam and is financed by the Netherlands Organisation
for Scientific Research (NWO). The project will develop an approach for
combining multiple representations of web information -- such as web
directories and specialized search engines targeting a specific domain
-- in a common framework based on statistical language models. In this
framework it will be possible, for example, to derive models of the
actual language use of web pages to distinguish between arts, business,
entertainment, education, etc. Similarly, it will be possible to derive
models of the structure of web pages to distinguish between blogs, FAQs,
personal web pages, cultural heritage pages, etc. The envisaged
techniques have to be robust to all kinds of errors, ranging from
imperfect information extraction techniques to imprecise queries
formulated by the average web search engine user. An important aspect of
any new technique in a web setting is that it has to scale up to
terabyte-sized collections. We plan to develop so-called parsimonious
models to derive compact representations and to handle dependencies
between representations of the data. The PhD student will be involved in
designing, prototyping and evaluating such new search solutions.
Successful prototypes should be made available as open source software.
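
As a rough illustration of the language-model idea described above (not
the project's actual method), the following Python sketch trains one
unigram model per category and assigns a page to the category whose
smoothed model gives its text the highest likelihood. The function names,
the toy training data and the smoothing weight `lam` are assumptions made
for this example only.

```python
import math
from collections import Counter


def unigram_counts(docs):
    """Count whitespace-separated, lower-cased terms over a list of texts."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def train(labelled_docs):
    """Build per-category term counts and a background count over all data.

    labelled_docs maps a category name to a list of example texts.
    """
    per_category = {c: unigram_counts(d) for c, d in labelled_docs.items()}
    background = Counter()
    for counts in per_category.values():
        background.update(counts)
    return per_category, background


def log_likelihood(text, counts, background, lam=0.5):
    """Log-probability of text under a category model that is linearly
    interpolated with the background model (lam is an assumed weight)."""
    total = sum(counts.values())
    bg_total = sum(background.values())
    score = 0.0
    for term in text.lower().split():
        p_bg = background[term] / bg_total
        if p_bg == 0.0:
            continue  # term never seen in training: skip it
        p_cat = counts[term] / total
        score += math.log(lam * p_cat + (1 - lam) * p_bg)
    return score


def classify(text, per_category, background, lam=0.5):
    """Return the category whose model scores the text highest."""
    return max(
        per_category,
        key=lambda c: log_likelihood(text, per_category[c], background, lam),
    )


# Hypothetical toy data, for illustration only.
TOY_DATA = {
    "arts": ["painting gallery sculpture exhibition", "museum painting artist"],
    "business": ["market shares profit company", "company revenue market"],
}

if __name__ == "__main__":
    models, bg = train(TOY_DATA)
    print(classify("new painting exhibition at the museum", models, bg))
```

The same scoring scheme could in principle be applied to structural
features of a page instead of its words; scaling it to terabyte-sized
collections is a separate problem that this sketch does not address.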
Additional information can be obtained here:
* Official announcement by the University:
http://www.utwente.nl/vacatures/vacatures_externe_werving/06-068-eng.doc
* Project web site hosted by the University of Amsterdam:
http://www.science.uva.nl/~kamps/effort
--
Djoerd Hiemstra
University of Twente
Department of Computer Science
PO Box 217, 7500 AE
Enschede, The Netherlands
URL: www.cs.utwente.nl/~hiemstra
Tel: +31 53 4892335