Book: Next Generation Search Engines: Advanced Models for Information

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Wed Apr 18 16:45:52 UTC 2012

Date: Sat, 14 Apr 2012 19:15:54 +0200
From: Jouis Christophe <Christophe.Jouis at>
Message-ID: <4F89B0CA.8080007 at>
The contents of:

*Next Generation Search Engines: Advanced Models for Information Retrieval*

© 2012; Publication Date: March 2012; 560 pages

ISBN: 978-1-4666-0330-1; EISBN: 978-1-4666-0331-8

Published by IGI Publishing, Hershey-New York, USA

Editors: Christophe Jouis, Université Paris III, France and
LIP6-Université Pierre et Marie Curie, France; Ismaïl Biskri, Université
du Québec à Trois-Rivières, Canada; Jean-Gabriel Ganascia, LIP6 and
CNRS-Université Pierre et Marie Curie, France; and Magali Roux, LIP6 and
CNRS-Université Pierre et Marie Curie, France




Indexing the World Wide Web: The Journey So Far

Abhishek Das, Google Inc., USA

Ankit Jain, Google Inc., USA

As the World Wide Web has grown, indexing technologies have changed and
improved significantly. In this chapter, the authors describe in detail
the key indexing technologies behind today's web-scale search engines,
with the aim of providing a better understanding of how web indexes are
used. An overview of the infrastructure needed to support the growth of
web search engines to modern scales is also given. Finally, the authors
outline potential future directions for search engines, particularly in
real-time and social contexts.

To obtain a copy of the entire chapter, click on the link below.


Decentralized Search and the Clustering Paradox in Large Scale 
Information Networks

Weimao Ke, College of Information Science and Technology, Drexel 
University, USA

The Web poses great challenges for information retrieval because of its 
size, dynamics, and heterogeneity. Centralized IR systems are becoming 
inefficient in the face of continued Web growth and a fully distributed 
architecture seems to be desirable. Without a centralized information 
repository and global control, a new distributed architecture can take 
advantage of distributed computing power and can allow a large number of 
systems to participate in the decision making for finding relevant 
information. In this chapter, the author presents a decentralized, 
organic view of information systems pertaining to searching in 
large-scale networks. The Clustering Paradox phenomenon is discussed.


Metadata for Search Engines: What can be Learned from e-Sciences?

Magali Roux, Laboratoire d'Informatique de Paris VI, France

Petabytes of data are generated by data-intensive sciences, also known 
as e-sciences. These data have to be searched to further perform 
multifarious analyses, including disparate data aggregation, in order to 
produce new knowledge. To achieve this, e-sciences have developed 
various strategies, mostly based on metadata, to deal with data 
complexity and specificities. In this chapter, Nuclear Physics, 
Geosciences and Biology, which are three seminal domains of e-sciences, 
are considered with regard to the strategies they have developed to 
search complex data. Metadata, which are data about data, were given a 
pivotal role in most of these approaches. The structure and the 
organization of metadata-based retrieval approaches are discussed.


Crosslingual Access to Photo Databases

Christian Fluhr, GEOL Semantics, France

For several years, normalized vocabulary has provided an unambiguous 
description of photos for users' queries. One could imagine that indexes 
are built by professionals who master the normalized vocabulary. However, 
according to the author, this is an ideal view, far from the reality 
of the actual indexing process. Photos are described by 
photographers who have no knowledge of information retrieval or of 
normalized vocabulary. Moreover, the descriptions do not take into 
account aspects such as semantic ambiguity or cross-lingual querying. 
In this chapter, the author presents an experiment in which all 
these limitations are avoided.


Fuzzy Ontologies Building Platform for Semantic Web: FOB Platform

Hanêne Ghorbel, University of Sfax, Tunisia

Afef Bahri, University of Sfax, Tunisia

Rafik Bouaziz, University of Sfax, Tunisia

To improve the quality of information retrieval systems, a lot of 
research has been conducted over the last decade, resulting in the 
development of Semantic Web techniques. These include models and languages 
for the description of Web resources on the one hand and ontologies for 
describing resources on the other hand. Although ontologies mainly 
consist of hierarchical descriptions of domain concepts, some domains 
cannot be precisely and adequately formalized in classic ontology 
description languages. To overcome those limitations, promising research 
is being conducted on fuzzy ontologies. In this chapter, the authors 
propose a definition for a fuzzy ontological model based on fuzzy 
description logic, along with a methodology for building fuzzy 
ontologies and platforms.



Searching and Mining with Semantic Categories

Brahim Djioua, University of Paris-Sorbonne, France

Jean-Pierre Desclés, University of Paris-Sorbonne, France

Motasem Alrahabi, University of Paris-Sorbonne, France

In this chapter, the authors present a new approach to the design of 
web search engines that uses semantic and discourse annotations 
according to certain points of view, which has the advantage of focusing 
on the user's interests. The semantic and discourse annotations are 
provided by means of the contextual exploration method. This method 
describes the discursive organization of texts by using linguistic 
knowledge present in the textual context. This knowledge takes the form 
of lists of linguistic markers and contextual exploration rules for each 
linguistic marker. The linguistic markers and the contextual exploration 
rules can help to retrieve relevant information, such as causal 
relations, definitions of concepts, or quotations, which is 
difficult to capture with classical keyword-based methods.


Semantic Models in Information Retrieval

Edmond Lassalle, Orange Labs, France

Emmanuel Lassalle, Université Paris 7, France

In this chapter, the authors propose a new descriptive model for 
semantics dedicated to Information Retrieval. Every object is considered 
as a concept. Indeed, the model associates concepts with words. It 
analyzes every word of a document within its context and translates it 
into a concept, which will be the meaning of the word. The model is 
evaluated, and documents are classified into categories using their 
conceptual representations.


The Use of Text Mining Techniques in Electronic Discovery for Legal Matters

Michael W. Berry, University of Tennessee, USA

Reed Esau, Catalyst Repository Systems, USA

Bruce Kiefer, Catalyst Repository Systems, USA

In this chapter, the authors discuss electronic discovery 
(eDiscovery), the process of collecting and analyzing 
electronic documents to determine their relevance to a legal matter. At 
first glance, the large volumes of data that need to be reviewed seem to 
lend themselves very well to traditional information retrieval and 
text mining techniques. However, the noisy and ever-changing nature of 
the document collections and the particularities of the domain cause 
existing tools to produce inconsistent results. Therefore, new 
tools that take these specific elements into consideration need to be 
developed. Starting with the history of the collection process for legal 
documents, the authors examine how text mining and information 
retrieval tools are used to deal with the collection process and then 
propose some research directions to improve it, such as collaborative 
filtering and cloud computing.


Intelligent Semantic Search Engines for Opinion and Sentiment Mining

Mona Sleem-Amer, Pertimm, France

Ivan Bigorgne, Lutin, France

Stéphanie Brizard, Arisem, France

Leeley Daio Pires Dos Santos, EDF, France

Yacine El Bouhairi, Thales, France

Bénédicte Goujon, Thales, France

Stéphane Lorin, Thales, France

Claude Martineau, LIGM, France

Loïs Rigouste, Pertimm, France

Lidia Varga, LIGM, France

With the tremendous rise in popularity of social media over the last 
few years, enterprises are showing more and more interest in 
exploiting the opinions and sentiments that users express about 
their products and services in social media content. Indeed, this content 
contains valuable, strategic data for product marketing and business 
intelligence. However, conventional search engines are inadequate for 
this task, as they are not designed to retrieve these particular kinds 
of data. Consequently, the field of opinion mining and retrieval is 
attracting increasing attention. In this chapter, the authors 
present the Doxa project, a work in progress that aims to build a 
semantic enterprise search engine with integrated business intelligence 
technology and state-of-the-art opinion and sentiment extraction, 
analysis, and querying of electronic text in French.



Human-Centred Web Search

Orland Hoeber, Memorial University of Newfoundland, Canada

In the Internet era, searching for information on the Web has become an 
essential part of many people's lives. Research on information 
retrieval in recent years has mainly focused on issues such 
as document indexing and document ranking, and on providing simple and 
quick means to search the Web, in an attempt to provide fast and 
high-quality results to user queries. Despite the great progress made in 
these areas and the success of many search engines, people 
still commonly have difficulty retrieving the information they are 
seeking, especially when they are unable to formulate an appropriate 
query or are overwhelmed by results. More needs to be done to include 
users in the search process and to assist them in crafting and 
refining their queries and exploring the results. This 
chapter discusses state-of-the-art research in the field of 
human-centered Web search.


Extensions of Web Browsers Useful to Knowledge Workers

Sarah Vert, Centre Virtuel de la Connaissance sur l'Europe (CVCE), 

In this chapter the author illustrates the customization of the web 
browser from the perspective of users who work at any of the tasks of 
using, planning, acquiring, searching, analyzing, organizing, storing, 
programming, distributing, marketing, or otherwise contributing to the 
transformation and commerce of information. The browser and its 
various possible parameterizations seem to be an important factor in 
helping users accomplish their tasks. An analysis of the customization 
of web browsers for knowledge workers is proposed. It demonstrates that 
a browser offering the possibility of add-ons is an application that is 
highly adaptable in meeting the specific requirements of its users.


Next Generation Search Engine for the Result Clustering Technology

Lin-Chih Chen, National Dong Hwa University, Taiwan

When using search engines, users tend to input very short and thus often 
ambiguous queries. Therefore, correctly identifying the user's search 
needs is not always an easy task. In order to solve this issue, the next 
generation of search engines will assist the users in dealing with large 
sets of results by offering various post-search tools such as result 
clustering, which has received a lot of attention recently. It consists 
of clustering search results into a hierarchical labeled tree so the 
users can customize their view of search results by navigating through 
it. In this chapter, the author presents WSC, a high-performance result 
clustering system, based on a mixed clustering method and a genuine 
divisive hierarchical clustering algorithm to organize the labels into a 
hierarchical tree. The author also shows that WSC achieves better 
performance than current commercial and academic systems.


Using Association Rules for Query Reformulation

Ismaïl Biskri, University of Quebec at Trois-Rivieres, Canada

Louis Rompré, University of Quebec at Montreal, Canada

To express their needs, users formulate queries that often take the form 
of keywords submitted to an information retrieval system based either on 
a Boolean model, on a vector model, or on a probabilistic model. It is 
often difficult for users to find keywords that express their exact 
needs. In many cases, the users are confronted on the one hand with a 
lack of knowledge on the subject of interest in their information search 
and on the other hand with biases that may affect the results. Thus, 
retrieving relevant documents in just one pass is almost impossible. 
There is a need to carry out a reformulation of the query either by 
using completely different keywords, or by expanding the initial query 
with the addition of new keywords. In this chapter, the authors present a 
semi-automatic method for query reformulation based on the 
combination of two data mining methods: text classification and 
maximal association rules.


Question Answering

Ivan Habernal, University of West Bohemia, Czech Republic

Miloslav Konopík, University of West Bohemia, Czech Republic

Ondřej Rohlík, University of West Bohemia, Czech Republic

In order to provide a more sophisticated and satisfactory response to 
information needs, question answering systems aim to give one or more 
answers in the form of precise and concise sentences to a question asked 
by a user in natural language, instead of only a set of documents as the 
result of a query, as in a traditional information retrieval system. 
Therefore, question answering systems rely heavily on natural language 
processing techniques for syntactic and semantic analysis and for the 
construction of appropriate answers. This chapter presents the state of 
the art in the field of question answering, within which the authors 
cover all types of promising QA systems, techniques and approaches for 
the next generation of search engines, focusing mainly on systems aimed 
at the (semantic) web.


Finding Answers to Questions, in Text Collections or Web, in Open Domain 
or Specialty Domains

Brigitte Grau, LIMSI-CNRS and ENSIIE, France

This chapter is dedicated to factual question-answering in open domains 
and in specialty domains. In querying a database, it is expected that 
factual questions will yield short answers that give precise 
information. However, in a web environment, topics are not limited and 
knowledge is not structured. Finding answers requires analyzing texts. 
In this context, the problem of finding answers to questions consists of 
extracting a piece of information from a text. In this 
chapter, the author presents question-answering systems that extract 
answers from web documents in a fixed multilingual collection.


Context-Aware Mobile Search Engine

Jawad Berri, College of Computing and Information Sciences, King Saud 
University, Saudi Arabia

Rachid Benlamri, Lakehead University, Canada

The recent emergence of mobile handsets as a new means of information 
exchange has led to the need for information retrieval systems 
specialized for mobile users. Lately, much effort has been put 
into the development of robust mobile search engines capable of 
providing attractive and practical services to mobile users, such as 
tools that provide directions to business locations according to the 
user's location or voice search that uses speech recognition 
technologies. However, the capabilities of current mobile search engines 
are still limited. In particular, enhancements are made possible by 
exploiting information about the current context of the users and 
providing this to search engines to improve the relevance of the 
results. In this chapter, a context model and an architecture that 
promote the integration of contextual information are presented through 
a case study.


Spatio-Temporal Based Personalization for Mobile Search

Ourdia Bouidghaghen, IRIT-CNRS-University Paul Sabatier of Toulouse, France

Lynda Tamine, IRIT-CNRS-University Paul Sabatier of Toulouse, France

The explosion of information available on the Internet and its 
heterogeneity have considerably reduced the effectiveness of traditional 
information retrieval systems. In recent years, much research has been 
devoted to developing contextual information retrieval technologies. 
Moreover, new needs in IR have emerged from the proliferation of new 
means of communication and information access, such as mobile devices. 
In this chapter, the authors discuss this specific issue with 
respect to mobile information retrieval and then present a 
model of spatio-temporal personalization for mobile search that uses 
contextual data such as location and time to dynamically select 
the most appropriate profile for a given situation. Each profile 
contains user interests learned from the individual's past search 
activities. They also propose a novel evaluation scenario for mobile 
search based on diary study entries.



Studying Web Search Engines from a User Perspective: Key Concepts and 
Main Approaches

Stéphane Chaudiron, University of Lille 3, France

Madjid Ihadjadene, University of Paris 8, France

In this chapter, the user perspective is highlighted. Some recent 
challenges in search engine evolution change users' information 
behavior. The authors identify four major trends in the "user-oriented 
approach" that focus respectively on strategies and tactics, cognitive 
and psychological approaches, management, and consumer and marketing 
approaches. However, the authors note that there is a need to better 
understand the dynamics and the nature of the interaction between Web 
searching and users. Also, other aspects such as ethics, cultural 
issues, growing social networks, etc. need to be considered.


Artificial Intelligence Enabled Search Engines (AIESE) and the Implications

Faruk Karaman, Gedik University, Turkey

Nowadays, search engines constitute the main means of classifying, 
sorting, and delivering information to users over the Internet. As time 
progresses, advances in Artificial Intelligence will be made and thus 
new artificial intelligence technologies will be developed to enhance 
the sophistication of search engines. This future generation of 
search engines, called artificial intelligence enabled search engines, 
will come to play an even more crucial role in information 
retrieval, but not without consequences. Through this 
chapter, the author analyzes the concept of technological singularity, 
discusses the direct and indirect impacts of the development of new 
technologies and artificial intelligence, notably regarding search 
engines, and proposes a four-stage evolution model of search engines.


A Framework for Evaluating the Retrieval Effectiveness of Search Engines

Dirk Lewandowski, Hamburg University of Applied Sciences, Germany

The evaluation of information retrieval systems and search engines in 
development or already on the market is a crucial process for the 
improvement of the quality of the search results. Most evaluations 
measure quality by calculating precision and recall on a set 
of ad-hoc queries and assume that typical users examine every result 
returned by a search engine, in the order presented. While 
this may be true in some contexts, it has been shown that it is not 
necessarily the case in Web searches, where modern Web search engines 
present results in various and enriched forms and where the users are 
typically interested only in a few highly relevant results and examine 
them as they see fit. Therefore, there is a need for new extended 
evaluation models for Web search engines. To this end, the author 
proposes a framework for evaluating the retrieval effectiveness of 
next-generation search engines.


Hardcover Price: $195.00

Online Perpetual Access Price: $295

Print + Online Perpetual Access Price: $390

Available for purchase on IGI Global's Web site at:

Also available through major online book retailers such as Amazon and 
Barnes & Noble

This book is also included in the IGI Global aggregated 
*"InfoSci-Books"* database:


Message distributed by the Langage Naturel list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)