The Search Engine Meeting

April 23-24, 2007
(Preconference Workshops: Sunday, April 22)




General Conference - Day One: Monday, April 23, 2007
Collaborative Web Search: Social & Personal
9:00 am – 10:00 am
Barry Smyth, University College Dublin

This presentation focuses on how so-called personalization techniques are being used in response to the information overload problem and the experiences gained and lessons learned when it comes to the deployment of these techniques. In particular, it focuses on the personalization of Web search, taking special care to consider the important privacy issues that such personalization brings to the fore. These issues motivate a unique approach to personalized, social Web search -- Collaborative Web Search (CWS) -- which focuses on the delivery of personalization at the level of a community of like-minded searchers. In particular, we describe how the past searching behaviour of communities of like-minded searchers can be harnessed to improve search results in the future by allowing a search engine to adapt to the particular needs of a specific community of searchers. We also describe how different communities of searchers can usefully collaborate and co-operate. The presentation considers the core techniques that have been developed, key evaluation results, and the lessons that have been learned from early-stage deployments in the field.
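The core CWS idea described above — harnessing a community's past result selections to re-rank future results — can be sketched minimally. This is an illustrative reconstruction, not Smyth's actual system: the class names, the hit-matrix representation, and the linear blending weight are all assumptions.

```python
from collections import defaultdict

class CommunitySearchIndex:
    """Records which results members of a community selected for each query (hypothetical sketch)."""
    def __init__(self):
        # hits[query][url] = how often community members picked url for query
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_selection(self, query, url):
        self.hits[query][url] += 1

    def relevance(self, query, url):
        # Fraction of the community's selections for this query that chose url
        total = sum(self.hits[query].values())
        return self.hits[query][url] / total if total else 0.0

def rerank(results, index, query, weight=0.5):
    """Blend the engine's original ordering with community relevance."""
    n = len(results)
    def score(item):
        pos, url = item
        base = (n - pos) / n  # engine rank score in (0, 1]
        return (1 - weight) * base + weight * index.relevance(query, url)
    return [url for _, url in sorted(enumerate(results), key=score, reverse=True)]
```

Because only aggregate community selection counts are stored, no individual's search history needs to be exposed — one way such a design can address the privacy concerns the talk raises.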

Personalizing Search: Promise and Pitfalls
10:00 am – 10:30 am
Jayendu Patel, ChoiceStream

In the 1990s, online search addressed the need to retrieve content quickly from a rapidly growing inventory of World Wide Web pages. It relied on mechanically matching a search query to words on the pages, while also accommodating fuzzy queries and manually created taxonomies. Developments in the early 2000s focused on sorting the mechanical matches by a measure of overall relevance of content, with Google's PageRank method becoming the most successful commercial solution. Search circa 2007 is crossing the chasm to personalized relevance of search results. This includes the surfacing of simple-to-use, contextually relevant controls that are increasingly accurate and effective. The new search paradigm accounts for personal interests and intentions whenever reliably inferable or explicit profiles are available. Lessons from the development toward this search paradigm are shared, both within the context of applications to general search and in the narrower context of eCommerce.

A Dynamic Methodology for Improving the Search Experience
10:30 am – 10:55 am
Marcia Kerchner, MITRE Corp.

In the early years of modern information retrieval, the fundamental way in which we understood and evaluated search performance was by measuring precision and recall. In recent decades, however, models of evaluation have expanded to incorporate the information seeking task and the quality of its outcome, as well as the value of the information to the user. We have developed a systems engineering-based methodology for improving the whole search experience. The approach focuses on understanding the users’ information-seeking problems, understanding who has the problems, and applying solutions that address these problems. This information is gathered through ongoing analysis of site usage reports, satisfaction surveys, Help Desk reports, and a working relationship with the business owners. A case study for a major US government public website is described in this presentation.
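The precision and recall measures mentioned above remain the standard baseline metrics, so it may help to make them concrete. A minimal illustration (the function name and set-based formulation are my own):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, retrieving four documents of which two are relevant, out of three relevant documents total, gives precision 0.5 and recall 2/3 — exactly the kind of single-number summary that, as the abstract notes, newer evaluation models go beyond.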

Break & Exhibition
10:55 am – 11:15 am
Can Text Mining Save Enterprise Search?
11:15 am – 12:45 pm
Moderator: Stephen E. Arnold
David Bean Ph.D., Attensity
Brad Allen, Siderean Software
Justin Langseth, Clarabridge

The media and developers use the word "search" with considerable freedom. Mention the notion of relevance, and people point to the magic of Google and then criticize the search system in a commercial enterprise. Search vendors have embraced text mining as the solution to the problems of "traditional search", and it is becoming clear that precision and recall have been buried under a snowstorm of facets, concepts, taxonomies and metadata.

Systems and software that discover "hidden nuances" give the appearance of revealing more from the content, but at the same time, these systems provide few tools to identify and locate data sets that disappear, manipulate versions of content objects, locate a specific document that is known to have been indexed but cannot be located using key words or concepts, or understand why groups of documents are reported as being related but are not, at least from the user's point of view.

In this panel session, introduced and moderated by Stephen Arnold, three search industry leaders examine where we are going and suggest a pathway to the next era of enterprise search.

Lunch & Exhibition
12:45 pm – 2:15 pm
Knowledge-Oriented Navigation in Textual Content
2:15 pm – 2:45 pm
Pascal Coupet, TEMIS Inc.

Search is all about answers. Beyond matching documents, users are looking for contextualized and localized information, usually encapsulated in a few sentences locked in documents. Search engines are focusing on the ideas of document retrieval and keyword or pattern matching. And while faceted navigation provides some enhancements, the user's experience remains document-centric. Knowledge-oriented navigation offers an innovative way to access information beyond documents.

This presentation describes a new platform built on IBM's UIMA framework, combining deep annotation of textual content with advanced analytics capabilities to offer knowledge-oriented navigation. Building on IBM's open source project enables this platform to seamlessly integrate academic and commercial analytics and provide a brand new user experience for information retrieval and discovery. Using case studies and real-life examples, the presentation discusses how knowledge-centric navigation can give the "Google user" (the non-expert user) access to advanced methods of analysis, without their traditional drawbacks.

Enabling Effective Human-Machine Interaction
2:45 pm – 3:15 pm
Francois Bourdoncle, Exalead

In this information-intensive era, it is nearly impossible to avoid interaction with machines. Machines are our information gatekeepers. As a result, the process of information search and access has become an essential interaction technology. Unfortunately, for many users this process is frustrating, especially when they do not know exactly what they are searching for or where to search. We have begun to see a rise in the number of social search engines, which offer a level of human-human interaction through a machine. But the problem with social search is that in certain environments, in a business, for example, users depend on objective, organic search results.

This presentation offers a technical perspective on the latest technologies to facilitate effective human-machine interaction that permit a more natural search experience for the user. It will also offer insight into first-hand experience in redesigning the search engine to allow users to interact dynamically with the search engine so they can search by serendipity. In addition, the presentation discusses the benefits and challenges of the technologies employed, including entity extraction, real-time indexing, taxonomies and navigation.

Shifting the Long Tail: The Thick Neck of Customer Intent
3:15 pm – 4:00 pm
Edwin Cooper, InQuira, Inc.

Visitors to company websites enter a large number of unique search queries. With so many possible queries, it is not feasible to tune results for each query individually. At the same time, there are usually a small number of overall intents shared by the majority of website visitors. By using natural language technology to recognize the overall intent of each visitor, we make it feasible to have a powerful impact on user experience, with a minimal investment in time and resources. By knowing the intent of the visitor, the maintainer of the website can gain understanding of user needs, focus content creation, and affect the user experience in exactly the place where a change will have the highest impact.
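The "thick neck" idea above — collapsing a long tail of unique queries onto a small set of shared intents — can be sketched with a simple rule-based classifier. This is a hypothetical illustration only: the intent names, the pattern set, and the banking-site framing are invented for the example, and InQuira's actual natural language technology is far richer than keyword patterns.

```python
import re

# Hypothetical intent set for a consumer-banking website
INTENT_PATTERNS = {
    "reset_password": [r"\b(reset|forgot|lost)\b.*\bpassword\b",
                       r"\bpassword\b.*\breset\b"],
    "check_balance":  [r"\b(check|view|see)\b.*\bbalance\b",
                       r"\baccount balance\b"],
    "dispute_charge": [r"\b(dispute|unrecognized|fraud)\b.*\b(charge|transaction)\b"],
}

def classify_intent(query):
    """Map a free-text query onto one of a small number of shared intents."""
    q = query.lower()
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, q) for p in patterns):
            return intent
    return "other"  # the remaining long tail of unmatched queries
```

Once queries are bucketed this way, the site maintainer can tune results and content per intent rather than per unique query string — the "minimal investment, highest impact" point the abstract makes.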

This presentation describes the methods by which industry-specific sets of intents are created, the way in which they are measured for accuracy, and how they are being used today by several leading enterprise websites.

Break & Exhibition
4:00 pm – 4:30 pm
From Broad to Vertical Search: Retrieving Information from the Deep Web
4:30 pm – 5:00 pm
Olivier Scheffer, Digimind

Querying the Deep Web usually requires the manual programming of descriptive files and connectors for each search engine. This presentation describes new technology that uses an algorithm to connect automatically to selected search engines and extract their results by learning each engine's query format and extraction rules. The algorithm is based on topology and probability theory drawn from artificial intelligence research. The other component of the automatic connector is the results extraction algorithm, used here to recognise and extract the results on any results page automatically.

Pro-Active Question-Answering
5:00 pm – 5:30 pm
Elizabeth Liddy, School of Information Studies, Syracuse University

Question-answering is the well-known application that goes one step further than document retrieval, providing the specific information asked for in a natural language question. Rather than simply listing URLs that contain all or some of the user's keywords -- links to pages of undifferentiated information that the user must then search individually -- question-answering recognizes the specific aspect of the topic, or the relationships between concepts, being asked about, and provides either short answers or answer-providing snippets that address the specific question asked.

QA is proving to be a highly desirable capability, with particular utility in the enterprise setting of closed-domain question-answering. We have extended the capability to retrieve answers to questions to include the inverse capability. That is, the system tracks who has been asking what questions, and then matches recently acquired documents or newly produced reports to the standing information needs of users within an enterprise whose history suggests that the information in a new report would be useful. This capability is similar to what years ago was referred to as "Selective Dissemination of Information", but is now done with a great deal more precision due to richer NLP-based representations of both queries and new documents, and more sophisticated matching algorithms.
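The inverse, SDI-style capability described above can be sketched in miniature: hold users' standing questions, and when a new document arrives, route it to anyone whose question it matches. A bare term-frequency cosine stands in here for the richer NLP-based representations the talk describes; the function names and threshold are assumptions.

```python
import math
from collections import Counter

def tf_vector(text):
    """Crude bag-of-words term-frequency vector (whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def disseminate(new_doc, standing_queries, threshold=0.2):
    """Return the users whose standing questions match the new document."""
    doc_vec = tf_vector(new_doc)
    return [user for user, q in standing_queries.items()
            if cosine(doc_vec, tf_vector(q)) >= threshold]
```

A production system would of course replace the bag-of-words matching with the richer query and document representations the abstract credits for today's improved precision.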

Large Scalability Applications of Natural Language Processing
5:30 pm – 6:00 pm
Dr. Yves Schabes, Teragram, A SAS Company

It is now obvious that natural language text is available to anybody at a scale that nobody could imagine only ten years ago. But this is not the only dimension of language scalability. Language is currently analyzed at a very crude level, and any attempt to go beyond this simplicity involves the scalability of the language model. The scale of these two dimensions together is what makes solutions to these problems challenging. Although the processing speed of CPUs has increased exponentially, the scalability requirements of natural language processing applications have grown accordingly. In this presentation, we illustrate some techniques and applications of natural language processing at a very large scale. New applications of large-scale natural language processing techniques will most likely underlie the next generation of information retrieval applications. Scalability requirements cannot be an afterthought: every single NLP module must be designed from the beginning to meet the required scale and speed.

Conference Mixer Cocktail
6:00 pm – 8:00 pm


© 2009 - 2018, Information Today, Inc.