XML Retrieval: Problems and Potential
9:00 am – 9:30 am
Charles Clarke, David R. Cheriton School of Computer Science, University of Waterloo
While XML is not an ideal vehicle for capturing and exploiting document structure in search, it does provide a common ground for addressing a number of related retrieval problems across unrelated document types and collections. Using examples of retrieval over collections of books and journals, this talk outlines methods for focused retrieval: returning the right parts of documents, not just the right documents. In the case of books and journals, these parts may range from paragraphs and pages to entire volumes. The talk also discusses the evaluation of these focused retrieval methods. In particular, the talk describes INEX (the INitiative for the Evaluation of XML Retrieval), an ongoing forum for evaluating focused retrieval. Now in its sixth year, INEX annually brings together an international group of researchers to compare methods using common test collections.

Beyond Search: Big Money and User Dissatisfaction as Catalysts for Next-Generation Search Solutions
9:30 am – 10:00 am
Stephen E. Arnold, ArnoldIT.com
There are more than 200 companies offering "search" solutions. Some of these are newcomers unknown in the US; Paris-based PolySpot and Budapest-based Tesuji are just two examples. The $1.2 billion buyout of Fast Search & Transfer, even more than Google's advertising billions, has ignited consolidation in search. It is not just money. New research funded by the U.S. government reveals that three out of five enterprise search system users are dissatisfied or very dissatisfied with their present search system. A new study for the Gilbane Group identifies facets of the search business that receive scant attention:
- Universities are funding search ventures. The goal is not technology transfer. The objective is to create value and, hence, revenue beyond the traditional research grant and licensing models.
- Newcomers are making inroads against far larger vendors. Companies such as Coveo, Exalead, ISYS and Siderean Software are tallying double-digit growth by finding new markets and capturing business from far larger, higher-profile vendors. Companies like Bitext in Madrid generate revenue by providing established vendors with a way to add natural language processing to their ageing systems, as dtSearch has done through its relationship with Bitext.
- The shift to rich text processing via semantic and statistical techniques is now taking place. Although slow to take off, the "assisted navigation" interface pioneered by Endeca is now giving way to an information dashboard. Endeca's lock on the point-and-click interface and "suggestions" is being challenged by dozens of companies.
- Search is no longer optional. It is an expected component of other enterprise applications. As a result, search and retrieval, findability solutions, and social search are part of the standard enterprise software vendor's product functionality. IBM, Microsoft, Oracle and SAP are pushing downmarket.
The cumulative effect of these trends has significant implications for vendors, procurement teams and users. The two principal changes catalyzed by these trends are rising user demand for increasingly intelligent systems, which creates significant opportunities for vendors, and an increasing flow of funds into new technologies, which will keep the market fast-changing and unstable for the foreseeable future.

Search as a Mode of Learning: Requirements for Next-Generation Search Systems
10:00 am – 10:30 am
Steven Forth, Monitor Group; Amelia Newbury, Monitor Group
In addition to hearing, watching, experiencing and other popular modes of learning, search is an important and fundamental mode for understanding complex subject areas. Learning in complex fields can be understood as the building of concept maps through the exploration of a knowledge space. Because learning is often a social act, and learners need to be able to communicate and share their concept maps, there are important social and communication issues at stake as well. Search provides a compelling way to explore multi-dimensional spaces, and this is both a common and a growing use of search systems. But the use of search for learning and social learning imposes new requirements on search systems. These needs go beyond simply optimizing the 'findability' of a known piece of information, or even returning a sample of possibly relevant results. When social and communication aspects are factored in, the current generation of search systems does not provide adequate support for learners. Among other things, search systems need to factor in the "white spaces" in learners' concept maps and shape the presentation of results to fill these spaces. This presentation looks at a number of common patterns in search to see how they support learning, and then develops a set of requirements for a search system that provides better support for social learning in complex fields. eMonitor's experiments in using semantic constructs to organize learning and performance content to improve searchability will also be discussed.

Conference Break
10:45 am – 11:15 am

Blending Retrieval and Categorization Technologies in a Document Recommender System
11:15 am – 11:45 am
Peter Jackson, The Thomson Corporation
The task of recommending documents to knowledge workers differs from the task of recommending products to consumers. Variations in search context can undermine the effectiveness of collaborative approaches, while many professionals work in environments where the open sharing of information may be impossible or undesirable. There is also the 'cold start' problem of how to bootstrap a recommendation capability in the absence of current usage statistics. We describe a fully fielded system called ResultsPlus, which uses a blend of information retrieval and machine learning technologies to recommend secondary materials, based on document metadata, to attorneys engaged in primary law research. Rankings of recommended material are subsequently enhanced by incorporating both historical user behavior and document usage data.

Forget “One Size Fits All,” Search is an Iterative Process
11:45 am – 12:15 pm
Terry Clift, ISYS Search Software
Taking a high-level, strategic approach to enterprise search might make sense in specific cases, but it should not be done to the exclusion of tactical deployments that pay immediate dividends while the “big rollout” is still in the configuration stage. Search is an iterative process that, when done correctly, lives and breathes and conforms to the requirements of various user communities over time. Understanding these users, their needs and environments, and the intended goals for search lays the crucial foundation for any successful implementation. Vendors and customers only set each other up for failure when they assume environments are rigid and that search must have all of the answers from the beginning.
This presentation discusses the iterative approach to search and illustrates the best short-term and long-term strategies for bringing enterprise search into an organization. The process begins with identifying where search can deliver immediate benefits. The presentation then outlines the steps for rolling out search implementations to meet broader requirements, and for generating lasting, long-term gains without sacrificing the short term.

Semantic Retrieval: Making the Computer Do the Heavy Lifting
12:15 pm – 12:45 pm
Roger Bradford, Agilex Technologies
This presentation covers the range of modern applications of semantic processing to information retrieval. The emphasis is on techniques that reduce the cognitive load on the user. Techniques covered include conceptual retrieval, clustering and categorization, on-the-fly taxonomy generation, and text mining. Examples are taken from applications in industry and government. These applications include patent analysis, legal data discovery, and counter-terrorism analysis. The discussion includes multi-lingual and cross-lingual applications. Although the emphasis is on text, extensions to audio and video data are included. New results are presented that demonstrate the applicability of these techniques to very large document collections.

Lunch Break
12:45 pm – 2:15 pm

Search Trails: Back to the Future
2:15 pm – 2:45 pm
Nigel Hamilton, Trexy
Each day, millions of people search for the same things and often find themselves repeating their own searches. Would it not be useful to harness this collective effort by remembering the searches people make and the web pages they visit to find information?
This presentation explores how new social search tools affect and assist the online searching community. Trexy.com remembers search trails and shares them anonymously with other searchers. Search trails are the paths users follow when searching on engines such as Google, Yahoo and MSN. But what is the optimal trail for a given search? How can we pass useful trails on to one another? Can search trails help users pinpoint information? The presentation also looks at the technical developments that have shaped how we currently view, retrieve, and remember information online.

Using Information Retrieval and NLP Techniques to Drive Business Intelligence
2:45 pm – 3:15 pm
George Chitouras, Text Analytics Product Group, Business Objects
While traditional Business Intelligence has transformed business using structured information from operational applications and transaction systems, one huge source of information has by and large been ignored: people’s thoughts and opinions, found in communications such as emails, web pages, reports, surveys, customer relationship management note fields, contracts, blogs, and wikis. Whether it is customer complaints, employee feedback, analyst opinions, or competitors' intentions, this potentially valuable information lies hidden in unstructured text sources.
This presentation proposes that the artifacts of text analytics, when used in the aggregate, can drive business intelligence dashboards and measure “sentiment” as it relates to products, companies or marketing initiatives.

Combining Semantics and Keyword Approaches to Enable Flexible Enterprise Search
3:15 pm – 3:45 pm
Sam Chapman, University of Sheffield
Keyword search has limitations: its results are unsuitable for many business uses, and reliable quantitative results are impossible to obtain because the relevance of any returned item is uncertain. A greater issue is that textual information in specialised domains is often repetitive, and the context of information is paramount to its meaning. In such circumstances, standard keyword approaches are not the best way to locate relevant information. Semantic approaches offer a way to alleviate this issue by capturing "knowledge" according to a pre-assigned structure (ontology classes and relations). Although these techniques have proven helpful in answering precise queries, the complexity of how knowledge is searched and its rigid organisation can sometimes constrain users, especially since not all possible "knowledge" is encoded in a reusable, structured form. This presentation outlines a flexible approach for specialised domains that combines keyword and semantic approaches: users can easily switch between the two, or use both together in a single query that mixes structured and unstructured elements to varying degrees, to locate the information needed for quantitative analysis. The presentation focuses on a number of specific examples where this simple, patented approach is used in large-scale enterprises.

Conference Break
3:45 pm – 4:15 pm

The Next Step in the Confluence of Search and Business Intelligence
4:15 pm – 4:45 pm
Jeff Fried, BA Insight
Enterprise search (with a heritage in serving ad hoc queries on unstructured data) and BI (traditionally focused on structured inquiry into structured data) have been coming together. A range of capabilities combining search and BI is already available and in use: text mining, search-based “everyday analytics”, and search integrated into BI. New technology that merges traditional database and traditional search cores is now coming out of the lab, providing a next step in the search/BI space. This presentation outlines the internal architecture and data management approach of this next-generation core search technology.

Better Annotations for Text Mining: Using a Knowledge Server
4:45 pm – 5:15 pm
Pascal Coupet, TEMIS
Simple entity recognition is becoming increasingly popular as a way to improve user search experiences. We are now used to seeing personal names, places and other entities automatically recognized in texts. These new dimensions can be used for faceted navigation, hyperlinking and several types of statistical analysis.
However, quality quickly becomes an issue in production systems because of ambiguities and naming variations. Normalization and disambiguation are necessary for high-quality systems. The presentation discusses the next generation of entity recognition systems, which address these issues in a way that can be customized from one customer to another. This approach is based on a dedicated knowledge server that stores the known entities for a specific project, together with associated disambiguation and normalization methods. The server's contents evolve based on historical annotations, and customers can correct mistakes so that the system does not make them again. This is a key element in providing strong customization capabilities for each customer without modifying the core annotator products.

Conference Mixer Cocktail
5:15 pm – 7:00 pm