The Search Engine Meeting

April 23-24, 2007
(Preconference Workshops: Sunday, April 22)




General Conference - Day Two: Tuesday, April 24, 2007
Full text search using open source Lucene/Solr
9:00 am – 9:45 am
Dr. Marc Krellenstein, Lucid Imagination

Open source tools, once used mostly by developers, non-profits and small organizations interested in cutting costs, have crossed over in recent years into mainstream use by large companies, governments and commercial product and service offerings, though their appropriate use still requires attention to the risks and benefits of open source and of the particular tool for a given application. Lucene, a high quality open source search engine freely available for download under the Apache license, is a good example of an open source tool that can deliver significant benefits and for which many of the risks of open source are minimized. Originally developed in the late 1990s by Doug Cutting, it is today a full-featured and scalable product with a large and active group of developers. It provides excellent search precision, low index overhead and cross-platform index portability, and is offered as a 100% Java library with additional ports to C++, .NET, Python, Perl and others. Solr, developed and contributed to open source by CNET, is an enterprise search server based on Lucene that includes caching, replication, faceted browse and a web administration interface. Recent years have seen significantly increased adoption of Lucene in enterprise search and for embedded use in products and services, including use by HP, FedEx, Akamai, DSpace, IBM/Yahoo, Wikipedia, Healthline, Webmail, CNET, Lookout (acquired by Microsoft) and Monster.

Secrets of the High-Powered Business: Staying Ahead with Enterprise Search
9:45 am – 10:30 am
Cyrus Mistry, Google LLC

Enterprise Search is a massive domain involving complex algorithms, usability considerations, speed and scalability requirements, security considerations, and many others. Fortunately, industry research exists which can help simplify the space, prioritize the issues, and solve the problems in the most efficient way. This presentation will shed light on recent developments in enterprise search and reveal how smart companies will take advantage of these advances to propel their enterprise forward.

Break & Exhibition
10:30 am – 11:00 am
Making the Most of Business Intelligence Tools with Enterprise Search
11:00 am – 11:30 am
Terry Clift, ISYS Search Software

A recent IBM study showed that nearly 80 percent of an organization's information is unstructured, meaning this content is not tied to Business Intelligence (BI) reporting tools and is therefore left unanalyzed and unknown. Enterprise search is quickly becoming the answer, doing for unstructured content what BI has done for structured data. Given the potential benefits users can derive from these two technologies, both search and BI vendors are working to develop tighter integration, yet a comprehensive solution that addresses a range of users’ needs is still far off. So what do organizations do in the meantime to locate, understand and leverage their valuable corporate knowledge? Put simply: they make enterprise search do more than merely locate a vacation policy on the corporate intranet. A common enterprise search misconception is that it only provides access to the information you know is there. And while it is true a standard full-text query will connect you quickly to known content, enterprise search is now going beyond these basics to enable text mining and discovery via the advent of entity extraction, automatic classification, search analytics and contextual searching.

This presentation illustrates how enterprise search software can supplement BI and deliver on the promises of text mining and analysis, expertise location and e-discovery, all of which become vital to a company's ability to highlight and leverage existing corporate knowledge.

Enterprise Search: Bringing Order from Chaos
11:30 am – 12:00 pm
Jeff Fried, Intersystems

The classification and management of data – both structured and unstructured – is a critical bridge that can carry storage users to document or file-level information lifecycle management (ILM). By and large, companies practicing ILM today tend to focus more on tiered storage, using broad classification, such as by application, to move data to the type of storage most appropriate for its value. ILM based on file-level or event context-based classification remains something of a dream for storage users. Enterprise search and data classification are helping to make that dream a reality.

In addition to enabling the contextual classification of data, enterprise search can provide users with myriad benefits, including more efficient utilization of storage resources, management of archived data for compliance needs, as well as consolidated operational and disaster recovery practices.

This presentation provides attendees with an overview of the role and importance of data classification and how the broader category of enterprise search can help businesses deliver on the so far unfulfilled promises of ILM.

Beyond Search: Visualizing Emerging Intelligence
12:00 pm – 12:45 pm
Lou Paglia, Dow Jones & Company

This presentation discusses the current state of search, the advantages of text mining for extracting meaning from unstructured data, and the future of search, including a move towards a role-based search environment, which will likely be one of the biggest technology trends to affect the enterprise. The concept of “role-based” search is about systems intelligent enough to understand the totality of what you do: your industry, your job and the daily tasks you undertake, and then help you accomplish those specific things more effectively. Effective role-based search applications will use technologies that uncover trending, comparison, discovery and determination of sentiment, which will then feed into applications that present the information using visualization and analytics. The session will also address business searching and how search networks will realign themselves to help all types of professionals find better information, faster.

Lunch & Exhibition
12:45 pm – 2:00 pm
Where Do Search and Business Intelligence Meet Today? Where Tomorrow?
2:00 pm – 2:30 pm
Steve Papa, Endeca Technologies

The Search and Business Intelligence markets made some surprisingly swift, significant movements towards each other in the past year. At industry analyst IDC, the Content team led by Sue Feldman and the BI team led by Henry Morris released a series of research on the fast emergence of "unified access" applications that are blurring the boundaries between these two fields of information access. Within months, BI giants Business Objects, Cognos, and Oracle all announced major new search features and products.

This convergence opens a rich area of study. At the outset, it touches the antipodes of unstructured versus structured content, atomic views versus aggregate views, and casual users versus trained users. For early adopters of unified access applications, it is already introducing functionality previously unavailable in either market. This presentation shows examples from high seat-count deployments where enterprise users are finding the information they need in unprecedented ways when Search and Business Intelligence converge.

The Next Generation of Vertical Search: The Web You Trust
2:30 pm – 3:00 pm
Claude Vogel, Convera

Professional search, whether it is research, analysis or open discovery, is an important trend in web search. Current search engines provide horizontal access to multiple sources, but the burden of sorting out these sources remains a concern. Transforming the web into an "authoritative web" would be an interesting way to solve these issues: semantic indexing would provide a semantic signature for web documents which could be verticalized to fit with a specific professional perspective. Additional mining tools could leverage this semantic added-value and provide an even more rewarding search experience. This presentation explains how this new vertical search technology works, and how it can open up new possibilities for research and education networks -- from increased learning loyalty to potential revenue sources.

Federated Search of Geospatial Data -- A Case Study: Google Earth, Map Servers and CAP Alerts
3:00 pm – 3:30 pm
Mr. Sol Lederman, Federated Search Blog

The proliferation of geospatial data and ways to visualize it (e.g., Google Earth, map servers) creates new opportunities in the federated search industry. While geospatial information sources abound, the greatest utility often arises not from the sheer volume of available data, but from the intelligent search, filtering, aggregation and simultaneous display of the most relevant data derived from different sources.

This presentation addresses how lessons learned from federated search can be applied to geospatial applications. We provide a case study of Deep Web Technologies' experience federating standing queries using the OASIS standard Common Alerting Protocol (CAP). CAP alerts enable the exchange of emergency alert and public warning information. The geospatial application considered harvests federated data and serves the data in real time, projecting the data onto several mapping systems (Google Earth and Minnesota MapServer). Our experience should prove instructive for those looking at aggregating and visualizing any type of alert data.
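
For readers unfamiliar with CAP, the following is a minimal sketch of the general idea of projecting an alert onto a mapping system. It is not the implementation described in the talk. It assumes a CAP 1.1 message whose area is given as a circle ("latitude,longitude radius"), and it emits a KML placemark of the kind Google Earth can display. The sample alert, the function names parse_cap_alert and to_kml, and the choice of the KML 2.2 namespace are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the alert uses the CAP 1.1 namespace; other CAP versions differ.
CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.1"}
KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical sample alert, invented for this sketch.
SAMPLE_ALERT = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.1">
  <identifier>EXAMPLE-0001</identifier>
  <sender>alerts@example.org</sender>
  <sent>2007-04-24T15:00:00-05:00</sent>
  <status>Exercise</status>
  <msgType>Alert</msgType>
  <scope>Public</scope>
  <info>
    <category>Met</category>
    <event>Severe Thunderstorm</event>
    <urgency>Immediate</urgency>
    <severity>Severe</severity>
    <certainty>Observed</certainty>
    <headline>Hypothetical severe thunderstorm warning</headline>
    <area>
      <areaDesc>Hypothetical area near Minneapolis</areaDesc>
      <circle>44.98,-93.27 15</circle>
    </area>
  </info>
</alert>
"""


def parse_cap_alert(xml_text):
    """Return one dict per circular area found in a CAP alert."""
    root = ET.fromstring(xml_text)
    results = []
    for info in root.findall("cap:info", CAP_NS):
        headline = info.findtext("cap:headline", default="", namespaces=CAP_NS)
        severity = info.findtext("cap:severity", default="", namespaces=CAP_NS)
        for area in info.findall("cap:area", CAP_NS):
            circle = area.findtext("cap:circle", namespaces=CAP_NS)
            if circle is None:
                continue  # this sketch ignores polygons and geocodes
            center, radius_km = circle.split()
            lat, lon = (float(v) for v in center.split(","))
            results.append({
                "headline": headline,
                "severity": severity,
                "lat": lat,
                "lon": lon,
                "radius_km": float(radius_km),
            })
    return results


def to_kml(alerts):
    """Render parsed alerts as KML point placemarks."""
    ET.register_namespace("", KML_NS)
    tag = lambda name: "{%s}%s" % (KML_NS, name)
    kml = ET.Element(tag("kml"))
    doc = ET.SubElement(kml, tag("Document"))
    for alert in alerts:
        placemark = ET.SubElement(doc, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = alert["headline"]
        ET.SubElement(placemark, tag("description")).text = (
            "Severity: %s, radius: %s km" % (alert["severity"], alert["radius_km"])
        )
        point = ET.SubElement(placemark, tag("Point"))
        # KML orders coordinates as longitude,latitude; CAP uses latitude,longitude.
        ET.SubElement(point, tag("coordinates")).text = "%s,%s" % (
            alert["lon"], alert["lat"])
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(to_kml(parse_cap_alert(SAMPLE_ALERT)))
```

The one detail worth noting is the coordinate order: CAP gives latitude first, while KML expects longitude first, so the two values are swapped when the placemark is written.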

Break & Final Exhibition
3:30 pm – 3:45 pm
Search with attitude: When 10 million Results are Actually Useful
3:45 pm – 4:15 pm
Stavros Macrakis, Gerson Lehrman Group

General Web search engines index billions of pages so searchers can scan a handful of general-purpose results for any given query. Ranking and relevance are based on the “neutral” judgment of the Web at large. But content publishers have an audience and an attitude. Publishers project that unique attitude – their editorial voice – through the selection and presentation of their own content and third-party content. To survive against general-purpose search, they must also project this editorial voice through a distinctive search experience. Publishers can do this by creating Web collections defined by a point of view, not simply by reductive taxonomic categories. We will discuss how multiple criteria can be combined to extend this judgment to millions of pages and show how publishers are taking advantage of this.

From Keywords to Concepts: Making the User’s Life Easier
4:15 pm – 4:45 pm
Roger Bradford, Agilex Technologies

This presentation describes how use of conceptual representations of text (as opposed to keywords) can aid users in finding and using information. The structure of the presentation is as follows:

  • An introductory slide providing a brief overview of techniques for conceptual representation of text.
  • Multiple slides, each showing an example of how conceptual representation saves time and effort on the part of users and improves search results.
  • A concluding slide indicating future capabilities which can be anticipated based on current research.

The example slides are drawn from existing applications, including an enterprise-level HR application in a Fortune 500 company, an intelligent push application for one of the largest commercial content aggregators, and a high-volume government data mining application. Topics covered in the example slides include:

  • Dealing with variants of entities, such as people’s names.
  • Finding information of interest in foreign-language documents, without having to translate those documents.
  • Topic categorization with measured accuracy rivaling human performance.
  • On-the-fly generation of taxonomies of search results.
  • Generation of alerts based on non-obvious relationships.
  • Automated identification of possible use of aliases.
  • Federation of queries across disparate databases.
  • Simultaneous analysis of structured and non-structured information.
Meeting Wrap-up Panel: What we Liked. What we Learned
4:45 pm – 5:00 pm
Susan E. Feldman, Synthexis
Stephen E. Arnold,

Susan Feldman and Stephen Arnold, two expert industry commentators, reflect on what was said during the two days of the 2007 Search Engine Meeting and, with the help of the audience, draw some lessons and conclusions.


