Category Archives: search

information retrieval, desktop search, personalized search, text analysis, search engines

String Processing and Information Retrieval Conference

The SPIRE 2006 (String Processing and Information Retrieval) conference looks great; it’s like a giant grep-fest.

Since my all-time favorite O’Reilly book is Mastering Regular Expressions, this has got to be my kind of conference.

Alexa's Public Crawler Database

What a great idea from Alexa (Amazon.com): the Alexa Web Search Platform, computing and storage resources for rent to analyze large percentages of the entire Web. Opening this to anyone with an analytics or business idea is certainly a Web 2.0 kind of thing: outsource your data collection and the hardware to analyze it.

Now why not a program for academic research access to the data stores?

WWW2006 Workshop – Logging Traces of Web Activity

I am one of the organizers for the WWW2006 Workshop – Logging Traces of Web Activity: The Mechanics of Data Collection at the WWW2006 Conference in Edinburgh, Scotland in May 2006.

We invite position papers for the WWW 2006 workshop “Logging Traces of Web Activity: The Mechanics of Data Collection”. Many WWW researchers require logs of user behaviour on the Web. Researchers study the interactions of web users, both with respect to general behaviour and in order to develop and evaluate new tools and techniques.

Traces of web activity are used for a wide variety of research and commercial purposes including user interface usability and evaluations of user behaviour and patterns on the web. Currently, there is a lack of available logging tools to assist researchers with data collection and it can be difficult to choose an appropriate technique. There are several tradeoffs associated with different methods of capturing log-based data. There are also challenges associated with processing, analyzing and utilizing the collected data.

This one day workshop will examine the trade-offs and challenges inherent to the different logging approaches and provide workshop attendees the opportunity to discuss both previous data collection experiences and upcoming challenges. The goal of this workshop is to establish a community of researchers and practitioners to contribute to a shared repository of logging knowledge and tools. The workshop will consist of a panel discussion, participant presentations, demonstrations of logging tools and prototypes, and a discussion of the next steps for the group. Participation is open to researchers, practitioners, and students in the field.

The deadline for workshop proposals is January 10, 2006. I hope to see you there.

New Book: Theories of Information Behavior

I have been remiss in not mentioning that a new book I wrote a chapter for, Theories of Information Behavior, is finally out.

From the blurb:

This unique book presents authoritative overviews of more than 70 conceptual frameworks for understanding how people seek, manage, share, and use information in different contexts. A practical and readable reference to both well-established and newly proposed theories of information behavior, the book includes contributions from 85 scholars from 10 countries. Each theory description covers origins, propositions, methodological implications, usage, links to related conceptual frameworks, and listings of authoritative primary and secondary references. The introductory chapters explain key concepts, theory, method connections, and the process of theory development.

Check out the Table of Contents (pdf file). (Mine is the last chapter in the book; it’s funny that the chapters are organized alphabetically by the title of each chapter.)

Amazon.com link to Theories of Information Behavior. American Society for Information Science & Technology Member Price is 20% off now.

Yahoo! Publisher's Guide to RSS : Promote Your Feed

Some simple tips from Yahoo about how to Promote Your Feed. There’s even a little bit of Information Architecture advice too.

SIGIR 2006 Call for Papers

The ACM Special Interest Group for Information Retrieval (SIGIR) has their SIGIR 2006 Draft Call for Papers out already. The conference will be in Seattle next August.

SIGIR is one of the best academic conferences for keeping up with what’s new and what’s possible in Web search and, increasingly, in desktop search and mobile device search. For 2006 I expect we will see more about vertical search and even blog search, as well as some new insights into user behavior for IR.

Call for Papers: WWW2006 Conference

New notice for participation at the 15th Annual World Wide Web conference in Edinburgh, Scotland (one of my favorite cities).

I will be a reviewer again this year in the Browsers and User Interface track, where there are usually a number of amazing systems and interfaces. Here’s some text describing the track:

The Browsers and User Interfaces track at WWW’2006 focuses on promoting novel research directions and providing a forum where researchers, theoreticians, and practitioners can introduce new approaches, paradigms, and applications, and share their knowledge and opinions about problems and solutions related to accessing and interacting with data, services, and other humans over the Web. We invite original papers describing both theoretical and experimental research including (but not limited to) the following topics:

Browsers and user experience on mobile devices

Browser interoperability

Novel client-side applications

Multimodal interfaces, including speech interaction

Information visualization on the Web

Multilingual Web content design

Novel browsing and navigation paradigms

Web interaction with the real world, including robotics and sensor networks

Adaptive Web displays and Web personalization

Ubiquitous web access, shared displays, and wearable computing

Web usability and user experience

Web accessibility

Web-based collaboration and collaborative Web use

Web-logs and online journalism

Hope to see you there.

Study of Yahoo and Google Indices

A fresh approach to analyzing which search engine has the more comprehensive index: A Comparison of the Size of the Yahoo and Google Indices. It would be interesting to see this study at another order of magnitude, perhaps with MSN included. What I like best is that the study authors released the code for the tests. I seem to be finding that more academics are providing code to let others verify their study firsthand, build on the study to make comparable measurements, and most importantly provide the opportunity for peer review of the code logic behind what the study claims.

Aardvark, my new favorite Firefox Extension

Take a look at the Aardvark Firefox Extension; it’s the live-action equivalent of the View Page Source menu command. What a great way to learn how a page is coded, especially combined with the ever-popular Web Developer Extension by Chris Pederick.

OK, while we’re at it, don’t forget the best extension ever – adblock.

The New New Portal

The ingenuity of various independent developers, in conjunction with simple scripting, open source databases and XML data formats such as RSS, is making old school (1994-1997) portals nearly obsolete. Take this great idea that annotates a prototypical New York Times front page with links to related blog posts (and other feeds): The Annotated NY Times – About

Throw in Bloglines, with its easy-to-use, Web-based interface for any number of RSS feeds, and very soon a few personal tweaks with Greasemonkey. Add your own personal view of the blogosphere using Technorati tags or, even more personally oriented, Pluck with its client interface/information dashboard++, and you can kiss your portal application providers goodbye.

Oracle’s recent buyout of PeopleSoft may not be so smart in the long, long run when every business unit, not to mention every employee, can crank out structured data feeds, tweak simple logic to act on others’ sources, and keep up to date with everything in the organization with just a few clicks on everyone’s favorite orange button.
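To give a feel for how little "simple logic" it takes to act on someone else's feed, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not anyone's production tool: the sample feed, the example.com links and the matching_items helper are all made up for this post, and it assumes a plain RSS 2.0 feed with title and link elements on each item.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, standing in for a feed a business unit might publish.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Department Feed</title>
    <item>
      <title>Quarterly search log report posted</title>
      <link>http://example.com/reports/q1</link>
    </item>
    <item>
      <title>Cafeteria menu updated</title>
      <link>http://example.com/menu</link>
    </item>
  </channel>
</rss>"""


def matching_items(feed_xml, keyword):
    """Return (title, link) pairs for items whose title contains keyword.

    The match is a case-insensitive substring test on the item title only.
    """
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if keyword.lower() in title.lower():
            matches.append((title, link))
    return matches


if __name__ == "__main__":
    # Show only the items this reader cares about.
    for title, link in matching_items(SAMPLE_FEED, "search"):
        print(title, link)
```

In practice you would fetch the feed text over HTTP first and probably match on the description as well; the point is only that filtering a structured feed is a few lines of script rather than a portal product.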

Don Turnbull

Consulting Research Computer Scientist