Category Archives: tech

General technology issues

An Eye Tracking Study on camelCase and under_score Identifier Styles

Programmers sit around and discuss all manner of methods and practices for writing code. One ageless discussion is how to name things. There are a few studies that review naming conventions, and most of them focus simply on applying a convention consistently across a language, group or organization.

This study, now a few years old, doesn’t come to any overwhelming conclusion that would persuade me to abandon underscores (or dashes) for camelCase, but it is worth noting.

From the Abstract:

An empirical study to determine if identifier-naming conventions (i.e., camelCase and under_score) affect code comprehension is presented. An eye tracker is used to capture quantitative data from human subjects during an experiment. The intent of this study is to replicate a previous study published at ICPC 2009 (Binkley et al.) that used a timed response test method to acquire data. The use of eye-tracking equipment gives additional insight and overcomes some limitations of traditional data gathering techniques. Similarities and differences between the two studies are discussed. One main difference is that subjects were trained mainly in the underscore style and were all programmers. While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly.

via IEEE Xplore Abstract – An Eye Tracking Study on camelCase and under_score Identifier Styles.

Also available via the author’s web site.
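For anyone who hasn’t followed the debate, here is a minimal side-by-side of the two styles under study (the identifier names are made up purely for illustration):

```python
# The same identifiers written in the two styles from the study.
# Names here are hypothetical, for illustration only.

# under_score (snake_case) style
max_retry_count = 3

def parse_request_header(raw_header):
    return raw_header.strip().lower()

# camelCase style
maxRetryCount = 3

def parseRequestHeader(rawHeader):
    return rawHeader.strip().lower()
```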

Interesting statistics about the iTunes Store Terms and Conditions

The iTunes Store terms and conditions run about 17,637 words, or about 26 generously large Web browser screenfuls. The text has about 1,744 unique words, 779 sentences, a lexical density of 16 percent and a readability score of 12.7 (which means it requires a greater than high school reading level, but not a law school graduate reading level).
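For the curious, here is a minimal sketch of how statistics like these can be computed, assuming a plain-text copy of the terms saved as terms.txt (a hypothetical filename). Tools differ in how they tokenize, count sentences and define lexical density, so expect the exact figures to vary; the readability score below uses the common Flesch-Kincaid grade-level formula.

```python
import re

def count_syllables(word):
    """Crude syllable estimate: runs of vowels, minimum one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

with open("terms.txt", encoding="utf-8") as f:   # hypothetical filename
    text = f.read()

words = re.findall(r"[A-Za-z']+", text)
sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
syllables = sum(count_syllables(w) for w in words)

total = len(words)
unique = len({w.lower() for w in words})

# Flesch-Kincaid grade level: one common readability formula.
grade = 0.39 * (total / len(sentences)) + 11.8 * (syllables / total) - 15.59

print(f"words: {total}, unique: {unique}, sentences: {len(sentences)}")
print(f"type-token ratio: {unique / total:.1%}")  # one rough density measure
print(f"Flesch-Kincaid grade: {grade:.1f}")
```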

Here’s a word cloud of the text from the fine people at Wordle:

GoogleBot vs iTunes Preview

Maybe Michael Bay should direct this next battle of the robot titans: Googlebot vs the Apple iTunes Web Servers – Dark of the Web?

It seems that as of today, the Apple iTunes Preview Web servers are not playing well with Googlebot. Take a look at the top three SERP descriptions for Angry Birds, one of the most popular iOS apps.

Google Search Engine Results Page for Angry Birds with useless description metadata

Maybe the Apple Webmasters need to get a plucky action hero to improve snippets with a meta description makeover.
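As a sketch of the kind of audit a webmaster might run, the snippet below fetches a page and reports whether it carries a usable meta description (the URL and the length threshold are illustrative assumptions, not Apple’s actual setup):

```python
import re
import urllib.request

def meta_description(url):
    """Fetch a page and return its meta description, or None if absent."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Naive pattern: assumes the name attribute comes before content.
    match = re.search(
        r'<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']',
        html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

# Hypothetical URL, for illustration only.
desc = meta_description("https://example.com/apps/angry-birds")
if desc is None:
    print("No meta description: the SERP snippet is left up to the crawler.")
elif len(desc) < 50:
    print(f"Thin description ({len(desc)} chars): {desc!r}")
else:
    print(f"Looks usable ({len(desc)} chars): {desc[:80]}")
```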

"What is" Instant Search with Google and Bing

It seems there are some rather large differences in what I get when Bing and Google do their instant search term suggestions for something as vague as “what is”:

Bing “what is” instant search suggestions
Google “what is” instant search suggestions

These results may also imply something about how much data has been gathered and is used for my personalized version of each search. However, I don’t remember searching for any of these, either as a "what is" query or as any of the other suggested terms.

Parliament? Gout? Gluten? The Illuminati? Strange indeed, but perhaps the makings of a great mystery-thriller novel!

Methodologies for Understanding Web Use with Logging in Context

Methodologies for Understanding Web Use with Logging in Context

[PDF]

Don Turnbull

Abstract

This paper describes possible approaches to data collection and analysis that can be used to understand Web use via logging. First, a method devised by Choo, Detlor & Turnbull (1998, 1999 & 2000) is described that offers a comprehensive, empirical foundation for understanding Web logs in context by gaining insight into Web use from three diverse sources: an initial survey questionnaire, usage logs gathered with a custom-developed Web tracking application, and follow-up interviews with study participants. Second, a method of validating different types of Web use logs is proposed that involves client browser trace logs, intranet server logs, and firewall or proxy logs. Third and finally, a system is proposed to collect and analyze Web use via proxy logs that classify Web pages by content.

Excerpt

It is often thought that in some configurations, client browsing application local caching settings may influence server-based logging accuracy. If it is not efficient to modify each study participant’s browser settings (or if temporarily modifying participants’ browser settings for the study period affects true Web use), a method of factoring in what may be lost due to local caching may be applied. … By tuning intranet server logging settings and collecting and analyzing these logs, some initial measurement can be made of the differences that client browser caching makes in accurate firewall logs. Comparisons of access in the organization’s intranet Web server logs, such as total page requests per page, time to load, use of REST or AJAX interaction and consistent user identification, can be made against the more raw logging collected from the firewall.
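A rough sketch of that comparison might look like the following, assuming each log has already been reduced to whitespace-delimited lines with a URL field (the filenames and field positions are hypothetical; real logs need real parsers):

```python
from collections import Counter

def page_counts(path, url_field):
    """Count requests per URL from a whitespace-delimited log file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) > url_field:
                counts[fields[url_field]] += 1
    return counts

client = page_counts("client_trace.log", url_field=2)  # browser-side trace
proxy = page_counts("firewall.log", url_field=6)       # firewall/proxy log

# Requests visible client-side but missing at the proxy suggest local caching.
for url in sorted(client, key=client.get, reverse=True)[:20]:
    hidden = client[url] - proxy.get(url, 0)
    if hidden > 0:
        print(f"{url}: {client[url]} client requests, "
              f"{proxy.get(url, 0)} at proxy, ~{hidden} likely cached")
```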

Update

What’s novel about this paper is the introduction of using different datasets to validate or triangulate the veracity and accuracy of log data. Often, logs are collected and processed without context to explain subtle interaction patterns, especially in relation to user behavior. By coordinating a set of quantitative resources, often with accompanying qualitative data, a much richer view of Web use is achieved. This is worth remembering when relying on Web analytics tools to form a picture of a Web site’s use or a set of Web user interactions: you need to go beyond the basic statistical measures (often far beyond what typical log analysis software provides, certainly beyond their default reports) and design new analysis techniques to gain understanding.
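One concrete example of going beyond the default reports is reconstructing sessions from raw request records, a standard first step in transaction log analysis. A minimal sketch, assuming the records have already been parsed into (user, timestamp) pairs; the 30-minute inactivity cutoff is a common convention, not a rule:

```python
from collections import defaultdict

CUTOFF = 30 * 60  # seconds of inactivity that ends a session (a convention)

def sessionize(records):
    """records: iterable of (user_id, unix_timestamp) pairs.
    Returns {user_id: [session, ...]}, each session a list of timestamps."""
    by_user = defaultdict(list)
    for user, ts in records:
        by_user[user].append(ts)

    sessions = defaultdict(list)
    for user, stamps in by_user.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > CUTOFF:   # gap too long: close the session
                sessions[user].append(current)
                current = []
            current.append(ts)
        sessions[user].append(current)
    return sessions

# Toy usage: u1 has two sessions (a 3400-second gap), u2 has one.
demo = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
for user, sess in sessionize(demo).items():
    print(user, [len(s) for s in sess])  # requests per session
```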

Keywords

browser history, firewall logs, intranet server logs, web use, survey, questionnaire, client application, webtracker, interview, methodology, logs, server logs, proxy, firewall, analytics, content classification, client trace, transaction log analysis, www

Cite As

Turnbull, D. (2006). Methodologies for Understanding Web Use with Logging in Context. Paper presented at the 15th International World Wide Web Conference, Edinburgh, Scotland.

References in this publication

  • Auster, E., & Choo, C. W. (1993). Environmental scanning by CEOs in two Canadian industries. Journal of the American Society for Information Science, 44(4), 194-203.
  • Catledge, L. D., & Pitkow, J. E. (1995). Characterizing Browsing Strategies in the World-Wide Web. Computer Networks and ISDN Systems, 27, 1065-1073.
  • Choo, C. W., Detlor, B., & Turnbull, D. (1998). A Behavioral Model of Information Seeking on the Web — Preliminary Results of a Study of How Managers and IT Specialists Use the Web. Proceedings of the 61st Annual Meeting of the American Society for Information Science, 290-302.
  • Choo, C. W., Detlor, B., & Turnbull, D. (1999). Information Seeking on the Web — An Integrated Model of Browsing and Searching. Proceedings of the 62nd Annual Meeting of the American Society for Information Science, Washington, D.C.
  • Choo, C. W., Detlor, B., & Turnbull, D. (2000). Web Work: Information Seeking and Knowledge Work on the World Wide Web. Dordrecht, The Netherlands: Kluwer Academic Publishers.
  • Cunha, C. R., Bestavros, A., & Crovella, M. E. (1995). Characteristics of WWW Client-Based Traces. Technical Report #1995-010. Boston University, Boston, MA.
  • Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51(4), 327-358.
  • Jansen, B. J., Spink, A., & Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the Web. Information Processing & Management, 36(2), 207-227.
  • Jansen, B. J. (2005). Evaluating Success in Search Systems. Proceedings of the 66th Annual Meeting of the American Society for Information Science & Technology, Charlotte, North Carolina, 28 October – 2 November.
  • Kehoe, C., Pitkow, J., & Rogers, J. (1998). GVU’s Ninth WWW User Survey Report. http://www.gvu.gatech.edu/user_surveys/survey-1998-04.
  • Pitkow, J., & Recker, M. (1994). Results from the first World-Wide Web survey. Special issue of Journal of Computer Networks and ISDN Systems, 27(2).
  • Pitkow, J. (1997, April 7-11). In Search of Reliable Usage Data on the WWW. Sixth International World Wide Web Conference Proceedings, Santa Clara, CA.
  • Rousskov, A., & Soloviev, V. (1999). A performance study of the Squid proxy on HTTP/1.0. World Wide Web, 2(1-2), 47-67.


Recommended Reading

Jansen, B. J., Ramadoss, R., Zhang, M., & Zang, N. (2006). Wrapper: An application for evaluating exploratory searching outside of the lab. EESS, p. 14.

Rating, Voting & Ranking: Designing for Collaboration & Consensus

Rating, Voting & Ranking: Designing for Collaboration & Consensus

[PDF]

Don Turnbull

Abstract

The OpenChoice system, currently in development, is an open source, open access community rating and filtering service that would improve upon the utility of currently available Web content filters. The goal of OpenChoice is to encourage community involvement in making filtering classification more accurate and to increase awareness of the current approaches to content filtering. The design challenge for OpenChoice is to find the best interfaces for encouraging easy participation amongst a community of users, be it for voting, rating or discussing Web page content. This work in progress reviews some initial designs while reviewing best practices and designs from popular Web portals and community sites.

Excerpt

…Tim O’Reilly proposed the phrase “architecture of participation” to describe participatory Web sites and applications that encourage user-driven content, open source contribution models and simple access via APIs. So why are so many of these sites and applications under-designed at the interface and interaction level, not to mention vaguely architected in their overall structure? Many of these sites rely on the (initial) enthusiasm of users or their compelling features to keep and encourage participation. However, more attractive and functional interfaces with clear labels, (usability) tested interfaces, finely crafted workflows and consistent interaction models would both keep early adopters involved and allow for easy bootstrapping for late-comers. When designing participatory, community-oriented sites, designers shouldn’t have to re-invent everything from scratch.

…popular community sites feature common interface elements and functionality:

  • Overall voting and rank status easy to read
  • Dynamically updated interaction
  • Thumbnail, abstract or actual content of item on same page as voting interface
  • Rating information for community at large for the item
  • Suggestions or lists for additional items to rate
  • Textual description of (proposed) item category with link to category
  • Links to related and relevant discussions about item (or item category)
  • Standard interface objects (where appropriate) to leverage existing Web interaction (e.g. purple & blue links colors, tabbed navigation metaphor, drop-down lists)
  • Show history of ratings or queue of items to vote on
  • Aggregate main page or display element that shows overall community ratings (to encourage virtuous competition for most ratings)
  • Task flow for voting or rating clear with additional interactions not required (e.g. following links)

…In addition to dynamic voting status, there is some consideration of simplifying the voting to include “allow” vs. “block” ratings only. Design issues such as the colors of the buttons may also overly influence certain votes.
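A back-of-the-envelope sketch of that simplified allow/block scheme: one tally per URL, a per-user history like the one prototyped for the portal page, and a naive consensus rule. The names and the five-vote threshold below are hypothetical illustrations, not the actual OpenChoice design:

```python
from collections import defaultdict

votes = defaultdict(lambda: {"allow": 0, "block": 0})  # tally per URL
history = []  # per-user voting history, as in the prototyped portal page

def cast_vote(user, url, choice):
    assert choice in ("allow", "block")
    votes[url][choice] += 1
    history.append((user, url, choice))

def classification(url, min_votes=5):
    """Community verdict once enough votes accumulate; 'pending' until then."""
    tally = votes[url]
    total = tally["allow"] + tally["block"]
    if total < min_votes:
        return "pending"
    return "allow" if tally["allow"] > tally["block"] else "block"

cast_vote("alice", "http://example.com/", "allow")
print(classification("http://example.com/"))  # -> pending (too few votes yet)
```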

Basic Voting Interface and Voting History
As part of each user’s own customized portal page, a history of recent votes is prototyped to give users the ability to remember their past votes and see the status of pending items in consideration.

Keywords

information interfaces: Graphical User Interfaces, user interfaces, reputation systems, social computing

Cite As

Turnbull, D. (2007). Rating, Voting & Ranking: Designing for Collaboration & Consensus. Paper presented at the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), San Jose, CA.

Publications that cite this publication

  • Galway, D. (2008) Real-life Rating Algorithm [PDF].

Recommended Reading

Building Web Reputation Systems by Randy Farmer and Bryce Glass at Building Web Reputation Systems: The Blog.

Quantitative Information Architecture recommended reading

Here is a brief list of recommended books from my Quantitative Information Architecture talk at the 2010 Information Architecture Summit. They review many aspects of quantitative thinking (both good and bad) related to using mathematical methods as a toolkit for information architecture issues.


Quantitative Information Architecture Books

Many of these books are non-fiction favorites. I’ve used them in courses I’ve taught, relied on them for research ideas and used them to convey how quantitative innovation is pursued.

  1. The Control Revolution: Technological and Economic Origins of the Information Society by James Beniger. Nearly encyclopedic in its coverage of the Industrial Revolution’s impact on creating the Information Age, where economic forces accelerated collecting, storing and capitalizing on data. Particularly interesting (truly!) are insights about the railroad industry and information technology (e.g. the telegraph).
  2. Against the Gods: The Remarkable Story of Risk by Peter L. Bernstein. Just thinking about this book makes me want to read it again. It’s a swashbuckler of a story of the history of people using mathematics to tame the world. (Well, at least to me.) Bernstein’s style is surprisingly readable with narratives that keep you engaged.
  3. Excel Scientific and Engineering Cookbook by David M Bourg. A great (but aging) overview of doing statistics in spreadsheets, including regression and time series analysis. Not for beginners, but a good reference and reminder of the power of Excel for almost all manner of analysis. (The only downside to Excel is its limit for working with very large datasets.)
  4. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century by David Salsburg. Another fun read, a glance through the history of some of the more famous statisticians (my favorite being Andrey Nikolaevich Kolmogorov and a partial history of Soviet science).
  5. Super Crunchers: Why Thinking-By-Numbers is the New Way To Be Smart by Ian Ayres. The most readable (and current) of the list, with basic introductory ideas presented in the context of how organizations such as Netflix, Southwest Airlines and, of course, Google use numbers, and how industries including baseball and wine-making are affected by quantitative work.
  6. The Rise of Statistical Thinking, 1820-1900 by Theodore M. Porter. This book is mostly thematic, covering the rise of statistics and their influence in the social sciences. A bit dry (and poorly typeset) but a foundational study. (Feel free to rely on the Index to jump around to people or topics you might be more interested in.)
  7. When Information Came of Age: Technologies of Knowledge in the Age of Reason and Revolution, 1700-1850 by Daniel R. Headrick. This book was a quick read, suggesting a number of common themes such as the rise of the Age of Reason and the parallel development of scientific instrumentation. As empirical sciences progressed, a resulting increase in collected data brought forth the origins, expansion and professionalization of many kinds of information systems including graphs, maps, encyclopedias, the post office and insights of key scientists of the age (e.g. Carl Linnaeus). Not as grand in scope as other recommended books, but focuses more clearly on types of information that are often the focus of IA efforts.
  8. Men of Mathematics by E.T. Bell. A somewhat stilted (it was written in the 1930s) biographical walk-through of many storied mathematicians (i.e. the names you hated to hear in 10th grade Geometry) that reveals the history of quantitative analysis and the intellectual vigor (did I just say that?) of those like Gauss or Lagrange. Even if the math itself is not your normal interest, this book is an index of obsession, diligence and ingenuity.
  9. The History of Statistics: The Measurement of Uncertainty before 1900 by Stephen M. Stigler (not shown). I have not finished this book; there is a lot in it that I have little interest in, and I have put it down several times (it is a bit dry). However, its account of how different statistical measures were built up progressively is interesting, and it has one of the better discussions of Karl Pearson.



Two books illustrate the downfall of quantitative hubris (among other things), and both are fun to read.

  1. When Genius Failed: The Rise and Fall of Long-Term Capital Management by Roger Lowenstein. This book narrates the catastrophic failure of Long-Term Capital Management, the fabled sure-bet, genius-powered hedge fund that boasted two Nobel laureates among its partners, and how its overconfidence nearly crashed the entire world financial system in 1998.
  2. Too Big to Fail: The Inside Story of How Wall Street and Washington Fought to Save the Financial System—and Themselves by Andrew Ross Sorkin. A detailed (600-plus-page), nearly minute-by-minute report of the recent financial crisis and an indictment of over-reliance on abstract mathematics without (any?) explanation or validation. Worth remembering when confronted with abundant or seemingly infallible data-driven results: we should not be intimidated, and should remember to ask why? and how?

Personalized Search

Personalized Search: A Contextual Computing Approach May Prove a Breakthrough in Personalized Search Efficiency

[PDF]

James Pitkow, Hinrich Schuetze, Todd A. Cass, Rob Cooley, Don Turnbull, Andy Edmonds, Eytan Adar, et al.

Abstract

A contextual computing approach may prove a breakthrough in personalized search efficiency.

Excerpt

Contextual computing refers to the enhancement of a user’s interactions by understanding the user, the context, and the applications and information being used, typically across a wide set of user goals. Contextual computing is not just about modeling user preferences and behavior or embedding computation everywhere, it’s about actively adapting the computational environment – for each and every user – at each point of computation. (p 50)

The Outride system was designed to be a generalized architecture for the personalization of search across a variety of information ecologies. (p 52)

Search Engine - Average Task Completion Time in Seconds

While the results may seem overwhelmingly in favor of Outride, there are some issues to interpret. First, some of the scenarios contained tasks directly supported by the functionality provided by the Outride system, creating an advantage against the other search engines. Indeed, Outride features are specifically designed to understand users, provide support by the conceptual model and tasks users employ to search the Web, and to contextualize the application of search. This is the goal of contextual computing and why personalizing search makes sense.

Second, while the use of default profiles could have provided an advantage for Outride, it also could have negatively influenced the outcome, as the profile did not represent the test participants’ actual surfing patterns, nor were the participants intimately familiar with the content of the profiles. Third, some of the gains are likely due to the user interface since the Outride sidebar remains visible to users across all interactions, helping to preserve context and provide quick access to core search features. For example, while search engines require users to navigate back and forth between the list of search results and specific Web pages, Outride preserves context by keeping the search results open in the sidebar of the Web browser, making the contents of each search result accessible to the user with a single click. Still, the magnitude of the difference between the Outride system and the other engines is compelling, especially given that most search engines are less than 10% better than one another. (p 54)
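The paper does not disclose Outride’s actual algorithms, but the general idea of biasing results toward a user model can be shown with a toy re-ranker that blends an engine’s score with a result’s overlap against a profile of weighted terms. Everything below (the blend weight, the profile, the scores) is a hypothetical illustration:

```python
def personalize(results, profile, alpha=0.5):
    """results: list of (title, engine_score); profile: {term: weight}.
    Blend the engine's own score with the title's affinity to the profile."""
    reranked = []
    for title, score in results:
        terms = set(title.lower().split())
        affinity = sum(profile.get(t, 0.0) for t in terms)
        reranked.append((alpha * score + (1 - alpha) * affinity, title))
    return [title for _, title in sorted(reranked, reverse=True)]

# Toy profile built from (hypothetical) past browsing behavior.
profile = {"python": 0.9, "tracking": 0.4, "search": 0.3}
results = [("Snake care guide", 0.8), ("Python tutorial", 0.7)]
print(personalize(results, profile))  # the Python result now ranks first
```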

Keywords

information retrieval, search, information seeking, relevance feedback, personalization, contextual computing, user interfaces, search process

Cite As

Pitkow, J., Schutze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., et al. (2002). Personalized Search: A Contextual Computing Approach May Prove a Breakthrough in Personalized Search Efficiency. Communications of the ACM, 45(9), 50-55.

References in this publication

  • Anderson, J.R. Cognitive Psychology and Its Implications. Freeman, San Francisco, CA, 1980.
  • eTesting Labs. Google Web Search Engine Evaluation; www.etestinglabs.com/main/reports/google.asp
  • Pirolli, P. and Card, S.K. Information foraging. Psychological Review 106, 4 (1999), 643–675.
  • Salton, G. and McGill, M.J. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, 1986.


Quantitative Information Architecture at the 2010 Information Architecture Summit

I am presenting on two different topics at the 2010 Information Architecture Summit in Phoenix this week.

The first talk is a set of ideas related to the work I’ve been doing recently, building data structures, crafting algorithms and designing user experiences that are powered by quantitative data.

Quantitative Information Architecture – Don Turnbull, Ph.D.

10:30 – 11:15AM on Saturday, April 10 in Ellis

Why quantitative information architecture? Why now?

You don’t have to be Rain Man or Stephen Hawking to use numbers to get things done. Quantitative methods are applicable to IA thinking, be it for hypothesis generation, instrumentation, data collection, or analysis of information at scales never before possible, with insights that are comparable over time, generalizable and extensible.

Quantitative skills can allow IAs to interpret and analyze others’ designs and research more readily, as well as combine methods and models for meta-analysis, helping IAs move from description to prediction in designing and developing future interfaces and architectures.

This presentation will review why you should use quantitative methods and discuss both foundational and emerging ideas that are applicable to content analysis, behavioral modeling, social media usage, informetrics and other IA-related issues.

The Twitter hashtag for this talk is #quantia. Feel free to send me questions directly via twitter/donturn too.

Quantitative Information Architecture slide deck from the 2010 IA Summit