Sean Nolan, chief architect of Microsoft HealthVault, commented on my post from a few days ago about Microsoft's HealthVault strategy. He pointed me to an entry on his own blog ("And Now for a Little Usability," April 17, 2008) that clarifies how HealthVault tracks the provenance of data, or its "pedigree," as he calls it. One of the comments on that post got me looking more closely at HealthVault and thinking about how it could help improve the current dismal state of consumer health information search on the Web.
My thesis work tapped into a huge body of server log data that accompanied a Microsoft Research grant to my thesis advisors, Lada Adamic and Suresh Bhavnani, as part of the research program leading up to last summer's Microsoft Live Search Summit. The data set provided a remarkably clear picture of search behavior "in the wild". The bottom line: the quality of the result sets returned by major search engines from the queries of consumer health information seekers was questionable at best.
This came as no surprise, for two reasons. First, consumers seeking any kind of information, even about a subject as critical as a mortal illness, tend to submit terse, vague queries, which leaves search engines hard-pressed to discern the user's intent. Their general-purpose algorithms do a fairly good job of bringing relevant information into view, but it is not uncommon for outdated or unscientific information to appear in the midst of (or even ahead of) authoritative results.