Sean Nolan, chief architect of Microsoft HealthVault, commented on my post of a few days ago regarding Microsoft's HealthVault strategy. He gave me a pointer to an entry on his own blog to clarify how they are tracking the provenance of data, or its "pedigree" as he refers to it (And Now for a Little Usability, April 17, 2008). One of the comments on the post got me looking more closely at HealthVault, and thinking about how it can help with the current dismal state of the art regarding the search for consumer health information on the Web.
My thesis work tapped into a huge body of server log data that accompanied a Microsoft Research grant to my thesis advisors, Lada Adamic and Suresh Bhavnani, as part of the research program leading up to last summer's Microsoft Live Search Summit. The data set provided a remarkably clear picture of search behavior "in the wild". The bottom line: the quality of the result sets returned by major search engines from the queries of consumer health information seekers was questionable at best.
This came as no surprise, for two reasons. First, consumers seeking any kind of information, even about a subject as critical as a mortal illness, tend to submit terse, vague queries. Search engines are hard put to discern the user's intent. Their general-purpose algorithms do a pretty good job of bringing relevant information into view, but it is not uncommon for outdated or unscientific information to appear in the midst of (or even ahead of) authoritative results.
Second, commercial entities bolster their findability on the Web by paying for information about the keywords people use in their queries, and through the application of well-known search engine optimization techniques, do the best they can to obtain a high position in search results. Non-profit, governmental, and academic websites pay less attention to search engine optimization, and their position in results often reflects this neglect. Sadly, though, the non-profit, governmental, and academic sites have been shown in comparative expert evaluations to be the right place to find objective, up-to-date, empirically sound information.
Don't get me wrong: many commercial sites do a pretty good job of providing high-quality information, but their commercial nature inherently calls into question their objectivity.
Is position important? You have no idea how important (or at least I didn't, prior to my analytical work). The top ten search results for every health topic I looked at received more than 99% of the actual clicks. Moreover, the top five results received more than 80% of clicks for almost every topic, and no less than 75% for any topic I investigated.
This could indicate that the quality of the top five results was remarkably high, and for some topics this was the case. A more likely explanation is that users generally don't scroll: on the most common screen resolutions, five results is all you will see without scrolling. Ten results is all you see without choosing the link to the second page of results, which is apparently a rare occurrence.
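For readers who want to run the same kind of tally on their own logs, here is a minimal sketch of how a top-five and top-ten click share can be computed. It assumes each click has already been reduced to the 1-based rank of the result that was clicked. The function name, the cutoffs, and the sample numbers are made up for illustration; they are not the Live Search data or the code used in my analysis.

```python
from collections import Counter


def cumulative_click_share(click_positions, cutoffs=(5, 10)):
    """Return the fraction of clicks that landed at or above each rank cutoff.

    click_positions: iterable of 1-based result ranks, one entry per click.
    """
    counts = Counter(click_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clicks to analyse")
    return {
        k: sum(n for rank, n in counts.items() if rank <= k) / total
        for k in cutoffs
    }


# Invented sample for a single topic: 100 clicks spread over result ranks.
sample = (
    [1] * 45 + [2] * 20 + [3] * 10 + [4] * 5 + [5] * 4
    + [6] * 6 + [8] * 5 + [10] * 4 + [12] * 1
)
shares = cumulative_click_share(sample)
print(shares)
```

Running this per topic and comparing the share at rank 5 with the share at rank 10 gives the kind of comparison described above.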
The Live Search data set was from May 2006, prior to Microsoft's integration of the MedStory acquisition and early in the development of HealthVault, so things may have changed over time. I'm gathering data on the behavior of six search engines, including the four most popular general-purpose search engines and two that are health-specific, and will be reporting on my results eventually, after I get them published. I don't have access to click data, so my analysis will have a different flavor, but it should be directly relevant to this subject.
Search engines could do a much better job if they knew more about the information seeker, and that's where HealthVault comes in. In natural language processing, context is key, and HealthVault's standards-based representation of the consumer's health record provides a reliable source of objective information about the person's health and demographics. Access to that data should be, and no doubt will be, granted only when the consumer opts in, so privacy should not be too much of an issue.
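To make the idea concrete, here is a toy sketch of how a terse query might be enriched with context from a health record, and only when the consumer has opted in. Everything in it is an assumption for illustration: the HealthProfile class, its fields, and the contextualize_query function are hypothetical, and none of it reflects HealthVault's actual data model or any search engine's real behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HealthProfile:
    """Hypothetical, simplified stand-in for a consumer's health record."""
    age: int
    sex: str
    conditions: List[str] = field(default_factory=list)
    search_opt_in: bool = False  # assumed consent flag


def contextualize_query(query: str, profile: Optional[HealthProfile]) -> str:
    """Append recorded conditions to a query, but only with consent."""
    if profile is None or not profile.search_opt_in:
        return query  # no consent, no context: the query passes through as-is
    lowered = query.lower()
    # Skip conditions the seeker already typed, to avoid repeating terms.
    extra = [c for c in profile.conditions if c.lower() not in lowered]
    return " ".join([query, *extra])


profile = HealthProfile(
    age=62, sex="F", conditions=["type 2 diabetes"], search_opt_in=True
)
print(contextualize_query("foot pain", profile))
```

A real system would presumably do something far more sophisticated than appending terms, but even this crude version shows how a vague two-word query can pick up the context the search engine is missing.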
HealthVault can't address the other side of the equation, namely the identification of high-quality health information, but from what I saw last summer of the work in progress at Microsoft Research, efforts are underway to address that side of the problem as well. Unfortunately, I don't know enough about the present state of the art to comment on that facet.
Sean and his team, along with their counterparts at Google Health, are holding onto the tail of the tiger here, and I expect they will have a wild and rewarding ride for years to come. The fact that both groups are closely associated with major search engines is reason for considerable hope. If search engines work on using contextual information from sources like HealthVault and Google Health to better interpret the seeker's intent, the quality gap will begin to close. I'm eager to see where this takes us over the next few years.