The Single Box Humanities Search
I recently polled my graduate students to see where they turn to begin research for a paper. I suppose this shouldn’t come as a surprise: the number one answer—by far—was Google. Some might say they’re lazy or misdirected, but the allure of that single box—and how well it works for most tasks—is incredibly strong. Try getting students to go to five or six different search engines for gated online databases such as ProQuest Academic and JSTOR—all of which have different search options and produce a complex array of results compared to Google. I was thinking about this recently as I tested the brand new scholarly search engine from Microsoft, Windows Live Academic. Windows Live Academic is a direct competitor to Google Scholar, which has been in business now for over a year but is still in “beta” (like most Google products). Both are trying to provide that much-desired single box for academic researchers. And while those in the sciences may eventually be happy with this new option from Microsoft (though it’s currently much rougher than Google’s beta, as you’ll see), like Google Scholar, Windows Live Academic is a big disappointment for students, teachers, and professors in the humanities. I suspect there are three main reasons for this lack of a high-quality single box humanities search.
First, a quick test of Google Scholar and Windows Live Academic. Can either one produce the source of the famous “frontier thesis,” probably the best-known thesis in American historiography?
Clearly, the usefulness of these search results is dubious, especially those from Windows Live Academic (The Political Economy of Land Conflict in the Eastern Brazilian Amazon as the top result?). Why can’t these giant companies do better than this for humanities searches?
Obviously, the people designing and building these “academic” search engines are from a distinct subset of academia: computer science and mathematical fields such as physics. So naturally they focus on their own fields first. Both Google Scholar and Windows Live Academic work fairly well if you would like to know about black holes or encryption. Moreover, “scholarship” in these fields generally means articles, not books. Google Scholar and Windows Live Academic are dominated by journal-based publications, though both sometimes show books in their search results. But when Google Scholar does so, these books seem to appear because articles that match the search terms cite these works, not because of the relevance of the text of the books themselves.
In addition, humanities articles aren’t as easy as scientific papers to subject to bibliometrics, that is, methods such as citation analysis that reveal the most important or influential articles in a field. Science papers tend to cite many more articles (and fewer books) in a way that makes them subject to extensive recursive analysis. Thus a search on “search” on Google Scholar aptly points a researcher to Sergey Brin and Larry Page’s seminal paper outlining how Google would work, because hundreds of other articles on search technology dutifully refer to that paper in their opening paragraphs or footnotes.
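To make the idea of recursive citation analysis concrete, here is a minimal sketch in Python. It is not how Google Scholar or Windows Live Academic actually rank results, which I have no inside knowledge of. It is a simplified PageRank-style calculation in which a paper's score depends on the scores of the papers that cite it. The function name `citation_rank`, the damping value, and the toy citation graph are all hypothetical and chosen only for illustration.

```python
def citation_rank(citations, damping=0.85, iterations=50):
    """Score papers with a simplified PageRank-style recursion.

    `citations` maps each paper to the list of papers it cites.
    This is an illustrative sketch, not any search engine's real method.
    """
    # Collect every paper that cites something or is cited.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)

    # Start with equal scores that sum to 1.
    scores = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Every paper receives a small baseline score each round.
        new_scores = {p: (1.0 - damping) / n for p in papers}
        for paper in papers:
            cited = citations.get(paper, [])
            if cited:
                # A paper passes its score on, split among the works it cites.
                share = damping * scores[paper] / len(cited)
                for target in cited:
                    new_scores[target] += share
            else:
                # A paper that cites nothing spreads its score evenly.
                share = damping * scores[paper] / n
                for target in papers:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical toy graph: three search papers all cite one seminal paper.
toy_graph = {
    "search_paper_a": ["seminal_paper"],
    "search_paper_b": ["seminal_paper", "search_paper_a"],
    "search_paper_c": ["seminal_paper"],
    "seminal_paper": [],
}

scores = citation_rank(toy_graph)
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
top_paper = ranking[0][0]
```

In this toy graph the heavily cited paper rises to the top of `ranking`. The sketch also hints at why the approach suits the sciences better: it needs a dense web of article-to-article citations, and a field that mostly cites books outside the index gives the recursion little to work with.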
Most important, however, is the question of open access. Outlets for scientific articles are more open and indexable by search engines than humanities journals. In addition to many major natural and social science journals, CiteSeer (sponsored by Microsoft) and ArXiv.org make hundreds of thousands of articles on computer science, physics, and mathematics freely available. This disparity in openness compared to humanities scholarship is slowly starting to change—the American Historical Review, for instance, recently made all new articles freely available online—but without a concerted effort to open more gates, finding humanities papers through a single search box will remain difficult to achieve. Microsoft claims in its FAQ for Windows Live Academic that it will get around to including better results for subjects like history, but like Google they are going to have a hard time doing that well without open historical resources.
UPDATE [18 April 2006]: Microsoft has contacted me about this post; they are interested in learning more about what humanities scholars expect from a specialized academic search engine.
UPDATE [21 April 2006]: Bill Turkel makes the great point that Google’s main search does a much better job than Google Scholar at finding the original article and author of the frontier thesis: