New search interface for ProQuest Historical/New York Times

The ProQuest Historical Newspapers database, which provides us with access to the full-text archives of the New York Times (1851-2007), has a new search interface. Let us know what you think of the changes.

Take a look and find articles about LaGuardia Community College from 1981, where you will learn that the school's guiding educational philosophy was "teach, apply and reinforce" and that we were part of the renewed interest in philosophy courses way back then.

Getting arrested for using JSTOR

Aaron Swartz, Reddit co-founder and open access activist, was arrested for downloading too many academic journal articles (over 4 million) from JSTOR while on campus at MIT. He faces up to 35 years in prison and $1 million in fines for what the US attorney calls stealing or data theft. David Segal, executive director of Demand Progress, says, "it's like trying to put someone in jail for checking too many books out of the library."

Checking out e-books from the Internet Archive

The Internet Archive has made some progress in its e-book lending program by partnering with over 1,000 libraries worldwide. Their offerings can be browsed on the OpenLibrary.org site.

Books without “authors”

So, who really wrote that book? Google's attempts to promote the concept of "authorship" in their search results reveal how misleading and illusory attribution can be online. Consider the case of the physical book publisher Alphascript Publishing. If you look up what appears to be their most prolific set of "authors," Frederic P. Miller, Agnes F. Vandome and John McBrewster, on Amazon, you'll see that they have "written" around 20,587 book titles. However, the Amazon book description clearly states: "Please note that the content of this book primarily consists of articles available from Wikipedia or other free sources online." These print-on-demand books are "scraped" from freely available Wikipedia content.

The same scam is happening with Kindle e-books. In a Guardian article ("Now anyone can 'write' a book. First, find some words..."), John Naughton points out that while liberation from traditional publishing gatekeepers allows for innovative and necessary self-publishing, it also allows new kinds of spam content to multiply at an unmanageable rate. For example, one e-book "author" (who may or may not be a real person) has created over 2,800 Kindle e-books and put them up for sale. What we have here is a variant of the content farms that Google has been trying to downgrade in their search results. The distinctions between the "book" and web content are becoming increasingly blurry.

Finding out more about authorship in search results

Google is starting to show more about the authors of content that comes up in web searches by displaying images and links to author profile pages next to the results.

This looks like an attempt both to make content more discoverable and to reduce the number of "scraped" or copied webpages that continue to plague Google search results. This author attribution relies on creators or publishers to opt in via authorship markup. The New York Times, CNET, and the Washington Post, as well as Google sites like YouTube and Blogger, will reflect these changes.
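
For the curious, here is a rough sketch of how you might check whether a webpage carries this kind of markup. It assumes the markup takes the form of a link tagged rel="author" that points to an author profile page, which is one common form. The sample page, the example.org address, and the function and class names are all made up for illustration, and this is not how Google itself processes pages.

```python
from html.parser import HTMLParser


class AuthorLinkFinder(HTMLParser):
    """Collect href values of <a> or <link> tags whose rel includes "author"."""

    def __init__(self):
        super().__init__()
        self.author_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "author" in rel_values and attr_map.get("href"):
            self.author_links.append(attr_map["href"])


def find_author_links(html_text):
    """Return the profile URLs a page declares via rel="author" (hypothetical helper)."""
    finder = AuthorLinkFinder()
    finder.feed(html_text)
    return finder.author_links


# Hypothetical sample page; the profile URL is a placeholder, not a real profile
SAMPLE_PAGE = """
<html><body>
<p>Written by <a rel="author" href="https://example.org/profiles/jane-doe">Jane Doe</a></p>
<p><a href="https://example.org/about">About this site</a></p>
</body></html>
"""

if __name__ == "__main__":
    print(find_author_links(SAMPLE_PAGE))
```

A page with no such link returns an empty list, which would suggest the publisher has not opted in, at least not in this particular way.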