More on Google Books

I started to tack some other thoughts and info about Google’s and Microsoft’s book search tools onto my last post, but thought it would be better to break this part out from the news about the new “My Library” tools that Google Books just launched.

If you haven’t already noticed, Google Books Search has become an incredibly valuable research tool, and one that’s very useful for genealogists. They’re giving away access to the same public domain data other companies are trying to sell us.

They’re constantly adding new books as they continue scanning libraries around the country. Many of these books are in the public domain and are fully readable and downloadable for free.

Google Books offers a huge library of history and genealogy data that can be keyword searched for free. Books that previously could only be seen by driving to distant libraries, or by paying reprint or CD-ROM companies for copies of long-out-of-print titles, can now be searched from home. Many of these books are not well indexed in printed form, but now you can find any name in the book, in theory.

A few weeks ago Google Books added an “Accessibility” feature that lets you see (and copy) text versions of the book images, and this is wonderful.

In practice Google seems a little sloppy in the OCR conversions. How could a title like “PRUNSYLNANIA ARRHINES by MATTHEW S. QUAY - 1876” slip by spell checks? I’ve also found cases where a keyword search found only one instance of “Graham” in a book, but reading the book myself turned up other instances, or missing pages.

When you click on Google’s Accessibility/text feature you may see that some of the text is garbled, but you can easily cut and paste from the text and save a lot of time in transcribing excerpts.

For example, here’s a book of Chester County, PA tax records that is just begging to be diced up into USGENWEB/PA Roots/Rootsweb pages for each township. Most of the work is already done; it just needs to be copied, reformatted and carefully cross-checked for accuracy.

I had some issues with columns and poor OCR conversions in Google Books last time I checked. The downloadable PDF files didn’t seem to include the OCR text layers, which means they aren’t searchable offline and aren’t indexed by Google Desktop. But I’m sure we’ll get there eventually.

I’m hoping that as Google adds new books the PDF text layer will be there. I’d also like to see them re-compile the old PDFs, and to use a more consistent file naming system that clearly identifies a book as volume 1, 2 or Series 3, book 9.
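
In the meantime, anyone keeping a folder of downloaded PDFs can apply a naming convention of their own. Here is a rough Python sketch of what I mean by a consistent scheme. The function name `book_filename` and the `title_sNN_vNN_year.pdf` pattern are my own invention for illustration, not anything Google uses, and the series and volume numbers are just sample values.

```python
import re

def book_filename(title, series=None, volume=None, year=None):
    """Build a predictable PDF filename (hypothetical naming scheme)."""
    # Lowercase the title and collapse every run of non-alphanumeric
    # characters into a single hyphen, trimming hyphens from the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    parts = [slug]
    # Zero-pad series and volume so files sort in order in a directory listing.
    if series is not None:
        parts.append(f"s{series:02d}")
    if volume is not None:
        parts.append(f"v{volume:02d}")
    if year is not None:
        parts.append(str(year))
    return "_".join(parts) + ".pdf"

# Sample values only: series 3, volume 9.
print(book_filename("Pennsylvania Archives", series=3, volume=9))
```

With names like that, a multi-volume set lines up in order on disk and you can tell the volumes apart at a glance.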

While they’re at it they could add a custom hidden text data table for Google Desktop to compile a “My Desktop Library” index of local Google Books PDFs, sortable by title, author, year, etc. That could easily be exported to be “mashed” into other useful tools.

Which reminds me, how about adding OCR interpretation to Google Desktop’s PDF scanning, Google? You clearly do OCR conversions on the fly for PDF to HTML conversions in Google web searches, and it would be nice if my downloaded Google Books library could be indexed locally.

Google Books also offers no way to report problems with specific books. Many books have partial or missing pages, and having no way to report them other than the feedback comments suggests Google just doesn’t care. I don’t get it. They’ve digitized some of the best libraries in the country but made it look like a rush job. Surely they want to fix the problems. Let us flag them for you.

Google still doesn’t do a great job of clearly identifying editions and books with similar titles, and it doesn’t include relevant volume info in the search summary snippets or on many of the About this Book pages. It really should show volume info in the header, next to the truncated title and the author info.

For example, see the many volumes of Pennsylvania Archives that all try to save under the same book title and that don’t clearly state the series and volume number on the About this Book pages. There are at least 10 series of these books and over 100 volumes. Try finding the Sixth Series, Volume XII on the search results above. This volume at least includes the series/volume info on the About page, so maybe they’ve started adding more info in more recent scans.

Anyway, despite its shortcomings I love Google Books. I suspect it’s going to change our world and the way many of us interact with books. There are a lot of needles in those haystacks, and Google Books Search acts like a magnet to pull them out for you.
