Willkommen, Bienvenue, Welcome... Sziasztok!

Welcome to The Lotus Position, an intermittent collection of extempore navel gazings, ponderings, whinges, whines, pontifications and diatribes.

Everything is based on a Sample of One: these are my views, my experiences... caveat lector... read the Disclaimer

The Budapest Office - Castro Bisztro, Madach ter

Ponder, Scribble, Ponder (Photo Erdotahi Aron)

Monday 12 January 2009

Google Desktop Review

I finally got fed up with Vista's built-in indexing and thought: who better to index my files than the Masters of the Search Universe, i.e. Google?

So I installed Google Desktop (5.8.0809.23506) and what an eye opening experience it has been.

I was aware of potential security/privacy issues and so made very sure not to enable the "Search Across Computers" option that transfers my data to Google so that it can be accessed elsewhere - even if only ostensibly by me. I also very carefully avoided enabling the "Enable Enhanced Content Indexing" check-box: all I wanted was basic content indexing, not the "backup and view previous versions", web-history search "and more", or thumbnails of images. Just the basic index-my-data-files stuff please, Mr Google.

I was surprised at what wasn't found... and then I discovered that unless "Enhanced Content Indexing" is enabled Google Desktop doesn't seem to index at all! On Vista it uses the built-in Microsoft index (if enabled) - so, of course, when I turned off Vista's indexing I got nothing at all! Even when it was on I would get results like this...

I searched for "LQG" and got 10 results; the summary at the top described them as "1 email... 1 file... 1 other... no chats... etc.", but the list itself showed 10 items which by icon were: 1 pdf, 2 zip, 7 emails... despite the fact that I had told Google Desktop NOT to index emails. In other words, when it's not doing its own indexing it doesn't know which way's up.

Because I couldn't find files I knew existed, I of course referred to the online help. Here it says:

But look at the prefs...

Do you see an item called "Enable content indexing..." No, of course not. They can't keep their Help (sic) aligned with the application. No wonder I couldn't find the right thing to enable to actually get my files indexed.

But do you also notice that I said don't index C:? You can tell Google Desktop not to index certain locations - and you can specifically tell it to index others.

But what is not clear is that the "Don't" overrides the "Do"... so when I wanted to index only Documents, which has been relocated to another partition (say, X:), I said "don't index X:" and "do index X:\Documents" - and of course got nothing. I then had to specifically NOT index lots of other folders on X: to leave, by omission, only Documents. Not clever.
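For the curious, here is a minimal Python sketch of the precedence rule as I experienced it. It is only a model of the observed behaviour, not Google Desktop's actual code; the function names and the Music and Video folders are made up for illustration.

```python
from pathlib import PureWindowsPath


def is_under(path: PureWindowsPath, root: PureWindowsPath) -> bool:
    """True if path is root itself or lies somewhere below it."""
    return path == root or root in path.parents


def should_index(path: str, include: list[str], exclude: list[str]) -> bool:
    """Exclude-wins rule: any matching "don't" vetoes every "do"."""
    p = PureWindowsPath(path)
    if any(is_under(p, PureWindowsPath(e)) for e in exclude):
        return False
    return any(is_under(p, PureWindowsPath(i)) for i in include)


book = "X:\\Documents\\book.doc"

# What I tried first: exclude the drive, include one folder on it.
first_try = should_index(book, include=["X:\\Documents"], exclude=["X:\\"])

# The workaround: include the drive, exclude every other folder by name
# (Music and Video are hypothetical stand-ins for "lots of other folders").
workaround = should_index(book, include=["X:\\"],
                          exclude=["X:\\Music", "X:\\Video"])

print(first_try, workaround)
```

Under this rule the first attempt can never index anything on X:, which matches what I saw. A "most specific path wins" rule would have made the obvious configuration work.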

However, I did in the end enable "Enhanced" indexing. But I have now discovered so many other things, ranging from inadequate to appalling, that I am going to remove it. I thought you should know the (further) gory details first.

Firstly, if you want to rebuild the indexes you have to Uninstall and Reinstall Google Desktop. Yes, that's what it says here. That's mind-bogglingly antediluvian for starters.

I also have lots of very long documents - in the case of "IT", the book, currently running to over half a million words if you include all the annotations etc. - but does Google Desktop index them properly? Does it hell! Only "up to 10,000 words" - usually less, "to save space" - are indexed. That is utterly useless. I have the space, I want an index! Do I get a choice about trading space for completeness? No. Obviously they've taken a leaf out of Microsoft's book and decided that's not a choice I could sensibly want. Unfortunately, the limitation on document size is buried in the Help (sic) file so I doubt that many people are aware of just how incomplete their searches might be. Thanks to Mark Russinovich for bringing the partial indexing to my attention, but since his blog entry was written in 2005 I thought I'd check - and the answer is still the same. Three years later, with storage getting cheaper and bigger every day, Google Desktop still does not do what it says on the imagined tin.
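To see why a word cap makes searches silently incomplete, here is a toy Python model. It assumes nothing about how Google Desktop really stores its index; build_index and the sample document are invented for illustration, and only the 10,000 figure comes from the Help text.

```python
from typing import Optional


def build_index(text: str, word_limit: Optional[int] = None) -> set[str]:
    """Return the set of distinct words, looking only at the first
    word_limit words when a limit is given."""
    words = text.lower().split()
    if word_limit is not None:
        words = words[:word_limit]
    return set(words)


# A document whose only interesting word sits just past the cap.
document = " ".join(["filler"] * 10_000 + ["lotus"])

capped = build_index(document, word_limit=10_000)
full = build_index(document)

print("lotus" in capped)  # False: the word lies beyond the cap
print("lotus" in full)    # True
```

The capped index gives no hint that anything was left out, so a failed search looks exactly like "no such document".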

With regard to the privacy concerns etc., Google Desktop does apparently have some sort of facility to encrypt the indexes, but the check-box doesn't seem to be enabled and there's nothing here about why that may be. Anyway, I have cunningly redirected (using Tweak GDS, or a simple registry hack) the Google Desktop index location to place it on a TrueCrypt encrypted drive... taking care of course that the encrypted drive is mounted before Google Desktop starts - otherwise it reverts to its default location without any notification or "please locate" dialog.

However, while trying to work out why Windows Defender was suddenly taking so long to scan the system I looked in C:\Users\Me.MyMachine\AppData\Local\Temp and found gigabytes of what looked like temporary ZIP extractions. Yes, I had told Google Desktop to search Zip files, but I didn't know it was going to dump the contents back onto disk (and not delete them when finished) in what I suspect is an unencrypted location even if Index Encryption could be turned on.
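If you want to check your own Temp folder for this sort of build-up, a small Python sketch along the following lines should do. The helper names are my own, and nothing here is specific to Google Desktop: it simply adds up file sizes under the user's temporary directory.

```python
import os
import tempfile
from typing import Optional


def tree_size(root: str) -> int:
    """Total size in bytes of all files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_entries(root: Optional[str] = None,
                    top: int = 10) -> list[tuple[int, str]]:
    """Biggest immediate children of root as (bytes, name), largest first.
    root defaults to the current user's temporary directory."""
    root = root or tempfile.gettempdir()
    sizes = []
    for entry in os.scandir(root):
        try:
            if entry.is_dir(follow_symlinks=False):
                size = tree_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
        except OSError:
            size = 0  # unreadable entry; count it as empty
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:top]
```

Calling largest_entries() with no arguments lists the ten largest items in the Temp folder, which is where the leftover extractions turned up on my machine.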

Other miscellaneous shortcomings:
  • There's no filename option, so you can't restrict your search to documents of known name.
  • There's no "OR" or "NOT" operator for boolean searches, though you can prefix a search term with "-" to exclude it.
  • oh, I give up, it just isn't good enough

In other words, unless you just want to index lots of personal-type emails and you're not concerned about privacy, don't waste your time with Google Desktop: it only does partial searches, you can't filter by filename, it's not well documented, and for a "product" now several years old and, strangely for Google, no longer in beta, it still feels like an early beta product.

Google Searching on the Web: Hero
Google Desktop Search: Zero

I wonder if that old Outlook add-on... what was it called... LookOut is still around? I actually used that on a TB RAID server at work a few years back and it indexed EVERYTHING (that it could read).

Google Desktop: Time Wasting Stuff (the time I spent downloading, installing, trying to configure, debug, test, etc. etc. etc. before giving up.)

2 comments:

Anonymous said...

I hate to crow and have wondered about Google's efforts and privacy. In the end after reading about the struggle between Microsoft and Google I'm glad that the dear old Mac (SmackBook) et al is actually a proper system where things just work (including the Mk1 Human) as a whole. Thank the whotsits for Spotlight and the various plugins you can get to allow indexing of all sorts of stuff. It's just nice not having to struggle or think how to make it do what you want.

Blue Hill Escape said...

This is a good article, and sums up my experience. You have little control over the indexing, and although I installed Google Desktop to search through vast collections of .pdfs I could rarely, if ever, find the things I needed, even though I knew they were there. Instead I would get lists of irrelevant program files.