Monday, January 16, 2006

Why Google Wants Your Data

I was configuring my Gmail account for POP access today when the thought occurred to me: "Why does Google let me do this? Gmail makes the company money only when users click on ads. When I access my email via POP, I never even visit the site."

Perhaps they think POP access will draw enough new users to make up for the lost revenue. After all, even the POP people will probably use the web client at times when they are away from their desks.

Personally, I think Google is just looking ahead -- to the era of personally tailored search results.

You can't get there by merely indexing every web page and every book ever written, as useful as these habits are. No, to make the next leap in search, Google must get to know you personally. Google therefore hopes to learn your tastes and your tendencies, or at least have the tools in place to do so later. What better way to do this than with products useful in their own right, like Google Toolbar, Google Desktop, and Gmail?

If you ask a good friend whether you should see "Disappointing Sequel III" this weekend, she'll give you an intelligent answer. She won't just tell you if she liked it, but will use her knowledge of your tastes to venture whether you would like it. "I didn't care for it," she'll say, "but you will probably love it."

Now, imagine your friend is a search engine. If you type 'fedora', your friend would know you are, in fact, one of those few twisted souls more interested in hats than operating systems.

Google, on the other hand, will talk about Linux until about the 20th link. That's because, right now, it just plays the percentages by looking at the links between pages. If you mean what most web page authors mean when they write 'fedora', you are probably satisfied with the results.

As you may know, Google already records every search query you enter and links it to your IP address. They don't yet use your search history to improve your search success. There are privacy concerns, after all. Imagine your horror if your geeky friend came over to your desk, and, googling 'fedora' on your computer, saw a bunch of links about early twentieth-century head coverings come up.

Google could certainly do more to get to know you. Suppose, for a moment, that you're ok with this.

If you let the search engine monitor your surfing habits, it would know which sites you visit and for how long. This would not only give it a better picture of your overall habits, but would let it know what you're working on right now. If you're uncharacteristically looking at pages about Linux, it might correctly assume that a search for 'fedora' should ignore hats, just this once.
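To make that concrete, here is a minimal sketch of how recent browsing could tip an ambiguous query one way or the other. Everything in it is an assumption for illustration: the result list, the topic labels, the `rerank` function and its `boost` parameter are all made up, and nothing here reflects how Google actually ranks anything.

```python
from collections import Counter

# Hypothetical results for the query 'fedora', in their default order,
# each tagged with a made-up topic label.
RESULTS = [
    ("Fedora Linux downloads", "linux"),
    ("Fedora release notes", "linux"),
    ("History of the fedora hat", "hats"),
]


def rerank(results, recent_topics, boost=2.0):
    """Reorder results so topics seen in recent browsing rise to the top.

    Each result gets a base score from its default rank (1, 1/2, 1/3, ...).
    The share of recent page views on the result's topic, times `boost`,
    is added on top. `boost` is an arbitrary illustrative weight.
    """
    counts = Counter(recent_topics)
    total = sum(counts.values()) or 1  # avoid dividing by zero with no history
    scored = []
    for rank, (title, topic) in enumerate(results):
        base = 1.0 / (rank + 1)
        scored.append((base + boost * counts[topic] / total, title))
    # Highest score first; ties keep their default order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# A hat lover's usual session puts the hat page first...
print(rerank(RESULTS, ["hats", "hats", "hats", "linux"]))
# ...but an uncharacteristic Linux session restores the default order.
print(rerank(RESULTS, ["linux", "linux"]))
```

With no browsing history at all, the sketch simply returns the default order, which is roughly where the 'fedora' search stands today.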

The engine gets similar benefits by reading your mail, your instant messenger transcripts and your half-forgotten, half-baked novel.

Now, give it a little more access. Let it measure how fast your computer is and how big your hard drives are. Let it watch which applications you use, and how you use them. Let it hear what music you listen to. Let it tally up how fast you type, and which words you misspell most often. Let it see your digital photos.

How is it supposed to make use of this information, you might ask? By finding patterns among users.

Some of these patterns might make a kind of sleuthy sense to us: Perhaps it guesses you like hats -- not because of your previous searches, but because it has discovered that users who, like you, listen to country music and take pictures of horses, tend to spend more time on hat pages than on Linux pages.
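A toy version of that kind of guess might look like the sketch below. The user records, trait names, and the `guess_interest` function are invented for illustration, and the numbers are made up. A real system would need far more data and some care about the statistical flukes that a tiny sample like this one would produce.

```python
# Hypothetical user records: observed traits, plus minutes spent on
# hat pages and on Linux pages. All values are made up.
USERS = [
    {"traits": {"country music", "horse photos"}, "hats": 40, "linux": 5},
    {"traits": {"country music", "horse photos"}, "hats": 25, "linux": 10},
    {"traits": {"techno", "server photos"}, "hats": 2, "linux": 60},
    {"traits": {"country music"}, "hats": 12, "linux": 12},
]


def guess_interest(traits, users, topics=("hats", "linux")):
    """Guess which topic a new user prefers from users who share their traits.

    Peers are the users who have every trait in `traits`. The guess is
    the topic those peers spent the most total time on, or None when
    no peers exist.
    """
    peers = [u for u in users if traits <= u["traits"]]
    if not peers:
        return None
    totals = {topic: sum(u[topic] for u in peers) for topic in topics}
    return max(totals, key=totals.get)


# No search history needed: the guess comes entirely from similar users.
print(guess_interest({"country music", "horse photos"}, USERS))
```

Note that the function never looks at the new user's own searches. The guess rests only on what similar users did, which is the point of the paragraph above.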

Some patterns it uncovers might seem bizarre: Maybe it finds that people who listen to country music and like hats express the most love online to Mac users who read the Onion and take photos with greater-than-average red eye. Google Date, here we come.

Pattern-seeking software has another name: expert system.

Medical expert systems help doctors diagnose illnesses that fit complicated patterns of symptoms and test results. Outside their area of expertise, expert systems are useless. But, within it, they are savants. Sifting through mountains of data, they look for, and find, patterns that humans never would. Sometimes these are statistical artifacts that don't hold up against future data. Sometimes the conclusions hold up surprisingly well, for reasons we simply haven't figured out yet.

People like me refer to expert systems as a type of narrow AI. Google, or any aspiring rival, must see its products as examples of narrow AI in the very wide domain of answering questions. If a search engine or any other expert system can widen itself sufficiently, it ceases to be narrow and becomes a general AI. Better hope it knows how to 'not be evil'.

With this longer view in mind, we should not be surprised by observations that Google is collecting or retaining more data than would seem necessary -- or even profitable. Likewise, we can expect Google to continue churning out products that give users immediate benefits while giving their clusters more data to crunch. We can even know when Google is about ready to serve us the Next Big Thing: when it provides, behind a personal log-in screen, a beta search tool that asks if it can access your other Google tools to give you better results.

Whether you use the improved search will be up to you. Technically, Google should be able to do all this without violating your privacy. But if you're the kind of person who doesn't even want a program looking at your honeymoon photos, you will probably pass.

What were you two doing with that fedora, anyway?

