In late July, AOL made a data set of 20 million search queries by 658,000 of its users publicly available. These searches were conducted between March and May 2006, and users were not asked for their permission to release this data. The data sample represents about 1.5% of AOL’s total users in May 2006 and about 0.33% of total searches conducted on AOL in the time period in question.
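To get a feel for the scale behind those percentages, here is a quick back-of-the-envelope sketch. It only uses the figures quoted above; the variable names are my own, and the implied totals are rough estimates derived from rounded percentages, not numbers published by AOL.

```python
# Figures quoted in the paragraph above.
sample_users = 658_000        # users in the released data set
sample_queries = 20_000_000   # queries in the released data set
user_share = 0.015            # sample is about 1.5% of AOL's users
query_share = 0.0033          # sample is about 0.33% of AOL's searches

# Rough totals implied by the quoted shares (estimates only).
implied_total_users = sample_users / user_share
implied_total_queries = sample_queries / query_share

# Average number of queries per user inside the sample.
queries_per_user = sample_queries / sample_users

print(f"Implied total users:    {implied_total_users:,.0f}")
print(f"Implied total searches: {implied_total_queries:,.0f}")
print(f"Queries per user:       {queries_per_user:.1f}")
```

That works out to roughly 44 million users, about 6 billion searches, and around 30 queries per user over the three months, which is plenty of material to build a profile of one person.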
The data had not been sufficiently anonymized, so individual users can, at least in theory, be identified. AOL user names had been replaced with random ID numbers, but the searches can be analyzed by ID number, which gives you a picture of all the searches an individual person conducted. Since people often search for their own name or include social security numbers in searches, it is possible to trace a search trail back to an individual.
The search trails can be embarrassing for the people who searched, and some can even hint that a crime has been, or could be, committed. So really, the data is potentially explosive!
AOL’s original intent was to give the research community a data set for studying search behavior. After all, researchers and marketers would love to have better information on search patterns, among other things so that they can target advertising offers better.
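To see why a random ID number offers so little protection, here is a minimal sketch of how a search trail falls out of such a log. The rows, IDs, names, and the function `build_search_trails` are all made up for illustration; nothing here is taken from the actual data set, and the real file layout may differ.

```python
from collections import defaultdict

# Hypothetical sample rows: (anonymous ID, query, timestamp).
# Invented for illustration; not real AOL data.
SAMPLE_LOG = [
    (1001, "best pizza in springfield", "2006-03-02 19:14:05"),
    (2002, "cheap flights to denver", "2006-03-03 08:01:44"),
    (1001, "jane q example springfield", "2006-03-09 21:30:12"),
    (1001, "symptoms of insomnia", "2006-04-17 02:45:51"),
    (2002, "denver hotels downtown", "2006-03-03 08:05:10"),
]


def build_search_trails(rows):
    """Group queries by anonymous ID, ordered by timestamp."""
    trails = defaultdict(list)
    # ISO-style timestamps sort correctly as plain strings.
    for anon_id, query, timestamp in sorted(rows, key=lambda r: r[2]):
        trails[anon_id].append((timestamp, query))
    return dict(trails)


trails = build_search_trails(SAMPLE_LOG)

# Everything one "anonymous" user searched for, in order.
for timestamp, query in trails[1001]:
    print(timestamp, query)
```

A single grouping step is enough: once one query in the trail contains a name and a town, every other query under the same ID is attached to that person too.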
After the enormity of the privacy breach was picked up and reported in blogs, AOL was quick to take down the data and apologize, but the damage was done – hundreds of people had already downloaded the data set, and duplicates of the data set were available on other websites.
Why is AOL’s Search Data Blunder Such a Big Deal?
Well, because the fundamental issue of online privacy is at the core of the problem. The release of personally identifiable data without permission constitutes a breach of privacy rights. It also raises the question of what data companies should be allowed to store about their users, and for how long this data may be stored. The level of security surrounding storage of such data is also under scrutiny – how easy is potential abuse, and how likely is another screw-up or malicious breach of privacy?
There are strong conflicting interests at play: privacy, civil liberties, security, crime prevention, and, lastly, the commercial interests of marketing companies and advertisers.
Is There Anything Good About the Search Data Blunder?
Interestingly, AOL’s slice of search data reveals how much there is still to learn about behavioral patterns in searching:
- Is everyone really searching for themselves or could they be searching on behalf of others?
- Over how much time can a search pattern stretch? How do you communicate with a searcher who is searching related keyword terms over periods of several months or longer?
- Are you really sure a search trail can be attributed to one single person? What about multiple people using the same computer?
- How do you best guess the intent behind very broad keyword searches?
As a marketer, I find these questions fascinating, and I can see how a better understanding of these kinds of behavioral issues could lead to the development of more relevant (for the users) and effective (for the advertiser) advertising products.
As a search engine user, I understand that a huge amount of data is collected about my search behavior every single day. I still don’t like the thought that “Tell Me What You Search and I’ll Tell You Who You Are” could be applied to me personally. The thought of this information in the wrong hands is quite chilling.
What Will Be The Fallout For AOL and Others?
It remains to be seen what long-term damage this search data blunder may or may not do to AOL. The blunder has, with some delay, caused some heads to roll internally: this Monday, it was reported that Chief Technology Officer Maureen Govern and two other employees had been suspended by AOL. Two civil liberties and privacy advocacy groups, the Electronic Frontier Foundation and the World Privacy Forum, also filed complaints against AOL last week over its violation of its users’ privacy.
The risk of search data either unintentionally or maliciously landing in the wrong hands has definitely been highlighted – and this should not only concern AOL, but also Google, Yahoo!, MSN and any other online entities collecting and storing behavioral data of users.