Innovation: Shrewd search engines know what you want

For better or worse, search engines have become the gateway to the web. They help users to find information, advertisers to sell products – they even help hackers in their shady pursuits. So they are also the focus of intense research to variously help or hinder those parties.

The cut-throat competition among search engines has ensured the operators are keenly attuned to keeping their visitors satisfied. But serving up handy links is a tricky business, especially when we searchers often use ambiguous terms.

Demographic data can help, say Ingmar Weber and Carlos Castillo at Yahoo Research Barcelona, Spain. For example, they say that when US women type in the search term "wagner", they are most likely to be thinking of the 19th-century German composer. US men, on the other hand, may well be thinking about the makers of spray painters.

Giving a search engine some basic demographic information, such as age, gender and educational background, boosts its chances of identifying user intent correctly, say Weber and Castillo. That personal information can be gleaned when people sign up to other services, such as email, that search engine operators provide.

To check their theory, the researchers analysed data collected from Yahoo account holders through its search engine over a 12-month period. They then identified ambiguous search terms by looking at searches for which the top few results concerned wildly divergent concepts, and recorded which result the user chose to click on. By running those ambiguous search terms through their demographically modified search engine, they managed to get the chosen link to appear as the top-ranked result 7 per cent more often than in the standard Yahoo search.
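As a rough illustration of the idea, and not the researchers' actual method, a demographic re-ranker could promote the links that people in the searcher's own group clicked most often for the same query. In the sketch below the function name `rerank_for_group`, the group labels, the URLs and the click counts are all invented for the example.

```python
from collections import Counter


def rerank_for_group(results, query, group, click_counts):
    """Reorder results by how often users in `group` clicked each one.

    click_counts is a Counter keyed by (group, query, url). sorted() is
    stable, so links with equal counts keep their original order.
    """
    return sorted(results, key=lambda url: -click_counts[(group, query, url)])


# Invented click data: hypothetical groups, URLs and counts.
clicks = Counter({
    ("US-female", "wagner", "composer.example"): 8,
    ("US-female", "wagner", "sprayer.example"): 2,
    ("US-male", "wagner", "composer.example"): 3,
    ("US-male", "wagner", "sprayer.example"): 7,
})

default_order = ["composer.example", "sprayer.example"]
for_men = rerank_for_group(default_order, "wagner", "US-male", clicks)
for_women = rerank_for_group(default_order, "wagner", "US-female", clicks)
```

With no click history for a group, every count is zero and the default ranking is returned unchanged.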

Follow the cursor

Analysing users' mouse clicks like this is one way search engine operators can evaluate the quality of their system. The best way to find out what a computer user is interested in, however, is to use eye tracking to see what a person looks at on screen. Unfortunately, eye-tracking hardware is expensive, and few people use it.

However, the mouse serves analysts' needs almost as well, say Qi Guo and Eugene Agichtein at Emory University in Atlanta, Georgia. Recent studies suggest a surprisingly high number of us – 70 per cent – use the mouse cursor to help our eyes follow text on screen. Tracking the cursor offers a cheap way to check how people read a search results page – vital information for companies keen to target their adverts at people who are likely to respond.

Not everyone who searches for information on the latest mobile phone is keen to buy one – and if they're not shopping, the sponsored adverts that appear at the top of the results page are wasting the advertisers' money, say Guo and Agichtein. If users become accustomed to ignoring such adverts, they may continue to do so even when they are ready to buy, they speculate.

The pair built a web browser add-on that tracks the cursor, and tested it on 10 volunteers, asking them to search either for information or for something to buy. The volunteers' mouse movements revealed their intent with 96 per cent accuracy. Because users often reword their query several times in a search session, working out their intention can help determine whether or not it's appropriate to display ads in later searches, say Guo and Agichtein.

Malicious intent

But search engines have a dark side too. They form a vital part of hackers' toolkits. For instance, once a potential website vulnerability emerges, a quick web search can gather together a list of all sites which have that security flaw in their web code. That's because as the search engines index the web, they record pages' source code and make that information searchable.

How better to hunt down hackers than by setting the search engines themselves on them, asks John John at the University of Washington in Seattle and Microsoft Research Silicon Valley. With colleagues, John has developed Searchaudit, a system that uses search engines – and the hackers themselves – as guides to malicious sites and forums.

Searchaudit begins with a pool of known malicious queries and then trawls search engine logs to identify the small number of users who searched for those terms. Then it looks at all the other searches that those users requested, and in this way finds more malicious queries, identifies more would-be hackers and uncovers yet more dodgy queries.
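The loop described above can be sketched in a few lines. This is a simplified illustration built only on the description in this article, not Searchaudit's code: the log format (pairs of user ID and query), the function name `expand_malicious_queries` and the sample data are all assumptions. It also does no filtering, so in this naive form every query made by a flagged user is treated as suspicious, something a real system would have to guard against.

```python
from collections import defaultdict


def expand_malicious_queries(log, seed_queries, max_rounds=5):
    """Grow a set of suspicious queries and users from a seed set.

    log is an iterable of (user_id, query) pairs (an assumed format).
    Each round finds users who issued a known suspicious query, then
    adds every other query those users made.
    """
    queries_by_user = defaultdict(set)
    users_by_query = defaultdict(set)
    for user, query in log:
        queries_by_user[user].add(query)
        users_by_query[query].add(user)

    suspicious_queries = set(seed_queries)
    suspicious_users = set()
    for _ in range(max_rounds):
        # Users who searched for any query currently on the list.
        new_users = set()
        for query in suspicious_queries:
            new_users |= users_by_query.get(query, set())
        new_users -= suspicious_users
        if not new_users:
            break  # nothing new was found, so the expansion has converged
        suspicious_users |= new_users
        # Everything else those newly found users searched for.
        for user in new_users:
            suspicious_queries |= queries_by_user[user]
    return suspicious_queries, suspicious_users


# Invented sample log: hypothetical users and queries.
sample_log = [
    ("a", "inurl:vuln.php"),
    ("a", "powered by foo 1.2"),
    ("b", "powered by foo 1.2"),
    ("b", "index of /admin"),
    ("c", "weather seattle"),
]
found_queries, found_users = expand_malicious_queries(
    sample_log, {"inurl:vuln.php"}
)
```

Here the single seed query leads to user "a", whose second query leads to user "b", while user "c" shares no query with either and is never flagged.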

The team set Searchaudit loose on Microsoft Bing's search logs for a three-month period. Using just 500 initial queries gleaned from one hacker website, the software detected 4 million malicious queries – identifying some threats even before they had been circulated on hacker websites.

Network security is an arms race, with advantages gained on one side quickly neutralised by the opposition. But because so much of that race is conducted through search engines, John and colleagues argue that hackers would find it very hard to combat the use of those very search engines to reveal suspicious behaviour –
