Web start-up unveils semantic Wikipedia search tool

Eric Auchard

13 MAY 2008, 06:00


Powerset on Sunday unveiled tools for searching Wikipedia that use conversational phrasing instead of keywords, marking the first step of its challenge to established web-search services such as Google. Powerset’s technology breaks down the meaning of words and sentences into related concepts, freeing users from always needing to type the exact words they want to find.

The closely watched Silicon Valley start-up is offering a way of searching millions of entries in Wikipedia’s online encyclopaedia, helping users find detailed answers to questions rather than isolated links that require further research.

For example, a user who wants to know how many wives King Henry VIII had (six, or two, depending on your definition of marriage) can find an answer via Powerset’s service at http://tinyurl.com/5qpcr9/.

San Francisco-based Powerset is looking to leapfrog the current generation of keyword-based search services, such as Google, Yahoo!, Microsoft and IAC InterActiveCorp’s Ask.com.

“Wikipedia is becoming a microcosm of the most useful parts of the web,” said Greg Sterling, an internet analyst with Sterling Market Intelligence. “This offers a powerful way to find what you are looking for against this subset of the web.”

While still a far cry from letting users search the world wide web, Powerset is using Wikipedia as a trial showcase for how its technology can be used to search a vast number of other websites using natural language phrases or questions.

Over time, it aims to partner with other high-quality data sites where information can be organised in a question-and-answer form that lends itself to Powerset search techniques. Examples might include financial or patent filings, the CIA Factbook or Wikipedia-inspired clones, company officials said.

Powerset, which can be found at http://www.powerset.com/, looks beyond words to try to understand conceptual relationships that get closer to what a user may be searching for. It analyses each sentence and whole documents to do so.

Powerset plans eventually to make money selling advertising alongside its search services. But for now, the 60-employee company consists almost entirely of computer scientists and linguists. It has no advertising staff and only a handful of marketing and support staff.

Sterling said it is likely to take years for Powerset to be able to search the web on the scale that Google now does with its statistical ranking techniques for finding relevant web links.

“What I don’t know is how Powerset will perform on the wide open web. In a sense, this is a massive prototype using the relatively structured information of Wikipedia. It is difficult to compare to what Google has built,” Sterling said.

Sterling said a bigger danger to Google would be if rival Microsoft were to acquire Powerset and combine it with its other search technologies. Microsoft recently backed off a $44-billion bid for Yahoo! that was intended to create a formidable rival to Google in web search and online advertising.

“This could become the basis of a Google-killer,” Sterling said. “Someone like Microsoft might want to buy Powerset.”

Spokespersons for Microsoft and Powerset declined to comment on rumours of a potential tie-up between the two companies.

Fun with ‘Factz’

Powerset offers richly annotated ways for searching inside Wikipedia entries to find related concepts. Called “Factz”, these related ideas generate outlines, summaries and automated answers to users’ questions.

“Our system is a little more forgiving,” Scott Prevost, general manager of Powerset, said in an interview on Sunday. “It is not looking for hard-word matches. We are not searching for exact words, but concepts,” he said.

The two-and-a-half-year-old start-up licensed natural language processing technology and related machine processing methods developed over three decades at the Xerox PARC research centre in Silicon Valley to create new consumer web-search services.

With the tacit approval of the non-profit Wikimedia Foundation, the organisation behind Wikipedia, Powerset is hosting a copy of Wikipedia’s 2.5-million English-language entries on its own computers, company officials said. This lets Powerset make links across the breadth of Wikipedia data.

“What Powerset is doing is offering readers a natural-language search interface, and we think that is an interesting experiment,” Mike Godwin, Wikimedia Foundation’s general counsel, said in response to an emailed question about how the two organisations would work together.

In addition to Wikipedia, Powerset’s new service also searches a related database called Freebase, created by Metaweb, another web search start-up.

After decades of research and debate, natural language processing is finally poised to go mainstream, predicted Barney Pell, Powerset’s co-founder and chief technology officer. “2008 is the year that semantic and linguistic technologies cross over into widespread consumer use,” he said. — Reuters