Google and the future of search: Amit Singhal and the Knowledge Graph

Tim Adams

With Knowledge Graph, Google has radical plans to transform the way we search the internet … again.

The Google headquarters in Mountain View, California. (Kimihiro Hoshino)

Where will search go next? One answer to that question was provided by the billionaire double act of Sergey Brin and Larry Page, Google's founders, in 2004, when pressed about their vision of the future by the former Newsweek journalist Steven Levy.

"Search will be included in people's brains," says Page of their ambition. "When you think about something and don't really know much about it, you will automatically get information."

"That's true," Brin says. "Ultimately I view Google as a way to augment your brain with the knowledge of the world. Right now, you go into your computer and type a phrase, but you can imagine that it could be easier in the future, that you can have just devices you talk into or you can have computers that pay attention to what's going on around them."

Page, generally the wilder thinker, was adamant, though. "Eventually, you'll have the implant, where if you think about a fact, it will just tell you the answer."

Nine years on, Brin's vision at least is already a reality. In the past few years, a great advance in voice-recognition technology has allowed you to talk to search apps — notably on iPhone's Siri as well as Google's Jelly Bean — and Google Now, awarded 2012 innovation of the year, will tell you what you want to know — traffic conditions, your team's football scores, the weather — before you ask it, based on your location and search history.

Page's brain implants remain some way further off, though both Google founders have lately been wearing "Google Glass" prototypes, headbands that project a permanent screen on the edge of your field of vision, with apps (cameras, search, whatever) that respond to voice-activated commands. Searching is ever more intimately related to thinking.

The man in charge these days is Amit Singhal. Aged 44 and head of Google Search, he is a boyishly enthusiastic presence who has taken over responsibility from Brin for writing and refining the closely guarded algorithm (more than 200 separate coded equations) that powers Google's endless trawl for answers through pretty much all of history's recorded knowledge. In the past 12 years, he has never stopped finding ways to make it ever smarter and quicker.

Google's Mr Search has come a long way to get here. He started out in a village in Uttar Pradesh in India, in a home that for the first eight years of his life possessed no screen at all. When one arrived in 1977, a black-and-white television, it carried for Singhal, he tells me, all the magic of prophecy.

"There were two kinds of programmes," he says. "Programming for local farmers and reruns of American series such as Star Trek."

You don't really have to think too hard to imagine which of these programmes Singhal chose.

He started studying the idea of search as a graduate in the United States in 1991, the year the world wide web began making its connections. He did a PhD and then ended up in the Bell laboratories at AT&T. It was only when he came to Google in the millennium year that he experienced "a strange kind of discontinuity". Everything that had seemed like science fiction all his life was suddenly within his compass.

To prove that point, Singhal takes his Android smartphone out of his pocket and, like Captain Kirk, talks into it. "Google: What is the population of London?" he asks. "The population of London was 8.174 million in 2011," the carefully conversational voice replies. "How tall is Justin Bieber?" he wonders.

"Justin Bieber is 5ft 7in tall." Singhal looks at me with childlike glee. "If I had gone to sleep 20 years ago and you had woken me up today and I heard that, I would be thinking, yes! And where do I sign up to fly to another galaxy?"

What he is demonstrating, however, he insists, is still just the beginning. Google search is, he says with evangelical zeal, on the threshold of another epochal change in its fast-forward evolution. Having searched for a decade or so using the original brilliant principle of hierarchies of web-based links, the great primary-coloured knowledge domination machine has, Singhal says, "begun to learn how to understand the real world of people, places and things".

To answer his question about Justin Bieber, Google already has to know quite a lot. It has to know Justin Bieber is a person and that tallness means height.

"So you have already got to get to the semantics of what is being asked. But even that is not enough. Because beyond that there is this huge mass of unstructured text that we know as the web. And you cannot properly understand what was asked for without really understanding how you are going to go about answering it."

Until now, Google has been an unprecedented signposter of knowledge. It has not "known" the answer to anything itself but it has had an awfully clever way of directing you to exactly the place you can find out. In some senses, that attribute is in the process of changing.

This year, Google will roll out what it calls its Knowledge Graph, the closest any system has yet come to creating what Tim Berners-Lee, originator of the web itself, called "the semantic web", the version that had understanding as well as data, that could itself provide answers, not links to answers.

The Knowledge Graph is a database of the 500-million most-searched-for people, places and things in the Google world. For each one of these things, it has established a deep associative context that makes it more than a string of words or a piece of data.

Thus, when you type "10 Downing Street" into Google with Knowledge Graph, it responds to that phrase not as any old address but much in the way you or I might respond — with a string of real-world associations, prioritised in order of most frequently asked questions.
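The idea of an entity carrying ranked associations can be pictured with a small sketch. It is only an illustration, not Google's data model: the `Entity` class, the `top_facts` helper and the question counts are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """A hypothetical 'thing' with associations, not Google's actual schema."""
    name: str
    kind: str
    # Maps a relation label to (value, how often people ask about it).
    # The counts below are made up purely to show the ranking.
    facts: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def top_facts(self, limit: int = 3) -> List[Tuple[str, str]]:
        """Return (relation, value) pairs, most frequently asked first."""
        ranked = sorted(self.facts.items(), key=lambda item: item[1][1], reverse=True)
        return [(label, value) for label, (value, _count) in ranked[:limit]]


downing_street = Entity(
    name="10 Downing Street",
    kind="place",
    facts={
        "located in": ("London", 40),
        "official residence of": ("Prime Minister of the United Kingdom", 90),
        "type": ("building", 10),
    },
)

for relation, value in downing_street.top_facts(2):
    print(f"{downing_street.name} | {relation}: {value}")
```

The point of the sketch is that the phrase stops being a bare string and becomes a node whose associations can be ordered by what people most often ask.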

Five years ago, when John Battelle wrote his book The Search, which is still the definitive history on the subject, he concluded by imagining a future directly out of Isaac Asimov's science fiction. "All collected data had come to a final end. Nothing was left to be collected. But all collected data had yet to be completely correlated and put together in all possible relationships. A timeless interval was spent doing that."

Knowledge Graph, you might say, is the beginning of that "timeless interval". Google has already come closer than anyone could ever have imagined to the "nothing was left to be collected" part of that equation. It not only has searchable possession of the trillions of pages of the world wide web; it is also well on the way to photographing all the world's streets, scanning all the world's books and collecting every video uploaded to the public internet, mostly on its own YouTube.

In recent years, it has been assiduously accumulating as much human voice recording as possible, in all the languages and dialects under the sun, in order to power its translation and voice recognition projects. It is doing the same for face recognition in films and photographs. Not to mention the barely used possibilities of the great mass of information Google possesses about the interests and communications and movements and search history of just about everyone with a phone or an internet connection.

This data has been collected not just for the purpose of feeding it back to us as accurately as possible, but also for a wider purpose: teaching Google how to think for itself.

Singhal has worked with what he calls "signals of salience" for the past dozen years, finding ever more accurate text and link-based methods of making searches happen. But also, crucially, as these signals have become ever more sophisticated, Singhal and his team have been able to "observe the whole world interacting with the data, and with that we were able to begin to do something else, which was to begin to make the computer understand the context of what was being asked".

The way in which this is done is quite simple. Search analysis is divided into "long clicks" and "short clicks".

A long click represents a satisfied customer. A user performs a search, clicks through on a result and remains on that site for a long time. They do not come back to the result set immediately to click on another result or to refine their query. A short click is the opposite of a long click. It occurs when a user performs a search, clicks through on a result and quickly comes back to the result set to click on an alternative result. It represents a minor failure.
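The distinction between long and short clicks can be sketched in a few lines. This is a simplified illustration under stated assumptions: the 30-second threshold, the `Click` record and both function names are hypothetical, and the article does not say how Google actually measures dwell time.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off; the real threshold, if one exists, is not public.
LONG_CLICK_SECONDS = 30.0


@dataclass
class Click:
    query: str
    result_url: str
    dwell_seconds: float        # time spent on the result before leaving it
    returned_to_results: bool   # did the user come back to the result set?


def classify_click(click: Click, threshold: float = LONG_CLICK_SECONDS) -> str:
    """Label a click 'short' if the user bounced back quickly, else 'long'."""
    if click.returned_to_results and click.dwell_seconds < threshold:
        return "short"
    return "long"


def satisfaction_rate(clicks: List[Click]) -> float:
    """Fraction of clicks that were long clicks (0.0 for an empty list)."""
    if not clicks:
        return 0.0
    long_clicks = sum(1 for c in clicks if classify_click(c) == "long")
    return long_clicks / len(clicks)


clicks = [
    Click("population of london", "https://example.org/a", 120.0, False),
    Click("population of london", "https://example.org/b", 4.0, True),
]
print(satisfaction_rate(clicks))
```

Aggregated over enough searches, a rate like this hints at which results leave people satisfied and which send them straight back to the list.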

We may think we are learning all the time from Google, but by virtue of this ongoing trillion-click analysis, it is learning far more from us.

In this way, as far back as 2002, Singhal introduced a refinement based on Ludwig Wittgenstein's theory on how the meaning of words is always influenced by context. Searches for ambiguous terms began to look beyond the search terms for other related words. So a phrase such as "hot dog" would be understood in relation to mustard and baseball games, not overheated canines. "Nuance," he says now, "is what makes us human."
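A toy version of that contextual reading might look like the following. It is an assumption-laden sketch, not the refinement Singhal introduced: the `SENSES` table and the `pick_sense` function are invented, and the method shown is a plain word-overlap count.

```python
from typing import Dict, Iterable, Set

# Hypothetical table of related words for each sense of an ambiguous phrase.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "hot dog": {
        "food": {"mustard", "baseball", "bun", "ketchup"},
        "overheated canine": {"vet", "puppy", "shade", "panting"},
    }
}


def pick_sense(term: str, context_words: Iterable[str]) -> str:
    """Choose the sense whose related words overlap most with the context."""
    candidates = SENSES[term]
    context = {word.lower() for word in context_words}
    # On a tie, max() keeps the first sense listed for the term.
    return max(candidates, key=lambda sense: len(candidates[sense] & context))


print(pick_sense("hot dog", ["mustard", "baseball", "stadium"]))
```

Here the words surrounding the query, rather than the query alone, decide which meaning wins.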

I imagine, I say, that along the way he has been assisted in this work by the human component. Presumably we have become more precise in our search terms the more we have used Google? He sighs, somewhat wearily. "Actually," he says, "it works the other way. The more accurate the machine gets, the lazier the questions become. So actually our lives get harder."

He had to work especially hard to correct and understand spelling errors and analyse synonyms. And all along the dream has been the old Star Trek one of providing the right answer to what you think you want to know even if you do not know quite how to phrase the question. To work like a mind works, in other words.

"The end game of this is: we want to make it as natural as possible a thought process," he says. "We are maniacally focusing on the user to reduce every possible friction point between them, their thoughts and the information they want to find." Getting ever closer to Page's brain implants, in effect.

Knowledge Graph is the first real demonstration of that prowess. It started a few years ago when, Singhal says: "We ran into this tiny company called Metaweb, which had, through a symphony of machines and humans, begun to perfect a system to present real-world people, places and things in a computer memory. The method seemed scalable. So we bought this company."

By that point, Metaweb had stored 12-million reference points. Over the past two years, in its characteristic style, Google has quickly accelerated that to "over 570-million references with 18-billion factual connections between them". (This is a sizable number: by point of comparison, the English version of Wikipedia has about four million pages.) Google is in the process of launching Knowledge Graph in seven languages and aims to exhibit the same local intelligence in each.

Knowledge Graph's project manager is Emily Moxley. She talks me through some of this intelligence. It goes quite a long way beyond being able to distinguish between an English query for football scores and an American one.

"In Japan, for example," she says, "our analysis shows that people want to know quite a lot about the blood type of film stars", so that will be a prioritised part of the instant Knowledge Graph in that part of the world.

Likewise, Japanese Googlers seemed short-click frustrated that the search for sumo wrestling data was not as accurate as it might be. "We worked on rectifying that," Moxley says. "We thought at the very least we should be able to answer a certain depth of queries."

What kind of depth?

"Somewhere at least in the most popular tens of millions," she says.

More than that, Singhal wants to be sure that all aspects of the data are properly in harmony with your desires.

"If you wanted to find out about Dr Martin Luther King's 'I have a dream' speech," he says, "you might want the text, you might want a picture of him, but we guess that what you really want is a video clip of him delivering the speech, so how to get that to the top of your search."

Again Knowledge Graph can deliver that; it starts to know what you want to know.

In talking to Singhal, it is quite easy to get caught up in the utopian possibilities of the technology and quite easy, of course, to forget that Google has also created wealth faster and more efficiently than any company in history; that it is probably the most effective generator of advertising dollars ever invented; and that a great deal of what it knows about us we might well want it not to (an unease that might grow by association now that Facebook has announced a search engine of its own data, one that promises to be even more intimate in its revelation of personal history than Google has ever dared to be).

All the Google employees I speak to adopt the same kind of reflexive flinch if you hint at any of this, if you suggest that their motives for all this data gathering, this knowledge sharing, might be anything other than pure.

It is the same kind of "Why wouldn't you trust us?" flinch that has powered the company's growth through loyalty and that sees it refuse to reveal its own intimate search history even when threatened, as it is currently by the European Union, to prove that it does not artificially weight algorithmic results in favour of its own products and commercial partners.

Singhal rejects all of this. He winces when I ask: "What's in it for Google?"

"We are a search people," he says. "The thing that motivates me is to build a search engine that will outdo all my previous creations. Simple as that."

Further, he believes, as a statement of faith, "that all information is empowering". But what about the less measurable ways that the ease of search has changed our lives? I ask. What about the ways in which it has diminished the excitement of serendipity, the way that it has made the personal experience of a chance encounter with knowledge so much rarer?

Singhal has been working on that. The Knowledge Graph will still return the results it thinks you most likely need, but down the list it will have a randomised element; it will have chance built into it, another way it might mimic the way we think.

His current obsession is behavioural psychology; he has become an avid student of the work of Daniel Kahneman.

"I just love the way it details how human beings feel when faced with choices and decisions, what makes you run away when someone offers you 32 chocolates to choose from, but which satisfies you when they only give you one chocolate."

How, I wonder, will Google incorporate that knowledge in its unending search?

"I don't know exactly yet …" Singhal says brightly, leaving you in no doubt of what might be his organisation's guiding mantra: he will soon. — © The Observer

Although this feature has been made possible by the Mail & Guardian’s advertisers, content and photographs were sourced independently by the M&G supplements editorial team.
