Google recently announced a new knowledge feature that allows users to uncover an absolutely critical piece of information — an actor’s Bacon Number — as a demo of its new “Knowledge Graph” capabilities. Simply type in “Bacon Number [actor name]” and it will spit out the actor’s Bacon Number and how it was calculated. This has been met with a fair bit of sneering on the web. Examiner.com sees it in the context of the internet as a whole providing “endless time consuming and time wasting opportunities like Farmville, Facebook, Twitter and now, the Google Bacon tool.” Over on the Webmasterworld Google SEO forum, there was a fair bit of simple-minded sneering at the uselessness of the tool and how Google would be better off focusing on “real” search. The New York Times facetiously opened its article with
In a sign of how committed Google is to helping people answer their most pressing and urgent questions, the search engine has unveiled a way to find famous people’s Bacon number.
Funny stuff. But all missing the point.
The New York Times writer is one of the few who, in fact, gets it. The author concludes with this little bit of the bigger picture:
The bigger point, Google says, is that the game demonstrates the promises of a piece of its search technology called the Knowledge Graph. It is a database that maps facts and how different people, places and things relate to one another.… Google has been working on the Knowledge Graph since 2010, and in May it added related facts about people, places and things on the right-hand side of search results. After Larry Page became chief executive of Google last year, he changed its mission from search to knowledge because he said he thought finding a Web page was too narrow. He wanted to help people understand the world. Starting with Bacon numbers.
Why Does the Google Bacon Number Calculator Matter?
Judging by the comments of some in the SEO community, it doesn’t. What they seem to miss is that any complex system — and finding patterns in loosely structured data is a complex system — needs simple test cases and playgrounds. The Wright brothers began their quest for manned, motorized flight by testing early wing designs on small gliders, many just flown as kites.
This didn’t appear to be a great innovation — kites were an old technology. But it was a crucial step in understanding the wing designs that would allow them to build an airplane powered by a gas engine and capable of carrying a passenger. By the time they flew their famous figure-8 around Kitty Hawk, it was fairly clear that a revolution was in the making. But it started with their kites.
The Google Bacon Number Calculator is not the plane that flew the figure-8. It’s the kite. It does something that we can already do. Its significance is in providing a test case for a new way to do that same thing. In this case, that same thing is calculating a Bacon Number, a trivial and relatively useless task. The naysayers are stuck focusing on the task, not the method. But the innovation is in the method. The innovation is connecting pieces of knowledge without being told exactly which connections to make, using a generalized method instead. Once computers get good at that, the world changes. Literally.
What about Wolfram Alpha?
Most definitely, the Google Knowledge Graph is cut from a similar cloth to that of the Wolfram|Alpha Computational Knowledge Engine. W|A is cool, even useful, but it’s also obviously limited. It seems to do pretty well with mashing up structured data, but it’s a long way from being able to crawl the web and make connections on its own. Make no mistake, Google is miles from that too, but that appears to be where they are heading and, unlike W|A, they might actually have the resources to get there.
In other words, at this point, W|A still needs too much hand holding and feeding to be a game changer. The question is how much hand holding Google needed to produce the Bacon Number Calculator. If it’s a smart system that’s figuring this out from a bunch of data and a simple algorithm, then I think this is huge. When I say a “simple” algorithm, by the way, I really mean an “abstract” algo, one that doesn’t depend too much on the actual data set, one that could with a simple tweak calculate the Redford Number and with a different data set calculate an Erdős number (or even the Bacon–Erdős Number, which some people have).
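To make the idea of an “abstract” algorithm concrete: any “X Number” is just the length of the shortest path in a collaboration graph, computable with a plain breadth-first search. Feed the same function a film co-appearance data set and Kevin Bacon, or a co-authorship data set and Paul Erdős, and the algorithm itself never changes. Here is a minimal sketch in Python; the tiny co-star graph below is made up for illustration.

```python
from collections import deque

def degrees_of_separation(graph, source, target):
    """Shortest path length between two nodes, via breadth-first search.

    `graph` maps each name to the set of names it co-appears with.
    Returns None if no chain of collaborations connects the two.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical co-appearance data -- the method doesn't care what it means.
costars = {
    "Kevin Bacon": {"Tom Cruise"},
    "Tom Cruise": {"Kevin Bacon", "Max von Sydow"},
    "Max von Sydow": {"Tom Cruise"},
}

print(degrees_of_separation(costars, "Max von Sydow", "Kevin Bacon"))  # 2
```

Swapping data sets, not rewriting code, is exactly what separates an abstract method from a one-off trick.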
Is the Future of Search Keyword-Based?
It seems obvious that the answer to this is a resounding “No!” but the snarky reactions to the Google Bacon Number Calculator highlight how locked into that mindset people still are. Over on Webmasterworld, user sanatapaws writes “rather than search for what im typing they are looking for more and more deep relationships and watering the serps down with those… clever but its not search.”
Really? Connecting information and finding patterns is killing SERPS? Finding deep relationships is clever but not search?
Let’s back up a bit here. Keyword-based searches are what you do because you can’t do anything else. You do them because your system is so dumb that all it can do is match
(this is a bit of a joke – put it into a binary decoder).
Think about this for a second — I’m a historian working with obscure texts with unfixed spelling. Our research group has collected and transcribed a huge corpus of texts (thousands of pages) since 1987. Much of this material is in a database that allows us to pull out some types of information (where the query fits the data structure), but not others. I have a special weapon for pulling obscure connections out of our texts: PowerGREP, a powerful regular expression search tool. This ultimately simple and limited ability to stem searches has given me a huge advantage over my keyword-searching colleagues. It lets me search 100 texts at once, stem for 200 patterns in a single query, and match much more complex patterns (“Jean” separated from “Louise” by fewer than 50 characters), and that is how I find relationships nobody else can.
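As a rough illustration of what this kind of query looks like (a Python sketch of the idea, not actual PowerGREP syntax; the spelling variants are hypothetical examples), a single regular expression can stem alternatives and enforce a proximity constraint at the same time:

```python
import re

# Stemmed alternatives covering unfixed spellings (hypothetical variants).
jean = r"(?:Jean|Jehan|Jan)"
louise = r"(?:Louise|Loyse|Louyse)"

# Match the two names in either order, separated by fewer than 50 characters.
# re.DOTALL lets the gap span line breaks in the transcribed text.
proximity = re.compile(
    rf"{jean}.{{0,49}}{louise}|{louise}.{{0,49}}{jean}",
    re.DOTALL,
)

text = "... item, Jehan paid six sous to the widow Loyse for bread ..."
match = proximity.search(text)
print(match.group(0) if match else "no match")
# "Jehan paid six sous to the widow Loyse"
```

Run against a hundred files at once, a pattern like this pulls out co-occurrences that no keyword search would ever surface.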
Ultimately, though, this is still incredibly dumb search, that of an automaton incapable of understanding my question and its context. There are a lot of questions I can’t answer using that method. But here’s the thing — if I find another historian who has read and studied a corpus and I have a chance to actually have a conversation with her, she will understand the meaning of my question. I do not have to even be close on the terminology because that person has access to all sorts of deep relationships that are not based on matching
Unfortunately, even that is just stem matching the second simplest imaginable set of patterns (I tell it exactly what alternatives to try). Google’s “Did you mean?” and synonym matching is perhaps the third simplest set (I don’t tell it how to stem, but it’s sort of smart enough to stem itself based on dictionary and thesaurus lookups). All of this is primitive. It will be laughable in 30 years.
From Keyword Matching to Understanding
When computers move to understanding rather than pattern matching — and that’s a continuum we started down a while ago, but we’ve only taken three baby steps on a 10,000-mile journey — the potential for extracting data from a data set is game changing. It is, effectively, a shift from data retrieval to knowledge expansion, and that may ultimately be the most significant shift in human knowledge since the invention of writing.
Already, we see important implications. In the Webmasterworld thread on the Bacon Number Calculator, Robert Charlton highlighted the connection between the Bacon Number Calculator and innovations in Google Image Search based on Knowledge Graph data. Images set in a context (place, time, subject) are much easier to process and “understand” than images divorced from context. The Knowledge Graph is ultimately aimed at providing that context. That understanding of context therefore dramatically simplifies your image recognition algo. If you have a picture of a steel lattice tower, knowing the photo is from France or Nevada changes everything for your image recognition software. On a more sophisticated level, knowing a photo is from a natural area or a city, knowing it’s from 2012 or 1912 and so on simplifies your image recognition problem a lot.
Why the Future of Google Depends on the Bacon Number Calculator
Right now I as the searcher do most of the work. I choose the keywords, refine them, find alternatives, sift through the results, add terms to remove irrelevant results and so on until I find the pattern the search engine needs to find the information I want; then I as searcher make the connections, filter the information and arrive at conclusions. Like a three-year-old with limited cultural context, Google can merely respond to my keywords and spit out solutions — often a vomit of completely irrelevant and unconnected content that just happens to contain, somewhere, the two words in my query. At this point, Google might even have a rudimentary sense of context. As I say, this is incredibly primitive.
Google knows this (and Bing and Blekko and Facebook do too). They also know something else: as computing horsepower increases, keyword-matching companies will become the wheelwrights of the information age. It will eventually become so cheap to crawl the net and crank out matches to keyword-based searches that nobody will need Google. The first decent quantum computer running Grover’s algorithm will allow for ridiculously fast searches of huge collections of unstructured data. Goodbye keyword-matching companies!
Right now, there are only two players that matter in search, and it’s going to be that way for the near future because of the capital involved; that won’t change immediately because computing power increases very slowly, stuck with Moore’s Law’s mere doubling roughly every eighteen months. The slow pace at which computer speed increases blinds us to the long-term implications of its ultimately exponential growth.
Compare this to the speed of DNA sequencing, which has superexponential growth. If you plot Moore’s Law on a logarithmic scale, it appears linear, but the increasing speed of genome sequencing still looks exponential even on that scale. In other words, computing speed is increasing only as the log of DNA sequencing speed, and this is why I say it is “slow” and why we can be blind to the effect of those changes. It is easier to imagine what massive increases in power mean by looking at DNA sequencing than at computing and search.
Right now, search with the laughable Bacon Number Calculator and dumb keyword matching is basically where DNA sequencing was in the days of the Human Genome Project and Craig Venter’s Celera Genomics — only two players were involved, and it cost three billion dollars to get the first human genome sequence. There is a complicated history behind that figure: Celera produced complete sequences of five individuals, while the HGP took a different approach that isn’t directly comparable to a single sequence from a single individual, so the $2.7B from the government and the $300M from Celera bought a “first” sequence plus a lot of other data for that $3B.
Still, the cost of that has decreased over a million-fold in ten years. Within ten years, it is likely to cost a few hundred dollars to have your genome sequenced and everyone cool will be doing it. At that point, it is a boutique industry that can’t really support just two giant players with massive investments in capital and massive payrolls… like Google and Bing.
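Putting the article’s own rough figures side by side: Moore’s Law’s doubling every eighteen months yields only about a hundredfold gain per decade, while a million-fold drop in sequencing cost over a decade implies a doubling of capability roughly every six months.

```python
import math

# Moore's Law: doubling every 18 months, over 10 years (120 months).
moore_factor = 2 ** (120 / 18)

# Sequencing cost fell about a million-fold over the same 10 years;
# solve 2 ** (120 / d) = 1e6 for the implied doubling period d in months.
implied_doubling_months = 120 / math.log2(1e6)

print(round(moore_factor))                # 102
print(round(implied_doubling_months, 1))  # 6.0
```

That gap — roughly a factor of 10,000 per decade — is why sequencing makes exponential growth so much easier to see than computing does.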
The gold in the gene sequencing will not be in churning out sequences for everyone and his brother for a few hundred dollars a pop. It will be in the exabytes and zettabytes of data that will have been created, Big Data that allows researchers to tease out more and more information. The limit on that data teasing now is that gene sequencing is advancing so much faster than computing that they don’t have the computing horsepower to crunch all that data. In the long run it isn’t the sequencing where the value is and it’s not even the simple pattern matching within sequences where the value is. Where the value lies is in making connections that a human can’t see, can’t even ask about because he doesn’t know what to ask and because his dumb keyword-matching computer can’t see deeper patterns.
Ultimately, that’s where the Google Knowledge Graph is trying to go — it’s trying to go to a place that the boutique search shop with an entry-level quantum computer won’t be able to touch. It’s trying to get to the point where you feed in exabytes or zettabytes of data and the algo doesn’t just stupidly match patterns based on words a person types in; it actually pulls out patterns that people don’t know to ask for and uses those patterns to inform the answers to questions, to literally discover new knowledge rather than merely find old knowledge. To think of it in historian terms, for people old enough to recognize the reference, it is the difference between a researcher and a card catalog. The card catalog is useful, but it can never by itself make the connections needed to answer a question that has never been asked.
If Google fails to be the one that figures this out, they will eventually become irrelevant to “search”, whatever that comes to mean. But Google isn’t betting the bank on search. They may be a VR glasses and self-driving car company in the future and not even have a search engine at all. But even if that’s the case, it is the Knowledge Graph, not the keyword matching, that will allow Google to have self-driving cars and full VR glasses. If the car and the glasses are to truly succeed, they will depend completely on a deep and powerful Knowledge Graph: a refined understanding of context, a detailed vision of the world, a type of “search” that goes way beyond text, beyond images, beyond the web.
To get an idea of where Google Glasses could lead, read Vernor Vinge’s Rainbows End, a novel set in the not-so-distant future where everyone is “wearing”, meaning using augmented reality devices. The world has a knowledge depth provided by real-time search as you walk through it, the physical world can be reskinned like your computer applications of today, and more. Achieving this means developing a map of the world even more detailed than the Borgesian 1:1 map. That, ultimately, is Google’s goal (that last link is a must-read article for everyone who plans to live another twenty years).
If Google fails to come up with a 1:1 or perhaps 10:1 map of the world, a deep reality available to cars and glasses, it will ultimately become irrelevant as a search company too. That’s our future, and the trivial Bacon Number Calculator drawing so many sneers and snarky remarks on the web is just the latest Wright brothers’ kite. In and of itself, an irrelevant, trivial toy. Compared to this, the 720-kilopixel (yes, kilopixel) Mavica from Sony was a mature product in the history of digital photography, coming 12 years after the invention of the CCD. And yet, personally, I missed that one. I couldn’t see its significance, because I didn’t realize that it was the first plane. Still less could I see the significance of the CCD (that is, the kite). So the Bacon Number Calculator is not the first plane and it’s not the Mavica. It’s the kite. A nothing, a cipher, a frivolity. But also the future of an industry.
Why the Bacon Number Calculator May Not Matter After All
So that’s all great. Google (or someone) links up all sorts of amorphous data, scrubs it, cleans it, analyzes it, and builds an algorithm that can connect dots and names and pixels. Genius. But what if Google simply doesn’t have access to the most important data in the world? What if Google can link up data so that when you read a book, there’s no need for an annotated edition, because Google can not only recognize quotes and link them to the source, but can also understand allusions and oblique references and link those to the source? Let’s say it can do this — an amazing and miraculous thing — but what it can’t do is access your Facebook stream and know which posts you liked and what your closest friends think is the best beer on the planet. So there is a long-term problem there for Google and an opportunity for Facebook, which is planning to make the most of it. Google is always limited, and Google Plus… what was Google Plus again? Did it have something to do with boxes of friends? Anyway, it may be that the social component is more important than the Knowledge Graph component, but ultimately I don’t think so.
So it is quite possible that if Google connects up all the world’s publicly available knowledge but doesn’t win the quest to link up all the private data, it will not become the main public-facing “answer engine”. It may, instead, become the wholesale data supplier for all the public-facing engines, which, by the way, would arguably make it even more important and more of a juggernaut than it is now.
Looking to the Future
I can’t even adequately describe how primitive I feel today’s keyword-based search is. I believe that in 30 years, when children watch old movies from 2010 and see people doing keyword-based searches, you will have to explain to them what that was and why anyone would want such a thing. You will have to explain why the type of search we have today was even useful. You’ll have to explain to them that Google once ran something called a search engine and people actually cared. They will look at you dumbfounded and ask how you ever found anything. And the elder geek in the room will say, “Yeah, I remember when they all laughed at the Bacon Number Calculator.”