The official Web site for OpenCola -- one of the most interesting software start-ups in years -- claims that Cory Doctorow is "Chief Evangelist" and "Spokesmodel" for the company, which rolls out its new OpenFolders collaborative filtering environment this month. But Doctorow also happens to be an accomplished science-fiction writer, winning last year's prize for best new talent at the Hugo Awards (the Oscars of the sci-fi community). Interestingly enough, his latest, as-yet-unpublished book, Down and Out in the Magic Kingdom, documents a future where Walt Disney World is run by an elaborate collaborative filtering system. The overlap between Doctorow's day job at OpenCola and his night job writing fiction makes for a unique combination: It's almost like William Gibson launching a company that sells neural plugs and stim-sims. I first came across Doctorow at this year's PC Forum, where he delivered a hilarious off-the-cuff running monologue during the Peer-to-Peer panel. OpenFolders belongs squarely to the illustrious tradition of group AI programs like Ringo, Firefly, Alexa, or the recommendation engines at Amazon: software that taps into the collective intelligence of large groups of people by observing and learning from patterns in their behavior. Unlike its predecessors, OpenCola is also in the soda business -- albeit somewhat ironically. You can order the OpenCola soft drink from their Web site, and true to the Open Source tradition, they've published the "source code" for the beverage under the GPL. Cory and I talked late last month about the goals of OpenCola, their distinctive take on how to measure relevance, the "all your base are belong to us" craze, new compensation models for info mavens, and his love-hate relationship with sugar water. -- Steven Johnson
FEED: So first things first: Give us the "elevator pitch" for OpenCola. CORY DOCTOROW: Our one sentence cut-line is "It's Tivo for the Internet." The idea is that you have a folder on your desktop, you put some things in it you like, and it will fill up with things that you'll probably like. It figures out what you'll probably like by finding peers in the network who have tastes similar to yours and telling you what they think is good. The software fetches documents from peers and from various Internet servers (Web servers, databases), puts them under the noses of people that it thinks will like them, then watches what the peers do when they get them: Do they attend to them as though they like them, or do they throw them away? These implicit, observed decisions are aggregated and the result is a "relevance-switched" network where documents automatically migrate to the attention of people who'll probably like them, based on human decisions. That's it in a nutshell. FEED: One of the things that really struck me about it was the mechanism you guys are talking about for relevancy relationships. There's the tried and true key-word matching routines that I gather you do some of. But we all know there's a level beyond just overlap in words that connects us to things that we might be interested in. DOCTOROW: We started off really focused on relevance, and we were looking around at what was happening. And at that time, somewhere around there, Autonomy launched and got a lot of ink about using Bayesian relevance matching, which some of our research people had been talking to us about for a while. It's Bayesian statistical relevance matching -- it's really old stuff. The math is about three hundred years old. And it's pretty widely regarded as a really good way of figuring out what's relevant to what. But over time, there's been a shift, I think, in the way that we look at relevance.
Relevance is a good way to figure out whether something that you're looking at is like something else that you've looked at before. Relevance is not a good way to figure out whether something that you're looking at is something that you'd be interested in. Relevance is only good at exploring the domain of things that you know that you don't know. The things you don't know you don't know is a much more interesting domain. The relevance that we're interested in is: Do you and some other person, who knows about something that you don't know about, generally agree -- and if we can figure that out, then the real relevance question that we're looking at is: "What is that person like?" So what we're building is a tool that for someone who wants to dig deeper into a domain that they're interested in, we'll identify the places where that information comes from, so you and I both find articles that are good at feedmag.com, our computers notice this fact, and our computers team up to spider feedmag.com and find new documents, and then rank the documents that are found based on their similarity to previous documents that we've enjoyed. But it tweaks that ranking by finding people who we agree with on the network and seeing whether or not they've already reviewed those documents, and what opinions, if any, they've generated through the process of looking at them. That's the really important part. And if you and I discover that we both like feedmag.com, and I've got a site that you've never heard of, say, I've got my blog boingboing.net in my settings file -- this is a way for you to discover that boingboing.net is a great place to find good and interesting information. FEED: And you can also negatively correlate off of that too, yes? So if you constantly disagree with somebody -- DOCTOROW: That's right. Then they become less relevant to you. 
If the software thinks the document is sixty percent relevant and someone who you have no fundamental accord with thinks that it's one hundred percent relevant, we don't tweak the score at all. But if there's someone who you have accord with, then the score changes based on how they've reacted to the document if they've seen it.
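The score adjustment Doctorow describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenCola's code: the `PeerOpinion` type, the `adjusted_score` function, the accord scale, the `min_accord` cutoff, and the blending rule are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PeerOpinion:
    accord: float  # assumed scale: how often you and this peer have agreed, -1.0 to 1.0
    rating: float  # the peer's observed reaction to the document, 0.0 to 1.0


def adjusted_score(base_score: float,
                   opinions: list[PeerOpinion],
                   min_accord: float = 0.2) -> float:
    """Tweak a content-similarity score using peers you tend to agree with.

    base_score is the software's own estimate (e.g. 0.6 for "sixty percent
    relevant"). Peers below min_accord (a hypothetical cutoff) are ignored,
    so a stranger's opinion leaves the score untouched.
    """
    trusted = [o for o in opinions if o.accord >= min_accord]
    if not trusted:
        return base_score

    total_accord = sum(o.accord for o in trusted)
    # Accord-weighted average of what the trusted peers thought.
    peer_view = sum(o.accord * o.rating for o in trusted) / total_accord
    # Assumed blending rule: the stronger the average accord, the further
    # the score moves from the software's estimate towards the peer view.
    pull = total_accord / len(trusted)
    return base_score + pull * (peer_view - base_score)


stranger = PeerOpinion(accord=0.0, rating=1.0)
friend = PeerOpinion(accord=0.8, rating=1.0)

unchanged = adjusted_score(0.6, [stranger])
lifted = adjusted_score(0.6, [stranger, friend])
```

With these assumed numbers, the stranger's one hundred percent rating leaves the score at 0.6, while the same rating from a peer with an accord of 0.8 lifts it to about 0.92.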
FEED: How would you compare it to the models used by some of the original companies in this space, like Firefly? DOCTOROW: Well, there are a couple of ways in which I think that we've overcome some of the problems with Firefly. The example of Tivo for the Internet is not because Tivo still exists and Firefly and Ringo don't; the reason that we use Tivo is that Tivo has got a complete feedback loop. Tivo shows you a program; you watch the program; if you don't like the program you delete the program or you give it a thumbs down and it notices that. In the Ringo world, and I loved Ringo -- I mean half the music in my collection came out of Ringo, but only because I'm a really, really dedicated info whore, an obsessive compulsive manic hand-washer when it comes to finding new music. And, in the Ringo world, what you had to do was remember all the bands that you liked, remember how you felt about a bunch of things. Tell the software about that, wait for the software to bring you back a list of recommendations, go out and buy the music, remember how you feel about the music, and then tell the software about it. And that feedback loop has fundamentally broken. Even on Amazon it's fundamentally broken. People who like this book also like this book. Well, then you have to buy the book. And you have to read the book and then you have to accurately report how you felt about the book and -- FEED: It's not even people that read this book, it's people who bought this book. So we don't even know if they liked it or not. DOCTOROW: I think -- and this is a subject on which reasonable people can disagree -- but I think that implicit information that's gathered about how people use information is much more valuable than explicit. The great example of that is the Nielsen ratings. It used to be that the Nielsen ratings were generated with journals. And the average American household according to Nielsen watched Masterpiece Theater and Sesame Street. 
Then they switched to set top boxes so they could actually monitor what people watched. And what they discovered was that the average Nielsen family watches naked midget wrestling, "My Twelve-Year-Old Daughter Flaunts Her Breasts" on Jerry Springer, and America's Funniest Botched Cosmetic Surgery. The rise of trash TV over the last decade can in large part be accounted for by the fact that Nielsen now rates shows based on what people actually watch, as opposed to asking people what shows they like. The problem with that is that it's a closed loop. No one can create a show and insert it into the Nielsen ratings, so what you end up with in that system is just more of the same. But with folders, you end up with more difference because the pool of documents is nearly infinite. I mean, anyone can create a new document and insert it into the network just by reading it, and when they do that, their software observes that they've read it, and observes how they reacted to it. And if anyone comes along and says, "What's good today?" the software will tell them about that document. FEED: This is the thing that I've been thinking about for a long time. Do systems like this lead towards kind of a more diverse gene pool of ideas or -- DOCTOROW: -- or do they lead to The Daily Me. I don't think it's The Daily Me. I think The Daily Me is predicated on the idea that we'll someday have great artificial intelligence that can pick out among all the documents, just the ones that you'll be interested in. And the way that they'll do that is by finding out what your tastes are and reinforcing it. I'm a bit of a punter when it comes to AI. I dropped out of an AI program, but never finished one. [laughs] When it comes to AI, I am largely of the opinion that great AI consists of aggregated human decisions, not machine generated decisions. And that, in a great AI -- that great AI that's composed of aggregated human decisions, will by definition, be diverse. 
If what you're considering as your primary relevance criteria is "who else likes this?" not "what does this look like?" then you'll get a significantly more diverse pool of stuff. I go out and I read a book and I want to read another book just like it, because I want to find out more about the subject -- that's a useful thing. And chances are, if you and I are very similar and we're reading a lot of similar books that most of the stuff that we'll recommend to each other is similar. But no two people are alike. You and I have a bunch of interests in common that we've discovered, but I'm sure that we have some interests that are diverse -- that if we were to share with each other, we would find that some of them actually were things we didn't know about but we'd be interested in. FEED: I totally agree with you, and I'm also fairly optimistic about this stuff, but I guess the counterargument is that network effects start to happen, because certain ideas or documents or songs get recommended by other people and they kind of take off from below, with a positive feedback loop. And by the end you end up with everybody or a huge amount of people agreeing that this one document is completely interesting because they've all been kind of referring it on to their friends or neighbors in the space. DOCTOROW: But it's not a zero sum game. FEED: Right. DOCTOROW: This notion of "mind share" as something like "audience share" is ridiculous. Think about the "All Your Base Are Belong To Us" phenomenon. I remember when All Your Base came out. You know, the first I saw of it was Tim O'Reilly had strung up an 802.11 base station through the hotel that the peer to peer conference was in. So, we were sitting around at breakfast and I fired up my browser and I followed a link to All Your Base. And I showed it to the people at my table, and all day we were talking about All Your Base.
But we were also going to amazing panels, talking about interesting stuff, getting on with our lives and so on. And it's not as if All Your Base has blotted out everything else -- it wasn't an All Your Base mania. It wasn't like we became hard-core Deadheads following the All Your Base crew around from city to city. FEED: Hey, it's still going [laughs] -- it might still do that. DOCTOROW: There aren't that many phenomena like that anymore. And more and more they're created like 'N Sync. And they come and go. It seems like created mania has a much shorter life span. Much more perishable than the real thing. I mean, there are still Deadheads, and there isn't The Dead anymore.
FEED: Tell me about your idea for new ways of compensating people who have a lot of influence in the system. DOCTOROW: We believe that all networks are very 80/20. Most people are not voracious consumers of information. Most people really do want to just find something that is generally a good source of information and consume, not go out and find new stuff and bring it in. So most people want to find the magazine that has the best editorial slant for them and read that as opposed to seeking out from a million diverse sources all the information they can and compiling their own magazine. That there are people out there who want to compose their own magazine, who want to gather their own information and make a lot of decisions about it -- it can't be denied. And those people in a network that's based on recommendations will end up with a great deal of influence because they'll be the source of all the new stuff. Those people -- the sources for all the new stuff -- could presumably, with our cooperation, figure out ways to peddle [their] influence, essentially. The interesting thing about this is that since the system is blind, since all the system knows is that, first, there is a user who generates good decisions for you, and secondly, that being in alignment is a temporary thing which is checked and rechecked based on your behaviors -- then if someone out there is being paid to make recommendations for things that are unworthy, the system observes that. And so someone being paid to game the system is indistinguishable from your taste and their taste diverging, or from them going insane, or from them using a random number generator, or what have you. As long as they're generating good recommendations, you'll be part of their audience. When they stop generating good recommendations, you're not part of their audience anymore.
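The idea that alignment is temporary, and is checked and rechecked against your behavior, can be illustrated with a running score that favors recent evidence. The sketch below is an assumption-laden example rather than anything the interview specifies: the `Accord` class, the exponential decay, and the `decay` and `floor` values are all hypothetical.

```python
class Accord:
    """Running measure of how well one recommender has worked for you lately."""

    def __init__(self, decay: float = 0.8, floor: float = 0.2):
        self.decay = decay  # assumed: share of the old score kept on each update
        self.floor = floor  # assumed: below this, the recommender stops reaching you
        self.score = 0.0    # a stranger starts with no influence

    def observe(self, liked: bool) -> None:
        # The system is blind to motive: it only sees whether you attended to
        # the recommended document (+1) or threw it away (-1).
        outcome = 1.0 if liked else -1.0
        self.score = self.decay * self.score + (1.0 - self.decay) * outcome

    @property
    def in_audience(self) -> bool:
        return self.score >= self.floor


maven = Accord()
for _ in range(10):       # a run of good recommendations earns influence
    maven.observe(liked=True)
earned = maven.in_audience

for _ in range(5):        # paid-for, diverging, or random: it looks the same
    maven.observe(liked=False)
kept = maven.in_audience
```

Under these assumptions `earned` is true and `kept` is false: a recommender who starts pushing unworthy documents loses you as an audience after a handful of misses, whatever the reason for the change.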
So those people could presumably peddle their influence, and only do so honestly, or at least consistently to people who want to get their media in front of a large number of people. If they start altering their opinions to make more money, they'll lose their audience. FEED: Obviously, there's an objection that people will often make, a first-blush response where they'll say, "Oh, this is worse than product placement. I mean, here you have people buying the influence and other consumers don't realize that these people are being paid." But I think that kind of critique just doesn't understand how these kinds of self-regulated systems really work. Because there is that extra level of feedback that's constantly going in and evaluating the recommendations. You can only game the system for a short period of time, and then you lose the influence that you've earned. DOCTOROW: It's like eBay. If you cheat someone on eBay, if you have one negative comment, if you have 150 positive feedbacks leading up to last week and two feedbacks that say, "This guy is a rip-off artist, and he charged me $1,000 and shipped me a box of human excrement" -- FEED: [laughs] And that's not what you were expecting. DOCTOROW: And that's not what I ordered -- because that would be a violation of eBay's terms and services, then no one would be willing to buy things from you anymore because you've suddenly gone insane. Right? And for us, that honesty metric is not put in there to keep people from gaming the system, it's put in there because it's an acknowledgment that our tastes converge and diverge over time. Just because you were a good recommender for me last week doesn't necessarily mean you'll be a good recommender for me this week. FEED: Final question, do you actually drink the cola? DOCTOROW: I did drink the cola. I drink less of the cola than I did before. I was never a giant cola drinker anyway. FEED: [laughs] But you had to for team spirit? DOCTOROW: Yeah. Well, no. 
I've got the classic entrepreneur's thing, I've got terrible acid reflux. FEED: [laughs] Oh, God! DOCTOROW: From living on the road and eating bad meals and not sleeping enough. And having lots and lots of stress. So, there's a limit to how much incredibly sweet caffeinated acidic beverage I can consume in a given day anyway, and tend to reserve that for more efficient caffeine delivery systems like giant frozen mochas. But the cola's pretty fucking good. I mean, I could drink the cola. I don't drink as much of it as I did, but I could. When it first shipped in I was so totally stoked to have the cola that I drank an unbelievable amount of it over the course of a couple of weeks. But I drink a lot less of it now. It's pretty good cola, though.