FEED: What has changed since you started Alexa in your perception of what the overall Web looks like? You probably know more about the overall distribution of things -- because of what Alexa does -- than just about anyone. Are you surprised by what it looks like?
BREWSTER KAHLE: Continuously amazed, surprised, bewildered by what's going on -- it's completely fun and interesting to just be in the soup. By being here at Alexa, we've got the biggest collection of what the current Web looks like now, and in the past, as well as where millions of people are surfing. We don't know who's who and we don't care, but we can get kind of an idea of how the use of the Net is evolving, as opposed to what's just on the Net. So one of the things that I find really astounding is this graph that came out of the Alexa Research Group, which is a graph of the number of different Web sites and then what percentage of traffic is going to those top-end Web sites. And it's amazingly linear.
[Jumping out of his chair to draw on the whiteboard.] So if you put it on a semi-log graph where on the x-axis there's 10 Web sites, 100 Web sites, 1,000 Web sites, 10,000 Web sites, 100,000 -- and you put percentage of all traffic worldwide on the y-axis. (We're using the 500,000 people that use Alexa on a day-to-day basis.) If you have 20 percent, 40 percent, 60 percent, 80 percent, 100 percent -- it's amazingly linear. So the top ten Web sites get 20 percent of the traffic on the Net. The top 100 get 40 percent. The top 1,000 get 60 percent. The top 10,000 get 80 percent and then it tails off, because by this way of counting what a Web site is there are about 7 million Web sites. But it's almost linear. Just astounding.
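The four data points Kahle quotes fit a straight line on a semi-log plot: roughly 20 percentage points of traffic for every tenfold increase in the number of sites. A minimal sketch of that relationship follows. The function name and the cap at 100 percent are illustrative assumptions, and the formula is fitted only to the figures quoted here, not taken from Alexa's data. Kahle notes that the real curve tails off past 10,000 sites, so the line should not be trusted beyond that point.

```python
import math


def traffic_share(top_n: int) -> float:
    """Approximate percentage of all traffic going to the top_n sites.

    Assumed model: 20 percentage points per factor of ten in site count,
    capped at 100. It matches the four quoted points (10, 100, 1,000 and
    10,000 sites) and is not meant to describe the tail beyond them.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    return min(100.0, 20.0 * math.log10(top_n))


for n in (10, 100, 1_000, 10_000):
    print(f"top {n:>6,} sites: about {traffic_share(n):.0f}% of traffic")
```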
Now why is that interesting? I think there's three interesting things about the graph. First, there's the concentration. Second, there's the long tail. And third, that it's flat. Now, why those three? The top ten Web sites -- by controlling 20 percent of what everybody on the Net sees -- is an astounding concentration of power that we probably haven't seen since the Roman Empire, in the sense that this is worldwide -- we have as many people using our panel in Japan as Media Metrix has in the United States. So worldwide people are looking at 10 Web sites. Those companies have an astounding ability to put things in front of people that can influence them -- whether they do or not, who knows -- so it's not like there's just CBS, NBC, and ABC in the United States. These top ten are worldwide. So it's an astounding concentration. And, you know, you can extend that group -- maybe not just the top ten, but the top hundred is forty percent -- that's a lot. There's a tremendous concentration.
Then there's the long tail. That's a sign that there are people who do make niches out of things -- the top one millionth Web site might be the absolute best Brazilian stamp collector site. That there are niche players that are still important, that are down in the hundred thousand to million ranking of Web sites, which is kind of the original dream of the Web -- that, if you have something good to say, you'll find your audience, and they'll find you.
FEED: That's the story of FEED!
KAHLE: (laughs) Exactly. FEED is an example of something that probably couldn't have existed in the land of the print distribution nightmare. So the tail is astoundingly long. So we're not in a world where you have to be in the top ten or you lose completely. The other point is that the line is flat -- now, why is that interesting? It means that there's class mobility, that if you are the 100,000th Web site there's nothing really startling to stop you from becoming the top 100th. You just have to be better to more people. So having a flat curve means that there's class mobility, I think. And we've studied Web sites that have broken into the top 100. And where did they come from? There are some portals in foreign countries that are breaking into the big time. So -- Italy and Korea have had three portals break into the top 100 in the last eight months -- coming from pretty far down. So welcome to the Net, Italy. We can see them turning on, and we can see when different countries really start to rival other countries in terms of their Net penetration.
FEED: One of the things that's always been amazing about Alexa, and I think that people are increasingly realizing the power, is not just that you're able to see all this information about traffic patterns but that information slightly processed is being fed back to the users.
KAHLE: It's a big give and take.
FEED: And is there more stuff you'd want to do in that way? In a way, that's kind of what cities do: They say, "Look, there's this pattern over the last 50 or 100 years of all the artists tending to move to this neighborhood." What a city does is make that pattern visible to people who are visiting for the first time -- and that pattern changes their behavior accordingly because they either want to be around the artists or they want to avoid them. Can the Web do more of that sort of thing?
KAHLE: Oh, yes. Auto-cataloging is the only way to scale. It costs forty-five dollars to catalog a book in terms of just taking author, title, when it was copyrighted, what subject index should it go into. Forty-five dollars! The Web is about twenty million different sites in terms of content areas that sort of make sense to catalog -- and it's growing at an astounding rate. That would mean if you tried to catalog it by hand and tried to scale it to the size of the Net, that you'd have to spend almost a billion dollars to catalog the Web today. And a year from now, you'll have to spend another billion dollars. And it won't be up to date. So we needed new techniques. The search engine work that was done in the sixties by Gerard Salton and Mike Lesk -- phenomenal work. It's been doing great. But if we're really going to get an idea of what the Net looks like -- when every suburb of Denpasar is on the Web and their soccer team schedules are on the Net, and you're trying to find where the game is, how are you going to find it by typing a few key words? You're not. You're going to have to have these tools that go and say: "these are the suburbs of Denpasar on our Web sites." It has to be automated, otherwise my worst nightmare is it all becomes five thousand channels of nothing on the Web.
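The "almost a billion dollars" figure follows directly from the two numbers Kahle gives. As a rough check, assuming (as he does) that cataloging one Web site by hand costs about as much as cataloging one book, the arithmetic can be sketched like this. The constant and function names are illustrative only.

```python
COST_PER_ITEM_USD = 45         # quoted cost to catalog one book by hand
SITES_TO_CATALOG = 20_000_000  # quoted number of sites worth cataloging


def manual_catalog_cost(items: int, cost_per_item: int = COST_PER_ITEM_USD) -> int:
    """Total cost in dollars of cataloging every item by hand at a flat rate."""
    return items * cost_per_item


total = manual_catalog_cost(SITES_TO_CATALOG)
print(f"${total:,}")  # $900,000,000, i.e. "almost a billion dollars"
```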
FEED: Do you think that Yahoo gets this?
KAHLE: I like Yahoo a lot. I think Yahoo has done a consistently great job of packaging the Web experience and making it accessible to everybody. And they've stayed pure to the Web in many of the aspects that I love about the Web, like directing people off-site. It doesn't feel like I'm in a cage when I'm on Yahoo. They do inject technology into their world, but they seem to be mostly a portal, a medium. And as these technologies mature, I'm sure they'll start leveraging it if it's useful to people. But they don't seem to quite take advantage of trying to trap you in Yahoo world as much as one could imagine.
FEED: What are you doing to support all this? What's the infrastructure now that you have?
KAHLE: Most of the machines are actually in this building. All the service machines are in a colocation facility. So those are more reliable in the sense that you don't get hit by the blackouts that can knock out the couple of connections that we have into this building. So the data mining goes on here, but the actual service is operated out of the colocation facilities. And it's just banks and banks and banks of machines.
We now have about thirty terabytes of archival material that we data mine. And that's 1.5 times the size of all of the books in the Library of Congress. So we're now at an interesting point, we're now beyond the largest collection of information ever accumulated by humans. We've gotten somewhere! [laughs] We use as our original inspiration the Library of Alexandria. Because they were the first people that tried to collect it all. And they started to actually understand the intersection between completely different self-consistent belief systems. They knew what the Egyptians, Romans and Greeks, Hebrews, Hittites, Sumerians, Babylonians -- they knew the mythologies, because they had it all in one place. And they had the scholars to stare at it and try to make the disjunctions conjunctions and start to get an idea of what humans are. The dream is that we're in another one of those positions. They got up to five hundred thousand books. Of course, they were scrolls. The Library of Congress -- the largest library now -- is seventeen million. Only thirty-four times more than what we had in 300 B.C. It indicates that the technology hasn't scaled. But now we've broken through into a new technology that allows us to bypass the Library of Congress in very little time, and the sky's the limit. What can we discover about ourselves as a species? As different peoples? Are we couch potatoes or do we actually have independent will? Do we have interests that go beyond the fifteen demographics of slotted marketing hell? And what we're finding is, people are interesting, diverse and peculiar. They are constantly looking for new things that are of interest to them.
FEED: How do you measure that?
KAHLE: The number of different Web sites people go to. It's the long tail, and it's a growing tail -- people don't just find the five Web sites in general. People think they do -- if you ask people, what are the five Web sites you go to, there'll be five of them on that list 'cause that's kind of all you can remember, plus or minus two or something. But, in general, people do stray around. And especially if we can keep the diversity and the quality of the Internet in the public sphere, we can develop a really much more interesting culture -- just because there's more available for people to build on and grow from.
FEED: Is there anything in the basic architecture of the Web that's missing that you wish had been put in?
KAHLE: Oh yeah, absolutely.
FEED: What would be at the top of that list?
KAHLE: A business model -- at a small-scale publisher's site. Minitel is the system in France that was absolutely fantastic in the early eighties, where they put in all these terminals. They were trying to build these terminals into five to six million homes in France, and they made it really drop-dead easy to make a service. You could basically take an IBM PC, you get this special card from Minitel, and you could be a server. There were sixteen thousand servers in 1988. And if people went to those servers, they got charged, kind of like a 900 number. But the prices often can be quite low. The popular sites made money. And when we came out with ways that...the Web, it all came out of the wrong places. And the people that did have an ability to put in a business model didn't extend it to the Web. And a lot of the economics had turned into something quite bizarre -- in which the advertising world tends to benefit the large-scale publishers. And you tend to have a collapse of the number of those publishers over time, based on the dynamics of ad sales and the like. But the royalty system of books has preserved a diversity of book publishing that is unparalleled in magazines, newspapers, video.
FEED: Is it too late to insert that somehow?
KAHLE: No, but it will be difficult. It will have to be seen as in the interest of the big people. But then I think you can cause another level of renaissance. But with the invention of the book, the royalty structure took till 1600. It took a hundred and fifty years; you get all these complaints of Voltaire not making money or Cervantes dying a pauper even though he published the most popular book of his time.
FEED: Generally the venture capitalists don't like it when you tell them that it's about a hundred and fifty year cycle that we have to go through until the model works.
KAHLE: I hoped we could have learned from it and done things a little bit faster. But we screwed up. We didn't make it easy for small-scale publishers to get paid. And I think the right place to tax is the ISPs. Because the ISPs provide the end-user access for a fee. They've got a billing relationship and if we set up something like an ASCAP, public clearance center, that would allow the distribution of some percentage of what's collected from the users back to the content, that makes it such that people want to be online. Right now, people are paying all of their money to use ISPs but the ISPs don't have to pay for the content.
FEED: So how would you get to that? Would you regulate it or would you just start blocking, would you organize all the content sites and block ISPs that didn't participate?
KAHLE: I don't know. And it's gruesome. The development of ASCAP is a union-style story -- there were, you know, windows being broken, arms being broken, it was a bad news sort of situation to get it going. It's usually a lot easier to do it early on. AOL is in probably the best position to start it up. But why should anybody be first? If the content is free, then why pay for it? In fact, AOL goes the next step of, "Shouldn't they pay us to get to our users?" So I don't think it's going to come from them.
FEED: What can you tell me about what you're working on now?
KAHLE: Extending Alexa into the realm of helping people with products. So if they're shopping on the Net or looking up information on a product, instead of just information about Web sites and Web pages, extend it so that it's information about products that are on Web sites and on Web pages. Because that's a very sticky feature: If we can help save you money or have you make a better purchase faster by using our free widget, then people will like us. And so we've been spending a lot of time trying to understand what are our products and where are they. So we just did a data mining pass, we've got this parallel cluster with the archive of the whole Net. Looked for all the ISBN numbers all over the Net, and there were about 550 million unique pages in the collection we were looking over. Those are unique pages. There were about 56 million instances of an ISBN number, and of those 56 million pages that had some ISBN number, a vast majority were either Amazon pages or pages that point to Amazon. But there are about 10 million other ISBN numbers all over the Net. We can help people when they're on those pages be able to find whether it's available on Amazon, Barnes & Noble -- you know, books anywhere -- and compare the prices. So the idea of Alexa is you can basically find information about the products on the Web page. We're in alpha-test now, and we're going to be launching much more actively in October, November, December.
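Kahle does not say how the data mining pass recognized ISBNs, so the following is only a sketch of one plausible approach, not a description of Alexa's system. It assumes each page is available as plain text, looks for the label "ISBN" followed by ten characters, and keeps only candidates that pass the standard ISBN-10 check-digit test (the weighted sum of the digits must be divisible by 11). The pattern and function names are hypothetical.

```python
import re

# Candidate pattern (an assumption): the label "ISBN", optional punctuation,
# then nine digits and a final digit or X, optionally separated by hyphens
# or spaces.
ISBN10_PATTERN = re.compile(r"ISBN[\s:#-]*((?:\d[\s-]?){9}[\dXx])")


def is_valid_isbn10(candidate: str) -> bool:
    """Return True if candidate passes the ISBN-10 check-digit test."""
    chars = [c for c in candidate if c not in " -"]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "Xx":
            if position != 9:  # X (value 10) is only legal as the check digit
                return False
            value = 10
        elif char.isdigit():
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


def find_isbns(page_text: str) -> set[str]:
    """Collect the distinct, checksum-valid ISBN-10s mentioned in a page."""
    found = set()
    for match in ISBN10_PATTERN.finditer(page_text):
        normalized = re.sub(r"[\s-]", "", match.group(1)).upper()
        if is_valid_isbn10(normalized):
            found.add(normalized)
    return found


sample = "In stock: ISBN 0-306-40615-2. Mistyped: ISBN 0-306-40615-3."
print(find_isbns(sample))
```

Run over a whole archive, the set returned for each page could be counted to separate pages that mention any ISBN from the distinct ISBNs found, which is the distinction Kahle draws between instances and unique numbers.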
FEED: Are you having fun being part of Amazon?
KAHLE: Love it. They're really good people. At some point you have to love your work and you have to love your coworkers. And the people at Amazon -- they have this gonzo, go for it, you know, "how hard could it be?" attitude. And we love being here in the Presidio. A setting helps. If you're going to think big thoughts and new thoughts, putting your company in a national park in the middle of San Francisco is going to make you think.
© FEED Inc. 2000