Technology Blog Valuations – Getting to be Real!

Update 2: Rafat has a comment to this post pointing out that by just looking at paidcontent.org I am doing the valuation of ContentNext a disservice. Of course he is quite right. ContentNext has other sites and also events. It is also true to say – although Rafat doesn’t – that valuation has many variables, including the quality of the people etc. Rafat is very good at what he does and he has a great team. So … fair point Rafat.

In my own defense, this post is not intended to be a scientific analysis of valuation. I did a “back of the envelope” comparison. I didn’t take into account any of the other sites that GigaOm has, or TechCrunch, or ReadWriteWeb. I also didn’t take into account TechCrunch events. All I was saying is, there are probably (by relative comparison of the web sites) some pretty valuable businesses out there right now. Hope you agree with that Rafat.

Update: Kara Swisher is speculating who’s next. Jeff Jarvis is hoping she’s wrong. Now there’s a Techmeme discussion.

The news that Rafat Ali’s ContentNext, owner of PaidContent.org, has been acquired by the UK’s Guardian Media Group got me to thinking. What does this mean for the valuations of other Tech blogs?

I did a quick back of the envelope calculation based on the numbers published and the Compete.com stats for June 2008.

By this math PaidContent.org got something like $139 per unique reader or $56 per visit as an acquisition price. Of course the Compete stats will not be wholly accurate (Quantcast has PaidContent.org at only 40,000 unique visitors, so Compete could be high).

Using Compete.com for 4 other significant technology blogs we get some interesting numbers. TechCrunch should be valued at between $200m and $450m; GigaOm at between $46m and $55m; ReadWriteWeb at between $63m and $65m; and VentureBeat at between $50m and $53m. I’d say a merger between these 4 would bring them collectively up to about $350-500m even without the synergies and growth prospects of being one. I also looked at the search analytics data from Compete.com: 4,563 keywords for TechCrunch, 585 for GigaOm, 913 for ReadWriteWeb, 581 for VentureBeat and 363 for PaidContent.org. Interesting indeed.
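For anyone who wants to repeat the back-of-the-envelope math, here is a minimal sketch in Python. It assumes each range comes from applying the two PaidContent.org multiples ($139 per unique, $56 per visit) to a site's monthly traffic and taking the lower and higher result. The function name and the traffic figures are made up for illustration and are not Compete.com data.

```python
# Multiples quoted in the post for the PaidContent.org acquisition.
PRICE_PER_UNIQUE = 139  # dollars per monthly unique visitor
PRICE_PER_VISIT = 56    # dollars per monthly visit


def valuation_range(monthly_uniques, monthly_visits):
    """Return (low, high) implied valuations in dollars.

    Assumption: the range is simply the smaller and larger of the
    per-unique and per-visit estimates.
    """
    by_uniques = monthly_uniques * PRICE_PER_UNIQUE
    by_visits = monthly_visits * PRICE_PER_VISIT
    return min(by_uniques, by_visits), max(by_uniques, by_visits)


# Hypothetical traffic numbers, chosen only to show the arithmetic.
low, high = valuation_range(monthly_uniques=500_000, monthly_visits=1_000_000)
print(f"${low / 1e6:.1f}m to ${high / 1e6:.1f}m")  # prints "$56.0m to $69.5m"
```

Plugging in a site's real monthly uniques and visits gives its own range. As the update above says, this ignores everything else that goes into a valuation.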

I am adding some graphics from Compete.com (all from this URL).






Disclosure: I am a shareholder in TechCrunch – along with Mike Arrington.

edgeio has announced the paid content platform. Distributed Commerce meets Web 2.0

Press release for download

Product Description given to Gnomedex Attendees

News from the Web on this:

Digg – here
Techmeme – here and here
ReadWriteWeb – here
TechCrunch – here
Venturebeat – here
Gnomedex – here
Jeff Jarvis – here
Dan Farber – ZDNet – here
Rob Hof at BusinessWeek – here
Bub.blicio.us – here
Mashable – here
Forecast Blog – here

It has been a great effort by the team at edgeio to get this launched. The company now has 6000 advertisers who will, by September, have 29 million classified ads in the system, and with the launch of “Classified Boards” in March recruited its first 1000 publishers. Now with “Transactional Classifieds” the number of publishers who can use edgeio will grow enormously. The vision of a Classified Ad Network for the Internet is one step closer.

Debating Andrew Keen tonight

I will be at the Campbell Barnes & Noble tonight debating author Andrew Keen and his debating partner Nick Carr.

Joining me will be Steve Gillmor and the moderator is Dan Farber of ZDNet. Here is an excerpt from Andrew’s Blog.

Nick Carr’s Big Switch by ZDNet‘s Andrew Keen — Nicholas Carr, amongst the most incisive and profound critic of information technology, will be in Silicon Valley tonight (7.00 pm), at Campbell’s Barnes and Noble bookstore in conversation with ZDNet honcho Dan Farber, Edgeio co-founder/CEO Keith Teare, and Gillmor Gang ringleader & Podtech exec Steve Gillmor, and me. While the event is ostensibly to discuss […]

TechCrunch20 web site goes live

It seems like a long time ago that Jason Calacanis and Mike Arrington announced their intention to host a conference for startups in the fall. It was at the DEMO conference at the start of the year. Well ….. TechCrunch20 is now real and today the web site went live with more details of the event.

“The format is simple: Twenty of the hottest new startups from around the world will announce and demo their products over a two day period at TechCrunch20. And they don’t pay a cent to do this. They will be selected to participate based on merit alone.”

The venue is the prestigious Palace Hotel on New Montgomery Street in San Francisco.

Although free to companies, the event is not free for attendees. 2-day ticket prices, based on availability, are $1,995 through July 15, 2007 and $2,495 through September 10, 2007. There will be a limit on the numbers attending so get yours now.

It’s true – TechCrunch and F***edCompany to do press release tonight.

Another deal I have had to keep under wraps….. TechCrunch and F***edCompany are set to release news of a merger tonight at around 9pm Pacific Time. Mike has blogged it early due to rumors circulating on the web. Techmeme has it here.

It is true. Heather Harde is new TechCrunch CEO

Om Malik has the scoop but this is something I have been keeping under my hat for several weeks. Heather Harde, former Fox Interactive Media executive, responsible for Mergers and Acquisitions, is to be the new CEO of TechCrunch.
———————————————————————-

UPDATE

Mike has now confirmed the news here.

———————————————————————-

Heather was a key figure in Fox’s strategy to acquire the key assets needed to turn itself into a major Internet presence. She is accomplished, charming and as sharp as a razor. It’s a major coup for Mike and TechCrunch to recruit her. I believe it augurs well for TechCrunch that Mike has decided that the continued growth of his amazing venture requires the services of a hands on senior operational executive.

Until now I was the only other shareholder in TechCrunch besides Mike (a fact that dates back to our 2005 collaboration in Archimedes Ventures and edgeio). I can’t say how thrilled I am to welcome Heather into the company. I got to know her in my role at edgeio and have found her to be a straightforward, highly observant, passionate and focused person. I don’t know anybody with a bad word to say about her. The potential of TechCrunch is being realised every day. RSS subscribers, unique visitors, advertising revenues, job listings (edgeio hosts CrunchBoard) and every other measure shows this. But the potential is far greater still. Heather and Mike will be the team to realize the vision and take it to a new level. Congratulations to both.

Links

PaidContent
Techmeme

I’ve been “tagged”

As the title says I have been tagged by Dave Winer.

The rules say I now have to tell you 5 things you didn’t know about me and then tag five others.

So, here goes:

1. I am currently in St James, Cape Town, S Africa. It is a small area between Muizenberg and Fish Hoek (see map).
Cape Town Area
2. I own a home here – on Jacobs ladder.

3. My wife is South African – Gené McPherson. Born in Jo’burg. Her parents and one of her sisters live in Cape Town today. Gené was a co-founder of Cyberia [free subscription needed] (the world’s first Internet Café – London 1994). She was also VP Marketing at RealNames. She is now a Mom – and a great one.

4. We have a new son – born 4 November. Luke Graham Teare. This is the first time his grandparents have seen him and he them. Then again, it is pretty much the first time he has seen anything :-).

5. I am the oldest son of 5 brothers and a sister. Two of my brothers died (one in his first year and one at 37). So there are 3 brothers and a sister remaining. My Mom is still alive and living in Scarborough, North Yorkshire. She is 72 and I am 52. My brother Brian is CTO at cscape.com, which I started in 1983.

I am tagging Ivan Pope; Gabe Rivera; Auren Hoffman; Michael Tanne and Richard MacManus

The Pareto Principle is nonsense.

In response to the current discussion on Techmeme and TailRank, hipmojo writes that the Pareto principle is in play on the internet and that, no matter how much we want it to be otherwise, 80% of online advertising will go to 20% of the web sites.

When the dust settles, the top 20% of websites will get 80% of ad revenues. It’s that simple. Portals might change in shape, form or nature, but whatever they represent loosely will still get the bulk of revenues and traffic.

With respect, that is nonsense. Since the advent of Google Adsense the shape of internet advertising spend has mirrored the flattening of traffic I speak of on the edgeio blog. Almost half of Google’s revenue comes from Adsense. And about 75% of the dollars earned through Adsense stay with the publishers whose sites the ads run on. Clearly the lion’s share of the money spent through Google is shared about 50-50 with the publishers in the “foothills”.

It may be worth listening to the Google Earnings calls on Earningscast to validate this.

That is why Google talks so much about “inventory”. That is, traffic from outside google.com. The size and cost of this inventory is a major variable and the need to grow it helps us to understand deals like the one with YouTube.

If you roll the clock back to the pre-Adsense days when DoubleClick ruled, and online advertising was only going to large sites, it is a huge change in monetization and traffic flows. Give Google credit for this.

One of the things my piece argues is that there is a new trend on top of this established one – publisher monetization of their own content through direct relationships to advertisers (job boards, sponsorships and Techmeme like ad units being examples).

Sure the portals are still big but the collective foothills are as big now, and will be a lot bigger in the future.

De-portalization and Internet revenues

Last week Fred Wilson did a post on a phenomenon he called de-portalization. I think he is right on the money.

I just posted a piece on the edgeio blog that picks up on that theme and discusses the consequences of the trend.

The top 10 consequences are:

1. The revenue growth that has characterized the Internet since 1994 will continue. But more and more of the revenue will be made in the foothills, not the mountains.
2. If the major destination sites want to participate in it they will need to find a way to be involved in the traffic that inhabits the foothills.
3. Widgets are a symptom of this need to embed yourself in the distributed traffic of the foothills.
4. Portals that try to widgetize the foothills will do less well than those who truly embrace distributed content, but better than those who ignore the trends.
5. Every pair of eyeballs in the foothills will have many competing advertisers looking to connect with them. Publishers will benefit from this.
6. Because of this competition the dollar value of the traffic that is in the foothills will be (already is) vastly more than a generic ad platform like Google Adsense or Yahoo’s Panama can realize. TechCrunch ($180,000 last month according to the SF Chronicle) shows how much more a publisher can make by selling advertising and listings directly to target advertisers than by relying on an advertiser-focused middleman like Google.
7. Publisher driven revenue models will increasingly replace middlemen. There will be no successful advertiser driven models in the foothills, only publisher centric models. Successful platform vendors will put the publisher at the center of the world in a sellers market for eyeballs. There will be more publishers able to make $180,000 a month.
8. Portals will need to evolve into platform companies in order to participate in a huge growth of Internet revenues. Service to publishers will be a huge part of this. Otherwise they will end up like Infospace, or maybe Infoseek. Relics of the past.
9. Search however will become more important as content becomes more distributed. Yet it will command a smaller and smaller proportion of the growing Internet traffic.
10. Smart companies will (a) help content find traffic by enabling its distribution; (b) help users find content that is widely dispersed by providing great search; (c) help the publishers in the rising foothills maximize the value of their publications.

Discussion

Kevin Burton
Techmeme
Mike Arrington
Syntagma
Dan Farber at ZDNet
Mark Evans
Fred Wilson
Ivan Pope at Snipperoo
Tech Tailrank
Collaborative Thinking
David Black
Surfing the Chaos
Ben Griffiths
Dave Winer (great pics)
Kosso’s Braingarden
Dizzy Thinks
Mark Evans

Is scraping and crawling stealing?

A spat has blown up over the weekend regarding Oodle and Vast.com “scraping” content from 3rd party sites and re-purposing it inside their environments. This essay is my reaction to the spat. As a founder of edgeio I clearly have an interest in the answer to the question. edgeio does not scrape or crawl. All of its content is permission based (published using the “listing” tag; uploaded directly into edgeio OR published on edgeio directly to a personal listings blog that we host).

However, there is more at stake here than competitive issues between edgeio on the one hand and Vast/Oodle on the other. The wider issue is whether or not scraping (which is very like crawling and indexing except it reads displayed content not files) constitutes stealing of data.

The following is taken from an article on ClickZ:

“This is called stealing content…there’s no advantage to me to have them steal,” commented Laurel Touby, founder and CEO of media industry site mediabistro.com, upon learning that Vast.com had linked from its search results to full mediabistro.com job listings pages, even though those pages require registration when accessed on the mediabistro.com site.

Vast.com CEO Naval Ravikant said Vast.com’s crawlers do not automatically register or login to sites, so they must have found passage through the mediabistro.com system via a legitimate entryway.

So let’s try and address this broader issue. Firstly this is a new discussion. Nobody accuses Google of stealing the data that is in its index (except book publishers of course). Why not? Well, because Google primarily indexes the “visible” web. That is to say, sites that are linked to from other sites and are not behind a password protection system of any kind; and even then it respects directives in a file called robots.txt where a publisher can ask not to be indexed. And secondly, Google does not display entire documents (although its cache is getting very close to doing so and may give rise to similar discussions in future). Rather it points to the original source for reading/viewing the content. Thus the business model of the original publisher is left intact.

With the emergence of vertical search aggregators, especially in the commerce space, the issues of ownership and permission become far more pronounced. Why? Because the data represents an inventory, and often an “invisible” web inventory – that is to say, behind a password protected site. The effort to aggregate that inventory into a central marketplace is done without permission of the owner of the inventory. Whether password protected or not this is going to give rise to disputes like the one between Craigslist and Oodle a little while ago.

There is no need to invent new means of dealing with this. But there is a need for good behavior. Crawlers should always respect robots.txt. Scrapers are different. The spiders can read displayed content directly and do not crawl the file system. As such they can bypass robots.txt. If scrapers respected robots.txt then a publisher could effectively put its content out of the reach of the crawlers. It isn’t clear at this point whether the scrapers do respect robots.txt files. A better solution is to use RSS for syndication rather than crawling and scraping. More on this below.
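To make "respecting robots.txt" concrete, here is a minimal sketch of the check a well-behaved crawler or scraper could run before fetching a page. It uses Python's standard urllib.robotparser module. The user agent name, the example.com URLs and the robots.txt rules are invented for illustration and do not describe how Oodle, Vast.com or anyone else actually works.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt from a publisher that wants its listings left alone.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /jobs/
"""

print(may_fetch(EXAMPLE_ROBOTS, "example-scraper", "http://example.com/jobs/123"))  # False
print(may_fetch(EXAMPLE_ROBOTS, "example-scraper", "http://example.com/about"))     # True
```

The check is voluntary. Nothing in robots.txt stops a scraper that chooses to ignore it, which is why good behavior matters.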

The second issue is whether the item level link from a result set points to the original source or to a hosted copy of the original. Oodle and Googlebase had a difference of opinion about this issue. Content publishers will care what the answer is.

The third issue with scraping is a quality issue. On its home page Vast.com states:

All results are automatically extracted by crawling the web
Vast.com cannot guarantee the accuracy or availability of the results

And the oodle blog notes that its index:

…only includes listings that are fresh and relevant: we keep track of all the listings we’ve seen and auto-expire old ones that are still online and exclude things that look like listings but aren’t (reviews, spam, etc.).

The issue here is twofold. To stay current with a live inventory of listings is hard. To even attempt to do so creates a need to crawl and index very aggressively, and the results are often not good. Craigslist’s gripe with Oodle was at least in part driven by its experience with Oodle’s crawlers. They were apparently polling and sucking content very aggressively and needed to in order to stay current. If you do not poll aggressively your index gets even more out of sync with the original source than it already is.

It seems to me that RSS is a custom made solution to these problems. Scraping and Crawling are the wrong tools.

If publishers who wish their content to be syndicated to a third party publish an RSS feed and the third party consumes the feed we have a) a permission based syndication system; b) a real time ability to update inventory. edgeio made the decision to follow the Craigslist model whereby a listing is explicitly requested by a publisher. Publishers of listings, from your Mom to a large site, are a community, made up of many smaller communities. A central listings service (CLS) should be a service to that community. Permission based, real-time publishing, via RSS, is the right tool for the job. Over time this is a highly scalable solution. Publishers can opt in and out at will.
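As a rough illustration of what consuming a feed involves, here is a minimal sketch that reads listings out of an RSS 2.0 document using only the Python standard library. The sample feed, the example.com URLs and the function name are invented for illustration; this is not edgeio's feed format or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed that a publisher has chosen to make available.
SAMPLE_FEED = """\
<rss version="2.0">
  <channel>
    <title>Example listings</title>
    <item>
      <title>Bicycle for sale</title>
      <link>http://example.com/listings/1</link>
      <guid>http://example.com/listings/1</guid>
    </item>
    <item>
      <title>Room to rent</title>
      <link>http://example.com/listings/2</link>
      <guid>http://example.com/listings/2</guid>
    </item>
  </channel>
</rss>"""


def read_listings(feed_xml):
    """Return the feed's items as a list of dicts with title, link and guid."""
    root = ET.fromstring(feed_xml)
    listings = []
    for item in root.iter("item"):
        listings.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid", default=""),
        })
    return listings


for listing in read_listings(SAMPLE_FEED):
    print(listing["title"], "->", listing["link"])
```

Because the publisher controls what is in the feed, removing an item (or the whole feed) is how it opts out, and the link in each item points back to the original source.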

I predict many more accusations of stealing insofar as the industry continues to mine the “invisible” web, and the specialist web, via scraping and crawling.

And finally, edgeio publishes RSS feeds of every item (either individual items, or our entire inventory). Oodle and Vast are not competitors, but distribution partners. Our data is more valuable insofar as more people see it. That will happen if the data is placed in more environments. So, take it, for free. But please, do not scrape it or crawl it. Just read the RSS feeds. That is why we have them.

Last word goes to Andy Beal’s Marketing Pilgrim blog. After reading the ClickZ piece he says:

Ouch!

Certainly Vast is not alone in convincing classified sites that they’re helping them bring new visitors, but if the classified search engines are to see a bright future, they’ll need to secure strong partnerships with their partner sites.

My emphasis!

Update:

Tech.memeorandum link for this subject.