
September 03, 2007

Who's Counting?

Digg and Revision3 are not my first Internet businesses, but they are entirely different from my previous ventures in how success is measured by everyone around us.  This is the first time I've been hostage to outside metrics companies, with which I usually have no relationship, when it comes to telling others how well I'm doing.  That annoys me, of course, but "welcome to the web," analysts, reporters, and my investors tell me.

Equinix is a public company, so it was well established (and reinforced with SOX) that outside consulting companies, acting as independent auditors, would determine the accuracy of our reports.  Even running a private company, I can still hire consulting firms to come in and do the same thing where revenue or valuation is concerned.  Still, for the most important metric of all, usage (generally viewed through page views and unique visitors), there are no audit options.  I have to rely on companies that have performed no technical audit of what is going on, providing their viewpoints through flashy websites and questionable panel methods.  Doesn't anyone else see this as strange?

Well, before I get too deep into that, remember the purpose of publishing these numbers is ultimately twofold:  One, to tell the public how well we're doing.  Two, to tell the advertisers how many people will see their ads, to help establish a price/market for our ad inventory.

The funny thing about the first one is that page views and unique visitors aren't a perfect view into success, because they don't mean the same thing for every site.  A blog with 20,000 visitors may consider itself hyper-successful; it really depends on the site's purpose and the niche quality of the site.  For example, if I were to do e-commerce selling rare vacuum tubes, and most of the specialists who order those things used my web site, I'd have penetrated my market perfectly.  If Digg were only a tech site, my market would be considerably smaller than it is, etc.  Nevertheless, it is true that measuring change over time, or in our case, growth quarter over quarter (month over month tends to be a bit too granular because of seasonal changes), is a pretty good indicator of success.

The second issue is typically measured through page views, though it's not exactly what advertisers want.  It is true that the greater the page views, the greater the ad impressions.  What advertisers ultimately want to know, though, is how many ad impressions they will push and, even more important, how well targeted those ads are.  You can't really get that data from page views; you have to do more direct research.  A great way is via the ad networks themselves, who know better than anyone how many ads are being served.  Honestly, I find when I speak with advertisers that they trust themselves the most: they do test campaigns, measure results, and know for certain the effect of their ads on a particular audience.

So now we're back to the issue of how these statistics are measured.  I've read countless articles over the past two years on the problem with external panel-based measurement, in publications such as BusinessWeek and Fortune.  We've all heard about it: the trust in these metric companies is gone.  At this point, I've had conversations with just about every major media company online, and they all agree that panel-based measurement doesn't work, particularly for niche-targeted web sites.  The problem is fundamental:  Let's say you're a targeted, niche site (this is not about Digg, this is just a made up example).  If 5% of the general population, tens of millions of people, are your audience, and at best only 5% of a typical panel comes from that audience, you know you'll show up in less than 5% of their results.  Further, since it's such a small portion of the sample base, that 5% isn't really a diverse representation of those tens of millions.  You would need a panel designed for your site, which is a fairly subjective thing to build for an ever-changing audience.  It's simply not the right way to do it.  A panel can measure what your demographic is fairly well, which is important, but it can't really measure usage.
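
The small-sample part of this argument can be illustrated with basic sampling arithmetic.  What follows is only a rough sketch, not how any metrics company actually models its panel: it assumes an idealized simple random sample, and the panel size and audience shares are made-up numbers.  The function name panel_estimate_noise is hypothetical.  It shows that the fewer panelists who belong to a site's audience, the noisier any estimate built on them becomes; it says nothing about recruitment bias, which is a separate problem.

```python
import math


def panel_estimate_noise(panel_size, audience_share):
    """Return (expected panelists in the audience, relative standard error).

    Assumes simple random sampling, so the number of panelists who belong
    to the audience is binomial(panel_size, audience_share).
    """
    expected = panel_size * audience_share
    # Relative standard error of the estimated share: sqrt((1 - p) / (n * p)).
    relative_se = math.sqrt((1 - audience_share) / expected)
    return expected, relative_se


panel_size = 100_000  # hypothetical panel size, not a real vendor's figure
for share in (0.05, 0.005, 0.0005):
    expected, rel_se = panel_estimate_noise(panel_size, share)
    print(f"share={share:.2%}  panelists={expected:.0f}  relative error={rel_se:.1%}")
```

Under these assumptions, the relative error grows as the audience share shrinks, which is one way to see why a general-purpose panel is a poor fit for a niche site.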

The issue is with credibility.  ComScore and Nielsen/Netratings, largely for historical reasons, are assigned a certain trust level regardless of the open outcry of the failure of panel-based reporting.  The alternatives are young or flawed themselves... One metric company I spoke with claimed to be more accurate through the use of sampling aggregate ISP backbone pipes.  I could go into the technical and statistical reasons why this sounds good on paper but doesn't work, but during their presentation to us, I didn't need to because they revealed they didn't target any business traffic, so they missed people surfing the web at work.  These guys want credibility through an alternative approach, which is awesome, but they can't break through the age and tradition of using comScore.  Also, considering so many people hit Digg from work, I wasn't thrilled that this demographic wasn't important to them.

These panel methods or sampling methods by definition would need panels that fit the profile of your userbase, or would have to somehow adjust for your niche in their modeling.  To date, I haven't seen anyone come close to accuracy using these tricks, so we can't trust these numbers when doing our own market research.  When I see a panel repeatedly match the real numbers, then I'll reconsider my opinion.

The alternative approach, championed by Quantcast, is to use panel-based methods for the mass, and for those who subscribe (for free, I might add), they'll measure using the accurate pixel-based method (where they put a pixel on each page that they can track directly).  It's not a bad approach, but having any panel-based results set side by side with the direct measurements drops the credibility of the direct measurements.  Still, I like these guys as far as external services go.

Then, of course, there are the infamous toolbar-based metric systems.  About the only websites these are good for are the ones that everyone on earth uses, because very few Digg users are the type of people who would want someone watching what they were doing.  The more popular Digg becomes, the less likely these are correct.   Maybe... Unless going more mainstream means more users willing to use toolbars, but I doubt it.

Anyway, the absolute best and most accurate way to know how well a website is doing is to just plug right in.  Just as auditors come into the offices of publicly traded companies to check everyone's accounting, so could the same auditors come in and compare WebSideStory or Omniture statistics, ensure they are correctly configured, and give their stamp of approval.  What will it take to move to this more accurate method of reporting?  It's already happening... A number of us who run websites are starting to work together to plan these audits, because we're tired of inaccurate numbers.

For example, Digg did 18.5 million unique visitors in July 2007, as measured by WSS.  Remember, WSS uses a pixel and is a third-party service, so we're not talking about "internal logs."  This also doesn't take into account RSS feeds (which are important for measuring success, but ignored by most making comparisons) or the Digg buttons syndicated all over the planet.  In an article in Fortune making similar points to this blog entry, they even got it wrong, citing the number as 10.5 million.  [Editor's note:  10.5M is correct for U.S. visitors only, my bad.]  ComScore still says 4 million.  We Digg employees look at each other and just shake our heads.  Essentially, the numbers being traded around about how many people visit Digg are completely wrong.  Stop the insanity.

At least with websites like Digg there is some common metric as defined by a browser-based ad impression.  For Revision3, anyone looking at the website is missing the point:  Revision3's success is not measured by how well the website is doing, but rather by how many ad impressions are viewed when people watch the episodes.  70% or more of the people watching Revision3's shows do not watch them on the website (something the folks there are working on making more attractive, by the way)... but rather receive the shows via RSS, such as with iTunes, and thus skip the web altogether.  How do we measure these impressions?  Right now, the common method is to measure full downloads (Revision3 tracks over 1M downloads a month, for example) that repeat.  The RSS readers stop downloading if you don't watch, so there is some trust in a repeat download.

PodTrac is a great start, and Revision3 continues to experiment with them, as they also measure downloads and views.  However, like panel-based measurements of websites, the devil is in the details.  Some media players download files in chunks at a time, and thus show up as multiple "hits" on a download server.  Revision3 takes this into account by dividing the total amount of bits downloaded by the file sizes in question.  Most tracking services measure based on how often you use a particular URL to establish a download, built into the enclosure, but this will increase artificially with these weird players.  I'm not sure if PodTrac takes this into account, it may, but I'll let the wizards at Revision3 do the analysis themselves by comparing the numbers.
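
To make the chunked-download correction concrete, here is a minimal sketch of the idea of dividing total bytes served by file size.  It is not Revision3's actual code: the function name estimate_downloads, the log layout, and the file name and sizes are all assumptions made up for illustration.

```python
from collections import defaultdict


def estimate_downloads(log_entries, file_sizes):
    """Estimate full downloads per file as total bytes served / file size.

    log_entries: iterable of (file_name, bytes_sent) pairs, one per request.
    file_sizes: dict mapping file_name to its full size in bytes.
    """
    bytes_served = defaultdict(int)
    for file_name, bytes_sent in log_entries:
        bytes_served[file_name] += bytes_sent
    return {name: total / file_sizes[name] for name, total in bytes_served.items()}


# Hypothetical log: one player fetches the episode in four chunks,
# another fetches the whole file in a single request.
log = [
    ("episode.mp4", 25_000_000),
    ("episode.mp4", 25_000_000),
    ("episode.mp4", 25_000_000),
    ("episode.mp4", 25_000_000),
    ("episode.mp4", 100_000_000),
]
sizes = {"episode.mp4": 100_000_000}

hits = len(log)                          # naive count: one "download" per request
downloads = estimate_downloads(log, sizes)
print(hits, downloads["episode.mp4"])    # 5 requests, but only 2.0 full downloads
```

Counting requests would report five downloads here, while the byte-based estimate reports two, which is the gap the chunking players introduce.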

[Recently, Revision3's "The GigaOM Show" interviewed execs from Quantcast and Hitwise.  Interesting to hear it from their own perspectives.]

Eventually, the promise of some of the technologies involved is that regardless of how you watch it, there can be some tracking capabilities built within the media itself.  Don't kid yourself, however, with the myriad of formats and players, that day hasn't come yet.

Like Digg, Revision3 and other video companies could do well with consulting companies doing third party audits.  This way, everyone would conform to the same standard, and no one would second-guess the numbers. 

Panel-based measurement is a quick fix for an impatient audience.  While we all want an automated and universal way to deal with this problem, the truth, if you want it, is going to take more work.  The good news is, I can tell you directly:  This is work we're willing to do.

[Editor's Note:  For a great summary of various web analytics packages and their various methods of tracking, as well as a great analysis and comparison of the packages, check out Jim Sterne's 2007 Web Analytics Shootout.]

Comments

Jay, a question, one that I'm not sure you will be willing to answer.

What is the current best solution for handling web analytics for a site like Digg? By that I mean a site that has millions of regular visitors and has RSS feeds coming off it.

We currently use WebSideStory, a division of Visual Sciences (http://www.websidestory.com/products/web-analytics/hbx-analytics/overview.html). Pretty much we like those guys and Omniture (http://www.omniture.com) as pixel-based analytics.

Part of your decision on which one of these guys makes sense is how they offer visualization of the data, and how much control you want of the reports. You'll have to evaluate both to make a decision.

As for RSS, that's a completely different animal. Most sites I know that are serious about monetizing RSS use someone like Feedburner, which provides a dashboard for external stats.

Anyway, I hope that answers your question.

Since pixel-based analytics fail to capture proper per-user-usage for AJAX-heavy sites, I have a feeling that some major innovation will be forthcoming in this area. Any ideas of technologies on the horizon for this?

Robert: I would think the innovation would have to come from the developers of the sites themselves, probably to a spec that was defined by ... er ... _someone_ (a standards organization for online metrics?)

I tend to attach a pixel-based tracker to an anchor point on the page as if it were an ad, and then govern those points with business logic. I'd love to have someone define what that logic can and should be, but the precise technical execution would then have to be up to me.

In other words: on a page that doesn't need to refresh, if you swap out ads (which correlate to page views), how often do you do it, and do any conditions need to be met first? If everyone standardized that kinda thing, then we could be sure my hits meant the same as yours.
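
As a purely hypothetical illustration of the kind of rule being asked for here, the sketch below counts a page view for an ad swap only when the user has interacted and a minimum interval has passed. No standard defines these conditions; the AdSlot class, the 30-second interval, and the interaction requirement are all made-up assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdSlot:
    """One ad slot on a page that never does a full reload."""
    min_interval_s: float = 30.0         # hypothetical minimum time between swaps
    last_swap_at: float = float("-inf")  # no swap has happened yet
    counted_views: int = 0

    def maybe_swap(self, now_s: float, user_interacted: bool) -> bool:
        """Swap the ad and count a page view only if both conditions hold."""
        if not user_interacted:
            return False
        if now_s - self.last_swap_at < self.min_interval_s:
            return False
        self.last_swap_at = now_s
        self.counted_views += 1
        return True


slot = AdSlot()
# (timestamp in seconds, did the user click or otherwise interact?)
events = [(0.0, True), (5.0, True), (20.0, False), (40.0, True)]
results = [slot.maybe_swap(t, interacted) for t, interacted in events]
print(results, slot.counted_views)  # [True, False, False, True] 2
```

If everyone agreed on values for the interval and the interaction condition, two sites running this kind of rule would at least be counting the same thing.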

This is the kind of thing that makes me want to give up coding altogether :)

It is true that a simple pixel test can undercount page views, since certain types of page views (even ones that could generate ad impressions) would not load a pixel by default. Better to undercount than overcount.

The question is a subjective one, or at least a good one for advertisers: Which page views matter to count? For the ones that do (generally behind a click of some kind) there are work-arounds to ensure they are properly counted.

Now, to us, all page views, even AJAXy-fresh ones, matter. This is because we are trying to track how much a unique individual hangs out on the site. They matter more to us, however, than to the public or the advertisers.

As for AJAX-friendly specific analytics, I don't know of any yet.

I'm guessing Hitwise is the company you are referring to as "One metric company I spoke with claimed to be more accurate through the use of sampling aggregate ISP backbone pipes."

One thing that I like about Hitwise is they don't extrapolate their data: their purpose is measuring the relative rank of websites, by industry. So they can measure market share, and don't try to tell me how many visitors I have on my sites.

Please make the font darker! I'd really like to know what you are writing there, but I just can't see a damned thing :(

Ok, for now I'll just print it out...

I suspect the correct answer is closer to comScore's 4 million and further from WSS's 18 million. For one example, I view Digg from two computers, home and work. I suspect most Diggers do the same. WSS will view that as 2 visitors. comScore will take user patterns into account and count the correct 1. A standard pattern like that drops you 50%.

The numbers would disagree with that, particularly since WSS doesn't account for user-agent, so NAT systems, like many of AOL's users, appear as a single user. From that standpoint we are probably massively under-counting, not over-counting.

Also, we have database statistics showing individual users (regardless of where they log in from) engaging with the site, as well as new registrations, which are congruent with the growth rates WSS shows with uniques and page views.

Finally, and I should have said this in the post, really page views are more important than uniques, as they are most closely associated with ad impressions. We're over 200 million page views and growing, and those numbers are regardless of whether you have two logins at the same time (we don't care).

WSS utilizes cookies for unique tracking. Not IP or User-agent. If I use two different computers, that's two different cookies.
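
The disagreement in this thread can be made concrete with a toy example. The sketch below is not how WSS or comScore actually works; the records, field names, and the count_uniques helper are hypothetical. It only shows that the same traffic yields different "unique visitor" totals depending on what you key on: cookies overcount a person with two computers, while IP addresses undercount people sharing one NAT'd address.

```python
def count_uniques(page_views, key):
    """Count distinct values of one field across page-view records."""
    return len({view[key] for view in page_views})


# Hypothetical page views: alice browses from home and from work,
# while bob, carol, and dave all sit behind the same NAT'd IP address.
page_views = [
    {"user": "alice", "cookie": "c1", "ip": "203.0.113.10"},  # home
    {"user": "alice", "cookie": "c2", "ip": "198.51.100.7"},  # work
    {"user": "bob", "cookie": "c3", "ip": "192.0.2.50"},
    {"user": "carol", "cookie": "c4", "ip": "192.0.2.50"},
    {"user": "dave", "cookie": "c5", "ip": "192.0.2.50"},
]

by_cookie = count_uniques(page_views, "cookie")
by_ip = count_uniques(page_views, "ip")
by_user = count_uniques(page_views, "user")
print(by_cookie, by_ip, by_user)  # 5 3 4
```

Here there are four real people, cookie counting reports five, and IP counting reports three, so which direction the error runs depends on the tracking method and the mix of users.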

For what it's worth, I'm addicted to the site and generate a significant number of page views. Hopefully one day I'll take the time to look at the ads.

I suspect the number of page views versus users looks like a pareto distribution (aka 80-20 rule).

You are correct about WSS utilizing cookies for unique tracking.

For anyone interested in the specifics of how the analytics packages track uniques and page views and how they differ from each other, I recommend Jim Sterne's 2007 Analytics Shoot Out:

http://www.stonetemple.com/articles/analytics-report-august-2007.shtml

It is very detailed and explains the pros/cons in a useful way.

Jay - I've poured a considerable amount of time into this subject over the years. It's time for a completely new set of metrics. My machine is sitting in the garage, waiting for a set of wheels and some gas.

Jay, I'm the one who posted the first comment, asking for your thoughts on the best system.

I hadn't checked back until now, thank you for your response.

I am Victor. I would be very happy if you could help me work at your company. I am a very hard worker, if you can make my dreams come true. Thank you.
