Over the last few years, the Blogshares index has grown to proportions we wouldn’t have expected back when we were at under a million blogs and C’s goal was 5 million. Here’s what our growth looks like over the last two years:
In a few places, one of them very recent, more blogs were removed than added. This came from a combination of a short period when we purposefully didn’t add new blogs and a cleaning of the index to remove the splog filth. You’ll notice a stronger rise over the last year; we can attribute this to the rise of MSN Spaces more than any other service.
A Believable Index
I cannot help but take issue at times with sites that make bold claims about tracking 30,000,000+ blogs. There are two problems I see with this. One, I’m sure a large proportion of those blogs haven’t posted in months, abandoned like my own personal blog becomes at times, and two, the count is skewed by splogs. Big numbers sure look cool though, don’t they!
Our own index is much smaller by comparison to some sites out there, a mere 8 million. Why, you ask? One reason, I’ll humbly admit, is that we aren’t actively adding LiveJournal blogs to the system. Why not!? Because the last time we tried, the flood was so huge we couldn’t keep up, and until we get a new DB box, we still can’t keep up. Hopefully this will change soon. The future is looking better, and at this rate, even without any VC funding, we’ll be a lot bigger and better. Y’know that RSS parsing and stuff some blog tracking sites can do? We can do it too; it’s just turned off until we have better hardware.
There’s a better reason our index isn’t as big as other sites, and that’s splogs and dogs... uhh, dead blogs. Over two years ago we set up a blacklist to stop spam blogs (and sites whose owners did not wish them in the game) being re-added. Over the months new features were added: entry classification, regexes, the works. It currently contains over 1,500,000 entries, and 70% of new blogs are rejected as a result. We also purposefully remove blogs that haven’t posted in several months from the index; we aim to be up-to-date, not all-encompassing. True, just because they’ve stopped posting doesn’t mean they don’t still contain valuable information, but we still have that list.
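For the curious, here is a rough idea of what that kind of filtering looks like. This is only a minimal sketch in Python, not our actual code: the host names, the regex patterns, the 120-day cutoff and every function name below are made up for illustration.

```python
import re
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Hypothetical blacklist: exact hosts (e.g. owner opt-outs) plus regex patterns.
BLOCKED_HOSTS = {"spam-example.invalid"}
BLOCKED_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"cheap-?pills", r"free-?ringtones")
]

# "Several months" without a post; the exact cutoff here is an assumption.
STALE_AFTER = timedelta(days=120)


def is_blacklisted(url):
    """Return True if the URL matches a blocked host or any blocked pattern."""
    host = (urlparse(url).hostname or "").lower()
    if host in BLOCKED_HOSTS:
        return True
    return any(pattern.search(url) for pattern in BLOCKED_PATTERNS)


def is_stale(last_post, now):
    """Return True if the blog's latest post is older than the cutoff."""
    return now - last_post > STALE_AFTER


def should_index(url, last_post, now):
    """Keep a blog only if it is neither blacklisted nor stale."""
    return not is_blacklisted(url) and not is_stale(last_post, now)
```

Checking the exact-host set first keeps the common case cheap, and the regexes catch whole families of spam URLs that a plain list would miss.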
So, that’s a little bit about our index. I should finish up by noting that while those ‘other sites’ may have counts skewed by splogs, that doesn’t mean they aren’t fighting them too, and good luck to them ‘keeping the crap out’.