April 28, 2007

A Bright Future Ahead

I've spent the last two weeks in California, visiting with Jay and checking out the area. It's fantastic, and I hope to move here very soon. During the visit, I had the opportunity to visit the hosting facility for Blogshares and snap a photo of the old database box (BSDB), shown here. In case you didn't know, it's a quad PIII 550MHz with 4GB RAM and 10K SCSI disks in RAID-5. It's served us well, handling up to 900 queries per second when the spiders were running flat out alongside the website, but it's slowly failing. Not to worry: thanks to your contributions, this old box will soon be on easy street.

No, we haven't purchased the new one yet - this is taking some careful planning and execution, as we aim for future security without compromising on components. We're gonna shell out for a 3-year, 4-hour response warranty out of our own pocket, on top of the community donations, to make sure that in the unlikely event of a hardware problem, we can get it sorted quickly. The hope is we'll have a box on order within the next two weeks. We'll keep you informed of any news.

Cheers!

Addendum: While doing some Java today, I was reminded of a site that has helped me immensely in the maintenance and growth of the spiders, as well as our other projects, yet I've never given it props. It's Roedy Green's Java & Internet Glossary, a useful resource. Thanks to him for all the help it's given me.

October 28, 2006

Disgusted

We've had our own share of ups and downs lately with the Blogshares servers: the AboveNet NOC suffered power outages of some sort two mornings in a row, knocking us out and leaving us with a few hours of table checking and fixing in both cases. Finally, we're back to normal.

Or so we thought. It turns out the PubSub Feedmesh has been offline for the past 24 hours, and we're worried it may not come back, after warnings a few months ago that they were about to close. This was our primary source of pings for new and recently modified blogs, and its disappearance (again) meant it was time to try something else.

What's our other option? Blo.gs. We've been approved to access their Cloud Interface for a few months, so we gave it a try... After a few minutes, we were forced to turn it off again while we did some hard thinking.

Why? Here's an example of the quality of pings we were receiving - this is in raw ping format. It looks like about 90% spam, up from the more usual 30-50% spam we'd see from PubSub, and quite a bit of it was getting past our fairly decent filters. This is, as far as we're concerned, a nasty change from what we were quite good at handling, and we have limited time to handle this new torrent of, well, crap.

Some will have noticed we're still hovering just under 9 million blogs on our count, while Technorati now proudly claims to have 55 million. Piffle. Our model is different, however: we remove old/dead blogs from the index, and our hardware restrictions (no VC funding for this little group, remember) mean we have to ignore some legitimate blog groups (LiveJournal) that would otherwise swamp us. I'd say our spam filtering is a little better so far, too - at least, it is while we don't use the blo.gs feed.

What to do now...

June 18, 2006

Growth & Time

Over the last few years, the Blogshares index has grown to proportions we wouldn't have expected back when we were at under a million and C's goal was 5 million blogs. Here's what our growth looks like over the last two years:

In a few places, one of them very recently, more blogs were removed than added, thanks to a combination of a short period where we purposely didn't add new blogs and a cleaning of the index to remove the splog filth. You'll notice a stronger rise over the last year; we can attribute this to the rise of MSN Spaces more than any other service.