So, lately I’ve been noticing some odd traffic patterns on some of the domains here. Domains that normally get about 40-50 MB of traffic a month (and maybe 1,000 page hits a day) were suddenly getting thousands of page hits a day, adding up to a couple hundred MB of transfer. Now, it isn’t like this is causing some major problem: my webpages are generally all hosted at Dreamhost, which provides a very generous amount of bandwidth with their hosting plans. That said, it’s a pretty major annoyance to have to filter a netblock out of your log results.
But worse, it shows what a “bad citizen” Microsoft has become.
Looking at my stats, Microsoft’s MSNBOT has generated more traffic just crawling my website than Bing has ever sent my way. For that matter, MSNBOT generates more traffic than all other search engine robots combined. In just the time it has taken me to write this short article (15 minutes), there have been 13 hits from it. Total hits in the same period from all other search engine robots: none.
And, in looking at the statistical data, this isn’t just today. This is a pattern going back at least three months.
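For anyone who wants to run the same kind of comparison on their own logs, here is a rough sketch in Python. It assumes the access log is in Apache's Combined Log Format. The list of crawler tokens, the function names and the sample lines are all made up for illustration; they are not taken from my actual logs or from analog.

```python
import re
from collections import Counter

# Combined Log Format ends with: "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)

# Assumption: a small hand-picked list of substrings that identify crawlers.
# Note that "googlebot" also matches Googlebot-Image.
CRAWLER_TOKENS = ("msnbot", "googlebot", "dotbot")


def crawler_name(agent):
    """Return the first crawler token found in a user-agent string, or None."""
    lowered = agent.lower()
    for token in CRAWLER_TOKENS:
        if token in lowered:
            return token
    return None


def tally_crawlers(lines):
    """Count hits per crawler token across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        name = crawler_name(match.group("agent"))
        if name is not None:
            counts[name] += 1
    return counts


# Invented sample lines (documentation IP range, made-up timestamps).
SAMPLE = [
    '192.0.2.10 - - [01/Dec/2009:10:00:00 -0800] "GET / HTTP/1.1" 200 5120 '
    '"-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.11 - - [01/Dec/2009:10:00:05 -0800] "GET /about HTTP/1.1" 200 2048 '
    '"-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.20 - - [01/Dec/2009:10:01:00 -0800] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.30 - - [01/Dec/2009:10:02:00 -0800] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for name, hits in tally_crawlers(SAMPLE).most_common():
        print(name, hits)
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `tally_crawlers(open("access.log"))`, where `access.log` stands in for wherever your host keeps its logs. Matching on user-agent substrings is crude, since anyone can fake a user agent, but it is enough to see which crawler dominates.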
So, I have chosen to ban the netblock that MSNBOT comes from, and I have also taken the exceptional step of banning a number of Microsoft domains as well. I’m just some dork with a personal website, so I don’t expect Microsoft to change their behavior because I added some “deny” lines to my Apache configuration. However, since Microsoft has made it apparent by their actions that they cannot “behave” as responsible net.citizens, I feel I have little choice but to ban them until they change.
UPDATE: Here are some hard stats from the month of December. It’s worth noting that these are the first three lines of the report from analog, with the Googlebot-Image bot added in for reference:
|9667||9664||Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|
|9518||9517||Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, firstname.lastname@example.org)|