Latest Quota Sapping Problems and Solutions


Andy
05-08-09, 15:01
Apparently, I'm not the only affiliate who has had quota sapped over the last couple of days.

I've gone right through my logs and found that the Majestic bot has been hammering its way through my quota for about a week.
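
For anyone wanting to run the same check on their own logs, here is a rough Python sketch that tallies requests per user agent. It assumes an Apache combined-format access log, where the user agent is the last quoted field on each line. The function name, sample lines and IP addresses are made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, so the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Return a Counter of request counts keyed by user-agent string."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '192.0.2.10 - - [05/Aug/2009:14:00:01 +0100] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.2.5)"',
    '192.0.2.10 - - [05/Aug/2009:14:00:02 +0100] "GET /b HTTP/1.1" 200 498 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.2.5)"',
    '198.51.100.7 - - [05/Aug/2009:14:00:03 +0100] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

# Print the busiest user agents first.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

Whichever agents sit at the top of that list without sending any visitors are the candidates for blocking.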

I left this one alone because it's not a spam crawler and I didn't mind them learning a thing or two from my sites. But recently, this bot has gone mental.

So, to block it from my site, I've added the following to my robots.txt:

User-agent: MJ12bot
Disallow: /

That takes my complete robots.txt bot blocking to:

User-agent: SapphireWebCrawler
Disallow: /

User-agent: IRLbot
Disallow: /

User-agent: Twiceler
Disallow: /

User-agent: ShopWiki
Disallow: /

User-agent: Yanga
Disallow: /

User-agent: psbot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: CazoodleBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: Vagabondo
Disallow: /


I would suggest you do the same to your robots.txt.
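
If you want to confirm that the rules parse the way you expect before uploading, a minimal Python sketch using the standard library's `urllib.robotparser` is below. The robots.txt text is shortened to two entries from the list above, and Googlebot is used only as an example of a crawler that is not listed. Bear in mind this only shows how a well-behaved parser reads the file; it says nothing about whether a given bot actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above; paste in your full robots.txt to test it.
ROBOTS_TXT = """\
User-agent: dotbot
Disallow: /

User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_allowed(agent, path="/"):
    """True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)

# Listed bots should be refused; unlisted ones are allowed by default.
for agent in ("MJ12bot", "dotbot", "Googlebot"):
    print(agent, "allowed" if is_allowed(agent) else "blocked")
```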

Looking through my logs, I found a couple of different MJ12bot variants hitting me and, on further investigation, it appears that some bots are spoofing the MJ12bot user-agent. To combat this, I've added a rule to my .htaccess file blocking that user-agent completely.
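
One rough way to spot this kind of thing in your own logs is to list every distinct user-agent string containing "MJ12bot" together with the IP addresses using it. The Python sketch below assumes combined-format log lines (client IP first, user agent as the last quoted field). The function name and sample data are hypothetical, and a variant showing up is only a hint worth investigating, not proof of spoofing.

```python
import re
from collections import defaultdict

# Assumption: the line starts with the client IP and ends with the quoted user agent.
LINE_PATTERN = re.compile(r'^(\S+) .*"([^"]*)"\s*$')

def agents_matching(lines, needle):
    """Map each user-agent string containing `needle` to the set of IPs sending it."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match and needle.lower() in match.group(2).lower():
            seen[match.group(2)].add(match.group(1))
    return dict(seen)

# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [05/Aug/2009:14:00:01 +0100] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.2.5)"',
    '192.0.2.11 - - [05/Aug/2009:14:00:02 +0100] "GET /b HTTP/1.1" 200 498 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.2.5)"',
    '203.0.113.5 - - [05/Aug/2009:14:00:03 +0100] "GET /c HTTP/1.1" 200 300 "-" "MJ12bot"',
    '198.51.100.7 - - [05/Aug/2009:14:00:04 +0100] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

for agent, ips in agents_matching(sample, "MJ12bot").items():
    print(agent, sorted(ips))
```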

All of the above crawlers are quite legitimate, but there is no benefit in allowing them to sap your quota. You won't get any traffic out of them, yet they still want to index each and every page you have (which is probably millions).

If anyone has blocked any more that aren't in that list, feel free to share them.

Cheers
Andy

Raid
05-08-09, 18:11
Andy

Thanks a lot for this list, I have updated my robots.txt accordingly.

Rgds

Raid

Raid
06-08-09, 01:21
It's just gone midnight and already I've got the e-mail warning that I only have 10% of my daily quota left, so I've just changed the robots.txt file to ban all bots. I'd rather have my sites up even if they aren't being indexed.
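
For reference, a blanket ban in robots.txt is normally written with the `*` wildcard as the user agent. The small Python sketch below writes out either that or a per-bot list in the same layout as Andy's file; the function name is made up for illustration.

```python
def render_robots_txt(agents):
    """Build robots.txt text that disallows the whole site for each named agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(blocks) + "\n"

# "*" matches every crawler that honours robots.txt, search engines included.
print(render_robots_txt(["*"]))

# The same helper can write out a selective list instead.
print(render_robots_txt(["MJ12bot", "dotbot"]))
```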

Rgds