Quota not working
Hi,
My quota was reset on Thursday to the full amount and seems to be stuck at the reset point.
There has been traffic, and clicks and commission are being recorded, but the daily quota has not moved.
Any ideas?
Thanks
Griff
It's not just you, must be some problem their end.
I'm having a similar thing going on with the system messages for the last day or two. Read a message or click on mark all as read and the unread figure doesn't change. Not sure if anyone else is getting that one?
Does worry me that so many things like this keep going wrong so regularly with AWs. Makes me think how many sales have been lost over the years and how much have I ended up out of pocket as a result of shoddy workmanship.
Thanks Amoochi,
I did create a ticket for this problem on the support desk but they did not really know what the problem was.
A bit of a worry that other actions may not be recorded either, hadn't thought of that.
Hi guys,
Apologies for the delay. I have been off for the last few weeks.
The quota system was taken offline for the weekend due to a quirk in an upgrade we were making. All requests were logged but your quotas were not being updated. It was deemed acceptable to allow unlimited usage of the API for this period and no downtime would have been experienced.
I should point out that the ShopWindow system and AWin systems/tracking are completely separate platforms and are in no way affected by each other. If the quota was to stop working for whatever reason tracking will still track and commissions will still be paid out.
As far as I know, our tracking solution is one of the best in the industry and has many failsafe measures to ensure we never lose commissions. That said, if you feel you are missing commissions please contact technical services and we will look into this.
HTH
Cheers
Looks like you have all had unlimited quota for a few weeks actually.:p
I'll fix this today and get your stats updating again :D
Cheers
Still not quite right - it ate most of my quota last night then stopped,
my quota has not moved for at least the last couple of hours.
I liked the older version where we had unlimited quota and did not have to keep switching the websites away from shopwindow!
Ah yes, I made a small update last night right before 5pm which will have eaten up your quota right before they were reset.
I should have the updated system deployed today. :D
Seems completely stupid to me that there is a limited quota in the first place. It's like being at school. Either ShopWindow want my efforts working at sales for AWs merchants or not. I either do it whole hearted or in the very near future, stop doing it at all and look to implement other solutions of making money for my sites.
The entire purpose of anyone using ShopWindow is to get sales, so it is in the interests of everyone who uses the scripts to get as many people as possible looking at as many products and pages as possible, which means going through as much of the quota as possible... Something which currently has to be completely avoided, otherwise the stupid "quota exceeded, let's make the affiliate look like a prat with page errors" messages show up, sending potential sales away and stopping them from wanting to bother coming back. It's a great way of killing motivation for affiliates. I know I certainly can't be bothered when any amount of promotion resulting in extra visitors would almost certainly result in page errors galore, as happened twice in the space of a week the last time I bothered with any kind of promotional work for the scripts on my site.
At the moment it looks like the quota is being ignored, it seems to be jumping up and down from positive to negative but at least my sites still seems to be working, but it would be nice if I could stop spending time trying to avoid my sites showing up error messages, and instead go back to posting merchants offers etc on my websites!
Right now my quota is: -140,319 (that's well negative!)
but all seems to be working OK!
Ah yes! We disabled the SOAP faults when excess quota was used until we were sure the new API Quota system was working correctly.
This will be re-enabled in the morning sometime once we have completed some final checks. What is showing in your account may not be correct but it's probably not far off.
Most accounts are not in the negative very much, but I will take a look at your account again before we switch it back to ensure it's reporting correctly.
Cheers
Remaining operations: -48,946 Edit 5 mins later... -59,119
Really being filled with absolutely zero confidence with things right now... It's one thing after another.
Still showing products and the likes as it should right now and not the normal quota exceeded bollocks though.
Preferred the 10,000,000 quota limit that was showing a couple of days back to this rubbish.
As I said, we updated the system recently to enable us to launch the new enterprise/community platform. We also removed functionality that would have returned insufficient quota errors so that if anything went wrong your sites would not have gone down.
If your account is showing a minus quota it's because you are well over your daily allowance. Obviously we will keep on serving requests until the old functionality is re-enabled. Probably tomorrow.
If you are having issues keeping your quota under control take a look at http://www.shopwindowforum.com/showthread.php?t=1294
If your account is showing a minus quota it's because you are well over your daily allowance.
Nevermind.
Hello All,
lets make the affiliate look like a prat with page [quota] errors
This really is not the intention of ShopWindow, to create errors on affiliates pages. In an ideal world of course everybody would have unlimited quota.
However, this simply is not feasible. If we chuck more hardware at ShopWindow then there is no guarantee that we will get any ROI on that extra resource. Doing this for no return would drive ShopWindow into the ground. This is why we have to work with what we have and not just install extra servers, at a high cost to us.
If everyone had unlimited quota then how do we possibly police usage on our limited server space and at the same time maintain a fair and optimum service for all?
Upon the release of V3 there will be an option for unlimited quota and a press release is due that is going to detail all of this a little further.
Regards
Upon the release of V3 there will be an option for unlimited quota.
That sounds really ominous, if truth be told.... Immediate idea hits the brain that it would be some kind of pay per quota thing, where those that can afford it get, unlimited... Hope i'm miles off the mark with that. Will wait for the announcement on that one... No point speculating and winding myself up about it.
Oh and I wish I knew how the hell to get rid of the mass spammers and script abusers, at the end of the day, they are costing people like me sales and when you survive on relatively few sales, you notice a lot when they drop.
That sounds really ominous, if truth be told.... Immediate idea hits the brain that it would be some kind of pay per quota thing, where those that can afford it get, unlimited... Hope i'm miles off the mark with that. Will wait for the announcement on that one... No point speculating and winding myself up about it.
Don't wind yourself up about it mate. Those that don't take the pis will be fine. Aff Win have a pretty fair morality, they're not just about the dollar ya know! I'm a little worried about not being able to carry on, but at the end of the day, I know I've put something into the whole project, not least on here. I think AW would probably let all us regulars on the forum carry on as normal, after all, without us, they wouldn't get all those error reports and suggestions would they?
Oh and I wish I knew how the hell to get rid of the mass spammers and script abusers, at the end of the day, they are costing people like me sales and when you survive on relatively few sales, you notice a lot when they drop.
If you're talking about spam bots on your site sapping your quota, PM me cos I have a solution in beta that seems to be working well.
If you're talking about all those webspammers who have filled the serps with SW products and got everyone else's installs penalized, don't worry about it. They're likely to be the big picture as far as the quota problems go, so are likely to be in the cross hairs.
I know what it's like seeing those figures drop sharply and quickly. I have faith though and know eventually, it'll all come back with knobs on.
BTW, has anyone noticed a massive up in googlebot activity over the last couple of days? They're hitting one of my niche sites every 10(ish) seconds, which is knacking my quota at the minute. I wonder if there's a way to throttle them back without getting penalised. :cool:
re googlebot activity - I added the following to my robots.txt to tell good robots to slow down - seems to help a bit
User-agent: *
Crawl-delay: 20
but my quota is still being hammered,
almost hit -2,000,000 earlier today
but looks like it has been reset to zero
but the websites are still working.
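For anyone wanting to check what a well-behaved bot would actually read from a robots.txt like the one above, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `crawl_delay_for` is just for illustration. Bear in mind that `Crawl-delay` is a non-standard directive: some crawlers honour it, but as far as I know Googlebot does not, so this only tells you what a compliant parser sees, not what any given bot will do.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # "Slurp" is used here only as an example user-agent token.
    print(crawl_delay_for(ROBOTS_TXT, "Slurp"))
```

Rogue bots and scrapers tend to ignore robots.txt altogether, so this is a politeness request rather than a block.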
If you're talking about all those webspammers who have filled the serps with SW products and got everyone else's installs penalized, don't worry about it. They're likely to be the big picture as far as the quota problems go, so are likely to be in the cross hairs.
I am trying to be patient, it's a bit of a pain in the ass though, when you get used to having a certain level of income and see it increasing at a nice level and then, wham, in the space of a few weeks it drops to virtually nothing and stays like that for months... Coinciding with various spam techniques used by those that should be barred from AWs full stop, quota limits making it not worthwhile doing promotion work and the thought of needing to upgrade soon, making it not worthwhile further developing the current scripts much.
That'll be the one and the major pain in the rear...
Without those ******s, quota limits would not be needed anywhere near so much and neither would we all be competing for sales on an uneven level, just because we do things the proper way and don't break every ethical morally correct code of conduct possible.
Not having much of a problem with spambots and the likes, my Total Format site especially, is pretty heavily defended against most nasties.
I know what you mean about morality and ethics.
If our childhood teachings were correct, those with Morals and Ethics would do better than we do. Unfortunately, the internet has no morality, so those with less morals do the best. Oh, don't get me started!
G says "Do no evil" but at the end of the day, their algos cannot detect evil unless a human decides it's evil in the first place and creates a way to detect it. Sometimes, the human in question has an ulterior motive to allow the evil to continue. Imagine the affiliate spam power of someone like Matt Cutts! He could make billions a week if he put his mind to it (wouldn't be surprised if he did)
That said, AffWin are human, and can detect evil. I'm hoping their morality will come into play in V3.
:D
Just to follow up. The quota restrictions are now back in place. Thanks for your patience on this one.
Yes I noticed that normal service was resumed when all my shopwindow websites started displaying soap errors all over the place!
I've hit my quotas a few times lately and found a lot of bots I didn't have blocked.
I've posted a robots.txt list here: http://www.shopwindowforum.com/showthread.php?p=5836
should help a bit :D
Confuscius
05-08-09, 18:46
Hi Andy
I, too, have been slammed BUT I cannot see any offending bots and my server demands over the last week seem to be less bandwidth than normal - my monitoring over the last 3 months is basically telling me that my quota should be enough for about 28 hours per day which is more than the 24 hour real day which is OK!!!
However, my recent monitoring now tells me that I have sufficient quota for about 14 hours i.e I run out at about 7.00am and have no site until 5.00pm in theory!!! That is not the problem, the problem is that the proper bots are indexing blank pages - well not exactly blank as I have suppressed the AW messages and show a template full of adsense ads but of course my main sites are now being systematically destroyed and this is evidenced by the massive fluctuations that I am seeing in earnings on a day to day basis. My current plan is to RECODE SW such that if a page cannot be served because of no quota THEN I serve content from elsewhere - my favourite option being to create my own new API from multiple network feeds in an attempt to preserve some of my rankings. The other option is to drop SW altogether and concentrate on some other models that I know work well but do not have a heavy reliance on external matters outside of my control BUT I really do not want to do this!
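The "serve content from elsewhere when there is no quota" plan could look roughly like the sketch below. Everything named here is hypothetical: `QuotaExceeded` stands in for whatever error your SW client raises when the quota SOAP fault comes back, `fetch_products` stands in for the real API call, and the cache is a plain dict where a real site would more likely use files or a database.

```python
class QuotaExceeded(Exception):
    """Hypothetical error raised when the API reports no quota remaining."""


def render_page(fetch_products, cache, key, fallback_html):
    """Return live content if possible, otherwise the last good copy.

    fetch_products: callable taking a key and returning HTML (assumed).
    cache: dict-like store of the last successful response per key.
    fallback_html: what to show when there is neither quota nor a cached copy.
    """
    try:
        html = fetch_products(key)
    except QuotaExceeded:
        # No quota left: serve the last good copy instead of an error page.
        return cache.get(key, fallback_html)
    cache[key] = html  # remember the latest good response for later
    return html
```

The point of the design is that visitors and search engine bots see real (if slightly stale) content rather than a blank template, and that cached pages cost no quota at all.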
Clearly, IF the quota was to be reset at say 7.00am instead of 5.00pm then it would be down mainly overnight when the scumbag Russian and Chinese spoofing bots are most active, if, in fact, that is the issue. At least REAL people would see something instead of what looks like some very elaborate Google cloaking exercise which is what the current scenario LOOKS like!
The other thought that crossed my mind is that because my logs are not identifying the problem bots/ip's then is some joker deliberately consuming my calls (I will not explain how this is possible, but it is!) - in other words, is there a systematic denial of service on SW attack going on - a bit strange that so many seem to being hit so hard????
I have no idea what is causing the drain unless of course for some reason SW might be registering my CONSTANT calls as MORE calls for some unknown reason.
For once, I am stumped, albeit I have a little idea that I should be able to test tomorrow morning when the quotas have expired again! (Try to find out who is NOT affected may tell us why some of us are being affected!!!).
Can anyone at DigiWin spot the pattern of what is causing this current issue or can DigiWin let quotas run out and register the overrun caused by keeping on serving at the higher rate? Basically I have no way of identifying from my end which one of my sites is causing the issue, or is it all of them??
Running out of quota each day clearly makes time investment in the long term in SW sites a less than desirable backdrop for the future.
I too have just been hammered and put out of action by an attack from 67.195.115.176, starting on 4th August (at 00:00:01am) and hitting every 4 seconds, triggering a '10% of quota remaining' message at 4am, and hoovering up the remaining quota long before I checked my e-mail at 8am. Take a day or so holiday, turn your back on the Internet, and look what happens!
I hope to have now solved the problem. The problem being that the bot was bypassing my interface pages and gaining direct access to the SW client software. Also the IP address concerned was not on Project Honey Pot's blacklist. That was the downfall of my system, as there will always be emerging problem IPs/bots that have yet to be added to the blacklist.
My solution (fairly easy to set up): any visitor that does not access SW via one of my website pages is forced through to the SW interface pages on my website, thus preventing quota drain and hopefully enabling me to take stress-free holidays.
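A rough sketch of that kind of gate, assuming it is done by checking the HTTP Referer header (the post does not say exactly how it was implemented). The host names and the `/shop-entry.html` path are placeholders, not real values.

```python
from urllib.parse import urlparse

# Hypothetical: the host names of your own website pages.
MY_HOSTS = {"www.example.com", "example.com"}


def came_from_my_site(referer, allowed_hosts=MY_HOSTS):
    """True if the Referer header points at one of my own pages."""
    if not referer:
        return False
    host = urlparse(referer).hostname
    return host is not None and host.lower() in allowed_hosts


def route_request(referer, shop_page, interface_page="/shop-entry.html"):
    """Send direct hits to the interface page instead of the SW client."""
    return shop_page if came_from_my_site(referer) else interface_page
```

The Referer header is easy to fake and some genuine visitors do not send one, so this only deters bots that hit the SW client directly; it is not a security measure.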
So far, so good, but time will tell.
Rgds
Val
Confuscius
06-08-09, 00:56
That IP is a Yahoo crawler - so we are now banning the major search engines!
Anyways, I am now down to just over 6 hours before the quota runs out - arrivederci to about £xxx,xxx sales per annum.
I give up!
Plan B!!!
I've just banned all bots, changing my robots.txt to:
# go away
User-agent: *
Disallow: /
I hope this will mean my quota can't get used up anymore??
I realise this means my site won't be indexed, but I link in to my site from other sites in my network and that's where most of the traffic comes from, so I'd rather live with no indexing than have the site down.
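To see what that robots.txt means to a crawler that follows the rules, here is a small sketch with Python's standard `urllib.robotparser` (the helper name `is_allowed` and the example paths are made up for illustration). It only models compliant bots; anything that ignores robots.txt will carry on using quota regardless.

```python
from urllib.robotparser import RobotFileParser

# The "ban everything" robots.txt from the post above.
BLOCK_ALL = """\
# go away
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path):
    """Would a compliant crawler with this user agent fetch this path?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed(BLOCK_ALL, "Googlebot", "/shop/item.php"))
```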
Rgds
So far, not good. So forget that idea from yesterday.
My quota was hammered again yesterday (5th Aug) after the 5pm reset, so at 23.45 on Aug 4th, got the '10% remaining' email. Now my shops and all links to shop products from my contents pages will be out again until 5pm.
I have two shops on two sites - one is cycling related and the other is gardening related. Currently it's only the gardening shop that's being attacked -again by 67.195.115.176 every 4 seconds, even though I thought I'd blocked it.
What is going on?
Are you right, Confuscius? Is it definitely Yahoo or is it masquerading as Yahoo? It's not being recognised as a search engine by Project Honey Pot. In any case I will try Raid's idea and ban all bots. Fortunately my shops are part of bigger content sites and that's where my traffic comes from too. I'm not sure if the attacks will continue via the shop product links on my website pages too - I might have to remove those links if they do.
This is defeating the purpose of SW as I can't set it up to run on auto. I do not want to be monitoring it daily in order to keep it working, so may drop the whole idea as I don't want dead-end links appearing to my site visitors.
Can't AWIN do something - anything?????
In despair
Val
Confuscius
06-08-09, 10:19
Hi All
I am now in a position to investigate the real causes of this issue - I have set up some scripts on some of my sites to independently record request activities on my servers.
To date, my conclusions - the majestic bot / spoofer identified by Andy is currently particularly active - give away signs are that it is being run from rather a lot of disparate IP addresses and is clearly some sort of comment scraper / spammer - as this is a recent new one then Project Honeypot will take some time to catch the offending IPs - for now I will ban IPs server wide rather than mess with lots of robots.txt files and to be honest the scrapers / spammers tend to ignore them anyway!
The new bing / yahoo collaboration seems to have led to a microsoft / yahoo indexing fest! The msnbot2.0 is even trying to index goto pages which of course it should not IF it respected robots.txt - a right bloody mess of spidering gobbledegook.
I will report back!
If anyone has taken action which now keeps you within your limits please add what you did to this thread!
Ta, Paul.
VAL - http://whois.domaintools.com/67.195.115.176 - definitely Yahoo - if you can access your server firewall then ban the IP at server level IF you want to stop the activity in its tracks. One call every 4 seconds is a NORMAL crawl rate for Google in my experience BUT is an increased rate for Yahoo.
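On the "is it really Yahoo or something masquerading as Yahoo" question, the usual check is a forward-confirmed reverse DNS lookup: resolve the IP to a host name, check the host name belongs to the search engine, then resolve that host name back and confirm it gives the same IP. A minimal sketch follows. The `.crawl.yahoo.net` suffix is my assumption about Yahoo's crawler host names and should be checked against their own documentation; the lookup functions are passed in so the logic can be tried without touching DNS.

```python
import socket


def is_genuine_crawler(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a crawler IP address.

    allowed_suffixes: host name endings treated as genuine (an assumption
    you must supply, e.g. (".crawl.yahoo.net",)).
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse record at all
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False  # host name is not in the search engine's domain
    try:
        # The host name must resolve back to the IP we started with.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A user-agent string on its own proves nothing, since anyone can send "Yahoo! Slurp" or "Googlebot"; the DNS round trip is much harder to fake.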
WE shouldn't need to be banning ALL or any of the good bots, just so we don't use up a quota. There are plenty of bots out there that are pretty much essential in terms of getting sites seen and spidered without paying stupid over the odds prices for advertising, which defeats the whole purpose of trying to make money with this in the first place. The good bots too can and do sometimes gobble up quota, but without them, it's pointless having the shop at all, as nobody would ever find it, except maybe by pure chance and no visitors means no sales, or as has been mentioned, by linking in via your own sites and networks.
I have a fairly substantial list of banned bots, IPs and various other spam/bot/security techniques and the likes for my sites, but simply for the fact that the shop is not the main section of my Total Format site, I'm not going to cut my nose off to spite my face by limiting other bots. My Amoochi site doesn't have the luxury of having an entire other site's worth of content like Total Format; once Amoochi hits the quota limit, essentially that site is dead until 5pm. Even worse is when good bots visit while the quota errors are showing and the bots basically stop indexing completely and don't come back for ages. It's a stupid catch 22 situation that results in fewer visitors and far fewer sales and a crappy looking ROI for AWs and SWs.
I really hope that you guys at SWs HQ are working on something much more tuned into ensuring that us affiliates aren't going to be hit by these quota issues like this for the future. SWs isn't just about having a pretty interface and groovy search functionality, it's also very much about the software actually being solid enough to not be hammered to buggery by rogue bots, something which with some clever little coding techniques can pretty much be cut out from SWs end completely. Meaning quota limits would be much less important...
Quota limits and non function pages due to limits being reached = complete waste of time for all concerned.
I know... Be patient and all that, V3 is coming, it'll be much better and will benefit those of us who put effort in, etc, but just seeing and hearing the other hard working regulars on here, hitting limits, winds me up as much as when it's happening to me. Especially when I can only imagine how many sales some must be losing as a result.
A solution of sorts I think is to set up a .htaccess redirect to the main shopwindow website. I am trying the following (change 623 to your own affiliate id and 393 to whichever category you want to go to). The websites now work for the customer, but as far as I can see the clicks do not seem to be credited to my account.
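# Serve index.php for directory requests
DirectoryIndex index.php
Options +FollowSymLinks
RewriteEngine on
# Send every request to the ShopWindow click URL (a = affiliate id, c = category)
RewriteRule . http://www.awin1.com/swclick.php?a=623&c=393 [nc,l]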
My quota has shown zero since I checked it this morning, but now I can access SW again, so have the good folk at AWin suspended the quota today? If so, thank you!
I will try your idea Amcho, but as my quota is showing zero, I won't know if the code works until the quota reset at 5pm, when I can then monitor the effect of clicks.
Also working on another solution.
Am going off on a week's holiday soon, so really want to get this sorted before I go, otherwise my shop will be offline for a week and my web visitors are going to get a really bad impression of the shop pages and are unlikely to bookmark or return to my site. I'll just have to put up an apologetic message.
Rgds
Val
Confuscius
06-08-09, 13:05
Amcho - there are various ways to redirect stuff to SW to transfer visitors BUT it also transfers the bots and still leaves the strain on the SW servers - a bit dangerous if we all did this!
My update is as follows - I set up a script to run every time the category tree is requested, i.e. NOT server-based logging BUT SW logging! This logs all visits to a series of domains and records host + agent + time + IP, formatted as a simple CSV file - I create one file per day - a sort of activity summary. I can then drop this file into OpenOffice, stick a pivot table over the top and see what is going on, hopefully! No complicated log analysis needed!
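For anyone wanting to try the same kind of logging, here is a minimal sketch of writing one CSV row per request with the four fields described above. The function names, the file name pattern and the column order are my own guesses for illustration, not the actual script.

```python
import csv
import io
from datetime import datetime, timezone


def daily_log_name(when):
    """One log file per day, e.g. sw-requests-2009-08-06.csv (hypothetical name)."""
    return when.strftime("sw-requests-%Y-%m-%d.csv")


def log_request(out, host, agent, ip, when=None):
    """Append one row (host, agent, time, IP) to an open text file."""
    when = when or datetime.now(timezone.utc)
    csv.writer(out).writerow([host, agent, when.isoformat(timespec="seconds"), ip])


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # newline="" lets the csv module control line endings itself.
    with open(daily_log_name(now), "a", newline="") as handle:
        log_request(handle, "shop.example.com", "ExampleBot/1.0", "192.0.2.1", now)
```

Using the csv module rather than joining strings by hand matters here, because user agent strings are full of commas and semicolons that would otherwise break the columns in the spreadsheet.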
Example in relation to ONE of my files today, approx 20,000 requests in total. Just to pick out the bot that Andy identified :
Mozilla/5.0 (compatible; MJ12bot/v1.2.4; http://www.majestic12.co.uk/bot.php?+)
138.47.102.92 88
173.25.58.214 112
69.115.241.123 113
Mozilla/5.0 (compatible; MJ12bot/v1.2.5; http://www.majestic12.co.uk/bot.php?+)
195.138.193.179 272
205.209.170.19 38
205.209.170.2 97
205.209.170.26 74
213.93.147.235 83
66.112.55.170 2
68.112.173.126 62
69.115.241.123 63
76.125.200.130 148
82.193.97.66 294
83.171.167.32 71
85.10.196.62 52
85.113.244.201 178
85.178.83.163 300
85.23.34.55 252
88.112.48.68 117
88.159.74.199 343
91.121.76.115 80
91.121.96.210 79
91.205.174.49 249
91.67.216.229 33
92.249.115.128 23
93.95.187.253 52
94.169.191.51 498
95.132.2.250 7
98.207.146.204 74
99.163.252.217 104
Now this shows me all IP addresses used by the user agent so far today and a count of how many requests came from each IP - cut and paste and block. Even though MJ12 tell me I am wasting my time trying to block them by IP - well I sure as hell can block their users' IPs.
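The pivot table step can also be done in a few lines of code. This is only a sketch, assuming each log row is (host, agent, time, IP) in that order; the function names and the `min_requests` threshold are made up for illustration.

```python
from collections import Counter, defaultdict


def requests_by_agent_and_ip(rows):
    """Build {agent: Counter({ip: request_count})} from log rows."""
    table = defaultdict(Counter)
    for host, agent, timestamp, ip in rows:
        table[agent][ip] += 1
    return table


def ips_to_block(table, agent_substring, min_requests=1):
    """IPs used by any agent containing agent_substring, summed across versions."""
    hits = Counter()
    for agent, per_ip in table.items():
        if agent_substring.lower() in agent.lower():
            hits.update(per_ip)  # adds the counts for each IP
    return sorted(ip for ip, count in hits.items() if count >= min_requests)
```

Summing across agent versions is deliberate: in the listing above, 69.115.241.123 shows up under both MJ12bot v1.2.4 and v1.2.5, so its real total is the two counts added together.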
I will do a breakdown of all activity by user agent next and draw some conclusions!
Confuscius
06-08-09, 13:15
User agent summary as follows - I unblocked various things last night to see if they were still about!
Agent
(empty) 1141
Baiduspider+(+http://www.baidu.com/search/spider.htm) 7
Gaisbot/3.0+(robot06@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php) 1
ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com) 52
Jakarta Commons-HttpClient/3.1 1
Mediapartners-Google 127
MLBot (www.metadatalabs.com/mlbot) 1
Mozilla/4.0 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2) 1
Mozilla/4.0 (compatible; Vagabondo/4.0; webcrawler at wise-guys dot nl; http://webagent.wise-guys.nl/; http://www.wise-guys.nl/) 570
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322) 3
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; DigExt) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) 3
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar; (R1 1.5)) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FREE; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR 2.0.50727) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 1.0.3705; InfoPath.1) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727) 3
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) 7
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; CIBA) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322; InfoPath.1; SpamBlockerUtility 4.8.4; IEMB3) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon; CIBA) 3
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Sky Broadband; GTB6; .NET CLR 1.1.4322; Creative ZENcast v2.00.13) 2
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; YPC 3.2.0; FunWebProducts; .NET CLR 1.1.4322; yplus 5.3.04b) 3
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) 15
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727) 1
Mozilla/4.0 (compatible; MSIE 6.0; Windows XP) 3
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727) 6
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR 2.0.50727; Zune 3.0) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; FunWebProducts; sbcydsl 3.12; YComp 5.0.0.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; ga=MDAxMzIwMmUwYmMw; .NET CLR 1.1.4322; .NET CLR 2.0.50727) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 2.0.40607; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; WSA v1.0 - www.vinn.com.au; .NET CLR 2.0.50727; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1) 3
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB6; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; InfoPath.1) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30618) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506) 2
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506) 2
Confuscius
06-08-09, 13:16
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/4.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; InfoPath.2; im_en_0002) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; MDDS; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; UGES 1.5.1.0; .NET CLR 2.0.50727) 3
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; BT Openworld BB) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Comcast Install 1.0; .NET CLR 2.0.50727; msn OptimizedIE8;ENUS) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB5; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 2.0.50727) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; InfoPath.1; .NET Client 3.5.30729.01) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET CLR 3.0.04506) 1
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618) 3
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618) 7
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.1; .NET CLR 3.5.30729; .NET CLR 3.0.30618) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; SLCC1; .NET CLR 2.0.50727; InfoPath.2; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30729) 2
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; Zune 3.0) 1
Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter/tests); +http://www.exabot.com/go/robot) 1
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 7939
Mozilla/5.0 (compatible; MJ12bot/v1.2.4; http://www.majestic12.co.uk/bot.php?+) 313
Mozilla/5.0 (compatible; MJ12bot/v1.2.5; http://www.majestic12.co.uk/bot.php?+) 3645
Mozilla/5.0 (compatible; Seznam screenshot-generator 2.0; +http://fulltext.sblog.cz/screenshot/) 11
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp) 1184
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_7; en-us) AppleWebKit/530.18 (KHTML, like Gecko) Version/4.0.1 Safari/530.18 1
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_7; en-us) AppleWebKit/530.19.2 (KHTML, like Gecko) Version/4.0.2 Safari/530.19 2
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13 1
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13 2
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr; rv:1.9.0.12) Gecko/2009070609 Firefox/3.0.12 1
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5 8
Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 2
Mozilla/5.0 (Windows; U; Windows NT 5.1; el; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.20) Gecko/20081217 Firefox/2.0.0.20 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7 16
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 2
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729) 3
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 2
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 (.NET CLR 3.5.30729) Creative ZENcast v1.02.11 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/530.5 (KHTML, like Gecko) Chrome/2.0.172.39 Safari/530.5 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729) 1
Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 1
Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/530.5 (KHTML, like Gecko) Chrome/2.0.172.39 Safari/530.5 1
Mozilla/5.0 (Windows; U; Windows NT 6.0; da; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12 3
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12 2
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 1
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (.NET CLR 3.5.30729) 1
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729) 1
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12 (.NET CLR 3.5.30729) 2
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 1
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (.NET CLR 3.5.30729) 3
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729) 2
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5 (.NET CLR 3.5.30729) 3
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); http://spinn3r.com/robot) Gecko/20021130 24
msnbot/1.1 (+http://search.msn.com/msnbot.htm) 417
msnbot/2.0b (+http://search.msn.com/msnbot.htm) 3747
Opera/9.10 (Windows NT 5.1; U; en) 1
Opera/9.27 (Windows NT 5.1; U; en),gzip(gfe) (via translate.google.com) 1
R6_CommentReader(www.radian6.com/crawler) 2
Sosospider+(+http://help.soso.com/webspider.htm) 14
SurveyBot/2.3 (Whois Source) 1
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618) 1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 (.NET CLR 3.5.30729) 1
Total Result 19422
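For anyone wanting to repeat this kind of tally, here is a minimal Python sketch that groups user-agent counts into crawler families. The `BOT_MARKERS` list and the function names are my own assumptions for illustration, not part of any SW or AWin tooling; the sample rows are copied from the listing above.

```python
from collections import Counter

# Hypothetical substrings used to group user agents into families;
# extend the list to suit your own logs.
BOT_MARKERS = ["Googlebot", "MJ12bot", "msnbot", "Yahoo! Slurp", "Sosospider", "Exabot"]


def classify(user_agent: str) -> str:
    """Return the first matching bot family, or 'other' if none match."""
    lowered = user_agent.lower()
    for marker in BOT_MARKERS:
        if marker.lower() in lowered:
            return marker
    return "other"


def tally(rows):
    """Sum request counts per family from (user_agent, count) pairs."""
    totals = Counter()
    for user_agent, count in rows:
        totals[classify(user_agent)] += count
    return totals


# Sample rows copied from the listing above.
SAMPLE = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", 7939),
    ("Mozilla/5.0 (compatible; MJ12bot/v1.2.4; http://www.majestic12.co.uk/bot.php?+)", 313),
    ("Mozilla/5.0 (compatible; MJ12bot/v1.2.5; http://www.majestic12.co.uk/bot.php?+)", 3645),
    ("msnbot/1.1 (+http://search.msn.com/msnbot.htm)", 417),
    ("msnbot/2.0b (+http://search.msn.com/msnbot.htm)", 3747),
    ("Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)", 1184),
    ("Opera/9.10 (Windows NT 5.1; U; en)", 1),
]

if __name__ == "__main__":
    for family, total in tally(SAMPLE).most_common():
        print(f"{family}: {total}")
```

With the counts listed above, the four big crawler families (Googlebot, msnbot, MJ12bot and Yahoo! Slurp) add up to 17,245 of the 19,422 total requests.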
Thanks for this Confuscius, but it still requires daily monitoring and daily tinkering.
Also I wonder if it is intentional, or coincidence based on the country of origin of the bots, that the quota drain begins after most people in the UK have stopped work/gone to sleep, so by the time we notice the next working day, the quota has gone again.
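One way to test the timing hunch is to bucket bot requests by hour of day. This is a rough Python sketch that assumes Apache common/combined log lines; the `requests_per_hour` name, the `marker` parameter and the sample lines (including the 192.0.2.x addresses) are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp in Apache common/combined log lines,
# e.g. [06/Aug/2009:23:15:02 +0100]
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_hour(lines, marker="bot"):
    """Count log lines containing `marker` (case-insensitive) per hour of day."""
    hours = Counter()
    for line in lines:
        if marker.lower() not in line.lower():
            continue
        match = TIMESTAMP.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
            hours[stamp.hour] += 1
    return hours


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    '192.0.2.1 - - [06/Aug/2009:23:15:02 +0100] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; MJ12bot/v1.2.5; http://www.majestic12.co.uk/bot.php?+)"',
    '192.0.2.2 - - [06/Aug/2009:23:40:10 +0100] "GET / HTTP/1.1" 200 512 "-" '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [06/Aug/2009:09:05:00 +0100] "GET / HTTP/1.1" 200 512 "-" '
    '"Opera/9.10 (Windows NT 5.1; U; en)"',
]

if __name__ == "__main__":
    for hour, count in sorted(requests_per_hour(SAMPLE_LINES).items()):
        print(f"{hour:02d}:00  {count}")
```

The hours come out in whatever timezone the server logs in, so adjust before comparing against UK working hours.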
Can AWin help us out here?
Val
My quota has shown zero since I checked it this morning, but now I can access SW again, so have the good folk at AWin suspended the quota today? If so, thank you!
:D
You all seem pretty active today and are working hard to kill some bots so we have toned down the quota restriction on V2 for a few hours to give you some help in debugging.
Plus we are performing some system maintenance :p . The restriction will be re-enabled when the quotas are reset at 5pm!
Confuscius
06-08-09, 13:21
Conclusion is as follows (I can compare against previous days and months, as I have months of logs!):
Googlebot - normal levels!
MSN - increased Activity
Yahoo - Normal levels!
MJ12bot - is becoming a pain in the ar*e!
A few other bits to investigate. Of course, I am assuming that the reported activity is representative of what is happening on my other sites and on other people's sites, BUT it gives you some insight into relative activity for a fairly active SW'er!
MJ12bot - is becoming a pain in the ar*e!
That's one of the bots I have blocked at both the IP level and the bot-disallow level. It keeps on trying even after being blocked, then activity slowly dies down over the course of a few weeks, then lo and behold, back it comes knocking at the door again. It is supposed to follow the rules, but it doesn't seem to stop trying. It also has a nasty problem with fake versions of itself.
http://www.google.co.uk/search?hl=en&q=MJ12bot&meta=
The people behind it claim to be building a search engine. They are very quick to go round the net defending their bot's activity and fast to tell people when they hear of fake bots claiming to be them, but I'm not unbanning them until their supposed search engine appears in full, proper, Google-style working order. Their official page lists all the fake MJ12 bots, which are worth adding to the ban list. I'd imagine that, more often than not, the MJ12 hammering is fake rather than the official bot:
http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
Confuscius
06-08-09, 14:01
The MJ12bot activity looks to be going exponential. There may be a way to unblock it, trap for it at request level and redirect it back to index their own pages. This may well let their remote bots create a denial of service attack on themselves and teach them a nice lesson. Perhaps they might then ban MY site addresses from their bots!
Andy - code please!!!
Banning MJ12bot and adding a few more to my robots.txt and .htaccess has solved my problem. One site had 600,000 gone in 12 hours and is now down to < 200,000 per day. All traffic now appears to be users, Google, Yahoo and Bing.
Based on Paul's analysis of MJ12bot, and everything posted previously, here's some stuff for your .htaccess and your robots.txt.
Add in your .htaccess:
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
#######kill some bad bots
RewriteCond %{HTTP_USER_AGENT} ^Balihoo [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^IRLbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
deny from 195.138.193.179
deny from 205.209.170.19
deny from 205.209.170.2
deny from 205.209.170.26
deny from 213.93.147.235
deny from 66.112.55.170
deny from 68.112.173.126
deny from 69.115.241.123
deny from 76.125.200.130
deny from 82.193.97.66
deny from 83.171.167.32
deny from 85.10.196.62
deny from 85.113.244.201
deny from 85.178.83.163
deny from 85.23.34.55
deny from 88.112.48.68
deny from 88.159.74.199
deny from 91.121.76.115
deny from 91.121.96.210
deny from 91.205.174.49
deny from 91.67.216.229
deny from 92.249.115.128
deny from 93.95.187.253
deny from 94.169.191.51
deny from 95.132.2.250
deny from 98.207.146.204
deny from 99.163.252.217
deny from 138.47.102.92
deny from 173.25.58.214
Robots.txt:
User-agent: SapphireWebCrawler
Disallow: /
User-agent: IRLbot
Disallow: /
User-agent: Twiceler
Disallow: /
User-agent: ShopWiki
Disallow: /
User-agent: Yanga
Disallow: /
User-agent: psbot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: CazoodleBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: Vagabondo
Disallow: /
User-agent: Charlotte
Disallow: /
Dunno what Charlotte is, but it's in the SW robots.txt, so must be worth blocking.
I think all that's correct. Check to make sure you don't get an internal server error before walking away.
It'd be good to build on this, so let me know if you use it and if it works or not
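A quick way to sanity-check rules like these before deploying is to try them against user agents seen in the logs. The Python sketch below is only an approximation: `blocked_by` mimics a case-sensitive `RewriteCond` with a plain regex search, `robots_allows` uses the standard-library `urllib.robotparser`, and both helper names are my own. One thing it shows is that a pattern anchored with `^` only matches when the user agent starts with that token, which is not the case for bots that identify as `Mozilla/5.0 (compatible; ...)`.

```python
import re
from urllib.robotparser import RobotFileParser

# User agent copied from the log listing earlier in the thread.
MJ12_UA = "Mozilla/5.0 (compatible; MJ12bot/v1.2.5; http://www.majestic12.co.uk/bot.php?+)"


def blocked_by(pattern: str, user_agent: str) -> bool:
    """True if the regex matches; approximates a case-sensitive RewriteCond."""
    return re.search(pattern, user_agent) is not None


# Two entries in the same style as the robots.txt rules above.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: Charlotte
Disallow: /
"""


def robots_allows(robots_txt: str, agent: str, url: str = "/") -> bool:
    """True if a well-behaved bot with this agent token may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("^MJ12bot matches:", blocked_by(r"^MJ12bot", MJ12_UA))
    print("MJ12bot matches: ", blocked_by(r"MJ12bot", MJ12_UA))
    print("robots.txt allows MJ12bot:  ", robots_allows(ROBOTS_TXT, "MJ12bot"))
    print("robots.txt allows Googlebot:", robots_allows(ROBOTS_TXT, "Googlebot"))
```

Bear in mind robots.txt only stops bots that choose to obey it, so the second check says nothing about fakes.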
Confuscius
07-08-09, 17:58
Andy, the MJ12bot IP addresses are only from a 12 hour sample - there are probably 2-3,000 being used in reality - hence why I am doing what I say next!
I have decided to adopt the robots.txt route only for now, as I am curious to see whether and when MJ12bot fails to respect the robots file. I can then use my SW log files to see what is going on on individual domains.
Unfortunately MJ12bot is only part of my issue!!! The other issue is 'bing', which seems to be going ballistic on subdomains for some reason, to the extent that one of my VPSs was running a load equivalent to 75 CPUs last night. It was a bit slow BUT it did not fall over, it just kept queueing the requests. Anyway, a quick 65.55.0.0/16 in my ConfigServer firewall and I have basically banned MSN / bing for now. I am not too fussed, as there was not much previous traffic, but I have been getting more bling from bing recently, which is hardly surprising given the spider activity.
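Before blocking a whole range like that, it can help to check which logged IPs would actually be caught. A small Python sketch using the standard `ipaddress` module; the `in_blocked_range` helper is hypothetical and the 65.55.1.2 address is made up for illustration.

```python
import ipaddress

# The range mentioned above; a /16 covers 65,536 addresses.
BLOCKED = ipaddress.ip_network("65.55.0.0/16")


def in_blocked_range(ip: str, network=BLOCKED) -> bool:
    """True if the IP address falls inside the blocked network."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    # First address is invented; second is from the deny list earlier in the thread.
    for ip in ("65.55.1.2", "91.121.76.115"):
        print(ip, "blocked" if in_blocked_range(ip) else "not blocked")
```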
I still have one vps which is on a runaway and does not seem bing /mj12bot related but some scripts should pin it down for me over the weekend.
There really ought to be a SET of recommended exclusions / practices that become a condition of use of SW BUT then that would probably mean that 95% of SW installations would then get blocked permanently! :p
After a few weeks, thought we'd finally worked out an automated system to keep out the spammers, but our quota has been hammered again overnight (13th/14th Sept 09).
Just checking the event logs now to identify the culprit(s), but has anyone else had any trouble?
This thread seems to be the wrong one now, as the quota is working - too well!
Will continue to post in the Resources section under blocking spambots, spiders etc.