View Full Version : how big is your site?
OK, going through my control panel I see that my site which has shop window on it is a massive 818 M !! It has over 30 M in the cache.
I haven't marketed the site and it only gets about 100 unique visitors a month, yet it uses about 2 Gig of bandwidth a month!
Are these figures really ok? Is the site really supposed to take up this much room and bandwidth?
When I come to market it, presumably these figures will skyrocket! :(
Hi Jill
I've just been looking at some past stats and, roughly speaking, my site works out at 1G of bandwidth per 1000 visitors per month.
This has held true for most of this year, so at 2G for about 100 visitors (roughly 20 times my rate) there could well be a problem.
Very early on I found that a couple of rogue spiders were drawing seriously excessive bandwidth, so I blocked them in robots.txt:
User-agent: ShopWiki
Disallow: /
User-agent: psbot
Disallow: /
User-agent: MJ12bot
Disallow: /
Disk usage for TheOffsetShop is 220M, of which 75M is temp/analog (which I believe is just the stats records), 25M is other stats, and 52M is mail, so the site itself is only about 68M.
Hope that helps (Oh - I'm still on V1.0 if that matters)
Bud
:)
Whoa... that's not good :mad:
It turns out Smarty does no garbage collection on its cached files, so they are never deleted once they expire.
I have a fix though :D
Place the following in your index.php file, right below where you initialize your Smarty object:
// Clear Smarty cache files older than 60 minutes (3600 seconds)
$oSmarty->clear_all_cache(3600);
Your index.php file will then include something like this:
// init Smarty
$oSmarty = new smarty_sw();
// Clear the smarty cache for docs older than 60mins
$oSmarty->clear_all_cache(3600);
$oSmarty->cache_lifetime = 60;
I'll make sure this is included in the next update of the ShopWindow Client Software.
Cheers
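For anyone curious what that cleanup amounts to, or who has another directory that keeps growing, here is a minimal sketch in plain PHP. It is not Smarty's own code: the helper name purge_old_files and the idea of pointing it at your cache directory are assumptions for illustration, and it only looks at files directly inside the directory it is given.

```php
<?php
// Sketch only, not Smarty's implementation: delete plain files directly
// inside $dir whose last-modified time is at least $maxAgeSeconds old.
// purge_old_files() is a hypothetical helper name.
function purge_old_files(string $dir, int $maxAgeSeconds, ?int $now = null): int
{
    $now = $now ?? time();

    // Collect the paths first so nothing is deleted while iterating.
    $expired = [];
    foreach (new DirectoryIterator($dir) as $entry) {
        // isFile() is false for ".", ".." and subdirectories, so they are skipped.
        if ($entry->isFile() && $now - $entry->getMTime() >= $maxAgeSeconds) {
            $expired[] = $entry->getPathname();
        }
    }

    $deleted = 0;
    foreach ($expired as $path) {
        if (unlink($path)) {
            $deleted++;
        }
    }
    return $deleted; // number of files actually removed
}
```

Called as purge_old_files('/path/to/cache', 3600), where the path is a placeholder for your own, it would remove anything not modified in the last hour and return how many files it deleted.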
My main Total Format website uses over 500gig a month in bandwidth and takes up over 200gig of disk space. The worst bit is that I'd say it's only about 10% complete; after nearly six years it's still a work in progress.
So, as you can imagine, my ShopWindow bits count for very little of what's going on there.
Thanks Ollie - that's brought it down to 67m :)
Thanks Bud - how can I find out about the spiders? I have two stats packages on the site but neither shows anything. I think they may just discount spiders and not show them.
Amoochi - aahh, but you do get just a little bit more traffic than me <VBG>
If you're unsure whether your host discounts spiders in your stats, contact their support and ask directly. I've found that different hosts do things differently, so for things like this it's easiest to ask them.
If you want to keep up with which spiders are visiting and how often, you could try one of the many spider tracking scripts available on the net:
http://www.apogee-web-consulting.com/tools/track_spiders.html
http://www.google.co.uk/search?hl=en&q=spider+visit+log+script&meta=
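If you can get at the raw access log, a short script is often enough to see which user agents are eating the bandwidth. The sketch below assumes the log is in Apache's "combined" format (check with your host, as theirs may differ); the function name tally_user_agents is made up for this example.

```php
<?php
// Sketch: total hits and bytes per user agent from Apache "combined"
// format log lines. Lines that do not match (including requests that
// contain escaped quotes) are simply skipped.
function tally_user_agents(iterable $lines): array
{
    // host ident user [time] "request" status bytes "referer" "user-agent"
    $pattern = '/^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"/';

    $totals = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue;
        }
        $bytes = $m[1] === '-' ? 0 : (int) $m[1]; // "-" means no body was sent
        $agent = $m[2];
        if (!isset($totals[$agent])) {
            $totals[$agent] = ['hits' => 0, 'bytes' => 0];
        }
        $totals[$agent]['hits']++;
        $totals[$agent]['bytes'] += $bytes;
    }

    // Biggest bandwidth users first; keys (the agent strings) are kept.
    uasort($totals, function ($a, $b) {
        return $b['bytes'] <=> $a['bytes'];
    });
    return $totals;
}
```

To run it over a real file you could pass new SplFileObject('/path/to/access.log'), with the path replaced by wherever your host keeps the log, and print the first few entries of the result.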
Confuscius
25-09-08, 14:59
As AW will eventually limit your calls allowance, it is a good idea to have a plan to ban the baddies. You can find bad bot lists online or build your own. Use .htaccess to ban them, e.g.
SetEnvIfNoCase User-Agent "MyBadBOTBot" bad_bot
SetEnvIfNoCase User-Agent "the BaddestBot" bad_bot
SetEnvIfNoCase User-Agent "kissMyBotGoodbye" bad_bot
<Limit GET POST>
order allow,deny
allow from all
deny from env=bad_bot
</Limit>
You will find that you end up banning IP addresses too, to stop those pesky scraper types.
Paul
Just curious, I've never heard that term before: what exactly do you mean by scraper types? I don't do a lot with spiders, seeing as I've got unlimited bandwidth, so I may be suffering from this...
Confuscius
25-09-08, 20:05
"pesky scraper types" = those who scrape your site for content
They mainly masquerade as Googlebot and ignore robots.txt exclusions.
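One way to tell a real Googlebot from an impostor is a reverse DNS lookup on the visiting IP followed by a forward lookup on the resulting hostname. Below is a rough sketch of that check; the function name is_real_googlebot is hypothetical, the two lookup parameters exist only so the function can be tried without a network, and gethostbynamel() only returns IPv4 addresses, so IPv6 visitors would need a different forward lookup.

```php
<?php
// Sketch: reverse-then-forward DNS check for a visitor claiming to be Googlebot.
// $reverse and $forward default to PHP's own DNS functions; they can be
// swapped for stubs when testing.
function is_real_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // gethostbyaddr() returns the IP unchanged on failure, false on bad input.
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // The hostname must sit under googlebot.com or google.com.
    $host = strtolower(rtrim($host, '.'));
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false;
    }

    // The hostname must resolve back to the same IP, otherwise the
    // reverse record could simply have been faked.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

DNS lookups are slow, so this is the sort of check to run against suspicious entries after the fact, or to cache, rather than on every page request.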
It is quite interesting to include a unique made-up phrase in a website footer and then do a phrase-match search for it a few months later; it is quite amazing where your content can get to without your knowledge. Some of the worst offenders can request 10 pages per second, which can make your calls allowance disappear pretty quickly.
Paul
PS There is no such thing as unlimited bandwidth given the number of unlimited bandwidth hosting deals that I have blown up!
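To spot the clients hammering a site at that sort of rate, the access log can be grouped by IP address and timestamp. This is only a sketch: it assumes Apache-style log lines that start with the client IP and carry a one-second-resolution timestamp in square brackets, and the function name find_fast_clients and the default threshold of 10 are chosen for illustration.

```php
<?php
// Sketch: find IPs that made at least $maxPerSecond requests within a
// single clock second, according to Apache-style access log lines.
function find_fast_clients(iterable $lines, int $maxPerSecond = 10): array
{
    // Capture the client IP and the bracketed timestamp.
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\]/';

    $counts = []; // $counts[ip][timestamp] => requests in that second
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip lines in some other format
        }
        $counts[$m[1]][$m[2]] = ($counts[$m[1]][$m[2]] ?? 0) + 1;
    }

    $peaks = []; // ip => busiest single second
    foreach ($counts as $ip => $perSecond) {
        $peak = max($perSecond);
        if ($peak >= $maxPerSecond) {
            $peaks[$ip] = $peak;
        }
    }

    arsort($peaks); // worst offenders first
    return $peaks;
}
```

The IPs it returns are candidates to look at before banning, not an automatic block list; a busy proxy or a legitimate crawler could show up here as well.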
Hi Ollie
Is this fix now part of V2? I downloaded the client software a couple of months ago. Where can I look to see if it's there or not?
Cheers
Val
sourchocolate
03-12-08, 15:45
For blocking rogue spiders, the approach below is probably the best solution, since you do not need to analyze logs or maintain and periodically update lists of bad IPs/agents. It's done automatically and, more importantly, on the fly, when a spider visits your site for the first time:
Starts here:
http://www.webmasterworld.com/forum88/3104.htm
More explanation/details here:
http://www.webmasterworld.com/forum88/3524.htm
S.
acorndomains.co.uk
15-04-09, 20:56
I had the same thing. It seems that debug logging is switched on by default, which creates huge error log files, since a lot of templates have errors.
Unless someone knows exactly how to turn off error logging, delete your error log file and set it to be deleted daily. I'm still looking into that.