Paul Inman
02-07-07, 18:47
Hi all,
To follow up on this weekend's post, I thought it would be good to let you guys know a bit more about what happened and how we've reacted to it.
A bit of background first:
We have standard and custom monitoring in place for all our servers. Whenever anything happens, from high CPU to services not responding, one of us (a techie who can actively fix things) gets a call from our hosting centre.
The error that caused issues this weekend was unfortunately not caught by our monitoring systems. Our primary server received a number of interrupted database connection requests and, as a result, indefinitely blocked the server that was sending them.
We've already modified the system to flush these errors if they occur. Next we will be modifying the system to remove the dependency on the primary server. This was in our project list already, as it will allow us to make secondary API servers more portable in the future.
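For anyone curious how this kind of block and flush behaves in general, here is a rough sketch. It is not our actual code: the `ConnectionGuard` class, the `max_errors` threshold and the host names are all made up for illustration. It just shows a server counting interrupted connections per host, refusing a host once it passes a limit, and clearing the counters on a flush.

```python
class ConnectionGuard:
    """Hypothetical per-host error counter (illustration only)."""

    def __init__(self, max_errors=10):
        # Assumed threshold; a real system would make this configurable.
        self.max_errors = max_errors
        self.errors = {}

    def record_interrupted(self, host):
        # Count one interrupted connection attempt from this host.
        self.errors[host] = self.errors.get(host, 0) + 1

    def is_blocked(self, host):
        # A host stays blocked until the counters are flushed.
        return self.errors.get(host, 0) >= self.max_errors

    def flush(self):
        # Clear all counters, which unblocks every host.
        self.errors.clear()


guard = ConnectionGuard(max_errors=3)
for _ in range(3):
    guard.record_interrupted("api-secondary")

blocked_before = guard.is_blocked("api-secondary")
guard.flush()
blocked_after = guard.is_blocked("api-secondary")
```

In this sketch the host is blocked after three interrupted attempts and is allowed again after the flush, which is the general idea behind the fix described above.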
If you have any queries, please contact me using the details below or PM me.
Kind Regards,
Paul