Ok, now that things are back up, you may be wondering what went wrong and why it wasn't fixed over the weekend. To put it briefly (or not so briefly), a small hardware issue led to some serious database corruption, which brought down the entire site over the weekend. Following our normal procedure, we restored the data from a backup and turned the site back on, but two days later the database became severely corrupted again and all the post data had to be erased. Of course, when you have two failures in a week, it becomes obvious that there are more problems than just a little error. To prevent problems down the line, we did another restore using older post data from the 24th, which was carefully and forcefully rebuilt to remove any possible issues.
So, where does this leave us? Well, the board is back up and running, as you can see. All user information and PMs were transferred as of the moment the board went offline, so that data should not be lost. As for posts, unfortunately we couldn't use the corrupted data from this week, so all posts from Sunday, Monday and Tuesday were lost. This means you will need to either repost your messages or post a summary if you need to wrap up an old thread. Obviously this is going to be inconvenient, but such is life.
Can this happen again? I wish I could say that this is a 100% fix, but truthfully, no one knows. As I mentioned before, we have done a lot to remove any corruption, but you really never know. Several backups exist at this stage, and a new one will be made every night. As a long-term precaution, we have ordered another server to handle database files only; this machine will be added to our network within the next week, giving us 3-system redundancy. We will also be updating our backup process to keep more backups for a longer time and on multiple machines, just in case. Basically, this problem has caused us to review everything we can think of, to add hardware before it is needed, and to build a detailed emergency procedure should anything break again.
Obviously, losing 3 days of the board is annoying and hurts everyone in the community, but we are back up and 100% committed to keeping the board growing and stable. Thank you for your understanding, and enjoy!