Forum offline today - post mortem


#1

Hello all,

We had some downtime of the forum today. The cause was “death by backups”.

In addition to the N backups we export offsite, we also keep 5 rounds of backups in local storage. The community is growing quickly, so the database is growing quickly too, and the local backups take up roughly five times that space. This morning the web application tried to write a 6th backup (it writes the new backup before removing the oldest one) and there was not enough room for it. Instead of just failing the backup, the web application locked up and had to be restarted manually.
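
For anyone curious, the "write the new one first, then remove the oldest" rotation can be sketched roughly as below. This is only an illustration and not the forum software's actual code: the names `backups_to_prune` and `can_fit_new_backup`, the `keep` parameter, and the free-space pre-check are all assumptions made for the example. The point is that checking free space before writing lets the backup fail cleanly instead of filling the disk.

```python
import shutil
from pathlib import Path

KEEP = 5  # local backup rounds, as described in the post


def can_fit_new_backup(free_bytes, expected_size):
    """Return True if a backup of expected_size bytes fits in free_bytes."""
    return free_bytes >= expected_size


def backups_to_prune(backup_dir, new_backup_size, keep=KEEP):
    """Return the old backups to delete once the new backup has been written.

    Hypothetical helper: raises OSError before anything is written if the
    disk cannot hold the extra backup that exists during rotation.
    """
    backup_dir = Path(backup_dir)
    free = shutil.disk_usage(backup_dir).free
    if not can_fit_new_backup(free, new_backup_size):
        raise OSError("not enough free space for the next backup")

    # Oldest first, based on file modification time.
    existing = sorted(
        (p for p in backup_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    # After the new backup is written there are len(existing) + 1 files.
    excess = len(existing) + 1 - keep
    return existing[:excess] if excess > 0 else []
```

With 5 backups already on disk and `keep=5`, the function returns the single oldest file, which would be removed only after the 6th backup has been written successfully.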

As a consequence of this failure, we will move this forum to a better location, with more hardware available, and also with a larger team that can watch it 24x7.

Apologies for any inconvenience this may have caused you.


#2

This is great news, thanks!


#3

We continue to work on moving this onto a 24x7 monitored environment.

Meanwhile, we just had a short downtime of a couple of minutes to upgrade the hardware in place, so that we have plenty of space for such backups until the full migration is done.

Please let me know if you see any issues.