Issue with S3 service in sto3 private cloud
Incident Report for Safespring
Resolved
This incident has been resolved.
Posted Jan 11, 2022 - 12:11 CET
Update
We are continuing to work on the backend to improve the performance and stability of the cluster. This involves running a rebalance task, which takes time. To ensure stability, we are letting the rebalance task run slowly while we closely monitor the cluster.
Posted Nov 22, 2021 - 14:05 CET
Update
The cluster is recovering slowly. There are still some misplaced and degraded objects, but we will not make any changes to speed up the recovery until Monday. The next update will be around noon on Monday, or earlier if the situation calls for it.
Posted Nov 21, 2021 - 18:55 CET
Update
The service has slowly recovered during the night and is working as expected. There might be some performance degradation since data integrity checks are still ongoing. Root cause analysis and monitoring continue. We will post status updates if there are any changes to the cluster.
Posted Nov 20, 2021 - 09:24 CET
Monitoring
The cluster is now open for read/write access but is still being monitored. If any new issues arise, it might be shut down again. We will keep a close eye on things until we can say it has fully recovered. The next update will not be until tomorrow unless something changes dramatically.
Posted Nov 19, 2021 - 20:57 CET
Update
We are still analyzing the issue and have added more resources to the investigation. Write access remains blocked until we know more.
Posted Nov 19, 2021 - 18:25 CET
Update
Even though the cluster is now in an OK state, we have kept access to the service down in order to find the cause of the problem. We are working hard to enable the service again.
Posted Nov 19, 2021 - 16:42 CET
Update
Recovery of the cluster at STO3 to a stable state is progressing. We still have to wait a bit before enabling outside traffic again, to ensure the cluster is stable when it comes back. Sorry for the inconvenience.
Posted Nov 19, 2021 - 13:53 CET
Update
We are continuing to work on the issue. We have performed some operations to get the cluster into a stable state, and these operations are working. The S3 frontend is still shut down to ensure stability before we let customer traffic back in. We will update here again at 14:00 at the latest.
Posted Nov 19, 2021 - 12:32 CET
Update
We are continuing to investigate this issue.
Posted Nov 19, 2021 - 12:11 CET
Investigating
We currently have an incident at the STO3 storage site that we are working hard to fix. All incoming traffic is blocked for now. We will update here with new information at 13:00 at the latest.
Posted Nov 19, 2021 - 11:54 CET
This incident affected: Safespring Storage.