What happened
We planned moving the Artifacts storage from Azure Blob Storage to Cloudflare R2 for Hosted Mender EU on the morning of the 4th of December.
With Mender, you can use both Global and per-tenant settings: at this time, Global settings were using Azure Blob Storage, and the per-tenant settings were unset.
As a first step, we enforced per-tenant setting to Azure Blob Storage for every tenant. In our tests, this would have been a no change. Unfortunately, we soon discovered that it was no longer possible to upload an Artifact with the GUI.
After some troubleshooting, we decided to revert the configuration, so we dropped the override per-tenant settings, but this caused no changes: still, the upload service was not working correctly.
The backend team has been involved, and the incident filed up.
Soon after, with the help of the backend team, we found that the workflows service was not processing NATS messages correctly (no alert was fired because it was still under the threshold). We restarted the service, and the messages started flowing again.
At this time, the Hosted Mender EU service was back fully operational.
Why this happened?
We discovered that the workflows service stopped working because of some unhandled conditions.
What needs to be done now?
Before applying again the Cloudflare migration, we have to verify the status of the Workflows and NATS services