New artifact upload unsuccessful
Incident Report for Hosted Mender
Postmortem

What happened

We planned to move Artifact storage from Azure Blob Storage to Cloudflare R2 for Hosted Mender EU on the morning of December 4, 2023.

Mender supports both global and per-tenant storage settings. At the time, the global settings pointed to Azure Blob Storage and the per-tenant settings were unset.
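The relationship between the two levels can be sketched as follows. This is only an illustration: it assumes that a per-tenant setting, when present, takes precedence over the global one, and the names `StorageSettings` and `effective_settings` are hypothetical, not Mender's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageSettings:
    # Hypothetical shape; real settings would also carry credentials, bucket names, etc.
    backend: str


def effective_settings(
    global_settings: StorageSettings,
    tenant_settings: Optional[StorageSettings],
) -> StorageSettings:
    """Return the per-tenant settings if set, otherwise the global ones (assumed precedence)."""
    return tenant_settings if tenant_settings is not None else global_settings


azure = StorageSettings(backend="azure-blob")

# Per-tenant unset: the global settings apply.
unset_result = effective_settings(azure, None)

# Per-tenant set to the same backend: the effective backend is unchanged.
explicit_result = effective_settings(azure, StorageSettings(backend="azure-blob"))
```

Under this assumption, an unset per-tenant setting and a per-tenant setting that names the same backend resolve to the same effective configuration.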

As a first step, we explicitly set the per-tenant setting to Azure Blob Storage for every tenant. In our tests, this had caused no change in behavior. Unfortunately, we soon discovered that it was no longer possible to upload an Artifact through the GUI.

After some troubleshooting, we decided to revert the configuration and dropped the per-tenant overrides, but this had no effect: the upload service was still not working correctly.

We involved the backend team and filed the incident.

Soon after, with the help of the backend team, we found that the workflows service was not processing NATS messages correctly. No alert had fired because the monitored value was still under the alerting threshold. We restarted the service, and the messages started flowing again.
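One way to catch this kind of failure is to alert on lack of progress rather than only on an absolute threshold. The sketch below is a hypothetical illustration, not the monitoring that Hosted Mender uses: the `QueueSample` fields, the threshold value, and the function names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSample:
    # Hypothetical metrics sampled from a message consumer.
    pending: int    # messages waiting to be processed
    processed: int  # cumulative count of processed messages


def threshold_alert(sample: QueueSample, max_pending: int) -> bool:
    """Fire only when the backlog exceeds a fixed threshold."""
    return sample.pending > max_pending


def stall_alert(previous: QueueSample, current: QueueSample) -> bool:
    """Fire when messages are waiting but none were processed between two samples."""
    return current.pending > 0 and current.processed == previous.processed


# Two consecutive samples from a consumer that has stopped processing.
earlier = QueueSample(pending=3, processed=1200)
later = QueueSample(pending=5, processed=1200)

backlog_fired = threshold_alert(later, max_pending=100)
stall_fired = stall_alert(earlier, later)
```

In this example the small backlog stays under the threshold, so the first check stays silent, while the second check flags that the consumer made no progress between samples.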

At that point, the Hosted Mender EU service was fully operational again.

Why did this happen?

We discovered that the workflows service had stopped working because of unhandled conditions.

What needs to be done now?

Before attempting the Cloudflare R2 migration again, we have to verify the status of the workflows and NATS services.

Posted Dec 05, 2023 - 08:17 UTC

Resolved
This incident has been resolved.
Posted Dec 04, 2023 - 08:03 UTC
Monitoring
The issue has been identified and the problem is solved. The upload service is working as expected.
We're monitoring the results before closing the incident.
Posted Dec 04, 2023 - 06:31 UTC
Investigating
The previous maintenance encountered an issue during the rollout and we rolled back to the previous settings, but artifact upload via the GUI is still affected. We're investigating the issue.
Posted Dec 04, 2023 - 06:07 UTC
This incident affected: Hosted Mender EU.