Date: October 22, 2025
Duration: 78 minutes (08:10 - 09:28 UTC)
Severity: Major service disruption
Executive Summary
A database migration in release v4.1.0-saas.16 caused a complete failure of the Device Authentication service across US and EU hosted Mender clusters. The migration incorrectly deleted a critical uniqueness constraint during online operations, leading to database corruption that prevented service recovery. We restored service by performing a point-in-time database rollback, resulting in 78 minutes of data loss.
Customer Impact: Device authentication was unavailable for 78 minutes. New device enrollments were blocked, and existing device operations may have been disrupted during this period.
Root cause
The new version contained a database migration to 2.0.1 for the Device Auth database. The migration was designed to replace a uniqueness constraint on device authentication records, but it executed the deletion and recreation of the index as separate operations. Because the migration ran online, the window between index deletion and recreation allowed duplicate device entries to be created. These duplicates corrupted the database state and prevented both completion of the forward migration and a normal rollback. For this reason, the only viable solution was to roll back both the Mender Server version and the database.
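The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not the actual migration: it uses an in-memory SQLite database as a stand-in, and the table name `devices`, the column `identity`, and the index names `uniq_old` and `uniq_new` are all hypothetical. It contrasts dropping the old unique index before creating the new one with creating the new one first.

```python
import sqlite3


def new_db():
    """Hypothetical stand-in schema: one device row guarded by a unique index."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, identity TEXT)")
    conn.execute("CREATE UNIQUE INDEX uniq_old ON devices(identity)")
    conn.execute("INSERT INTO devices (identity) VALUES ('dev-a')")
    return conn


def duplicate_write(conn):
    """Simulates a concurrent client re-submitting an existing identity."""
    conn.execute("INSERT INTO devices (identity) VALUES ('dev-a')")


def drop_then_create(conn):
    """Unsafe order: uniqueness is not enforced between the two steps."""
    conn.execute("DROP INDEX uniq_old")
    duplicate_write(conn)  # accepted, because no constraint exists right now
    try:
        conn.execute("CREATE UNIQUE INDEX uniq_new ON devices(identity)")
    except sqlite3.IntegrityError:
        return False  # recreation fails on the duplicate rows
    return True


def create_then_drop(conn):
    """Safer order: the new constraint exists before the old one is removed."""
    conn.execute("CREATE UNIQUE INDEX uniq_new ON devices(identity)")
    conn.execute("DROP INDEX uniq_old")
    try:
        duplicate_write(conn)
    except sqlite3.IntegrityError:
        pass  # the duplicate is rejected at every point in time
    count = conn.execute("SELECT COUNT(*) FROM devices").fetchone()[0]
    return count == 1


if __name__ == "__main__":
    print("drop then create succeeded:", drop_then_create(new_db()))
    print("create then drop kept data unique:", create_then_drop(new_db()))
```

In the first ordering, the duplicate row is accepted and the new index can no longer be built, which mirrors the stuck state described above. In the second, at least one uniqueness constraint is in force at all times. Whether the second ordering is applicable depends on how the old and new constraints differ, which this sketch does not model.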
Resolution and recovery
With duplicate records preventing normal rollback procedures, we performed a point-in-time database restore to 08:10 UTC, a safe timestamp before the migration executed. This restored database integrity but resulted in permanent loss of all data created between 08:10 and 09:28 UTC.
Incident timeline (UTC)
What went wrong
Action Items
We sincerely apologize for the disruption to your operations and, specifically, for the data loss that occurred during the recovery window.