A major Google Cloud outage was caused by uninterruptible power supplies being...interrupted

Google had a “critical battery failure” that took its Columbus, Ohio zone offline for up to six hours on March 29.

Apr 15, 2025 - 15:23
 0
A major Google Cloud outage was caused by uninterruptible power supplies being...interrupted

  • Google Cloud’s us-east5-c (Columbus, Ohio) zone had a six-hour outage
  • It was caused by a failure to its uninterruptible power supply
  • Over 20 services and some storage disks went down

Much like the unsinkable Titanic met its demise when it sank, Google Cloud has recovered from a major outage caused by an interruption to its uninterruptible power supplies (UPS).

The company confirmed that its us-east5-c zone, otherwise known as Columbus, Ohio, experienced “degraded service or unavailability” for a period of six hours and 10 minutes on March 29, 2025, blaming it on a “loss of utility power in the affected zone.”

Over 20 cloud services suffered reduced performance or downtime as a result of the outage, including BigQuery, Cloud SQL, Cloud VPN and Virtual Private Cloud.

Google’s uninterruptible power supply just had a pretty major failure

In its incident report, the company explained exactly what had happened: “This power outage triggered a cascading failure within the uninterruptible power supply (UPS) system responsible for maintaining power to the zone during such events.”

“The UPS system, which relies on batteries to bridge the gap between utility power loss and generator power activation, experienced a critical battery failure,” the log continues.

Google’s Columbus zone uses powerful chips from Intel like Broadwell, Haswell, Skylake, Cascade Lake, Sapphire Rapids and Emerald Rapids, as well as the AMD EPYC Rome and Milan processors to power its cloud computing services. The cloud giant also noted that “a limited number of storage disks within the zone became unavailable during the outage.”

Engineers were made aware of the outage at 12:54 PT on March 29, successfully bypassing the failed UPS and restored power via generator by 14:49 PT. Most services were brought back online pretty quickly, thereafter, but some manual action was required for a full restoration hence the six-hour outage.

Google now promises to learn from this event, hardening cluster power failure and recovery paths and auditing systems that did not automatically failover, as well as working with its UPS vendor to mitigate future incidents.

You might also like