UPS refresh: when "still works" isn't the right question

The unit had been sitting in the comms cupboard for eleven years. Beige plastic, two amber LEDs lit, an occasional beep that everybody had learned to ignore. During the site walk the IT contact pointed at it and gave us the line we hear constantly: “Still works, hasn’t given us any trouble.” We pulled the front panel off, checked the date code on the battery (original install, never replaced), and asked when he’d last simulated a mains drop. Nobody had, in his time or before it. The next Friday lunchtime, with the office quiet and a five-minute outage announced in advance, we pulled the mains plug. The uninterruptible power supply (UPS, the battery-and-electronics box that keeps the network running for a few minutes when the mains drops) held for 41 seconds before the inverter shut down and the network died.

41 seconds isn’t a UPS, it’s a placebo, and the unit had been claiming to do its job for half a decade while the batteries invisibly lost capacity. “Still works” was true at the level of the LEDs and false at the level of what the device was for.

This is the most common UPS conversation we have. The frame below is what to ask instead.

Why this matters more than it looks

Three reasons a tired UPS is worse than no UPS.

The first is false confidence. A working UPS lets the business assume the network is protected, which means the rest of the resilience plan (generators, failover, runbooks) doesn’t get attention. A tired UPS gives the same confidence with none of the protection.

The second is graceful shutdown. The real job of a UPS isn’t to keep the office running through a power cut. It’s to give the kit enough time to shut down cleanly so storage doesn’t corrupt. A UPS that holds for 30 seconds instead of the rated 10 minutes can’t do that. Servers crash mid-write, file systems need an fsck on reboot (a consistency check and repair that can take hours on a large volume), and what should have been a 20-minute power cut becomes a four-hour recovery.

The third is the invisible risk. UPS batteries (sealed lead-acid in most SME units) degrade predictably: capacity typically drops to about 80% by year three, 50% by year five, and below useful by year seven. The unit’s own self-test won’t flag it, and the LEDs don’t change. The unit reports “online and healthy” right up until the moment the mains drops and there’s nothing in the battery to deliver.

Common failure modes

The patterns we see when UPS estates haven’t been managed:

Original batteries, never tested. The most common single failure. Install once, forget for a decade.
Wrong-size load. The UPS was specced for two servers and a switch in 2018. The rack now has six servers, two switches, and a SAN (a shared storage unit). The runtime is a third of what’s labelled.
No graceful-shutdown trigger. The UPS holds the load when the mains drops, but nothing tells the servers to shut down. When the battery dies, the servers crash hard.
No remote monitoring. Nobody knows the battery’s been at 40% capacity for the last 18 months because nobody’s checking.
Mixed-age estate. Two UPSs in two different cupboards, installed in different decades, with different management interfaces and different battery cycles. Nobody owns the schedule.

Each of these turns a piece of resilience infrastructure into a liability.

The four questions to ask

Forget “still works”. The four questions that matter:

Question 1: When did the batteries last get tested under real load?

What this means in practice: when did somebody actually pull the mains and confirm the UPS held for the time it’s meant to?

Self-test cycles inside the unit are useful but not enough. They check the charging circuit; they don’t simulate a sustained load draw on the cells. A proper test is to schedule it into a quiet window, pull the mains feed, and time how long the unit holds the actual load (not the rated load, the live load). We do this annually on every managed UPS. If the unit holds for less than 70% of the runtime the manufacturer quotes for that load, the batteries are due.
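The pass/fail rule is simple arithmetic. A minimal Python sketch, assuming you record the measured holdover and look up the manufacturer’s runtime figure for the same load; the `LoadTestResult` and `batteries_due` names are hypothetical, not from any UPS tool.

```python
from dataclasses import dataclass

# The 70% pass mark used in the annual load test.
PASS_RATIO = 0.70


@dataclass
class LoadTestResult:
    """One mains-pull test on one UPS (hypothetical record layout)."""
    measured_holdover_s: float  # seconds the UPS carried the live load after the mains was pulled
    expected_runtime_s: float   # runtime the manufacturer quotes for that same load


def batteries_due(result: LoadTestResult, pass_ratio: float = PASS_RATIO) -> bool:
    """True if the measured holdover fell short of pass_ratio x expected runtime."""
    if result.expected_runtime_s <= 0:
        raise ValueError("expected runtime must be positive")
    return result.measured_holdover_s < pass_ratio * result.expected_runtime_s
```

For example, if the manufacturer’s table had given 10 minutes at the cupboard unit’s load (a hypothetical figure; we don’t know its spec), the pass mark would be 7 minutes and the 41-second result would fail by a wide margin.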

Question 2: What load is the UPS actually carrying?

What this means in practice: how many watts of equipment are plugged into it, and what runtime does that imply at the current battery capacity?

UPSs are rated in both volt-amperes (VA) and watts (W), and the load has to stay inside both; which one binds depends on the power factor of the kit plugged in. A 1500VA / 1000W unit typically gives roughly 10 minutes of runtime at half load and roughly 4 minutes at full load, and those are headline numbers from a fresh battery. If the rack has crept up to 90% load over the years, the runtime collapses. The check is to measure: most managed UPSs report current draw in the management interface. If you don’t have that visibility, put a logging clamp meter on the input feed for a working day and read the peaks.

A UPS at 90% load is on its way to overload trips. A UPS at 30% load is wasting your money. The sweet spot is 40-60%.
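Turning a clamp-meter reading into a load percentage takes one assumption: the power factor of the connected kit. A sketch in Python, assuming 230 V nominal mains and a peak current reading, with band edges taken from the paragraph above (the 60 to 90% “above sweet spot” label is ours, for the gap the text leaves). Function names are hypothetical.

```python
MAINS_VOLTS = 230.0  # assumption: UK nominal single-phase supply


def load_percent(peak_amps: float, rated_va: float, rated_w: float,
                 volts: float = MAINS_VOLTS, power_factor: float = 1.0) -> float:
    """Estimate load as a percentage of whichever UPS rating it hits first.

    A clamp meter reads current, so volts x amps gives apparent power (VA).
    Real power (W) also depends on the load's power factor; 1.0 is the most
    pessimistic assumption against the watt rating. Reading on the UPS input
    also counts charger and conversion losses, so this errs on the high side.
    """
    apparent_va = volts * peak_amps
    real_w = apparent_va * power_factor
    return 100.0 * max(apparent_va / rated_va, real_w / rated_w)


def load_band(percent: float) -> str:
    """Bucket a load percentage using the bands discussed above."""
    if percent >= 90:
        return "near overload"
    if percent > 60:
        return "above sweet spot"  # our label for the 60-90% gap
    if percent >= 40:
        return "sweet spot"
    return "under-used"
```

A 4 A peak on a 1500VA / 1000W unit at 230 V is 920 VA; with the pessimistic power factor of 1.0 that is 92% of the watt rating, which lands in the “near overload” band even though it is only about 61% of the VA rating.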

Question 3: What happens at the end of the battery window?

What this means in practice: does the kit on the UPS get shut down gracefully, or does it crash?

This is the question most often missed. The UPS holds the load for as long as it can; at end-of-battery, the inverter cuts. What’s plugged in either has its own UPS-shutdown agent (server connected via USB or network, runs a “save state and shut down” command 30 seconds before the cut) or it doesn’t (kit just dies).

For every server on a UPS, there should be a shutdown agent installed and tested. Network kit usually doesn’t get one: the switches just crash, which is acceptable as long as they recover quickly. The differentiator is whether anything writeable sits on the protected estate without a shutdown agent, because that’s where the file-system corruption risk lives.
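The decision a shutdown agent makes can be sketched in a few lines. Real agents (Network UPS Tools, apcupsd, or the UPS vendor’s own software) each have their own configuration, and this is not their API; it is only the logic, assuming the agent can see whether the UPS is on battery, its low-battery flag and its runtime estimate, and that you have timed how long each server takes to shut down cleanly. `UpsStatus`, `should_shut_down` and the 60-second margin are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    on_battery: bool        # mains has dropped and the inverter is carrying the load
    low_battery: bool       # the UPS has raised its own low-battery flag
    runtime_left_s: float   # the UPS's current estimate of remaining runtime


def should_shut_down(status: UpsStatus, clean_shutdown_s: float,
                     margin_s: float = 60.0) -> bool:
    """Start a clean shutdown while there's still enough battery to finish it."""
    if not status.on_battery:
        return False  # on mains (possibly recharging): nothing to do
    if status.low_battery:
        return True   # trust the UPS's own warning
    # Leave enough runtime for the measured shutdown plus a safety margin.
    return status.runtime_left_s <= clean_shutdown_s + margin_s
```

Triggering on remaining runtime rather than a fixed delay means a weaker battery brings the shutdown forward, as long as the UPS’s runtime estimate is honest; that is one more reason for the annual load test.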

Question 4: Who’s notified when the UPS does something?

What this means in practice: when the mains drops, when the battery fails a self-test, or when the runtime estimate falls below threshold, does anyone find out in time?

A UPS with no remote monitoring is a UPS that fails without anyone noticing. Modern units have network management interfaces that can send email or SNMP alerts when something happens. Setting it up takes 30 minutes and turns the device from a passive box into an active part of the monitoring estate. The alert needs to go to a watched inbox or a monitoring platform, not a generic info@ address that nobody reads; it’s the same trap we covered in the SSL post.
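The three alert conditions from Question 4, plus the “watched destination” rule, sketched in Python. The snapshot fields stand in for whatever your UPS’s management interface or monitoring platform actually exposes; the class, the function names and the example addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpsSnapshot:
    """Stand-in for what a UPS management interface reports (hypothetical fields)."""
    on_battery: bool
    self_test_failed: bool
    runtime_estimate_s: float


def alerts_for(snap: UpsSnapshot, runtime_threshold_s: float) -> list[str]:
    """Return one message per alert condition that is currently true."""
    alerts = []
    if snap.on_battery:
        alerts.append("Mains lost: UPS is running on battery")
    if snap.self_test_failed:
        alerts.append("Battery self-test failed")
    if snap.runtime_estimate_s < runtime_threshold_s:
        alerts.append(
            f"Runtime estimate {snap.runtime_estimate_s:.0f}s is below the "
            f"{runtime_threshold_s:.0f}s threshold"
        )
    return alerts


def check_destination(destination: str, watched: set[str]) -> str:
    """Refuse to route alerts anywhere that isn't on the watched list."""
    if destination not in watched:
        raise ValueError(f"{destination} is not a watched destination")
    return destination
```

Keeping the destination check separate from the conditions makes the “who reads this?” question explicit in configuration rather than an afterthought.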

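If all four questions have a clear, current answer (tested in the last year, load measured, shutdown agents in place, alerts going somewhere watched), the UPS is fit for purpose. If any answer is “never”, “no” or “I don’t know”, it isn’t, regardless of what the LEDs say.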
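The audit reduces to four checks, any one of which fails the unit. A minimal Python sketch, assuming each answer is recorded with `None` meaning “don’t know”, and taking the one-year window from the annual test; `UpsAudit` and `failing_questions` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UpsAudit:
    last_load_test: Optional[date]                  # Q1: None = never tested / don't know
    measured_load_pct: Optional[float]              # Q2: None = nobody has measured it
    servers_without_agent: Optional[int]            # Q3: writeable servers lacking a tested agent
    alerts_to_watched_destination: Optional[bool]   # Q4: None = don't know


def failing_questions(audit: UpsAudit, today: date, max_test_age_days: int = 365) -> list[str]:
    """List the questions that fail; an empty list means fit for purpose."""
    failures = []
    if audit.last_load_test is None or (today - audit.last_load_test).days > max_test_age_days:
        failures.append("Q1: no load test in the last year")
    if audit.measured_load_pct is None:
        failures.append("Q2: load not measured")
    if audit.servers_without_agent != 0:  # None (unknown) fails too
        failures.append("Q3: server(s) without a tested shutdown agent, or unknown")
    if audit.alerts_to_watched_destination is not True:
        failures.append("Q4: alerts not reaching a watched destination")
    return failures
```

Treating “don’t know” as a failure, rather than as a pass, is the whole point of the audit.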

The decision frame

Run the four questions and the answer falls into one of three lanes.

Refresh the batteries. Unit is otherwise sound, load is appropriate, monitoring is in place, but the batteries are over three years old or have failed a load test. Batteries are typically £200-£600 for an SME-class unit. Half-day swap, no chassis replacement.
Replace the unit. Either the unit is over seven years old (capacitors degrade, inverters wear), or the load has outgrown the chassis, or the management interface is too old to do what’s needed today. Full replacement, planned outage, typically £800-£2,500 plus install.
Restructure the protection. A single UPS, a single point of failure, no longer fits what the business has become. A second unit in N+1 configuration (one extra, so if one dies the others still cover the load), or a generator behind the UPS, or a move of critical workloads to cloud where the UPS question disappears. This is the conversation when business continuity has outgrown what a single box can deliver.
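The three lanes can be sketched as a short decision function. This is a minimal Python sketch, assuming the thresholds stated above (batteries over three years, unit over seven) and treating “needs redundancy” as a yes/no judgement someone has already made; `UpsFacts`, `decision_lane` and the “no action” outcome are our additions for illustration.

```python
from dataclasses import dataclass


@dataclass
class UpsFacts:
    unit_age_years: float
    battery_age_years: float
    failed_load_test: bool
    load_outgrown_chassis: bool
    management_interface_too_old: bool
    needs_redundancy: bool  # judgement call: continuity has outgrown a single box


def decision_lane(f: UpsFacts) -> str:
    """Map audit facts onto the three lanes, checking the broadest first."""
    if f.needs_redundancy:
        return "restructure the protection"
    if f.unit_age_years > 7 or f.load_outgrown_chassis or f.management_interface_too_old:
        return "replace the unit"
    if f.battery_age_years > 3 or f.failed_load_test:
        return "refresh the batteries"
    return "no action: re-test next year"
```

Checking restructure first reflects that new batteries won’t fix a business that needs redundancy, and checking replacement before battery refresh stops you buying batteries for a chassis that is already due out. The eleven-year cupboard unit lands in “replace the unit”.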

Most refreshes land in the first two lanes. Restructuring is a longer scoping conversation.

Where SMEs trip

Two big ones come up repeatedly. The first is delaying the refresh because the UPS “still works”. The whole point of this post is that “still works” isn’t measurable. The annual load test is what tells you whether the UPS works. If you’ve never done one, you don’t know.

The second is treating the UPS as part of the seven-year infrastructure cycle (which we covered in the hardware refresh post) without separating the battery cycle from the chassis cycle. The chassis can run for ten years; the batteries can’t run for five. Mix them up and you either replace too often or too rarely.

What good looks like

When this is working, every UPS in the estate has a known age, a known battery date, a known load percentage, a known runtime under that load, a working shutdown agent on every connected server, and a monitoring alert going to a watched destination. The annual test confirms it. The replacement schedule is on the same 36-month rolling calendar as the rest of the hardware refresh. Nobody describes the UPS as “still working”; they describe it by its measured holdover time, which they know because somebody tested it.
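An inventory record that holds those facts can also say what’s due next. A minimal Python sketch, assuming the intervals used earlier in this post (annual load test, batteries at three years, unit at seven); `UpsRecord`, `next_actions` and `add_years` are hypothetical names, and the example location and address are illustrative.

```python
from dataclasses import dataclass
from datetime import date


def add_years(d: date, years: int) -> date:
    """Same calendar day N years later (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass
class UpsRecord:
    location: str
    installed: date
    battery_fitted: date
    last_load_test: date
    load_pct: float                       # measured, not estimated
    holdover_s: float                     # measured at the last load test
    shutdown_agent_on_every_server: bool
    alert_destination: str                # must be a watched inbox or platform


def next_actions(r: UpsRecord, today: date) -> list[str]:
    """List the scheduled actions that have fallen due by `today`."""
    actions = []
    if add_years(r.last_load_test, 1) <= today:
        actions.append("load test due")
    if add_years(r.battery_fitted, 3) <= today:
        actions.append("battery refresh due")
    if add_years(r.installed, 7) <= today:
        actions.append("unit replacement due")
    return actions
```

Run across the whole estate, this is the list that feeds the rolling refresh calendar.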

That’s the goal: no placebo boxes in the comms cupboard.

Where this lands with us

UPS management sits inside our Managed Services practice. For managed clients we hold the inventory, run the annual load test, schedule the battery refresh, monitor the alerts, and replace the units when they reach end-of-life. For self-managed clients we’ll do the four-question audit and a refresh plan and hand it over.

Either way, the placebo UPS isn’t a saved cost. It’s a four-hour recovery the next time the mains drops, a corrupted file system on a server that didn’t shut down cleanly, and a board conversation about why the kit you’d paid for didn’t do the one job it was bought to do. The four-question audit takes an afternoon, and the replacement plan takes a week.

If you’re worried the UPS in the cupboard might be on a placebo run, that’s our Managed Services practice. Drop us a note at info@jmopartners.co.uk and we’ll do the four-question audit.

JMO|Partners · Enterprise IT, sized for SMEs.
