Data Centre Cooling: Why HVAC Monitoring Is Non-Negotiable

Cooling is 30–40% of your data centre’s energy bill. When a CRAC unit fails at 3am, you have minutes — not hours. Here’s how intelligent monitoring changes the equation.

Australia’s Data Centre Boom

The Clean Energy Finance Corporation (CEFC) has projected that data centres could consume up to 11% of Australia’s electricity by 2035. That’s not a rounding error; it’s a category shift in how we think about national electricity demand.

Australia has become a prime data-centre location for reasons that don’t show up in a brochure: political stability, abundant solar potential, sovereign-grade security standards, and direct subsea cable links into Asia-Pacific. Microsoft, Google, and AWS are all aggressively expanding Australian data-centre capacity. Domestic operators are building out Tier III and Tier IV facilities in Sydney, Melbourne, and increasingly Brisbane and Perth.

Every new data centre needs precision cooling. And every precision cooling system needs monitoring. Not a nice-to-have. Not a phase-two upgrade. Monitoring is the thing that separates a facility that runs at its designed SLA from a facility that finds out about a cooling failure when IT starts throttling.

A typical office building can tolerate 2–3 hours of HVAC failure before anyone complains. A data centre has 10–15 minutes before servers start thermal throttling, and 30–45 minutes before hardware damage begins. Your SLA clock started the moment your cooling did something you didn’t know about.

Why Cooling Is The Biggest Cost

Cooling accounts for 30–40% of a data centre’s total energy consumption. In hyperscale facilities with aggressive engineering, that number drops into the low 20s. In older enterprise facilities — especially those retrofitted into buildings not designed for data-centre loads — it can climb above 50%.

The industry standard metric is PUE — Power Usage Effectiveness — defined as Total Facility Energy ÷ IT Equipment Energy. A PUE of 2.0 means you spend as much on cooling, lighting, and infrastructure as you do on the actual computing. A PUE of 1.0 is a theoretical limit — it would mean zero overhead.
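As a minimal sketch of the definition above, the ratio and the share of energy that goes to overhead can be computed from two meter totals. The function names and the meter readings are hypothetical, chosen only for illustration.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility energy that is NOT consumed by IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical monthly meter totals (kWh), not figures from the article.
monthly_pue = pue(total_facility_kwh=1_440_000, it_equipment_kwh=720_000)
print(monthly_pue)                  # 2.0
print(overhead_share(monthly_pue))  # 0.5, i.e. half the bill is overhead
```

At a PUE of 2.0 the overhead share is exactly one half, which is the "as much on infrastructure as on computing" case described above.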

Here’s what a 0.1 PUE improvement is actually worth. On a 1 MW data centre running at Australian commercial electricity rates of roughly $0.25–$0.35/kWh:

“A 0.1 PUE improvement saves $80,000–$120,000 per year on a 1 MW facility. Monitoring is how you find that 0.1.”
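The quoted range is not derived in the text, so here is a rough sketch of the arithmetic. It assumes the saving scales with the average IT load actually drawn over the year, which may be well below a facility's rated 1 MW. The function name and the 370 kW average load are assumptions for illustration, not figures from the article.

```python
HOURS_PER_YEAR = 8760


def annual_saving_aud(avg_it_load_kw: float,
                      pue_improvement: float,
                      rate_aud_per_kwh: float) -> float:
    """Estimated yearly saving from lowering PUE by `pue_improvement`.

    Facility energy = IT energy * PUE, so a PUE drop of 0.1 saves
    0.1 kWh of overhead for every kWh the IT equipment consumes.
    """
    kwh_saved = avg_it_load_kw * pue_improvement * HOURS_PER_YEAR
    return kwh_saved * rate_aud_per_kwh


# Continuous 1 MW IT load at the quoted tariff range.
print(round(annual_saving_aud(1000, 0.1, 0.25)))  # 219000
print(round(annual_saving_aud(1000, 0.1, 0.35)))  # 306600

# Assumed 370 kW average IT load on a 1 MW-rated facility.
print(round(annual_saving_aud(370, 0.1, 0.25)))   # 81030
print(round(annual_saving_aud(370, 0.1, 0.35)))   # 113442
```

Under these assumptions the $80,000–$120,000 range corresponds to a partially loaded 1 MW facility; a hall that genuinely averages 1 MW of IT load would save more for the same 0.1 improvement.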

Hot-aisle/cold-aisle containment is the foundation: it separates server discharge air from intake air, so hot exhaust does not recirculate into the racks and chilled supply air does not short-circuit back to the cooling units. Blanking panels, brush strips, and proper rack layout extract the next few percentage points. But the biggest opportunity, and the one most operators leave on the table, is free cooling.

Free cooling potential by Australian city

Free cooling can reduce mechanical cooling energy by 30–50% during available hours. The catch: you only capture those hours if the system switches modes correctly. Manual switchover loses hundreds of hours a year to conservatism. Automatic switchover requires accurate monitoring of indoor load, outdoor wet-bulb temperature, and chilled-water return temperature — all in real time.
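One way to picture automatic switchover is a small mode selector with hysteresis, so the plant does not flap between modes when conditions sit near the threshold. This is a sketch only: the approach temperature, the enable and disable margins, and all names are assumed values, and a real controller would also weigh IT load and plant constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EconomiserConfig:
    # All three values are illustrative assumptions, not vendor settings.
    approach_c: float = 4.0        # tower + heat-exchanger approach above wet-bulb
    enable_margin_c: float = 2.0   # margin required before entering free cooling
    disable_margin_c: float = 0.5  # margin below which we fall back to mechanical


def next_mode(current_mode: str,
              wet_bulb_c: float,
              chw_return_c: float,
              cfg: EconomiserConfig = EconomiserConfig()) -> str:
    """Return 'free_cooling' or 'mechanical' for the next control interval."""
    # Coldest water the economiser could plausibly deliver right now.
    achievable_supply_c = wet_bulb_c + cfg.approach_c
    # How far below the chilled-water return temperature that water would be.
    margin_c = chw_return_c - achievable_supply_c

    if current_mode == "mechanical" and margin_c >= cfg.enable_margin_c:
        return "free_cooling"
    if current_mode == "free_cooling" and margin_c < cfg.disable_margin_c:
        return "mechanical"
    return current_mode  # inside the hysteresis band: hold the current mode


print(next_mode("mechanical", wet_bulb_c=10.0, chw_return_c=17.0))    # free_cooling
print(next_mode("free_cooling", wet_bulb_c=12.0, chw_return_c=17.0))  # free_cooling
print(next_mode("free_cooling", wet_bulb_c=13.0, chw_return_c=17.0))  # mechanical
```

The gap between the enable and disable margins is the design choice that matters: too wide and free-cooling hours are lost to conservatism, too narrow and the plant cycles.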

Precision Cooling vs Comfort Cooling

Most HVAC systems in Australia are comfort cooling — designed for people. They hold 22–24°C with a ±2°C tolerance and moderate humidity control. If they drift a couple of degrees on a hot afternoon, occupants notice but nothing breaks.

Data centres need precision cooling — designed for equipment. Tight band (20–22°C at rack inlet), tighter tolerance (±0.5°C), strict humidity control (40–60% RH), and 24/7 operation with no failover grace period. The equipment is categorically different:

Why does monitoring matter more for precision cooling than comfort cooling? Three reasons. First, tighter temperature bands leave smaller margins for error — a 1°C drift that’s invisible in an office is a deviation event in a data hall. Second, redundancy (N+1, 2N) must be verified, not assumed — a standby CRAC that hasn’t run in six months may not run when you need it. Third, humidity control matters: too dry and you get static-discharge events, too wet and you get condensation and corrosion on connectors.
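A simple sketch of what those bands mean for a single rack-inlet reading follows. The 21°C setpoint is an assumed midpoint of the 20–22°C band, and the function name and issue labels are hypothetical.

```python
def inlet_issues(temp_c: float,
                 rh_pct: float,
                 setpoint_c: float = 21.0,   # assumed midpoint of the 20-22°C band
                 tolerance_c: float = 0.5,
                 rh_min: float = 40.0,
                 rh_max: float = 60.0) -> list[str]:
    """Return the list of precision-cooling deviations for one reading."""
    issues = []
    if abs(temp_c - setpoint_c) > tolerance_c:
        issues.append("temperature_deviation")
    if rh_pct < rh_min:
        issues.append("humidity_low_static_risk")
    elif rh_pct > rh_max:
        issues.append("humidity_high_condensation_risk")
    return issues


print(inlet_issues(21.3, 50.0))  # []
print(inlet_issues(22.0, 35.0))  # ['temperature_deviation', 'humidity_low_static_risk']
print(inlet_issues(21.0, 65.0))  # ['humidity_high_condensation_risk']
```

The 22.0°C reading would pass unnoticed under a comfort-cooling tolerance of ±2°C, yet it counts as a deviation here.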

Your building’s VRF system is designed for comfort. A data centre’s cooling system is designed for survival. The equipment looks similar from a distance. The monitoring requirements are fundamentally different.

What Data Centre Monitoring Actually Looks Like

A properly instrumented data centre monitors four nested layers. Miss any one of them and you’re flying blind on something that matters.

1. Room level

2. Rack level

3. Equipment level

4. System level

“Redundancy only works if you know it’s there. Monitoring verifies your backup cooling is ready — not just installed.”

The 3am Failure Scenario

Here’s how the same cooling fault plays out in two data centres, one without monitoring and one with it. Numbers are realistic for a 400 kW enterprise data hall with N+1 cooling.

Without monitoring

03:07
CRAC Unit 3 trips on high head pressure. No one is notified. Remaining units absorb the load initially but are pushed beyond their design capacity.
03:22
Row 4 rack-inlet temperatures cross 27°C. Servers begin thermal throttling. Application latency increases. No alert fires — the BMS is set to alarm at 30°C.
03:45
Row 4 hits 32°C. Hardware thermal protection kicks in. Servers begin auto-shutdown to avoid damage. Application failures cascade.
04:15
NOC escalates to facility manager after customer tickets spike. On-call technician dispatched. Arrives on site 05:10. Finds CRAC 3 offline, blocked condenser coil.
05:45
Cooling restored. Servers restarted in sequence. Total disruption on affected rows: about 2h 40m from the initial fault. SLA penalties triggered.

Business impact: $50,000–$500,000 depending on tenant SLA terms, plus reputational damage on the next renewal conversation.

With monitoring

03:07
Nexus iQ detects CRAC 3 offline within 60 seconds. Alerts fire to facility manager, on-call technician, and NOC simultaneously.
03:08
Standby CRAC 5 is automatically verified as running and carrying load. Remaining units redistribute demand. Rack inlets remain within 1°C of setpoint.
03:09
Facility manager acknowledges alert from their phone. Dispatches on-call technician with CRAC 3 diagnostic snapshot pre-attached.
03:45
Technician arrives with the right parts because monitoring has already identified the condenser fault. Clears coil, restarts unit. Full N+1 restored.

Business impact: Zero downtime. Zero thermal events. Zero SLA penalties. $200 call-out fee vs a $50,000+ SLA breach on the silent-failure timeline.
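To see why the alarm threshold matters as much as the alarm itself, the sketch below interpolates the unmonitored timeline and asks when a given temperature threshold would first be crossed. The 27°C and 32°C points come from the timeline above; the 22°C starting temperature at 03:07 and the straight-line interpolation are assumptions.

```python
from typing import Optional

# (minutes since the 03:07 fault, Row 4 rack-inlet temperature in °C)
# The first point is an assumed starting temperature, not from the timeline.
TIMELINE = [(0, 22.0), (15, 27.0), (38, 32.0)]


def first_crossing_minute(points: list[tuple[float, float]],
                          threshold_c: float) -> Optional[float]:
    """Minutes after the fault at which temperature first reaches the threshold.

    Uses linear interpolation between consecutive points; returns None if
    the threshold is never reached within the recorded data.
    """
    if points and points[0][1] >= threshold_c:
        return points[0][0]
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if c0 < threshold_c <= c1:
            return t0 + (threshold_c - c0) * (t1 - t0) / (c1 - c0)
    return None


print(round(first_crossing_minute(TIMELINE, 30.0), 1))  # 28.8
print(round(first_crossing_minute(TIMELINE, 23.0), 1))  # 3.0
print(first_crossing_minute(TIMELINE, 35.0))            # None
```

Under these assumptions a 30°C alarm would trip almost half an hour after the fault, well after throttling has started, while a threshold just above the control band would trip within a few minutes.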

Read the full story: The 5am Alarm That Saved a Data Centre →

PUE Optimisation Through Monitoring

Monitoring isn’t just a failure-avoidance tool. It’s the single most effective lever for driving PUE down over time. Five concrete mechanisms:

  1. Identify over-cooling. Most data centres run colder than necessary “just in case.” Monitoring reveals actual thermal loads at the rack so you can raise setpoints safely: ASHRAE’s recommended envelope extends to 27°C at the inlet, and the class A1 allowable range reaches 32°C. Every 1°C higher setpoint typically reduces mechanical cooling energy by 2–5%.
  2. Optimise free-cooling switchover. Monitoring outdoor wet-bulb, chilled-water return, and IT load in real time tells you exactly when to switch to economiser mode. Manual switchover loses 200–500 free-cooling hours per year on a typical Sydney facility.
  3. Detect hot spots. Rather than cooling the entire room to satisfy the hottest rack, monitoring pinpoints specific hot spots for targeted fixes — blanking panels, airflow redirection, or relocating high-density racks. Cheaper and far more energy-efficient than bulk over-cooling.
  4. Track degradation. A CRAC unit’s cooling capacity degrades over time — dirty condenser coils, declining refrigerant charge, worn fan bearings. Monitoring catches the slow decline in delivered capacity against nameplate before it becomes a redundancy problem.
  5. Validate redundancy. Continuous monitoring proves your N+1 is really N+1, not N+0.8 because one unit is underperforming and nobody noticed. You cannot pass a DCMM audit on “we’re pretty sure.”
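Mechanisms 4 and 5 combine naturally: once delivered capacity is measured per unit, redundancy can be computed rather than assumed. The sketch below uses a hypothetical plant of five 100 kW CRAC units serving a 400 kW load; the unit count, the capacities, and the function names are illustrative assumptions.

```python
def spare_units(delivered_kw: list[float],
                nameplate_kw: float,
                load_kw: float) -> float:
    """Spare capacity expressed in nameplate units (the 'x' in N+x)."""
    return (sum(delivered_kw) - load_kw) / nameplate_kw


def survives_single_failure(delivered_kw: list[float], load_kw: float) -> bool:
    """True if the load is still covered after losing the largest unit."""
    return sum(delivered_kw) - max(delivered_kw) >= load_kw


healthy = [100, 100, 100, 100, 100]   # all units deliver nameplate capacity
degraded = [100, 100, 100, 100, 80]   # one unit has drifted to 80% of nameplate

print(spare_units(healthy, 100, 400), survives_single_failure(healthy, 400))
# 1.0 True
print(spare_units(degraded, 100, 400), survives_single_failure(degraded, 400))
# 0.8 False
```

The degraded plant still carries the load today, which is why the shortfall stays invisible until a unit actually trips.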

What monitoring is worth — by category

PUE component | Without monitoring | With monitoring
Cooling setpoint | Fixed 18°C “just in case” | Dynamic 21°C based on actual rack-inlet load
Free cooling | Manual switchover, missed hours | Automatic, maximised capture
Hot-spot management | Over-cool entire room | Targeted airflow fix, save energy
Equipment degradation | Discover at failure | Trend and service proactively
Redundancy status | Assumed available | Continuously verified
Typical PUE result | 1.6–1.8 | 1.3–1.4
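As a rough illustration of the setpoint row, the 2–5% per °C figure quoted earlier can be applied to the move from 18°C to 21°C. The sketch assumes the saving compounds per degree, which is a modelling choice rather than something the article states.

```python
def setpoint_saving_fraction(old_setpoint_c: float,
                             new_setpoint_c: float,
                             saving_per_degree: float) -> float:
    """Fraction of mechanical cooling energy saved by raising the setpoint.

    Assumes each 1°C step removes `saving_per_degree` of the remaining
    cooling energy (compounding), which is a simplification.
    """
    degrees_raised = max(0.0, new_setpoint_c - old_setpoint_c)
    return 1.0 - (1.0 - saving_per_degree) ** degrees_raised


low = setpoint_saving_fraction(18.0, 21.0, 0.02)
high = setpoint_saving_fraction(18.0, 21.0, 0.05)
print(round(low * 100, 1), round(high * 100, 1))  # 5.9 14.3
```

Under that assumption, a 3°C raise is worth roughly 6–14% of mechanical cooling energy before any free-cooling or airflow gains are counted.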

Australian-Specific Considerations

“When a CRAC unit fails at 3am, you have minutes before servers start throttling. Monitoring is the only way you find out in time.”

Getting Started

Whether you’re building a new data centre or optimising an existing facility, monitoring is the foundation. Not the finish line.

  1. Book a Demo. See Nexus iQ monitoring a live data-centre cooling system. Real CRAC units, real rack inlets, real PUE calculation.
  2. Connect your system. The Nexus 32 gateway connects to CRAC units, chillers, and environmental sensors via their native protocols, with no BMS gateway required.
  3. Stop guessing. Replace “we’re pretty sure redundancy is fine” with a dashboard that tells you exactly what’s running, what’s degrading, and where the next 0.1 PUE point is hiding.

See intelligent HVAC monitoring live.

Book a free demo and see Nexus iQ transform cooling visibility in your facility.

Get in Touch