Category Archives: Critical Environment Management

Business Continuity Management-Mission Control Facilities (Data Centres, Airports)

Data centres have evolved significantly since the introduction of mainframe computers in 1945, leading to the emergence of various types, including Enterprise, Multi-Tenant/Colocation, Cloud, Edge/Micro, Hyperscale, and Telecom Data Centres. Over the past four to five decades, the digital economy has grown exponentially, positioning data centres as pivotal components of the digital ecosystem. The reliability, resiliency, and restorability of the utility infrastructure supporting data centres have drawn the attention of stakeholders, designers, construction service providers, and facilities management teams. In response to evolving business requirements, operational teams have refined their techniques and procedures, particularly after significant events such as the dot-com bust of 2000 and the financial crisis of 2008.
Business continuity and disaster recovery are essential organisational procedures that are designed, assessed, and implemented for mission-critical facilities such as data centres and airports. Power interruptions, cooling and water system failures, and human error are the predominant causes of operational failures. Facilities Management reinforces the disaster recovery and business continuity plans within the Service Level Agreement by attending to safety, legal and regulatory compliance, utility systems, and workforce challenges.

Business continuity strategies

Business Continuity Management encompasses any one, or a combination, of the following strategies:
 Active/Backup Model – Maintaining an active backup site to ensure the continuation of all mission-critical activities.
 Active Split Operations Model – The operations of an affected site may be delegated to multiple remote operating active sites.
 Alternate Site Model – Regularly alternating between primary sites.
 Contingency Model – Arranging necessary resources at the location in case of breakdowns.
In every Business Continuity Model, the ‘Maximum Tolerable Period of Disruption’ (MTPD) ranges from a few minutes to a couple of days annually. The organisation establishes ‘Minimum Business Continuity Objectives’ (MBCO) for each mission-critical asset operating in stand-alone status.

Data Centre operations depend on business-critical utilities and resources such as Electrical Power Distribution, Uninterruptible Power Supply (UPS), Battery Banks, Cabling, Cooling Systems, Water Management, Fire Alarm and Suppression Systems, Security, Surveillance and Access Controls, Suppliers, Specialist Service Partners, and Support Manpower, all of which require ongoing assessment, upgrades, and validation of risk mitigation strategies.

Business continuity management process flow

1. Program management –
The design basis for electrical power distribution in a data centre is established to maintain the desired levels of system availability and reliability. Service level agreements with service providers are designed to reflect the key objectives of business continuity, such as the Minimum Business Continuity Objectives (MBCO), Maximum Tolerable Period of Disruption (MTPD), and Recovery Time Objective (RTO). Generally, a minimum availability of 99.982% for a Tier-3 site and 99.995% for a Tier-4 site is stipulated in Service Level Agreements. A specialised team must assess, prepare for, respond to, and manage natural or man-made disasters and system breakdowns. This team coordinates logistics for both internal and external support, prepares budget estimates, and oversees essential crisis management actions.
2. Risk and business impact assessment-
o Safety risk
 An assessment of safety risks associated with the electrical power distribution and cooling system must include comprehensive electrical load flow analyses and short-circuit studies. This evaluation should address the identification of thermal anomalies in electrical nodes, cable degradation, malfunctions of switchgear, incidents involving bypassing or malfunctioning safety interlocks, nuisance tripping, detection of unsealed openings facilitating rodent access within switchboards, and inadequacies in the as-built documentation of the power network. Furthermore, a systematic, integrated testing program must verify the reliability of interconnected fire safety alarms, suppression, access controls, and electrical and ventilation systems.
o Non-compliance and nonconformity risk
 Risk and business impact analysis will necessitate sufficient construction design details, documentation regarding non-compliance and nonconformity with electrical codes and regulatory standards, clearances from local governmental authorities, as-built system drawings, and walk-through observations.
o Operation risk
 Documentation – Inadequate or missing design and construction details, operating procedures (SOP, MOP, EOP), and troubleshooting charts.
 A yearly system testing program will pinpoint potential risks to sourcing clean, dependable power and uncover opportunities for cost-effective risk management solutions.
 Identify the ‘Single Points of Failure’ within the power distribution network and cooling systems, particularly potential failures attributable to human error and loss of standby redundancy.
 Failure Modes and Effects Analysis (FMEA) for equipment, components and technology upgrades.
o Environmental risk
 Identify and assess potential environmental hazards, such as
• Flooding of all or part of the site
• Fire, or failure to maintain the fire suppression system
• Overfilling fuel or containment storage tanks leading to spillages
• Untreated or partially treated sewage water,
• Vandalism
• Pandemics, and
• Water and air contamination.
o Suppliers and support network risk
 Identify and establish priority spare components and equipment based on
• Frequency of failures
• Operational criticality of spare components or equipment
• Cost impact
• Environmental impact
• Expected useful service life of the component or equipment
 Identify dependencies on support resources such as suppliers, outsourced workforce, and other elements.
 Response time and Resolution time SLA with suppliers and support teams.
3. Obsolescence management –
Assess the service life of equipment (Transformers, Diesel Engine Generators, UPS, Battery banks, Switchboards, Static Transfer Switches, Circuit Breakers, Power Cables, Central Chilling plant, Computer Room Air Conditioners, Water Plant, Lifts)
o Condition assessment
 Periodic condition assessment will include tests to identify hot spots and insulation degradation, load flow and short-circuit analyses, and grounding system tests.
 Partial discharge test of VRLA battery bank(s) with a variable load bank.
 Vibration and noise analysis of rotating equipment.
 Electromagnetic field tests, acoustic emission tests, and air and water infiltration tests for construction structures and water piping networks.
o Repairability and replaceability of equipment
 Documentation– manufacturer’s manual for diagnostics, disassembly instructions, and repair tips.
 Modularity and accessibility – modularity of components and ease of disassembly
 Spare parts – availability, costs, standardisation
 Software – open-source compatibility, upgrade version
 Frequency of failures
 Non-compliance with legal or regulatory guidelines
o Business impact analysis will include loss of redundancy and minimum level of service acceptable to business.
4. Business continuity action plan –
• Resource planning must encompass support from the in-house team, service providers, and material suppliers.
• The facility’s support network should involve government authorities and specialists who can offer guidance and logistics in the event of a disaster.
• A team comprising both in-house and outsourced personnel should possess the requisite knowledge of environmental regulations and expertise in safety, health, and the subject at hand. A Responsible, Accountable, Consulted, and Informed (RACI) matrix must be established.
• The financial impact of risk mitigation measures should be evaluated and acknowledged concerning the business impact across each disaster recovery scenario.
• The in-house team must be evaluated and trained to gather support during a crisis. The call tree during a crisis should include property stakeholders, business owners, and on-site senior management.
The business continuity plan of action for the data centre utility and support system must include the following –
– Addressing concerns around safety and security systems based on risk findings.
– Protection system coordination and harmonics treatment
– Legal and regulatory compliance and documentation, including construction design details.
– Capacity management of critical equipment and systems
– Managing standby redundancy of equipment and system
– Performing Predictive and Proactive maintenance
– Repair, replace or upgrade systems to enhance reliability
– Keeping a Failure Reporting, Analysis, and Corrective Action System (FRACAS) in place
– Developing training programs for the in-house and outsourced workforce, whether engaged full-time or on call-out.
5. Competency and training program for support workforce –
o Competency assessment must include
 Contract Manager
 Facility and Operation Manager
 Engineers and Technical Supervisors
 Technicians
 SHEQ members
o Skill requirements
 Must match operation requirements of knowledge and experience.
The training program must include
 Safety risk management
 Environment impact management
 Data Centre design objectives
 SOP, MOP, EOP
 Practices
The number of Full-Time Equivalent (FTE) staff must match the workload and criticality of the Data Centre.
6. Review and validate –
A desktop review of the Business Continuity Plan must be supported by historical breakdown data, manufacturers’ equipment guidelines, legal and regulatory compliance documentation, and an annual comprehensive testing program that establishes alignment with the business objectives. Key performance indicators for service providers must be established to meet the Minimum Business Continuity Objectives (MBCO), Maximum Tolerable Period of Disruption (MTPD), and Recovery Time Objective (RTO).
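
The Tier-3 and Tier-4 availability figures quoted under Program management translate directly into an annual downtime allowance, which is a convenient check when reviewing service-provider KPIs. The sketch below is illustrative only: the function names and the sample outage log are assumptions, and a year is taken as 8,760 hours.

```python
from typing import Sequence

HOURS_PER_YEAR = 365 * 24  # 8,760 h; leap years ignored for simplicity


def annual_downtime_hours(availability_pct: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def meets_availability(outage_hours: Sequence[float], availability_pct: float) -> bool:
    """True if the logged outages stay within the annual downtime allowance."""
    return sum(outage_hours) <= annual_downtime_hours(availability_pct)


if __name__ == "__main__":
    for tier, pct in (("Tier-3", 99.982), ("Tier-4", 99.995)):
        hours = annual_downtime_hours(pct)
        print(f"{tier}: {pct}% availability allows {hours:.2f} h/year")

    # Hypothetical outage log for one year, in hours per incident
    outages = [0.25, 0.40]
    print("Within Tier-3 allowance:", meets_availability(outages, 99.982))
    print("Within Tier-4 allowance:", meets_availability(outages, 99.995))
```

The same comparison can be applied per asset if the MTPD is defined per system rather than per site.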

LOAD FLOW AND SHORT CIRCUIT STUDY FOR A MISSION CRITICAL FACILITY

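    1. What is a Load Flow and short-circuit study?
     A Load Flow study is an iterative numerical analysis of bus voltages, currents, and power flows in a power distribution network under steady-state operating conditions. A short-circuit study calculates the currents that flow in the same network under fault conditions.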
     Gauss-Seidel, Newton-Raphson, and Fast Decoupled methods are commonly adopted iterative methods to study load flow.
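
As an illustration of the iterative idea, the following is a minimal Gauss-Seidel sketch for a small network with one slack bus and load (PQ) buses only. It is not how any commercial package implements load flow; the two-bus network, the line impedance, and the load values are hypothetical, and all quantities are in per unit.

```python
import cmath
import math


def gauss_seidel_load_flow(ybus, s_inj, v_slack=1.0 + 0j, tol=1e-8, max_iter=200):
    """Solve bus voltages (per unit) by Gauss-Seidel iteration.

    ybus  : bus admittance matrix as a list of lists of complex numbers
    s_inj : net complex power injected at each bus (negative for a load);
            the entry for the slack bus (index 0) is not used
    """
    n = len(ybus)
    v = [v_slack] + [1.0 + 0j] * (n - 1)  # flat start for all PQ buses
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n):  # bus 0 is the slack bus and stays fixed
            coupling = sum(ybus[i][k] * v[k] for k in range(n) if k != i)
            # V_i = (conj(S_i) / conj(V_i) - sum_k Y_ik V_k) / Y_ii
            v_new = (s_inj[i].conjugate() / v[i].conjugate() - coupling) / ybus[i][i]
            max_change = max(max_change, abs(v_new - v[i]))
            v[i] = v_new  # updated value is used immediately (Gauss-Seidel)
        if max_change < tol:
            return v, iteration
    raise RuntimeError("Gauss-Seidel load flow did not converge")


# Hypothetical example: slack bus feeding one load bus through a single line
z_line = 0.02 + 0.06j          # line impedance, per unit (assumed)
y = 1 / z_line
ybus = [[y, -y], [-y, y]]
s_inj = [0j, -(0.5 + 0.2j)]    # 0.5 pu active and 0.2 pu reactive load at bus 1

if __name__ == "__main__":
    voltages, iterations = gauss_seidel_load_flow(ybus, s_inj)
    v_load = voltages[1]
    print(f"Converged in {iterations} iterations")
    print(f"Load bus voltage: {abs(v_load):.4f} pu at "
          f"{math.degrees(cmath.phase(v_load)):.2f} degrees")
```

Newton-Raphson and Fast Decoupled methods solve the same power balance equations but usually converge in fewer iterations on large networks.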
    2. Why is a Load Flow & Short-circuit Study essential?
    Today’s mission-critical businesses are highly automated and designed to respond dynamically to changing business needs, with stability and reliability as essential requirements. The Uptime Institute has identified the major causes of Data Centre failures, most of which are attributable to power interruptions, followed by server crashes and other factors. This has made a steady and reliable power distribution system, designed to meet business needs in an optimised manner, a necessity. Load Flow and Short Circuit studies are among the most important analyses for power distribution networks.
    These studies support a range of critical operational requirements:
     Enhance the stability and reliability of the power distribution network.
     Prevent nuisance tripping, isolate faulty sections during faults, and minimise the impact on healthy network components.
     Recalibrate and reset protection relays to function within their design specifications.
     Conduct a feasibility study and plan to implement significant changes in power sourcing and load connections.
     Perform an iterative study of interconnected power networks with incremental changes and transient simulation procedures.
    3. Input data requirements
    Accurate data entry is crucial for load flow and short-circuit scenario analysis. It includes General information, System data, Bus data, Load types, Power distribution network data, and Power sourcing data such as generating equipment, transformers, and renewable sources.
     A Single Line Diagram indicating the equipment nameplate data of Transformers, Diesel Engine Generators, UPS, Power Distribution Boards, Capacitor Banks, and connected loads.
     System Voltage, fault level (MVA) and X/R ratio
     Impedance in Ω, or in % / per unit, of Power Transformers and all Feeders
     Maximum Load Current and Prospective Loading
     Current and Potential Transformers (CTs and PTs) and Performance Curves
     Existing Protection Devices, Settings and Time Current Characteristics
     Reactive Power (kVAR) Control, Voltage Control, and the Scheduled Power Factor (pf) of the system.
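
To show how the nameplate inputs above feed a short-circuit estimate, the sketch below computes the prospective three-phase bolted fault current on the secondary side of a transformer from its kVA rating, line-to-line voltage, and percentage impedance. The transformer and source figures are hypothetical, and adding the source and transformer impedance magnitudes directly is a simplification that ignores X/R differences; a full study would use complex impedances and cable data.

```python
import math


def full_load_current(kva: float, v_line_to_line: float) -> float:
    """Rated three-phase full-load current in amperes."""
    return kva * 1000 / (math.sqrt(3) * v_line_to_line)


def bolted_fault_current(kva, v_line_to_line, z_percent, source_fault_mva=None):
    """Prospective three-phase fault current (A) at the transformer secondary.

    With source_fault_mva=None the upstream network is treated as an
    infinite bus, which gives the highest (most onerous) estimate.
    """
    z_pu = z_percent / 100                      # transformer impedance, own base
    if source_fault_mva is not None:
        z_pu += (kva / 1000) / source_fault_mva  # source impedance on same base
    return full_load_current(kva, v_line_to_line) / z_pu


if __name__ == "__main__":
    # Hypothetical 2,000 kVA, 415 V secondary, 6 % impedance transformer
    i_fl = full_load_current(2000, 415)
    i_sc_infinite = bolted_fault_current(2000, 415, 6.0)
    i_sc_finite = bolted_fault_current(2000, 415, 6.0, source_fault_mva=500)
    print(f"Full-load current:             {i_fl:,.0f} A")
    print(f"Fault current (infinite bus):  {i_sc_infinite / 1000:.1f} kA")
    print(f"Fault current (500 MVA source): {i_sc_finite / 1000:.1f} kA")
```

The result is typically compared with the short-circuit withstand rating of the downstream switchgear and the breaking capacity of its circuit breakers.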
    4. Data Collection
     Single Line Diagram
     Nameplate details of Transformers, Diesel Engine Generators
     Power cable runs, types, size and length
     Details of the Switchgear Panel, UPS, Power Distribution Boards
     Protection relay settings
     Manufacturers’ Time Current Characteristic (TCC) curve data for protection devices
     Current Load Data
     Branch network data
    5. Software Tools
     Software encompasses programs designed to implement, evaluate, and execute short-circuit protection system coordination, load flow analysis, harmonic analysis, system stability, motor starting, and grounding.
     Software tools for analysing nonlinear power flow conditions are applied online and offline.
     Online software (ETAP, DIgSILENT, PSCAD) assesses and manages real-time load flow and can be integrated with the BMS or the plant SCADA system to control and regulate kVAR, active harmonic compensation, and bus voltage to optimise power flow management.
     The software program can be used to control the switching ON and OFF of power sources, connected load, and safety features designed for interconnecting various sources and load centres.
     Offline software (ETAP, PSS®SINCAL, EA-PSM) is commonly adopted to investigate and establish optimal power flow through what-if scenario analysis, forming the basis for future power sourcing integration, load management, capacitor bank installation, renewable energy integration, and predictive maintenance planning of power distribution systems. It is also used to check that the voltages at all buses remain stable within the specified limits.
    6. Analysis Observations
     Grid power capacity and availability
     Adequacy and resiliency of grid and backup power bus capacity, power cable ampacity, circuit breakers, isolators, online switches, and UPS capacity
     Power interruption scenario analysis
     Short circuit analysis
     Protection system coordination
    7. Load flow and short circuit analysis ensure power network reliability, guiding the planning of cables, switchgear, and protection elements.
    8. Reference Standards
     IEEE 3002.2-2018
     IEEE 242-2001
     IEC 255-3
     IEC 61642

Data Centre – Condition based Challenges and Maintenance

Maintenance tests for mission-critical facilities assess, analyse, and enhance the reliability of building systems. The challenges associated with operating and maintaining Data Centres, Mobile Switching Centres, and similar properties encompass fire and life safety, structural integrity, electrical systems, rotating equipment, controls, communications, and security. An annual test programme addresses concerns related to critical assets.

Physical Infrastructure Challenges for a Mission-Critical Technology Property

Most physical infrastructure components remain in serviceable condition for 5 to 30 years. VRLA battery banks may require overhauling within five years. In contrast, construction structures, drainage and water piping networks, pumps, cooling system equipment, electrical power distribution systems, and electronic surveillance and controls can function effectively for 10 to 30 years. Age-related wear, operational degradation, and deterioration of individual elements of these systems are significant concerns for mission-critical technology properties. The selection of non-destructive condition tests supported by analytics plays an essential role in optimising maintenance costs and enhancing the reliability of the property.

COMMON CONDITION SURVEYS APPLICABLE FOR BUILDING INFRASTRUCTURES ARE
 THERMOGRAPHY
 POWER QUALITY
 ELECTROMAGNETIC FIELD
 ACOUSTIC EMISSION MONITORING
 VIBRATION MONITORING

  1. Location of Data Centres –
    • Requirements
      • Risk assessment to address concerns from environmental degradation
    • Risks with location of the property are –
      • Inadequate Environmental Impact Assessment information is available from the Property Management team.
      • Frequent failure of electronic components due to contaminated indoor air.
    • Maintenance tests
      • Indoor and outdoor air and ground water quality tests can reveal contamination and necessary corrective actions.

2. Construction structures –

  • Requirements
    •  The load-bearing capacity of floors, ceilings and roof structures must meet the minimum requirements of the equipment point and distributed load, including expansion and safety margin.
    • Interior and exterior walls must be resistant to climatic risk elements.
    • Interior construction must provide for adequate airtightness and watertightness.
  • Challenges
    • Mobile Switching Centers and Data Centers housed in old legacy buildings are usually repurposed from regular office usage.
    • Critical information like building structural design data is missing.
    • Information on maximum bearable load by construction is missing.
    • Information on seismic zones, floodplains, and other topographical hazards is unavailable to the Facility Management team.
    • Technical space has expanded over the past years with minimal or no inputs on the structural integrity of the building.
    • Cracks and spalling concrete are visible in the building’s exterior and interior.
    • Mould formation and water and air ingress are noticed.
  • Maintenance Tests for construction structures
    • Rebound Hammer Test
    • Concrete core cutting and Compression test
    • Half Cell Potential Test
    • Ultrasonic test
    • Rebar scanning test
    • Thermography
    • Air and water infiltration test
    • Roof flood test
    • Acoustic emission and Eddy current tests for underground and overground storage tanks.

3. Rotating equipment (Fans, blowers, compressors, pumps)

  • Requirements
    • Reliability and availability of equipment and systems
  • Challenges
    • The expansion or decommissioning of technical space within the building often proceeds without adequate design consideration for water pumping and ventilation systems.
    • Repetitive failures of rotating equipment.
    • The Energy Metering system is often not installed for the water pumping and ventilation fans.
    • Fans and Pumping equipment contribute 15% of total Data Center Energy consumption.
  • Maintenance tests
    • Acoustic emissions test
    • Vibration test
    • Wear and Oil analysis
    • Thermography for pipework

4. Air-conditioning and ventilation systems

  • Requirements
    • Resistant to fire hazards
    • Reliability and availability of system and standalone equipment
    • Control and extraction of smoke
  • Challenges
    • Smoke extraction fans require periodic testing
    • Controls of interconnected building systems with fire alarm systems require periodic testing.
    • Compliance with building codes
  • Maintenance tests
    • Acoustic emissions test
    • Vibration test
    • Noise analysis
    • Stairwell smoke extraction system test
    • Interconnected systems response test – Fire alarms, Lifts, Ventilation fans

5. Electrical power distribution

  • Requirements
    • Resistant to fire hazards
    • Reliability and availability of system and standalone equipment
  • Challenges
    • Absence of adequate metering system
    • Inadequate information on construction design details
    • Uncontrolled change management process over long tenure of operations
  • Maintenance Tests
    • Insulation resistance test
    • Short circuit and load flow studies
    • Harmonics, load stability analysis
    • Protection system testing, calibration and coordination
    • Thermography

 

Ten challenges to address in the annual power-down testing programme of the Data Centre

The operations and maintenance of mission-critical facilities present challenges unique to site conditions, the operating team, and client business requirements. Addressing these challenges requires a detailed evaluation of existing systems, exploration of improvement opportunities, and fulfilment of client needs for reliability, availability, and maintainability of property assets. An annual testing programme and a comprehensive assessment of support systems are essential for enhancing dependability, including workforce up-skilling and a capital investment programme focused on improving technology and performance efficiency.

The techniques utilised in the annual testing programme include ‘Pull the Plug’, ‘End-to-End’, ‘Variable Load Bank’, and non-destructive condition tests of standalone building systems and equipment. A combination of these testing methods is utilised as part of the annual test programme to achieve the best evaluation and improvement of the reliability and availability of building systems.

Challenges commonly addressed in the annual test program –

  1. Safety concerns regarding fire protection and life safety hazards due to the failure of single or multiple critical systems.

o The fire protection and life safety systems test programme, carried out in compliance with regulatory codes, verifies that the systems function as intended and identifies faulty elements for repair or replacement.

o Concerns regarding the operational integrity of interconnected systems with the fire alarm and suppression systems, such as ventilation fans, lifts, emergency power sources, emergency lighting, access controls, and call-out protocols, can be addressed in the annual testing program.

2. Skill-gap of personnel managing the property support utility systems.

o Implementing a comprehensive competency assessment programme, followed by an up-skilling initiative, can address knowledge and skill gap issues.

3. Reliability and availability of the building utility system

o The Annual Integrated System Test, which combines the ‘standalone equipment performance evaluation’, ‘Pull the Plug’, and ‘Variable On-load Integrated System Test’, can establish the resilience of the existing systems.

o The on-load test helps to analyse the stability and integrity of power and water systems, ensuring restoration during grid and pump failures, respectively.

o The on-load Test programme can address various concerns, including:

4. Checking the functional integrity of interconnected system controls and measurements.

    • The Failure Reporting, Analysis, and Corrective Action System (FRACAS) procedure adopted in the on-load test program encourages knowledge transfer by allowing the operations team to witness and participate in the on-load tests.
    • The on-load testing methodology can facilitate condition assessment through thermography, power quality assessment, load flow and stability study, smoke extraction and cooling performance evaluation.

o The test outcomes lay the groundwork for enhancing the capital investment program to improve technology, capacity, and performance efficiency.

5. Inadequate capacity management of building utility system

o A load bank test regime can explore means of efficient capacity management under various loading scenarios.

o Forecasting demand load and spare capacity available to meet near-future needs.

6. Missing construction details of the property.

o The annual test program can be designed to establish the control logic that governs the interoperability of systems.

o As-built drawings of electrical and water management systems can be developed based on observations of end-to-end integrated system tests.

7. Inadequate metering and sensing systems to gauge the key performance indicators of the property are a familiar challenge property managers face.

o An annual power-down test with a variable load bank allows key operational indicators to be measured and performance integrity and efficiency to be evaluated. Baselining performance indicators helps in optimising operations.

8. The resiliency of the electrical power distribution system under fault scenarios is a significant concern for the property manager.

o Annual testing and coordination of the protection system alleviates worries about the integrity and resilience of the power distribution system following modifications or expansions of connected systems.

9. The aging building’s equipment and systems raise concerns about the construction’s physical condition, the equipment’s safety, and the expected useful life of these components.

o Non-destructive condition tests for civil structures, electrical power equipment, and rotating machinery can indicate the state of these systems and components. A thorough obsolescence management programme can be established through the annual testing program.

 An ageing battery bank serving an Uninterruptible Power Supply (UPS) or Switched-Mode Power Supply (SMPS) raises concerns about its capacity to support current and future demand loads. A replacement programme is a high-capital-investment preventive maintenance initiative that requires a careful assessment of the battery bank’s health condition and the reliability of its autonomy.

o Partial discharge testing of battery banks with a load bank can reveal their health condition and potential risks of failure. Based on the test results, a decision can be made regarding replacing degraded cells in the circuit.

10. Wet stacking of backup diesel engine generators due to partial loading is a common issue faced by facility engineers.

o On-load performance testing can tackle the wet stacking problem and help establish a health check for the diesel generator sets.
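
For the wet stacking issue in item 10, a simple check is to compare the load a generator actually carries during a test run with a minimum loading target, and to size the supplementary load bank accordingly. The sketch below assumes a target of 30% of nameplate kW, a figure commonly cited for diesel generator test runs; the correct threshold for a given set should be taken from the manufacturer or the applicable code. The generator ratings and site loads are hypothetical.

```python
def load_fraction(rated_kw: float, site_load_kw: float) -> float:
    """Generator loading as a fraction of its nameplate kW rating."""
    return site_load_kw / rated_kw


def load_bank_kw_needed(rated_kw: float, site_load_kw: float,
                        target_fraction: float = 0.30) -> float:
    """Extra load bank kW required to reach the target loading (0 if already met).

    target_fraction is an assumed minimum loading, not a value from the article.
    """
    return max(0.0, target_fraction * rated_kw - site_load_kw)


if __name__ == "__main__":
    # Hypothetical generator sets: (name, nameplate kW, site load during test kW)
    generators = [("DG-1", 1500, 300), ("DG-2", 1500, 600)]
    for name, rated_kw, site_kw in generators:
        extra = load_bank_kw_needed(rated_kw, site_kw)
        print(f"{name}: loaded at {load_fraction(rated_kw, site_kw):.0%}, "
              f"add {extra:.0f} kW of load bank")
```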


Data Centre Property Dependability Improvement Program

The rapid digital expansion across industries such as telecommunications, banking, government services, and manufacturing has heightened the demand for increased resilience and reliability in backend infrastructures, including Data Centres and Mobile Switching Centers. Facility Managers are responsible for performing thorough assessments of the dependability of the building superstructures, substructures, utilities and support systems.
The Improvement Programme for Critical Property Dependability prioritises an annual comprehensive assessment of the property’s reliability, availability, and maintainability, the competence of maintenance support personnel, and compliance with statutory and regulatory requirements.
Property Dependability Management Challenges:
• Collecting information from property owners and stakeholders.
• Evaluating the dependability requirements of end-customers.
• Addressing the lack of necessary in-house or third-party team knowledge and skills.
• Developing a cost-efficient property condition assessment program.
• Obtaining approval for a dependability enhancement initiative.
• Executing a comprehensive risk management program.
• Analysing outcomes from condition assessments of property elements.
Steps to develop and implement a Dependability Improvement programme for the business owner(s) and stakeholders:
1. Information needed from the Property Owner(s) and stakeholders at the outset may include the following elements.
 Construction details, drawings, commissioning reports, and Operation Manuals.
 Status of operating licenses and compliance certificates for statutory and regulatory requirements.
 Recent safety inspection records.
 Records of building energy performance and system functionality checks.
 Resource allocation for operations and maintenance services.
2. Dependability needs assessment
 Ensuring the safety of property and life.
 Addressing gaps in statutory, regulatory, and international standards and best practices.
 Assessing the ‘Residual Useful Life’ of assets for obsolescence management.
 Evaluating the reliability, availability, and maintainability of building utilities.
3. Setting Objectives for the Dependability Improvement Program
 A comprehensive assessment of the building’s structural integrity, asset maintainability, maintenance support, safety risk management, reliability, and availability of essential utilities, compliance with statutory and regulatory requirements, and meeting current business needs while accommodating future expansion requirements.
 Develop and implement a Dependability Improvement programme for the business owner(s) and stakeholders.
4. Create a Plan of Action
 Pre-assessment Planning
 Carry out a walk-around survey of the property to identify the boundary limits of critical elements to include in the programme.
 Create a programme for the ‘Property Condition Assessment’. Take into account the criticality of asset elements and standard periodicity.
 Develop customised Test Worksheets, risk matrix, and resource allocation (skilled manpower, testing equipment)
5. Implementation
 Conduct a Property Condition Assessment of selected critical assets.
 Perform a comprehensive feasibility and risk assessment.
 Evaluate the choice of tests and cost efficiency.
 Conduct an ‘End-to-End Integrated System Test’ comprising, but not limited to, the following elements:
– Emergency Power System
– Fire Protection System
– Individual, Integrated, and Interconnected Systems
– Life Safety Systems
6. Review the outcomes of actions.
 Use risk management tools such as:
– Interviews with Property Owner(s) and stakeholders
– Checklist
– Failure Modes, Effects and Criticality Analysis (FMECA)
– Failure Reporting, Analysis, and Corrective Action System (FRACAS)
– Business Impact Analysis (BIA)
– Human Reliability Analysis
Business-risk based Maintenance (RUN, REPAIR, REPLACE) priority grading (ref. ISO/IEC 22237-1)
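
One way to connect the FMECA tool listed above to a RUN, REPAIR, REPLACE grading is through a Risk Priority Number (RPN), the product of severity, occurrence, and detection scores on a 1 to 10 scale. The sketch below is illustrative only: the RPN thresholds, the asset names, and the scores are hypothetical and are not taken from ISO/IEC 22237-1 or from the article.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    asset: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (always detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def grade(rpn: int, repair_at: int = 100, replace_at: int = 300) -> str:
    """Map an RPN to a maintenance priority; thresholds are assumed values."""
    if rpn >= replace_at:
        return "REPLACE"
    if rpn >= repair_at:
        return "REPAIR"
    return "RUN"


if __name__ == "__main__":
    # Hypothetical assessment results
    register = [
        FailureMode("VRLA battery bank", severity=8, occurrence=6, detection=7),
        FailureMode("Chilled water pump", severity=6, occurrence=4, detection=5),
        FailureMode("Corridor lighting", severity=3, occurrence=2, detection=2),
    ]
    for fm in sorted(register, key=lambda f: f.rpn, reverse=True):
        print(f"{fm.asset:20s} RPN={fm.rpn:4d} -> {grade(fm.rpn)}")
```

In practice the thresholds would be agreed with the business owner so that the grading reflects the business impact analysis rather than a fixed scale.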

The Property Condition Assessment must conclude with an evaluation of the improvements achieved in the Dependability of the Property.

Green Data Centres – Challenges and Opportunities

The Growing Need for Sustainable Data Centers

Data centres, the powerhouses of our digital world, face a critical challenge: sustainability. While they provide the essential infrastructure for today’s digital ecosystem, their energy use intensity can be staggering, often 20 to 50 times that of commercial office buildings of similar built-up area. With server racks evolving to hold even more powerful equipment (50 to 100 kW/rack compared to the previous 4 to 20 kW), the need for sustainable practices becomes even more pressing.

The Sustainability Imperative

Sustainability encompasses the design, construction, and operation of a data centre and the management of its resources (electricity, cooling, water, waste). Few facilities have embraced green practices from the outset, but many legacy data centres are now turning towards Environmental, Social, and Governance (ESG) frameworks to achieve long-term sustainability and reap its benefits.

Transforming Legacy Data Centres

For traditionally constructed data centres, achieving sustainability involves an ongoing improvement process. Sustainability initiatives, which include assessments and evaluations under recognised schemes such as the IGBC Green Data Center Rating, EcoVadis (ESG score), and the Building Research Establishment Environmental Assessment Method (BREEAM), play a pivotal role in facilitating the transition towards carbon, water, and waste neutrality.

Two-pronged approach for Facility Managers:

In-House or Expert Assessment:  

Conduct regular assessments to identify areas for operational efficiency improvement across the entire data centre portfolio. This can be done by an internal team or by seeking external expertise.

Focused Interventions:

Based on the assessment findings, implement targeted interventions to address inefficiencies. These may involve upgrading cooling systems, adopting renewable energy sources, or introducing water conservation measures.

India’s Booming Data Center Market: Balancing Growth with Sustainability

India’s digital landscape is undergoing a rapid transformation, fuelled by the proliferation of data centres. Currently ranking 13th globally, the Indian data centre market is experiencing significant growth, driven by factors like:

Digitalization Across Sectors: Education, healthcare, commerce, and communication are all experiencing a surge in online activity, demanding more data storage and processing power.

Emerging Technologies: The adoption of the Industrial Internet of Things (IIoT) and Generative AI necessitates data centres with a robust infrastructure.

Data Residency Requirements: The Digital Personal Data Protection Act (2023) encourages the construction of Edge and Hyperscale Data Centers to meet data residency requirements and ensure low latency.

Challenges and Opportunities for Achieving Sustainable Growth

While the data centre boom brings undeniable benefits, sustainability concerns require immediate attention. Here’s a closer look at the key challenges and opportunities:

Challenges:

Limited Sustainability Focus: Many legacy data centres haven’t prioritized sustainability principles, leading to higher operational costs and carbon footprints.

Green Energy Sourcing: Off-site green energy options are limited by a lack of awareness, unclear policy frameworks, and bureaucratic hurdles.

Energy Efficiency: Upgrading existing infrastructure to improve Power Usage Effectiveness (PUE) can be expensive and may have a long payback period.

Water Management: The water crisis in some Indian cities highlights the need for Water Use Effectiveness (WUE) measures. Inadequate metering and stakeholder awareness exacerbate the issue.

E-Waste Management: A primary concern is the lack of proper monitoring, recording, and recycling systems for e-waste generated by data centres.
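The PUE and WUE metrics named in the challenges above are simple ratios, and a short calculation makes clear what has to be metered to report them. The sketch below uses the commonly published definitions (PUE as total facility energy divided by IT equipment energy, WUE as site water use in litres divided by IT equipment energy in kWh). The annual figures are hypothetical and do not come from any facility discussed here.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def wue(site_water_litres: float, it_equipment_kwh: float) -> float:
    """Water Use Effectiveness in litres per kWh of IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return site_water_litres / it_equipment_kwh


# Hypothetical annual meter totals for illustration only.
annual_total_kwh = 18_000_000
annual_it_kwh = 10_000_000
annual_water_litres = 17_000_000

print(f"PUE = {pue(annual_total_kwh, annual_it_kwh):.2f}")
print(f"WUE = {wue(annual_water_litres, annual_it_kwh):.2f} L/kWh")
```

Both ratios depend on separate metering of the IT load, which is why inadequate metering is itself listed as a challenge.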

Opportunities:

Government Support: The Indian government’s revised data centre policy aims to facilitate land acquisition, green energy access, and supporting infrastructure. This will incentivize sustainable practices.

Favourable Green Energy Tariffs: Long-term Open Access (LTOA) tariffs offer cost-effective green energy options for data centres.

On-Site Green Energy: Technological advancements in solar-wind hybrid power systems make on-site renewable energy generation more attractive.

Water Efficiency Technologies: Implementing water-efficient building cooling systems, exploring recycled water reuse, and installing proper metering can significantly reduce data centre water usage.

Effective E-Waste Management: Policy development and enforcement focusing on e-waste reduction, reuse, and recycling is crucial. Additionally, establishing a network of trained and authorized e-waste recyclers is essential.

Conclusion

The Indian data centre market presents a golden opportunity for economic growth. However, ensuring long-term sustainability requires a collaborative effort from stakeholders. By addressing the existing challenges, embracing technological advancements, and implementing environmentally conscious practices, India can foster a thriving data centre ecosystem that is both economically viable and ecologically responsible.

Case Study – Data Center Indoor Contamination and Cleaning Improvement

Case study of indoor contamination of data centre: root cause analysis and risk management

Problem –

Server elements in a newly constructed data centre have frequently failed, resulting in significant downtime that has impacted the reliability of the global data centre and increased costs.

Background –

The Data Centre is located on a reclaimed marshy area around 1.0 km from the seashore. Its primary objective is to support business units throughout the Asia-Pacific region. The white space, which measures over 50,000 sq ft, is home to Enterprise Servers and Storage products. The data centre is designed to operate within a thermal environmental boundary of Class A1 and at the reliability level of Tiers 3 and 4. The maintenance and cleaning services for the Data Centre have been outsourced to a specialized service provider.

Root Cause Analysis –

Routine indoor air quality tests of the Data Center found that the indoor environment did not meet prescribed standards and guidelines. The high level of sulphur-bearing and other corrosive gases in the air is attributed to natural emissions such as H2S, NH3, and SO2, which arise from the Data Center’s location in a reclaimed marshy area. This led to non-conformities with the Indoor Environment Standard.

Solutions adopted-

A two-pronged approach was adopted to address the contamination issue.

  1. Upgrading the clean environment mechanical systems

An additional air filtration system was installed at the fresh air intake to filter out harmful gases. A pressurization system was also set up to maintain positive air pressure within the data centre, preventing external pollutants from entering. Furthermore, a real-time indoor environment monitoring system was implemented to detect deviations from the ASHRAE Class A1 environmental limits for indoor temperature, humidity, and air quality.

  2. Enhancing cleaning protocol

A cleaning protocol was developed for the White (SERVER) and Grey (POWER & COOLING equipment) spaces inside the Data Centre to improve surface cleaning and address dust particle and chemical contamination issues. Several internationally recognized standards and guidelines, including ISO 14644 Parts 1 to 9, 13, and 14, were consulted to develop a robust and effective cleaning protocol.

By combining environmental upgrades with a more rigorous cleaning regime, the data centre significantly reduced contamination and minimized the risk of server failure. This case study highlights the critical role facilities service contractors play in maintaining optimal data center environments. Partnering with a qualified contractor who understands the specific needs of cleanroom environments and implements industry best practices is essential for ensuring data center uptime and preventing costly disruptions.
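The real-time monitoring described under the first measure comes down to comparing each sensor reading against a set of limits and flagging anything outside them. The sketch below shows that comparison in its simplest form. The limit values are placeholders assumed for illustration; they should be replaced with the limits from the edition of the ASHRAE thermal guidelines the site follows, and extended with whatever air quality parameters the site measures. The parameter names are hypothetical.

```python
# Assumed placeholder limits: (low, high) per monitored parameter.
# Confirm against the applicable ASHRAE Class A1 figures before use.
ASSUMED_LIMITS = {
    "temperature_c": (15.0, 32.0),
    "relative_humidity_pct": (20.0, 80.0),
}


def find_deviations(reading, limits=ASSUMED_LIMITS):
    """Return (parameter, value, low, high) for each out-of-range parameter."""
    deviations = []
    for name, (low, high) in limits.items():
        value = reading.get(name)
        if value is None:
            continue  # Sensor not reporting; a real system should alarm on this too.
        if not low <= value <= high:
            deviations.append((name, value, low, high))
    return deviations


# Hypothetical reading from one sensor location.
reading = {"temperature_c": 33.5, "relative_humidity_pct": 45.0}
alerts = find_deviations(reading)
for name, value, low, high in alerts:
    print(f"ALERT {name}={value} outside [{low}, {high}]")
```

Keeping the limits in a single table makes it easy to tighten them towards recommended ranges without touching the checking logic.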

Case Study – Data Center Energy Performance, Obsolescence, and Dependability Assessment

Mission-critical data centres are the backbone of businesses, and it is imperative to regularly assess and validate the energy performance, obsolescence, and dependability of their supporting utility systems and subsystems. For telecom businesses in regions across India, a comprehensive assessment of mobile switching centres and data centres was undertaken.

A process flow was mapped for the comprehensive performance assessment of the Data Centre.

The mandated study included the following components:

1. A thorough assessment of the infrastructure’s environmental, health, and safety attributes.
• A detailed risk assessment based on indoor environmental test outcomes.
• Implementing risk mitigation measures, including enhancements to the clean room ventilation and filtration systems as necessary.
• A fire and life safety assessment to identify potential high risks and impacts on individuals and property.
2. Evaluation of site-specific location sustainability and transportation factors.
• Adequate space allocation for the Data Center and utilities to meet current and future requirements.
• Reliable availability of electricity and water sources to support present and anticipated needs.
• Assessment of location sustainability considering the risks from nearby fuel stations, concert halls, political establishments, and government institutions.
• Accessibility to renewable power sources for enhanced connectivity.
• Public transportation options within a 1.0 km radius of the site.
• Accessibility to skilled manpower from local communities.
3. Full-Time Employee (FTE) detailing and competency assessment:
• FTE detailing was conducted based on critical service needs.
• Comprehensive identification of competency and training needs.
4. Compliance with local and national regulatory guidelines, including a review of compliance gaps against mandatory rules and regulations and of the risk mitigation actions taken over the past three years.
5. Evaluation of operations and maintenance services, including energy management, HVAC systems, water management, and waste management:
• Analysis of operating procedures and practices in alignment with governing standards and sustainability principles.
• Development of maintenance manuals for the operations team.
6. Energy performance assessment of the entire building and major critical systems (HVAC, electrical, and water):
• Evaluation of energy performance based on historical energy records.
• Baselining energy consumption for systems, sub-systems and the whole building.
• Spot measurements to identify the scope for efficiency improvement.
7. Creation of a short- and long-term capital investment business case for energy efficiency improvements on behalf of the client:
• Business case development for enhancing the energy efficiency of critical systems.
• Retrofit engineering solutions designed to improve energy and performance efficiency.
8. Dependability study of critical systems focusing on electrical power, HVAC, and water management:
• Assessment of reliability, availability, and maintainability.
• Obsolescence assessment.
9. Equipment condition assessment, which includes thermal scanning, power quality analysis, vibration and noise assessments:
• Evaluation of equipment age and reliability.
10. Capacity utilisation and forecasting for effective capacity management:
• Regression analysis of multiple variables to forecast the data centre’s most probable demand capacity for electricity, water, and waste management.
11. Functional criticality evaluation:
• Establishment of a functional criticality assessment based on Failure Mode and Effects Analysis (FMEA) tools.
12. Perform an ‘Integrated System Test’ of the Mechanical, Electrical, Plumbing, Lifts, HVAC, Fire Alarm and Suppression systems, and the Electronic surveillance and access control systems of the building, and follow the ‘Failure Reporting, Analysis, and Corrective Action System’ (FRACAS) procedure.
13. Development of a business case for capital investment projects aimed at improving, upgrading, and modifying systems and sub-systems:
• Submission of a capital investment project proposal for improvements, upgrades, and modifications.
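The capacity forecasting in item 10 can be illustrated with a least-squares trend line. The sketch below is deliberately simplified: it fits a single driver (month number) to one demand series, whereas the study describes a regression on multiple variables. The function name and the monthly peak demand figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly peak electrical demand (kW) for six months.
months = [1, 2, 3, 4, 5, 6]
peak_kw = [410.0, 420.0, 433.0, 441.0, 452.0, 460.0]

slope, intercept = fit_line(months, peak_kw)
forecast_month_12 = slope * 12 + intercept
print(f"Trend: {slope:.2f} kW/month, month-12 forecast: {forecast_month_12:.1f} kW")
```

The same fit can be repeated for water and waste series; extending it to several explanatory variables (for example rack count and ambient temperature) requires a multiple regression, which is beyond this sketch.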

Grading Asset Condition Survey and Action Priority

Preventive and Corrective or Improvement Maintenance Priorities

Priority 1
Failure or absence of critical elements has a direct impact on health and life safety.
Priority 2
Non-compliance with mandatory local and national statutory and legal requirements is identified.
Priority 3
Failure or absence of critical systems or sub-systems affects business operations.
Priority 4
Improvements or modifications in system or sub-system assets can enhance cost efficiency, service quality, and sustainability.

Critical Asset Condition Grading

Class A
The system/sub-system is fully operational and meets all design performance specifications without any issues.
Class B
The system/sub-system is functional and adheres to design performance standards. However, there are minor signs of wear or reduced efficiency at the equipment or component level.
Class C
The system/sub-system is still operational but shows significant degradation in condition or performance efficiency, deviating from the optimal design intent in several areas at the equipment or component level.
Class D
Critical element(s) are at serious risk of imminent breakdown, which could lead to a major system failure.
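The two scales above can be combined to put survey findings into an action order. The sketch below sorts by maintenance priority first and, within the same priority, by worst condition class first. That tie-break rule is an assumption made for illustration, not something the grading scheme prescribes, and the findings listed are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    LIFE_SAFETY = 1           # Priority 1
    STATUTORY_COMPLIANCE = 2  # Priority 2
    BUSINESS_OPERATIONS = 3   # Priority 3
    IMPROVEMENT = 4           # Priority 4


class Condition(IntEnum):
    A = 1  # fully operational
    B = 2  # minor wear
    C = 3  # significant degradation
    D = 4  # imminent breakdown risk


@dataclass
class Finding:
    asset: str
    priority: Priority
    condition: Condition


def action_order(findings):
    """Sort by priority (1 first), then by worst condition (D first)."""
    return sorted(findings, key=lambda f: (f.priority, -f.condition))


# Hypothetical survey findings for illustration only.
findings = [
    Finding("Chiller 2", Priority.BUSINESS_OPERATIONS, Condition.B),
    Finding("LED lighting retrofit", Priority.IMPROVEMENT, Condition.A),
    Finding("Fire alarm panel", Priority.LIFE_SAFETY, Condition.C),
    Finding("Diesel generator 1", Priority.BUSINESS_OPERATIONS, Condition.D),
]

ordered = action_order(findings)
for f in ordered:
    print(f"P{f.priority.value} / Class {f.condition.name}: {f.asset}")
```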

Power Quality for Data Centre

Modern Data Centers of varying sizes and types are constructed and operated as “Mission Critical Environments” to meet business objectives. The reliability and availability of an Enterprise Data Center are expected to be no less than 99.99%. Research has established that stable, reliable power sourcing and a sound distribution design are essential to maintaining the power system’s reliability and availability for electronic loads.

This article discusses the general requirements for power sourcing and distribution systems for electronic loads. It also includes a typical cause-effect analysis and proposes solutions to common power quality issues. 
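To put the 99.99% figure in perspective, an availability target can be converted into the downtime it allows per year. The sketch below does that conversion for a 365-day year; at 99.99% the allowance works out to roughly 52.6 minutes. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per year (minutes) implied by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {annual_downtime_minutes(pct):.2f} min/year")
```

Each additional "nine" cuts the allowance by a factor of ten, which is why even brief power quality events matter at this level of criticality.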

Why is Power Quality Important to Data Centers?

Cause-effect analysis of Power Quality (PQ) related disturbances

Power Quality Threshold Limits for Data Center

Regular monitoring and assessment of power quality are crucial for a data centre, both for day-to-day operations and for change management. Periodic assessment is also essential when troubleshooting power-related interruptions or abnormal system behaviour, when adding new electronic equipment, and when developing a baseline.

Solutions to Power Quality related disruptions

The reliability and availability of data centres depend on the quality of the power supply. To ensure optimal performance and energy efficiency, the key performance indexes must be kept within acceptable threshold limits. This also contributes to the reliability and availability of the power distribution system.

Collaboration with key stakeholders, such as IT experts, end-users, and utility service providers, is crucial in maintaining the electrical power network. By taking input from all stakeholders, the Facility Team must develop a maintenance regime that adequately addresses any issues related to the power network. This approach ensures that the data center operates optimally, providing a seamless experience for all users.

Reliability, Availability, and Maintainability Assessment of Data Center

Reliability, Availability, and Maintainability (RAM) Study of Data Centres

Performing a RAMS (Reliability, Availability, Maintainability, and Safety) study is crucial for businesses operating in critical environments. It helps devise operating procedures and emergency response plans that align with business objectives. Facility Managers must work with stakeholders to develop business-level agreements, understand design intent, and identify gaps concerning the governing standards, best practices, design assumptions and estimates. Operations and maintenance teams should conduct a Business Risk Analysis to acknowledge stakeholders and develop a risk management program.

This article provides an overview of the Dependability assessment program for a Data Centre and highlights gaps commonly observed in new and legacy Data Centres alike.

Data Centre Business Context

The criticality of an enterprise data centre is typically evaluated through a comprehensive failure cost impact analysis that considers the immediate financial impact of any failure, the consequential losses that may be incurred, and the long-term effects on the company’s brand reputation. According to surveys conducted by reputable research firms, the average direct cost per incident for data centres ranges from a few hundred thousand to millions of USD. The leading causes of failures in data centres are power interruptions, human error, and cooling system failures. It has been reported that approximately 80-90% of data centres experience a severe operational failure in any given five-year period, resulting not just in a suboptimal end-customer experience but also in a long-term negative impact on brand image that can be difficult to recover from.

Dependability Assessment

In the context of a data centre, ensuring that the infrastructure is dependable and meets the expected levels of service resilience under normal and emergency operating conditions is crucial. It is also a pathway to significant improvements and cost optimisation. The key dependability attributes, including Reliability, Availability, Maintainability, and Safety, need to be assessed, analysed, and reviewed in collaboration with stakeholders to achieve these benefits.
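Two of these attributes lend themselves to a quick calculation. The sketch below uses the standard steady-state relations: inherent availability as MTBF / (MTBF + MTTR), the product of availabilities for components in series, and one minus the product of unavailabilities for redundant components in parallel. It assumes independent failures, which real installations only approximate, and the MTBF and MTTR figures are hypothetical.

```python
from math import prod


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*availabilities: float) -> float:
    """All components are required: multiply the availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one component suffices: 1 minus the product of unavailabilities."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures for illustration only.
ups = inherent_availability(mtbf_hours=50_000, mttr_hours=8)
switchgear = inherent_availability(mtbf_hours=200_000, mttr_hours=24)

single_path = series(ups, switchgear)
redundant_ups_path = series(parallel(ups, ups), switchgear)

print(f"Single UPS path:    {single_path:.6f}")
print(f"Redundant UPS path: {redundant_ups_path:.6f}")
```

Comparing the two paths shows where redundancy helps and where a non-redundant element, here the switchgear, sets the ceiling on what the system can achieve.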

The dependability of mission-critical infrastructure is contingent upon several factors, including, but not limited to, the locational attributes, architectural and structural features of the building infrastructure, support logistics for operations and maintenance, obsolescence of building systems and subsystems, and the competency of the maintenance team. The Disaster Recovery and Business Continuity plan must be tested periodically under simulated worst-case (“doomsday”) conditions.

It is crucial to ensure that the organisation complies with all relevant statutory and regulatory rules of local and national authorities, in addition to validating assumptions and estimates. This involves obtaining the necessary permits and licenses to operate and complying with all relevant laws and regulations governing its operations.

To ensure that the organisation’s operations run smoothly and efficiently, logistics such as procurement and stock management, local transport, building architectural and structural conditions, and environmental sustainability must be supported. By optimising these logistical processes, the organisation can reduce costs, improve efficiency, and enhance the reliability and dependability of its operations.

Overall, taking a holistic approach to these various aspects of organisational management enables continual improvement of dependability. This helps ensure that the organisation is well-positioned to meet the needs of its customers and stakeholders over the long term.