Mechanical & Electrical Problem Analyst, Data Centre Infrastructure
Sheffield
About Colt Data Centre Services

Colt Data Centre Services (DCS) has over 20 years' experience in designing, building and operating energy-efficient, reliable data centres, hosting significant financial, media, corporate and cloud wholesale providers across the world. Our customers are at the heart of everything we do. We take a customer-led approach across our operations, striving to give our customers a seamless experience no matter which facility or region they are in. Finding the right solutions for our customers starts with finding the right people for Colt DCS. We believe in creating a healthy learning environment in which our employees can flourish. Our vision: to be the most customer-centric data centre provider.

About the Role

Reporting to the Senior Problem Manager and working remotely, you will be responsible for identifying, analysing and supporting the resolution of issues, predominantly related to data centre Mechanical and Electrical (M&E) equipment and services. Your purpose will be to minimise business impact and prevent future incidents across all of our sites in Europe, India and Japan.

In practice, this means using your knowledge of M&E equipment, systems and services while working closely with cross-functional teams to investigate root causes, implement corrective actions and drive continuous improvement in DCS service delivery. This may also include working with, or presenting to, DCS customers and/or suppliers.

In summary, you'll manage the lifecycle of infrastructure Problem tickets across our DCS estate. This includes proactive work to analyse historical issue data and use it to deliver comprehensive recommendations that help prevent those issues from recurring, where possible.
About you

As you'll be investigating, analysing and reporting on a range of infrastructure-related issues, you'll need a proven track record in managing incidents and problems associated with critical mechanical and electrical systems (for example, power distribution, cooling and UPS). Your experience will therefore likely include working in environments such as data centres or other large-scale infrastructure, supported by facilities-related qualifications (e.g. Uptime Institute or electrical/HVAC certifications). You'll be able to get to the root cause of infrastructure-related problems, and you'll have the analytical capability to identify trends and make data-driven recommendations. All of this will be underpinned by strong communication skills and the ability to help technical and non-technical audiences understand issues and how best to resolve them.

What we offer

We offer skill development, learning pathways and accreditation to help our people perform at their best, regardless of role and location. In addition to competitive salaries and incentive plans, staff are offered a range of benefits and local rewards packages. Colt DCS recognises the importance of work-life balance. These are just some of the reasons why Colt DCS is Great Place to Work Certified in the UK.

The Role in More Detail

Key Responsibilities
• Risk Identification/Problem Identification and Analysis:
o Investigate and analyse incidents to determine root causes and patterns (root cause analysis, RCA).
o Proactively identify recurring issues through trend analysis, with a specific focus on analysing building management system (BMS) data.
o Assess the risk of non-standard Changes.
• Resolution and Mitigation:
o Ensure timely resolution and closure of problem tickets.
o Identify permanent solutions for recurring incidents.
o Ensure timely escalation of critical issues and coordinate resolution efforts across teams.
• Documentation and Reporting:
o Maintain comprehensive documentation of problems, their resolutions and preventive measures.
o Prepare detailed RCA reports and graphs to support the technical explanation of a Problem.
o Keep problem management documentation accurate and current.
• Continuous Improvement:
o Prepare presentations on Problem management metrics, trends and improvements.
o Proactively identify opportunities to enhance service reliability and efficiency.
o Participate in problem management initiatives and projects to drive continuous improvement.
o Deliver internal technical training in areas of expertise.
• Collaboration and Communication:
o Collaborate with stakeholders to gather information and perform thorough problem diagnostics.
o Attend and document problem review meetings, providing regular updates.
o Actively participate in Monthly Executive Reviews and other key meetings, leading them if required.
o Work with cross-functional teams to implement corrective actions and drive improvement.
o Support and oversee the onboarding process for new sites, representing Service Operations, with a particular focus on ensuring documentation is available and accurate and that “snags” are captured.

Requirements
• Hands-on knowledge of Mechanical & Electrical (M&E) systems and data centre infrastructure.
• Proven experience in problem management or incident management within a complex data centre (DC) environment.
• Experience with monitoring systems and the ability to analyse monitoring data to detect and pinpoint issues.
• Strong analytical skills, with the ability to analyse data, identify trends and make data-driven decisions.
• Excellent documentation and reporting skills in English.
• Knowledge of IT and networking concepts.
• Knowledge of, and experience with, at least one common RCA method (5-Whys, Fishbone Diagram, Fault Tree Analysis, Pareto Analysis).
• ITIL certification or knowledge of ITIL framework practices.
Note that this role will require travel to site locations when needed, as well as a small number of trips to our offices each year for face-to-face team meetings.