Senior Azure SaaS Reliability & Support Engineer
5 days ago
Kingston Upon Thames
Senior Azure SaaS Reliability & Support Engineer - Hybrid (2 days a week in Kingston) - ASAP Start

You will be the bridge between support, engineering, and cloud operations:
• Investigating and fixing complex application and infrastructure issues.
• Monitoring capacity, performance, and error budgets across all deployments.

1. Environment Health & Incident Response
• Monitor single-tenant (ST) and multi-tenant (MT) environments for server performance, response times, error rates, and application health.
• Detect and resolve database issues, stalled file processing, and misplaced storage objects.
• Use Azure diagnostics and telemetry to troubleshoot and resolve complex incidents.
• Provide third-line support for escalated customer cases, collaborating with the development team on code-level fixes.

2. Reliability Engineering (Fleet Level)
• Maintain uptime, performance, and scalability across all ST and MT deployments.
• Define and track service-level objectives (SLOs) and error budgets for different environment types.
• Perform capacity planning for servers, databases, and storage, scaling resources before issues occur.
• Identify systemic patterns that cause downtime and implement fixes at scale.

3. Automation & Tooling
• Build scripts and automation (PowerShell, C#, Azure Functions, Logic Apps) to detect and remediate common application and infrastructure issues.
• Automate environment health checks and reporting.
• Develop self-healing routines for recurring problems.

4. Monitoring & Reporting
• Implement and maintain Azure Monitor, Application Insights, and Log Analytics dashboards for:
  • Environment uptime and performance
  • SLA compliance and error budget tracking
  • Incident trends and recurring issue analysis
• Provide regular reliability reports and improvement recommendations to stakeholders.

5. Continuous Improvement & Knowledge Sharing
• Feed recurring issues and systemic risks into the continuous improvement programme.
• Contribute to post-incident reviews with actionable follow-ups.

Key Success Metrics
• Uptime: meet or exceed the target SLO % for ST and MT environments.
• Error Budget Burn Rate: keep within agreed thresholds.
• Incident Metrics:
  • Reduce MTTR for P1/P2 incidents.
  • Reduce the recurrence rate of common issues.
• Automation Impact:
  • Number of recurring issues automated or self-healed.
  • Hours saved through automation versus manual intervention.
• Customer Impact:
  • Fewer escalations from L1/L2 support.

Essential Technical Skills
• 3+ years in third-line support, SRE, or cloud operations for enterprise SaaS.
• Proven track record in incident resolution and root cause analysis.
• Experience working with both multi-tenant and single-tenant cloud architectures.
• Strong background in supporting C#/.NET Core/MVC web applications with SQL Server backends and Azure Blob Storage.
• Advanced Azure diagnostics skills (Application Insights, Log Analytics, Kusto Query Language).
• Proficient in SQL for investigation and remediation.
• Scripting and automation skills in PowerShell and/or C#.
• Understanding of Azure components: App Services, VMs, SQL Database, Blob Storage, and scaling strategies.
• Experience in capacity planning, SLOs, and error budget management.

Your Personal Skills and Attributes
• Exceptional problem-solving skills with strong attention to detail.
• Ability to document findings clearly and communicate with technical and non-technical audiences.
• Calm under pressure during high-priority incidents.

Your Benefits
• Private Medical Insurance: Your health matters, and we've got you covered.
• Birthday Off: Celebrate your day your way - it's on us.
• Holiday Purchase: Need more downtime? Purchase up to an additional 5 days of holiday.
• Employee Assistance Programme: Confidential 24/7 helpline and support for you and your immediate family.
• Time for You: We value your personal time. That's why we aim to finish work at 2pm on Fridays.
• Better Working: We embrace hybrid working and, where operationally practicable, support employees in splitting their working time between the office and home.