Azure Site Reliability Engineer
2 days ago
Richmond
Senior Azure Support Engineer
• Location: Richmond-upon-Thames.
• This is a hybrid role.
• The environment operates hundreds of Single Tenant (ST) and Multi-Tenant (MT) deployments across Azure, each with its own servers, database, and storage.
• This role exists to keep every deployment reliable by resolving issues, ensuring uptime, and building automation.
• Investigating and fixing complex application and infrastructure issues.
• Monitoring capacity, performance, and error budgets across all deployments.
• 3+ years in third-line support, SRE, or cloud operations for enterprise SaaS.
• Proven track record in incident resolution and root cause analysis.
• Experience working with both multi-tenant and single-tenant cloud architectures.
• Strong background in supporting C#/.NET Core/MVC web applications with SQL Server backends and Azure Blob Storage.
• Advanced Azure diagnostics (Application Insights, Log Analytics, Kusto Query Language).
• Proficient in SQL for investigation and remediation.
• Scripting and automation skills in PowerShell and/or C#.
• Understanding of Azure components: App Services, VMs, SQL DB, Blob Storage, and scaling strategies.
• Experience in capacity planning, SLOs, and error budget management.
• Monitor ST and MT environments for server performance, response times, error rates, etc.
• Detect and resolve database issues, stalled file processing, or misplaced storage objects.
• Use Azure diagnostics and telemetry to troubleshoot and resolve complex incidents.
• Provide third-line support for escalated customer cases, collaborating with development.
• Maintain uptime, performance, and scalability across all ST and MT deployments.
• Define and track service-level objectives (SLOs).
• Perform capacity planning for servers, databases, and storage, scaling resources as required.
• Identify systemic patterns causing downtime and implement fixes at scale.
• Build PowerShell scripts and automation (Azure Functions, Logic Apps).
• Automate environment health checks and reporting.