London
Experience Level
10+ years of hands-on experience in large-scale, production network environments

Role Overview
The Senior Network SRE (Site Reliability Engineer) is responsible for designing, operating, and continuously improving highly available, scalable, and observable network infrastructure. This role blends deep networking expertise with SRE principles, automation, and software-driven operations to ensure reliability, performance, and rapid recovery across multi-vendor network environments. The ideal candidate has extensive experience in routing, switching, firewalling, and wireless technologies, combined with strong automation skills and a data-driven approach to reliability and observability.

Key Responsibilities

Network Reliability & Operations
• Own the reliability, availability, and performance of enterprise and/or service-provider-grade network infrastructure
• Define, implement, and track SLIs, SLOs, and error budgets for network services
• Lead incident response for major network events, including root cause analysis (RCA) and postmortems
• Drive continuous improvement through blameless postmortems and corrective action plans
• Architect and operate multi-vendor networks (Cisco, Juniper, Arista, Palo Alto, Fortinet, etc.)
• Design and maintain:
    • Layer 2/Layer 3 routing and switching (BGP, OSPF, IS-IS, EIGRP)
    • Data center, campus, and WAN architectures
    • Firewall and security policy frameworks
    • Enterprise and/or large-scale wireless networks
• Design and maintain network automation using tools such as:
    • Ansible
    • SaltStack
    • Python and network APIs
• Implement Infrastructure as Code (IaC) principles for network provisioning and configuration
• Build automated workflows for configuration management, compliance, validation, and remediation
• Implement comprehensive network observability using tools such as:
    • Grafana
    • Splunk
    • Prometheus and related telemetry systems
• Develop dashboards, alerts, and reports that provide actionable insights
• Correlate network telemetry with application and system metrics
• Design and manage firewall policies, segmentation, and access controls
• Partner with security teams on network security architecture and incident response
• Ensure compliance with internal standards and external regulatory requirements
• Work closely with software engineers, platform teams, and DevOps/SRE counterparts
• Act as a technical leader and mentor for junior network and SRE engineers
• Influence engineering best practices and reliability culture across teams
• Provide clear communication during incidents and change management activities

Required Skills & Qualifications

Networking Expertise
• 10+ years of experience in network engineering and operations
• Deep knowledge of:
    • Routing protocols (BGP, OSPF, IS-IS)
    • Switching technologies (VLANs, VXLAN, EVPN)
    • Firewalling and network security
    • Wireless networking at scale
• Strong experience with Ansible and/or SaltStack
• Proficiency in Python or similar scripting languages
• Experience with APIs, data models, and network programmability
• Solid understanding of SRE principles and production reliability
• Experience defining SLIs/SLOs and managing error budgets
• Strong incident management and troubleshooting skills
• Hands-on experience with Grafana, Splunk, and telemetry platforms
• Ability to design meaningful metrics, logs, and alerts
• Strong problem-solving and analytical skills
• Excellent written and verbal communication
• Ability to lead technically during high-pressure incidents
• Mentorship and leadership mindset