Site Reliability Engineer
3 days ago
Macclesfield
Mid-Level Site Reliability Engineer (SRE)

Are you an experienced Site Reliability Engineer with a passion for building reliable, scalable systems that empower innovation? Our client is looking for a skilled Mid-Level SRE to join our growing technology team. In this role, you’ll help ensure our infrastructure is stable, secure, and efficient - supporting the applications that support our clients.

The Role

We are seeking a mid-level Site Reliability Engineer (SRE) to join our technology team, helping to ensure the smooth operation and reliability of our infrastructure. You’ll play a vital role in maintaining uptime, managing deployments, and supporting other team members. This is a hands-on position suited to someone who thrives on problem-solving, process improvement, and cross-team communication.

What You’ll Do:

Maintain & Improve Systems
• Ensure the reliability, performance, and availability of production systems.
• Perform regular updates, patching, and maintenance across environments.
• Manage infrastructure provisioning using Terraform, Ansible, and AWS.

Collaborate & Support
• Work closely with the junior SRE to develop their practical experience and technical confidence.
• Partner with developers, data scientists, and business users to resolve technical issues.

Automate & Optimise
• Contribute to configuration management and automation improvements.
• Identify and document standard operating procedures.
• Implement proactive monitoring measures to detect and prevent issues.

Monitor & Troubleshoot
• Troubleshoot system issues using logs, monitoring tools, and a methodical approach.
• Oversee and enhance system monitoring with Nagios, with a transition to Datadog.

Incident Management
• Support incident management processes, including post-mortems and follow-up actions.
• Communicate outcomes to customers clearly and effectively.

What We’re Looking For:

Experience
• Proven experience in an SRE, DevOps, or Operations Engineering role.
• Strong working knowledge of AWS, Terraform, and Ansible.

Technical Skills
• Linux system administration & shell scripting.
• Networking fundamentals, containerization, and infrastructure security best practices.
• Version control experience (e.g., Git).
• Strong troubleshooting and root cause analysis skills.

Desirable Skills
• Experience with Kubernetes and/or other cloud platforms.
• Familiarity with Nagios, Datadog, or similar monitoring tools.
• Exposure to CI/CD systems such as TeamCity, AWS CodeBuild, AWS CodePipeline, or ArgoCD.

Personal Attributes
• Proactive, curious, and process-driven.
• Enjoys collaboration and mentoring.
• Calm under pressure, especially during incidents.
• Flexible and adaptable to technical and business priorities.

Nice-to-Have
• Experience supporting scientific or data-intensive applications.
• Background in post-mortem facilitation and follow-up.
• Enthusiasm for observability, performance tuning, and cost optimisation.