Senior AI Infrastructure Engineer
3 days ago
Bristol
Senior AI Infrastructure Engineer (OpenStack & Kubernetes)
Location: Remote (UK or EU Preferred)
Sector: High-Performance GPU Cloud Computing

The Opportunity
I am representing a fast-growing, international scale-up that is building next-generation GPU cloud infrastructure. This company is the powerhouse behind a high-performance platform designed specifically for the most demanding AI, Machine Learning, and HPC workloads. As they scale their global footprint to meet massive demand, they are seeking a Senior Infrastructure Engineer who enjoys deep technical autonomy. This is a role for a specialist who wants to move fast, solve complex problems, and have direct ownership over the stability and scalability of business-critical systems.

What You’ll Be Doing
• Owning Infrastructure: Designing, deploying, and operating OpenStack and Kubernetes clusters optimized for multi-tenant GPU workloads.
• Driving Automation: Building and maintaining infrastructure-as-code and GitOps practices to ensure seamless scalability.
• Optimizing Performance: Enabling reliable workload scheduling through Kubernetes-native tooling, container runtime optimization, and NVIDIA integrations.
• Ensuring Resilience: Maintaining high availability and observability through proactive monitoring, logging, and incident response.
• Strengthening Security: Implementing strong controls, including RBAC and network policies, to ensure tenant isolation.
• Cross-Team Collaboration: Working closely with DevOps, AI, and Product teams to align infrastructure capabilities with customer needs.
The Ideal Profile
• OpenStack Expert: Significant hands-on experience operating OpenStack in a production environment.
• K8s Specialist: Strong experience running production-grade Kubernetes, ideally in bare-metal or private cloud setups.
• Systems Generalist: A solid grounding in Linux, networking, and storage with a practical approach to troubleshooting.
• Modern Workflows: Experience with infrastructure automation, CI/CD, and Git-based workflows.
• Scale-up Mindset: The ability to thrive in a fast-moving environment with a strong sense of accountability.

Nice to Have
• Exposure to GPU-based infrastructure, large-scale compute platforms, or HPC.
• Familiarity with advanced networking technologies.
• Contributions to open-source or cloud-native communities.

What’s on Offer?
• Impact: The opportunity to make a visible, meaningful impact on a platform used by teams running compute-heavy applications.
• Flexibility: Flexible working arrangements, including remote or hybrid options.
• Growth: Clear career progression and the chance to help shape the company's culture and future.
• Culture: A collaborative, transparent, and international culture built on trust.
• Benefits: Competitive salary, annual discretionary bonus, 25 days holiday (plus public holidays), and wellbeing benefits.