Senior Platform Engineer
8 days ago
London
Build and operate the platforms that make AI and machine learning work at scale

We're looking for a Senior Platform Engineer to join our team and play a key role in designing and operating the platform that underpins AI and machine learning delivery. This is a hands-on senior platform role, focused on building robust, Kubernetes-based platforms that enable MLOps engineers, ML engineers, and data scientists to deploy, run, and manage models safely and effectively in production.

While you'll need a strong understanding of how machine learning and LLM workloads are trained, packaged, deployed, and served, this is not a "deploy models all day" role. Instead, your impact will come from creating the infrastructure, tooling, workflows, and guardrails that allow others to do that work reliably and at scale.

What you'll be doing

You'll be responsible for building a production-grade AI/ML platform, not just running clusters. You will:
• Design, build, and operate a Kubernetes-based platform that supports multiple ML and engineering teams
• Extend Kubernetes with MLOps-specific capabilities, rather than treating it as a finished product
• Provide platform-level support for:
  • Model development and experimentation
  • Model packaging, deployment, and promotion
  • Scalable inference and LLM-based workloads
• Build shared platform services that enable consistent, repeatable model deployment, even where day-to-day deployment is owned by MLOps or ML engineers
• Work closely with data scientists and MLOps engineers to ensure the platform is genuinely usable and fit for purpose
• Own platform operability, reliability, security, and lifecycle management in production
• Troubleshoot complex issues that cut across the infrastructure, Kubernetes, and MLOps layers

Essential experience:
• Strong background as a Senior Platform Engineer or Senior DevOps Engineer
• Deep, hands-on experience building and operating Kubernetes-based platforms
• Strong practical experience with Helm and Infrastructure as Code (e.g. Terraform)
• Proven experience building internal platforms for other engineers, not just running workloads
• Strong grasp of operational fundamentals: monitoring, logging, reliability, incident management, and maintainability

Experience or exposure to areas such as:
• MLOps platforms (e.g. Kubeflow or similar frameworks)
• Model serving and inference platforms (e.g. KServe, vLLM, or equivalent)
• Supporting LLM-based workloads, including performance and scaling considerations
• Notebook environments such as JupyterHub
• Working in organisations with a clear AI or data platform strategy
• Supporting data scientists or ML engineers at scale
• Regulated, secure, or high-assurance environments

Guidant, Carbon60, Lorien & SRG - The Impellam Group Portfolio are acting as an Employment Business in relation to this vacancy.