Platform Engineer / DevOps (AI / ML)
2 days ago
Norwich
Brief

We’re a UK-headquartered, Europe-wide media technology company with a profitable core business and a fast-growing consumer-facing video platform. We operate across the UK and Europe and compete in a market dominated by far larger, slower-moving incumbents. We win by moving faster, taking responsibility, and executing decisively. Pace matters here. Outcomes matter. Momentum matters.

We run a high-ownership, high-intensity environment. People go beyond what’s expected because they care deeply about the work and the impact it has. This role is for someone who is comfortable carrying real responsibility for production systems and making decisions that matter.

Alongside our core business, we are scaling a consumer social / film-based platform with meaningful early traction. The technical challenges span video, storage, distribution, ML workloads, and reliability at scale. We’re now looking for a DevOps / Platform Engineer to take ownership of our infrastructure and production platform as we prepare for the next order of magnitude of growth.

The role

This is a hands-on role for someone who wants to be close to production reality. You’ll be responsible for evolving our infrastructure to be faster, more reliable, and more scalable, while keeping operational quality high.

Our platform today runs on distributed systems with object storage, CDN-first delivery, high-throughput video pipelines, and GPU-backed ML workloads. We operate a hybrid environment across on-site and off-site infrastructure. Kubernetes is not yet fully rolled out in production, but it is a near-term priority, and this role will play a key part in designing and enabling that transition.

What you’ll be working on

• Horizontally scalable, stateless services designed for 10×–100× growth
• CDN-first delivery, with edge-heavy and origin-light architectures
• Object-storage-centric systems and migration away from filesystem-based storage
• High-throughput video pipelines: ingestion → processing / transcoding → storage → delivery (see the sketch after this list)
• Hybrid infrastructure: on-site and off-site / cloud
• GPU-backed AI / ML workloads in production, including video enhancement and embedding pipelines
• Multi-region architecture with GDPR and EU / UK data residency considerations
• Increasing use of frontier AI tooling within the platform layer, including interest in multi-step / agentic patterns where they add real leverage
• Transitioning the platform towards Kubernetes as a first-class production orchestration layer
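To give a concrete feel for this kind of work, here is a minimal, illustrative sketch of a stateless transcode worker: pull a source object from S3-compatible storage, transcode it with ffmpeg, and write the rendition back. The endpoint, bucket, and key names are hypothetical, not a description of our actual stack, and a production worker would add queueing, retries, and metrics.

# Illustrative sketch only: a stateless transcode step against an
# S3-compatible object store. Environment variables, bucket, and keys
# are hypothetical placeholders.
import os
import subprocess
import tempfile

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT"],  # S3-compatible object store
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)

def transcode(bucket: str, source_key: str, rendition_key: str) -> None:
    """Download -> transcode to a 720p H.264 rendition -> upload.

    No local state survives the call, so the worker can scale horizontally.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source")
        out = os.path.join(tmp, "out.mp4")
        s3.download_file(bucket, source_key, src)
        subprocess.run(
            ["ffmpeg", "-y", "-i", src,
             "-vf", "scale=-2:720",
             "-c:v", "libx264", "-preset", "veryfast",
             "-c:a", "aac", out],
            check=True,
        )
        s3.upload_file(out, bucket, rendition_key)

if __name__ == "__main__":
    transcode("media", "uploads/clip.mov", "renditions/clip_720p.mp4")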
What you’ll do

• Take ownership of production infrastructure and evolve it as the platform scales
• Design and roll out Kubernetes thoughtfully, focusing on reliability, debuggability, and operational clarity rather than novelty
• Build and maintain CI / CD pipelines that enable frequent, safe releases
• Manage infrastructure using IaC (Terraform, Pulumi, or equivalent), keeping environments repeatable and auditable
• Build strong observability: metrics, logging, tracing, alerting, dashboards, and runbooks that are actually used
• Support storage and data systems, including S3-compatible object storage and PostgreSQL at scale
• Support media and ML workloads, including video transcode pipelines and GPU-backed inference
• Handle incidents, post-mortems, capacity planning, and disaster recovery with a focus on eliminating repeat failures

What we’re looking for

• Strong experience running distributed systems in production
• Hands-on experience with Kubernetes, or experience owning the move from non-Kubernetes infrastructure into Kubernetes
• Comfort operating hybrid environments across on-prem and cloud
• Experience running high-availability systems and being on the hook when things break
• Solid understanding of object storage patterns and trade-offs, especially for large assets
• Strong Linux fundamentals and comfort working close to real hardware when required

Strong plus

• Experience with video platforms: ingestion, transcoding, streaming protocols (HLS / DASH), and delivery optimisation
• PostgreSQL at scale, including replication, tuning, and failover
• GPU / ML infrastructure experience, including inference pipelines and performance trade-offs
• Experience building or operating agentic / multi-step systems in production
• Experience working within GDPR and EU / UK data residency constraints

How we work

We move fast and take ownership. We’re a small, focused team competing against much larger players by executing decisively. This role suits someone who takes responsibility end-to-end, enjoys solving messy production problems, and wants to actively improve systems rather than simply maintain them.

This is not a role for someone looking to optimise for comfort. It is a role for someone who wants to build leverage, raise the ceiling, and be deeply involved as the platform scales.