MLOps Platform Engineer
2 days ago
Oxford
Location: Oxford, UK (Hybrid)
Organisation: Global AI & Health Innovation Programme
Employment Type: Full-time | Permanent

About the Role
We are seeking an MLOps Platform Engineer to design, implement, and scale the infrastructure that supports high-performance machine learning and AI-driven research workflows. You will play a critical role in bridging the gap between data science, bioinformatics, and engineering, ensuring seamless, secure, and reproducible deployment of ML models in production and research environments. You'll collaborate closely with AI Scientists, Data Engineers, and DevSecOps teams, building automation pipelines that accelerate model development and deployment across distributed, cloud-native systems.

Key Responsibilities
• Design, implement, and maintain end-to-end MLOps pipelines for model training, validation, deployment, and monitoring.
• Develop and automate workflows using Terraform, Kubernetes, Docker, and CI/CD toolchains (GitHub Actions, Jenkins, Argo, etc.).
• Manage scalable cloud-based compute environments (Oracle Cloud, AWS, or GCP) for AI workloads and data processing.
• Build and maintain feature stores, model registries, and versioning systems to ensure traceability and reproducibility.
• Implement data ingestion and pre-processing pipelines to support ML and bioinformatics workloads.
• Collaborate with security and DevSecOps teams to enforce best practices in access control, compliance, and governance.
• Support AI/ML researchers in model experimentation and infrastructure optimization.
• Monitor model and system performance, implement drift detection, and refine automated retraining pipelines.
• Contribute to the development of a modular, reusable ML platform architecture supporting multi-modal data (genomic, clinical, imaging, etc.).

Essential Skills and Experience
• Proven experience as an MLOps Engineer, Platform Engineer, or DevOps Engineer supporting ML or data science teams.
• Strong hands-on experience with containerization (Docker) and orchestration (Kubernetes).
• Expertise in Terraform, Infrastructure as Code (IaC), and cloud provisioning (OCI, AWS, GCP, or Azure).
• Solid understanding of CI/CD pipelines and automated testing frameworks.
• Experience with ML frameworks such as PyTorch, TensorFlow, or Scikit-learn.
• Familiarity with MLflow, Kubeflow, DVC, or similar MLOps tools.
• Understanding of cloud security principles, IAM, and networking best practices.
• Proficiency in Python and Bash scripting for automation and tooling development.
• Version control with Git, and collaborative development practices.

Desirable Experience
• Exposure to bioinformatics or health data ecosystems (WGS, transcriptomics, clinical data).
• Knowledge of data governance and compliance frameworks (GDPR, ISO27001, HIPAA).
• Experience building monitoring dashboards for ML performance metrics.
• Familiarity with distributed training environments and GPU/TPU orchestration.
• Oracle Cloud Infrastructure (OCI) certification or equivalent.

Terms of Appointment
Applicants must have the right to work permanently in the UK and be within commuting distance of Oxford. Occasional travel may be required for collaboration across global sites.