Engineer, R&D Tech
2 days ago
Barcelona
AI Platform / MLOps Engineer

About the role

We are looking for an AI Platform / MLOps Engineer to join a fast-growing AI team within an international technology environment. In this role, you will be responsible for operating, scaling, and improving AI/ML systems in production, ensuring that training, inference, pipelines, and platform services are reliable, observable, secure, and cost-efficient.

You will work at the intersection of MLOps, DevOps, Cloud Engineering, and AI platform architecture, supporting the full lifecycle of AI systems: from model training environments to production inference, CI/CD automation, monitoring, and cost optimisation.

This is a hands-on role for someone with a strong platform engineering mindset, solid experience in AWS, infrastructure, automation, and ML tooling, and a passion for building production-grade AI systems. If you enjoy making AI systems scalable, reliable, observable, and ready for real-world usage, this could be a great fit.

What you'll do

- Operate and scale AI/ML platforms end-to-end, including training, inference, pipelines, and production environments
- Build and maintain robust ML infrastructure using tools such as AWS SageMaker, MLflow, feature stores, and related ML platform components
- Design and implement CI/CD pipelines for ML models, AI workloads, and platform services
- Set up and optimise training and inference environments for reliability, scalability, and performance
- Implement observability, monitoring, alerting, and cost-control mechanisms for AI workloads
- Support production deployments of ML/AI systems with a strong focus on automation and operational excellence
- Work with DevOps and platform tooling such as AWS, Terraform, Kubernetes, Docker, and GitHub Actions or similar CI/CD tools
- Collaborate with AI Engineers, Data Scientists, Data Engineers, and Tech Leads to ensure AI solutions are production-ready
- Contribute to best practices around MLOps, model versioning, experiment tracking, deployment, monitoring, and governance
- Work with LLM and agentic tooling ecosystems such as LangChain, Langfuse, LangSmith, or similar platforms
- Troubleshoot production issues related to infrastructure, pipelines, inference performance, latency, reliability, and cost

Must Have

- Solid background in Platform Engineering, DevOps, Cloud Engineering, MLOps, or ML Platform Engineering
- Hands-on experience with AWS and cloud-native services
- Experience with Infrastructure as Code, especially Terraform
- Strong experience building and maintaining CI/CD pipelines
- Experience with ML platform tooling such as SageMaker, MLflow, feature stores, or similar tools
- Understanding of ML/AI workflows: training, inference, model deployment, pipelines, monitoring, and lifecycle management
- Experience setting up and managing production environments for AI/ML workloads
- Strong understanding of observability, monitoring, alerting, scalability, and cost optimisation
- Familiarity with containerisation and orchestration tools such as Docker and Kubernetes
- Experience with LLM / agentic tooling such as LangChain, Langfuse, LangSmith, or similar frameworks/platforms
- Strong automation mindset and the ability to build reliable, repeatable, production-grade systems
- Strong problem-solving skills and an ownership mindset
- Fluent English and Spanish

✨ Nice to Have

- Experience with data pipelines or data engineering workflows
- Experience with AWS Bedrock, vector databases, or LLM infrastructure
- Experience with model monitoring, drift detection, evaluation pipelines, or AI observability platforms
- Experience with workflow orchestration tools such as Airflow, Prefect, or similar
- Knowledge of security, governance, and compliance practices for AI/ML platforms
- Experience working in Agile / Scrum environments
- Previous experience in travel, aviation, digital platforms, or large-scale enterprise environments

Hybrid model: 2 days onsite per week

Why join this project?

- People first: diverse and inclusive culture in an international environment.
- Modern cloud platforms and large-scale, global projects.
- Be part of a high-impact environment where AI systems are moving from experimentation to production.
- ⚙️ Strong focus on engineering quality, automation, reliability, and scalability.
- ☁️ Hands-on exposure to AWS, MLOps, LLM tooling, and production AI infrastructure.
- Opportunity to shape how AI platforms are built, deployed, monitored, and scaled.
- High team stability and a collaborative culture.
- €1,200 per year training budget and continuous learning opportunities.
- Flexible compensation model.
- Private health insurance and benefits package.
- ⚡ Flexible working hours and hybrid model.
- Wellhub: fitness, wellness, and mental health support.
- ⚽ Football and paddle tennis teams sponsored by Capitole.
- Team buildings, global events, and strong tech communities.

Want to know more about us? Click here and discover all the details. Curious about our culture? Check out what people are saying about us on Glassdoor.

We know that not every candidate will meet 100% of the requirements. If your profile doesn't match perfectly but you believe you can add value, we'd still love to hear from you.

Ready for the challenge? Apply now and help build scalable, reliable, production-ready AI platforms.

Empowering People, Unlocking Innovation.

Information Security Notice

- The employee will have access to confidential information related to Capitole and the assigned project.
- Compliance with internal security and information protection policies is mandatory.
- NDA signature required.