Generative AI Engineer
5 days ago
Madrid
Founded in 2019, our client had grown into one of Europe’s most recognized deep-tech scale-ups, backed by major global strategic investors and EU innovation funds. Their quantum and AI technologies had already transformed how enterprise clients built and deployed intelligent systems, achieving up to 95% model compression and 50–80% inference cost reduction. The company was recognized by CB Insights (2023 & 2025) as one of the Top 100 most promising AI companies globally, often described as a “quantum–AI unicorn in the making.”

Role Highlights
The AI Evaluation Data Scientist was responsible for:
• Designing and leading evaluation strategies for Agentic AI and RAG systems, translating complex workflows into measurable performance metrics.
• Developing multi-step, task-based evaluations to capture reasoning quality, factual accuracy, and end-user success in real-world scenarios.
• Building reproducible evaluation pipelines with automated test suites, dataset tracking, and performance versioning.
• Curating and generating synthetic and adversarial datasets to strengthen system robustness.
• Implementing LLM-as-a-judge frameworks aligned with human feedback.
• Conducting error analysis and ablations to identify reasoning gaps, hallucinations, and tool-use failures.
• Collaborating with ML engineers to create a continuous data flywheel linking evaluation outcomes to product improvements.
• Defining and monitoring operational metrics such as latency, reliability, and cost to meet production standards.
• Maintaining high standards in engineering, documentation, and reproducibility.

Candidate Profile
• Master’s or Ph.D. in Computer Science, Machine Learning, Physics, Engineering, or a related field.
• 3+ years (mid-level) or 5+ years (senior) of experience in Data Science, ML Engineering, or Research roles on applied AI/ML projects.
• Proven experience designing and implementing evaluation methodologies for machine learning or Generative AI systems.
• Hands-on experience with LLMs, RAG pipelines, and agentic architectures.
• Proficiency in Python, Git, Docker, and major ML frameworks (PyTorch, HuggingFace, LangGraph, LlamaIndex).
• Familiarity with cloud environments (AWS preferred).
• Excellent communication skills and fluency in English.

Preferred
• Ph.D. in a relevant technical discipline.
• Experience with synthetic data generation, adversarial testing, and multi-agent evaluation frameworks.
• Strong background in LLM error analysis and reliability testing.
• Open-source contributions or publications related to AI evaluation.
• Fluency in Spanish.

Contract Details
• Location: Madrid or Barcelona
• Type: Fixed-term (until June 2026)
• Work Model: Hybrid (3 days onsite, 2 remote)
• Seniority: Associate
• Department: Technical

Compensation and Benefits
• Competitive salary package.
• Signing and retention bonuses.
• Relocation support where applicable.
• Flexible working hours and equal pay guarantee.
• Inclusive, international, and innovation-driven environment.