Senior Machine Learning Engineer | AI Tech | Spain
1 day ago
Las Palmas de Gran Canaria
Senior Applied Research Engineer | Barcelona, Spain

We're partnering with a high-growth AI company building cutting-edge foundation models designed to solve complex enterprise decision-making problems. This is an opportunity to join a deeply technical team working on large-scale model training, efficiency, and productionisation. The environment is highly research-driven, with a strong focus on real-world performance, scalability, and impact.

The role
You'll work on the performance and efficiency of large-scale training workloads, helping to improve how advanced models are trained, scaled, optimised, and served in production. The role sits at the intersection of research and systems engineering, with a strong focus on distributed training, profiling, memory optimisation, and model efficiency. This is a chance to work on genuinely hard problems in foundation model development with meaningful real-world application.

What you'll be doing
- Profile end-to-end distributed training runs to identify bottlenecks across compute, GPU memory, and inter-GPU communication
- Improve the efficiency and reliability of large-scale training jobs, including contributing to architectural decisions and developing Triton/CUDA kernels where needed
- Design and implement model scaling, parallelisation, and memory optimisation techniques for very-large-context training workloads
- Partner closely with ML researchers to diagnose inefficiencies, ensure new ideas scale effectively, and share best practices around model performance
- Support the productionisation and serving of models from the research side, including improving inference efficiency through techniques such as quantisation

Barcelona, Spain - hybrid working (the firm will offer relocation support for Barcelona)
Highly competitive salary + benefits + equity
⏱️ Permanent role

What you bring:
- Strong understanding of modern ML architectures and large-scale training pipelines
- Experience running distributed training jobs across multi-GPU systems
- Advanced profiling and debugging skills across CPU, GPU, memory, latency, and inter-GPU communication
- Strong Python skills
- Experience with model scaling and parallelisation approaches, including tensor and pipeline parallelism
- Familiarity with NCCL, MPI, and distributed communication primitives - highly desirable
- Knowledge of PyTorch and Triton internals - highly desirable
- Experience with C++ and CUDA - highly desirable

If you are interested in this role, please respond directly to this advert with your updated CV or email it to