Systems Research Engineer
3 days ago
Edinburgh
European Tech Recruit are working closely with a leading telecommunications & research company, based in Edinburgh, who are looking for a talented Systems Research Engineer to join their team.

In this role you will join a research centre driving new AI Infra & Agentic Serving architectures and helping define the next-generation large-scale data centre and AI infrastructure systems. Positioned at the intersection of advanced systems research and industrial-scale engineering, our client's teams turn innovative system designs into deployable, real-world technologies.

This role is ideal for recent PhD graduates looking to build research-driven engineering experience in areas such as operating systems, distributed systems, AI model serving, and machine learning infrastructure. You will work closely with senior architects on real-world projects, helping to prototype and optimize next-generation AI infrastructure.

Responsibilities as Systems Research Engineer:
• Distributed Systems Research & Development: Architect, implement, and evaluate distributed system components for emerging AI and data-centric workloads. Drive modular design and scalability across CPU, GPU, and NPU clusters, building highly efficient serving and scheduling systems.
• Performance Optimization & Profiling: Conduct in-depth profiling and performance tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems.
• Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant AI serving across distributed environments. Research and prototype new techniques for cache sharing, data locality, and resource orchestration and scheduling within AI clusters.
• Research & Publications: Translate innovative research ideas into publishable contributions at leading venues (e.g., OSDI, NSDI, EuroSys, SoCC, MLSys, NeurIPS, ICML, ICLR) while driving internal adoption of novel methods and architectures.
• Cross-Team Collaboration: Communicate technical insights, research progress, and evaluation outcomes effectively to multidisciplinary stakeholders and global research teams.

Requirements:
• PhD in systems, distributed computing, or large-scale AI infrastructure.
• Strong knowledge of distributed systems, operating systems, machine learning systems architecture, inference serving, and AI infrastructure.
• Hands-on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT-LLM, TGI) and distributed KV cache optimization.
• Proficiency in C/C++, with additional experience in Python for research prototyping.
• Solid grounding in systems research methodology, distributed algorithms, and profiling tools.
• Team-oriented mindset with effective technical communication skills.

Desirable Experience:
• Publications in top-tier systems or ML conferences (NSDI, OSDI, EuroSys, SoCC, MLSys, NeurIPS, ICML, ICLR).
• Understanding of load balancing, state management, fault tolerance, and resource scheduling in large-scale AI inference clusters.
• Prior experience designing, deploying, and profiling high-performance cloud or AI infrastructure systems.

If this role is of any interest please apply directly on LinkedIn or send a copy of your CV to nh@eu-recruit.com.

By applying to this role you understand that we may collect your personal data and store and process it on our systems. For more information please see our Privacy Notice (https://eu-recruit.com/about-us/privacy-notice/)