Deep Reinforcement Learning Engineer (Principal)
2 days ago
Girona
Friday Systems builds AI that allows industrial robots to adapt to dynamic warehouse environments. We focus on high-throughput palletizing and related tasks where classical approaches break down. Our stack is built around Deep Reinforcement Learning with modern sequence models. Tiny team, zero bureaucracy, direct impact, salary + equity.

THE ROLE
Own the DRL stack end-to-end: formulation → algorithm design → large-scale training → evaluation → deployment. You’ll work directly with the CTO to turn cutting-edge DRL into production throughput at customer sites.

YOU WILL
• Design & ship DRL algorithms (PPO/SAC/DDQN and variants, based on encoders/cross-attention/pointer networks) for complex control & combinatorial optimization.
• Tackle stability & sample-efficiency: GAE, normalization, entropy/KL control, distributional/value-loss tuning, curriculum learning and reward shaping, …
• Launch multi-GPU training, parallel rollouts, efficient replay/storage, and reproducible experiment tooling.
• Productionize: clean PyTorch code, profiling, Dockerized services (FastAPI), AWS deployments, experiment tracking, dashboards.
• Collaborate with the C-Level Team to ensure product excellence and alignment with business strategy. Forge strong relationships with clients, effectively translating their needs into unique technology solutions.
• Build and nurture a high-performing team by attracting top talent. Provide mentorship and leadership to foster a culture of quality and innovation.

YOU HAVE
• Track record shipping RL beyond academic demos: you’ve led at least one end-to-end RL system from idea to production or a state-of-the-art benchmark in the last 3–5 years.
• Extensive Deep Learning, Reinforcement Learning & PyTorch expertise: you can implement several DRL algorithms from scratch, reason about the root causes of performance drops, and make informed decisions about next steps.
• Systems know-how: Python, Linux, Docker, multi-GPU, cloud (AWS).
• Math maturity: MDPs/Bellman operators, policy gradients, trust-region/KL, GAE/λ-returns, stability/regularization in on-policy vs off-policy regimes.
• Ownership: you’re comfortable being the primary owner for experiments, code quality, and results in a small team.
• Location/time zone: EU-based (CET±2) and able to travel occasionally to customer warehouses.

We are not considering entry-level or coursework-only profiles for this role.

HIRING PROCESS
• 30-min intro & mutual fit
• Deep technical session with CTO on your past RL work (no LeetCode, no homework)
• Two one-hour “Traits & Skills” conversations with our other co-founders
• Meet the team & offer