Senior Applied Research Engineer | AI | Spain (Barcelona) | Hybrid
6 days ago
Profile end-to-end distributed training runs to identify bottlenecks in compute, GPU memory, and inter-GPU communication. Advanced skills in profiling and debugging CPU and GPU utilization, memory usage, latency, and inter-GPU communication. Familiarity with NCCL, MPI, and distributed communication primitives.