Remote CUDA Kernel Optimizer - ML Engineer - AI Trainer ($120-$250 per hour)
1 month ago
Hempstead
### 1) Role Overview

Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals possess a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility.

### 2) Key Responsibilities

- Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.
- Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.
- Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.
- Report performance metrics, analyze speedups, and propose architectural improvements.
- Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.
- Produce well-documented, reproducible benchmarks and performance write-ups.

### 3) Ideal Qualifications

- Deep expertise in CUDA programming, GPU architecture, and memory optimization.
- Proven ability to achieve quantifiable performance improvements across hardware generations.
- Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.
- Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial).
- Strong communication skills and independent problem-solving ability.
- Demonstrated open-source, research, or performance benchmarking contributions.

### 4) More About the Opportunity

- Ideal for independent contractors who thrive in performance-critical, systems-level work.
- Engagements focus on measurable, high-impact kernel optimizations and scalability studies.
- Work is fully remote and asynchronous; deliverables are outcome-driven.
- Access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.
### 5) Compensation & Contract Terms

- Typical range: $120–$250/hour, depending on scope, specialization, and results achieved. Payment is based on accepted task output rather than a flat hourly rate.
- Structured as a contract-based engagement, not an employment relationship.
- Compensation is tied to measurable deliverables or agreed milestones.
- Confidentiality, IP, and NDA terms are defined per engagement.

### 6) Application Process

- Submit a brief overview of prior CUDA optimization experience, profiling results, or performance reports.
- Include links to relevant GitHub repos, papers, or benchmarks if available.
- Indicate your hourly rate, time availability, and preferred engagement length.
- Selected experts may complete a small, paid pilot kernel optimization project.

### 7) About Mercor

- Mercor connects domain experts with top AI research and technology organizations through project-based contracts.
- Contractors operate independently, with full flexibility over methods, timelines, and tools.
- Our mission is to help top engineers and researchers access frontier technical work without rigid employment structures.