Remote | Software Engineering, Data Science, and Design Experts -- $60-$100/hour
11 days ago
San Francisco
Job Description

We are sharing a specialised part-time consulting opportunity for experienced software engineering, data science, and systems design professionals with strong technical depth, real-world engineering experience, and the ability to evaluate AI-generated coding and technical reasoning outputs at a high level.

This role supports an exciting collaboration with leading AI teams focused on improving the quality, usefulness, and reliability of general-purpose conversational AI systems across coding, software engineering, and technical problem-solving contexts. Selected professionals will evaluate model-generated responses to coding and engineering queries, validate technical accuracy through fact-checking and code execution, identify conceptual or logical issues, and help improve how advanced AI systems reason about code, generate solutions, and explain technical concepts across a variety of tasks and complexity levels.

Key Responsibilities

Professionals in this role may contribute to:

Technical Evaluation & Response Review
- Evaluate LLM-generated responses to coding and software engineering queries for accuracy, reasoning, clarity, and completeness
- Assess model responses across programming, data science, and systems design tasks of varying complexity
- Ensure model outputs align with expected conversational behavior and system guidelines

Code Validation & Fact-Checking
- Conduct fact-checking using trusted public sources and authoritative references
- Execute code and validate outputs using appropriate tools to test correctness and reliability
- Assess code quality, readability, algorithmic soundness, and explanation quality

Annotation, Feedback & Quality Improvement
- Annotate model responses by identifying strengths, weaknesses, and factual or conceptual inaccuracies
- Identify subtle bugs, logical flaws, inefficiencies, edge cases, and misleading explanations
- Apply consistent evaluation standards using defined taxonomies, benchmarks, and detailed evaluation guidelines
- Produce reproducible evaluation artifacts that help improve model performance and reliability

Ideal Profile

Strong candidates may have:
- A BS, MS, or PhD in Computer Science or a closely related field
- 5+ years of real-world experience in software engineering, data science, systems design, or related technical roles
- Expertise in at least two relevant programming languages, such as Python, Java, C++, C, JavaScript, Go, Rust, Ruby, SQL, PowerShell, Bash, Swift, Kotlin, R, TypeScript, or HTML/CSS
- The ability to independently solve HackerRank or LeetCode medium- and hard-level problems
- Experience contributing to well-known open-source projects, including merged pull requests
- Significant experience using LLMs while coding and a strong understanding of their strengths and failure modes
- Strong attention to detail and comfort evaluating complex technical reasoning and subtle implementation flaws
- Fluent English language skills

Preferred Qualifications
- Prior experience with RLHF, model evaluation, or data annotation work
- A track record in competitive programming
- Experience reviewing code in production environments
- Familiarity with multiple programming paradigms or technical ecosystems
- The ability to explain complex technical concepts clearly to non-expert audiences

Why This Opportunity
- Contribute specialised technical expertise to a high-impact AI collaboration
- Help improve how advanced AI systems reason about code, software engineering, and technical problem-solving
- Work on evaluation and model improvement tasks that directly shape AI systems used by developers worldwide
- Flexible remote work with strong hourly compensation

Contract Details
- Independent contractor role
- Fully remote with flexible scheduling
- Open to both US-based and non-US-based professionals
- Full-time or part-time contract work options available
- Hourly compensation of $60–$100 per hour
- Fluent English language skills required
- Projects may be extended, shortened, or concluded early depending on project needs and performance
- Weekly payments via Stripe or Wise
- Work will not involve access to confidential or proprietary information from any employer, client, or institution

Please note: We are unable to support H-1B or STEM OPT candidates at this time.

Start date: Immediate

About the Platform

This opportunity is available through a leading AI-driven work platform that connects domain experts with frontier AI research projects. Experts contribute to improving advanced AI systems by providing specialised expertise across real-world workflows, structured evaluation, model training support, and domain-specific technical reasoning.

By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: