Sr. Machine Learning Ops Engineer (Director)
17 hours ago
Los Angeles
Job Description:

ABOUT CIM GROUP: CIM is a community-focused real estate and infrastructure owner, operator, lender, and developer. Our team of experts works together to identify and create value in real assets, benefiting the communities in which we invest. Back in 1994, our three founders focused on projects in Southern California neighborhoods. Today, we are a diverse team of 900+ employees with projects across the Americas. Our projects have delivered jobs; created comfortable places to live, work, and relax; and provided necessary and sustainable infrastructure. Our focus on enhancing communities is unwavering, and we strive to make an even greater impact in the years to come. Join us and make an impact today!

POSITION PURPOSE: The Senior ML Ops Engineer leads the design and maintenance of scalable, secure infrastructure for ML model deployment, lifecycle management, and Generative AI enablement. This role is responsible for building and operating the firm's ML Ops platform on Databricks, with a strategic focus on productionizing GenAI/LLM solutions including Retrieval-Augmented Generation (RAG) systems and vector database implementations. The Senior ML Ops Engineer ensures models transition from development to production while meeting regulatory and compliance standards. This role collaborates closely with Data Science, Platform Engineering, Information Architecture, and business vertical teams (Fund Accounting, Investor Relations, Investments) to accelerate ML-driven insights, enhance model accuracy, and govern the ML/AI ecosystem. Beyond technical execution, the role defines ML Ops strategy and architecture, addressing the "last mile" challenge of AI value realization by automating and scaling ML models and GenAI applications as tangible business assets.
This role serves as the key pillar that enhances efficiency, boosts model accuracy, accelerates time-to-market for AI solutions, and ensures the scalability and robust governance of machine learning and generative AI initiatives.

RESPONSIBILITIES:

ML Model Deployment & Platform Management
- Lead the design, implementation, and ongoing maintenance of scalable ML infrastructure on Databricks, including MLflow for experiment tracking, model registry, and model serving endpoints.
- Oversee the development of the ML Ops platform and automated pipelines for deploying, monitoring, and maintaining models within production environments.
- Implement robust solutions for model versioning, systematic retraining, and comprehensive artifact management using Databricks Unity Catalog for ML governance.
- Design and manage Databricks Feature Store for consistent feature engineering across training and inference pipelines.

Generative AI & LLM Operations
- Architect and implement Retrieval-Augmented Generation (RAG) systems for document Q&A, enabling business teams to query fund documents, investor letters, and market research.
- Design, deploy, and manage vector database solutions (Databricks Vector Search, Pinecone, or similar) for semantic search and retrieval across enterprise documents.
- Lead LLM fine-tuning and customization initiatives, training models like Claude or open-source alternatives with CIM proprietary data while ensuring data privacy and compliance.
- Develop and optimize document processing pipelines including PDF parsing, chunking strategies, and embedding generation for RAG applications.
- Implement prompt engineering best practices and LLM evaluation frameworks to ensure output quality, relevance, and factual accuracy.
- Build guardrails and safety measures for GenAI applications, including hallucination detection, output validation, and source attribution.

Automation & CI/CD Pipelines
- Design and implement extensive automation across the ML workflow, covering model training, testing, validation, and deployment using Databricks Workflows and Asset Bundles.
- Set up robust CI/CD pipelines for both traditional ML models and GenAI applications, leveraging GitHub Actions, Azure DevOps, or similar tools.
- Automate complex data and model workflows utilizing orchestration tools such as Airflow, Prefect, or Databricks Workflows.

Monitoring, Performance & Reliability
- Implement comprehensive monitoring and alerting systems for real-time tracking of model performance, data quality, and GenAI output quality.
- Utilize specialized tools (Evidently AI, WhyLabs, Prometheus/Grafana) to proactively detect model drift, data quality anomalies, and RAG retrieval degradation.
- Develop evaluation frameworks for GenAI applications including relevance scoring, faithfulness metrics, and human feedback loops.
- Troubleshoot issues within production environments, including debugging model deployment failures, RAG retrieval issues, and LLM response quality problems.

Data & Feature Engineering Support
- Build and maintain sophisticated feature stores on Databricks, ensuring precise alignment between training and inference data pipelines.
- Collaborate with data engineers and information architects to build robust ETL pipelines that feed into the Databricks Lakehouse.
- Design embedding pipelines and vector index management strategies for RAG applications, including incremental updates and versioning.

Security, Compliance & Trustworthy AI
- Integrate robust security measures directly into ML Ops and GenAI pipelines, including access controls via Unity Catalog and data encryption.
- Implement Trustworthy AI guardrails addressing bias detection, explainability, prompt injection prevention, and responsible AI practices.
- Ensure GenAI applications handling sensitive fund and investor data comply with regulatory requirements and internal policies.
- Collaborate with Legal and Compliance to establish AI governance policies and audit trails for model decisions.

Collaboration & Business Partnership
- Engage in extensive collaboration with data scientists, platform engineers, information architects, and DevOps teams to ensure seamless ML/AI integration.
- Partner with business teams (Fund Accounting, FP&A, Investor Relations, Sales, Investments) to identify high-value AI use cases and translate business needs into technical solutions.
- Communicate complex AI concepts in business terms, managing expectations and demonstrating ROI of ML/GenAI initiatives.
- Provide technical mentorship to team members, including refactoring data scientist code for production readiness.

EDUCATION/EXPERIENCE REQUIREMENTS: (including certification, licenses, etc.)

Required:
- Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
- 7+ years of experience as an ML Ops Engineer, ML Engineer, or similar role with production deployment responsibility.
- Expert-level proficiency in Python, complemented by strong skills in Bash scripting.
- Extensive experience designing and implementing cloud solutions on Azure (required) or GCP.
- Deep expertise with Docker and Kubernetes for containerizing and orchestrating ML workloads.
- Hands-on experience with CI/CD tools such as GitHub Actions, Jenkins, GitLab CI, or Azure DevOps.
- Strong SQL proficiency and practical experience with the Databricks platform.
- Experience with workflow orchestration tools (Airflow, Prefect, or Databricks Workflows) and monitoring tools (Prometheus, Grafana, Evidently AI).

GenAI/LLM Technical Requirements (NEW):
- Demonstrated experience building and deploying RAG (Retrieval-Augmented Generation) systems in production environments.
- Hands-on experience with vector databases (Databricks Vector Search, Pinecone, Weaviate, Chroma, or Milvus).
- Experience with LLM APIs and frameworks (OpenAI, Anthropic Claude, LangChain, LlamaIndex).
- Understanding of embedding models, chunking strategies, and retrieval optimization techniques.
- Knowledge of prompt engineering best practices and LLM evaluation methodologies.

Databricks Platform Requirements (NEW):
- Experience with MLflow for experiment tracking, model registry, and model serving.
- Familiarity with Databricks Feature Store and Unity Catalog for ML governance.
- Understanding of Delta Lake and Lakehouse architecture for ML data pipelines.
- Experience with Databricks Model Serving endpoints and inference optimization.

Preferred:
- Experience with LLM fine-tuning techniques (LoRA, QLoRA, full fine-tuning) on proprietary data.
- Familiarity with ML frameworks including TensorFlow, PyTorch, Scikit-learn, XGBoost.
- Experience with model serialization (ONNX) and inference optimization.
- Prior experience within financial services, fintech, or private equity sectors.
- Experience building ML/AI infrastructure from scratch in entrepreneurial environments.
- Relevant certifications: Azure AI Engineer Associate, Databricks ML Professional, Google Cloud ML Engineer.

ABOUT YOU: The ideal candidate demonstrates proven experience with model pipeline and registry tools, including the ability to detect and proactively prevent model drift, automate comprehensive model monitoring, and consistently ensure model accuracy. Experience with RAG systems, vector databases, and LLM deployment is essential for this role.

Key Competencies:
- Entrepreneurial Mindset: Comfortable building from scratch in environments without mature processes; thrives with ambiguity and takes initiative.
- Business Partnership: Exceptional ability to translate complex AI/ML concepts for non-technical stakeholders across Fund Accounting, Investor Relations, and Investment teams.
- Cross-Functional Collaboration: Proven track record of building relationships and influence without authority across technical and business functions.
- Business Value Orientation: Focuses on ROI and business outcomes, not just technical elegance; can prioritize initiatives based on business impact.
- Trustworthy AI Champion: Strong awareness of responsible AI practices, including bias mitigation, explainability, security, and compliance requirements.
- Continuous Learner: Stays current with rapidly evolving GenAI landscape; actively experiments with new technologies and approaches.

WHAT CIM OFFERS: At CIM, we believe our success stems from our collective efforts, and we are committed to providing well-rounded support and resources for our employees. In addition to a competitive compensation plan, CIM offers a comprehensive benefits program for employees to thrive both inside and outside of work. Eligible employees can enjoy a wide range of benefits, including:
• A variety of Medical, dental, and vision benefit plans
• Health Savings Account with a generous employer contribution
• Company paid life and disability insurance
• 401(k) savings plan, with company match
• Comprehensive paid time off, including: vacation days, 10 designated holidays, sick time, and bereavement leave
• Up to 16 hours of volunteer time off
• Up to 16 weeks of Paid Parental Leave
• Ongoing professional development programs
• Wellness program, including monthly and quarterly prizes
• And more!

Actual base salary considers several factors including but not limited to geography, job-related knowledge, experience, and budget.
The start of the salary range is typically associated with the minimum experience required. At CIM, base pay is one part of the total compensation package. For this role, bonus compensation may be a significant part of the total compensation. The anticipated base salary range for the position in Los Angeles, CA is $175,000 - $225,000.

HOW WE FEEL ABOUT DIVERSITY AND INCLUSION: At CIM Group, we believe that the unique perspectives and backgrounds of our employees enhance everything we do. We are committed to fostering an inclusive environment where diversity is not only respected but celebrated. We strive to ensure that our workplace is free from discrimination and harassment, allowing everyone to contribute meaningfully and feel a sense of belonging. As an equal opportunity employer, we strictly prohibit any form of unlawful discrimination and adhere to the laws enforced by the EEOC. Our goal is to provide a safe and supportive environment where all employees can grow and make impactful contributions together.

*Applicants with disabilities may be entitled to reasonable accommodation under the terms of the Americans with Disabilities Act and certain state or local laws. A reasonable accommodation is a change in the way things are normally done which will ensure an equal employment opportunity without imposing undue hardship on CIM Group. Please inform our Talent team if you need any assistance completing any forms or to otherwise participate in the application process.

CIM is committed to maintaining the confidentiality and privacy of your personal and financial information. Please click here for our Privacy Policy.

CIM does not accept unsolicited resumes from Agencies. Any unsolicited resumes received from Agencies will be considered property of CIM and no fees will be due or paid. If you wish to become an approved Agency with CIM or any of its Affiliates, please contact a member of the CIM Talent Acquisition Team.