Senior DevOps Engineer - AI Platform
2 days ago
Westlake Village
JOB DETAILS:
Sr. DevOps Engineer - AI Platform
Location: Westlake Village, CA (onsite work)
Contract Duration: 6-month contract-to-hire, converting to full-time employment
Hourly Rate: $60-$72/hr on a W2 contract

Job Description:

Responsibilities:
The Sr. DevOps Engineer - AI Platform will:
• Design, implement, and manage scalable and resilient infrastructure on AWS.
• Architect and maintain Windows/Linux-based environments, ensuring seamless integration with cloud platforms.
• Develop and maintain infrastructure as code (IaC) using both AWS CloudFormation/CDK and Terraform/OpenTofu.
• Develop and maintain configuration management for Windows and Linux servers using Chef.
• Design, build, and optimize CI/CD pipelines using GitLab CI/CD for .NET applications.
• Integrate and support AI services, including orchestration with AWS Bedrock, Google Agentspace, and other generative AI frameworks, ensuring they can be securely and efficiently consumed by platform services.
• Enable AI/ML workflows by building and optimizing infrastructure pipelines that support large-scale model training, inference, and deployment across AWS and GCP environments.
• Automate model lifecycle management (training, deployment, monitoring) through CI/CD pipelines, ensuring reproducibility and seamless integration with development workflows.
• Collaborate with AI engineering teams to deliver scalable environments, standardized APIs, and infrastructure that accelerate AI adoption at the platform level.
• Implement observability, security, data privacy, and cost-optimization strategies specifically for AI workloads, including monitoring and resource scaling for inference services.
• Implement and enforce security best practices across the infrastructure and deployment processes.
• Collaborate closely with development teams to understand their needs and provide DevOps expertise.
• Troubleshoot and resolve infrastructure and application deployment issues.
• Implement and manage monitoring and logging solutions to ensure system visibility and proactive issue detection.
• Contribute clearly and concisely to the development and documentation of DevOps standards and best practices.
• Stay up to date with the latest industry trends and technologies in cloud computing, DevOps, and security.
• Provide mentorship and guidance to junior team members.

Qualifications:
• Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
• 5+ years of experience in a DevOps or Site Reliability Engineering (SRE) role.
• 1+ year of experience with AI services and LLMs.
• Extensive hands-on experience with Amazon Web Services (AWS).
• Solid understanding of Windows/Linux server administration and integration with cloud environments.
• Proven experience with infrastructure-as-code tools, specifically AWS CDK and Terraform.
• Strong experience designing and implementing CI/CD pipelines using GitLab CI/CD.
• Experience deploying and managing .NET applications in cloud environments.
• Deep understanding of security best practices and their implementation in cloud infrastructure and CI/CD pipelines.
• Solid understanding of networking principles (TCP/IP, DNS, load balancing, firewalls) in cloud environments.
• Experience with monitoring and logging tools (e.g., New Relic, CloudWatch).
• Strong scripting skills (e.g., PowerShell, Python, Ruby, Bash).
• Excellent problem-solving and troubleshooting skills.
• Strong communication and collaboration skills.
• Experience with containerization technologies (e.g., Docker, Kubernetes) is a plus.
• Relevant AWS and/or GCP certifications are a plus.
• Experience with the configuration management tool Chef.

Preferred Qualifications:
• Strong knowledge of PowerShell and Python scripting.
• Strong background with AWS EC2 features and services (Auto Scaling and warm pools).
• Understanding of the Windows Server build process, using tools such as Chocolatey for packages and Packer for AMI/image generation.