Platform Engineer
2 days ago
Barcelona
About the role

As an AI Platform Engineer, you will invent and industrialize how the Project team uses Databricks, machine learning, and data. You will build, deploy, and evolve next-generation platform capabilities at scale, partnering with data scientists, ML engineers, and platform teams to deliver a best-in-class developer experience on the Lakehouse.

What we're looking for

• Proficiency in cloud operations on AWS, with a strong understanding of scaling infrastructure and optimizing cost and performance.
• Proven hands-on experience with Databricks on AWS: workspace administration, cluster and pool management, job orchestration (Jobs/Workflows), repos, secrets, and integrations.
• Strong experience with Databricks Unity Catalog: metastore setup, catalogs/schemas, data lineage, access control (ACLs, grants), attribute-based access control, and data governance.
• Expertise in Infrastructure as Code for Databricks and AWS using Terraform (databricks and aws providers) and/or AWS CloudFormation; experience with Databricks Asset Bundles or the Databricks CLI is a plus.
• Experience implementing CI/CD and GitOps for notebooks, jobs, and ML assets using GitHub and GitHub Actions (or GitLab/Jenkins), including automated testing and promotion across workspaces.
• Ability to structure reusable libraries, package and version code, and enforce quality via unit/integration tests and linting; proficiency with SQL for Lakehouse development.
• Experience with experiment tracking, model registry, model versioning, approval gates, and deployment to batch/real-time endpoints (Model Serving).
• Experience with AWS IAM/STS, PrivateLink/VPC, KMS encryption, Secrets, SSO/SCIM provisioning, and monitoring/observability (CloudWatch/Datadog/Grafana).
• Experience with DevOps practices to enable automation strategies and reduce manual operations.
• Experience with or awareness of MLOps practices; building pipelines to accelerate and automate machine learning will be viewed favorably.
• Excellent communication, cross-functional collaboration, and stakeholder management skills.

What you will do

• Design and implement scalable Databricks platform solutions to support analytics, ML, and GenAI workflows across environments (dev/test/prod).
• Administer and optimize Databricks workspaces: cluster policies, pools, job clusters vs. all-purpose clusters, autoscaling, spot/fleet usage, and GPU/accelerated compute where applicable.
• Implement Unity Catalog governance: define metastores, catalogs, schemas, data sharing, row/column masking, lineage, and access controls; integrate with enterprise identity and audit (see the governance sketch at the end of this description).
• Build IaC for reproducible platform provisioning and configuration using Terraform; manage config-as-code for cluster policies, jobs, repos, service principals, and secret scopes.
• Implement CI/CD for notebooks, libraries, DLT pipelines, and ML assets; automate testing, quality gates, and promotion across workspaces using GitHub Actions and Databricks APIs.
• Standardize experiment structure, implement model registry workflows, and deploy/operate model serving endpoints with monitoring and rollback (see the MLflow sketch below).
• Develop and optimize Delta Lake pipelines (batch and streaming) using Auto Loader, Structured Streaming, and DLT; enforce data quality and SLAs with expectations and alerts (see the Auto Loader sketch below).
• Optimize cost and performance: rightsize clusters and pools, enforce cluster policies and quotas, manage DBU consumption, leverage spot/fleet capacity, and implement chargeback/showback reporting.
• Integrate observability: metrics, logs, and traces for jobs, clusters, and model serving; configure alerting, on-call runbooks, and incident response to reduce MTTR.
• Ensure platform security and compliance: VPC design, PrivateLink, encryption at rest/in transit, secrets management, vulnerability remediation, and audit readiness; align with internal security standards and, where applicable, GxP controls.
• Collaborate with cross-functional teams to integrate the Databricks platform with data sources, event streams, downstream applications, and AI services on AWS.
• Conduct technical research, evaluate new Databricks features (e.g., Lakehouse Federation, Vector Search, Mosaic AI), and propose platform improvements aligned to the roadmap.

Required skills and experience

• Hands-on Databricks administration on AWS, including Unity Catalog governance and enterprise integrations.
• Strong AWS foundation: networking (VPC, subnets, security groups), IAM roles and policies, KMS, S3, CloudWatch; EKS familiarity is a plus but not required for this Databricks-focused role.
• Proficiency with Terraform (including the databricks provider), GitHub, and GitHub Actions.
• Strong Python and SQL; experience packaging libraries and working with notebooks and repos.
• Experience with MLflow for tracking and model registry; experience with model serving endpoints preferred.
• Familiarity with Delta Lake, Auto Loader, Structured Streaming, and DLT.
• Experience implementing DevOps automation and runbooks; comfort with REST APIs and the Databricks CLI (see the REST API sketch at the end of this description).
• Git and GitHub proficiency; code review and branching strategies.
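Illustrative sketches

Much of the governance work above reduces to a small set of repeatable operations. As a minimal sketch, assuming a hypothetical dev catalog, bronze schema, and a data-engineers account group (none of these names come from the posting), Unity Catalog objects and grants can be managed from Python on a Databricks cluster via spark.sql:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative three-level namespace; real names depend on the workspace setup.
    spark.sql("CREATE CATALOG IF NOT EXISTS dev")
    spark.sql("CREATE SCHEMA IF NOT EXISTS dev.bronze")

    # Grant read access to an account group (backticks quote the principal name).
    spark.sql("GRANT USE CATALOG ON CATALOG dev TO `data-engineers`")
    spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA dev.bronze TO `data-engineers`")

In practice statements like these would live in version-controlled IaC or migration scripts rather than being run ad hoc.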
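For the experiment tracking and model registry responsibilities, a minimal MLflow sketch might look like the following; the experiment path and registered model name are assumptions for illustration, and on Databricks the tracking URI is preconfigured:

    import mlflow
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    mlflow.set_experiment("/Shared/platform-demo")   # hypothetical experiment path

    X, y = load_iris(return_X_y=True)

    with mlflow.start_run() as run:
        model = LogisticRegression(max_iter=200).fit(X, y)
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        mlflow.sklearn.log_model(model, artifact_path="model")

    # Register the logged model; approval gates and serving endpoints build on this version.
    result = mlflow.register_model(f"runs:/{run.info.run_id}/model", "platform_demo_model")
    print(result.version)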
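For the Delta Lake pipeline work, a stripped-down Auto Loader-to-Delta flow could be sketched as below; the S3 paths and the dev.bronze.events table name are placeholders, and the availableNow trigger runs the stream as an incremental batch:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    raw = (
        spark.readStream.format("cloudFiles")                     # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/events")
        .load("s3://example-bucket/landing/events/")              # hypothetical landing zone
    )

    bronze = raw.withColumn("ingested_at", F.current_timestamp())

    query = (
        bronze.writeStream.format("delta")
        .option("checkpointLocation", "s3://example-bucket/_checkpoints/events_bronze")
        .trigger(availableNow=True)                               # incremental batch run
        .toTable("dev.bronze.events")                             # illustrative UC table
    )
    query.awaitTermination()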
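Finally, for the DevOps automation side, much of the day-to-day tooling is a thin wrapper over the Databricks REST API. Here is a minimal sketch that lists jobs in a workspace, assuming DATABRICKS_HOST and DATABRICKS_TOKEN are exported in the environment:

    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]      # e.g. https://<workspace>.cloud.databricks.com
    token = os.environ["DATABRICKS_TOKEN"]    # a PAT or service principal token

    resp = requests.get(
        f"{host}/api/2.1/jobs/list",
        headers={"Authorization": f"Bearer {token}"},
        params={"limit": 25},
        timeout=30,
    )
    resp.raise_for_status()

    for job in resp.json().get("jobs", []):
        print(job["job_id"], job["settings"]["name"])

The same calls are available through the Databricks CLI and the official Python SDK; the raw HTTP form is shown only because it maps directly onto the documented endpoints.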