Senior Data Engineer
8 hours ago
Málaga
As Senior Data Engineer in GTB (SCIB), based in our Málaga office, you will drive the evolution of our data, AI and BI platforms on the cloud. You will:

· Lead the Data, AI & BI roadmap for GTB, ensuring scalability, resilience, security and cost efficiency.
· Design and evolve our data lakehouse architecture.
· Define and build domain-oriented data products aligned with data mesh principles (data-as-a-product, SLAs, contracts, domain ownership).
· Build and maintain data ingestion, ETL and transformation pipelines, including CDC-based and event-driven ingestion.
· Integrate cloud platforms with the SCIB on-premise data lake, ensuring cataloguing, lineage, governance and security follow CDAIO best practices.
· Implement and enforce data governance, data rules, data cleaning/normalisation and data guardrails.
· Provide high-quality, well-modelled datasets and semantic layers for BI, KPI definition and data visualisation, in collaboration with BI and business teams.
· Enable AI/ML and LLM use cases (training, feature engineering, RAG, fine-tuning, guardrails, monitoring).
· Promote engineering best practices and act as a technical leader and mentor for data, ML and BI engineers.
· Work closely with GTB and SCIB stakeholders to prioritise and deliver high-impact data and AI initiatives.

EXPERIENCE
· 5+ years in Data Engineering / Data Platform / AI Engineering / Advanced Analytics, ideally in large or regulated organisations.
· Proven experience designing and building cloud data platforms and data lakehouse architectures (preferably on AWS).
· Hands-on experience with Databricks or EMR for large-scale data transformation.
· Strong background in data ingestion and ETL.
· Experience with CDC-based pipelines and event-driven architectures for ingesting operational/transactional data.
· Experience integrating on-premise data lakes with cloud platforms in hybrid architectures.
· Experience enabling AI/ML solutions in production.
· Hands-on involvement in data governance, data quality, data rules and guardrails.
· Experience working with BI and business stakeholders on KPI design and data modelling for reporting/visualisation.

EDUCATION
· Bachelor’s degree (or higher) in Computer Science, Engineering, Mathematics or a related technical discipline.
· Additional training in Data Engineering, AI/ML or Analytics is a plus.

SKILLS & KNOWLEDGE
· AWS: S3, Lake Formation, Glue (Jobs, Crawlers, Data Wrangler), EMR.
· Data formats and lakehouse: Parquet, Apache Iceberg / Delta-style, curated layers (raw, curated, semantic).
· Databricks: Spark (PySpark/Scala), notebooks, clusters, jobs, Delta tables, performance optimisation, MLflow, feature store.
· Strong SQL and Python for data processing, ETL and automation.
· Experience with data quality, lineage and observability (metrics, logging, alerts, data tests).
· Design of data lakehouse architectures (separation of storage/compute, multi-zone design).
· CDC patterns to ingest changes from GTB operational systems and keep them synchronised.
· Understanding of data mesh: domain ownership, data-as-a-product, federated governance.
· Design and operation of hybrid (on-prem + cloud) architectures integrated with SCIB’s data lake, aligned with CDAIO.
· General knowledge of machine learning algorithms (regression, classification, clustering, time-series, recommendations, anomaly detection).
· Experience supporting ML workflows (feature engineering, training, validation, deployment, monitoring) on Databricks ML, EMR (Spark ML) or SageMaker.
· Knowledge of LLM training and adaptation (prompt engineering, fine-tuning, RAG, evaluation and feedback loops).
· Familiarity with Quick Suite and Amazon Bedrock to expose LLM capabilities (Q&A, summarisation, agents) with appropriate guardrails and risk controls.
· Strong understanding of BI concepts and how data is consumed in dashboards and reports.
· Experience working with business stakeholders to define KPIs and metrics (volumes, balances, revenues, risk, SLAs, operational KPIs).
· Ability to design semantic/logical data layers optimised for BI tools (star schemas, wide tables, aggregation layers).
· Awareness of data storytelling and visual best practices (drill-down, segmentation, trend/exception views).
· Practical understanding of data governance in large organisations.
· Experience defining and implementing data rules (validations, thresholds, completeness, timeliness, referential integrity).
· Hands-on work in data cleaning, standardising and normalising datasets.
· Knowledge of data guardrails: access control, masking, anonymisation, segregation of environments, safe AI/analytics usage.

SOFT SKILLS
· Strong communication skills, able to explain data, AI and architecture topics to technical and non-technical audiences.
· Ability to influence and align multiple teams without direct authority.
· Proven leadership and mentoring of data, ML and BI engineers.
· Proactive, hands-on, outcome-oriented mindset focused on business value and reliability.
· High adaptability and resilience in a complex, global and regulated environment.
· Collaborative and team-oriented, building trusted relationships (business, product, risk, technology).

NICE TO HAVE
· AWS certifications: Data Analytics, Machine Learning, Solutions Architect.
· Databricks certifications (e.g. Data Engineer, Machine Learning).
· Experience with orchestration tools and CI/CD for data and ML pipelines.
· Knowledge of Infrastructure as Code for data & AI platforms.
· Experience with BI tools (QuickSight, Power BI, Qlik, etc.) from a data provider perspective.
· Experience working with Agile methodologies and tooling (JIRA, Confluence).