ML Data Engineer #978695
5 days ago
Seffner
Job Title: Data Engineer – AI/ML Pipelines
Location: Seffner, FL
Work Model: Hybrid
Duration: CTH

Position Summary
The Data Engineer – AI/ML Pipelines plays a key role in designing, building, and maintaining scalable data infrastructure that powers analytics and machine learning initiatives. This position focuses on developing production-grade data pipelines that support end-to-end ML workflows, from data ingestion and transformation to feature engineering, model deployment, and monitoring. The ideal candidate has hands-on experience working with operational systems such as Warehouse Management Systems (WMS) or ERP platforms, and is comfortable partnering closely with data scientists, ML engineers, and operational stakeholders to deliver high-quality, ML-ready datasets.

Key Responsibilities
ML-Focused Data Engineering
• Build, optimize, and maintain data pipelines specifically designed for machine learning workflows.
• Collaborate with data scientists to develop feature sets, implement data versioning, and support model training, evaluation, and retraining cycles.
• Ingest, normalize, and transform data from WMS, ERP, telemetry, and other operational data sources.
• Build automated, reliable, and scalable pipelines using tools such as Azure Data Factory, Airflow, or Databricks Workflows.
• Implement validation frameworks, anomaly detection, and reconciliation processes to ensure high-quality ML inputs.
• Work closely with data scientists, ML engineers, software engineers, and business teams to gather requirements and deliver ML-ready datasets.
• Document data flows, data mappings, and pipeline logic in a clear, reproducible format.
• Provide guidance and mentorship to junior engineers and analysts on ML-focused data engineering best practices.

Required Qualifications
Technical Skills
• Strong experience building ML-focused data pipelines, including feature engineering and model lifecycle support.
• Proficiency in Python, SQL, and modern data transformation tools (dbt, Spark, Delta Lake, or similar).
• Solid understanding of orchestrators and cloud data platforms (Azure, Databricks, etc.).
• Familiarity with ML operations tools such as MLflow, TFX, or equivalent frameworks.
• 5+ years in data engineering, with at least 2 years directly supporting AI/ML applications or teams.
• Experience designing and maintaining production-grade pipelines in cloud environments.
• Bachelor’s degree in Computer Science, Data Engineering, Data Science, or a related field (Master’s preferred).
• Experience with real-time ingestion using Kafka, Kinesis, Event Hub, or similar.
• Exposure to MLOps practices and CI/CD for data pipelines.
• Background in logistics, warehousing, fulfillment, or similar operational domains.