Data Engineer – Full Stack
12 hours ago
New York
Job Description:
Design and implement end-to-end data pipelines (ETL/ELT) that ingest, process, and curate large-scale enterprise data, including telemetry/vehicle data and other structured and unstructured sources.
Build and maintain Gen AI pipelines, including embedding generation, vector store indexing, retrieval-augmented generation (RAG), and LLM orchestration, to enable intelligent search, summarization, and conversational analytics over enterprise data.
Migrate and modernize data assets to a centralized data platform (e.g., BigQuery) using principled data lake/warehouse architectures (Medallion architecture: Bronze/Silver/Gold) to power analytics, reporting, and AI/ML workloads.
Architect scalable data models and data warehouses, optimizing for query performance, maintainability, cost efficiency, and downstream AI consumption.
Develop and operate robust orchestration pipelines using Airflow/Astronomer or Scheduled Queries, with secure, reproducible CI/CD workflows (Terraform + Git) for both data and AI artifacts.
Integrate LLM APIs and AI services (e.g., Vertex AI, OpenAI, LangChain) into data workflows to automate data enrichment, classification, anomaly narratives, and natural-language interfaces.
Build and maintain reliable data and model quality checks, lineage, and monitoring with observability tools (e.g., Splunk, Looker/Grafana/Tableau/Power BI dashboards) to rapidly detect and resolve data and AI pipeline issues.
Implement data governance, security, and compliance controls (data lineage, access controls, PII/PHI protection, prompt injection safeguards, responsible AI guardrails) in collaboration with security and privacy teams.
Lead the design and delivery of analytics-ready and AI-ready data assets for cross-functional teams, including dashboards, alerts, self-service analytics, and AI-powered insight tools.
Evaluate, prototype, and productionize emerging Gen AI capabilities (agents, function calling, fine-tuning, multimodal models) to solve business problems and improve platform intelligence.
Mentor and coach junior engineers on data engineering, AI/ML integration patterns, prompt engineering best practices, and documentation standards.
Collaborate with data scientists, ML engineers, product managers, and business stakeholders to translate requirements into scalable data and AI solutions and timely insights.
Monitor costs and plan capacity for cloud and AI resources; optimize storage, compute, and token usage across GCP services (BigQuery, Dataflow, Dataproc, GCS, Vertex AI).
Participate in on-call rotations and incident response to maintain high availability of data and AI services.

Requirements:
A bachelor's degree.
5+ years of experience in data engineering, data platforms, or a similar role.
3+ years of hands-on experience with Google Cloud Platform (BigQuery, Cloud Storage, Dataflow, Dataproc; Scheduled Queries or equivalent scheduling/orchestration) or AWS.
1+ years of experience working with Generative AI technologies, including LLMs, embeddings, vector databases, RAG architectures, or AI orchestration frameworks (e.g., LangChain, Semantic Kernel, LlamaIndex).
1+ years of experience building a semantic data layer to serve AI agents.
Practical experience building and operating data pipelines with orchestration tools (Airflow/Astronomer; Scheduled Queries).
Experience with infrastructure-as-code and CI/CD (Terraform, Git, and related tooling).
Demonstrated ability to design and implement analytics-ready data assets and dashboards; familiarity with BI tools (Looker, Tableau, Power BI, Grafana) for monitoring and reporting.
Strong communication skills and ability to work effectively with cross-functional teams (engineering, analytics, product, security).

Benefits:
Immediate medical, dental, vision, and prescription drug coverage
Flexible family care days, paid parental leave, new parent ramp-up programs, subsidized back-up child care, and more
Family building benefits, including adoption and surrogacy expense reimbursement, fertility treatments, and more
Vehicle discount program for employees and family members, and management leases
Tuition assistance
Established and active employee resource groups
Paid time off for individual and team community service
A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
Paid time off and the option to purchase additional vacation time