Big Data Developer
21 hours ago
New York
We’re looking for a seasoned Senior Data Engineer with strong Hadoop experience to design, build, and scale data pipelines and platforms powering analytics, AI/ML, and business operations. You’ll own end-to-end data engineering, from ingestion and transformation to performance optimization, across large-scale distributed systems and modern cloud data platforms.

Key Responsibilities
• Design & Build Data Pipelines: Architect, develop, and maintain robust ETL/ELT pipelines for batch and streaming data using the Hadoop ecosystem, Spark, and Airflow.
• Big Data Architecture: Define and implement scalable big data architectures, ensuring reliability, fault tolerance, and cost efficiency.
• Data Modeling: Develop and optimize data models for the Data Warehouse and Operational Data Store (ODS); ensure conformed dimensions and star/snowflake schemas where appropriate.
• SQL Expertise: Write, optimize, and review complex SQL/HiveQL queries for large datasets; enforce query standards and patterns.
• Performance Tuning: Optimize Spark jobs, SQL queries, storage formats (e.g., Parquet/ORC), partitioning, and indexing to improve latency and throughput.
• Data Quality & Governance: Implement data validation, lineage, cataloging, and security controls across environments.
• Workflow Orchestration: Build and manage DAGs in Airflow, ensuring observability, retries, alerting, and SLAs.
• Cross-functional Collaboration: Partner with Data Science, Analytics, and Product teams to deliver reliable datasets and features.
• Best Practices: Champion coding standards, CI/CD, infrastructure-as-code (IaC), and documentation across the data platform.
Required Qualifications
• 7+ years of hands-on data engineering experience building production-grade pipelines.
• Strong experience with Hadoop (HDFS, YARN), Hive SQL/HiveQL, Spark (Scala/Java/PySpark), and Airflow.
• Expert-level SQL skills with the ability to write and tune complex queries on large datasets.
• Solid understanding of Big Data architecture patterns (e.g., lakehouse, data lake + warehouse, CDC).
• Deep knowledge of ETL/ELT and DW/ODS concepts (slowly changing dimensions, partitioning, columnar storage, incremental loads).
• Proven track record in performance tuning for large-scale systems (Spark jobs, shuffle optimizations, broadcast joins, skew handling).
• Strong programming background in Java and/or Scala (Python is a plus).

Preferred Skills
• Experience with AI-driven data processing (feature engineering pipelines, ML-ready datasets, model data dependencies).
• Hands-on experience with cloud data platforms (AWS, GCP, or Azure), including services like EMR/Dataproc/HDInsight, S3/GCS/ADLS, Glue/Dataflow, and BigQuery/Snowflake/Redshift/Synapse.
• Exposure to NoSQL databases (Cassandra, HBase, DynamoDB, MongoDB).
• Advanced data governance & security (row/column-level security, tokenization, encryption at rest/in transit, IAM/RBAC, data lineage/catalog).
• Familiarity with Kafka (topics, partitions, consumer groups, schema registry, stream processing).
• Experience with CI/CD for data (Git, Jenkins/GitHub Actions, Terraform) and containerization (Docker, Kubernetes).
• Knowledge of metadata management and data observability (Great Expectations, Monte Carlo, OpenLineage).

Life at Capgemini
Capgemini supports all aspects of your well-being throughout the changing stages of your life and career.
For eligible employees, we offer:
• Flexible work
• Healthcare including dental, vision, mental health, and well-being programs
• Financial well-being programs such as 401(k) and Employee Share Ownership Plan
• Paid time off and paid holidays
• Paid parental leave
• Family building benefits like adoption assistance, surrogacy, and cryopreservation
• Social well-being benefits like subsidized back-up child/elder care and tutoring
• Mentoring, coaching and learning programs
• Employee Resource Groups
• Disaster Relief

Disclaimer
Capgemini is an Equal Opportunity Employer encouraging diversity in the workplace. All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status or any other characteristic protected by law.

This is a general description of the Duties, Responsibilities and Qualifications required for this position. Physical, mental, sensory or environmental demands may be referenced in an attempt to communicate the manner in which this position traditionally is performed. Whenever necessary to provide individuals with disabilities an equal employment opportunity, Capgemini will consider reasonable accommodations that might involve varying job requirements and/or changing the way this job is performed, provided that such accommodations do not pose an undue hardship.

Capgemini is committed to providing reasonable accommodations during our recruitment process. If you need assistance or accommodation, please reach out to your recruiting contact.

Click the following link for more information on your rights as an Applicant ___