Enterprise Data Lake Tech Lead - Dallas, TX (Hybrid) - 12-Month Contract - Direct Client
Position Overview

The Enterprise Data Lake (EDL) Technical Lead owns the design, implementation, and engineering leadership of the enterprise data lake platform. This role ensures the platform is scalable, reliable, secure, and consumable across analytics, reporting, and operational workloads. The Tech Lead drives technical decisions and collaborates closely with Architecture, Product Management, Data Engineering, Security, and downstream consumers to deliver robust data platform capabilities.

Key Responsibilities

Technical Leadership
• Own the design and implementation of enterprise-scale data lake solutions on Azure/AWS
• Define technical standards and best practices for data platform components
• Drive technical decision-making across data ingestion, storage, processing, and governance layers
• Provide technical mentorship and guidance to data engineering and platform teams
• Collaborate with the Architecture team on overall architectural alignment and strategy

Data Catalog & Governance
• Own the design and implementation of the enterprise data catalog and metadata management layer
• Build and operate the data catalog using DataHub or an equivalent metadata platform
• Design and implement automated metadata ingestion, lineage tracking, and data discovery capabilities
• Design and implement data governance policies, data quality rules, and compliance frameworks
• Enable self-service data discovery and access management for downstream consumers
• Establish data ownership, stewardship models, and metadata standards across the enterprise

Data Platform Engineering & Operations
• Design and implement real-time and batch data ingestion pipelines using Apache Kafka (a minimal ingestion sketch follows this list)
• Optimize data processing workflows on the Databricks platform (Delta Lake, Spark optimization, Unity Catalog)
• Design integration patterns between RDBMS sources and the data lake (CDC, batch ETL, replication)
• Optimize database performance, query tuning, and indexing strategies across relational and distributed systems
• Implement infrastructure as code using Terraform for automated provisioning and management
• Design and deploy containerized data services on Kubernetes clusters
• Develop data platform services and tooling using C# and Go
• Ensure platform scalability, reliability, and security across all data lake components
• Implement monitoring, logging, and observability solutions for data infrastructure
• Optimize Linux-based systems for data processing workloads
• Establish CI/CD pipelines for data platform deployments
• Ensure platform security, compliance, and data governance standards (GDPR, SOC 2, etc.)
• Drive cost optimization and performance tuning initiatives
• Implement database backup/recovery strategies and disaster recovery planning
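For illustration only (not part of the client's requirements): a minimal sketch of the kind of Kafka-to-Delta-Lake streaming ingestion job this role would own, assuming a Spark/Databricks runtime with the Kafka source and Delta Lake available. The broker address, topic name, payload schema, and storage paths are hypothetical placeholders.

    # Minimal Kafka -> Delta Lake streaming ingestion sketch (PySpark).
    # Broker, topic, schema, and paths below are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("edl-kafka-ingest").getOrCreate()

    # Example payload schema for a hypothetical 'orders' topic.
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("status", StringType()),
        StructField("updated_at", TimestampType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "orders")                     # placeholder topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers raw bytes; decode the value column and parse the JSON payload.
    parsed = (
        raw.select(from_json(col("value").cast("string"), schema).alias("v"))
        .select("v.*")
    )

    # Land the stream in a Delta table; the checkpoint lets the sink restart
    # without duplicating records.
    query = (
        parsed.writeStream.format("delta")
        .option("checkpointLocation", "/mnt/edl/_checkpoints/orders")  # placeholder path
        .outputMode("append")
        .start("/mnt/edl/bronze/orders")                               # placeholder path
    )
    query.awaitTermination()

In practice the payload schema would come from a Schema Registry rather than being hard-coded, which is why the qualifications below call out Kafka Connect and Schema Registry experience.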
Required Qualifications

Technical Expertise

Data Streaming & Processing
• 5+ years with Apache Kafka (streaming architecture, Kafka Connect, Schema Registry, stream processing)
• 3+ years with Databricks (Delta Lake, Apache Spark optimization, Unity Catalog, cluster management)

Metadata & Data Governance
• 3+ years with DataHub or similar metadata management platforms (Alation, Collibra, Apache Atlas); a metadata-emission sketch appears at the end of this posting
• Deep experience building and operating enterprise data catalog systems
• Expertise in automated metadata extraction, lineage tracking, and impact analysis
• Experience with data quality frameworks and metadata-driven data operations
• Knowledge of data governance policies, data classification, and compliance automation

Databases
• 5+ years with enterprise RDBMS platforms, including:
  - SQL Server (T-SQL, SSIS, SSRS, replication, Always On Availability Groups)
  - PostgreSQL (advanced query optimization, partitioning, extensions, streaming replication)
  - MySQL (replication, clustering, performance tuning)
• Strong SQL skills (complex queries, stored procedures, window functions, query optimization)
• Database design principles (normalization, indexing strategies, schema design, partitioning)
• Change Data Capture (CDC) patterns and implementation (Debezium, Azure Data Factory, AWS DMS, custom solutions)

Cloud Platforms
• 5+ years with Azure or AWS, including:
  - Azure: Data Lake Storage (Gen2), Event Hubs, AKS, Azure SQL, Key Vault, Azure AD, Monitor
  - AWS: S3, MSK/Kinesis, EKS, RDS/Aurora, Secrets Manager, IAM, CloudWatch
• Cloud-native data services, networking, security, and IAM
• 3+ years with Kubernetes (deployment strategies, scaling, monitoring, service mesh, Helm)

Programming
• Strong proficiency in programming languages such as C# and Go
• Expert-level SQL across multiple database platforms
• Experience with Python for data engineering tasks (preferred)

Experience & Leadership
• 7+ years in data engineering, platform engineering, or database engineering roles
• 3+ years in a technical leadership capacity (Tech Lead, Principal Engineer)
• Proven track record of delivering large-scale data infrastructure projects
• Experience leading teams of 5-10+ engineers
• Strong architectural design and systems-thinking capabilities
• Experience migrating legacy RDBMS workloads to modern data lake architectures
• Demonstrated ability to balance technical excellence with business needs

Preferred Qualifications

Additional Technologies
• Experience with cloud ETL services (Azure Data Factory, AWS Glue, Azure Stream Analytics, AWS Lambda)
• Knowledge of managed database features (Azure SQL elastic pools/hyperscale, AWS RDS/Aurora serverless)
• Knowledge of additional streaming technologies (Apache Flink)
• Experience with database sharding and horizontal partitioning strategies at scale
• Familiarity with NoSQL databases (Cosmos DB, DynamoDB, MongoDB, Cassandra, Redis)
• Experience with Apache Iceberg
• Knowledge of data observability tools
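For illustration only (not part of the client's requirements): a minimal sketch of programmatic metadata registration against DataHub, of the kind the catalog responsibilities above describe, assuming the acryl-datahub Python package and a reachable DataHub GMS endpoint. The endpoint URL, platform, dataset name, and properties are hypothetical placeholders.

    # Minimal DataHub metadata-emission sketch using the Python REST emitter.
    # Endpoint, platform, dataset name, and properties are illustrative placeholders.
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://datahub-gms:8080")  # placeholder endpoint

    # Register (or update) a dataset entity with a description and custom properties.
    dataset_urn = make_dataset_urn(platform="delta-lake", name="edl.bronze.orders", env="PROD")
    properties = DatasetPropertiesClass(
        description="Bronze landing table for the orders stream.",
        customProperties={"owner_team": "data-platform", "tier": "bronze"},
    )

    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=properties))

At enterprise scale this kind of emission is typically automated inside ingestion pipelines rather than run by hand, which is what the "automated metadata ingestion and lineage tracking" responsibilities refer to.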