7190-A - Data Catalog Developer
23 days ago
New York
Job Description

Job Title: Data Catalog Developer
Company: Direct Recruit Agency
Contract Details: Full-time, on-site

We are seeking a highly skilled and experienced Data Catalog Developer to join our team at Direct Recruit Agency. As a Data Catalog Developer, you will be responsible for designing, developing, and maintaining data catalogs for our clients. This is a full-time, on-site position that offers a competitive salary and benefits package.

• Expertise in Collibra is a must.
• Will be building the Collibra Data Catalog.
• Experience with Edge, the new Collibra software.
• Top must-have hard skills:
  • Expertise in Collibra Data Management, Data Asset, Data Governance, and BAU support of Collibra Data Catalog.
  • Edge Server experience is a must. Collibra Ranger certification preferred.
• Top nice-to-have hard skills: Databricks, AWS (S3, Glue, Aurora Postgres, Athena), SQL.
• Top soft skills: Communication, Problem-Solving, Collaboration, Attention to Detail.
• Team size: 10

Key aspects of the role:
• Develop the Data Catalog, build Collibra workflows, and integrate Edge Server with various data sources, authentication, and access controls.
• Day-to-day expectations: Data Catalog build-out, Metadata Synchronization, Lineage Harvester.
• Interview plan/process: two virtual interviews and one final on-site interview.
• Citizenship: U.S. citizens (USC) only.

Your role as a Senior Data Engineer:
• Migrate applications from on-premises environments to cloud service providers.
• Develop products and services on the latest technologies through contributions to development, enhancements, testing, and implementation.
• Develop, modify, and extend code for building cloud infrastructure, and automate it using CI/CD pipelines.
• Partner with business stakeholders and peers to pursue solutions that achieve business goals through an agile software development methodology.
• Perform problem analysis, data analysis, reporting, and communication.
• Work with peers across the system to define and implement best practices and standards.
• Assess applications and help determine the appropriate application infrastructure patterns.
• Use best practices and knowledge of internal or external drivers to improve products or services.

Qualifications: What we are looking for:
• Bachelor's degree in Computer Science, Information Systems, or a related field.
• Minimum of 3 years of experience as a Data Catalog Developer or in a similar role.
• Hands-on experience in building ETL using Databricks SaaS infrastructure.
• Experience in developing data pipeline solutions to ingest and exploit new and existing data sources.
• Expertise in leveraging SQL, programming languages like Python, and ETL tools like Databricks.
• Perform code reviews to ensure that requirements are met, execution patterns are optimal, and established standards are followed.
• Computer Science or equivalent.
• Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), and AWS Data Integration (Glue).
• Advanced understanding of container orchestration services, including Docker and Kubernetes, and a variety of AWS tools and services.
• Good understanding of AWS Identity and Access Management, AWS Networking, and AWS Monitoring tools.
• Proficiency in CI/CD and deployment automation using GitLab pipelines.
• Proficiency in cloud infrastructure provisioning tools, e.g., Terraform.
• Proficiency in one or more programming languages, e.g., Python, Scala.
• Experience in Starburst, Trino, and building SQL queries in a federated architecture.
• Good knowledge of Lakehouse architecture.
• Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala).
• Build data ingestion workflows from various sources (structured, semi-structured, and unstructured).
• Develop reusable components and frameworks for efficient data processing.
• Implement best practices for data quality, validation, and governance.
• Collaborate with data architects, analysts, and business stakeholders to understand data requirements.
• Tune Spark jobs for performance and scalability in a cloud-based environment.
• Maintain robust data lake or Lakehouse architecture.
• Ensure high availability, security, and integrity of data pipelines and platforms.
• Support troubleshooting, debugging, and performance optimization in production workloads.

If you are a highly motivated and skilled Data Catalog Developer looking for a challenging and rewarding opportunity, we encourage you to apply for this position. Join our dynamic team at Direct Recruit Agency and be a part of our mission to provide top-notch data solutions to our clients.