Senior Data Engineer - Mandarin Speaker
2 days ago
Santa Cruz de Tenerife

About INFINNI

INFINNI is the infrastructure behind the creator economy. We're already the leading CRM in our category, with thousands of creators and creator businesses running their operations on our platform. Now we're building the next layer of products: AI-powered tools, big-data capabilities, and infrastructure purpose-built for creators. We're a global team of 180+ builders at the rare inflection point where a category leader becomes a category-defining platform. The decisions we make in the next 18 months will shape the industry.

About the Role: Senior Data Engineer

We're looking for a hands-on data engineer who owns the full data lifecycle (ingestion, streaming, curation, quality, and delivery) and can communicate clearly with the product and business teams around them. This is a builder role. You'll extend and productionise our existing data platform alongside our Wuhan engineering team, turning raw creator data into reliable pipelines, AI-powered products, and sellable data assets. The strategy of what to build and what it costs is already owned; what we need is someone who executes it with precision.

A critical part of this role is acting as the technical bridge between our Wuhan data and development team and our governance and strategy team: translating requirements, reviews, and documentation fluently across Chinese and English. Full professional fluency in both languages is a hard requirement, not a bonus.

What You'll Own

Data Platform & Pipeline Engineering
• Extend, maintain, and optimise ELT/ETL pipelines feeding our analytics stack: primarily Apache Doris, with ClickHouse and Elasticsearch, and raw data on an S3 data lake (Iceberg / Paimon).
• Develop and maintain modular, tested, and documented dbt models across staging, intermediate, and mart layers.
• Orchestrate and automate data workflows using DolphinScheduler (or equivalent scheduler).
• Own CI/CD practices for data pipelines: PR reviews, environment validation (DEV / QA / PROD), and production releases.

Streaming & Change-Data-Capture
• Build and maintain real-time data flows using Flink CDC / StreamPark and Kafka to propagate changes from application databases into the analytics layer the moment they happen.
• Ensure streaming pipelines are reliable, low-latency, and aligned with downstream consumption patterns.

Data Mining & Curation
• Work across structured and semi-structured creator data sources to extract, enrich, and curate high-quality datasets.
• Design dimensional models that support analytical and product use cases.
• Identify and resolve data inconsistencies before they reach downstream consumers or creator-facing products.

Data Quality & Governance
• Build and maintain data quality frameworks covering completeness, validity, uniqueness, and referential integrity.
• Define and enforce SLAs for data freshness, reliability, and pipeline health; own alerting and incident response.
• Implement governance controls inside pipelines (data-quality tests, PII masking, and access rules), working under the data-governance lead.
• Apply privacy techniques for shared and sold data (pseudonymisation, k-anonymity, and de-identification) so we can serve market benchmarks without exposing individual creators or fans.

ML Models & AI Products
• Deploy, serve, and monitor models in production in partnership with our Wuhan AI/data-science team: you own the pipelines, deployment, and reliability; they lead model research and algorithm design.
• Integrate model outputs into our paid intelligence features (churn prediction, pricing suggestions, affiliate matching) and maintain retraining and quality-monitoring loops.
• Work with the AI team to bring NLP/LLM capabilities (the AI chat assistant and smart translation) into production reliably.

AI Content Labelling: Pipeline & Integration
• Build and operate the pipelines and human-in-the-loop review workflow that turn the AI team's image/text labelling models into clean, production content tags.
• Manage the tag taxonomy, embeddings store, and quality checks so content can be packaged and sold by theme and category (labelling the content itself, not personal attributes of creators or fans).
• Partner with the Wuhan AI team on model selection and evaluation; you own data flow, throughput, cost, and reliability rather than the model internals.

Data as a Product
• Build the serving layer for external data products: APIs, data feeds, and a metrics layer that customers pay for on a recurring basis.
• Define and maintain data contracts: written specifications of what each feed contains, its freshness guarantees, and its schema.
• Design pipelines with cost awareness (FinOps): keep the running cost of each dataset below the revenue it generates.

BI & Reporting
• Build and maintain dashboards in Superset and Grafana for internal teams and client-facing reporting.
• Translate raw data into clear, actionable charts that leadership and clients actually use.

Technical Bridge
• Serve as the primary connector between the Wuhan data and development team and the governance and strategy team: translating requirements, technical reviews, and documentation across Chinese and English.
• Ensure both sides have a clear, shared understanding of what is being built, why, and when.

Who You Are

Experience
• 4+ years of hands-on experience in data engineering or analytics engineering.
• Strong proficiency in dbt (Core or Cloud); you write real models, not just run them.
• Advanced SQL and working Python for data engineering tasks.
• Hands-on experience with OLAP / analytics databases: Apache Doris or ClickHouse preferred; Elasticsearch a plus.
• Familiarity with S3-based data lake formats (Iceberg or Paimon).
• Experience with streaming and CDC tooling: Flink CDC, StreamPark, Kafka, or close equivalents.
• Experience with a pipeline scheduler: DolphinScheduler, Airflow, or similar.
• Git-based workflow and CI/CD applied to data pipelines.
• Experience building or deploying ML models in production (MLOps, feature stores, model monitoring).

Mindset & Skills
• You think end-to-end: raw source → clean model → reliable product → measurable outcome.
• Strong ownership mindset: you define the problem, build the solution, and monitor the outcome.
• Clear communicator, able to explain technical trade-offs to product and business stakeholders in both languages.
• Comfortable working on a live, evolving platform alongside a distributed team, not expecting a blank page.
• Experience in B2B SaaS, CRM, marketplace, or creator economy products is a strong advantage.

Languages
• Full professional fluency in English (written and spoken)
• Full professional fluency in Chinese / Mandarin (written and spoken)

These are non-negotiable requirements. The core function of this role depends on fluency in both languages.

Nice to Have
• Experience with creator economy platforms, influencer marketing, or creator CRM tools.
• Background in data observability tooling (Monte Carlo, Great Expectations, or similar).
• Familiarity with NLP / LLM fine-tuning or applied recommendation systems.
• Any product management experience (roadmapping, spec writing, discovery) is a genuine plus.

What We Offer

Competitive Compensation
Salary package and equity opportunity that reflect your technical seniority and cross-functional impact, including significant tax advantages under Spain's Beckham Law (Ley Beckham).

Relocation to the Canary Islands
This role is based in Tenerife, Canary Islands. Relocation is a requirement. We provide a comprehensive package covering flights and moving costs, temporary housing on arrival, NIE and residency paperwork support, and dedicated Beckham Law application assistance to significantly reduce your personal income tax rate in Spain.

Async-First Culture
We're built around async-first collaboration with strong real-time rituals and a global team that ships fast across multiple time zones.