Senior Data Scientist – Entity Resolution
1 day ago
Chicago
We’re working with a fast-growing, data-driven SaaS company that’s building a large-scale platform used by enterprise organizations to make smarter, more strategic decisions across complex datasets. Due to their continued growth, they are hiring a Senior Data Scientist with a strong data science orientation to play a critical role in scaling and modernizing their AI platform.
This role is weighted approximately 80% toward data science and 20% toward data engineering, which makes it ideal for someone with deep, hands-on experience building and training ML and NLP models who is equally comfortable operationalizing those models within production data pipelines. You will bring strong architectural thinking, thrive in complex environments, and enjoy mentoring others while collaborating across teams, geographies, and disciplines.
A central focus of this role is Entity Resolution, which is the process of identifying, linking, and merging records across disparate data sources that refer to the same real-world entity. This involves resolving inconsistencies, handling missing data, and eliminating duplicates to create a single, accurate, and trustworthy supplier profile, often referred to as a “golden record” or 360-degree view. Their current systems leverage Lucene-based search and XGBoost ML models, and they are exploring the use of LLMs to further enhance these capabilities. The ideal candidate will improve and reimagine their existing legacy entity resolution systems, bringing experience with ML-based approaches to matching and deduplication at scale.
As a Senior Data Scientist, you will drive, shape, and execute their long-term data and data science strategy, design resilient and scalable data architectures, and champion technical excellence across their data ecosystem. You will work closely with the Product and Engineering teams to ensure their data systems support business growth, advance their matching capabilities, and enable data-driven decision-making.
This client is investing heavily in cloud-native technologies. This role will be instrumental in leveraging modern data services and ML capabilities, optimizing cost, and ensuring their data platform is secure, reliable, and scalable. This is a full-time, permanent, remote position.
Responsibilities:
• Design, build, and iterate on ML-based entity resolution systems that match, link, and deduplicate records across disparate data sources to produce trusted golden records.
• Build, train, and refine NLP and ML models (e.g., XGBoost, search ranking models) for matching, classification, and data enrichment, with a focus on improving accuracy and recall.
• Evaluate and integrate emerging approaches, including LLMs, into entity resolution and data intelligence workflows.
• Own the full ML model lifecycle: feature engineering, training, evaluation, monitoring, feedback loops, and iterative tuning in partnership with data engineering and product teams.
• Translate model results into business impact and clearly communicate tradeoffs, performance metrics, and recommendations to non-technical stakeholders.
• Build and maintain data products end-to-end, operationalize them within production data pipelines, and ensure they deliver reliable, scalable results.
• Execute and influence a cohesive data strategy that aligns with company objectives and supports analytics, reporting, and downstream product use cases.
• Own complex data modeling initiatives, including dimensional and analytical models that support business intelligence and advanced analytics.
• Drive continuous improvement by optimizing data pipelines, query performance, reliability, observability, and cost efficiency.
• Partner with Infrastructure, Product, and Engineering teams to ensure data systems meet best practices, security standards, and business needs.
• Create and maintain comprehensive technical documentation, including architecture diagrams, data flow maps, runbooks, and operations procedures.
• Troubleshoot and resolve complex, cross-system data issues and incidents.
Must-Have Skills:
• Bachelor’s degree in Data Science, Computer Science, Machine Learning, Statistics, Engineering, or a related field.
• 5+ years of progressive experience in data science and/or data engineering, with demonstrated ownership of ML-based systems in production environments.
• Hands-on experience building NLP and LLM-based models in Python for real-world data science applications.
• Strong understanding of ML model lifecycle considerations, including evaluation, monitoring, feedback loops, and iterative tuning in partnership with data engineering and product teams.
• Strong ability to translate model results into business impact and communicate tradeoffs to non-technical stakeholders.
• Direct experience building or significantly improving entity resolution or search ranking systems, including ML-based approaches to record matching, linking, and deduplication at scale.
• Proficiency with ML frameworks and tools such as XGBoost, scikit-learn, PyTorch, or TensorFlow, and familiarity with search technologies such as Lucene/Elasticsearch.
• Demonstrated ability to build and maintain data products end-to-end by operationalizing models within production data pipelines, not solely tuning them.
• Advanced proficiency with Python and SQL for both data science and data engineering workflows.
• Experience with Snowflake and cloud-native data platforms (Azure, AWS, GCP, or multi-cloud environments).
• Familiarity with data modeling, ETL/ELT processes, and modern data warehousing principles.
• Experience working in an agile development environment and collaborating through tools such as Jira and GitHub.
• Ability to communicate technical concepts clearly to technical and non-technical teams and influence decision-making.
• Strong problem-solving skills with the ability to troubleshoot and resolve ambiguous, high-impact issues.
• A results-oriented mindset with a demonstrated history of driving process improvements and technical excellence.
• Ability to work independently while also serving as a trusted technical partner and mentor to others.
• Ability to take vague requirements and turn them into technical roadmaps.
Benefits:
• Comprehensive health benefits (medical, dental, vision)
• Remote work environment
• 401k match
• Unlimited PTO