Job Description

Job Title: Data Scientist – Machine Learning, Big Data, GenAI (8–10 Years Experience)
Location: Remote
Employment Type: Contract

About the Role

We are seeking a highly experienced Data Scientist with 8–10 years of expertise delivering production-grade AI/ML solutions at scale. This role requires deep technical proficiency in Machine Learning, Big Data, Generative AI, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG), combined with hands-on cloud experience (AWS, Azure, or GCP) and migration expertise for modernizing data and AI platforms. The ideal candidate can lead projects end-to-end, from architecture design to deployment, while mentoring teams, optimizing for performance and cost, and ensuring alignment with business objectives.

Key Responsibilities

• Design, develop, and deliver end-to-end ML/AI solutions in cloud-native environments, from design through deployment and monitoring.
• Architect and implement Generative AI solutions leveraging LLMs (e.g., GPT, LLaMA, Claude, Mistral) and RAG pipelines with vector search.
• Build and optimize Big Data pipelines using Apache Spark, PySpark, and Delta Lake, integrated with cloud storage (AWS S3, Azure Data Lake, GCP Cloud Storage).
• Design and maintain data lakehouse architectures with Databricks, Snowflake, or Delta Lake.
• Deploy scalable MLOps pipelines using MLflow, SageMaker, Azure ML, or Vertex AI with Docker, Kubernetes (EKS, AKS, GKE), and CI/CD.
• Implement and manage vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB) for RAG applications.
• Oversee ETL/ELT workflows and pipeline orchestration using Airflow, dbt, or Azure Data Factory.
• Lead migration projects (on-prem to cloud, cross-cloud, or legacy platform upgrades such as Hadoop to Databricks or Hive to Delta Lake), ensuring data integrity and minimal downtime.
• Integrate streaming data solutions using Apache Kafka and real-time analytics frameworks.
• Conduct feature engineering, hyperparameter tuning, and model optimization for performance and scalability.
• Mentor junior data scientists and guide best practices for AI/ML development and deployment.

Qualifications

• 8–10 years in data science, machine learning, and AI/ML solution delivery.
• Strong hands-on expertise in at least one major cloud platform (AWS, Azure, or GCP), with proven production deployments.
• Proficiency in Python, PySpark, and SQL.
• Proven experience with Apache Spark, the Hadoop ecosystem, and Big Data processing.
• Hands-on experience with Generative AI, Hugging Face Transformers, LangChain, or LlamaIndex.
• Expertise in RAG architectures and vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB).
• Experience with MLOps workflows using MLflow, Docker, Kubernetes, and CI/CD tools (Jenkins, GitHub Actions, GitLab CI).
• Migration experience moving AI/ML workloads, big data pipelines, and data platforms to modern cloud-based architectures.
• Knowledge of data services (AWS S3, Redshift; Azure Synapse; GCP BigQuery) and infrastructure-as-code (Terraform, CloudFormation, ARM templates).
• Familiarity with streaming technologies (Kafka) and query engines (Hive, Presto, Trino).
• Experience with knowledge graphs and semantic search.
• Background in NLP, transformer architectures, and deep learning frameworks (TensorFlow, PyTorch).
• Exposure to BI tools (Power BI, Tableau, Looker).
• Domain expertise in finance, healthcare, or e-commerce.