Senior Data Engineer
2 days ago
New York
Job Description

Please note: This role is hybrid, with 3-4 days in office per week.

The Opportunity

We are hiring a Senior Data Engineer to build and operate the pipelines that turn real-world financial information into reliable, queryable data. You will process unstructured documents like tax codes and white papers, semi-structured documents like tax forms and insurance policies, and structured datasets like user information and asset values. Your work will support retrieval, knowledge graphs, agents, analytics, and machine learning. You will also own ingestion jobs, from APIs to scheduled downloads, and maintain multiple collections in a vector database.

As Monstro's first dedicated Data Engineer, you will shape the foundation of our data platform at a company where data is the product. We are a data-intensive fintech handling the most sensitive consumer and institutional financial data at global scale, which means your standards for consistency, security, and reliability will directly impact our ability to deliver trust and value to our users. You'll set the tone for how data is ingested, transformed, secured, and served across the company, with clear ownership and a seat at the table in defining the future of Monstro's architecture. If you like clear ownership, high standards, and shipping, this role is for you.

About Monstro

Monstro, headquartered in New York, is an innovative AI-enhanced fintech company redefining personal finance management through a global financial platform. Our platform offers individuals a comprehensive view of their entire financial, legal, and tax life, delivering actionable insights, tailored recommendations, and the ability to take direct action. For financial institutions, Monstro unlocks unparalleled access to personalized customer data and insights. Leveraging Monstro's data-driven approach allows institutions to maximize revenue, boost customer retention, and attract new clients while lowering acquisition costs.
By efficiently servicing clients across all wealth tiers and providing direct, actionable solutions, Monstro is transforming the financial services landscape. We are democratizing access to high-quality financial advice, enabling institutions to optimize operations, empower their clients, and achieve measurable growth in an evolving digital economy.

Responsibilities

* Document and data processing
  * Build and own scalable pipelines that parse and normalize unstructured sources for retrieval, knowledge graphs, and agents.
  * Conceive and implement novel methods for processing thousands of types of unstructured documents with accuracy and consistency.
  * Process semi-structured sources into consistent, validated schemas.
  * Transform structured datasets for analytics, features, and retrieval workloads.
* Vector database and retrieval
  * Create, version, and maintain multiple collections in a vector database.
  * Manage embeddings, metadata, and lifecycle, and tune chunking and filters for relevance and latency.
* Ingestion, orchestration, and infrastructure
  * Design and implement robust multi-modal document processing systems that handle heterogeneous file formats (PDFs, images, HTML, XML) with automatic schema inference, content extraction validation, and graceful degradation for malformed inputs, maintaining a 99.9% pipeline uptime SLA.
  * Own ingestion from APIs, file drops, partner feeds, and scheduled jobs with monitoring, retries, and alerting.
  * Implement data quality checks for schema, ranges, and nulls, and document lineage and SLAs.
  * Own and harden the infrastructure that supports ETL pipelines, storage systems, and serving services.
* Data platform engineering
  * Stand up and harden object, relational, document, and vector stores with the right indexing and partitioning.
  * Build reusable libraries and services for parsing, enrichment, and embedding generation.
* Security, compliance, and governance
  * Handle sensitive financial and personal data with access controls, auditing, and retention policies.
* Collaboration and impact
  * Partner with product and engineering to ship features that depend on reliable data.
  * Document standards, coach teammates, and contribute to future hiring.
* Bonus
  * Create and maintain Model Context Protocol endpoints or similar interfaces for controlled database access.

Qualifications

* Experience
  * Minimum 2 years in a dedicated Data Engineering role at an AI-native startup, or 4+ years of experience in traditional Data Engineering, with roughly 8 or more years of experience in tech overall.
  * Proven ownership of end-to-end pipelines (ingestion → transformation → serving), including scalable sourcing processes, ETL pipelines, and serving services.
  * Experience owning and operating infrastructure in production environments.
* Technical skills
  * Strong Python and SQL.
  * Hands-on document parsing and ETL across PDFs, HTML, JSON, and XML.
  * Experience operating vector databases such as pgvector, Pinecone, or Weaviate, with multiple collections.
  * Building and scheduling ingestion via APIs, web downloads, and cron or an orchestrator, plus cloud storage and queues.
  * Understanding of embeddings, chunking strategies, metadata design, and retrieval evaluation.
  * Solid data modeling, schema design, indexing, and performance tuning across storage types.
* Quality and security
  * History of implementing data quality checks, observability, and access controls for sensitive data.
  * Track record of delivering high-consistency systems for mission-critical data pipelines.
* Soft skills
  * Ownership mindset, clear written communication, and effective collaboration with product and engineering.

Why Monstro?

* Ownership and Impact: As our first Data Engineer, you'll own the foundation of Monstro's data platform and set the standards for how we ingest, transform, and serve data at scale.
* Data-First Mission: Join a company where data is the product, handling the most sensitive consumer and institutional financial information with precision and trust.
* Collaborate with Experts: Work closely with an accomplished executive team that has a proven history of scaling successful startups to significant exits.
* Scalability and Growth: Be part of a high-growth startup, with the chance to grow into a leadership position as the company scales.
* Mission-Driven: Join a company focused on democratizing access to financial management and education for individuals and institutions globally.