Senior Backend Engineer
New York
WHAT WE DO

FileScience is the business continuity layer for companies that run on cloud platforms. Organizations across legal, healthcare, accounting, and architecture rely on us to stay operational through platform outages, accidental deletions, ransomware, and migration windows. When ransomware encrypts an entire M365 tenant, when a departing employee's account is auto-deleted and takes six months of client work with it, when the incident hits at 4pm on a Friday: we're the reason work doesn't stop.

Under the hood, this is a data infrastructure problem at scale. You'd help build the ingestion engine and the recovery path that mirrors it: moving large volumes of customer data back into live tenants without dropping bytes, corrupting relationships, or violating tenant invariants. Some workloads run for minutes. Some run for weeks.

THE ROLE

FileScience is past the point where the founding team can keep scaling the platform alone. The core systems work. They got us here. Now they need to scale into a more mature, higher-throughput, more observable, more recoverable version of themselves.

We are hiring a senior backend engineer to partner directly with the founding engineers on the systems that matter most: ingestion, scheduling, recovery, connector architecture, observability, CI/CD, and the internal tooling that lets a small team move quickly without losing correctness.

This is a senior IC role with real architectural and product ownership. You will be expected to understand ambiguous production problems, shape the architecture, make tradeoffs, ship the implementation, and live with the consequences. You would be joining early enough to shape how the engineering team works, not just what it builds. The scope of the role grows with the company.

WHAT YOU'D WORK ON

These are real problems, in the codebase or in active design. You would not own all of this at once. The initial focus will depend on your background, but these are the systems the role can grow into.
• Connectors and resource modeling across cloud platforms. Our connectors sit on top of APIs we do not control. Pagination that silently truncates. Cursors that advance after partial failures. Provider schemas that change and invalidate yesterday's assumptions. The goal is not "call the API." The goal is to build a durable model of customer data across cloud systems that were not designed to be modeled cleanly, and keep that model correct over time.

• Multi-tenant scheduling under per-provider throttle constraints. Six cloud sources, each with its own throttle model: per-minute caps, daily budgets, concurrency limits, tenant-level quotas, undocumented backoff behavior, and provider-specific failure modes. Maximize throughput, stay fair across tenants, never breach a cap. The shape of the problem changes every time a provider ships an API update.

• Recovery into live tenants. Backup is half the system. Restoring data into a live tenant means dealing with active user changes, object collisions, missing parents, permission drift, renamed folders, deleted users, schema changes, and partial restores. The recovery path mirrors the ingestion problem, but with higher stakes: instead of observing state, you are changing it.

• Observability for week-long workloads. Standard distributed tracing was built for request-scoped operations. Our workloads run for days or weeks across multiple providers and tenants. Tracing, orchestration state, structured logs, and domain-specific correlation all have to work together to distinguish provider throttling from queue starvation from a poisoned resource from normal slowness.

• Verification and developer leverage. Six external dependencies ship breaking changes when they feel like it. We use agentic tools throughout the dev loop. Both raise the bar for verification. Provider simulators, contract tests at connector boundaries, replay suites built from production edge cases, staged gates from five-second pre-commit hooks to nightly runs.
The goal is not process. The goal is leverage: make the safe path the fast path.

• Internal tooling for a small team that ships above its weight. Deployment, local dev, test data generation, connector validation, replay harnesses, cloud fixtures, observability workflows, operational runbooks. If something is painful twice, it probably deserves automation. If a class of bug shows up repeatedly, it probably deserves a gate. We want someone who enjoys building the product and the machine that builds the product.

HOW WE WORK

We move quickly, but not casually. We like engineers who can read deeply, argue clearly, build pragmatically, and own the result. Some days that means designing a scheduler. Some days it means writing a migration. Some days it means digging three layers into a provider SDK to understand why a cursor advanced when it should not have. Some days it means deleting an abstraction that sounded good two months ago.

Lightweight planning, no process theater. No heavy PM layer converting vague reality into tidy tickets. You will work directly with the founding team, customers, production systems, and the code. The tradeoff is that you get real context and real ownership.

WHO WE'RE LOOKING FOR

Someone who learns by building. The kind of engineer who reads a paper or a book and then goes and implements the idea. Who gets excited about hard systems problems and wants a team where the work stays interesting.

This role assumes a few years of professional backend experience. We do not have a hard cutoff, but the work requires production-level judgment that usually comes from building and operating real systems.

You might be a fit if you have:

• Built backend or distributed systems where correctness mattered: data pipelines, recovery systems, workflow engines, schedulers, sync engines, observability systems, or infrastructure that customers depended on.

• Owned production code beyond the first release.
You have seen your own design survive traffic, incidents, migrations, weird edge cases, and the second or third version of the product.

• Shaped problems, not just implemented tickets. You can figure out what needs to be built, explain the tradeoffs, and bring other engineers with you.

• Worked deeply in Python at production scale.

• Built on AWS services such as ECS, Lambda, DynamoDB, SQS, CloudWatch, S3, or similar infrastructure.

• Designed around partial failure, retries, idempotency, backpressure, caching, concurrency limits, queueing, or distributed state.

• Debugged issues where the first obvious explanation was wrong.

• Read code below the abstraction layer when the abstraction starts leaking.

• Communicated clearly with a small team without needing heavy PM structure around you.

• Helped other engineers get better, whether through code review, architecture discussions, pairing, or just raising the bar on what "done" looks like.

• Gone unusually deep on at least one technical area because you genuinely wanted to understand it.

Especially welcome: non-traditional paths, unusually deep side projects, production scars, open-source work, infra/tooling obsession, or anything else that shows you can go from curiosity to working systems. We care more about depth, judgment, and ownership than pedigree.

LOGISTICS

This is an NYC-based role. We are hiring for in-person partnership with the founding team. That does not mean corporate facetime or sitting in an office because someone said so. It means working shoulder-to-shoulder often enough to build quickly, argue through systems design in real time, and keep the feedback loop tight. That might be a coworking space, a Brooklyn apartment, a coffee shop, or wherever the team is getting good work done that day. If you want a pure async remote role, this is not the right fit.

Compensation: $100,000-$150,000 base + equity participation + benefits. We know this is not big-company cash comp.
The tradeoff is early ownership, meaningful equity participation, and the chance to shape core systems while the company is still small. We will walk through the full comp picture in conversation.

BENEFITS

• 100% employer-paid medical, dental, and vision coverage with co-pay reimbursement.

• 2% interest rate reduction on your home mortgage.

• 10 days vacation, 5 personal days, 5 sick days. Additional time awarded with tenure.

Full benefits package at filescience.io/careers

HOW TO APPLY

Apply here. Optional: if there's a technical problem you've gone unusually deep on, we'd love to hear about it. Production bug, side project, paper you implemented, system you reverse-engineered, infrastructure problem you kept improving after everyone else moved on - anything that shows how you think when something catches your attention. Send it to .