Valencia
About Us
Staq is a leading Banking-as-a-Service (BaaS) and embedded finance platform, transforming the way businesses integrate banking and financial services. At Staq, we empower our clients to innovate, expand, and streamline their financial services offerings using our cutting-edge platform. Our mission is to bridge the gap between traditional banking and the digital era by providing seamless, scalable, and secure financial solutions.

The Role
We are building the intelligence layer that will power an AI financial assistant and serve as the SDK that other banking applications plug into. The long-term vision is an AI-native bank where every customer interaction, recommendation, and financial operation is orchestrated through this platform. That means the agent runtime, automation engine, recommendation systems, and tool execution framework must all be built as reusable, production-grade infrastructure, not as one-off features for a single product.

The objective is to build, harden, and ship the intelligence platform across multiple products simultaneously. You will build the systems that make AI actually work in finance: agents that reason about money, automations that run reliably on people’s financial data, recommendations that are genuinely useful, and tool execution that is safe and observable. This is systems engineering meets applied AI.

Key Responsibilities
Agent Runtime & Orchestration
• Build and maintain production AI agent flows using Python and LangGraph, including multi-step planning, tool selection, and context assembly.
• Author and evolve Agent Cards that define agent capabilities, context requirements, and output contracts for each product domain.
• Implement the agent-side integration with Temporal workflows: the AGENT_STEP and AGENT_LOOP activity interfaces that the Java orchestrator calls into.
• Own prompt engineering, template management, and context window optimization across all agent flows.
• Design and implement automation flows that go beyond conversational agents, such as scheduled financial health checks, proactive alerting, background data analysis, and event-driven triggers.
• Build reliable, deterministic automation pipelines that can execute multi-step financial operations with proper error handling, compensation logic, and human-in-the-loop escalation.
• Build and iterate on recommendation engines that surface personalized financial insights, product suggestions, and actionable next-best-actions to users.
• Design the data contracts and feature pipelines that feed recommendations, working with domain services for banking, credit, and subscription data.
• Own the integration with sandboxed execution environments (E2B) where agents run tools against real financial APIs and data sources.
• Implement and maintain MCP (Model Context Protocol) tool definitions, ensuring agents can safely invoke financial operations within policy-controlled boundaries.
• Build comprehensive test harnesses for agent behavior: deterministic scenario tests, regression suites, and evaluation benchmarks.
• Own the reliability engineering of the agent runtime: graceful degradation when LLMs misbehave, proper retry logic, timeout handling, and circuit breakers.
• Everything you build must be reusable.
Zeen is the first product, but the intelligence layer is an SDK: other banking applications will build on top of the same agent patterns, tool integrations, and automation frameworks.
• Maintain and evolve the shared contracts (Agent Cards, tool schemas, risk gate interfaces) that allow new products to onboard onto the platform with minimal custom work.

Tech Stack
• Python (primary), with integration touchpoints to Java microservices
• LangGraph for agent orchestration; Temporal Cloud (Java SDK) as the durable workflow engine
• OPA/Rego for policy enforcement across four risk gate stages (pre-LLM, post-LLM, pre-tool, post-tool)
• E2B sandboxed containers for tool execution; MCP as the tool protocol
• OpenTelemetry for observability; structured artifact logging
• LLM providers via a model-agnostic gateway abstraction
• Fintech domain: Plaid integrations; banking, credit, and subscription data

What We Are Looking For
Must Have
• 3+ years building production AI/ML systems (not just notebooks: deployed, monitored, and maintained)
• Strong Python fundamentals and experience with async patterns, error handling, and production-grade code
• Hands-on experience with LLM application development: prompt engineering, context engineering, tool/function calling, and structured outputs
• Experience building at least one of: recommendation systems, automation pipelines, or multi-step agent workflows
• Understanding of evaluation and testing for non-deterministic systems; you know that “it works on my prompt” is not a test strategy
• Comfort working with financial data where correctness and reliability matter more than speed of iteration

Strong Signals
• Experience with agent frameworks (LangGraph, LangChain, AutoGen, CrewAI) in production, not just prototypes
• Familiarity with memory systems for AI agents: short-term and long-term memory architectures, retrieval-augmented generation, and context window management strategies
• Experience with prompt management at scale: versioning, templating, A/B testing, and systematic prompt optimization workflows
• Familiarity with sandboxed code execution, MCP, or tool-use patterns for LLM agents
• Background in fintech, financial data, or regulated industries
• Experience with recommendation engines (collaborative filtering, content-based, or hybrid approaches)
• Familiarity with workflow orchestration systems (Temporal, Airflow, Prefect) and how AI fits into durable execution patterns
• Experience with LLM observability and performance tracking: call latency profiling, token usage monitoring, cost attribution, and tracing through multi-step agent flows

What This Role Is Not
This is not a pure ML research position. We are not training foundation models. You will be building application-layer AI systems on top of LLMs and integrating them into a financial services platform that real people depend on for real money. The challenge is in the systems engineering, reliability, and product thinking, not in publishing papers.