Senior DevOps Engineer
16 hours ago
Valencia
Reglapp is an AI-first, multi-product platform that manages taxes for businesses, guides you through residency applications, and connects you with trusted partners across Spain. We run everything on our own servers - no cloud for now. We're hiring our first dedicated DevOps engineer to own this infrastructure end to end. You'll be the technical authority on everything that runs outside the application code: servers, networking, CI/CD, databases, observability, security, and the processes that keep all of it healthy.

The stack you'll inherit
• CRM - Frappe v16 + Vue 3 frontend, MariaDB, Redis, custom Python apps
• Billing API - FastAPI, Postgres, SQLAlchemy async, Stripe, Holded
• Portal - Next.js (standalone), Supabase, DocuSeal
• AI service - FastAPI, pgvector, OpenAI, Telegram bot

Servers and networking
• Bare-metal / VPS fleet: provisioning, hardening, patching, capacity planning
• Firewalls, VPN/bastion access, key rotation

• Unify and harden GitLab pipelines across all repositories
• Introduce immutable image tagging, environment promotion, and manual gates for production
• Stand up a real staging environment separated from dev and prod
• Move toward zero-downtime deployments and automated rollbacks

• Codify server configuration, Compose stacks, Traefik, and environment files
• Ansible for configuration management; Terraform where it earns its keep (DNS, registrars, object storage)

• Backup strategies for multiple Postgres instances (including pgvector), MariaDB, and MinIO
• Tested restore procedures with defined RTO/RPO
• Replication, monitoring, slow-query analysis, capacity forecasting

• Build out a logs + metrics + traces stack (Prometheus / Grafana / Loki / Tempo or equivalents)
• Wire up the existing OpenTelemetry instrumentation in Next.js and FastAPI services
• Alerting on business-critical signals (payment webhooks, queue depth, request latency)

• Centralized secret management (Vault / SOPS / age) and rotation policy
• Container security: image scanning, minimal bases, non-root, resource limits
• Operationalize GitLab SAST and Secret Detection - make findings actionable, not noise

• Change management for migrations, feature flags, and gradual rollouts
• Definition of Ready / Done for infrastructure changes with the engineering team

• 5+ years in DevOps / SRE / Infrastructure Engineering with real production ownership
• Deep Linux administration: systemd, networking (iptables/nftables, routing), performance tuning, troubleshooting with strace / perf / bpftrace
• Docker and Docker Compose in production - not just development. Multi-stage builds, image optimization, runtime security, resource constraints
• GitLab CI/CD at production scale: pipeline-as-code, reusable templates, build-time optimization
• Reverse proxies and TLS: Traefik / Nginx / Caddy with automated Let's Encrypt
• Postgres in production: tuning, backups (pgBackRest / WAL-G / pg_basebackup), replication, incident investigation
• Infrastructure as Code: Ansible required; Terraform a strong plus
• Bash and Python for production-quality automation
• Networking fundamentals: DNS, CDN, TLS, understanding the path from client to container and back
• Observability: hands-on with Prometheus + Grafana + Loki (or equivalents) and OpenTelemetry
• Secret management: Vault / SOPS / age in production
• Experience leading incident response and writing blameless postmortems

• Frappe / ERPNext experience (bench, patches, background jobs)
• MariaDB in production
• pgvector and operating AI workloads (rate limiting, retries, batch APIs)
• Stripe or other payment integrations with a focus on webhook reliability
• MinIO / S3-compatible storage, lifecycle policies, replication
• Nomad / Docker Swarm / k3s - with a clear opinion on when each beats Compose
• Performance engineering across Node.js and Python runtimes
• Prior work as the first DevOps hire at a company - building processes from scratch rather than inheriting them