Senior DevOps Engineer
2 days ago
Chazo
Reglapp is an AI-first, multi-product platform built to manage taxes for businesses, guide you through residency applications, and connect you with trusted partners across Spain. We run everything on our own servers - no cloud for now.

We're hiring our first dedicated DevOps engineer to own this infrastructure end to end. You'll be the technical authority on everything that runs outside the application code: servers, networking, CI/CD, databases, observability, security, and the processes that keep all of it healthy.

The stack you'll inherit

- CRM: Frappe v16 + Vue 3 frontend, MariaDB, Redis, custom Python apps
- Billing API: FastAPI, Postgres, SQLAlchemy async, Stripe, Holded
- Portal: Next.js (standalone), Supabase, DocuSeal
- AI service: FastAPI, pgvector, OpenAI, Telegram bot
- Platform: Docker Compose, Traefik + Let's Encrypt, MinIO, GitLab CI/CD, GitLab Container Registry

What you'll own

Servers and networking
- Bare-metal / VPS fleet: provisioning, hardening, patching, capacity planning
- Firewalls, VPN/bastion access, key rotation
- Network design across services and environments

CI/CD and release engineering
- Unify and harden GitLab pipelines across all repositories
- Introduce immutable image tagging, environment promotion, manual gates for production
- Stand up a real staging environment separated from dev and prod
- Move toward zero-downtime deployments and automated rollbacks
- Decouple database migrations from application deploys

Infrastructure as Code
- Codify server configuration, Compose stacks, Traefik, environment files
- Ansible for configuration management; Terraform where it earns its keep (DNS, registrars, object storage)
- Repeatable, documented bootstrap of a new server from zero

Databases and storage
- Backup strategies for multiple Postgres instances (including pgvector), MariaDB, and MinIO
- Tested restore procedures with defined RTO/RPO
- Replication, monitoring, slow-query analysis, capacity forecasting
- Off-site copies and disaster-recovery drills

Observability
- Build out a logs + metrics + traces stack (Prometheus / Grafana / Loki / Tempo or equivalents)
- Wire up the existing OpenTelemetry instrumentation in Next.js and FastAPI services
- Alerting on business-critical signals (payment webhooks, queue depth, request latency)
- On-call rotation, runbooks, blameless postmortem culture

Security
- Centralized secret management (Vault / SOPS / age) and rotation policy
- Container security: image scanning, minimal bases, non-root, resource limits
- Operationalize GitLab SAST and Secret Detection - make findings actionable, not noise
- Network isolation between services and environments

Process
- Change management for migrations, feature flags, and gradual rollouts
- Definition of Ready / Done for infrastructure changes with the engineering team
- Architecture documentation, runbooks, and onboarding material

Requirements

- 5+ years in DevOps / SRE / Infrastructure Engineering with real production ownership
- Deep Linux administration: systemd, networking (iptables/nftables, routing), performance tuning, troubleshooting with strace / perf / bpftrace
- Docker and Docker Compose in production - not just development. Multi-stage builds, image optimization, runtime security, resource constraints
- GitLab CI/CD at production scale: pipeline-as-code, reusable templates, build-time optimization
- Reverse proxies and TLS: Traefik / Nginx / Caddy with automated Let's Encrypt
- Postgres in production: tuning, backups (pgBackRest / WAL-G / pg_basebackup), replication, incident investigation
- Infrastructure as Code: Ansible required; Terraform a strong plus
- Bash and Python for production-quality automation
- Networking fundamentals: DNS, CDN, TLS, understanding the path from client to container and back
- Observability: hands-on with Prometheus + Grafana + Loki (or equivalents) and OpenTelemetry
- Secret management: Vault / SOPS / age in production
- Experience leading incident response and writing blameless postmortems
- Comfortable working without a cloud provider - you know what cloud abstracts away and how to deliver the same capabilities on bare metal

Nice to have

- Frappe / ERPNext experience (bench, patches, background jobs)
- MariaDB in production
- pgvector and operating AI workloads (rate limiting, retries, batch APIs)
- Stripe or other payment integrations with a focus on webhook reliability
- MinIO / S3-compatible storage, lifecycle policies, replication
- Nomad / Docker Swarm / k3s - with a clear opinion on when each beats Compose
- Performance engineering across Node.js and Python runtimes
- Prior work as the first DevOps hire at a company - building processes from scratch rather than inheriting them