Senior DevOps / Platform Engineer
1 day ago
Chazo
About Us

We build and operate a fully automated Speech Analytics SaaS platform running on Kubernetes across AWS and GCP. Our infrastructure processes ~160,000 hours of audio monthly with a 99%+ uptime SLA, serving enterprise customers with mission-critical analytics needs.

Our platform is built on modern, cloud-native technology: Kubernetes, the Argo ecosystem, MongoDB, ElasticSearch, and 100% Terraform-driven Infrastructure as Code. We auto-scale from dozens to over 1,000 Kubernetes nodes based on demand.

Beyond our core SaaS product, we deliver managed solutions (Autopilot and Copilot platforms) and build AI-based services packaged as containerized, Terraform-ready modules for seamless integration into customer cloud environments (AWS, GCP, Azure).

We're a team that values strong engineering practices, an automation-first mindset, and operational excellence.

🚀 About the Role

We're looking for a Senior DevOps / Platform Engineer to help design, automate, and operate our cloud-native platform. You'll work across AWS and GCP, manage Kubernetes at scale, implement highly automated CI/CD workflows, and collaborate with engineering teams to ensure reliable delivery of SaaS features and AI-driven products.
What makes this role unique:
• Real ownership and autonomy - you'll be a key technical decision-maker
• Work directly with leadership on platform strategy
• Hands-on with cutting-edge cloud-native and AI/ML workloads

Location: Fully remote (Spain-based)

🛠 Key Responsibilities

Infrastructure & Cloud
• Design, build, and maintain multi-cloud infrastructure on AWS and GCP
• Operate and optimize Kubernetes clusters (GKE, EKS) at scale (up to ~1K nodes)
• Lead infrastructure modernization and cloud migration initiatives
• Manage Argo Workflows and ArgoCD for GitOps-based deployments
• Build and maintain end-to-end Infrastructure as Code with Terraform (modularized, reusable, multi-cloud)
• Develop internal automation tooling and scripts (Python, Bash, Go)
• Deploy and manage production MongoDB, ElasticSearch, and other core services
• Package and deploy workloads using Helm, Docker, and GitOps pipelines
• Ensure a 99%+ uptime SLA through robust monitoring and incident response
• Build comprehensive observability across all platform components
• Implement security best practices and compliance requirements
• Drive post-incident reviews and continuous improvement

✅ Requirements

Must Have
• 5+ years as a DevOps, SRE, or Platform Engineer in production environments
• Strong hands-on Kubernetes experience (GKE and/or EKS) managing clusters at scale
• Expert-level Terraform and Infrastructure as Code workflows
• Multi-cloud experience with both AWS and GCP
• Proven experience with CI/CD, GitOps, ArgoCD, and Argo Workflows
• Solid Docker and Helm expertise for containerized deployments
• Strong scripting/programming skills in Python and Bash
• Experience running production-grade, scalable, and secure cloud systems
• Programming for tooling development (Python, Bash, Go, …)
• Experience with observability stacks (Prometheus, Grafana, Elastic, OpenTelemetry)
• Hands-on with AI/ML workloads in containerized environments
• MongoDB and ElasticSearch operations at scale
• Experience with cost optimization strategies in cloud environments
• Contributions to open-source DevOps/platform projects

Compensation & Benefits
• Competitive salary package
• Fully remote work with flexible hours
• Real ownership - your decisions shape the platform's future
• Work directly with leadership on technical strategy
• Continuous learning with modern cloud-native, DevOps, and AI tooling
• Opportunity to mentor and grow the team as we scale
• Engineering-driven culture that values automation and best practices
• Async-first communication (we respect work-life balance)
• Blameless post-mortems and learning from incidents

• Initial call (30 min)
• Technical interview (60 min)

🚀 How to Apply

Apply here or send your CV and a brief note about what excites you about this role to