Vice President of SRE
1 month ago
Own service reliability frameworks, including SLOs, SLIs, and error budgets, and embed them into engineering culture. Partner with Observability, Infrastructure, and Product teams to deliver end-to-end visibility across GPU clusters, fabrics, and services.