Vice President of SRE
1 month ago
Own service reliability frameworks, including SLOs, SLIs, and error budgets, embedding them into engineering culture. Partner with Observability, Infrastructure, and Product teams to deliver 360° visibility across GPU clusters, fabrics, and services.