IT Operations Lead - Incident Management
3 days ago
Buffalo
NO 3rd Parties or Sponsorship!

Role Title: IT Operations Lead - Incident Management
Duration: 12+ months
Location: Buffalo, NY preferred, or remote with travel

Role Description:
The IT Operations Lead is responsible for planning and orchestrating releases across environments, safeguarding production stability, and driving continuous improvement through disciplined Incident and Problem Management. This role partners closely with Test Management, Change Management/Comms, Governance, and Service Management to deliver reliable outcomes for strategic portfolios and regulatory commitments.

Key Responsibilities

Release Management
• Own the release calendar, scope, and readiness criteria across dev, test, UAT, and production environments.
• Chair Scoping and Go/No-Go Tollgate meetings; ensure controls, sign-offs, and rollback plans are in place.
• Coordinate deployments with engineering, QA, business SMEs, and Change Management; align with governance and risk requirements.
• Maintain deployment runbooks, environment plans, and dependency maps; drive automation and CI/CD best practices.
• Track release outcomes (defect leakage, change failure rate, MTTR, deployment frequency) and report to leadership.

Incident Management
• Lead major incident response (P1/P2): mobilize resolvers, manage comms, and restore service quickly.
• Operate the incident command process: war room facilitation, real-time decisioning, stakeholder updates, and post-restoration verification.
• Ensure high-quality incident records, accurate impact/time-to-recover metrics, and effective business communication.

Problem Management
• Drive root cause analysis (RCA) and corrective actions (CA) for recurring issues.
• Maintain the Known Error Database (KEDB) and trend analysis; proactively eliminate failure modes and reduce risk.
• Partner with engineering and testing to prioritize fix-forward items and embed learnings into release plans.

Governance, Controls & Compliance
• Align releases and service restoration activities with SOX/GLBA, auditability standards, and internal governance.
• Ensure adherence to Change Management policies, risk assessments, and production deployment controls.
• Provide quarterly control attestations and evidence for audits.

Stakeholder Engagement & Communication
• Serve as the single point of contact for portfolio leaders on release readiness and service stability.
• Draft executive-ready communications (pre-release advisories, outage notifications, post-incident reports).
• Build transparent, trust-based relationships with Finance, Operations, PMO, and vendor partners.

Requirements:
• Bachelor's degree in Information Systems, Computer Science, Engineering, or equivalent experience.
• ITIL 4 (Managing Professional or Strategic Leader).
• DevOps certifications (e.g., DASA, DevOps Institute) or SRE training.
• Project/program certifications (PMP, SAFe, Scrum) are a plus.

Key Performance Indicators (KPIs)
• Change Failure Rate, Change-Related Defects and Incidents.
• Time-to-Respond, Time-to-Restore, Recurring Incidents.
• % of releases with complete controls & evidence; audit findings remediated on time.
• Stakeholder satisfaction scores and communication effectiveness.

Tools & Technologies
• ITSM: ServiceNow, JIRA Service Management
• CI/CD: GitHub Actions
• Reporting: PowerBI
• Documentation & collaboration: Microsoft Teams, SharePoint, Confluence

What Success Looks Like (First 6-12 Months)
• A predictable, well-governed release cadence with clear readiness criteria.
• Faster restoration through a major incident playbook and trained responders.
• Fewer repeat incidents due to actionable RCA/CA and KEDB adoption.
• Automation of deployment and evidence capture for audit-ready releases.
• Clear, proactive communications that build stakeholder confidence.