Director of IT Infrastructure and Engineering
Charlotte
Job Description
Director of Platform Engineering & Operations
Location: Charlotte, NC (Onsite)
Reporting to: Chief Technology Officer (CTO)
South End Charlotte, Software company. In Office

Role Summary
The Director of Platform Engineering & Operations is responsible for COMPANY’s entire technology platform, overseeing both customer-facing systems and internal infrastructure to ensure 24x7 availability, security, and scalability across Azure cloud and on-premises environments. This hands-on leadership role balances technical execution and strategic management, including building a high-performing team, driving operational excellence, implementing security controls, and supporting the company’s rapid growth.

Key Responsibilities

Leadership & Strategy
· Build, mentor, and retain a team of 8 engineers across infrastructure/network, DevOps/SRE, and desktop/end-user support, providing technical coaching, career development, and performance management.
· Take full ownership of platform strategy, roadmap, and execution aligned with business objectives, product needs, and customer SLAs.
· Define and track operational KPIs (availability, MTTR, change success rate, incident volume, cloud cost efficiency) and present regular updates to the CTO and executive team.
· Establish an operational cadence: incident reviews, change advisory board, service desk metrics, team retrospectives, and a culture of continuous improvement.

Platform Operations & Architecture
· Own the design, implementation, and 24x7 operation of COMPANY’s hybrid infrastructure (Azure + on-premises) supporting both production and internal corporate systems.
· Ensure high availability, scalability, performance, security, and cost efficiency across all environments.
· Provide hands-on architecture and implementation of cloud infrastructure, networking, identity management (Azure AD/Entra, RBAC), storage, backup, monitoring, and observability.
· Drive cloud optimization initiatives: rightsizing, reserved capacity, architectural improvements, and cost governance across Azure workloads.
· Define and enforce platform standards for networking, security, identity, logging, alerting, and operational discipline.

DevOps & Site Reliability Engineering
· Lead DevOps and SRE transformation: implement CI/CD pipelines, Infrastructure as Code (Terraform, ARM/Bicep), containerization (Kubernetes), and modern deployment practices.
· Hands-on implementation of Kubernetes clusters, container orchestration, service mesh, and cloud-native architecture patterns.
· Establish SRE principles: error budgets, SLOs/SLIs, blameless postmortems, observability (metrics/logs/traces), and a reliability engineering culture.
· Build and optimize CI/CD tooling and workflows to improve release velocity, reduce deployment risk, and increase developer productivity.
· Implement robust change management processes (risk assessment, testing, communication, rollback procedures) that balance speed, safety, and audit readiness.

Information Security & Compliance
· Implement security and compliance controls, including access management, logging and monitoring, vulnerability management, incident response, and audit evidence collection.
· Establish security best practices across infrastructure: network segmentation, firewall rules, encryption (data at rest/in transit), secrets management, and privileged access management.
· Lead incident response for infrastructure and platform issues, including root cause analysis, remediation, and process improvements.
· Own Disaster Recovery strategy and execution: define RPO/RTO targets, architect multi-region and hybrid DR solutions, develop runbooks, and conduct regular DR testing.
· Ensure backup and restore capabilities across all critical systems with documented procedures and validated recovery processes.

Desktop & End-User Support
· Oversee desktop, endpoint, and telecom services (laptops, mobile devices, productivity tools, collaboration platforms, voice/conferencing) to deliver reliable, secure employee experiences.
· Implement IT service management practices (incident, request, problem, asset management) with clear SLAs and user satisfaction metrics.
· Manage vendor relationships across infrastructure, telecom, SaaS, and managed services: evaluate contracts, optimize licensing, and ensure service quality.

Required Qualifications
· 10+ years of progressive experience in IT infrastructure and operations, with at least 3–5 years in a leadership role managing teams delivering hybrid cloud environments.
· Deep expertise with Microsoft Azure, including compute (VMs, App Services, Functions), networking (VNets, NSGs, load balancers), identity (Azure AD/Entra, RBAC), security, monitoring, and cost management.
· Proven track record architecting and operating highly available, mission-critical systems supporting 24x7 customer-facing platforms at enterprise scale.
· Strong background in security and compliance, with experience implementing controls.
· Demonstrated leadership of DevOps/SRE teams with hands-on experience building CI/CD pipelines, managing Kubernetes clusters, implementing Infrastructure as Code (Terraform, ARM/Bicep), and operating observability platforms.
· Solid understanding and ownership of change management processes (ITIL or similar), including change advisory boards, risk assessment, and audit-ready documentation.
· Hands-on experience designing and executing Disaster Recovery strategies in cloud and data center environments, including DR testing and runbook development.
· Experience overseeing desktop/end-user support and telecom services in a growing, distributed organization.
· Proven ability to recruit, develop, and retain high-performing technical teams with a coaching-oriented leadership style.
· Excellent communication and stakeholder management skills, including the ability to translate technical complexity into business impact for executive and non-technical audiences.
· Thrives in fast-paced, dynamic environments with rapidly changing priorities and ambiguity.
· Strong ownership mentality: you take accountability for outcomes, drive issues to resolution, and lead by example.

Preferred Skills
· Experience in B2B SaaS, telematics, fleet management, IoT, or other real-time, data-intensive platforms serving enterprise customers.
· Familiarity with ITSM tools (Jira Service Management, ServiceNow), configuration management databases (CMDB), and IT asset management practices.
· Experience with observability and monitoring platforms (Datadog, New Relic, Prometheus/Grafana, Azure Monitor, Application Insights).
· Background supporting real-time GPS tracking, vehicle telematics, or IoT device management platforms.
· Relevant certifications: Microsoft Certified: Azure Solutions Architect Expert, Azure Administrator Associate, CISSP, CISM, ITIL Foundation or higher.
· Experience scaling infrastructure to support rapid business growth (2x–3x revenue in 2–3 years).
· Prior experience operating in regulated or compliance-driven environments (SOC 2, ISO 27001, HIPAA, FedRAMP).
· Hands-on experience with Azure Kubernetes Service (AKS), Azure DevOps, GitHub Actions, or similar CI/CD platforms.
· Understanding of fleet management industry compliance requirements (FMCSA, ELD mandates, hours-of-service regulations).