Site Reliability Engineer
3 days ago
Decatur
Job Description

Position Summary
We are seeking a highly experienced Site Reliability Engineer (SRE) to help design, operate, and continuously improve a highly available, secure, and cost-efficient multi-cloud platform. This role is deeply hands-on and execution-focused, with a strong bias toward action, measurable outcomes, and incremental progress. The ideal candidate has deep expertise in AWS and Azure, strong instincts for efficient architecture, and a passion for observability, automation, and cost optimization (FinOps).

You will work in a high-compliance environment, partnering closely with engineering, product, and operations teams to ensure platform health, reliability, and scalability. This role also requires leadership through influence (guiding teams, setting technical direction, and raising operational maturity) while remaining directly engaged in delivery.

Essential Duties and Responsibilities

Reliability & Platform Health
• Design, implement, and operate reliable, scalable, and resilient systems across AWS and Azure
• Establish and improve SLOs, SLIs, error budgets, and incident response practices
• Lead root cause analysis and drive corrective actions to prevent recurrence
• Continuously improve platform uptime, performance, and operational maturity

Observability & Operational Excellence
• Design and maintain best-in-class observability (metrics, logs, traces, alerting)
• Ensure alerts are actionable, with low noise and high signal
• Use data to identify reliability risks, performance bottlenecks, and efficiency opportunities
• Drive incremental improvements using clear goals and measurable outcomes

FinOps & Cost Optimization (Top Priority)
• Own and drive cloud cost optimization initiatives across AWS and Azure
• Partner with engineering and leadership to align cost with business value
• Implement cost visibility, forecasting, and accountability practices
• Identify architectural and operational improvements that reduce waste without sacrificing reliability or security

Security & Compliance
• Operate within highly regulated environments with strong security controls
• Support compliance efforts (SOC 2, CJIS preferred, NIST-aligned practices)
• Embed reliability and compliance requirements into platform design and operations
• Partner with security teams to ensure systems are secure by default

Automation & AI-Enabled Efficiency
• Automate operational workflows using Infrastructure as Code, CI/CD, and tooling
• Leverage AI tools for analysis, incident investigation, cost insights, capacity planning, and operational efficiency
• Continuously seek opportunities to move faster and smarter through automation and intelligent tooling

Leadership & Collaboration
• Act as a technical leader and trusted partner to engineering teams
• Guide and mentor others on reliability, observability, and cost-efficient design
• Influence architecture and operational decisions through data and collaboration
• Drive initiatives end-to-end with accountability and ownership

Required Qualifications
• Bachelor's degree in Computer Science or Engineering
• 5+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering
• Deep hands-on experience operating production systems in AWS and Azure
• Strong background in cloud architecture, reliability engineering, and automation
• Proven experience with observability platforms (metrics, logging, tracing, alerting)
• Demonstrated success driving cloud cost optimization / FinOps initiatives
• Experience working in high-compliance environments (SOC 2 required; CJIS a plus)
• Strong scripting or programming skills (e.g., Python, Go, Bash, or similar)
• Experience with Infrastructure as Code (e.g., Terraform, CloudFormation, ARM/Bicep)

Preferred Qualifications
• CJIS compliance experience
• Experience supporting SaaS platforms serving public sector or regulated customers
• Exposure to multi-region, high-availability architectures
• Experience implementing or maturing FinOps practices at scale
• Prior experience mentoring or leading cross-functional technical initiatives

What Success Looks Like
• Platform reliability and performance improve measurably over time
• Cloud costs are visible, controlled, and optimized without compromising outcomes
• Incidents are fewer, better managed, and result in lasting improvements
• Teams adopt better operational practices through your guidance and example
• Incremental progress toward clear goals is made consistently and transparently

Why This Role Matters
This role sits at the intersection of reliability, cost, security, and speed. You will have the opportunity to materially impact how the platform scales, how efficiently we operate, and how quickly we adapt, using modern cloud practices, strong engineering judgment, and AI-enabled insights.

Physical Demands and Work Environment
This role requires the employee to consistently maintain a stationary, upright position. The employee must be able to move frequently within an office environment to use office equipment and other resources. The employee must be able to communicate information and concepts clearly and effectively, including conveying precise details. To perform tasks accurately, the employee must have specific vision abilities, especially the ability to discern close-up details within a few feet. The role rarely requires carrying items weighing up to 15 pounds.

Note
This job description in no way states or implies that these are the only duties to be performed by the employee(s) incumbent in this position. Employees will be required to follow any other job-related instructions and to perform any other job-related duties requested by any person authorized to give instructions or assignments.
All duties and responsibilities are essential functions and requirements and are subject to possible modification to reasonably accommodate individuals with disabilities. To perform this job successfully, the incumbent must possess the skills, aptitudes, and abilities to perform each duty proficiently. Some requirements may exclude individuals who pose a direct threat or significant risk to the health or safety of themselves or others. The requirements listed in this document are the minimum levels of knowledge, skills, or abilities. This document does not create an employment contract, implied or otherwise, other than an “at-will” relationship. Coreforce is an Equal Opportunity Employer and a drug-free workplace, and complies with ADA regulations as applicable.

The companies in the COREFORCE organization are innovative technology leaders, delivering groundbreaking digital systems tailored for frontline professionals who rely on speed, accuracy, easy-to-access data, and transparency in their work.

COREFORCE is an equal-opportunity employer that promotes justice, advances equity, values diversity, and fosters inclusion. COREFORCE is committed to hiring the best talent, regardless of race, creed, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, genetic information, veteran status, or any other characteristic protected by applicable laws, regulations, and ordinances. If you have a disability or special need that requires assistance or accommodation, please email .