Site Reliability Engineer Lead
20 days ago
Conshohocken
Job Description

About Us

At Finance of America, we help homeowners unlock the joy that comes from realizing the full potential of their retirement. Many people have significant wealth tied up in their homes and want to use it meaningfully in their next chapter. Our unique range of reverse mortgages allows homeowners 55+ to access that wealth while maintaining control over their home and financial future. With options tailored to their unique goals, we provide the financial flexibility they need to move forward with confidence.

Finance of America is guided by five values: We are customer obsessed; they are why we exist. We raise the bar. We take extreme ownership. We practice genuine collaboration. And we unleash our excellence. Together we are actualizing our vision to be the most beloved brand for homeowners in their next chapter. To learn more about us, visit

Purpose of Role

Responsible for establishing and implementing SRE best practices across our cloud-native infrastructure and DevOps CI/CD pipeline to improve the resilience, performance, and scalability of our systems while driving operational excellence and enhancing incident response capabilities. Works closely with Engineering, DevOps, Security, and Compliance teams to develop robust monitoring, alerting, automation, and risk mitigation strategies using a modern toolset spanning AWS, Datadog, JSM (Jira Service Management), Bitbucket, GitHub, and more.

Key Responsibilities and Expectations

• Establishes and enhances monitoring processes to ensure high system reliability and availability.
• Designs and implements end-to-end observability metrics, logs, and traces using Datadog, New Relic, Elasticsearch, and native AWS tools.
• Develops dashboards and SLO/SLI metrics to monitor system health, application performance, and infrastructure availability.
• Identifies and resolves reliability and availability gaps through proactive monitoring and data-driven analysis.
• Develops and streamlines incident response strategies, including efficient root cause analysis to minimize downtime and prevent recurrence.
• Enhances and streamlines incident response workflows leveraging Jira Service Management (JSM).
• Establishes actionable runbooks and leads post-incident reviews to uncover systemic issues and track remediation.
• Implements effective alerting strategies to reduce noise and response time.
• Implements automation processes and scalability practices to improve system efficiency and agility.
• Architects and implements automation for infrastructure management, CI/CD pipelines, and repetitive operational tasks.
• Supports evaluation and onboarding of DevOps tools to optimize CI/CD processes.
• Improves system scalability through efficient provisioning, deployment, and orchestration tools (Octopus Deploy, TeamCity, Pantheon).
• Enhances system observability through comprehensive metrics collection and analysis to proactively address potential issues.
• Ensures systems adhere to compliance requirements and actively manages risks to maintain business continuity.
• Assesses and implements new technologies and processes to establish best practices in site reliability engineering.
• Defines and rolls out SRE practices, including reliability and error budgets.
• Develops and maintains reliability scorecards, operational health reports, and executive dashboards.
• Collaborates with security and compliance teams to ensure systems meet governance, risk, and regulatory requirements.
• Integrates monitoring and alerting tools with JSM workflows, working closely with the JSM development team, for automated ticketing and escalation paths.
• Works with version control and code repositories (Bitbucket, GitHub, Azure DevOps) to track reliability across builds and releases.
• Recommends new tools and methodologies to enhance system resilience, performance, and traceability.

Qualifications

• Minimum of 10 years in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
• Hands-on experience with the following tools and technologies, or similar SRE tools experience: Observability: Datadog, New Relic, Elasticsearch, AWS CloudWatch; Incident Management: Jira Service Management, ITSM practices; CI/CD Tools: TeamCity, Octopus Deploy, Bitbucket, GitHub, Azure DevOps; Infrastructure: AWS (EC2, S3, Lambda, ECS, IAM, CloudFormation or Terraform).
• Strong programming/scripting ability in one or more of: Python, Bash, PowerShell, Go.
• Experience building dashboards, KPIs, and reports for engineering and executive audiences.
• Excellent knowledge of SRE frameworks, including SLOs, SLIs, MTTR, error budgets, and fault tolerance.
• Strong interpersonal, verbal and written communication, and organizational skills.
• Flexibility to adjust to multiple demands, shifting priorities, and rapid change.
• Flexibility in scheduling with a willingness to work extra non-standard hours on occasion.
• Familiarity with compliance frameworks (SOC 2, ISO 27001, etc.) preferred.
• Experience working in environments where the operations and infrastructure behind websites (WebOps) are managed alongside content management platforms (e.g., Pantheon) and a strong focus is placed on site speed, reliability, and user experience preferred.
• Computer Science or related technical field.

The base salary range for this position is $140,000 - $175,000, inclusive of all geographical differences in the labor market. The base salary for the position will be determined based on factors such as the candidate’s work location, skills, education, and experience. In addition to those factors, we believe in the importance of pay equity and consider the internal equity of our current team members in determining any final offer. We offer a competitive benefits package including health, dental, vision, life insurance, paid time-off benefits, flexible spending account, 401(k) with employer match, and ESPP.
Additional Information

The application deadline for this job opportunity is 10/13/2025.

The above statements are intended to describe the general nature and level of work being performed by people assigned to this classification. They are not to be construed as an exhaustive list of all responsibilities, duties, and skills required of personnel so classified.

Finance of America is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, sex (including pregnancy), sexual orientation, religion, creed, age, national origin, physical or mental disability, gender identity and/or expression, marital status, veteran status or other characteristics protected by law.