Senior Infrastructure Monitoring Engineer
19 hours ago
New York
Job Description
Job Title: Senior Monitoring, Alerting, and Observability Engineer
Skill Level: Specialist Level II
Work Schedule: 35 hours/week (Mon–Fri), with on-call/after-hours support as needed
Location: Primary office location, with occasional travel to multiple organizational sites as required
Pay Rate: $33–$35.50/hr. This is a W2 temporary position expected to last one year, starting May 25.

Position Overview
The organization is seeking a Senior Monitoring, Alerting, and Observability Engineer to support the stabilization, enhancement, and modernization of its enterprise monitoring and alerting ecosystem. This hands-on technical role is responsible for designing, implementing, and continuously improving end-to-end observability across network, server, virtualization, storage, backup, cloud, and application services. The engineer will play a key role in advancing monitoring capabilities by evolving current tools (including SolarWinds) and evaluating modern observability platforms such as Datadog, Splunk Observability, and Dynatrace. The position emphasizes actionable alerting, operational automation, and AIOps-driven analytics to improve incident response, system reliability, and security posture. The role collaborates closely with infrastructure, security, and application teams, as well as distributed stakeholders, to ensure high service availability, operational efficiency, and alignment with institutional standards, including CIS Benchmarks.
Key Responsibilities
• Support and enhance existing SolarWinds alerting systems, including integrations, upgrades, and feature enablement
• Design and implement a comprehensive observability strategy spanning metrics, logs, traces, and events across on-premises and cloud environments
• Lead the evaluation, selection, and implementation of enterprise observability platforms (e.g., Datadog, Splunk, Dynatrace, SolarWinds)
• Develop and maintain telemetry standards, tagging conventions, and service dependency mappings
• Implement effective alerting strategies that reduce noise and improve mean time to detect (MTTD) and mean time to resolve (MTTR)
• Leverage AIOps capabilities such as anomaly detection, dynamic baselining, and event correlation
• Build automation solutions using scripting (PowerShell, Python) and APIs to streamline operations and enable self-service capabilities
• Integrate monitoring systems with IT service management (ITSM) workflows for ticketing, routing, and escalation
• Establish observability-as-code practices for consistent deployment and governance
• Collaborate with infrastructure and security teams to align monitoring practices with CIS Benchmarks and organizational standards
• Develop dashboards and reporting for operational insights, performance, risk posture, and compliance
• Support incident response, post-incident reviews, and continuous improvement initiatives
• Provide hands-on administration of network management platforms, including enterprise switching and data center environments
• Maintain configuration management practices (backups, version control, drift detection, standardized deployments)
• Integrate network telemetry (SNMP, syslog, NetFlow/IPFIX, streaming telemetry) into observability platforms
• Ensure platform reliability through capacity planning, high-availability design, and lifecycle management (patching, upgrades, maintenance)
• Produce technical documentation, runbooks, and training materials for operational teams

Required Qualifications
• Proven experience evaluating and implementing enterprise observability platforms (Datadog, Splunk, Dynatrace, SolarWinds, or similar)
• Strong understanding of telemetry concepts: logs, metrics, traces, event correlation, and time-series data
• Experience with infrastructure and network monitoring (SNMP, syslog, WMI, APIs, agent-based collection)
• Hands-on experience monitoring cloud environments (Azure, AWS, or GCP)
• Familiarity with ITSM integration, workflow automation, and incident management processes
• Experience developing technical documentation, standards, and runbooks
• Knowledge of AIOps concepts, including anomaly detection and predictive analytics
• Experience with service mapping, dependency analysis, and SLIs/SLOs
• Understanding of centralized logging, data retention, indexing, and cost optimization
• Experience monitoring Windows/Linux systems, virtualization platforms (VMware/Hyper-V), storage, and backup systems
• Knowledge of secure monitoring practices aligned with CIS Benchmarks (RBAC, encryption, secrets management)
• Scripting and automation skills (PowerShell, Python); experience with configuration management and infrastructure-as-code tools such as Ansible or Terraform preferred
• Strong analytical and problem-solving skills, with the ability to identify root causes and implement durable solutions
• Excellent communication skills for both technical and non-technical audiences
• Ability to manage competing priorities and work independently while collaborating across teams

Preferred Qualifications
• Relevant certifications (e.g., ITIL Foundation, Security+, cloud certifications, observability platform certifications)
• Experience with enterprise network operations tooling (e.g., authentication systems, certificate lifecycle, configuration automation)
• Familiarity with large-scale, distributed IT environments

Hardware: Organization-issued laptop, multi-factor authentication devices, console/out-of-band access tools

Additional Requirements
• Compliance with organizational security policies, including MFA and least-privilege access
• Background check and confidentiality adherence as required
• Provide training and knowledge transfer on monitoring standards, alerting practices, and operational runbooks
• Maintain comprehensive documentation and support structured knowledge transfer
• Occasional travel to organizational sites and participation in on-call rotations

Integrated Staffing values a diverse, inclusive workforce and provides equal employment opportunity for all applicants and employees. All qualified applicants for employment will be considered without regard to an individual’s race, color, sex, gender identity, gender expression, religion, age, national origin or ancestry, citizenship, physical or mental disability, medical condition, family care status, marital status, domestic partner status, sexual orientation, genetic information, military or veteran status, or any other basis protected by federal, state, or local laws. Integrated Staffing will reasonably accommodate qualified individuals with disabilities to the extent required by applicable law. Staffing solutions that exceed expectations and build relationships.