Observability and AI Enterprise Architect
1 day ago
Edison
Role: Observability and AI Enterprise Architect
Location: Edison, NJ
Full-time Position
Travel: 50%

Job Description

Client Cloud Unit is looking for an experienced Observability and AI Enterprise Architect to design and implement enterprise-grade observability and AI solutions that provide deep visibility into infrastructure, applications, and networks and transform IT operations. This role requires expertise in leading observability platforms and hands-on experience in IT operations, combined with the ability to integrate AI-driven solutions for IT Operations (AIOps) using cutting-edge technologies such as LLMs, agentic frameworks, and industry-leading platforms like Anthropic, OpenAI, Bedrock, Gemini, and others.

• Design and deploy observability frameworks leveraging tools such as Grafana, Dynatrace, Prometheus, ELK, Splunk, etc. Define best practices for monitoring, alerting, and visualization across hybrid and multi-cloud environments.
• Develop strategies for monitoring KPIs tied to business outcomes (e.g., sales performance, supply chain efficiency, customer experience).
• Collaborate with business and IT teams to identify key metrics and integrate them into dashboards and alerting systems.
• Implement AIOps solutions using industry-leading platforms like OpenAI, AWS Bedrock, Google Gemini, Anthropic, and similar technologies.
• Develop predictive analytics and anomaly detection models to proactively identify and resolve operational issues.
• Integrate observability tools with ITSM platforms and automation workflows. Enable automated root cause analysis and remediation using AI/ML models.
• Provide observability strategies for infrastructure (servers, storage, cloud), applications (microservices, APIs), and networks (LAN/WAN, SD-WAN). Collaborate with DevOps, SRE, and IT operations teams to ensure end-to-end visibility and reliability.
• Establish observability standards, KPIs, and SLAs for performance and availability. Ensure compliance with security and regulatory requirements in monitoring solutions.
• Develop scalable architecture using LLMs, agentic frameworks, and multi-modal AI technologies.
• Build AI-powered analytics platforms for IT operations analysis, anomaly detection, and predictive insights.
• Architect and deploy intelligent chatbots for IT support and self-service capabilities.
• Integrate AI solutions with existing IT operations tools and workflows.

• 10-13 years of relevant experience.
• Hands-on experience with Grafana, Dynatrace, and other monitoring platforms.
• Practical experience implementing AI-based solutions for anomaly detection, predictive maintenance, and automated remediation. Familiarity with OpenAI, Bedrock, Gemini, Anthropic, or similar AI platforms.
• Strong understanding of infrastructure, application architectures, and networking. Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
• Proficiency in Python, Bash, or similar scripting languages for automation and integration.
• Strong experience with LLMs (OpenAI, Anthropic, Gemini, Bedrock) and agentic AI solutions.
• Hands-on experience in designing AI architectures for enterprise IT environments.
• Proficiency in Python or similar languages for AI model integration and automation.