Observability and AI Enterprise Architect
1 day ago
Edison
Required Skills:
• Proficiency in Python, Bash, or similar scripting languages for automation and integration.
• Strong experience with LLMs (OpenAI, Anthropic, Gemini, Bedrock) and agentic AI solutions.
• Hands-on experience with Grafana, Dynatrace, and other monitoring platforms.
• Practical experience implementing AI-based solutions for anomaly detection, predictive maintenance, and automated remediation. Familiarity with OpenAI, Bedrock, Gemini, Anthropic, or similar AI platforms.

Our client is looking for an experienced Observability and AI Enterprise Architect to design and implement enterprise-grade observability and AI solutions that provide deep visibility into infrastructure, applications, and networks and that transform IT operations. This role requires expertise in leading observability platforms and hands-on experience in IT operations, combined with the ability to integrate AI-driven solutions for IT operations (AIOps) using technologies such as LLMs, agentic frameworks, and platforms like Anthropic, OpenAI, Bedrock, Gemini, and others.

Responsibilities:
• Design and deploy observability frameworks leveraging tools such as Grafana, Dynatrace, Prometheus, ELK, and Splunk. Define best practices for monitoring, alerting, and visualization across hybrid and multi-cloud environments.
• Develop strategies for monitoring KPIs tied to business outcomes (e.g., sales performance, supply chain efficiency, customer experience).
• Collaborate with business and IT teams to identify key metrics and integrate them into dashboards and alerting systems.
• Implement AIOps solutions using platforms like OpenAI, AWS Bedrock, Google Gemini, Anthropic, and similar technologies.
• Develop predictive analytics and anomaly detection models to proactively identify and resolve operational issues.
• Integrate observability tools with ITSM platforms and automation workflows. Enable automated root cause analysis and remediation using AI/ML models.
• Provide observability strategies for infrastructure (servers, storage, cloud), applications (microservices, APIs), and networks (LAN/WAN, SD-WAN). Collaborate with DevOps, SRE, and IT operations teams to ensure end-to-end visibility and reliability.
• Establish observability standards, KPIs, and SLAs for performance and availability. Ensure compliance with security and regulatory requirements in monitoring solutions.
• Develop scalable architecture using LLMs, agentic frameworks, and multi-modal AI technologies.
• Build AI-powered analytics platforms for IT operations analysis, anomaly detection, and predictive insights.
• Architect and deploy intelligent chatbots for IT support and self-service capabilities.
• Integrate AI solutions with existing IT operations tools and workflows.
• Implement automated remediation and root cause analysis using AI/ML models.

Qualifications:
• 10-13 years of relevant experience.
• Hands-on experience with Grafana, Dynatrace, and other monitoring platforms.
• Practical experience implementing AI-based solutions for anomaly detection, predictive maintenance, and automated remediation. Familiarity with OpenAI, Bedrock, Gemini, Anthropic, or similar AI platforms.
• Strong understanding of infrastructure, application architectures, and networking. Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
• Proficiency in Python, Bash, or similar scripting languages for automation and integration.
• Strong experience with LLMs (OpenAI, Anthropic, Gemini, Bedrock) and agentic AI solutions.
• Hands-on experience in designing AI architectures for enterprise IT environments.
• Proficiency in Python or similar languages for AI model integration and automation.