DevOps & Site Reliability Lead - Retail DevOps
Job Title: DevOps & Site Reliability Lead - Retail DevOps
Duration: Full-time, Permanent
Location: Deerfield, IL 60015 (Onsite from Day 1)
Job Description:
Must Have Technical/Functional Skills
Cloud & Platform Engineering (Expert Level)
- Deep expertise in Microsoft Azure, including:
o Containers & Orchestration (AKS, Docker)
o Networking (VNets, Private Endpoints, Application Gateway, Load Balancers)
o Storage, Azure Key Vault, Azure Monitor, Log Analytics
- Proven experience designing enterprise-grade, highly available cloud platforms
- Strong understanding of hybrid and multi-cloud architectures (AWS / GCP exposure preferred)
- Advanced experience with Azure DevOps and CI/CD pipeline architecture
- Infrastructure automation using Terraform (modules, state management, governance)
- Strong scripting skills (PowerShell, Bash)
- GitOps concepts, branching strategies, release orchestration
Site Reliability Engineering (Leadership Level)
- Ownership of platform reliability, resiliency, and performance
- Definition and governance of:
o Error budgets and reliability metrics
- Advanced observability strategy, design, and implementation
- Incident response leadership, RCA facilitation, and long-term remediation planning
- Experience operating systems with 99.9% to 99.99% availability
- Leadership-level experience with AKS-based platforms, ingress, and scaling strategies
- Understanding of microservices, API-led and event-driven architectures
- Familiarity with Azure Integration Services (Service Bus, Event Hub, API Management)
- Secure cloud design using Key Vault, managed identities, RBAC
- Cost optimization (FinOps mindset) across cloud infrastructure
- Act as Lead SRE for client's Retail platforms, owning reliability and stability outcomes
- Define and enforce SRE standards, best practices, and operating models
- Architect and govern highly available, scalable cloud platforms
- Lead the design and implementation of CI/CD and IaC strategies
- Establish proactive monitoring, alerting, and incident prevention mechanisms
- Own major incident leadership, RCA execution, and corrective action tracking
- Partner with application, security, and architecture teams to build reliability by design
- Drive automation to reduce toil and improve operational efficiency
- Mentor and coach SRE and DevOps engineers across teams
- Influence roadmap decisions with a reliability, scalability, and cost lens
