DevOps & Site Reliability Lead - Retail DevOps

Deerfield, IL
Permanent
Job Title: DevOps & Site Reliability Lead - Retail DevOps
Duration: Full-time, Permanent
Location: Deerfield, IL 60015 (Onsite from Day 1)
Job Description:
Must Have Technical/Functional Skills
Cloud & Platform Engineering (Expert Level)
  • Deep expertise in Microsoft Azure, including:
o Compute (VMs, App Services, Azure Container Apps)
o Containers & Orchestration (AKS, Docker)
o Networking (VNETs, Private Endpoints, Application Gateway, Load Balancers)
o Storage, Azure Key Vault, Azure Monitor, Log Analytics
  • Proven experience designing enterprise-grade, highly available cloud platforms
  • Strong understanding of hybrid and multi-cloud architectures (AWS / GCP exposure preferred)
DevOps & Engineering Excellence
  • Advanced experience with Azure DevOps and CI/CD pipeline architecture
  • Infrastructure automation using Terraform (modules, state management, governance)
  • Strong scripting skills (PowerShell, Bash)
  • GitOps concepts, branching strategies, release orchestration
Site Reliability Engineering (Leadership Level)
  • Ownership of platform reliability, resiliency, and performance
  • Definition and governance of:
o SLIs, SLOs, SLAs
o Error budgets and reliability metrics
  • Advanced observability strategy, design, and implementation:
o Metrics, logs, traces, alerts, dashboards using Dynatrace
  • Incident response leadership, RCA facilitation, and long-term remediation planning
  • Experience operating systems with 99.9% to 99.99% availability
Containers, APIs & Integration
  • Leadership-level experience with AKS-based platforms, ingress, and scaling strategies
  • Understanding of microservices, API-led and event-driven architectures
  • Familiarity with Azure Integration Services (Service Bus, Event Hub, API Management)
Security, Compliance & Cost
  • Secure cloud design using Key Vault, managed identities, RBAC
  • Cost optimization (FinOps mindset) across cloud infrastructure
Roles & Responsibilities
  • Act as Lead SRE for the client's Retail platforms, owning reliability and stability outcomes
  • Define and enforce SRE standards, best practices, and operating models
  • Architect and govern highly available, scalable cloud platforms
  • Lead the design and implementation of CI/CD and IaC strategies
  • Establish proactive monitoring, alerting, and incident prevention mechanisms
  • Own major incident leadership, RCA execution, and corrective action tracking
  • Partner with application, security, and architecture teams to build reliability by design
  • Drive automation to reduce toil and improve operational efficiency
  • Mentor and coach SRE and DevOps engineers across teams
  • Influence roadmap decisions with a reliability, scalability, and cost lens

Job Type: Permanent

Job ID: 254265818