
USA_Developer

Atlanta, GA
Permanent

Pay Rate Range: $75.71 - $77.94/hr
GBAM Req ID: (phone number removed)

Job Description:

Experience: Minimum of 10 years of experience in cloud operations engineering or a related field, with a strong focus on Azure cloud infrastructure, Kubernetes, and site reliability engineering.

Technical Skills:
Proficiency in Azure cloud services and tools.
- Enable an embedded SRE model across squads.
- Conduct SRE reviews of all changes, with a goal of 90% or more incident-free changes.
- Build instrumentation that provides visibility into changes and detects any issues they introduce.
- Preventive problem management: identify opportunities to fix issues permanently and track them through to closure.
- Work with business operations team members to get issues fixed.
- Toil reduction.
- Identify opportunities for automation to preempt issues or to facilitate information flow to business teams.
- Demonstrate strong programming skills and thorough knowledge of systems.
Strong knowledge of Infrastructure as Code (IaC) tools such as Azure Resource Manager (ARM) templates, Terraform, or Ansible.
Experience with monitoring and logging tools (Azure Monitor, Log Analytics, Application Insights).
Expertise in deploying and managing Kubernetes clusters, particularly within Azure Kubernetes Service (AKS).
Familiarity with containerization tools (Docker).
Expertise in incident management and response.
Strong scripting skills (PowerShell, Python, or similar).

Soft Skills:
Excellent problem-solving and analytical skills.
Strong communication and collaboration abilities.
Ability to work independently and as part of a team.

Roles & Responsibilities
Operational Strategy Development: Design and implement strategies to optimize operational processes and improve system reliability and performance within the Azure cloud environment.
Infrastructure Management: Oversee the management and provisioning of Azure cloud infrastructure using tools like Azure Resource Manager (ARM) templates, Terraform, or Ansible.
Kubernetes Management: Deploy, manage, and optimize Kubernetes clusters within Azure Kubernetes Service (AKS) to ensure high availability and scalability.
Monitoring and Incident Management: Implement monitoring solutions and establish incident management protocols to ensure high availability and reliability of Azure services and Kubernetes clusters.
Performance Optimization: Analyze system performance and implement improvements to enhance scalability and efficiency in the Azure cloud and Kubernetes environments.
Collaboration: Work closely with development, QA, and operations teams to ensure seamless integration and delivery of software within the Azure and Kubernetes environments.
Security Best Practices: Implement security best practices in operational processes, Azure infrastructure management, and Kubernetes cluster configurations.
Documentation: Create and maintain comprehensive documentation for operational processes, Azure infrastructure configurations, Kubernetes deployments, and incident management procedures.
Training and Mentorship: Provide training and mentorship to team members on Azure operational practices, Kubernetes management, tools, and methodologies.

Generic Managerial Skills (if any): N/A

Keywords to Search for in Resumes:
Certifications in relevant Azure technologies (e.g., Microsoft Certified: Azure Solutions Architect Expert, Microsoft Certified: Azure DevOps Engineer Expert).
Experience in developing and delivering training programs.
Knowledge of CI/CD tools and practices.

Essential Skills: Site Reliability Engineering (SRE)

Desirable Skills: Azure Data Factory, Databricks, Azure Synapse Analytics, Python, SQL, PySpark (Databricks)

Skills: Digital: Site Reliability Engineering (SRE)

Experience Required: 8-10 years

Skills:
Category: SkillCategoryTest1_MN
Name: Digital: Site Reliability Engineering (SRE)
Required: Yes
Importance: 1
Experience: 7+ years

Job Type: Permanent

Job ID: 254740161