DevOps Engineer
Pay Rate: $88.00 - $95.00
Experience Level: Senior (5+ years)
Job Summary
An innovative organization is looking for an experienced Resiliency Engineer to support critical infrastructure recovery and disaster recovery initiatives focused on improving platform resiliency, availability, and automation. This role will assess core infrastructure environments and develop scalable resiliency solutions aligned to Recovery Time Objectives (RTOs) and enterprise cloud standards.
The ideal candidate has strong experience in cyber security, recovery testing, infrastructure resiliency engineering (IRE), and Infrastructure as Code (IaC), with hands-on expertise in Python, Ansible, and Terraform. Responsibilities include designing automation frameworks and resiliency blueprints, identifying and mitigating infrastructure risks, supporting cloud and AI initiatives, and ensuring standards compliance through implementation and drift management processes.
This position requires a proactive engineer who thrives in complex environments and is passionate about building highly available, secure, and automated infrastructure solutions.
Project Overview
Assess our core infrastructure and develop automation and blueprints for resiliency solutions that allow us to meet our agreed RTOs during DR scenarios and in the event of an actual DR.
Contractor's Role
The ideal Resiliency Engineer candidate will develop resiliency strategies and best practices for our products and services, introducing automation or IaC wherever possible to maintain a consistent experience. The engineer will ensure the highest levels of resiliency and availability of our infrastructure platforms, analyze and identify areas of potential risk, and design solutions to mitigate those risks. The engineer will also develop blueprints and align with our cloud journey and review boards to ensure standards are met prior to implementation in the cloud, and will support AI development, Ansible, and drift management.
Qualifications (must haves)
You must have a strong background in Windows, Linux, and cloud technologies, as well as knowledge of resiliency best practices and tools, AI development, Ansible, and drift management.
Nice to have (optional)
Ideal candidates have a mix of technical and business skills, a passion for problem solving, and working experience with IaaS, SaaS, and PaaS.
Daily Tasks & Responsibilities
> Monitor performance of our products and services and take action as necessary to maintain resiliency and availability.
> Research and evaluate new technologies and tools to enhance resiliency and reliability.
> Collaborate with technical and business teams to ensure requirements are met and to identify areas of improvement.
> Monitor industry trends and best practices in resiliency and recommend changes as needed.
> Monitor incident trends and recommend options to enhance our resiliency and prevent further occurrences/outages.
> Document and maintain processes, procedures, and standards related to resiliency and site reliability.
