Technology and Data Engineer 5 Contingent
Principal Engineer (SRE/DevOps, in a leading and hands-on capacity)
Location: Irving, TX/Charlotte, NC/Minneapolis, MN
Key Responsibilities
Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
Design and build advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, and Prometheus
Proactively identify risks through gap analysis, anomaly detection, and predictive alerting, preventing production incidents before they occur
Troubleshoot complex production issues across distributed microservices environments, reducing MTTR through deep technical expertise
Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring solutions
Support applications running on OpenShift and cloud-native platforms, with a focus on reliability and scalability
Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
Participate in 24x7 on-call rotation, demonstrating urgency and ownership during incidents
Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering practices
Act as a trusted technical leader, able to quickly switch priorities and manage competing demands in a high-pressure environment
What We're Looking For
A genuine, hands-on engineer who can operate across multiple roles (SRE, DevOps, Production Support)
Strong ability to shift priorities quickly and respond with urgency in critical situations
Deep understanding of application support in cloud environments, especially OpenShift
Experience in the financial services industry strongly preferred
Prior development experience is a plus, particularly in Java-based ecosystems
Required Qualifications:
10+ years of platform and production support experience
5 years of Red Hat Linux, OpenShift, Kubernetes, Java, microservices, Spring Boot, and Python experience
5 years of observability dashboard creation experience - Grafana, Splunk, SPLOC, AppDynamics
5 years of observability alerting and incident handling experience - AIOps, ServiceNow, BigPanda, etc.
4 years of React.js, Apache, Kafka, and relational database experience
4 years of distributed systems, microservices architectures, and cloud-native platforms experience
