Robotic Reliability Systems Engineer

Wilmington, MA
Permanent
TEC Group Inc.

What We Are Looking For

We are seeking a Robotic Reliability (Systems) Engineer to drive the reliability, performance, and scalability of our autonomous warehouse platform powered by mobile robots. This is a high-impact, hands-on engineering role focused on solving complex system-level challenges across large-scale robotic fleets deployed at customer sites.

This role sits at the intersection of robotics software, hardware integration, and operational performance. The primary objective is to diagnose, resolve, and prevent system-level issues, ensuring our robotic systems operate reliably and consistently meet customer performance KPIs.

We are looking for a technically strong, data-driven engineer who thrives in complex, real-world environments and can translate ambiguous system behaviors into structured analysis and actionable engineering improvements.

What You'll Be Doing
  • Fleet-Scale System Reliability
    • Identify, triage, and root-cause system-level issues impacting large-scale robotic fleets.
    • Drive improvements in system reliability, availability, and performance across thousands of deployed robots.
    • Define and monitor system performance guardrails tied to customer KPIs (throughput, error rates, recovery time, uptime).
    • Partner with field teams to debug and resolve production issues in live environments.
  • End-to-End Systems Debugging & Integration
    • Work across robotics software, hardware, controls, perception, and infrastructure to diagnose complex system interactions.
    • Debug issues spanning embedded systems, distributed services, real-time control loops, and operational workflows.
    • Collaborate with cross-functional teams to drive fixes and long-term solutions.
    • Contribute to system design improvements that enhance robustness, fault tolerance, and scalability.
  • Data-Driven Performance Optimization
    • Analyze robot logs, telemetry, and diagnostics data to identify failure modes and performance bottlenecks.
    • Build and use tools (SQL, Python, dashboards) to investigate trends and validate hypotheses.
    • Develop mechanisms for regression detection, failure trend analysis, and performance monitoring.
    • Drive continuous improvement through structured experiments and data-backed decisions.
  • Operational Excellence & Continuous Improvement
    • Own reliability metrics and contribute to improving system observability and debuggability.
    • Document failure modes, learnings, and standard operating procedures for issue resolution.
    • Support release validation and help ensure changes meet reliability and performance expectations.
    • Act as a technical escalation point for complex system issues.

What You'll Need
  • 5+ years of experience in robotics, automation, or complex distributed systems engineering.
  • Strong systems engineering mindset with experience in robotics control software, real-time systems, and hardware-software integration.
  • Demonstrated experience in structured root-cause analysis and failure investigation.
  • Proficiency in data analysis and scripting (Python, SQL, or similar).
  • Experience working with logs, telemetry systems, and large-scale operational data.
  • Familiarity with Linux environments and version control systems (Git).
  • Experience working in production environments with deployed systems (not just lab prototypes).
  • Strong problem-solving skills and ability to work across ambiguous, cross-functional system boundaries.
  • Experience in Agile development environments.

Job Type: Permanent

Job ID: 254738902