Senior Software Engineer, Agentic Systems, Moveworks
Job Description
The Role
We're building the runtime infrastructure that powers Moveworks' AI agents: the systems that orchestrate, execute, and deliver agent responses to millions of enterprise users in real time. This is not an ML role. This is a distributed systems engineering role at the heart of the agentic AI wave.
Our AI agents can plan, execute multi-step workflows, call tools, wait on human input, and resume, all while maintaining correctness, observability, and low latency. The systems that make this possible are what you'll build and own.
What you get to do in this role:
- Agent orchestration engine: a state machine that manages long-running agent sessions, coordinating planning, execution, and user interaction across multiple LLM calls and tool invocations
- Distributed session management: lease-based ownership using DynamoDB conditional writes, heartbeat protocols, and crash recovery via checkpointing
- Event-driven message pipeline: SQS FIFO queues for ordered delivery, Kafka consumers for event processing, and real-time streaming via gRPC and Socket.IO
- Structured concurrency: Python asyncio TaskGroups running multiple concurrent tasks per session (message polling, lease heartbeats, output publishing, orchestrator execution) with fail-fast semantics and graceful cancellation
- Observability infrastructure: OpenTelemetry instrumentation, distributed trace context propagation across async boundaries, and custom span lifecycle management for sessions that span minutes
- Caching and state layers: Redis, DynamoDB KV stores with per-org/per-bot scoping, batch read optimization, and hot-reload configuration
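
To give a flavor of the session-management work, here is a minimal sketch of lease-based ownership with heartbeats. It is illustrative only and does not describe Moveworks' actual implementation: `InMemoryLeaseTable`, `Lease`, and their methods are hypothetical names, and the in-memory dictionary stands in for a store such as DynamoDB, where each `if` check below would instead be expressed as the condition on a conditional write.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    owner: str         # worker that currently owns the session
    expires_at: float  # time after which the lease may be taken over


class InMemoryLeaseTable:
    """Hypothetical stand-in for a table that supports conditional writes."""

    def __init__(self) -> None:
        self._rows: Dict[str, Lease] = {}

    def acquire(self, session_id: str, worker: str, ttl: float, now: float) -> bool:
        """Take the lease if it is free, expired, or already ours."""
        current = self._rows.get(session_id)
        if current is None or current.expires_at <= now or current.owner == worker:
            self._rows[session_id] = Lease(owner=worker, expires_at=now + ttl)
            return True
        return False  # another worker holds a live lease

    def heartbeat(self, session_id: str, worker: str, ttl: float, now: float) -> bool:
        """Extend the lease only if we still own it and it has not expired."""
        current = self._rows.get(session_id)
        if current is not None and current.owner == worker and current.expires_at > now:
            current.expires_at = now + ttl
            return True
        return False  # lease was lost; the caller should stop working on the session

    def owner(self, session_id: str, now: float) -> Optional[str]:
        """Return the current live owner, or None if the lease is free or expired."""
        current = self._rows.get(session_id)
        if current is None or current.expires_at <= now:
            return None
        return current.owner
```

In this sketch, a worker that crashes simply stops sending heartbeats, so its lease expires and another worker can acquire the session and resume from the last checkpoint. Passing `now` explicitly keeps the logic deterministic and easy to test.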
