AI Systems Engineer
Boston, MA (onsite 4 days per week)
Role Summary
Join the AI Studio of an innovative construction industry client in Boston as an AI Systems Engineer, a hybrid role responsible for architecting and building both:
- The distributed systems backbone that powers enterprise-scale AI, and
- The agentic and LLM-driven capabilities transforming construction workflows
You will help define how AI is built, deployed, observed, and scaled across the client's national operations.
Responsibilities
AI & Agentic Systems Product Engineering & Deployment
- Design and implement production-grade RAG architectures
- Build and deploy multi-model AI agents leveraging AWS Bedrock and LLM providers (Claude, GPT, Llama, Titan, etc.)
- Implement dynamic model routing strategies based on task complexity, cost, and latency
- Develop multi-agent orchestration frameworks enabling collaborative workflows (planner, retriever, executor, summarizer)
- Design safe tool invocation patterns and guardrails for enterprise AI agents
- Optimize inference pipelines for cost, performance, and reliability
- Implement evaluation frameworks to measure model performance, hallucination rates, and response quality
- Design fallback and degradation strategies for model outages or latency spikes
- Architect and evolve service-oriented and event-driven systems supporting AI workloads
- Design REST/GraphQL APIs with clear versioning, authentication, and backward compatibility strategies
- Implement asynchronous processing pipelines using queues, event buses, and workflow orchestration
- Ensure reliability through idempotent consumers, retry strategies, circuit breakers, and dead-letter queues
- Make informed tradeoffs between relational, NoSQL, and vector storage systems
- Build services that are observable, traceable, and production-ready
- Define and document architectural standards for AI platform services
- Implement LLMOps: cost monitoring, latency optimization, usage analytics, and model versioning
- Enforce security, governance, and access standards in line with enterprise policies
- Work closely with product managers, site AI engineers, and data scientists to iterate rapidly in Agile sprints
- Communicate technical progress clearly to non-technical stakeholders; contribute to internal AI playbooks and templates
Qualifications
- 6+ years of professional software engineering experience (not including vibe coding)
- Demonstrated experience designing distributed or service-oriented systems in production
- Strong backend engineering skills in Python and at least one of Java, Node.js, Rust, or Kotlin
- Experience building and deploying event-driven architectures (SNS/SQS, Kafka, EventBridge, etc.)
- Experience integrating LLMs into production systems (Bedrock, OpenAI, Anthropic, etc.)
- Hands-on experience building RAG pipelines, vector databases, and multi-agent AI systems
- Deep understanding of:
  - Distributed system failure modes
  - API lifecycle management
  - Concurrency and consistency tradeoffs
  - LLM cost, latency, and reliability constraints
  - Tuning AI agents for accuracy and performance
- Experience building internal AI platforms or shared infrastructure
- Exposure to large-scale SaaS or mission-critical systems
- Experience designing multi-agent or orchestration frameworks
- Experience with Databricks Lakehouse architecture
- Prior experience in construction, manufacturing, or operational industries
