AI Systems Engineer

Boston, MA
Permanent
Boston, MA (onsite 4 days per week)

Role Summary
Join the AI Studio of an innovative construction industry client in Boston as an AI Systems Engineer, a hybrid role responsible for architecting and building both:
  1. The distributed systems backbone that powers enterprise-scale AI, and
  2. The agentic and LLM-driven capabilities transforming construction workflows.
This role sits at the intersection of platform engineering and applied AI. You will design scalable APIs, event-driven services, and reliable infrastructure while also implementing multi-model AI agents, retrieval pipelines, and AI orchestration frameworks that operate in real-world production environments.

You will help define how AI is built, deployed, observed, and scaled across the client's national operations.

Responsibilities

AI & Agentic Systems Product Engineering & Deployment
  • Design and implement production-grade RAG architectures
  • Build and deploy multi-model AI agents leveraging Amazon Bedrock and multiple LLM families (Claude, GPT, Llama, Titan, etc.)
  • Implement dynamic model routing strategies based on task complexity, cost, and latency
  • Develop multi-agent orchestration frameworks enabling collaborative workflows (planner, retriever, executor, summarizer)
  • Design safe tool invocation patterns and guardrails for enterprise AI agents
  • Optimize inference pipelines for cost, performance, and reliability
  • Implement evaluation frameworks to measure model performance, hallucination rates, and response quality
  • Design fallback and degradation strategies for model outages or latency spikes
Distributed Systems & Platform Architecture
  • Architect and evolve service-oriented and event-driven systems supporting AI workloads
  • Design REST/GraphQL APIs with clear versioning, authentication, and backward compatibility strategies
  • Implement asynchronous processing pipelines using queues, event buses, and workflow orchestration
  • Ensure reliability through idempotent consumers, retry strategies, circuit breakers, and dead-letter queues
  • Make informed tradeoffs between relational, NoSQL, and vector storage systems
  • Build services that are observable, traceable, and production-ready
  • Define and document architectural standards for AI platform services
  • Implement LLMOps: cost monitoring, latency optimization, usage analytics, and model versioning
  • Enforce security, governance, and access standards in line with enterprise policies
Collaboration & Technical Leadership
  • Work closely with product managers, site AI engineers, and data scientists to iterate rapidly in Agile sprints
  • Communicate technical progress clearly to non-technical stakeholders; contribute to internal AI playbooks and templates

Qualifications
  • 6+ years of professional software engineering experience (not including vibe coding)
  • Demonstrated experience designing distributed or service-oriented systems in production
  • Strong backend engineering skills in Python and at least one of Java, Node.js, Rust, or Kotlin
  • Experience building and deploying event-driven architectures (SNS/SQS, Kafka, EventBridge, etc.)
  • Experience integrating LLMs into production systems (Bedrock, OpenAI, Anthropic, etc.)
  • Hands-on experience with RAG pipelines, vector databases, and multi-agent AI systems
  • Deep understanding of:
    • Distributed system failure modes
    • API lifecycle management
    • Concurrency and consistency tradeoffs
    • LLM cost, latency, and reliability constraints
    • Tuning AI agents for accuracy and performance
Preferred
  • Experience building internal AI platforms or shared infrastructure
  • Exposure to large-scale SaaS or mission-critical systems
  • Experience designing multi-agent or orchestration frameworks
  • Experience with Databricks Lakehouse architecture
  • Prior experience in construction, manufacturing, or operational industries

Job Type: Permanent

Job ID: 253343804