AI Systems Engineer

Posted by 1872 Consulting

Boston, MA

Permanent

Boston, MA (onsite 4 days per week)

Role Summary
Join the AI Studio of an innovative construction-industry client in Boston as an AI Systems Engineer, a dual-focus role responsible for architecting and building both:

The distributed systems backbone that powers enterprise-scale AI, and

The agentic and LLM-driven capabilities transforming construction workflows

This role sits at the intersection of platform engineering and applied AI. You will design scalable APIs, event-driven services, and reliable infrastructure while also implementing multi-model AI agents, retrieval pipelines, and AI orchestration frameworks that operate in real-world production environments.

You will help define how AI is built, deployed, observed, and scaled across the client's national operations.

Responsibilities

AI & Agentic Systems Product Engineering & Deployment

Design and implement production-grade RAG architectures
Build and deploy multi-model AI agents leveraging Amazon Bedrock and LLMs such as Claude, GPT, Llama, and Titan
Implement dynamic model routing strategies based on task complexity, cost, and latency
Develop multi-agent orchestration frameworks enabling collaborative workflows (planner, retriever, executor, summarizer)
Design safe tool invocation patterns and guardrails for enterprise AI agents
Optimize inference pipelines for cost, performance, and reliability
Implement evaluation frameworks to measure model performance, hallucination rates, and response quality
Design fallback and degradation strategies for model outages or latency spikes

Distributed Systems & Platform Architecture

Architect and evolve service-oriented and event-driven systems supporting AI workloads
Design REST/GraphQL APIs with clear versioning, authentication, and backward compatibility strategies
Implement asynchronous processing pipelines using queues, event buses, and workflow orchestration
Ensure reliability through idempotent consumers, retry strategies, circuit breakers, and dead-letter queues
Make informed tradeoffs between relational, NoSQL, and vector storage systems
Build services that are observable, traceable, and production-ready
Define and document architectural standards for AI platform services
Implement LLMOps: cost monitoring, latency optimization, usage analytics, and model versioning
Enforce security, governance, and access standards in line with enterprise policies

Collaboration & Technical Leadership

Work closely with product managers, site AI engineers, and data scientists to iterate rapidly in Agile sprints
Communicate technical progress clearly to non-technical stakeholders
Contribute to internal AI playbooks and templates

Qualifications

6+ years of professional software engineering experience (not including vibe coding)
Demonstrated experience designing distributed or service-oriented systems in production
Strong backend engineering skills in Python and at least one of Java, Node.js, Rust, or Kotlin
Experience building and deploying event-driven architectures (SNS/SQS, Kafka, EventBridge, etc.)
Experience integrating LLMs into production systems (Bedrock, OpenAI, Anthropic, etc.)
Hands-on experience building RAG pipelines, vector database integrations, and multi-agent AI systems
Deep understanding of:
- Distributed system failure modes
- API lifecycle management
- Concurrency and consistency tradeoffs
- LLM cost, latency, and reliability constraints
- Tuning AI agents for accuracy and performance