Infrastructure admin for AI services

Aliso Viejo, CA
Permanent

Job title: Infrastructure admin for AI services (Azure & AWS)
Location: Remote
$50/hr


Key Responsibilities
  • Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure
  • Manage compute and supporting cloud resources such as EC2, Azure Virtual Machines, GPU instances, EKS, ECS, Lambda, Kubernetes clusters, VPC, S3, and Route 53
  • Provision and configure storage, networking, and security services for AI platforms
  • Ensure high availability, scalability, and reliability of AI environments
  • Deploy and maintain AI/ML services such as Amazon SageMaker, Microsoft Foundry, and Azure Machine Learning
  • Support data scientists and ML engineers by providing optimized infrastructure for model training and deployment
  • Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, ARM templates / Bicep, and Dockerfiles
  • Set up and automate environment provisioning, patching, and scaling
  • Deploy and manage containerized AI workloads using Docker, Kubernetes, Amazon EKS, Azure Kubernetes Service (AKS), and ECS
  • Monitor system health, performance, and resource utilization using CloudWatch, Azure Monitor, and Datadog or Prometheus
  • Optimize infrastructure for cost, performance, and GPU utilization
  • Implement cloud security best practices including IAM / RBAC management, network security groups, encryption, and secrets management
  • Ensure compliance with organizational and regulatory standards
  • Integrate AI infrastructure with CI/CD pipelines
  • Support automated deployment of models and AI services
Required Qualifications
  • Bachelor's degree in Computer Science, Information Systems, or a related field
  • 5+ years of experience in infrastructure administration or cloud engineering
  • Strong hands-on experience with AWS cloud services and Microsoft Azure cloud services
  • Experience supporting AI/ML infrastructure or data platforms
  • Proficiency with Linux administration, scripting (Python, Bash, PowerShell), and IaC tooling (Terraform, Terragrunt)
  • Experience with Docker and Kubernetes
  • Experience with GitHub Actions
  • Experience with LLM infrastructure setup
  • Experience working in a centralized team with triage responsibilities

Job Type: Permanent

Job ID: 254737906