Senior DevOps Engineer / Site Reliability Engineer (SRE)
A leading B2B SaaS platform in the cross-border e-commerce sector is expanding its North America operations. We're seeking a Senior DevOps Engineer / Site Reliability Engineer (SRE) to architect and maintain our unified global O&M (operations and maintenance) platform.
This is a newly created role supporting the work of our North America team. You'll work directly with our Middle Platform Director, Technical Experts, and CEO in a collaborative, remote-first environment. The role can be located anywhere in the US.
KEY RESPONSIBILITIES:
Design, develop, and maintain unified operation and platform management systems covering resource management, monitoring & alerting, configuration management, and automated operation & maintenance
Build and operate observability platforms and CI/CD pipelines; develop self-healing systems and automated incident response processes to realize intelligent O&M
Establish DevOps standards and best practices; promote standardization of DevOps toolchains (technology selection, version management)
Provide platform-level technical support for product and engineering teams; resolve complex system issues, reduce technical debt, and lead infrastructure and architecture upgrades
Promote SRE concepts and engineering practices; organize technical sharing and training; build a reliability engineering system
Conduct technical research and innovation; track cloud-native/DevOps industry trends; evaluate new technologies and drive continuous modernization of O&M platforms
REQUIRED QUALIFICATIONS:
Currently residing in California or North Carolina, USA
US Green Card or US Citizenship (work authorization; no sponsorship available)
Fluent in Mandarin Chinese (working language; close collaboration with domestic R&D required)
Bachelor's degree or above in Computer Science or related field
4-6 years of hands-on experience in DevOps/SRE/Platform Engineering
Proficient in at least one major cloud platform (AWS/Azure/GCP) with deep understanding of core services such as VPC, EC2, EKS/K8s, RDS, and IAM (or their equivalents on other clouds)
Proficient in Linux, networking, containers (Docker/Kubernetes), load balancing, and service governance
Skilled in IaC (Infrastructure as Code) tools: Terraform, Ansible, Helm
Experience building CI/CD pipelines: Jenkins, Argo CD, CodeBuild, etc.
Familiar with monitoring/logging/tracing: Prometheus, Grafana, ELK, OpenTelemetry
Proficient in at least one development/scripting language: Python, Shell, Go
Excellent system design, analysis, and troubleshooting skills
Strong cross-team communication and collaboration abilities
PREFERRED QUALIFICATIONS:
Master's degree in Computer Science or related field
Experience with global platforms, cross-border SRE, multi-cloud O&M
Led platform reconstruction, self-healing systems, or observability initiatives
Experience with Go development, service mesh, chaos engineering, or capacity planning
Demonstrated success improving system availability, reducing incident rates, and increasing automation
Global technical vision and cross-cultural collaboration experience
Results-oriented, self-driven, experienced in technical evangelism/sharing
COMPENSATION:
Base Salary: $140,000 - $160,000 annually (top candidates may receive 5-10% upward adjustment)
401(k): Dollar-for-dollar match, up to 4% of salary
Medical Insurance
PTO: 12 days annually
Social Security: Contributed per US legal requirements
WORK ENVIRONMENT:
Location: Silicon Valley, CA or Raleigh, NC (home-based option available)
Department: Tech O&M Department
Working Style: Remote-first
Hours: 8 hours per day, weekends off
Travel: No business travel required
Expected Start: ASAP
Interview Process: Round 1 (Online): Middle Platform Director + Technical Expert | Round 2 (Online): Head of HR | Round 3 (Online): CEO/Founder
