Site Reliability Engineer - Backend Java
(Sunnyvale, CA, 94086), (Austin, TX, 78753) | 04/01/2026
Job Code: JPC - 68029

Job Description

Role: SRE with Java Backend
JC# 156006 & 156007
Location: Sunnyvale, CA & Austin, TX (3 days/week onsite)
Full-time
Salary: 125-145K (Slightly Negotiable for the SUPERSTAR)

Skills

- Core Java
- Advanced Java
- Advanced Java 8
- Amazon Web Services (AWS)
- Amazon Web Services EKS (AWS EKS)
- Kubernetes
- DevOps / SRE

Key Responsibilities

- Architect and drive large-scale migrations of business-critical services to AWS and Kubernetes-based platforms
- Define and implement GitOps-first deployment strategies using ArgoCD, with Spinnaker for advanced delivery workflows
- Design, build, and operate production-grade AWS EKS platforms at scale
- Establish best practices for CI/CD, deployment automation, and release strategies (blue/green, canary, progressive delivery)
- Design and maintain reusable Helm charts and standardized deployment patterns
- Develop and maintain Python-based tooling and automation for deployment, operations, and reliability
- Provide deep Linux systems expertise, including performance tuning, debugging, and incident mitigation
- Own and support production systems, including on-call participation, incident response, and root cause analysis
- Partner with SRE and Security teams to embed reliability, scalability, and security into platform design
- Drive architectural reviews, author design documents, and influence long-term platform and migration roadmaps
- Mentor engineers and raise the bar for DevOps and platform engineering practices

Minimum Qualifications

- 10+ years of experience as a Cloud / DevOps / Platform Engineer supporting production systems
- Proven experience leading AWS migrations for large, high-traffic, business-critical platforms
- Strong hands-on expertise with:
  - Linux systems (performance tuning, networking, troubleshooting)
  - Python for automation, tooling, and operational workflows
  - AWS (EKS, VPC, IAM, EC2, ALB/NLB, CloudWatch, S3, RDS)
  - Kubernetes (EKS) in production environments
  - ArgoCD and GitOps deployment models
  - Spinnaker for continuous delivery
  - Helm for application packaging and release management
- Experience operating and supporting production environments with on-call responsibility
- Experience with Infrastructure as Code (Terraform and/or CloudFormation)
- Strong understanding of distributed systems, networking, and cloud security
- Ability to lead through influence and collaborate across engineering disciplines

================================================W2=============================================================================

We have an immediate opportunity for a Site Reliability Engineer (SRE) with a strong Java backend foundation to design, build, and operate highly scalable and reliable systems. The ideal candidate will bring deep expertise in Core and Advanced Java (including Java 8), along with strong hands-on experience in AWS and Kubernetes (EKS).

In this role, you will architect and lead large-scale migrations of business-critical services to AWS and Kubernetes platforms, while defining GitOps-first deployment strategies using ArgoCD and leveraging Spinnaker for advanced delivery workflows. You will design and operate production-grade EKS platforms, establish CI/CD best practices, and implement modern release strategies such as blue/green, canary, and progressive delivery. Additionally, you will create reusable Helm charts, build Python-based automation for operations and reliability, and apply strong Linux systems knowledge for performance tuning, debugging, and incident resolution. You will take ownership of production systems, including on-call support, incident response, and root cause analysis, ensuring high availability and system resilience. Collaboration is key, as you will work closely with SRE and Security teams to embed reliability, scalability, and security into platform design.
The role also involves driving architectural reviews, authoring design documentation, and shaping long-term platform and migration roadmaps. Strong expertise in AWS services (EKS, VPC, IAM, EC2, ALB/NLB, CloudWatch, S3, RDS), Kubernetes in production, Infrastructure as Code (Terraform or CloudFormation), and distributed systems is essential. You will also mentor engineers and elevate DevOps and platform engineering practices while influencing cross-functional teams through strong technical leadership and collaboration.

For immediate consideration, please contact:
Nathan
Technical Recruiter
PRIMUS Global Services Inc.
Direct: 972-798-2669