What you will do
- Monitor and support production and staging environments in real time, ensuring high availability, performance, and stability;
- Respond to incidents, perform triage and root cause analysis, and contribute to post-incident reviews and remediation efforts;
- Participate in an on-call rotation with defined SLAs;
- Handle ad-hoc and unplanned operational requests from Product, Support, and internal teams;
- Maintain and enhance monitoring, alerting, dashboards, logs, and metrics, and improve observability practices;
- Support CI/CD pipelines, production releases, and GitOps workflows;
- Contribute to automation efforts to reduce operational toil;
- Maintain and improve Kubernetes-based infrastructure and containerized workloads;
- Support Infrastructure as Code practices and ongoing environment improvements.
Must haves
- 2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations;
- Experience supporting production environments on AWS;
- Experience supporting production SaaS applications;
- Strong understanding of CI/CD systems such as GitHub Actions, Jenkins, or CircleCI;
- Experience with GitOps and strong Git fundamentals;
- Experience using GitHub, Jira, and Confluence in collaborative environments;
- Experience with Kubernetes, for example on EKS or with kOps;
- Experience with Docker and containerization;
- Experience with observability and alerting tools such as Grafana, Prometheus, Loki, or PagerDuty;
- Experience with scripting languages such as Bash, Python, or Go;
- Experience with Infrastructure as Code tools such as Terraform or Helm;
- Ability to work within structured operational processes and SLAs;
- Strong written and verbal English communication skills;
- Self-driven with a growth mindset.
Nice to haves
- AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
- Experience in multi-tenant SaaS environments;
- Experience working in globally distributed teams;
- Familiarity with ChatOps practices;
- Experience improving monitoring quality and reducing alert fatigue.
About the role
We are looking for an SRE Operations Engineer to keep production and staging environments running reliably across a cloud-based SaaS platform. You’ll respond to live incidents, reduce operational toil through automation, and improve observability using Kubernetes, Terraform, Grafana, and AWS. This is a hands-on role with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.
The benefits of joining us
Professional growth
Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
Competitive compensation
We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
A selection of exciting projects
Join projects that build modern solutions for top-tier clients, including Fortune 500 enterprises and leading product brands.
Flextime
Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whichever makes you happiest and most productive.
Your AgileEngine journey starts here
- Tell us about yourself (2 min);
- Confirm requirements (2 sec);
- Pass a short test (30-60 min);
- Record a short video (5 min): introduce yourself on video instead of waiting for an interview;
- Live interview: ace the technical interview with our team. You can schedule the call yourself as soon as your video is reviewed;
- Live interview: final interview with your team, where you get to know the people you will be working with;
- Get an offer, as quickly as possible.







