What you will do
- Day-to-day management of alerts, checking systems, and escalating issues as necessary;
- Be part of a team that provides 24×7 on-call support for critical SaaS events;
- Be available in case of emergencies when other team members are unavailable or need help;
- Documentation of issues and remediation steps;
- Proactively create appropriate monitors in the EKS/K8s ecosystem;
- Deploy to EKS/K8s clusters using Terraform and Helm;
- Learn and maintain existing infrastructure running under Docker Swarm;
- Improve the health of existing infrastructure by implementing checks and scripts that correct known issues;
- Maintenance and development of deployment code;
- Automating tasks that are currently executed manually;
- Implement and integrate new technologies into our cloud infrastructure;
- Collaborate with other teams and departments to provide the highest level of support and assistance;
- Apply a genuine customer focus when planning deployments and updates, keeping the customer at the forefront and considering the impact on them before making changes;
- Work closely with the Support, Customer Success, Migration, and Professional Services teams on solutions that deliver best-in-class SaaS service to our customers;
- Perform root cause analysis (RCA) and take the necessary corrective actions to prevent issues from recurring;
- Create and assign alert-related actions to the appropriate team after the investigation;
- Handle support requests for environment-specific actions;
- Identify and provide automation requirements to improve RCA.
Must haves
- Hands-on experience as an AWS cloud engineer;
- Working knowledge of EKS, Terraform, and Helm;
- Working experience with Docker and Docker Swarm;
- Good understanding of AWS IAM roles and policies;
- Experience logging and monitoring AWS resources using CloudWatch Logs;
- Experience working in Linux environments;
- Proficient in Bash and/or Python scripting;
- A strong understanding of web technologies such as REST APIs;
- Working experience with monitoring solutions such as Grafana and Prometheus;
- Excellent oral and written communication skills;
- Customer-facing communication skills to explain issues and RCAs effectively to customers;
- Experience in Product/Application Support for SaaS-based products;
- Understanding of APIs, Databases, Systems Architecture, and Design;
- Experience designing, implementing, and operating in a DevSecOps environment;
- Upper-intermediate English level.
AgileEngine is one of the Inc. 5000 fastest-growing companies in the US and a top-3 ranked dev shop according to Clutch. We create award-winning custom software solutions that help companies across 15+ industries change the lives of millions.
If you like a challenging environment where you’re working with the best and are encouraged to learn and experiment every day, there’s no better place — guaranteed! 🙂
The benefits of joining us
Professional growth
Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps
Competitive compensation
We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities
A selection of exciting projects
Join projects focused on modern solution development, with top-tier clients that include Fortune 500 enterprises and leading product brands
Flextime
Tailor your schedule for an optimal work-life balance, with the option of working from home or from the office, whichever makes you happiest and most productive.
Your AgileEngine journey starts here
Test task
We will review your CV and send you a test task via email
Intro call
Our recruitment team will reach out to you to discuss available opportunities
Technical interview
You will have an interview with your future team lead