What you will do
- Monitor production systems and respond to alerts across infrastructure, application, and data layers;
- Perform first-level triage on incidents and support requests; escalate to developers with thorough context and diagnostics;
- Execute patching, operational tasks, and documented runbooks;
- Participate in on-call rotation and support scheduled deployments as needed;
- Conduct post-incident reviews and feed lessons back into runbooks and playbooks;
- Identify recurring issues and systemic risks before they escalate;
- Improve documentation and monitoring coverage between active support activities;
- Contribute to operational reporting and SLA dashboards;
- Manage and track SLA performance across all supported services; surface risks proactively;
- Coordinate with Help Desk / Deskside Support partner for production tasks affecting employees;
- Escalate security incidents and vulnerabilities to the vCISO partner per documented procedures.
Must haves
- 3+ years in production support, SRE, NOC, or operations engineering;
- Hands-on AWS experience with EC2/ECS, networking (VPC, security groups, ACLs), and IAM;
- Operational proficiency with PostgreSQL and/or Amazon RDS;
- Incident triage across infrastructure and application layers;
- Track record managing SLAs in a ticketed support environment such as Jira;
- Strong written communication for escalation and post-incident reporting;
- Upper-intermediate English level.
Nice to haves
- Experience with structured incident response such as ITIL or NIST;
- Familiarity with Datadog, CloudWatch, or comparable observability platforms;
- Exposure to AWS data services including Glue, S3, Athena, and EventBridge;
- Basic IaC familiarity with CloudFormation, SAM, or Terraform;
- Background in financial services or regulated environments;
- AWS certification such as SysOps Administrator or Solutions Architect;
- Experience with scripting/automation to reduce manual toil.
About the role
We are looking for a Production Support Engineer to monitor and support production systems across a multi-account AWS environment, serving as the front line of a tiered support model for a fintech platform. You will triage incidents, execute runbooks, manage SLA performance, and coordinate with engineering, help desk, and security partners. The role includes on-call rotation and structured post-incident review with a focus on continuous operational improvement.
The benefits of joining us
Professional growth
Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps
Competitive compensation
We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities
A selection of exciting projects
Join projects that build modern solutions for top-tier clients, including Fortune 500 enterprises and leading product brands
Flextime
Tailor your schedule for an optimal work-life balance, with the option to work from home or from the office, whichever makes you happiest and most productive.
Your AgileEngine journey starts here
2 min
Tell us about yourself
2 sec
Confirm requirements
30-60 min
Pass a short test
5 min
Record a short video
→ Introduce yourself on video instead of waiting for an interview
Live interview
Ace the technical interview with our team
→ Schedule a call yourself right away after your video is reviewed
Live interview
Final interview with your team
→ Get to know the team you will be working with
Get an offer
As quickly as possible
