What you will do
- Lead cross-cutting infrastructure projects not tied to app/platform changes, such as domain migrations for pipelines and customer-facing sites, networking redesigns to avoid IP exhaustion, and medium-sized automation initiatives;
- Design and evolve AWS networking and environments to support more dev/test sites, future SaaS infrastructure, and sandboxes where dev teams can experiment safely;
- Define and implement a disaster recovery (DR) strategy, including a secondary region or DR zone, to improve resilience and recovery time;
- Implement cost, reliability, observability, and monitoring improvements across services, using metrics and logs to guide optimization;
- Design, maintain, and evolve AWS-based infrastructure, including ECS, RDS/Aurora, Lambda, S3, CloudWatch, CDK, VPCs, subnets, security groups, Route 53, and load balancers;
- Upgrade AWS Aurora Postgres clusters to the latest supported versions, ensuring high availability, data integrity, and minimal downtime;
- Own and improve CI/CD pipelines using GitHub Actions for production deployments, covering containerized services and Lambda-based workloads;
- Manage infrastructure as code using AWS CDK (TypeScript) and other IaC practices to drive automation, consistency, and repeatability;
- Consolidate and optimize shared tooling, utility scripts, and reusable components across multiple repositories;
- Collaborate with engineering and leadership to define the infrastructure roadmap, influence architecture decisions, and promote DevOps culture and best practices.
Must haves
- 6+ years of experience in DevOps / Site Reliability Engineering, including ownership of multi-quarter infrastructure projects or leadership roles;
- Strong expertise with AWS services such as ECS, RDS/Aurora, Lambda, S3, CloudWatch, CDK, and core networking (VPC design, routing, subnets, security groups, NAT, DNS/Route 53, load balancers);
- Proficient with Docker, GitHub Actions, and modern CI/CD patterns for cloud-native applications;
- Deep knowledge of Postgres administration, including upgrades, backups, and performance tuning;
- Strong scripting and automation skills with TypeScript, Python, or Bash;
- Proven ability to architect scalable, secure, and reliable cloud environments, including DR strategies and cost-optimization practices;
- Experience improving observability (metrics, logs, traces, alerting) and using it to guide reliability and cost improvements;
- Excellent communication and collaboration skills, with a track record of working closely with engineers and stakeholders to execute infrastructure roadmaps;
- Self-driven, practical, and detail-oriented, comfortable making decisions, documenting trade-offs, and delivering high-quality results with limited supervision;
- Upper-intermediate English level.
Nice to haves
- Familiarity with AI/ML workflows or cloud-based AI services;
- Experience with AWS Bedrock or similar generative AI platforms;
- Exposure to Cursor or other modern AI-enhanced developer tools;
- Understanding of security and scaling best practices for distributed environments;
- Experience with monitoring and observability tools (Datadog, Prometheus, CloudWatch, etc.).
AgileEngine is one of the Inc. 5000 fastest-growing companies in the US and a top-3 ranked dev shop according to Clutch. We create award-winning custom software solutions that help companies across 15+ industries change the lives of millions.
If you like a challenging environment where you're working with the best and are encouraged to learn and experiment every day, there's no better place, guaranteed!
About the project
The benefits of joining us
Professional growth
Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps
Competitive compensation
We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities
A selection of exciting projects
Join projects built on modern solutions for top-tier clients, including Fortune 500 enterprises and leading product brands
Flextime
Tailor your schedule for an optimal work-life balance, with the option to work from home or from the office, whichever makes you happiest and most productive.
Your AgileEngine journey starts here
2 min
Tell us about yourself
2 sec
Confirm requirements
30 - 60 min
Pass a short test
5 min
Record a short video
→ Introduce yourself in a video instead of waiting for an interview
Live interview
Ace the technical interview with our team
→ Schedule a call yourself right away after your video is reviewed
Live interview
Final interview with your team
→ Get to know the team you will be working with
Get an offer
As quickly as possible