Site Reliability Engineer
6 days ago
SENIOR SITE RELIABILITY ENGINEER
POSITION OVERVIEW
We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional
cloud platform team. Working alongside a diverse group of DevOps and Site Reliability
Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong
leadership capabilities in incident response and system reliability. In this role, you will lead
incident response and maintain, optimise, and scale our cloud infrastructure while ensuring
exceptional system reliability and performance.
KEY RESPONSIBILITIES
• Lead incident response from initial detection through real-time mitigation, root cause
analysis, post-mortem documentation (using Incident IO), and implementation of lessons
learned, with a focus on continuous improvement
• Develop and execute comprehensive incident response strategies to minimise
downtime and business impact
• Participate in a 24/7 on-call rotation to ensure continuous system availability
• Implement and maintain comprehensive observability solutions using CloudWatch,
Datadog, or similar monitoring platforms
• Maintain, improve, and optimise AWS infrastructure using Terraform while ensuring
scalability, reliability, and cost efficiency
• Continuously assess and enhance AWS infrastructure to optimise performance and cost
effectiveness
• Monitor and optimise serverless technologies including AWS Lambda and API Gateway
for peak performance and cost efficiency
• Monitor and maintain ECS Fargate deployments for containerised applications, ensuring
optimal resource utilisation
• Collect and analyse metrics to identify resource consumption, abnormal behaviour, and
potential performance bottlenecks
• Configure and manage alerting, dashboards, and automated monitoring across
distributed systems
• Foster improved collaboration between development and operations teams by
implementing SRE practices
REQUIRED QUALIFICATIONS
• Previous experience in a DevOps or SRE role
• Exceptional written and verbal communication skills
• Proven experience in incident response and 24/7 on-call responsibilities
• Expert-level knowledge of Infrastructure as Code, primarily Terraform (demonstrated
experience with other IaC tools will be highly regarded)
• Expert-level knowledge of AWS compute infrastructure
• Proficiency in automation tools and scripting languages
• Strong understanding of monitoring, metrics collection, and performance analysis
• Expert knowledge of observability and monitoring platforms such as Datadog, New
Relic, Prometheus, or similar tools
• Experience with log aggregation, APM (Application Performance Monitoring), and
distributed tracing
• Excellent collaboration abilities and capacity to work effectively in cross-functional
teams
• Strong analytical and problem-solving skills
• Demonstrated ability to work autonomously and take ownership
PREFERRED QUALIFICATIONS
• Experience with (highly desirable)
• Background in payments and PCI compliance environments (highly desirable)
• AWS certifications
• Experience with container orchestration and microservices architecture
• Knowledge of security best practices in cloud environments
-
Site Reliability Engineer
2 days ago
Manila, National Capital Region, Philippines CDOps Tech Full time ₱120,000 - ₱180,000 per year
About the Opportunity
We are seeking a seasoned and passionate Site Reliability Engineer for a high-impact contract engagement with one of our key clients, a leader in the marketing-tech sector. This is not just a typical SRE role; you will be the foundational expert responsible for spearheading the adoption of SRE culture and practices within the client's...
-
Site Reliability Engineer
2 weeks ago
Manila, National Capital Region, Philippines Russell Tobin Full time ₱120,000 - ₱180,000 per year
We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for designing, implementing, and evangelizing modern SRE capabilities that improve system reliability, scalability, and efficiency across our...
-
Site Reliability Engineer
3 days ago
Manila, National Capital Region, Philippines Broadridge Full time ₱1,200,000 - ₱2,400,000 per year
At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your career, while helping others along the way, come join the Broadridge team.
Role Overview
At Broadridge Trading & Connectivity Solutions, we foster a culture of empowerment, innovation, and collaboration, where...
-
Senior Site Reliability Engineer
2 weeks ago
Manila, National Capital Region, Philippines CDOps Tech Full time ₱2,000,000 - ₱2,500,000 per year
About the Opportunity
We are seeking a seasoned and passionate Senior Site Reliability Engineer for a high-impact contract engagement with one of our key clients, a leader in the marketing-tech sector. This is not just a typical SRE role; you will be the foundational expert responsible for spearheading the adoption of SRE culture and practices within the...
-
Site Reliability Engineering Manager
1 week ago
Manila, National Capital Region, Philippines Russell Tobin Full time $60,000 - $120,000 per year
We are seeking a highly skilled Site Reliability Engineering (SRE) Subject Matter Expert (SME) to lead and advance our observability, performance engineering, reliability, and AIOps practices. The SME will be responsible for designing, implementing, and evangelizing modern SRE capabilities that improve system reliability, scalability, and efficiency across our...
-
Site Reliability Engineer
2 days ago
Manila, National Capital Region, Philippines Broadridge Full time ₱1,200,000 - ₱2,400,000 per year
At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your career, while helping others along the way, come join the Broadridge team.
Role Overview
We are seeking a Site Reliability Engineer (Cloud) to lead the design, implementation, and operational support of our...
-
Site Reliability Engineer
3 days ago
Manila, National Capital Region, Philippines QualityKiosk Technologies Full time ₱1,500,000 - ₱2,500,000 per year
Experience: 6 to 10 years
Location: Makati
About QualityKiosk Technologies
QualityKiosk Technologies is one of the world's largest independent Quality Engineering (QE) providers and digital transformation enablers, helping companies build and manage applications for optimal performance and user experience. Founded in 2000, the company specializes in providing...
-
Cloud Site Reliability Engineer
2 days ago
Manila, National Capital Region, Philippines Tyler Technologies Full time $80,000 - $150,000 per year
Description
Responsibilities
Implement tooling to monitor AWS EKS-based systems focusing on performance, reliability, and scalability. Ensure that architecture and deployment models are sufficient to support SLA commitments and are well prepared for future problems of scale. Leverage cloud technology and platform capabilities to provide operationally sustainable...
-
Senior Site Reliability Engineer
2 days ago
Manila, National Capital Region, Philippines Broadridge Full time
At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your career, while helping others along the way, come join the Broadridge team.
Role Overview
We are looking for a seasoned Site Reliability Engineer to design, implement, and maintain scalable, secure, and high-performing...
-
Senior Site Reliability Engineer
2 weeks ago
Manila, National Capital Region, Philippines Broadridge Full time ₱1,200,000 - ₱2,400,000 per year
At Broadridge, we've built a culture where the highest goal is to empower others to accomplish more. If you're passionate about developing your career, while helping others along the way, come join the Broadridge team.
Role Overview
We are seeking a dynamic Senior Site Reliability Engineer (SRE) to lead the design, implementation, and operational support of...